Interpretability

173 directly classified papers

Papers

AUTOSUMM: A Comprehensive Framework for LLM-Based Conversation Summarization ACL 2025

Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition CVPR 2025

LLaMAs Have Feelings Too: Unveiling Sentiment and Emotion Representations in LLaMA Models Through Probing ACL 2025

ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs ACL 2025

Serial Position Effects of Large Language Models ACL 2025

Natural Language Counterfactual Explanations in Financial Text Classification: A Comparison of Generators and Evaluation Metrics ACL 2025

Position-aware Automatic Circuit Discovery ACL 2025

Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs ACL 2025

Tuning-Free Accountable Intervention for LLM Deployment – a Metacognitive Approach AAAI 2025

Attributive Reasoning for Hallucination Diagnosis of Large Language Models AAAI 2025

Towards Trustable SHAP Scores AAAI 2025

Conditional Feature Importance with Generative Modeling Using Adversarial Random Forests AAAI 2025

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis CVPR 2025

Improving Large Language Model Confidence Estimates using Extractive Rationales for Classification ACL 2025

Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework EMNLP 2025

Even-if Explanations: Formal Foundations, Priorities and Complexity AAAI 2025

How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data AAAI 2025

Interpretable DNFs IJCAI 2025

Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding ICCV 2025

Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models AAAI 2025

GeoPro-Net: Learning Interpretable Spatiotemporal Prediction Models Through Statistically-Guided Geo-Prototyping AAAI 2025

BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation AAAI 2025

Interpretable Image Classification via Non-parametric Part Prototype Learning CVPR 2025

Accurate Estimation of Feature Importance Faithfulness for Tree Models AAAI 2025

Unsupervised Hallucination Detection by Inspecting Reasoning Processes EMNLP 2025