Interpretability

36 directly classified papers

Papers

ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs ACL 2025

JoPA: Explaining Large Language Model’s Generation via Joint Prompt Attribution ACL 2025

Designing and Contextualising Probes for African Languages ACL 2025

Interpretable Image Classification via Non-parametric Part Prototype Learning CVPR 2025

Improving LLM Reasoning through Interpretable Role-Playing Steering EMNLP 2025

Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models EMNLP 2025

BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation AAAI 2025

Graph Segmentation and Contrastive Enhanced Explainer for Graph Neural Networks AAAI 2025

Interpreting Object-level Foundation Models via Visual Precision Search CVPR 2025

Attention IoU: Examining Biases in CelebA using Attention Maps CVPR 2025

Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation CVPR 2025

Position-aware Automatic Circuit Discovery ACL 2025

Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing ACL 2025

Improving Explainable Fact-Checking via Sentence-Level Factual Reasoning EMNLP 2024

Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning AAAI 2024

A Convolutional Neural Network Interpretable Framework for Human Ventral Visual Pathway Representation AAAI 2024

Q-SENN: Quantized Self-Explaining Neural Networks AAAI 2024

Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? NeurIPS 2024

Interpreting Learned Feedback Patterns in Large Language Models NeurIPS 2024

InternalInspector I²: Robust Confidence Estimation in LLMs through Internal States EMNLP 2024

Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency NeurIPS 2023

VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers EMNLP 2023

Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis AAAI 2023

Adversarial Normalization: I Can Visualize Everything (ICE) CVPR 2023

B-Cos Networks: Alignment Is All We Need for Interpretability CVPR 2022