Research Explorer
Interpretability
36 directly classified papers
Papers per year
2018: 1
2019: 1
2020: 2
2021: 4
2022: 4
2023: 4
2024: 7
2025: 13
Papers
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs
ACL 2025
JoPA: Explaining Large Language Model’s Generation via Joint Prompt Attribution
ACL 2025
Designing and Contextualising Probes for African Languages
ACL 2025
Interpretable Image Classification via Non-parametric Part Prototype Learning
CVPR 2025
Improving LLM Reasoning through Interpretable Role-Playing Steering
EMNLP 2025
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
EMNLP 2025
BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation
AAAI 2025
Graph Segmentation and Contrastive Enhanced Explainer for Graph Neural Networks
AAAI 2025
Interpreting Object-level Foundation Models via Visual Precision Search
CVPR 2025
Attention IoU: Examining Biases in CelebA using Attention Maps
CVPR 2025
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
CVPR 2025
Position-aware Automatic Circuit Discovery
ACL 2025
Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
ACL 2025
Improving Explainable Fact-Checking via Sentence-Level Factual Reasoning
EMNLP 2024
Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning
AAAI 2024
A Convolutional Neural Network Interpretable Framework for Human Ventral Visual Pathway Representation
AAAI 2024
Q-SENN: Quantized Self-Explaining Neural Networks
AAAI 2024
Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
NeurIPS 2024
Interpreting Learned Feedback Patterns in Large Language Models
NeurIPS 2024
InternalInspector I²: Robust Confidence Estimation in LLMs through Internal States
EMNLP 2024
Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
NeurIPS 2023
VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers
EMNLP 2023
Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis
AAAI 2023
Adversarial Normalization: I Can Visualize Everything (ICE)
CVPR 2023
B-Cos Networks: Alignment Is All We Need for Interpretability
CVPR 2022