Interpretability

7318 directly classified papers

Papers

Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models EACL 2026

Targeted Syntactic Evaluation of Language Models on Georgian Case Alignment EACL 2026

Early-Exit and Instant Confidence Translation Quality Estimation EACL 2026

Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models EACL 2026

Can Activation Steering Generalize Across Languages? A Study on Syllogistic Reasoning in Language Models EACL 2026

Personality Editing for Language Models through Adjusting Self-Referential Queries EACL 2026

HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations EACL 2026

Feature Drift: How Fine-Tuning Repurposes Representations in LLMs EACL 2026

Persona Switch: Mixing Distinct Perspectives in Decoding Time EACL 2026

Detection of Adversarial Prompts with Model Predictive Entropy EACL 2026

Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders EACL 2026

Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation EACL 2026

Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models EACL 2026

Do LLMs model human linguistic variation? A case study in Hindi-English Verb code-mixing EACL 2026

FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs EACL 2026

Punctuations and Predicates in Language Models EACL 2026

Argument-Based Consistency in Toxicity Explanations of LLMs EACL 2026

Learning to Judge: LLMs Designing and Applying Evaluation Rubrics EACL 2026

DeVisE: Towards the Behavioral Testing of Medical Large Language Models EACL 2026

A Hybrid Confidence-Aware Framework for Arabic Toxicity Detection in Social Media EACL 2026

A Knowledge Graph Based Diagnostic Framework for Analyzing Hallucinations in Arabic Machine Reading Comprehension EACL 2026

Position: Biomedical NLP Demands Specialization, Not Generalization EACL 2026

Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs EACL 2026

Confidence Leaps in LLM Reasoning: Early Stopping and Cross-Model Transfer EACL 2026

CHiRPE: A Step Towards Real-World Clinical NLP with Clinician-Oriented Model Explanations EACL 2026