Interpreting Pretrained Language Models via Concept Bottlenecks (Extended Abstract)

Zhen Tan; Lu Cheng; Song Wang; Yuan Bo; Jundong Li; Huan Liu

IJCAI 2025

Abstract

Pretrained language models (PLMs) achieve state-of-the-art results but often function as "black boxes", hindering interpretability and responsible deployment. Existing interpretation methods, such as attention analysis, often produce explanations that are unclear or unintuitive. We propose interpreting PLMs through high-level, human-understandable concepts using Concept Bottleneck Models (CBMs). This extended abstract introduces C3M (ChatGPT-guided Concept augmentation with Concept-level Mixup), a novel framework for training Concept-Bottleneck-Enabled PLMs (CBE-PLMs). C3M uses Large Language Models (LLMs) such as ChatGPT to augment concept sets and generate noisy concept labels, and combines them with a concept-level MixUp mechanism that improves robustness and lets the model learn effectively from both human-annotated and machine-generated concepts. Empirical results show that our approach provides intuitive explanations, supports model diagnosis via test-time intervention, and improves the interpretability-utility trade-off, even with limited or noisy concept annotations. This is a concise version of [Tan et al., 2024b], recipient of the Best Paper Award at PAKDD 2024. Code and data are released at https://github.com/Zhen-Tan-dmml/CBM_NLP.git.
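The concept bottleneck pipeline summarized above (PLM representation, then human-readable concepts, then the task label, with test-time intervention for diagnosis) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the C3M implementation from the linked repository: the names `ConceptBottleneckHead`, `concept_mixup`, `joint_loss`, and `intervene` are hypothetical, the PLM is replaced by random embeddings, and concept-level MixUp is assumed here to be standard MixUp (one Beta-sampled weight shared across embeddings, concept labels, and task labels). The paper's exact formulation may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConceptBottleneckHead:
    """Hypothetical CBM head: PLM embedding -> k concept probs -> task logits."""

    def __init__(self, d_model, n_concepts, n_classes, rng):
        self.W_c = rng.normal(scale=0.1, size=(d_model, n_concepts))
        self.W_y = rng.normal(scale=0.1, size=(n_concepts, n_classes))

    def concepts(self, h):
        # One probability per human-readable concept (e.g. "food quality").
        return sigmoid(h @ self.W_c)

    def predict(self, c):
        # The task head only sees the concept vector: this is the bottleneck.
        return c @ self.W_y


def concept_mixup(h_a, h_b, c_a, c_b, y_a, y_b, rng, alpha=0.4):
    """Assumed formulation: standard MixUp with one shared weight applied to
    embeddings, (possibly noisy) concept labels, and one-hot task labels."""
    lam = rng.beta(alpha, alpha)

    def mix(a, b):
        return lam * a + (1.0 - lam) * b

    return mix(h_a, h_b), mix(c_a, c_b), mix(y_a, y_b), lam


def joint_loss(model, h, c_target, y_target, concept_weight=1.0):
    """Task cross-entropy plus concept binary cross-entropy (soft targets allowed)."""
    eps = 1e-9
    c_hat = model.concepts(h)
    logits = model.predict(c_hat)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    task = -(y_target * log_probs).sum(axis=-1).mean()
    concept = -(c_target * np.log(c_hat + eps)
                + (1.0 - c_target) * np.log(1.0 - c_hat + eps)).mean()
    return task + concept_weight * concept


def intervene(c_pred, c_true, idx):
    """Test-time intervention: overwrite chosen concepts with corrected values."""
    c = c_pred.copy()
    c[..., idx] = c_true[..., idx]
    return c


rng = np.random.default_rng(0)
model = ConceptBottleneckHead(d_model=16, n_concepts=4, n_classes=3, rng=rng)
h = rng.normal(size=(2, 16))                 # stand-in for PLM sentence embeddings
c_true = np.array([[1.0, 0.0, 1.0, 0.0],     # e.g. human-annotated concepts
                   [0.0, 1.0, 0.0, 0.0]])    # e.g. LLM-generated, possibly noisy
y_true = np.eye(3)[[2, 0]]                   # one-hot task labels

# One mixed training example and its joint loss.
h_mix, c_mix, y_mix, lam = concept_mixup(h[0], h[1], c_true[0], c_true[1],
                                         y_true[0], y_true[1], rng)
loss = joint_loss(model, h_mix[None, :], c_mix[None, :], y_mix[None, :])

# Diagnosis: correct concepts 0 and 2, then recompute the task prediction.
c_hat = model.concepts(h)
logits_fixed = model.predict(intervene(c_hat, c_true, [0, 2]))
```

Because the task head reads only the concept vector, correcting individual concepts with `intervene` changes the prediction without retraining; this is what makes test-time intervention usable as a diagnosis tool.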

Topics

Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Representation Learning
Artificial Intelligence > Core AI > Large Language Models

Keywords

concept bottleneck model, pretrained language model, model diagnosis, large language model, concept-level mixup
