Interpretability Guarantees with Merlin-Arthur Classifiers

Stephan Wäldchen; Kartikey Sharma; Berkant Turan; Max Zimmer; Sebastian Pokutta

AISTATS 2024

Abstract

We propose an interactive multi-agent classifier that provides provable interpretability guarantees even for complex agents such as neural networks. These guarantees consist of lower bounds on the mutual information between selected features and the classification decision. Our results are inspired by the Merlin-Arthur protocol from Interactive Proof Systems and express these bounds in terms of measurable metrics such as soundness and completeness. Compared to existing interactive setups, we rely neither on optimal agents nor on the assumption that features are distributed independently. Instead, we use the relative strength of the agents as well as the new concept of Asymmetric Feature Correlation, which captures the precise kind of correlations that make interpretability guarantees difficult. We evaluate our results on two small-scale datasets where high mutual information can be verified explicitly.
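To make the central quantity concrete, the following is a minimal sketch of the standard mutual information between a discrete feature indicator F and a class label C, computed from a joint probability table. The function name `mutual_information` and the two toy tables are illustrative assumptions, not taken from the paper, and the paper's exact formulation of the bounded quantity may differ.

```python
import math


def mutual_information(joint):
    """Mutual information I(F; C) in bits for a joint probability table.

    joint[f][c] is P(F = f, C = c); the entries are assumed to sum to 1.
    This is the textbook definition, not the paper's specific bound.
    """
    # Marginals: sum over rows for P(F), over columns for P(C).
    p_f = [sum(row) for row in joint]
    p_c = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for f, row in enumerate(joint):
        for c, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (p_f[f] * p_c[c]))
    return mi


# Hypothetical toy tables: a feature that determines the class exactly,
# and a feature that is independent of the class.
perfect = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]

mi_perfect = mutual_information(perfect)
mi_independent = mutual_information(independent)
```

In this toy setting a feature that fully determines a balanced binary class carries one bit of information, while an independent feature carries none. A lower bound on this quantity is the kind of guarantee the abstract describes.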

Topics

Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Multi-Agent Systems
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Learning Types > Uncertainty Quantification

Keywords

probabilistic modeling, feature correlation, mutual information, interpretability guarantee, interactive proof, neural network, multi-agent system
