Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI

Suzanna Sia; Anton Belyy; Amjad Almahairi; Madian Khabsa; Luke Zettlemoyer; Lambert Mathias

AAAI 2023

Abstract

Evaluating an explanation's faithfulness is desirable for many reasons, such as trust, interpretability, and diagnosing the sources of a model's errors. In this work, which focuses on the NLI task, we introduce the methodology of Faithfulness-through-Counterfactuals, which first generates a counterfactual hypothesis based on the logical predicates expressed in the explanation, and then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic (i.e., whether the new formula is logically satisfiable). In contrast to existing approaches, this does not require any explanations for training a separate verification model. We first validate the efficacy of automatic counterfactual hypothesis generation, leveraging the few-shot priming paradigm. Next, we show that our proposed metric distinguishes between human-model agreement and disagreement on new counterfactual input. In addition, we conduct a sensitivity analysis to validate that our metric is sensitive to unfaithful explanations.
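As a rough illustration of the consistency check described in the abstract, the sketch below scores a set of counterfactual hypotheses by whether a model's prediction matches the label implied by the explanation's logic. It is a simplified sketch under stated assumptions, not the paper's implementation: reducing logical satisfiability to a label match is an assumption made here, and `CounterfactualCase`, `is_consistent`, `consistency_rate`, and `toy_predict` are hypothetical names that do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical container: one counterfactual derived from an explanation.
@dataclass(frozen=True)
class CounterfactualCase:
    premise: str
    counterfactual_hypothesis: str
    implied_label: str  # label the explanation's logic implies (assumed given)

# A predictor maps (premise, hypothesis) to an NLI label string.
Predictor = Callable[[str, str], str]

def is_consistent(case: CounterfactualCase, predict: Predictor) -> bool:
    """True if the model's prediction on the counterfactual matches the
    label implied by the explanation (a simplification of satisfiability)."""
    return predict(case.premise, case.counterfactual_hypothesis) == case.implied_label

def consistency_rate(cases: Iterable[CounterfactualCase], predict: Predictor) -> float:
    """Fraction of counterfactual cases on which the model is consistent."""
    case_list = list(cases)
    if not case_list:
        raise ValueError("at least one counterfactual case is required")
    return sum(is_consistent(c, predict) for c in case_list) / len(case_list)

def toy_predict(premise: str, hypothesis: str) -> str:
    """Stand-in for a real NLI model: flags any negated hypothesis as a
    contradiction and everything else as entailment. Illustration only."""
    padded = f" {hypothesis.lower()} "
    return "contradiction" if " not " in padded else "entailment"

premise = "A man is playing a guitar on stage."
cases = [
    # The toy model predicts "contradiction" here, matching the implied label.
    CounterfactualCase(premise, "A man is not playing an instrument.", "contradiction"),
    # The toy model predicts "entailment" here, so the check fails.
    CounterfactualCase(premise, "A man is playing a piano.", "contradiction"),
]

rate = consistency_rate(cases, toy_predict)
```

In this toy setup, one of the two counterfactuals is handled consistently, so `rate` is 0.5. A real evaluation would replace `toy_predict` with the NLI model under study and would derive `implied_label` from the predicates in the explanation rather than writing it by hand.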

Authors

Suzanna Sia, Anton Belyy, Amjad Almahairi, Madian Khabsa, Luke Zettlemoyer, Lambert Mathias

Topics

Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Applications > Natural Language Inference
Machine Learning > Learning Types > Reasoning

Keywords

natural language inference, explainable AI, counterfactual reasoning, counterfactual explanation, faithful explanation, faithfulness evaluation, logical satisfiability
