Neighboring Words Affect Human Interpretation of Saliency Explanations

Alon Jacovi; Hendrik Schuff; Heike Adel; Ngoc Thang Vu; Yoav Goldberg

2023 ACL ACL 2023

Neighboring Words Affect Human Interpretation of Saliency Explanations

Abstract

AbstractWord-level saliency explanations (“heat maps over words”) are often used to communicate feature-attribution in text-based models. Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores. We conduct a user study to investigate how the marking of a word’s *neighboring words* affect the explainee’s perception of the word’s importance in the context of a saliency explanation. We find that neighboring words have significant effects on the word’s importance rating. Concretely, we identify that the influence changes based on neighboring direction (left vs. right) and a-priori linguistic and computational measures of phrases and collocations (vs. unrelated neighboring words).Our results question whether text-based saliency explanations should be continued to be communicated at word level, and inform future research on alternative saliency explanation methods.

🧭 Keyword Pioneer — word-level explanation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio