Towards Explainable Evaluation of Language Models on the Semantic Similarity of Visual Concepts

Maria Lymperaiou; George Manoliadis; Orfeas Menis Mastromichalakis; Edmund G. Dervakos; Giorgos Stamou

COLING 2022

Abstract

Recent breakthroughs in NLP research, such as the advent of Transformer models, have indisputably contributed to major advancements in several tasks. However, few works investigate the robustness and explainability of their evaluation strategies. In this work, we examine the behavior of high-performing pre-trained language models, focusing on the task of semantic similarity for visual vocabularies. First, we address the need for explainable evaluation metrics, which are necessary for understanding the conceptual quality of retrieved instances. Our proposed metrics provide valuable insights at both the local and global levels, showcasing the shortcomings of widely used approaches. Second, adversarial interventions on salient query semantics expose vulnerabilities of opaque metrics and highlight patterns in learned linguistic representations.

Authors

Maria Lymperaiou, George Manoliadis, Orfeas Menis Mastromichalakis, Edmund G. Dervakos, Giorgos Stamou

Topics

Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Resources & Methods > Language Modeling
Machine Learning > Learning Types > Evaluation

Keywords

adversarial robustness, language model evaluation, visual concept, semantic similarity, language model robustness, adversarial testing, explainable evaluation, adversarial intervention, transformer model, conceptual quality
