Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness

Srija Mukhopadhyay; Adnan Qidwai; Aparna Garimella; Pritika Ramu; Vivek Gupta; Dan Roth

EMNLP 2024

Abstract

Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models’ ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Computer Vision > Core AI > Multimodal Learning
Deep Learning > Models > Vision-Language Models
Computer Vision > Applications > Visual Question Answering

Keywords

model robustness, visual understanding, visual language model, chart question answering
