The Meaning of “Most” for Visual Question Answering Models

Alexander Kuhnle; Ann Copestake

2019 ACL ACL 2019

The Meaning of “Most” for Visual Question Answering Models

Abstract

AbstractThe correct interpretation of quantifier statements in the context of a visual scene requires non-trivial inference mechanisms. For the example of “most”, we discuss two strategies which rely on fundamentally different cognitive concepts. Our aim is to identify what strategy deep learning models for visual question answering learn when trained on such questions. To this end, we carefully design data to replicate experiments from psycholinguistics where the same question was investigated for humans. Focusing on the FiLM visual question answering model, our experiments indicate that a form of approximate number system emerges whose performance declines with more difficult scenes as predicted by Weber’s law. Moreover, we identify confounding factors, like spatial arrangement of the scene, which impede the effectiveness of this system.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — approximate number system

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Alexander Kuhnle , Ann Copestake

Topics

Machine Learning > Core Methods > Representation Learning Natural Language Processing > Applications > Question Answering Artificial Intelligence > Core AI > Computer Vision Computer Vision > Applications > Visual Question Answering

Keywords

visual question answering deep learning feature-wise linear modulation approximate number system weber law

Download PDF

Related papers

What do phone embeddings learn about Phonology? 2019

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages 2019

Understanding Undesirable Word Embedding Associations 2019

Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text 2019

Domain Adaptation of Neural Machine Translation by Lexicon Induction 2019