Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

Wanrong Zhu; Xin Wang; Pradyumna Narayana; Kazoo Sone; Sugato Basu; William Yang Wang

EMNLP 2020

Abstract

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that generalize well in real-world settings. To do this, it is critical to ensure that our evaluation protocols are correct and our benchmarks are reliable. In this work, we design a set of experiments to understand an important but often ignored problem in visually grounded language generation: given that humans have different utilities and visual attention, how does sample variance in multi-reference datasets affect model performance? Empirically, we study several multi-reference datasets and their corresponding vision-and-language tasks. We show that it is of paramount importance to report variance in experiments; that human-generated references can vary drastically across datasets and tasks, revealing the nature of each task; and that, among the metrics studied, CIDEr shows systematically larger variance than the others. Our evaluations of the number of references per instance shed light on the design of reliable datasets in the future.
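To make the idea of reference sample variance concrete, here is a minimal Python sketch, not the authors' evaluation protocol. It assumes each instance has several human references, repeatedly draws k of them at random, scores every candidate against the drawn subset, and reports the mean and standard deviation of the corpus-level score across trials. The names unigram_f1, reference_sample_scores, and Metric are hypothetical and introduced only for illustration; unigram_f1 is a toy stand-in for real metrics such as CIDEr or BLEU, which this sketch does not implement.

```python
import random
import statistics
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical metric type: (candidate, references) -> score in [0, 1].
Metric = Callable[[str, Sequence[str]], float]


def unigram_f1(candidate: str, references: Sequence[str]) -> float:
    """Toy stand-in metric (NOT CIDEr/BLEU): best unigram F1 over the references."""
    cand = Counter(candidate.lower().split())
    best = 0.0
    for ref in references:
        ref_counts = Counter(ref.lower().split())
        overlap = sum((cand & ref_counts).values())  # clipped token matches
        if overlap == 0:
            continue
        precision = overlap / sum(cand.values())
        recall = overlap / sum(ref_counts.values())
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def reference_sample_scores(
    candidates: Sequence[str],
    references: Sequence[Sequence[str]],
    k: int,
    trials: int,
    metric: Metric,
    seed: int = 0,
) -> List[float]:
    """Corpus-level score for each trial, using k randomly drawn references per instance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(trials):
        per_instance = [
            metric(cand, rng.sample(list(refs), k))  # draw k refs without replacement
            for cand, refs in zip(candidates, references)
        ]
        scores.append(statistics.mean(per_instance))
    return scores


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    candidates = ["a dog runs on the grass", "two people ride bikes"]
    references = [
        ["a dog running on grass", "a brown dog plays outside", "the puppy runs in a park"],
        ["two cyclists on a road", "people riding bicycles", "two people ride their bikes downhill"],
    ]
    for k in (1, 2, 3):
        scores = reference_sample_scores(candidates, references, k=k, trials=20, metric=unigram_f1)
        print(f"k={k}: mean={statistics.mean(scores):.3f} std={statistics.stdev(scores):.3f}")
```

Comparing the standard deviation across values of k gives a rough sense of how much a reported score depends on which references happen to be available. With k equal to the full reference count, this toy metric has zero variance, because it takes the best match over the same full set of references in every trial.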

Topics

Machine Learning > Optimization & Theory > Theory
Computer Vision > Generation > Image Captioning

Keywords

evaluation benchmark, sample variance, metric analysis, visually grounded language, multi-reference dataset

Related papers

Fast semantic parsing with well-typedness guarantees 2020

Detecting Objectifying Language in Online Professor Reviews 2020

Analogous Process Structure Induction for Sub-event Sequence Prediction 2020

Aspect Sentiment Classification with Aspect-Specific Opinion Spans 2020

Robust and Interpretable Grounding of Spatial References with Relation Networks 2020