Re-evaluating Automatic Metrics for Image Captioning

Mert Kilickaya; Aykut Erdem; Nazli Ikizler-Cinbis; Erkut Erdem

EACL 2017

Abstract

The task of generating natural language descriptions from images has received a lot of attention in recent years. Consequently, it is becoming increasingly important to evaluate such image captioning approaches in an automatic manner. In this paper, we provide an in-depth evaluation of the existing image captioning metrics through a series of carefully designed experiments. Moreover, we explore the utilization of the recently proposed Word Mover’s Distance (WMD) document metric for the purpose of image captioning. Our findings outline the differences and/or similarities between metrics and their relative robustness by means of extensive correlation, accuracy and distraction based evaluations. Our results also demonstrate that WMD provides strong advantages over other metrics.
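As a rough illustration of what a correlation-based evaluation of a captioning metric can look like, the sketch below computes a Spearman rank correlation between human ratings and automatic metric scores. The abstract does not say which correlation coefficient the paper uses, so Spearman is an assumption here, and the function names and toy scores are hypothetical values made up for the example rather than data from the paper.

```python
from math import sqrt


def rank(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical toy data: one human rating and one metric score per caption.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metric = [0.10, 0.35, 0.30, 0.60, 0.90]

print(f"Spearman correlation: {spearman(human, metric):.3f}")
```

A metric whose scores rank captions in nearly the same order as human judges yields a coefficient close to 1; comparing this value across metrics is one simple way to judge which metric tracks human judgment more closely.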

Topics

Computer Vision > Generation > Image Captioning; Natural Language Processing > Applications > Natural Language Inference; Machine Learning > Learning Types > Evaluation

Keywords

image captioning, correlation analysis, automatic evaluation, metric correlation, natural language description, word mover distance
