REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning

Ming Jiang; Junjie Hu; Qiuyuan Huang; Lei Zhang; Jana Diesner; Jianfeng Gao

IJCNLP 2019

Abstract

Popular metrics for evaluating image captioning systems, such as BLEU and CIDEr, provide a single score to gauge a system’s overall effectiveness. This score is often not informative enough to indicate what specific errors a given system makes. In this study, we present REO, a fine-grained evaluation method for automatically measuring the performance of image captioning systems. REO assesses the quality of captions from three perspectives: 1) Relevance to the ground truth, 2) Extraness of the content that is irrelevant to the ground truth, and 3) Omission of the elements in the images and human references. Experiments on three benchmark datasets demonstrate that our method achieves a higher consistency with human judgments and provides more intuitive evaluation results than alternative metrics.

Topics

Computer Vision > Generation > Image Captioning; Natural Language Processing > Applications > Summarization; Machine Learning > Learning Types > Evaluation

Keywords

image captioning; human judgment; semantic similarity; evaluation metrics; fine-grained evaluation; image captioning evaluation
