COSMic: A Coherence-Aware Generation Metric for Image Descriptions

Mert Inan; Piyush Sharma; Baber Khalid; Radu Soricut; Matthew Stone; Malihe Alikhani

EMNLP 2021

Abstract

Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations. However, image captioning metrics have struggled to give accurate learned estimates of the semantic and pragmatic success of output text. We address this weakness by introducing the first discourse-aware learned generation metric for evaluating image descriptions. Our approach is inspired by computational theories of discourse that capture information goals through coherence relations. We present a dataset of image–description pairs annotated with coherence relations. We then train a coherence-aware metric on a subset of the Conceptual Captions dataset and measure its effectiveness (its ability to predict human ratings of output captions) on a test set composed of out-of-domain images. When scoring the outputs of a number of state-of-the-art coherence-aware caption generation models, our proposed metric achieves a higher Kendall correlation coefficient with human judgments than several other metrics, including the recently proposed learned metrics BLEURT and BERTScore.
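The headline comparison above rests on the Kendall correlation between a metric's scores and human ratings of the same captions. As a rough illustration only, the sketch below computes the tau-b variant, which adjusts for tied values (common when humans rate on a small discrete scale). The abstract does not say which tau variant the paper uses, so tau-b is an assumption here, and the function name `kendall_tau_b` and the toy scores are hypothetical, not data or code from the paper.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equal-length score lists.

    Assumption: tau-b is used here for illustration; it corrects the
    denominator for ties. This O(n^2) pairwise loop is fine for small
    examples; large test sets would call for an O(n log n) method.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                # Tied in both lists: excluded from numerator and denominator.
                continue
            elif dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1  # both lists order the pair the same way
            else:
                discordant += 1  # the lists disagree on the pair's order
    # (pairs not tied in x) * (pairs not tied in y)
    denom = math.sqrt(
        (concordant + discordant + ties_y_only)
        * (concordant + discordant + ties_x_only)
    )
    if denom == 0:
        raise ValueError("tau-b is undefined when one list is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical toy data, NOT results from the paper: one human rating
    # and two candidate metric scores per generated caption.
    human = [3, 1, 2, 2, 3]
    metric_a = [0.9, 0.2, 0.5, 0.4, 0.8]
    metric_b = [0.3, 0.6, 0.5, 0.1, 0.7]
    print("metric A tau-b:", round(kendall_tau_b(metric_a, human), 3))
    print("metric B tau-b:", round(kendall_tau_b(metric_b, human), 3))
```

A metric whose tau against the human ratings is closer to 1 orders the captions more like the human raters do, which is the sense in which one metric "correlates better" with human judgments.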

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Computer Vision > Generation > Image Captioning
Natural Language Processing > Generation > Text Generation
Machine Learning > Learning Types > Evaluation
Deep Learning > Learning Types > Representation Learning

Keywords

text generation, multimodal learning, image captioning, evaluation metric, coherence relation, human rating, generation metric
