QACE: Asking Questions to Evaluate an Image Caption

Hwanhee Lee; Thomas Scialom; Seunghyun Yoon; Franck Dernoncourt; Kyomin Jung

2021 EMNLP EMNLP 2021

QACE: Asking Questions to Evaluate an Image Caption

Abstract

AbstractIn this paper we propose QACE, a new metric based on Question Answering for Caption Evaluation to evaluate image captioning based on Question Generation(QG) and Question Answering(QA) systems. QACE generates questions on the evaluated caption and check its content by asking the questions on either the reference caption or the source image. We first develop QACE_Ref that compares the answers of the evaluated caption to its reference, and report competitive results with the state-of-the-art metrics. To go further, we propose QACE_Img, that asks the questions directly on the image, instead of reference. A Visual-QA system is necessary for QACE_Img. Unfortunately, the standard VQA models are actually framed a classification among only few thousands categories. Instead, we propose Visual-T5, an abstractive VQA system. The resulting metric, QACE_Img is multi-modal, reference-less and explainable. Our experiments show that QACE_Img compares favorably w.r.t. other reference-less metrics.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hwanhee Lee , Thomas Scialom , Seunghyun Yoon , Franck Dernoncourt , Kyomin Jung

Topics

Artificial Intelligence > Core AI > Multimodal Learning Computer Vision > Generation > Image Captioning Natural Language Processing > Generation > Summarization Natural Language Processing > Applications > Machine Reading Comprehension Computer Vision > Generation > Visual Question Answering Computer Vision > Applications > Computer Vision

Keywords

visual question answering image captioning question generation image caption evaluation caption evaluation reference-less metric

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021