Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues

Qingxiu Dong; Ziwei Qin; Heming Xia; Tian Feng; Shoujie Tong; Haoran Meng; Lin Xu; Zhongyu Wei; Weidong Zhan; Baobao Chang; Sujian Li; Tianyu Liu; Zhifang Sui

ACL 2022

Abstract

It is common practice for recent work in vision-language cross-modal reasoning to adopt a binary or multi-choice classification formulation that takes as input a set of source image(s) and a textual query. In this work, we take a sober look at such an “unconditional” formulation, in the sense that no prior knowledge is specified with respect to the source image(s). Inspired by the designs of both visual commonsense reasoning and natural language inference tasks, we propose a new task termed “Premise-based Multi-modal Reasoning” (PMR), where a textual premise is the background presumption on each source image. The PMR dataset contains 15,360 manually annotated samples, which are created by a multi-phase crowd-sourcing process. With selected high-quality movie screenshots and human-curated premise templates from 6 pre-defined categories, we ask crowd-source workers to write one true hypothesis and three distractors (4 choices) given the premise and image through a cross-check procedure.
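The task format described above (one image, one textual premise, four candidate hypotheses of which exactly one is true) can be illustrated with a minimal sketch. The class name `PMRSample`, its field names, the file paths, and the example sentences are all hypothetical and chosen only for illustration; they are not taken from the released dataset or from the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PMRSample:
    """One premise-conditioned multiple-choice item (hypothetical schema)."""
    image_path: str             # path to the source movie screenshot
    premise: str                # textual background presumption on the image
    hypotheses: Sequence[str]   # one true hypothesis plus three distractors
    label: int                  # index of the true hypothesis

    def __post_init__(self) -> None:
        # The abstract specifies exactly 4 choices per sample.
        if len(self.hypotheses) != 4:
            raise ValueError("a PMR sample must have exactly 4 hypotheses")
        if not 0 <= self.label < 4:
            raise ValueError("label must index one of the 4 hypotheses")


def accuracy(samples: Sequence[PMRSample],
             predict: Callable[[PMRSample], int]) -> float:
    """Fraction of samples for which `predict` picks the true hypothesis."""
    if not samples:
        return 0.0
    correct = sum(1 for s in samples if predict(s) == s.label)
    return correct / len(samples)


def always_first(sample: PMRSample) -> int:
    """Trivial baseline that ignores image and premise."""
    return 0


# Invented toy samples, for illustration only.
samples = [
    PMRSample("img_001.jpg", "Person A is a waiter.",
              ["A is taking an order.", "A is paying the bill.",
               "A is asleep.", "A is driving a bus."], label=0),
    PMRSample("img_002.jpg", "Person B is angry at Person C.",
              ["B is hugging C.", "B is shouting at C.",
               "B is laughing with C.", "B is ignoring the phone."], label=1),
]

print(accuracy(samples, always_first))
```

With four choices per sample, a uniformly random guesser would be expected to score 0.25, which is the natural reference point for any model evaluated under this formulation.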

Topics

Artificial Intelligence > Core AI > Causal Inference
Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Reasoning
Computer Vision > Core AI > Multimodal Learning
Natural Language Processing > Applications > Visual Question Answering
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

natural language inference, hypothesis testing, multimodal reasoning, visual commonsense, vision language, visual commonsense reasoning, premise-based inference
