In-the-Wild Video Question Answering

Santiago Castro; Naihao Deng; Pingxuan Huang; Mihai Burzo; Rada Mihalcea

COLING 2022

Abstract

Existing video understanding datasets mostly focus on human interactions, with little attention being paid to the “in the wild” settings, where the videos are recorded outdoors. We propose WILDQA, a video understanding dataset of videos recorded in outside settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WILDQA poses new challenges to the vision and language research communities. The dataset is available at https://lit.eecs.umich.edu/wildqa/.

Topics

Computer Vision > Processing > Video Understanding
Natural Language Processing > Applications > Question Answering

Keywords

video understanding, video question answering, video evidence selection
