Embodied Question Answering

Abhishek Das; Samyak Datta; Georgia Gkioxari; Stefan Lee; Devi Parikh; Dhruv Batra

2018 CVPR CVPR 2018

Embodied Question Answering

Abstract

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligently navigate to explore the environment, gather necessary visual information through first-person (egocentric) vision, and then answer the question ("orange"). EmbodiedQA requires a range of AI skills -- language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. In this work, we develop a dataset of questions and answers in House3D environments, evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing and Reinforcement Learning

🐣 Hot Topic Early Bird — egocentric vision

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Abhishek Das , Samyak Datta , Georgia Gkioxari , Stefan Lee , Devi Parikh , Dhruv Batra

Topics

Artificial Intelligence > Core AI > Agent Systems Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Applications > Robotics Artificial Intelligence > Core AI > Robotics Deep Learning > Learning Types > Reinforcement Learning Natural Language Processing > Applications > Visual Question Answering Artificial Intelligence > Core AI > Visual Question Answering

Keywords

reinforcement learning imitation learning visual question answering egocentric vision active perception visual navigation hierarchical model commonsense reasoning agent system embodied question answering goal-driven navigation

Download PDF

Related papers

Multi-Shot Pedestrian Re-Identification via Sequential Decision Making 2018

Multi-Cue Correlation Filters for Robust Visual Tracking 2018

Pointwise Convolutional Neural Networks 2018

Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking 2018

Image Generation From Scene Graphs 2018