Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions

Peter Jansen

EMNLP 2020

Abstract

The recently proposed ALFRED challenge task aims for a virtual robotic agent to complete complex multi-step everyday tasks in a virtual home environment from high-level natural language directives, such as “put a hot piece of bread on a plate”. Currently, the best-performing models are able to complete less than 1% of these tasks successfully. In this work we focus on modeling the translation problem of converting natural language directives into detailed multi-step sequences of actions that accomplish those goals in the virtual environment. We empirically demonstrate that it is possible to generate gold multi-step plans from language directives alone, without any visual input, in 26% of unseen cases. When a small amount of visual information (the starting location in the virtual environment) is incorporated, our best-performing GPT-2 model successfully generates gold command sequences in 58% of cases, suggesting contextualized language models may provide strong planning modules for grounded virtual agents.
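The reported percentages count a prediction as correct only when it reproduces the gold command sequence. A minimal sketch of such a scoring function follows, assuming full-sequence exact match; the function name, the (command, argument) tuple format, and the example plans are illustrative assumptions, not the paper's actual data or code.

```python
from typing import List, Tuple

# Hypothetical plan representation: each step is a (command, argument) pair.
Command = Tuple[str, str]
Plan = List[Command]


def full_sequence_accuracy(predicted: List[Plan], gold: List[Plan]) -> float:
    """Fraction of plans whose predicted steps match the gold steps exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of plans")
    if not gold:
        return 0.0
    # A plan counts only if every step matches, in order, with nothing extra.
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)


# Illustrative (made-up) plans for two directives.
gold_plans: List[Plan] = [
    [("GotoLocation", "countertop"), ("PickupObject", "bread"),
     ("GotoLocation", "microwave"), ("HeatObject", "bread"),
     ("GotoLocation", "plate"), ("PutObject", "plate")],
    [("GotoLocation", "desk"), ("PickupObject", "pencil"),
     ("GotoLocation", "drawer"), ("PutObject", "drawer")],
]

predicted_plans: List[Plan] = [
    list(gold_plans[0]),                      # exact match
    [("GotoLocation", "shelf"),               # wrong starting location
     ("PickupObject", "pencil"),
     ("GotoLocation", "drawer"), ("PutObject", "drawer")],
]

print(full_sequence_accuracy(predicted_plans, gold_plans))  # prints 0.5
```

In this made-up example the second plan differs only in its first step, the location to navigate to, which is the kind of information a language-only model cannot read from the directive and that the starting location helps supply.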

Topics

Artificial Intelligence > Core AI > Planning
Knowledge & Reasoning > Reasoning > Automated Planning
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Language

Keywords

automated planning, instruction following, multi-step planning, language model, action sequence, virtual agent, action sequence generation, natural language directive
