Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue

Aishwarya Padmakumar; Mert Inan; Spandana Gella; Patrick Lange; Dilek Hakkani-Tur

2023 EMNLP EMNLP 2023

Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue

Abstract

AbstractEmbodied task completion is a challenge where an agent in a simulated environment must predict environment actions to complete tasks based on natural language instructions and ego-centric visual observations. We propose a variant of this problem where the agent predicts actions at a higher level of abstraction called a plan, which helps make agent actions more interpretable and can be obtained from the appropriate prompting of large language models. We show that multimodal transformer models can outperform language-only models for this problem but fall significantly short of oracle plans. Since collecting human-human dialogues for embodied environments is expensive and time-consuming, we propose a method to synthetically generate such dialogues, which we then use as training data for plan prediction. We demonstrate that multimodal transformer models can attain strong zero-shot performance from our synthetic data, outperforming language-only models trained on human-human data.

🧭 Keyword Pioneer — plan prediction

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Aishwarya Padmakumar , Mert Inan , Spandana Gella , Patrick Lange , Dilek Hakkani-Tur

Topics

Artificial Intelligence > Core AI > Agent Systems Artificial Intelligence > Core AI > Multimodal Learning Artificial Intelligence > Learning Paradigms > Zero-Shot Learning

Keywords

zero-shot learning multimodal learning embodied ai transformer model plan prediction

Download PDF

Related papers

Exploring Linguistic Probes for Morphological Generalization 2023

NameGuess: Column Name Expansion for Tabular Data 2023

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning 2023

Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation 2023

On the Calibration of Large Language Models and Alignment 2023