Action Inference for Destination Prediction in Vision-and-Language Navigation

Anirudh Kondapally; Kentaro Yamada; Hitomi Yanaka

ACL 2024

Abstract

Vision-and-Language Navigation (VLN) encompasses interacting with autonomous vehicles using language and visual input from the perspective of mobility. Most of the previous work in this field focuses on spatial reasoning and the semantic grounding of visual information. However, reasoning based on the actions of pedestrians in the scene is not much considered. In this study, we provide a VLN dataset for destination prediction with action inference to investigate the extent to which current VLN models perform action inference. We introduce a crowd-sourcing process to construct a dataset for this task in two steps: (1) collecting beliefs about the next action for a pedestrian and (2) annotating the destination considering the pedestrian’s next action. Our benchmarking results of the models on destination prediction lead us to believe that the models can learn to reason about the effect of the action and the next action on the destination to a certain extent. However, there is still much scope for improvement.

Topics

Artificial Intelligence > Core AI > Autonomous Vehicles
Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Planning
Artificial Intelligence > Core AI > Trajectory Prediction
Computer Vision > Domain-Specific > Autonomous Driving
Knowledge & Reasoning > Reasoning > Causal Inference

Keywords

vision-language navigation; autonomous vehicle; spatial reasoning; pedestrian behavior; vision-and-language navigation; action inference; destination prediction; pedestrian reasoning; pedestrian modeling
