INVIGORATE: Interactive Visual Grounding and Grasping in Clutter

Hanbo Zhang, Yunfan Lu, Cunjun Yu, David Hsu, Xuguang Lan, Nanning Zheng

RSS 2021

Abstract

This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE addresses several challenges: (i) inferring the target object among other, possibly occluding, objects from input language expressions and RGB images; (ii) inferring object blocking relationships (OBRs) from the images; and (iii) synthesizing a multi-step plan that asks questions to disambiguate the target object and then grasps it successfully. We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping. They allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human language are inevitable and negatively impact the robot's performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identifies and grasps the target object. INVIGORATE combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural language interactions. A demonstration video is available online: https://youtu.be/zYakh80SGcU.
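
The loop described in the abstract can be illustrated with a toy sketch: keep a belief over which detected object the user means, update it from visual grounding scores and from answers to disambiguation questions, then either ask or grasp, removing blocking objects first. This is not INVIGORATE's implementation. It replaces approximate POMDP planning with a greedy confidence-threshold rule, and every name below (`update_with_grounding`, `update_with_answer`, `grasp_sequence`, `choose_action`), the `reliability` answer-noise model, the 0.8 threshold, and the `blocked_by` dictionary encoding of OBRs are hypothetical assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Belief = Dict[str, float]  # P(target = object id)


def normalize(belief: Belief) -> Belief:
    """Rescale probabilities so they sum to 1."""
    total = sum(belief.values())
    if total <= 0:
        raise ValueError("belief has no probability mass")
    return {obj: p / total for obj, p in belief.items()}


def update_with_grounding(belief: Belief, scores: Dict[str, float]) -> Belief:
    """Bayes update: posterior(obj) is proportional to prior(obj) * score(obj).

    Assumption: the grounding network's score is treated as a likelihood.
    """
    return normalize({obj: p * scores.get(obj, 0.0) for obj, p in belief.items()})


def update_with_answer(belief: Belief, asked: str, answer_yes: bool,
                       reliability: float = 0.9) -> Belief:
    """Bayes update after asking "Do you mean <asked>?".

    Hypothetical noise model: the human answers correctly with
    probability `reliability`, independent of the object.
    """
    posterior = {}
    for obj, p in belief.items():
        # True when the answer agrees with "obj is the target".
        consistent = (obj == asked) == answer_yes
        posterior[obj] = p * (reliability if consistent else 1.0 - reliability)
    return normalize(posterior)


def grasp_sequence(target: str, blocked_by: Dict[str, Set[str]]) -> List[str]:
    """Order grasps so every blocker is removed before the object it blocks.

    `blocked_by[x]` is the (hypothetical) set of objects that block x.
    Depth-first topological sort; raises on cyclic relationships.
    """
    order: List[str] = []
    done: Set[str] = set()

    def visit(obj: str, path: FrozenSet[str]) -> None:
        if obj in path:
            raise ValueError(f"cyclic blocking relationship at {obj!r}")
        if obj in done:
            return
        for blocker in sorted(blocked_by.get(obj, set())):
            visit(blocker, path | {obj})
        done.add(obj)
        order.append(obj)

    visit(target, frozenset())
    return order


def choose_action(belief: Belief, blocked_by: Dict[str, Set[str]],
                  threshold: float = 0.8) -> Tuple[str, str]:
    """Greedy stand-in for POMDP planning: ask until confident, then grasp."""
    best = max(belief, key=belief.get)
    if belief[best] < threshold:
        return ("ask", best)
    return ("grasp", grasp_sequence(best, blocked_by)[0])
```

For example, with a uniform prior over `cup`, `apple`, and `box` and grounding scores of 0.6, 0.3, and 0.1, the most likely object (`cup`, at 0.6) falls below the threshold, so the sketch asks about it. After a "yes" answer, the belief in `cup` rises to about 0.93, and because `box` blocks `cup`, the first grasp is `box`. A real POMDP planner would instead weigh the expected cost of asking against the risk of a wrong grasp over a multi-step horizon, rather than applying a fixed threshold.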

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Planning
Computer Vision > Analysis > Object Detection

Keywords

object detection, visual grounding, POMDP planning, robot manipulation, natural language interaction, object grasping
