IJCAI 2022

Cross-modal Representation Learning and Relation Reasoning for Bidirectional Adaptive Manipulation

Abstract

Single-modal controllable manipulation typically requires supervisory information from other modalities or the cooperation of complex software and experts. This paper therefore addresses the problem of cross-modal adaptive manipulation (CAM). This novel task performs cross-modal semantic alignment through mutual supervision and exchanges attributes, relations, or objects bidirectionally and in parallel, benefiting both modalities while significantly reducing manual effort. We introduce a robust solution for CAM built on two essential modules: Heterogeneous Representation Learning (HRL) and Cross-modal Relation Reasoning (CRR). HRL learns representations on heterogeneous graph nodes for cross-modal semantic alignment. CRR identifies and exchanges the focused attributes, relations, or objects in both modalities. Our method produces pleasing cross-modal outputs on CUB and Visual Genome.

🌉 Interdisciplinary Bridge: Artificial Intelligence, Deep Learning, and Machine Learning
🧭 Keyword Pioneer: cross-modal representation learning
🐝 Cross-Pollinator: Artificial Intelligence, Computer Vision, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing

Authors