Segment beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation

Renjie Wu; Hu Wang; Feras Dayoub; Hsiang-Ting Chen

AAAI 2024

Abstract

Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face challenges in user safety, particularly concerning oncoming vehicles. While some solutions leverage onboard camera arrays, these cameras often have a limited field-of-view (FoV) with front or downward perspectives. To address this, we propose a new out-of-view semantic segmentation task and Segment Beyond View (SBV), a novel audio-visual semantic segmentation method. SBV supplements the visual modality, which misses information beyond the FoV, with auditory information using a teacher-student distillation model (Omni2Ego). The model consists of a vision teacher utilising panoramic information, an auditory teacher with 8-channel audio, and an audio-visual student that takes views with limited FoV and binaural audio as input and produces semantic segmentation for objects outside the FoV. SBV outperforms existing models in comparative evaluations and shows consistent performance across varying FoV ranges and in monaural audio settings.
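The two-teacher, one-student arrangement described in the abstract can be illustrated with a generic knowledge distillation loss. The sketch below is not the paper's Omni2Ego objective, which the abstract does not specify; it assumes a standard formulation in which the student's per-pixel class distribution is pulled toward each teacher's softened distribution with a KL term, alongside an ordinary cross-entropy term on the labels. The function names, the temperature, and the weights `w_vision` and `w_audio` are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(student, vision_teacher, audio_teacher, labels,
                      temperature=2.0, w_vision=0.5, w_audio=0.5):
    """Mean per-pixel loss: cross-entropy plus a weighted KL term per teacher.

    Each logits argument is a list of pixels; each pixel is a list of class
    logits. The temperature and weights are assumed values for illustration,
    not settings taken from the paper.
    """
    total = 0.0
    for s, v, a, y in zip(student, vision_teacher, audio_teacher, labels):
        # Supervised term: cross-entropy against the ground-truth class.
        ce = -math.log(softmax(s)[y] + 1e-12)
        # Distillation terms: match each teacher's softened distribution.
        s_soft = softmax(s, temperature)
        kd_vision = kl_divergence(softmax(v, temperature), s_soft)
        kd_audio = kl_divergence(softmax(a, temperature), s_soft)
        total += ce + w_vision * kd_vision + w_audio * kd_audio
    return total / len(labels)
```

In this formulation, a student whose logits already match both teachers pays only the cross-entropy term, while disagreement with either teacher raises the loss in proportion to that teacher's weight.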

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Knowledge Distillation
Computer Vision > Domain-Specific > Autonomous Driving
Computer Vision > Processing > Semantic Segmentation
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

semantic segmentation, knowledge distillation, autonomous driving, audio-visual learning, augmented reality, teacher-student model, modality completion, teacher-student distillation
