Motion Based Audio-Visual Segmentation

Jiahao Li; Miao Liu; Shu Yang; Jing Wang; Xiang Xie

INTERSPEECH 2024

Abstract

Audio-visual segmentation (AVS) is a recently introduced task that focuses on pixel-wise segmentation of sounding objects in videos. The task is challenging because each pixel of a video frame must be labeled according to whether it belongs to an object producing the accompanying sound. We propose a Motion Based Audio-Visual Segmentation model, which for the first time incorporates optical flow maps carrying motion information into the AVS task. A Motion-Vision Attention Module (MVA) fuses motion and visual features so that the model can exploit motion information. Additionally, a Cross-Modal Bilateral-Attention Module (CMBA) integrates multimodal features through cross-modal attention. The proposed model is evaluated on two datasets, S4 and MS3, and its superior performance on both demonstrates its effectiveness and feasibility for the AVS task.
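To make the idea of cross-modal attention in the abstract concrete, the following is a minimal NumPy sketch of generic scaled dot-product attention applied in both directions between visual and audio tokens. It is not the authors' CMBA module: the function names, the residual fusion, the token counts, and the feature width are all assumptions chosen for illustration, and the abstract does not specify any of these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    """Generic scaled dot-product attention: each query row attends over context rows."""
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)   # shape (num_query, num_context)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ context, weights

def bilateral_cross_modal_fusion(visual, audio):
    """Hypothetical two-way fusion: visual attends to audio and audio attends to visual.

    The residual additions are an assumption, not a detail taken from the paper.
    """
    v_att, w_va = cross_attention(visual, audio)
    a_att, w_av = cross_attention(audio, visual)
    return visual + v_att, audio + a_att, w_va, w_av

# Toy inputs (assumed sizes): 16 visual tokens from a flattened 4x4 feature map,
# 5 audio tokens, and 8 feature channels shared by both modalities.
rng = np.random.default_rng(0)
visual = rng.standard_normal((16, 8))
audio = rng.standard_normal((5, 8))

fused_v, fused_a, w_va, w_av = bilateral_cross_modal_fusion(visual, audio)
print(fused_v.shape, fused_a.shape, w_va.shape, w_av.shape)
```

In this sketch the same routine could, under the same caveats, stand in for a motion-vision fusion by passing features derived from optical flow maps in place of the audio tokens.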

Topics

Computer Vision > Processing > Image Segmentation; Computer Vision > Processing > Video Processing

Keywords

optical flow; audio-visual segmentation; pixel-wise segmentation; crossmodal attention; motion information
