Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Di Hu; Rui Qian; Minyue Jiang; Xiao Tan; Shilei Wen; Errui Ding; Weiyao Lin; Dejing Dou

NeurIPS 2020

Abstract

Discriminatively localizing sounding objects in cocktail-party scenes, i.e., scenes with mixed sounds, is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we learn robust object representations by aggregating the candidate sound localization results in single-source scenes. Then, class-aware object localization maps are generated in cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are selected by matching the audio and visual object category distributions, where audiovisual consistency serves as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the location of sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.
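The distribution-matching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the max-pooling step, the use of KL divergence as the matching measure, and the selection threshold are all assumptions made for illustration. Each class-aware localization map is pooled into one score per class, the scores are normalized into a visual category distribution, and that distribution is compared with an audio category distribution.

```python
import math

def softmax(scores):
    """Normalize a list of scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def visual_distribution(class_maps):
    """Pool each class-aware map (a list of rows) to its peak value,
    then normalize over classes. Max pooling is an assumed choice."""
    peaks = [max(max(row) for row in class_map) for class_map in class_maps]
    return softmax(peaks)

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q), used here as an assumed audiovisual matching measure."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_sounding(audio_dist, threshold=0.2):
    """Return indices of classes the audio marks as sounding.
    The threshold value is hypothetical."""
    return [k for k, a in enumerate(audio_dist) if a >= threshold]

# Toy example: two object classes on a 2x2 spatial grid.
class_maps = [
    [[0.1, 2.0], [0.0, 0.3]],  # class 0 responds strongly at one location
    [[0.0, 0.1], [0.2, 0.0]],  # class 1 responds weakly everywhere
]
audio_dist = [0.9, 0.1]        # hypothetical audio category distribution

visual_dist = visual_distribution(class_maps)
matching_loss = kl_divergence(visual_dist, audio_dist)
sounding_classes = select_sounding(audio_dist)
```

In a training setting, a quantity like `matching_loss` would be minimized so that the visually inferred categories agree with the audio, which is the sense in which audiovisual consistency can act as a self-supervised signal.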

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Computer Vision > Analysis > Object Detection

Keywords

source separation, self-supervised learning, visual object, sounding object localization, audiovisual matching
