2021 INTERSPEECH INTERSPEECH 2021

An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection

Abstract

In this paper, we present a novel mutual mean teaching based domain adaptation (MMT-DA) method for sound event detection (SED) task, which can effectively exploit synthetic data to improve the SED performance. Existing methods simply treat the synthetic data as strongly-labeled data in semi-supervised learning (SSL) framework. Benefiting from the strong labels of synthetic data, superior SED performance can be achieved. However, a distribution mismatch between synthetic and real data raises an evident challenge for domain adaptation (DA). In MMT-DA, convolutional recurrent neural networks (CRNN) learned from different datasets (i.e. total data:real+synthetic, and real data) are exploited for DA. Specifically, mean teacher method using CRNN is employed for utilizing the unlabeled real data. To compensate the domain diversity, an additional domain classifier with gradient reverse layer(GRL) is used for training a mean teacher for total data. The student CRNNs are mutually taught using the soft predictions of unlabeled data obtained from different teachers. Furthermore, a strip pooling based attention module is exploited to model the inter-dependencies between channels and time-frequency dimensions to exploit the structure information. Experimental results on Task4 of DCASE2020 demonstrate the ability of the proposed method, achieving 52.0% F1-score on the validation dataset, which outperforms the winning system’s 50.6%.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning
🧭 Keyword Pioneer — mutual mean teaching
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio