Confidence-aware Hypothesis Transfer Networks for Source-Free Cross-Corpus Speech Emotion Recognition

Jincen Wang; Yan Zhao; Cheng Lu; Hailun Lian; Hongli Chang; Yuan Zong; Wenming Zheng

INTERSPEECH 2024

Abstract

The goal of source-free cross-corpus speech emotion recognition (SER) is to transfer emotion knowledge from a source corpus to a target corpus without access to the source data. To address this challenge, we develop a novel method named Confidence-aware Hypothesis Transfer Network (CaHTN), which consists of two modules. Specifically, the first module, called hypothesis implicit transfer, leverages the frozen source classifier (hypothesis) to force target samples to implicitly align with the source hypothesis through information maximization. In addition, a bidirectional confident self-training module is designed to exploit not only positive pseudo-label information but also negative pseudo-label information to enhance target feature extraction. To verify its effectiveness, we design twelve source-free cross-corpus SER tasks and conduct extensive experiments on CASIA, EmoDB, EMOVO and eNTERFACE. Experimental results indicate that CaHTN achieves state-of-the-art performance in source-free cross-corpus SER.
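The abstract does not give the loss formulas, so the following is only a minimal sketch of the two ideas under stated assumptions. For hypothesis implicit transfer, it assumes a SHOT-style information maximization (IM) objective: the mean per-sample prediction entropy (minimized, so each target prediction becomes confident) minus the entropy of the batch-mean prediction (maximized, so predictions do not collapse onto one emotion). For bidirectional confident self-training, it assumes that a sample whose top probability exceeds a threshold receives a positive pseudo label (cross-entropy on that class), and that every class with probability below a second threshold receives a negative pseudo label ("this sample is not class c", penalized with -log(1 - p_c)). The function names, the thresholds `tau_pos` and `tau_neg`, and the equal weighting of the terms are all hypothetical. In real training these losses would be computed on autograd tensors, with the source classifier frozen so that only the target feature extractor is updated.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def information_maximization_loss(probs: List[List[float]], eps: float = 1e-8) -> float:
    """Assumed SHOT-style IM loss: mean conditional entropy minus marginal entropy."""
    n, k = len(probs), len(probs[0])
    # Mean per-sample entropy: small when each prediction is confident.
    mean_entropy = -sum(p * math.log(p + eps) for row in probs for p in row) / n
    # Entropy of the batch-averaged prediction: large when classes stay diverse.
    marginal = [sum(row[c] for row in probs) / n for c in range(k)]
    marginal_entropy = -sum(p * math.log(p + eps) for p in marginal)
    return mean_entropy - marginal_entropy


def bidirectional_pseudo_label_loss(
    probs: List[List[float]],
    tau_pos: float = 0.9,   # hypothetical confidence threshold for positive labels
    tau_neg: float = 0.05,  # hypothetical threshold for negative labels
    eps: float = 1e-8,
) -> float:
    """Hypothetical positive + negative pseudo-label loss over a batch."""
    pos_terms: List[float] = []
    neg_terms: List[float] = []
    for row in probs:
        top = max(range(len(row)), key=row.__getitem__)
        if row[top] >= tau_pos:
            # Positive pseudo label: push the confident class probability toward 1.
            pos_terms.append(-math.log(row[top] + eps))
        for p in row:
            if p <= tau_neg:
                # Negative pseudo label: push an unlikely class probability toward 0.
                neg_terms.append(-math.log(1.0 - p + eps))
    pos = sum(pos_terms) / len(pos_terms) if pos_terms else 0.0
    neg = sum(neg_terms) / len(neg_terms) if neg_terms else 0.0
    return pos + neg


if __name__ == "__main__":
    # Hypothetical target-batch logits from a frozen classifier (4 emotion classes).
    batch_logits = [[4.0, 0.1, -1.0, 0.3], [0.2, 0.1, 0.0, 0.3]]
    batch_probs = [softmax(z) for z in batch_logits]
    im = information_maximization_loss(batch_probs)
    st = bidirectional_pseudo_label_loss(batch_probs)
    print(f"IM loss: {im:.4f}, self-training loss: {st:.4f}")
```

The negative branch is what makes the self-training "bidirectional" in this sketch: an uncertain sample such as `[0.4, 0.3, 0.2, 0.1]` contributes nothing to the positive term, but a sample with very low probabilities on some classes can still supply "not this class" supervision.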

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Application Areas > Domain Adaptation

Keywords

pseudo label, emotion classification, speech emotion recognition, cross-corpus adaptation, hypothesis transfer

Related papers

Reshape Dimensions Network for Speaker Recognition (2024)

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification (2024)

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch (2024)

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions (2024)

K-means and hierarchical clustering of f0 contours (2024)