INTERSPEECH 2023

Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder

Abstract

Emotion is a complex behavioral phenomenon that is expressed and perceived through various modalities, such as language, vocal expressions, and facial expressions. Psychiatric research has suggested that a lack of emotional alignment between modalities is a symptom of emotion disorders. In this work, we quantify the mismatch between emotion expressed through language and through acoustics, which we refer to as Emotional MisMatch (EMM), as an intermediate step for mood identification. We use a longitudinal dataset collected from people with Bipolar Disorder (BP) and show that symptomatic mood episodes exhibit significantly more EMM than euthymic moods. We propose a fully automatic mood identification pipeline comprising automatic speech transcription, emotion recognition, and EMM feature extraction. We find that EMM features, although smaller in size, outperform a language-based baseline and consistently improve mood classification when combined with language and/or raw emotion features.
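
As a rough illustration of what a mismatch feature could look like, the sketch below compares per-segment emotion scores from a text-based recognizer with those from an acoustic recognizer and summarizes the gaps. The abstract does not specify how EMM is computed, so everything here is an assumption made for illustration: the function name `emm_features`, the use of one scalar score per segment (for example, valence), the absolute-difference gap, and the choice of summary statistics are all hypothetical.

```python
from statistics import mean, pstdev


def emm_features(text_scores, acoustic_scores):
    """Summarize the gap between text-based and acoustic emotion scores.

    Hypothetical sketch: both inputs are assumed to be equal-length lists
    holding one emotion score (e.g., valence) per speech segment.
    """
    if len(text_scores) != len(acoustic_scores):
        raise ValueError("both modalities need one score per segment")
    if not text_scores:
        raise ValueError("at least one segment is required")

    # Per-segment mismatch: absolute difference between the two modalities.
    gaps = [abs(t - a) for t, a in zip(text_scores, acoustic_scores)]

    # Recording-level summary statistics (an assumed, illustrative choice).
    return {
        "mean_gap": mean(gaps),
        "std_gap": pstdev(gaps),
        "max_gap": max(gaps),
    }


if __name__ == "__main__":
    # Made-up scores for three segments, for demonstration only.
    print(emm_features([0.8, 0.1, 0.5], [0.2, 0.1, 0.9]))
```

Under these assumptions, a recording in which the two modalities agree on every segment yields zero for all three statistics, and larger values indicate more disagreement between what is said and how it is said.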
