Video Multimodal Emotion Recognition System for Real World Applications

Sun-Kyung Lee; Jong-Hwan Kim

INTERSPEECH 2023

Abstract

This paper proposes a system that recognizes a speaker's utterance-level emotion from multimodal cues in a video. The system integrates multiple AI models into a single pipeline that first extracts and pre-processes multimodal information from the raw video input. An end-to-end multimodal emotion recognition (MER) model then predicts the speaker's emotion for each utterance in sequence. Users can also interact with the system through the implemented interface.
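The abstract describes a two-stage design: several models turn raw video into per-utterance multimodal inputs, and an end-to-end MER model then labels the utterances in order. The following is a minimal Python sketch of that data flow only, assuming the video has already been split into utterance time spans. `Utterance`, `preprocess`, `predict_sequentially`, the emotion labels, and every extractor and classifier below are hypothetical toy stand-ins, not the authors' models, features, or interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Utterance:
    """Hypothetical per-utterance record produced by the pre-processing stage."""
    start: float          # utterance start time (seconds)
    end: float            # utterance end time (seconds)
    text: str             # transcript, e.g. from a speech recognizer
    audio: List[float]    # placeholder acoustic feature vector
    visual: List[float]   # placeholder facial feature vector


def preprocess(
    segments: Sequence[Tuple[float, float]],
    transcribe: Callable[[float, float], str],
    audio_encoder: Callable[[float, float], List[float]],
    visual_encoder: Callable[[float, float], List[float]],
) -> List[Utterance]:
    """Stage 1 (sketch): run each stand-in extractor model on every utterance span."""
    return [
        Utterance(s, e, transcribe(s, e), audio_encoder(s, e), visual_encoder(s, e))
        for s, e in segments
    ]


def predict_sequentially(
    utterances: Sequence[Utterance],
    classify: Callable[[Utterance, Sequence[str]], str],
) -> List[str]:
    """Stage 2 (sketch): label utterances in temporal order.

    Each call to `classify` also receives the labels predicted so far,
    so earlier utterances can inform later ones.
    """
    history: List[str] = []
    for utt in utterances:
        history.append(classify(utt, tuple(history)))
    return history


# Toy usage: every model below is a hypothetical placeholder.
def toy_classify(utt: Utterance, history: Sequence[str]) -> str:
    feats = utt.audio + utt.visual
    score = sum(feats) / len(feats)  # naive fusion: average all features
    if score > 0.5:
        return "happy"
    if score < -0.5:
        return "sad"
    # Ambiguous evidence: reuse the previous label if there is one.
    return history[-1] if history else "neutral"


utterances = preprocess(
    segments=[(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)],
    transcribe=lambda s, e: f"utt@{s}",
    audio_encoder=lambda s, e: [e - 2.0],
    visual_encoder=lambda s, e: [e - 2.0],
)
labels = predict_sequentially(utterances, toy_classify)
print(labels)  # ['sad', 'sad', 'happy']
```

Passing the earlier labels to `classify` is just one simple way to make prediction sequential; a real end-to-end model would more likely carry learned context (for example, recurrent or attention states) across utterances.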
