2017 INTERSPEECH INTERSPEECH 2017

Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition

Abstract

Conversational engagement is a multimodal phenomenon and an essential cue to assess both human-human and human-robot communication. Speaker-dependent and speaker-independent scenarios were addressed in our engagement study. Handcrafted audio-visual features were used. Fixed window sizes for feature fusion method were analysed. Novel dynamic window size selection and multimodal bi-directional long short term memory (Multimodal BLSTM) approaches were proposed and evaluated for engagement level recognition.

🌉 Interdisciplinary Bridge — Computer Vision and Machine Learning
🧭 Keyword Pioneer — speaker dependency
🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Natural Language Processing, Robotics