INTERSPEECH 2018
Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions
Abstract
Head-nods and turn-taking both contribute significantly to conversational dynamics in dyadic interactions. Timely prediction and use of these events is valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be used in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multimodal classification performances for head-nod and turn-taking events are reported on the IEMOCAP dataset.