Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models

Ricardo Kleinlein; Cristina Luna Jiménez; Juan Manuel Montero; Zoraida Callejas; Fernando Fernández-Martínez

INTERSPEECH 2019

Abstract

Electrodermal activity (EDA) is a psychophysiological indicator that can be considered a somatic marker of subjects' emotional and attentional reactions to stimuli such as audiovisual content. EDA measurements are not biased by the cognitive process of giving an opinion or a score to characterize subjective perception, and group-level EDA recordings integrate the reaction of a whole audience, which reduces signal noise. This paper contributes to the prediction of audience attention to video content, extending previous work on the use of EDA as ground truth for prediction algorithms. Videos are segmented into shorter clips according to whether the audience's attention is increasing or decreasing, and each clip's audio waveform is processed to extract aural embeddings from a VGGish model pretrained on the AudioSet database. Whereas previous similar work on attention-level prediction using only audio reached 69.83% accuracy, we propose a Mixture of Experts approach to train a binary classifier that outperforms the main existing state-of-the-art approaches, predicting increasing and decreasing attention levels with 81.76% accuracy. These results confirm the usefulness of acoustic features that carry semantic significance, and the benefit of training experts over partitions of the dataset in order to predict group-level attention from audio.
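The Mixture of Experts idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' model: the title indicates LSTM-based experts, whereas the toy below swaps in logistic-regression experts over a single pooled clip embedding so that it runs with the standard library alone. The names `moe_predict`, `gate_weights` and `expert_weights`, the toy sizes and the random weights are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def moe_predict(x, gate_weights, expert_weights):
    """Return (p_increase, gate) for one clip embedding x.

    Hypothetical helper: a gating network assigns a weight to each
    expert, and the final probability is the gate-weighted average
    of the experts' individual probabilities.
    """
    gate = softmax([dot(w, x) for w in gate_weights])
    expert_probs = [sigmoid(dot(w, x)) for w in expert_weights]
    p_increase = sum(g * p for g, p in zip(gate, expert_probs))
    return p_increase, gate


# Toy sizes and random weights (assumptions, not values from the paper).
random.seed(0)
DIM, N_EXPERTS = 8, 3
x = [random.gauss(0, 1) for _ in range(DIM)]
gate_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
expert_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

p_increase, gate = moe_predict(x, gate_weights, expert_weights)
label = "increasing" if p_increase >= 0.5 else "decreasing"
print(f"gate={[round(g, 3) for g in gate]} p_increase={p_increase:.3f} -> {label}")
```

In a trained model the gate would learn to route each clip toward the expert fitted on the matching partition of the dataset; here the weights are random, so the printed label carries no meaning beyond showing the data flow.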

Topics

Machine Learning > Core Methods > Classification
Deep Learning > Architectures > Neural Networks
Machine Learning > Learning Types > Multi-Task Learning
Speech & Audio > Analysis > Speech Analysis
Machine Learning > Learning Types > Deep Learning

Keywords

binary classification; speech analysis; audio classification; mixture of expert; audio embedding; electrodermal activity; attention prediction; vggish model
