Exploiting Fine-tuning of Self-supervised Learning Models for Improving Bi-modal Sentiment Analysis and Emotion Recognition

Wei Yang; Satoru Fukayama; Panikos Heracleous; Jun Ogata

2022 INTERSPEECH INTERSPEECH 2022

Exploiting Fine-tuning of Self-supervised Learning Models for Improving Bi-modal Sentiment Analysis and Emotion Recognition

Abstract

Speech-based multimodal affective computing has recently attracted significant research attention. Previous experimental results have shown that the audio-only approach exhibits inferior performance than the text-only approach in sentiment analysis and emotion recognition tasks. In this paper, we propose a new strategy to improve the performance of uni-modal and bi-modal affective computing systems via fine-tuning of two pre-trained self-supervised learning models (Text-RoBERTa and Speech-RoBERTa). We fine-tune the models on sentiment analysis and emotion recognition tasks using a shallow architecture, and apply crossmodal attention fusion to the models for further learning and final prediction or classification. We evaluate our proposed method on the CMU-MOSI, CMU-MOSEI and IEMOCAP datasets. The experimental results demonstrate that our approach exhibits superior performance for all benchmarks compared to existing state-of-the-art results, establishing the effectiveness of the proposed method.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Wei Yang , Satoru Fukayama , Panikos Heracleous , Jun Ogata

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Learning Types > Self-Supervised Learning Natural Language Processing > Understanding > Sentiment Analysis

Keywords

sentiment analysis self-supervised learning multimodal learning emotion recognition crossmodal attention

Download PDF

Related papers

Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis 2022

Which Model is Best: Comparing Methods and Metrics for Automatic Laughter Detection in a Naturalistic Conversational Dataset 2022

Evidence of Onset and Sustained Neural Responses to Isolated Phonemes from Intracranial Recordings in a Voice-based Cursor Control Task 2022

Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications 2022

Exploring the influence of fine-tuning data on wav2vec 2.0 model for blind speech quality prediction 2022