Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition

Hanyang Wang; Bo Li; Shuang Wu; Siyuan Shen; Feng Liu; Shouhong Ding; Aimin Zhou

CVPR 2023

Abstract

Dynamic Facial Expression Recognition (DFER) is a rapidly developing field that focuses on recognizing facial expressions in video. Previous research has treated non-target frames as noisy frames; we propose that DFER should instead be treated as a weakly supervised problem. We also identify an imbalance between short- and long-term temporal relationships in DFER. We therefore introduce the Multi-3D Dynamic Facial Expression Learning (M3DFEL) framework, which uses Multi-Instance Learning (MIL) to handle inexact labels. M3DFEL generates 3D instances to model the strong short-term temporal relationships and uses 3D CNNs for feature extraction. The Dynamic Long-term Instance Aggregation Module (DLIAM) then learns the long-term temporal relationships and dynamically aggregates the instances. Experiments on the DFEW and FERV39K datasets show that M3DFEL, with a vanilla R3D18 backbone, outperforms existing state-of-the-art approaches. The source code is available at https://github.com/faceeyes/M3DFEL.
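
The multi-instance view described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how DLIAM works, so a plain softmax attention pooling stands in for the dynamic aggregation, and a per-instance mean stands in for the 3D CNN feature extractor. All names here (split_into_instances, mean_pool_instance, aggregate_bag, scoring_weights) are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def split_into_instances(frames: Sequence[List[float]], instance_len: int) -> List[List[List[float]]]:
    """Group consecutive frames into non-overlapping, fixed-length instances.

    The list of instances is the 'bag'; only the bag carries a label.
    Trailing frames that do not fill a whole instance are dropped.
    """
    if instance_len <= 0:
        raise ValueError("instance_len must be positive")
    count = len(frames) // instance_len
    return [list(frames[i * instance_len:(i + 1) * instance_len]) for i in range(count)]


def mean_pool_instance(instance: List[List[float]]) -> List[float]:
    """Assumed stand-in for a 3D CNN: average the frame features of one instance."""
    dim = len(instance[0])
    return [sum(frame[d] for frame in instance) / len(instance) for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_bag(instance_features: List[List[float]],
                  scoring_weights: List[float]) -> Tuple[List[float], List[float]]:
    """Attention-style MIL pooling (an assumed substitute for DLIAM).

    Each instance gets a linear score; the softmax of the scores weights
    the instance features in the bag-level representation.
    """
    scores = [sum(w * x for w, x in zip(scoring_weights, feat)) for feat in instance_features]
    attention = softmax(scores)
    dim = len(instance_features[0])
    bag = [sum(a * feat[d] for a, feat in zip(attention, instance_features)) for d in range(dim)]
    return bag, attention


# Toy clip: 16 frames, each already reduced to a 2-dimensional feature.
frames = [[float(t), 1.0] for t in range(16)]
instances = split_into_instances(frames, instance_len=4)
features = [mean_pool_instance(inst) for inst in instances]
bag_feature, attention = aggregate_bag(features, scoring_weights=[0.1, 0.0])
```

In this sketch the bag-level feature would feed a classifier trained against the single video-level label, so instances that do not show the target expression can receive low attention rather than being treated as label noise.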

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Action Recognition
Computer Vision > Processing > Video Processing
Computer Vision > Analysis > Video Understanding
Deep Learning > Learning Types > Weakly Supervised Learning

Keywords

multi-instance learning, video recognition, video classification, temporal relationship, dynamic aggregation, facial expression recognition, 3D convolutional network, 3D convolutional neural network, dynamic facial expression recognition
