2026 AAAI AAAI 2026

Dep-MAP: A Multi-level Alignment Framework with Semantic Prototypes for Video-based Automatic Depression Assessment

Abstract

Abstract Spatiotemporal analysis of facial behavior is a crucial method for evaluating the mental state of depression patients. However, in practice, depressed patients often display facial behaviors similar to healthy individuals due to masking tendencies. Additionally, facial expressions among depressed patients are also different, increasing the difficulty of assessment. To address this, we propose a video-based automatic depression assessment model Dep-MAP for complex facial behaviors of depression patients. Dep-MAP adopts a dual-branch architecture to extract visual features of facial behavior and capture corresponding emotional semantic features. Specifically, the extracted deep semantic features are clustered, resulting in semantically distinct prototype sets, where each severity group learns a set of discriminative facial behavior prototype representations, to suppress inter-class semantic confusion. Subsequently, we propose a semantic prototype-supervised contrastive learning method, which aligns latent semantics between shallow and deep features, realizing emotional semantic guidance and self-knowledge distillation for the visual feature branch, effectively suppressing intra-class difference. Then, we integrate key depression cues across multiple spatiotemporal scales via a multi-scale weighted fusion strategy, achieving automatic depression assessment. Experimental results demonstrate that Dep-MAP effectively identifies potential key frames in temporal sequences, and aggregates key frame representations with semantic consistency, achieving significantly superior state-of-the-art results on the AVEC2013 and AVEC2014 public datasets.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio