WDMIR: Wavelet-Driven Multimodal Intent Recognition

Weiyin Gong; Kai Zhang; Yanghai Zhang; Qi Liu; Xinjie Sun; Junyu Lu; Linbo Zhu

2025 IJCAI IJCAI 2025

WDMIR: Wavelet-Driven Multimodal Intent Recognition

Abstract

Multimodal intent recognition (MIR) seeks to accurately interpret user intentions by integrating verbal and non-verbal information across video, audio and text modalities. While existing approaches prioritize text analysis, they often overlook the rich semantic content embedded in non-verbal cues. This paper presents a novel Wavelet-Driven Multimodal Intent Recognition (WDMIR) framework that enhances intent understanding through frequency-domain analysis of non-verbal information. To be more specific, we propose: (1) a wavelet-driven fusion module that performs synchronized decomposition and integration of video-audio features in the frequency domain, enabling fine-grained analysis of temporal dynamics; (2) a cross-modal interaction mechanism that facilitates progressive feature enhancement from bimodal to trimodal integration, effectively bridging the semantic gap between verbal and non-verbal information. Extensive experiments on MIntRec demonstrate that our approach achieves state-of-the-art performance, surpassing previous methods by 1.13% on accuracy. Ablation studies further verify that the wavelet-driven fusion module significantly improves the extraction of semantic information from non-verbal sources, with a 0.41% increase in recognition accuracy when analyzing subtle emotional cues.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — wavelet-driven fusion

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Weiyin Gong , Kai Zhang , Yanghai Zhang , Qi Liu , Xinjie Sun , Junyu Lu , Linbo Zhu

Topics

Machine Learning > Application Areas > Domain Adaptation Natural Language Processing > Applications > Intent Classification

Keywords

temporal dynamics frequency-domain analysis cross-modal interaction multimodal intent recognition wavelet-driven fusion

Download PDF

Related papers

Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain 2025

Responsibility Anticipation and Attribution in LTLf 2025

Argument-based Multi-Issue Negotiation 2025

Online Resource Sharing: Better Robust Guarantees via Randomized Strategies 2025

Equitable Mechanism Design for Facility Location 2025