2025 IJCAI IJCAI 2025

DUQ: Dual Uncertainty Quantification for Text-Video Retrieval

Abstract

Text-video retrieval establishes accurate similarity relationships between text and video through feature enhancement and granularity alignment. However, relying solely on similarity to associate intra-pair features and distinguish inter-pair features is insufficient, \textit{e.g.}, when querying a multi-scene video with sparse text or selecting the most relevant video from many similar candidates. In this paper, we propose a novel Dual Uncertainty Quantification (DUQ) model that separately handles uncertainties in intra-pair interaction and inter-pair exclusion. Specifically, to enhance intra-pair interaction, we propose an intra-pair similarity uncertainty module to provide similarity-based trustworthy predictions and explicitly model this uncertainty. To increase inter-pair exclusion, we propose an inter-pair distance uncertainty module to construct a distance-based diversity probability embeding, thereby widening the gap between similar features. The two components work synergistically, jointly improving the calculation of similarity between features. We evaluate our model on six benchmark datasets: MSRVTT (51.2%), DiDeMo, MSVD, LSMDC, Charades, and VATEX, achieving state-of-the-art retrieval performance.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision
🧭 Keyword Pioneer — intra-pair interaction
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio