Vision-guided Text Mining for Unsupervised Cross-modal Hashing with Community Similarity Quantization

Haozhi Fan; Yuan Cao

2025 AAAI AAAI 2025

Vision-guided Text Mining for Unsupervised Cross-modal Hashing with Community Similarity Quantization

Abstract

Abstract Cross-modal retrieval, as an emerging field within multimedia research, has gained significant attention in recent years. Unsupervised cross-modal hashing methods are attractive due to their ability to capture latent relationships within the data without label supervision and to produce compact hash codes for high search efficiency. However, the text modality exhibits worse representation ability compared with the image modality, leading to weak guidance to construct the joint similarity matrix. Moreover, most unsupervised cross-modal hashing methods are based on pairwise similarities for training, resulting in non-aggregating data distribution in the hash space. In this paper, we propose a novel Vision-guided Text Mining for Unsupervised Cross-modal Hashing via Community Similarity Quantization, termed VTM-UCH. Specifically, we first find the one-to-one correspondence between each word and each vision (image or object) based on the Contrastive Language-Image Pre-training (CLIP) model and compute the text similarities according to the clustering of their corresponding visions. Then, we define the fine-grained object-level image similarities and design the joint similarity matrix based on the text and image similarities. Accordingly, we construct an undirected graph to compute the communities as the pseudo-centers and adjust the pairwise similarities to improve the hash codes distribution. The experimental results on two common datasets verify the accuracy improvements in comparison with state-of-the-art baselines.

🌉 Interdisciplinary Bridge — Computer Science and Deep Learning and Machine Learning

🧭 Keyword Pioneer — community similarity quantization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Haozhi Fan , Yuan Cao

Topics

Machine Learning > Core Methods > Clustering Machine Learning > Core Methods > Metric Learning Machine Learning > Learning Types > Unsupervised Learning Computer Science > Applications > Information Retrieval Machine Learning > Learning Types > Multi-Modal Learning Deep Learning > Learning Types > Self-Supervised Learning Deep Learning > Learning Types > Contrastive Learning

Keywords

unsupervised learning contrastive learning metric learning community detection cross-modal hashing contrastive language-image pre-training similarity quantization community similarity quantization joint similarity matrix

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025