Learning Token-Based Representation for Image Retrieval

Hui Wu; Min Wang; Wengang Zhou; Yang Hu; Houqiang Li

AAAI 2022

Abstract

In image retrieval, deep local features learned in a data-driven manner have been shown to be effective in improving retrieval performance. To enable efficient retrieval on large image databases, some approaches quantize deep local features with a large codebook and match images with an aggregated match kernel. However, these approaches incur non-trivial complexity and a large memory footprint, which limits their capability to jointly perform feature learning and aggregation. To generate compact global representations while maintaining regional matching capability, we propose a unified framework that jointly learns local feature representation and aggregation. In our framework, we first extract local features using CNNs. Then, we design a tokenizer module that aggregates them into a few visual tokens, each corresponding to a specific visual pattern. This helps remove background noise and capture more discriminative regions in the image. Next, a refinement block is introduced to enhance the visual tokens with self-attention and cross-attention. Finally, the visual tokens are concatenated to generate a compact global representation. The whole framework is trained end-to-end with image-level labels. Extensive experiments show that our approach outperforms state-of-the-art methods on the Revisited Oxford and Paris datasets.
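The pipeline described in the abstract (CNN local features, a tokenizer that aggregates them into a few visual tokens, a refinement block with self-attention and cross-attention, then concatenation into one global vector) can be sketched in NumPy. This is a minimal, untrained sketch under stated assumptions, not the authors' implementation: the tokenizer here is a soft assignment of local features to hypothetical learnable `centers` with a temperature `beta`, and the attention is parameter-free (no query/key/value projections, layer normalization, or feed-forward layers). The paper's exact tokenizer and refinement formulations may differ, and all function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def tokenize(features, centers, beta=1.0):
    """Aggregate N local features (N, C) into L visual tokens (L, C).

    Assumption: soft assignment to L hypothetical learnable centers.
    """
    # squared distance from every local feature to every center: (N, L)
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # each spatial position distributes its weight over the L patterns
    assign = softmax(-beta * d2, axis=1)
    # normalize per token so each token is a weighted mean of local features
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    return weights.T @ features


def attention(q, k, v):
    # scaled dot-product attention without learned projections
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def refine(tokens, features):
    # self-attention among tokens, then cross-attention from tokens to the
    # local features; each step uses a residual connection
    tokens = tokens + attention(tokens, tokens, tokens)
    tokens = tokens + attention(tokens, features, features)
    return tokens


def global_descriptor(features, centers):
    tokens = refine(tokenize(features, centers), features)
    g = tokens.reshape(-1)  # concatenate the L tokens into one (L * C,) vector
    return g / (np.linalg.norm(g) + 1e-12)  # L2-normalize for cosine similarity


# Usage on random stand-ins for a CNN feature map of size H x W x C.
rng = np.random.default_rng(0)
H, W, C, L = 8, 8, 16, 4
centers = rng.standard_normal((L, C))
feats = rng.standard_normal((H * W, C))
g = global_descriptor(feats, centers)
print(g.shape)  # (64,)

# Rank a small synthetic database (plus the query itself) by dot product.
db = np.stack(
    [global_descriptor(rng.standard_normal((H * W, C)), centers) for _ in range(5)] + [g]
)
ranking = np.argsort(-(db @ g))
print(ranking[0])  # 5, the query's own descriptor
```

In this sketch every image maps to a fixed-length, L2-normalized vector of size L × C, so retrieval reduces to a dot product per database image. That is the practical contrast with codebook-based aggregated match kernels, which store and match many quantized local features per image.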

Keywords

representation learning, image retrieval, convolutional neural network, global representation, visual token, local feature, tokenizer module