CVPR 2023
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding
Abstract
A phrase grounding model receives an input image and a text phrase and outputs a suitable localization map. We present an effective way to refine a phrase grounding model by considering self-similarity maps extracted from the latent representation of the model's image encoder. Our main insights are that these maps resemble localization maps and that by combining such maps, one can obtain useful pseudo-labels for performing self-training. Our results surpass, by a large margin, the state-of-the-art in weakly supervised phrase grounding. A similar gap in performance is obtained for a recently proposed downstream task called WWbL, in which the input image is given without any text. Our code is available as supplementary material.
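The abstract does not say how the self-similarity maps are computed or combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the encoder yields a feature grid of shape (C, H, W), uses cosine similarity between spatial locations, and combines maps by averaging those of a hypothetical set of seed locations. The function names, the seed mask, and the random features are all illustrative assumptions.

```python
import numpy as np

def self_similarity_maps(features):
    """Return one cosine-similarity map per spatial location.

    features: array of shape (C, H, W), assumed to come from an image encoder.
    Output:   array of shape (H*W, H, W); map i compares location i to all others.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w).T                  # (H*W, C)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-8)                # unit-length feature vectors
    sim = flat @ flat.T                                  # (H*W, H*W) cosine similarities
    return sim.reshape(h * w, h, w)

def combine_maps(maps, seed_mask):
    """Average the maps of the seed locations and rescale to [0, 1].

    seed_mask: boolean array of shape (H, W) with at least one True entry
    (a hypothetical selection rule, chosen here only for illustration).
    """
    selected = maps[seed_mask.reshape(-1)]               # maps of the seed locations
    combined = selected.mean(axis=0)
    lo, hi = combined.min(), combined.max()
    return (combined - lo) / (hi - lo + 1e-8)

# Toy usage with random features standing in for real encoder activations.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8, 8))
maps = self_similarity_maps(features)
seed = np.zeros((8, 8), dtype=bool)
seed[2:4, 2:4] = True
pseudo_label = combine_maps(maps, seed)
```

In a self-training setup, a map like `pseudo_label` could serve as a soft target for the grounding model's output, but how the pseudo-labels are actually built and used is described in the paper itself rather than in this abstract.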
Topics
Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Natural Language Processing > Applications > Information Extraction
Deep Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Paradigms > Weakly Supervised Learning
Computer Vision > Analysis > Visual Question Answering