Learning Representations by Predicting Bags of Visual Words

Spyros Gidaris; Andrei Bursuc; Nikos Komodakis; Patrick Pérez; Matthieu Cord

2020 CVPR CVPR 2020

Learning Representations by Predicting Bags of Visual Words

Abstract

Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words. To build such discrete representations, we quantize the feature maps of a first pre-trained self-supervised convnet, over a k-means based vocabulary. Then, as a self-supervised task, we train another convnet to predict the histogram of visual words of an image (i.e., its Bag-of-Words representation) given as input a perturbed version of that image. The proposed task forces the convnet to learn perturbation-invariant and context-aware image features, useful for downstream image understanding tasks. We extensively evaluate our method and demonstrate very strong empirical results, e.g., our pre-trained self-supervised representations transfer better on detection task and similarly on classification over classes "unseen" during pre-training, when compared to the supervised case. This also shows that the process of image discretization into visual words can provide the basis for very powerful self-supervised approaches in the image domain, thus allowing further connections to be made to related methods from the NLP domain that have been extremely successful so far.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Spyros Gidaris , Andrei Bursuc , Nikos Komodakis , Patrick Pérez , Matthieu Cord

Topics

Machine Learning > Learning Types > Self-Supervised Learning Computer Vision > Analysis > Object Detection Deep Learning > Learning Types > Self-Supervised Learning Computer Vision > Core AI > Computer Vision Deep Learning > Learning Types > Representation Learning

Keywords

representation learning image classification feature learning object detection self-supervised learning visual word

Download PDF

Related papers

Deep Polarization Cues for Transparent Object Segmentation 2020

HRank: Filter Pruning Using High-Rank Feature Map 2020

Panoptic-Based Image Synthesis 2020

Select, Supplement and Focus for RGB-D Saliency Detection 2020

ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings 2020