Web Scale Photo Hash Clustering on A Single Machine

Yunchao Gong; Marcin Pawlowski; Fei Yang; Louis Brandy; Lubomir Bourdev; Rob Fergus

CVPR 2015

Abstract

This paper addresses the problem of clustering a very large number of photos (e.g., hundreds of millions a day) arriving in a stream into millions of clusters. The problem is increasingly important given the popularity of photo-sharing websites such as Facebook, Google, and Instagram. With so many photos available online, how to organize them efficiently remains an open problem. To address it, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers, on which we can build hash indexes to speed up computation. The proposed method can cluster millions of photos on a single machine in a few minutes. We show that this approach is usually several orders of magnitude faster than standard k-means while producing comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that can cluster large photo streams on a single machine, and we show applications to spam detection and trending photo discovery.
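To make the idea of clustering binary hashes into binary centers concrete, here is a minimal sketch of binary k-means, assuming each photo hash is stored as a Python int of n_bits bits. Distances are Hamming distances, and each center is updated by a per-bit majority vote, which is the binary vector minimizing the summed Hamming distance to its members. This is an illustration, not the authors' implementation: the farthest-first seeding is an assumption, and the nearest-center search is a brute-force linear scan standing in for the hash indexes over centers that the abstract describes. All function names (hamming, majority_center, nearest, farthest_first_init, binary_kmeans) are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def majority_center(codes: List[int], n_bits: int) -> int:
    """Per-bit majority vote over the member codes.

    Hamming distance decomposes bit by bit, so choosing the majority value
    for each bit minimizes the summed distance to the members. Ties go to 0.
    """
    half = len(codes) / 2
    center = 0
    for bit in range(n_bits):
        ones = sum((c >> bit) & 1 for c in codes)
        if ones > half:
            center |= 1 << bit
    return center


def nearest(code: int, centers: List[int]) -> int:
    """Index of the closest center, found by a brute-force linear scan.

    Assumption: the paper speeds this step up with hash indexes built on
    the binary centers; a linear scan is used here for clarity.
    """
    return min(range(len(centers)), key=lambda j: hamming(code, centers[j]))


def farthest_first_init(codes: List[int], k: int) -> List[int]:
    """Deterministic seeding (an assumption, not taken from the paper)."""
    centers = [codes[0]]
    while len(centers) < k:
        # Add the code whose nearest existing center is farthest away.
        centers.append(max(codes, key=lambda c: min(hamming(c, m) for m in centers)))
    return centers


def binary_kmeans(
    codes: List[int], k: int, n_bits: int, n_iters: int = 20
) -> Tuple[List[int], List[int]]:
    """Cluster binary codes into k binary centers; returns (centers, labels)."""
    centers = farthest_first_init(codes, k)
    for _ in range(n_iters):
        # Assignment step: each code goes to its nearest center.
        assign = [nearest(c, centers) for c in codes]
        # Update step: each center becomes the bitwise majority of its members.
        new_centers = []
        for j in range(k):
            members = [c for c, a in zip(codes, assign) if a == j]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(majority_center(members, n_bits) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment so the labels match the returned centers.
    return centers, [nearest(c, centers) for c in codes]


if __name__ == "__main__":
    codes = [0b00000000, 0b00000001, 0b00000010, 0b11111111, 0b11111110, 0b11111101]
    centers, labels = binary_kmeans(codes, k=2, n_bits=8)
    print([format(c, "08b") for c in centers], labels)
```

Because centers stay binary, both the assignment and update steps need only XOR, popcount, and bit counting, with no floating-point means. Keeping centers in the same binary space is also what allows hash indexes to be built over them.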

Keywords

online clustering, k-means clustering, image retrieval, binary hashing, hash indexing, photo clustering

Related papers

Long-Term Correlation Tracking 2015

Hierarchically-Constrained Optical Flow 2015

Propagated Image Filtering 2015

Expanding Object Detector's Horizon: Incremental Learning Framework for Object Detection in Videos 2015

Supervised Discrete Hashing 2015