Simple and Efficient Weighted Minwise Hashing

Anshumali Shrivastava

NeurIPS 2016 (NIPS 2016)

Abstract

Weighted minwise hashing (WMH) is a fundamental subroutine required by many celebrated approximation algorithms and is commonly adopted in industrial practice for large-scale search and learning. The resource bottleneck with WMH is the computation of multiple (typically a few hundred to a few thousand) independent hashes of the data. We propose a simple rejection-type sampling scheme based on a carefully designed red-green map, and we show that the number of rejected samples has exactly the same distribution as weighted minwise sampling. For many practical datasets, the running time of our method is an order of magnitude smaller than that of existing methods. Experimental evaluations on real datasets show that, for computing 500 WMH, our proposal can be 60000x faster than Ioffe's method without losing any accuracy. Our method is also around 100x faster than approximate heuristics capitalizing on the efficient "densified" one permutation hashing schemes [Proc:OneHashLSHICML14, Proc:ShrivastavaUAI14]. Given the simplicity of our approach and its significant advantages, we hope that it will replace existing implementations in practice.
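The red-green rejection idea summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation. Each dimension i is assigned an interval whose length is its maximum value m_i over the dataset; the first x_i of that interval is "green" and the rest is "red". A hash is the number of uniform draws rejected (landing in red) before the first green one. Because every vector replays the same draw sequence for a given seed, two hashes collide exactly when the first draw that lands in either vector's green region lands in both, so the collision probability is sum(min(x_i, y_i)) / sum(max(x_i, y_i)), the weighted Jaccard similarity. The function names, the one-seed-per-hash scheme, and the use of continuous draws with binary search (in place of any precomputed lookup map) are hypothetical choices made for this sketch.

```python
import bisect
import random
from itertools import accumulate


def build_offsets(max_per_dim):
    """Prefix sums of the per-dimension maxima m_i.

    Dimension i owns the interval [offsets[i], offsets[i] + m_i) of [0, M),
    where M = offsets[-1] = sum(m_i).
    """
    return [0.0] + list(accumulate(max_per_dim))


def red_green_hash(x, offsets, seed, max_draws=10_000_000):
    """Count uniform draws in [0, M) rejected before one lands in the green
    part of some dimension i, i.e. in [offsets[i], offsets[i] + x[i]).

    Assumes 0 <= x[i] <= m_i and at least one x[i] > 0. Reusing the same
    seed gives every vector the same draw sequence, which is what makes
    hashes of different vectors comparable.
    """
    total = offsets[-1]
    last_dim = len(x) - 1
    rng = random.Random(seed)
    for rejected in range(max_draws):
        r = rng.random() * total
        # Last dimension whose interval starts at or before r; bisect_right
        # skips zero-width dimensions, min() guards a rounding edge at r == M.
        i = min(bisect.bisect_right(offsets, r) - 1, last_dim)
        if r - offsets[i] < x[i]:  # green: accept this draw
            return rejected
    raise RuntimeError("no green draw found; is x all zeros?")


def wmh_signature(x, offsets, k):
    """k hashes; the j-th hash is driven by its own seed j (hypothetical)."""
    return [red_green_hash(x, offsets, seed=j) for j in range(k)]


def weighted_jaccard(x, y):
    """Exact weighted Jaccard similarity: sum of minima over sum of maxima."""
    return sum(map(min, x, y)) / sum(map(max, x, y))


def estimate_jaccard(sig_x, sig_y):
    """Fraction of positions where two signatures agree."""
    return sum(a == b for a, b in zip(sig_x, sig_y)) / len(sig_x)


# Toy usage: maxima are taken over the whole (here, two-vector) dataset.
data = [[1, 2, 0, 3], [2, 1, 1, 3]]
maxes = [max(col) for col in zip(*data)]  # [2, 2, 1, 3]
offsets = build_offsets(maxes)
sig_x = wmh_signature(data[0], offsets, k=2000)
sig_y = wmh_signature(data[1], offsets, k=2000)
print(weighted_jaccard(data[0], data[1]))  # 0.625
print(estimate_jaccard(sig_x, sig_y))      # an estimate near 0.625
```

In this sketch the number of draws per hash is geometric with success probability sum(x_i) / M, so the expected cost per hash is M / sum(x_i) draws: it is cheapest when a vector's values are close to the per-dimension maxima.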

Topics

Machine Learning > Application Areas > Efficient Computing
Mathematics & Optimization > Optimization > Stochastic Methods
Computer Science > Foundations > Algorithms
Computer Science > Applications > Information Retrieval
Machine Learning > Core Methods > Feature Selection
Mathematics & Optimization > Optimization > Approximation Algorithms

Related papers

Bayesian Intermittent Demand Forecasting for Large Inventories 2016

Dynamic Network Surgery for Efficient DNNs 2016

Beyond Exchangeability: The Chinese Voting Process 2016

Safe and Efficient Off-Policy Reinforcement Learning 2016

Tagger: Deep Unsupervised Perceptual Grouping 2016