Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick

Keegan Kang; Sergey Kushnarev; Wei Pin Wong; Rameshwar Pratap; Haikal Yeo; Chen Yijia

ACML 2021

Abstract

Hashing algorithms are widely used for large-scale learning and similarity search, with computationally cheaper and better algorithms being proposed every year. In this paper, we focus on hashing algorithms that estimate a distance measure $d(\vec{x}_i,\vec{x}_j)$ between two vectors $\vec{x}_i, \vec{x}_j$. Such hashing algorithms require the generation of random variables, and we propose two approaches to reduce the variance of the hashed estimates: control variates and maximum likelihood estimation. We explain how these approaches can be applied directly to a wide subset of hashing algorithms, evaluate their impact on various datasets, and run empirical simulations to verify our results.
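To make the control variates idea concrete, the following is a minimal sketch and not the paper's own code. It assumes Gaussian random projections as the hashing scheme, the inner product as the quantity being estimated, and $v_i^2 + w_i^2$ as the control variate, whose mean $\|\vec{x}\|^2 + \|\vec{y}\|^2$ is known when the vector norms are stored. The function names, the choice of control variate, and the use of an empirically estimated coefficient are all assumptions made for illustration.

```python
import numpy as np

def estimate_inner_product(x, y, k, rng):
    """Return (plain, control-variate) estimates of <x, y> from k Gaussian projections.

    Illustrative assumption: the norms of x and y are known exactly, so the
    mean of the control variate z is known in closed form.
    """
    R = rng.standard_normal((k, x.size))   # k random projection directions
    v, w = R @ x, R @ y                    # projected ("hashed") values
    prods = v * w                          # E[v_i * w_i] = <x, y>
    z = v**2 + w**2                        # control variate
    mu_z = x @ x + y @ y                   # E[z_i] = ||x||^2 + ||y||^2
    plain = prods.mean()
    cov = np.cov(prods, z)                 # 2x2 sample covariance matrix
    c_hat = cov[0, 1] / cov[1, 1]          # empirical coefficient (adds a small bias)
    cv = plain - c_hat * (z.mean() - mu_z)
    return plain, cv

def compare_mse(trials=2000, d=50, k=100, seed=0):
    """Mean squared error of both estimators over repeated random projections."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    y = x + 0.3 * rng.standard_normal(d)   # a highly similar pair of vectors
    truth = x @ y
    est = np.array([estimate_inner_product(x, y, k, rng) for _ in range(trials)])
    mse = ((est - truth) ** 2).mean(axis=0)
    return mse[0], mse[1]

if __name__ == "__main__":
    mse_plain, mse_cv = compare_mse()
    print(f"plain MSE: {mse_plain:.4f}, control-variate MSE: {mse_cv:.4f}")
```

In this sketch the correction term is large when the two vectors are similar, because the product $v_i w_i$ is then strongly correlated with $v_i^2 + w_i^2$; for nearly orthogonal vectors the correlation, and hence the gain, shrinks toward zero.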

Topics

Machine Learning > Core Methods > Embedding Learning

Machine Learning > Optimization & Theory > Stochastic Methods
