Randomized partition trees for exact nearest neighbor search

Sanjoy Dasgupta; Kaushik Sinha

COLT 2013

Abstract

The k-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in two situations of interest: the first, when data come from a doubling measure, and the second, when the data are documents from a topic model.
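
The abstract does not spell out the three schemes, so the following is only an illustrative sketch of one plausible randomized variant: each cell is split along a random direction at a randomly chosen fractile, and a query descends to a single leaf without backtracking. The function names, the [1/4, 3/4] fractile range, and the `leaf_size` parameter are assumptions made for illustration, not details taken from the text above.

```python
import math
import random

def random_unit_vector(dim, rng):
    # Normalizing a Gaussian vector gives a uniformly random direction.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_tree(points, leaf_size, rng):
    # Stop splitting once a cell is small enough to scan directly.
    if len(points) <= leaf_size:
        return {"leaf": points}
    direction = random_unit_vector(len(points[0]), rng)
    ordered = sorted(points, key=lambda p: dot(p, direction))
    # Assumed split rule: a random fractile between 1/4 and 3/4.
    beta = rng.uniform(0.25, 0.75)
    # Clamp so that both children are non-empty.
    cut = min(max(int(beta * len(ordered)), 1), len(ordered) - 1)
    threshold = dot(ordered[cut], direction)
    return {
        "direction": direction,
        "threshold": threshold,
        "left": build_tree(ordered[:cut], leaf_size, rng),
        "right": build_tree(ordered[cut:], leaf_size, rng),
    }

def defeatist_query(tree, q):
    # Follow one root-to-leaf path, then scan only that leaf.
    node = tree
    while "leaf" not in node:
        go_left = dot(q, node["direction"]) < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return min(node["leaf"], key=lambda p: math.dist(p, q))

def brute_force_query(points, q):
    # Exact nearest neighbor, useful as a reference answer.
    return min(points, key=lambda p: math.dist(p, q))
```

Comparing `defeatist_query` with `brute_force_query` over many random queries gives an empirical failure rate for a given data set, which is the quantity the abstract says is governed by the potential function.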

Authors

Sanjoy Dasgupta, Kaushik Sinha

Topics

Computer Science > Foundations > Algorithms

Computer Science > Foundations > Data Structures

Machine Learning > Core Methods > Dimensionality Reduction

Machine Learning > Optimization & Theory > Statistics

Data Science & Analytics > Applications > Information Retrieval

Keywords

nearest neighbor search, k-d tree, randomized algorithm, spatial data structure, doubling measure

Related papers

A Tensor Spectral Approach to Learning Mixed Membership Community Models (2013)

Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem (2013)

Boosting with the Logistic Loss is Consistent (2013)

Online Learning with Predictable Sequences (2013)

Recovering the Optimal Solution by Dual Random Projection (2013)