Same-Cluster Querying for Overlapping Clusters

Wasim Huleihel; Arya Mazumdar; Muriel Medard; Soumyabrata Pal

NeurIPS 2019

Abstract

Overlapping clusters are common in models of many practical data-segmentation applications. Suppose we are given $n$ elements to be clustered into $k$ possibly overlapping clusters, and an oracle that can interactively answer queries of the form "do elements $u$ and $v$ belong to the same cluster?" The goal is to recover the clusters with the minimum number of such queries. This problem has been of recent interest for the case of disjoint clusters. In this paper, we look at the more practical scenario of overlapping clusters and provide upper bounds (with algorithms) on the number of queries sufficient for recovery. We provide algorithmic results under both arbitrary (worst-case) and statistical modeling assumptions. Our algorithms are parameter-free, efficient, and work in the presence of random noise. We also derive information-theoretic lower bounds on the number of queries needed, proving that our algorithms are order-optimal. Finally, we test our algorithms on both synthetic and real-world data, showing their practicality and effectiveness.
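
To make the query model concrete, here is a minimal sketch that illustrates the problem setting only; it does not reproduce the paper's algorithms. It assumes a noiseless oracle and represents each element by the set of cluster labels it belongs to, so two elements "belong to the same cluster" when their label sets intersect. The names `SameClusterOracle`, `greedy_disjoint`, and `all_pairs` are hypothetical. The sketch contrasts a greedy representative-based strategy, which suffices for disjoint clusters, with the naive baseline that queries all pairs.

```python
from itertools import combinations


class SameClusterOracle:
    """Hypothetical noiseless oracle for possibly overlapping clusters.

    memberships[u] is the set of cluster labels that element u belongs to.
    query(u, v) answers "do u and v belong to the same cluster?", i.e.
    whether their label sets intersect, and counts the queries asked.
    """

    def __init__(self, memberships):
        self.memberships = [frozenset(m) for m in memberships]
        self.num_queries = 0

    def query(self, u, v):
        self.num_queries += 1
        return not self.memberships[u].isdisjoint(self.memberships[v])


def greedy_disjoint(oracle, n):
    """Disjoint-cluster baseline (illustrative, not the paper's method).

    Each element is compared with one representative of every group found
    so far and joins the first group that answers "yes"; otherwise it starts
    a new group. With k disjoint clusters this uses at most n*k queries.
    """
    groups = []  # groups[i][0] is the representative of group i
    for u in range(n):
        for group in groups:
            if oracle.query(u, group[0]):
                group.append(u)
                break
        else:  # no existing group answered "yes" (or there are no groups yet)
            groups.append([u])
    return groups


def all_pairs(oracle, n):
    """Naive baseline: ask all n*(n-1)/2 pairs and keep the "yes" pairs."""
    return {(u, v) for u, v in combinations(range(n), 2) if oracle.query(u, v)}


if __name__ == "__main__":
    # True clusters: A = {0, 1} and B = {1, 2}; element 1 lies in both.
    memberships = [{"A"}, {"A", "B"}, {"B"}]
    oracle = SameClusterOracle(memberships)
    print(greedy_disjoint(oracle, 3), oracle.num_queries)
    oracle = SameClusterOracle(memberships)
    print(all_pairs(oracle, 3), oracle.num_queries)
```

On this toy instance the greedy strategy returns the partition `[[0, 1], [2]]` after 2 queries and loses the fact that element 1 also belongs with element 2. The all-pairs baseline recovers the "yes" pairs `(0, 1)` and `(1, 2)` but spends 3 queries, which grows quadratically in $n$. Because the answers are not transitive once clusters overlap (0 and 1 share a cluster, 1 and 2 share a cluster, yet 0 and 2 do not), strategies designed for disjoint clusters do not carry over directly.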

Keywords

information theory, query complexity, information-theoretic bound, oracle query, cluster recovery, overlapping clustering, active querying, overlapping cluster