Bayesian Partitioning of Large-Scale Distance Data

David Adametz; Volker Roth

NeurIPS (NIPS) 2011

Abstract

A Bayesian approach to partitioning distance matrices is presented. It is inspired by the 'Translation-Invariant Wishart-Dirichlet' (TIWD) process of Vogt et al. (2010) and shares several of its advantageous properties, such as the fully probabilistic nature of the inference model, automatic selection of the number of clusters, and applicability in semi-supervised settings. In addition, our method (which we call 'fastTIWD') overcomes the main shortcoming of the original TIWD, namely its high computational cost: the fastTIWD reduces the workload in each iteration of a Gibbs sampler from O(n^3) in the TIWD to O(n^2). Our experiments show that this cost reduction does not compromise the quality of the inferred partitions. With this new method it is now possible to 'mine' large relational datasets with a probabilistic model, thereby automatically detecting new and potentially interesting clusters.
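The abstract does not spell out why a model for distance matrices should be translation-invariant. The following minimal sketch illustrates the underlying fact only: pairwise squared Euclidean distances do not change when every point is shifted by the same vector, so a distance matrix carries no information about the origin of the coordinate system. The helper names (`squared_distances`, `translate`) and the toy points are assumptions made for this illustration and are not taken from the paper; the sketch does not implement the TIWD or fastTIWD model.

```python
def squared_distances(points):
    """Return the n x n matrix of squared Euclidean distances (hypothetical helper)."""
    return [
        [sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
        for p in points
    ]


def translate(points, shift):
    """Shift every point by the same vector (hypothetical helper)."""
    return [[x + s for x, s in zip(p, shift)] for p in points]


# Toy data, chosen only for illustration.
points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
shifted = translate(points, [10.0, -2.0])

D = squared_distances(points)
D_shifted = squared_distances(shifted)

# The two distance matrices coincide although the coordinates differ.
print(D == D_shifted)
```

Because the distances are unchanged by the shift, any likelihood defined on the distance matrix alone cannot depend on where the data are centered, which is the sense in which such a model is translation-invariant.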

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Optimization & Theory > Bayesian Inference
Data Science & Analytics > Methods > Data Mining
Data Science & Analytics > Applications > Clustering
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Bayesian & Probabilistic > Nonparametric Bayesian

Keywords

Bayesian clustering, unsupervised learning, Dirichlet process, Bayesian nonparametrics, relational data, Gibbs sampling, distance matrix, distance matrices
