NIPS (NeurIPS) 2006

Modeling Dyadic Data with Binary Latent Factors

Abstract

We introduce binary matrix factorization, a novel model for unsupervised matrix decomposition. The decomposition is learned by fitting a non-parametric Bayesian probabilistic model with binary latent variables to a matrix of dyadic data. Unlike bi-clustering models, which assign each row or column to a single cluster based on a categorical hidden feature, our binary feature model reflects the prior belief that items and attributes can be associated with more than one latent cluster at a time. We provide simple learning and inference rules for this new model and show how to extend it to an infinite model in which the number of features is not a priori fixed but is allowed to grow with the size of the data.

1 Distributed representations for dyadic data

One of the major goals of probabilistic unsupervised learning is to discover underlying or hidden structure in a dataset by using latent variables to describe a complex data generation process. In this paper we focus on dyadic data: our domains have two finite sets of objects/entities and observations are made on dyads (pairs with one element from each set). Examples include sparse matrices of movie-viewer ratings, word-document counts or product-customer purchases. A simple way to capture structure in this kind of data is to do "bi-clustering" (possibly using mixture models) by grouping the rows and (independently or simultaneously) the columns [6, 13, 9]. The modelling assumption in such a case is that movies come in types and viewers in types and that knowing the type of movie and type of viewer is sufficient to predict the response. Clustering [...]

[...] componential structure: each item (row) has associated with it an unobserved vector of binary features; similarly each attribute (column) has a hidden vector of binary features. Knowing the [...] matrix $X$ into (a distribution defined by) the product $UWV^\top$, where $U$ and $V$ are binary feature matrices, and $W$ is a real-valued weight matrix. Below, we develop this binary matrix factorization [...]
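
As a rough illustration of the product $UWV^\top$ described above, the following minimal sketch builds binary feature matrices for rows and columns and combines them through a real-valued weight matrix. It assumes the product is read as the mean of the distribution over $X$; the independent Bernoulli(0.5) feature draws, the Gaussian weights, and all function names are assumptions made for this example, not the paper's non-parametric prior or inference procedure.

```python
import random


def sample_binary_matrix(n_rows, n_features, p, rng):
    """Draw each entry as an independent Bernoulli(p) (an illustrative assumption).

    Unlike a one-hot cluster assignment, a row may have several features on at once.
    """
    return [[1 if rng.random() < p else 0 for _ in range(n_features)]
            for _ in range(n_rows)]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def expected_response(U, W, V):
    """Compute U W V^T, read here as the mean of X (hypothetical helper).

    Entry (i, j) sums W[k][l] over every row feature k active for item i
    and every column feature l active for attribute j.
    """
    return matmul(matmul(U, W), transpose(V))


rng = random.Random(0)
n_items, n_attributes = 4, 5          # rows and columns of X
n_row_features, n_col_features = 3, 2  # fixed here; the paper lets these grow

U = sample_binary_matrix(n_items, n_row_features, 0.5, rng)
V = sample_binary_matrix(n_attributes, n_col_features, 0.5, rng)
W = [[rng.gauss(0.0, 1.0) for _ in range(n_col_features)]
     for _ in range(n_row_features)]

mean_X = expected_response(U, W, V)
```

If every row of `U` and `V` had exactly one active feature, each entry of `mean_X` would reduce to a single weight `W[k][l]`, which corresponds to the bi-clustering assumption; allowing several active features lets the contributions add up.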
