On the Accuracy of Self-Normalized Log-Linear Models

Jacob Andreas; Maxim Rabinovich; Dan Klein; Michael I. Jordan

NeurIPS (NIPS) 2015

Abstract

Calculation of the log-normalizer is a major computational obstacle in applications of log-linear models with large output spaces. The problem of fast normalizer computation has therefore attracted significant attention in the theoretical and applied machine learning literature. In this paper, we analyze a recently proposed technique known as "self-normalization", which introduces a regularization term in training to penalize log-normalizers for deviating from zero. This makes it possible to use unnormalized model scores as approximate probabilities. Empirical evidence suggests that self-normalization is extremely effective, but a theoretical understanding of why it should work, and how generally it can be applied, is largely lacking. We prove upper bounds on the loss in accuracy due to self-normalization, describe classes of input distributions that self-normalize easily, and construct explicit examples of high-variance input distributions. Our theoretical results make predictions about the difficulty of fitting self-normalized models to several classes of distributions, and we conclude with empirical validation of these predictions on both real and synthetic datasets.
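To make the idea in the abstract concrete, here is a minimal sketch of a self-normalized training objective. It is an illustration under stated assumptions, not the paper's implementation: the abstract says only that a regularization term penalizes log-normalizers for deviating from zero, so the squared form of the penalty, the weight `alpha`, and all function names below are hypothetical choices.

```python
import math


def log_normalizer(scores):
    """Numerically stable log-sum-exp of one input's unnormalized scores."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def self_normalized_loss(score_rows, labels, alpha=0.1):
    """Mean negative log-likelihood plus alpha times the mean squared log-normalizer.

    score_rows[i][y] is the unnormalized score of label y for input i.
    Assumption: the penalty is squared and weighted by alpha; the abstract
    does not fix the exact form of the regularizer.
    """
    nll = 0.0
    penalty = 0.0
    for scores, y in zip(score_rows, labels):
        log_z = log_normalizer(scores)
        nll += log_z - scores[y]   # exact negative log-likelihood
        penalty += log_z ** 2      # zero only when log Z(x) = 0
    n = len(labels)
    return nll / n + alpha * penalty / n


def approx_prob(scores, y):
    """Approximate p(y | x) from the raw score alone, skipping the normalizer.

    This is only close to the true probability when log Z(x) is near zero.
    """
    return math.exp(scores[y])
```

When the penalty has driven the log-normalizer close to zero, `approx_prob` needs a single score rather than a sum over the whole output space, which is where the computational saving for large output spaces comes from. The gap between `approx_prob` and the exactly normalized probability is the kind of accuracy loss the paper's bounds address.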

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Optimization
Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Optimization & Theory > Theory
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Mathematics & Optimization > Mathematics > Probability

Keywords

theoretical analysis, probability estimation, probabilistic model, log-linear model, upper bound, regularization term, probabilistic classification, large output space, log normalizer
