Calibration, Entropy Rates, and Memory in Language Models

Mark Braverman; Xinyi Chen; Sham Kakade; Karthik Narasimhan; Cyril Zhang; Yi Zhang

ICML 2020

Abstract

Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
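To make the notion of entropy measured over a model's own generations concrete, the following is a minimal sketch that assumes a hypothetical three-token, first-order Markov chain in place of a real language model. The transition table, function names, and sample sizes are illustrative assumptions and are not taken from the paper. The sketch only shows how the average next-token entropy could be tracked position by position over sampled sequences; it does not reproduce the paper's experiments or its calibration method.

```python
import math
import random

# Hypothetical toy "language model": a first-order Markov chain over 3 tokens.
# TRANSITIONS[i][j] is the model's probability of token j given previous token i.
TRANSITIONS = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [1 / 3, 1 / 3, 1 / 3],
]


def conditional_entropy(prev_token):
    """Shannon entropy (in nats) of the next-token distribution given prev_token."""
    return -sum(p * math.log(p) for p in TRANSITIONS[prev_token] if p > 0)


def sample_next(prev_token, rng):
    """Draw the next token from the model's conditional distribution."""
    tokens = range(len(TRANSITIONS))
    return rng.choices(tokens, weights=TRANSITIONS[prev_token])[0]


def entropy_by_position(num_sequences, length, start_token, seed=0):
    """Average next-token entropy at each position over sampled generations."""
    rng = random.Random(seed)
    totals = [0.0] * length
    for _ in range(num_sequences):
        token = start_token
        for t in range(length):
            # Entropy of the model's prediction at step t, given its own history.
            totals[t] += conditional_entropy(token)
            token = sample_next(token, rng)
    return [total / num_sequences for total in totals]


profile = entropy_by_position(num_sequences=2000, length=20, start_token=0)
for t in (0, 5, 19):
    print(f"position {t:2d}: average entropy {profile[t]:.3f} nats")
```

In this toy chain the average entropy rises over the first few positions only because every sequence starts in the lowest-entropy state; that is a property of the chosen start token, not the miscalibration the abstract describes. With a real model, the same per-position profile would be compared against the entropy rate estimated on held-out text.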

Topics

Machine Learning > Optimization & Theory > Stochastic Processes
Machine Learning > Optimization & Theory > Theory
Natural Language Processing > Generation > Language Modeling

Keywords

language model, sequence model, long-term dependency, entropy rate

Related papers

Correlation Clustering with Asymmetric Classification Errors 2020

Learning Portable Representations for High-Level Planning 2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need 2020

Minimax Pareto Fairness: A Multi Objective Perspective 2020

DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training 2020