Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings

Aviral Kumar; Sunita Sarawagi; Ujjwal Jain

ICML 2018

Abstract

Modern neural networks have recently been found to be poorly calibrated, primarily in the direction of over-confidence. Methods like entropy penalty and temperature smoothing improve calibration by clamping confidence, but in doing so compromise the many legitimately confident predictions. We propose a more principled fix that minimizes an explicit calibration error during training. We present MMCE, an RKHS kernel-based measure of calibration that is efficiently trainable alongside the negative likelihood loss without careful hyper-parameter tuning. Theoretically too, MMCE is a sound measure of calibration that is minimized at perfect calibration, and whose finite-sample estimates are consistent and enjoy fast convergence rates. Extensive experiments on several network architectures demonstrate that MMCE is a fast, stable, and accurate method to minimize calibration error while maximally preserving the number of high-confidence predictions.
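The abstract does not spell out the estimator, so the following is only a minimal sketch of what a kernel-based calibration measure of this kind can look like. It assumes the finite-sample form MMCE² = (1/m²) Σᵢⱼ (cᵢ − rᵢ)(cⱼ − rⱼ) k(rᵢ, rⱼ), where rᵢ is the predicted confidence, cᵢ is 1 if the prediction is correct and 0 otherwise, and k is a Laplacian kernel. The function names and the kernel width of 0.4 are illustrative assumptions and should be checked against the paper body. NumPy is used for evaluation only; training alongside the likelihood loss would need the same expression in a differentiable framework.

```python
import numpy as np


def laplacian_kernel(r1, r2, width=0.4):
    """Laplacian kernel on confidences: exp(-|r - r'| / width).

    The width 0.4 is an assumed default, not taken from the abstract.
    """
    return np.exp(-np.abs(r1[:, None] - r2[None, :]) / width)


def mmce_squared(confidences, correct, width=0.4):
    """Sketch of a squared kernel calibration estimate (hypothetical helper).

    confidences: predicted probability of the chosen class, one per sample.
    correct:     1 if the prediction was right, 0 otherwise.
    """
    r = np.asarray(confidences, dtype=float)
    c = np.asarray(correct, dtype=float)
    m = r.shape[0]
    diff = c - r                      # per-sample calibration residual
    k = laplacian_kernel(r, r, width) # m x m kernel matrix over confidences
    return float(diff @ k @ diff) / (m * m)


# Confident and always right: residuals are zero, so the estimate is zero.
well_calibrated = mmce_squared([1.0, 1.0], [1, 1])

# Confident and always wrong: large residuals, so the estimate is large.
over_confident = mmce_squared([0.9, 0.9], [0, 0])
```

Under these assumptions, a combined training objective would add a weighted multiple of this quantity (or its square root) to the likelihood loss; the weighting scheme is not stated in the abstract.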

Topics

Machine Learning > Optimization & Theory > Loss Functions
Deep Learning > Architectures > Neural Networks
Deep Learning > Learning Types > Deep Learning
Machine Learning > Learning Types > Uncertainty Quantification

Keywords

confidence calibration, kernel mean embedding, temperature scaling, calibration error, Platt scaling, kernel methods, neural network

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018