Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

Ehsan Amid; Manfred K. Warmuth; Rohan Anil; Tomer Koren

NeurIPS 2019

Abstract

We introduce a temperature into the exponential function and replace the softmax output layer of neural networks with a high-temperature generalization. Similarly, the logarithm in the training loss is replaced by a low-temperature logarithm. By tuning the two temperatures, we create loss functions that are non-convex already in the single-layer case. When the last layer of a neural network is replaced by our bi-temperature generalization of the logistic loss, training becomes more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and show the efficacy of our method on large datasets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method that uses the Tsallis divergence.
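The abstract does not state any formulas, so the following is only a minimal sketch of the construction it describes. It assumes the standard tempered logarithm log_t(x) = (x^(1-t) - 1)/(1-t) and tempered exponential exp_t(x) = [1 + (1-t)x]_+^(1/(1-t)), and a Bregman-divergence form of the loss. The function names, the example temperatures, and the use of bisection to normalize the tempered softmax are assumptions of this sketch, not necessarily the procedure used by the authors.

```python
import math

def log_t(x, t):
    """Tempered logarithm; equals math.log when t == 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential; equals math.exp when t == 1."""
    if t == 1.0:
        return math.exp(x)
    base = max(0.0, 1.0 + (1.0 - t) * x)
    if base == 0.0:
        # For t > 1 the exponent is negative, so a zero base diverges.
        return math.inf if t > 1.0 else 0.0
    return base ** (1.0 / (1.0 - t))

def tempered_softmax(activations, t, iterations=100):
    """Probabilities exp_t(a_i - lam), with lam chosen so they sum to 1.

    Assumption: lam is found by bisection (swapped in here for simplicity).
    The bracket [max(a), max(a) - log_t(1/n)] contains the root because the
    sum is at least 1 at the lower end and at most 1 at the upper end.
    """
    n = len(activations)
    a_max = max(activations)
    lo, hi = a_max, a_max - log_t(1.0 / n, t)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        total = sum(exp_t(a - mid, t) for a in activations)
        if total > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [exp_t(a - lam, t) for a in activations]

def bi_tempered_loss(activations, labels, t1, t2):
    """Bregman-style loss with temperature t1 (log) and t2 (softmax).

    Assumes t1 < 2 and that labels form a probability vector.
    """
    probs = tempered_softmax(activations, t2)
    total = 0.0
    for y, p in zip(labels, probs):
        if y > 0.0:
            total += y * (log_t(y, t1) - log_t(p, t1))
        total -= (y ** (2.0 - t1) - p ** (2.0 - t1)) / (2.0 - t1)
    return total

# Illustrative inputs: three classes, one-hot label on the first class.
activations = [2.0, 0.5, -1.0]
labels = [1.0, 0.0, 0.0]
standard = bi_tempered_loss(activations, labels, t1=1.0, t2=1.0)
tempered = bi_tempered_loss(activations, labels, t1=0.8, t2=1.2)
```

With both temperatures set to 1, the sketch reduces to the ordinary softmax cross-entropy. Choosing t1 < 1 and t2 > 1 corresponds to the low-temperature logarithm and high-temperature softmax described in the abstract.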

Topics

Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Theory

Keywords

neural network training, Bregman divergence, logistic loss, temperature parameter, robustness to noise
