Learning with a Wasserstein Loss

Charlie Frogner; Chiyuan Zhang; Hossein Mobahi; Mauricio Araya; Tomaso A Poggio

NIPS 2015 (now NeurIPS)

Abstract

Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem, using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that doesn't use the metric.
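The regularized approximation mentioned above can be illustrated with a small sketch. This is an assumed, Sinkhorn-style matrix-scaling loop for the entropy-regularized Wasserstein distance between a predicted label histogram `h` and a target histogram `y`, given a ground metric `M` on the labels. It is not the authors' code. The function name `sinkhorn_wasserstein_loss`, the parameters `lam` and `n_iter`, and their defaults are hypothetical. The sketch covers only the normalized case (both histograms sum to 1), not the paper's extension to unnormalized measures. It also returns a gradient with respect to `h`, taken from the scaling vector `u` and centered to sum to zero so it stays tangent to the probability simplex.

```python
import numpy as np


def sinkhorn_wasserstein_loss(h, y, M, lam=5.0, n_iter=500):
    """Hypothetical sketch of an entropy-regularized Wasserstein loss.

    h, y : strictly positive histograms over K labels, each summing to 1.
    M    : (K, K) ground metric between labels.
    lam  : regularization strength (larger = closer to exact Wasserstein,
           but slower convergence and risk of underflow in exp(-lam * M)).
    Returns (transport_cost, grad_h).
    """
    K = np.exp(-lam * M)  # Gibbs kernel built from the ground metric
    u = np.ones_like(h)
    for _ in range(n_iter):
        v = y / (K.T @ u)  # rescale columns to match the target marginal y
        u = h / (K @ v)    # rescale rows to match the predicted marginal h
    T = u[:, None] * K * v[None, :]  # approximate optimal transport plan
    cost = float(np.sum(T * M))      # transport cost <T, M> under the plan
    # Dual potential for h in the regularized problem is log(u) / lam;
    # subtract its mean so the gradient sums to zero on the simplex.
    alpha = np.log(u) / lam
    grad = alpha - alpha.mean()
    return cost, grad


if __name__ == "__main__":
    labels = np.arange(3)
    M = np.abs(labels[:, None] - labels[None, :]) / 2.0  # labels on a line
    h = np.array([0.7, 0.2, 0.1])
    y = np.array([0.1, 0.2, 0.7])
    cost, grad = sinkhorn_wasserstein_loss(h, y, M)
    print(cost, grad)
```

In this toy setup the exact Wasserstein distance is 0.6, because 0.6 units of mass must travel from label 0 to label 2 at cost 1. The regularized cost lands slightly above that. The gradient is larger at label 0 than at label 2, since extra predicted mass at label 0 sits farther from the target mass. This is the metric-aware smoothness the abstract describes.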

Topics

Machine Learning > Core Methods > Classification; Machine Learning > Core Methods > Regression; Machine Learning > Core Methods > Metric Learning; Machine Learning > Optimization & Theory > Optimization; Mathematics & Optimization > Optimization > Optimal Transport; Machine Learning > Learning Types > Multi-Label Classification

Keywords

Wasserstein distance, statistical learning, multi-label learning, optimal transport, probability measure, loss function, tag prediction
