From PAC-Bayes Bounds to KL Regularization

Pascal Germain; Alexandre Lacasse; Mario Marchand; Sara Shanian; François Laviolette

NeurIPS 2009 (NIPS 2009)

Abstract

We show that convex KL-regularized objective functions are obtained from a PAC-Bayes risk bound when using convex loss functions for the stochastic Gibbs classifier that upper-bound the standard zero-one loss used for the weighted majority vote. By restricting ourselves to a class of posteriors, which we call quasi-uniform, we propose a simple coordinate descent learning algorithm to minimize the proposed KL-regularized cost function. We show that currently used standard ℓp-regularized objective functions, such as ridge regression and ℓp-regularized boosting, are obtained from a relaxation of the KL divergence between the quasi-uniform posterior and the uniform prior. We present numerical experiments where the proposed learning algorithm generally outperforms ridge regression and AdaBoost.
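The main ingredients of the abstract can be illustrated with a small sketch that uses only the Python standard library. It is not the authors' algorithm, only an illustration under stated assumptions. It assumes a quasi-uniform posterior over n base voters h_i and their negations, parameterized as Q(h_i) = (1 + w_i)/(2n) and Q(not h_i) = (1 - w_i)/(2n) with w_i in (-1, 1), a uniform prior P on the 2n voters, and an exponential surrogate exp(-gamma * margin) for the majority-vote zero-one loss. All function names and the hyperparameters C and gamma are hypothetical. Under these assumptions KL(Q||P) is a separable convex function of w whose second-order Taylor expansion around w = 0 is ||w||^2/(2n), a ridge-style l2 penalty. Each coordinate-descent step is then a one-dimensional convex problem, solved here by bisection on its derivative.

```python
import math


def kl_quasi_uniform(w):
    """KL(Q_w || P) under the assumed parameterization
    Q(h_i) = (1 + w_i) / (2n), Q(not h_i) = (1 - w_i) / (2n), P uniform on 2n voters."""
    n = len(w)
    total = 0.0
    for wi in w:
        for p in (1.0 + wi, 1.0 - wi):
            if p > 0.0:  # convention: 0 * ln 0 = 0
                total += p * math.log(p)
    return total / (2 * n)


def l2_relaxation(w):
    """Second-order Taylor expansion of kl_quasi_uniform around w = 0: ||w||^2 / (2n)."""
    return sum(wi * wi for wi in w) / (2 * len(w))


def margins(w, H, y):
    """Majority-vote margins y_k * (1/n) * sum_i w_i * h_i(x_k); H[k][i] = h_i(x_k) in {-1, +1}."""
    n = len(w)
    return [yk * sum(wi * hki for wi, hki in zip(w, Hk)) / n for Hk, yk in zip(H, y)]


def objective(w, H, y, C=1.0, gamma=1.0):
    """Hypothetical KL-regularized cost: C * sum_k exp(-gamma * margin_k) + KL(Q_w || P)."""
    loss = sum(math.exp(-gamma * m) for m in margins(w, H, y))
    return C * loss + kl_quasi_uniform(w)


def _coord_grad(wi, a, rest, C, gamma, n):
    """Derivative of the objective w.r.t. w_i with all other weights held fixed.
    margin_k = rest[k] + a[k] * w_i, and d KL / d w_i = atanh(w_i) / n."""
    loss_grad = -C * gamma * sum(
        ak * math.exp(-gamma * (rk + ak * wi)) for ak, rk in zip(a, rest))
    return loss_grad + math.atanh(wi) / n


def coordinate_descent(H, y, C=1.0, gamma=1.0, sweeps=30, eps=1e-9, iters=60):
    """Cyclic coordinate descent. Each 1-D subproblem is strictly convex, so its
    derivative is increasing and bisection locates the minimizer."""
    n = len(H[0])
    w = [0.0] * n  # w = 0 gives Q = P (the uniform posterior)
    for _ in range(sweeps):
        for i in range(n):
            a = [yk * Hk[i] / n for Hk, yk in zip(H, y)]  # d margin_k / d w_i
            rest = [yk * sum(w[j] * Hk[j] for j in range(n) if j != i) / n
                    for Hk, yk in zip(H, y)]
            lo, hi = -1.0 + eps, 1.0 - eps
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                if _coord_grad(mid, a, rest, C, gamma, n) > 0.0:
                    hi = mid
                else:
                    lo = mid
            w[i] = 0.5 * (lo + hi)
    return w


# Toy data: voter 0 always agrees with y, voter 1 always disagrees,
# voter 2 is right on exactly half of the examples.
y = [1, -1, 1, -1]
H = [[yk, -yk, h2] for yk, h2 in zip(y, [1, 1, -1, -1])]
w = coordinate_descent(H, y)
print([round(wi, 3) for wi in w])
```

On this toy data the sketch gives voter 0 a large positive weight and voter 1 a large negative weight (so most of its mass moves to the negated voter), while the uninformative voter 2 stays near w = 0, that is, at the prior. Bisection is a natural choice here because the coordinate derivative is strictly increasing and the atanh(w_i)/n term from the KL divergence diverges as w_i approaches -1 or 1, which keeps every weight inside the open interval.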

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

variational inference, ridge regression, Gibbs classifier, coordinate descent, KL regularization, stochastic classifier, PAC-Bayes bound

Related papers

Solving Stochastic Games (2009)

Bilinear classifiers for visual recognition (2009)

Zero-shot Learning with Semantic Output Codes (2009)

Matrix Completion from Power-Law Distributed Samples (2009)

Heavy-Tailed Symmetric Stochastic Neighbor Embedding (2009)