Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance

Ulysse Marteau-Ferey; Dmitrii Ostrovskii; Francis Bach; Alessandro Rudi

COLT 2019

Abstract

We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels. In order to go beyond the generic analysis leading to convergence rates of the excess risk as $O(1/\sqrt{n})$ from $n$ observations, we assume that the individual losses are self-concordant, that is, their third-order derivatives are bounded by their second-order derivatives. This setting includes least-squares, as well as all generalized linear models such as logistic and softmax regression. For this class of losses, we provide a bias-variance decomposition and show that the assumptions commonly made in least-squares regression, such as the source and capacity conditions, can be adapted to obtain fast non-asymptotic rates of convergence by improving the bias terms, the variance terms or both.
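As a rough illustration of the self-concordance condition described above, the sketch below numerically checks the one-dimensional logistic loss $\varphi(t) = \log(1 + e^{-t})$, for which $|\varphi'''(t)| \le \varphi''(t)$ holds because $\varphi'' = \sigma(1-\sigma)$ and $\varphi''' = \sigma(1-\sigma)(1-2\sigma)$. The function names are hypothetical and not taken from the paper, and the paper's actual assumption is stated for general losses on a Hilbert space, so this is only a scalar sanity check.

```python
import math


def sigmoid(t):
    """Numerically stable logistic sigmoid."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_second_derivative(t):
    """phi''(t) for phi(t) = log(1 + exp(-t))."""
    s = sigmoid(t)
    return s * (1.0 - s)


def logistic_third_derivative(t):
    """phi'''(t) for phi(t) = log(1 + exp(-t))."""
    s = sigmoid(t)
    return s * (1.0 - s) * (1.0 - 2.0 * s)


def satisfies_bound(points):
    """Return True if |phi'''(t)| <= phi''(t) at every point in `points`."""
    return all(
        abs(logistic_third_derivative(t)) <= logistic_second_derivative(t)
        for t in points
    )


if __name__ == "__main__":
    # Evaluation grid on [-20, 20]; the range is an arbitrary choice.
    grid = [x / 10.0 for x in range(-200, 201)]
    print(satisfies_bound(grid))
```

The square loss $\varphi(t) = \tfrac{1}{2}t^2$ satisfies the same inequality trivially, since its third derivative is zero, which is consistent with the abstract's remark that the setting includes least-squares.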

Topics

Machine Learning > Core Methods > Regression
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Statistical Learning

Keywords

generalized linear model, bias-variance decomposition, regularized empirical risk minimization, fast convergence rate, kernel methods

Related papers

Inference under Information Constraints: Lower Bounds from Chi-Square Contraction 2019

Learning in Non-convex Games with an Optimization Oracle 2019

Learning to Prune: Speeding up Repeated Computations 2019

A Universal Algorithm for Variational Inequalities Adaptive to Smoothness and Noise 2019

Learning Two Layer Rectified Neural Networks in Polynomial Time 2019