Optimal learners for multiclass problems

Amit Daniely; Shai Shalev-Shwartz

COLT 2014

Abstract

The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, namely, it must have the ability to output hypotheses which do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of “how to learn?” We give a complete answer to this question by giving a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. Then, we turn to study the popular hypothesis class of generalized linear classifiers. We derive optimal learners that, unlike the one-inclusion algorithm, are computationally efficient. Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).
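For readers unfamiliar with the terminology, the following is a minimal sketch of what an ERM rule does on a finite multiclass hypothesis class. It is an illustration only, not code from the paper: the names `empirical_error`, `erm`, and the toy `hypothesis_class` are hypothetical. The point it shows is that ERM is a proper learner, since it always returns a member of the class, which is exactly the restriction the abstract says an optimal multiclass learner must be free to violate.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# A hypothesis maps an instance to a label (hypothetical type alias).
Hypothesis = Callable[[Hashable], Hashable]
Sample = Sequence[Tuple[Hashable, Hashable]]


def empirical_error(h: Hypothesis, sample: Sample) -> float:
    """Fraction of labeled examples that h misclassifies."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)


def erm(hypothesis_class: List[Hypothesis], sample: Sample) -> Hypothesis:
    """Return a hypothesis from the class with minimal empirical error.

    The output is always an element of hypothesis_class, so this rule
    is a proper learner. Ties are broken by list order.
    """
    return min(hypothesis_class, key=lambda h: empirical_error(h, sample))


# Toy class with three labels {0, 1, 2} (an assumption for illustration).
hypothesis_class: List[Hypothesis] = [
    lambda x: 0,
    lambda x: x % 3,
    lambda x: 2 if x > 1 else 1,
]

# Labels are generated by the second hypothesis (the realizable setting).
sample = [(0, 0), (1, 1), (2, 2)]
learned = erm(hypothesis_class, sample)
```

An improper learner, by contrast, would be allowed to return any function from instances to labels, whether or not it appears in `hypothesis_class`.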

Authors

Amit Daniely, Shai Shalev-Shwartz

Topics

Machine Learning > Core Methods > Classification

Machine Learning > Optimization & Theory > Learning Theory

Machine Learning > Optimization & Theory > Statistical Learning

Keywords

sample complexity, empirical risk minimization, multiclass classification, hypothesis class, generalized linear model, improper learning
