A Boo(n) for Evaluating Architecture Performance

Ondrej Bajgar; Rudolf Kadlec; Jan Kleindienst

ICML 2018

Abstract

We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-$n$ performance ($\text{Boo}_n$) as a way to correct these problems.
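
To make the best-out-of-$n$ idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified single-score setting: given the scores of $m$ independently trained models, it estimates the expected score of the best of $n$ models drawn with replacement from that empirical sample. The function name `expected_best_of_n` is hypothetical, and the abstract does not spell out the normalization or how validation and test scores are combined, so neither is modelled here.

```python
from typing import Sequence


def expected_best_of_n(scores: Sequence[float], n: int) -> float:
    """Expected maximum of n scores drawn with replacement from `scores`.

    Assumption: higher scores are better and the observed scores stand in
    for the distribution of results over random training runs.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(scores) == 0:
        raise ValueError("scores must be non-empty")

    ordered = sorted(scores)
    m = len(ordered)
    total = 0.0
    for i, score in enumerate(ordered, start=1):
        # Probability that the maximum of n draws is the i-th smallest score:
        # P(max <= x_i) - P(max <= x_{i-1}) = (i/m)^n - ((i-1)/m)^n
        weight = (i / m) ** n - ((i - 1) / m) ** n
        total += weight * score
    return total
```

With `n = 1` the estimate reduces to the mean of the observed scores, and it grows toward the best observed score as `n` increases. This illustrates why a single best result depends on how many models were trained.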

Topics

Machine Learning > Optimization & Theory > Theory; Deep Learning > Techniques > Model Architecture; Machine Learning > Optimization & Theory > Evaluation; Machine Learning > Learning Types > Evaluation; Machine Learning > Core Methods > Evaluation; Deep Learning > Optimization & Theory > Evaluation

Keywords

model selection; hyperparameter optimization; deep learning; model performance; architecture evaluation; architecture performance; expected best-out-of-n
