Learning to Learn without Gradient Descent by Gradient Descent

Yutian Chen; Matthew W. Hoffman; Sergio Gómez Colmenarejo; Misha Denil; Timothy P. Lillicrap; Matt Botvinick; Nando de Freitas

ICML 2017

Abstract

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer, in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
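As a rough illustration of the query loop the abstract describes, the sketch below shows a recurrent network that proposes a point, observes the black-box value at that point, and feeds both back into its next step, without ever using a gradient of the objective. Everything here is an assumption made for illustration: the plain tanh cell, the parameter names, the `[-1, 1]` query box, and the toy quadratic objective do not come from the abstract, and the weights are random rather than trained. In the setting described above, the weights would first be fitted by gradient descent on synthetic functions; that training step is omitted.

```python
import numpy as np


def init_params(dim, hidden, rng):
    """Random, untrained weights for a plain tanh RNN cell (hypothetical layout)."""
    scale = 0.1
    return {
        "W_in": rng.normal(0.0, scale, (hidden, dim + 1)),  # input is [x_prev, y_prev]
        "W_h": rng.normal(0.0, scale, (hidden, hidden)),
        "b": np.zeros(hidden),
        "W_out": rng.normal(0.0, scale, (dim, hidden)),
    }


def rnn_step(params, h, x_prev, y_prev):
    """Map (state, last query, last observed value) to (new state, next query)."""
    inp = np.concatenate([x_prev, [y_prev]])
    h_new = np.tanh(params["W_in"] @ inp + params["W_h"] @ h + params["b"])
    x_new = np.tanh(params["W_out"] @ h_new)  # squash queries into [-1, 1]^dim
    return h_new, x_new


def run_optimizer(params, f, dim, horizon):
    """Query the black-box f for `horizon` steps using only function values."""
    h = np.zeros(params["b"].shape[0])
    x = np.zeros(dim)  # dummy initial query
    y = 0.0            # dummy initial observation
    queries, values = [], []
    for _ in range(horizon):
        h, x = rnn_step(params, h, x, y)
        y = float(f(x))  # the only information the optimizer receives about f
        queries.append(x)
        values.append(y)
    return np.array(queries), np.array(values)


def quadratic(x):
    """Toy black-box objective, assumed here only for demonstration."""
    return float(np.sum((x - 0.3) ** 2))


rng = np.random.default_rng(0)
params = init_params(dim=2, hidden=16, rng=rng)
queries, values = run_optimizer(params, quadratic, dim=2, horizon=10)
best = values.min()  # best value found within the horizon
```

With untrained weights the proposed queries are not expected to be good; the sketch only shows the interface through which such an optimizer interacts with a derivative-free objective.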

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning

Machine Learning > Optimization & Theory > Neural Network Optimization

Keywords

black-box optimization, gradient descent, Bayesian optimization, hyperparameter tuning, neural network optimizer

Related papers

Bottleneck Conditional Density Estimation 2017

Constrained Policy Optimization 2017

Near-Optimal Design of Experiments via Regret Minimization 2017

Input Convex Neural Networks 2017

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation 2017