2016 AUTOML AutoML 2016

TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning

Abstract

As data science becomes more mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (autoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT, an open source genetic programming-based autoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task. We benchmark TPOT on a series of 150 supervised classification tasks and find that it significantly outperforms a basic machine learning analysis in 22 of them, while experiencing minimal degradation in accuracy on 5 of the benchmarks—all without any domain knowledge nor human input. As such, GP-based autoML systems show considerable promise in the autoML domain.

🚀 Conference Pioneer — AUTOML 2016
🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
📈 Trend Setter — Meta-Learning
🧭 Keyword Pioneer — pipeline optimization
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio