Gradient Boosted Decision Trees for High Dimensional Sparse Output

Si Si; Huan Zhang; S. Sathiya Keerthi; Dhruv Mahajan; Inderjit S. Dhillon; Cho-jui Hsieh

ICML 2017

Abstract

In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output is an $L$-dimensional 0/1 vector, where $L$ is the number of labels, which can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or take prohibitively long to run in this regime, and we propose a new GBDT variant, GBDT-SPARSE, that resolves this problem by employing $L_0$ regularization. We then discuss in detail how to exploit this sparsity in GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sublinear time. Finally, we apply our algorithm to extreme multilabel classification problems and show that the proposed GBDT-SPARSE achieves an order-of-magnitude improvement in model size and prediction time over existing methods, while yielding similar performance.
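As a rough illustration of how an $L_0$ constraint can keep leaf predictions sparse, the sketch below assumes a squared loss on the residuals that fall into one leaf and a budget of at most `k` nonzero entries per leaf vector. Under those assumptions, the best `k`-sparse leaf value keeps the `k` largest-magnitude coordinates of the mean residual and zeroes the rest. The function name `sparse_leaf_value` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def sparse_leaf_value(residuals, k):
    """Return a k-sparse vector w minimizing sum_i ||r_i - w||^2.

    Assumption (not from the abstract): squared loss over the residual
    rows r_i assigned to one leaf, with at most k nonzeros allowed in w.
    """
    mean = residuals.mean(axis=0)          # unconstrained minimizer
    if k >= mean.size:
        return mean
    # Dropping coordinate j costs n * mean[j]**2, so keep the k
    # coordinates with the largest absolute mean.
    keep = np.argpartition(np.abs(mean), -k)[-k:]
    w = np.zeros_like(mean)
    w[keep] = mean[keep]
    return w

# Toy residuals: 3 samples in the leaf, 5 labels.
residuals = np.array([[1.0, 0.0, 0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.2, 0.0, 0.0],
                      [1.0, 0.0, 0.1, 1.0, 0.0]])
w = sparse_leaf_value(residuals, k=2)
```

With leaf vectors stored this way, each leaf holds at most `k` values rather than `L`, which is the kind of saving in model size that the abstract refers to.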

Topics

Machine Learning > Core Methods > Classification

Machine Learning > Optimization & Theory > Optimization

Machine Learning > Learning Types > Multi-Label Classification

Machine Learning > Optimization & Theory > Sparse Optimization

Keywords

model compression, multi-label classification, gradient boosting, sparse optimization, atomic norm, decision tree, l0 regularization, sparse output
