Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators

Kai Zhong; Ian En-Hsu Yen; Inderjit S Dhillon; Pradeep K Ravikumar

2014 NIPS NeurIPS 2014

Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators

Abstract

We consider the class of optimization problems arising from computationally intensive L1-regularized M-estimators, where the function or gradient values are very expensive to compute. A particular instance of interest is the L1-regularized MLE for learning Conditional Random Fields (CRFs), which are a popular class of statistical models for varied structured prediction problems such as sequence labeling, alignment, and classification with label taxonomy. L1-regularized MLEs for CRFs are particularly expensive to optimize since computing the gradient values requires an expensive inference step. In this work, we propose the use of a carefully constructed proximal quasi-Newton algorithm for such computationally intensive M-estimation problems, where we employ an aggressive active set selection technique. In a key contribution of the paper, we show that our proximal quasi-Newton algorithm is provably super-linearly convergent, even in the absence of strong convexity, by leveraging a restricted variant of strong convexity. In our experiments, the proposed algorithm converges considerably faster than current state-of-the-art on the problems of sequence labeling and hierarchical classification.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🧭 Keyword Pioneer — active set selection

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Kai Zhong , Ian En-Hsu Yen , Inderjit S Dhillon , Pradeep K Ravikumar

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Optimization Machine Learning > Core Methods > Sparse Optimization Deep Learning > Optimization & Theory > Efficient Computing

Keywords

structured prediction l1 regularization quasi-newton method proximal algorithm conditional random field active set selection

Download PDF

Related papers

Information-based learning by agents in unbounded state spaces 2014

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm 2014

Partition-wise Linear Models 2014

Active Regression by Stratification 2014

Cone-Constrained Principal Component Analysis 2014