A Light Touch for Heavily Constrained SGD

Andrew Cotter; Maya Gupta; Jan Pfeifer

COLT 2016

Abstract

Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, or classifiers that guarantee a certain class label for some subset of examples. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but it requires a projection onto the feasible region after each update. For heavily constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a compelling trade-off between per-iteration work and the number of iterations needed on problems with a large number of constraints.
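To make the cost trade-off concrete, the sketch below shows one simple way of touching constraints only occasionally: after each stochastic gradient step, a single linear constraint is sampled at random and the iterate is projected onto that one half-space if it is violated. This is a generic random-constraint-projection heuristic, not the algorithm proposed in the paper, whose details are not given in the abstract. The function names, the squared-loss objective, the step-size schedule, and the toy monotonicity constraints are all assumptions made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def project_halfspace(w, a, b):
    """Euclidean projection of w onto the half-space {x : a.x <= b}."""
    violation = dot(a, w) - b
    if violation <= 0.0:
        return list(w)  # already feasible for this constraint
    scale = violation / dot(a, a)
    return [wi - scale * ai for wi, ai in zip(w, a)]


def sampled_constraint_sgd(examples, constraints, dim, steps=2000, lr=0.5, seed=0):
    """SGD on squared loss; each step enforces ONE randomly sampled constraint.

    Illustrative heuristic only (hypothetical helper, not the paper's method).
    examples:    list of (x, y) pairs, x a list of floats
    constraints: list of (a, b) pairs encoding a.w <= b
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for t in range(1, steps + 1):
        # Stochastic gradient step on 0.5 * (w.x - y)^2 for one sampled example.
        x, y = rng.choice(examples)
        residual = dot(w, x) - y
        step = lr / t ** 0.5
        w = [wi - step * residual * xi for wi, xi in zip(w, x)]
        # Cheap alternative to a full projection: check one sampled constraint.
        a, b = rng.choice(constraints)
        w = project_halfspace(w, a, b)
    return w


# Toy problem: fit three values subject to monotonicity w[0] <= w[1] <= w[2].
examples = [([1.0, 0.0, 0.0], 2.0), ([0.0, 1.0, 0.0], 1.0), ([0.0, 0.0, 1.0], 3.0)]
constraints = [([1.0, -1.0, 0.0], 0.0), ([0.0, 1.0, -1.0], 0.0)]
w = sampled_constraint_sgd(examples, constraints, dim=3)
max_violation = max(dot(a, w) - b for a, b in constraints)
print(w, max_violation)
```

Each iteration here costs one gradient evaluation plus one constraint check, regardless of how many constraints exist. The price is that the iterate is only approximately feasible at any given step, which is why the final violation is reported alongside the weights.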

Authors

Andrew Cotter, Maya Gupta, Jan Pfeifer

Topics

Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

stochastic gradient descent, constrained optimization, empirical risk minimization, projected gradient descent
