Differentiable Dynamic Programming for Structured Prediction and Attention

Arthur Mensch; Mathieu Blondel

2018 ICML ICML 2018

Differentiable Dynamic Programming for Structured Prediction and Attention

Abstract

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.

🐣 Hot Topic Early Bird — attention mechanism

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

Authors

Arthur Mensch , Mathieu Blondel

Topics

Machine Learning > Core Methods > Classification Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Neural Network Optimization Natural Language Processing > Applications > Named Entity Recognition Machine Learning > Core Methods > Structured Prediction

Keywords

structured prediction attention mechanism dynamic programming viterbi algorithm differentiable programming graphical model

Download PDF

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018