Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

Julia Kreutzer; Artem Sokolov; Stefan Riezler

2017 ACL ACL 2017

Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

Abstract

AbstractBandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-based recurrent neural networks. Furthermore, we show how to incorporate control variates into our learning algorithms for variance reduction and improved generalization. We present an evaluation on a neural machine translation task that shows improvements of up to 5.89 BLEU points for domain adaptation from simulated bandit feedback.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — bandit structured prediction

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Julia Kreutzer , Artem Sokolov , Stefan Riezler

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Application Areas > Domain Adaptation Natural Language Processing > Applications > Machine Translation Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Stochastic Methods Machine Learning > Learning Types > Exploration-Exploitation

Keywords

domain adaptation structured prediction attention mechanism neural machine translation partial feedback sequence-to-sequence learning control variate bandit learning bandit structured prediction

Download PDF

Related papers

A* CCG Parsing with a Supertag and Dependency Factored Model 2017

Detecting annotation noise in automatically labelled data 2017

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2017

Annotating tense, mood and voice for English, French and German 2017

Word Embedding for Response-To-Text Assessment of Evidence 2017