Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback

Khanh Nguyen; Hal Daume III; Jordan Boyd-Graber

EMNLP 2017

Abstract

Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.
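As a rough illustration of the advantage actor-critic idea under bandit feedback, the sketch below trains a toy softmax policy over three candidate outputs. It is not the paper's implementation: the paper's actor is an attention-based encoder-decoder, while here the "actor" is a plain list of logits, the "critic" is a single scalar baseline, and the ratings in `TOY_RATINGS` are made-up stand-ins for simulated human feedback. All function names, learning rates, and values are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical ratings for three candidate outputs (illustrative values only).
TOY_RATINGS = [0.1, 0.9, 0.3]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_bandit_step(logits, baseline, reward_fn, rng,
                    lr_actor=0.1, lr_critic=0.1):
    """One bandit update: sample an action, observe one rating, update.

    Updates `logits` in place and returns the new baseline.
    """
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(action)      # only the sampled action is ever rated
    advantage = reward - baseline   # subtract the critic's estimate

    # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
    for i in range(len(logits)):
        grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr_actor * advantage * grad_log_prob

    # Critic: move the baseline toward the observed reward.
    return baseline + lr_critic * (reward - baseline)


def train_toy(seed=0, steps=5000):
    """Run the toy loop and return the final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    baseline = 0.0
    for _ in range(steps):
        baseline = a2c_bandit_step(
            logits, baseline, lambda a: TOY_RATINGS[a], rng)
    return softmax(logits)


if __name__ == "__main__":
    print(train_toy())
```

The learner never sees the ratings of actions it did not sample, which is what makes the setting a bandit problem. Subtracting the baseline leaves the expected gradient unchanged but reduces its variance, which matters when feedback is noisy.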

Topics

Reinforcement Learning > Methods > Deep RL
Natural Language Processing > Generation > Machine Translation
Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, machine translation, neural machine translation, human feedback, actor-critic algorithm, bandit learning, encoder-decoder architecture, neural encoder-decoder, advantage actor-critic

Related papers

Reinforced Video Captioning with Entailment Rewards 2017

Cross-lingual Character-Level Neural Morphological Tagging 2017

Inter-Weighted Alignment Network for Sentence Pair Modeling 2017

Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings 2017

An Empirical Analysis of Edit Importance between Document Versions 2017