Teacher Forcing Recovers Reward Functions for Text Generation

Yongchang Hao; Yuxin Liu; Lili Mou

NeurIPS 2022

Abstract

Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions are typically task-specific and sparse, restricting the use of RL. In our work, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with teacher forcing. We additionally propose a simple modification to stabilize the RL training on non-parallel datasets with our induced reward function. Empirical results show that our method outperforms self-training and reward regression methods on several text generation tasks, confirming the effectiveness of our reward function.
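The abstract does not state the formula used to derive the step-wise reward. As a rough illustration only, the sketch below assumes that the logits of a teacher-forced model are treated as soft Q-values and that a per-token reward is read off with the standard inverse Bellman relation r(s_t, a_t) = Q(s_t, a_t) - gamma * V(s_{t+1}), with V taken as the log-sum-exp of the next step's logits. The function names, the choice of a soft value function, and the zero value after the final token are assumptions made for this example and may differ from the paper's actual definition.

```python
import math


def logsumexp(scores):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(scores)
    return m + math.log(sum(math.exp(x - m) for x in scores))


def stepwise_rewards(logits, tokens, gamma=1.0):
    """Illustrative per-token rewards from a sequence of logits.

    logits[t] is the list of vocabulary scores at step t (assumed to play
    the role of Q-values); tokens[t] is the id of the token chosen at step t.
    The value after the last token is assumed to be 0.
    """
    rewards = []
    for t, token_id in enumerate(tokens):
        q_value = logits[t][token_id]
        if t + 1 < len(tokens):
            next_value = logsumexp(logits[t + 1])  # soft value of next state
        else:
            next_value = 0.0  # assumption: terminal state has zero value
        rewards.append(q_value - gamma * next_value)
    return rewards


# Toy example: two steps, vocabulary of size two.
toy_logits = [[0.0, 0.0], [1.0, 1.0]]
toy_tokens = [0, 1]
toy_rewards = stepwise_rewards(toy_logits, toy_tokens)
```

Unlike a single sequence-level score, this yields one value per generated token, which is the sense in which such a reward is dense rather than sparse.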


Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Natural Language Processing > Generation > Text Generation
Reinforcement Learning > Methods > Policy Learning
Deep Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

reinforcement learning, text generation, reward function, exposure bias, teacher forcing
