Reinforced Video Captioning with Entailment Rewards

Ramakanth Pasunuru; Mohit Bansal

EMNLP 2017

Abstract

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic metrics and human evaluation on multiple datasets. Next, we propose a novel entailment-enhanced reward (CIDEnt) that corrects phrase-matching-based metrics (such as CIDEr) to allow only logically implied partial matches and to avoid contradictions, achieving further significant improvements over the CIDEr-reward model. Overall, our CIDEnt-reward model achieves a new state of the art on the MSR-VTT dataset.
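The abstract names three ingredients: a sentence-level reward optimized with policy gradient, a mixed loss that combines this with cross-entropy, and an entailment correction to CIDEr. A minimal sketch of how these pieces could fit together is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the thresholded form of the penalty, and every default value (`threshold`, `penalty`, `gamma`) are hypothetical choices, and the CIDEr score and entailment probability are assumed to come from external scorers.

```python
from typing import Sequence


def cident_reward(cider: float, entail_prob: float,
                  threshold: float = 0.33, penalty: float = 0.45) -> float:
    """Entailment-corrected reward (assumed thresholded form).

    If the entailment scorer judges that the generated caption is not
    implied by the reference (probability below `threshold`), subtract a
    fixed `penalty` from the CIDEr score. Both defaults are illustrative.
    """
    if entail_prob < threshold:
        return cider - penalty
    return cider


def reinforce_loss(log_probs: Sequence[float], reward: float,
                   baseline: float = 0.0) -> float:
    """REINFORCE-style loss for one sampled caption.

    `log_probs` holds the log-probability of each sampled word. The loss
    is the negative baseline-adjusted reward times the sequence
    log-likelihood, so minimizing it raises the likelihood of captions
    that score above the baseline.
    """
    return -(reward - baseline) * sum(log_probs)


def mixed_loss(xe_loss: float, rl_loss: float, gamma: float = 0.5) -> float:
    """Convex combination of cross-entropy and policy-gradient losses."""
    return (1.0 - gamma) * xe_loss + gamma * rl_loss


# Usage with made-up numbers: a caption with high phrase overlap but low
# entailment probability receives a reduced reward.
reward = cident_reward(cider=1.0, entail_prob=0.1)
rl = reinforce_loss(log_probs=[-1.0, -2.0], reward=reward, baseline=0.5)
total = mixed_loss(xe_loss=2.0, rl_loss=rl)
```

In an actual training loop the log-probabilities would be differentiable tensors from the captioning model and the reward would be treated as a constant; plain floats are used here only to keep the sketch self-contained.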

Authors

Ramakanth Pasunuru, Mohit Bansal

Topics

Computer Vision > Generation > Video Generation

Natural Language Processing > Generation > Text Generation

Reinforcement Learning > Methods > Deep RL

Deep Learning > Learning Types > Reinforcement Learning

Artificial Intelligence > Core AI > Reinforcement Learning

Artificial Intelligence > Core AI > Natural Language Generation

Keywords

reinforcement learning, policy gradient, video captioning, natural language generation, sequence-to-sequence model, sentence-level metric
