Imagination Improves Multimodal Translation

Desmond Elliott; Ákos Kádár

IJCNLP 2017

Abstract

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
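
The multitask setup described in the abstract can be illustrated with a toy objective. The sketch below is not the authors' implementation. The abstract does not specify the loss functions, so the cross-entropy translation loss, the margin-based image prediction loss over cosine similarity, the interpolation parameter `weight`, and all function names are assumptions chosen only for illustration.

```python
import math


def cross_entropy(probs, target_index):
    """Translation loss for one decoding step (assumed form):
    negative log-probability of the reference token."""
    return -math.log(probs[target_index])


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def image_prediction_loss(predicted, true_image, contrast_images, margin=0.1):
    """Image prediction loss (assumed hinge form): the predicted vector
    should be more similar to the true image vector than to each
    contrast image vector by at least `margin`."""
    positive = cosine(predicted, true_image)
    return sum(
        max(0.0, margin - positive + cosine(predicted, contrast))
        for contrast in contrast_images
    )


def multitask_loss(translation_loss, image_loss, weight=0.5):
    """Interpolate the two task losses; `weight` is a hypothetical
    hyperparameter, not a value taken from the paper."""
    return weight * translation_loss + (1.0 - weight) * image_loss


# Toy usage: one decoding step and one image prediction.
step_loss = cross_entropy([0.7, 0.2, 0.1], target_index=0)
grounding_loss = image_prediction_loss(
    predicted=[0.9, 0.1],
    true_image=[1.0, 0.0],
    contrast_images=[[0.0, 1.0]],
)
total = multitask_loss(step_loss, grounding_loss, weight=0.5)
```

Because the two losses only meet in the final interpolation, the image prediction task can in principle be trained on data that has no translations, which is consistent with the abstract's report of training it on the external MS COCO dataset.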

Authors

Desmond Elliott, Ákos Kádár

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Applications > Machine Translation

Keywords

attention mechanism, multitask learning, image representation, multimodal translation, grounded representation

Related papers

Procedural Text Generation from an Execution Video 2017

DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset 2017

Roles and Success in Wikipedia Talk Pages: Identifying Latent Patterns of Behavior 2017

PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts 2017

Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task 2017