A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Mingyang Zhou; Runxiang Cheng; Yong Jae Lee; Zhou Yu

EMNLP 2018

Abstract

We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. Our model jointly optimizes the learning of a shared visual-language embedding and a translator. The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics. Our approach achieves competitive state-of-the-art results on the Multi30K and the Ambiguous COCO datasets. We also collected a new multilingual multimodal product description dataset to simulate a real-world international online shopping scenario. On this dataset, our visual attention grounding model outperforms other methods by a large margin.
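The abstract does not spell out the objective, so the following is only a minimal sketch of how a joint translation-plus-grounding loss of this kind might be wired up. Everything specific here is an assumption made for illustration: the dot-product attention, the cosine similarity, the hinge margin, the mixing weight `alpha`, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def attend_text_with_image(hidden_states, image_vec):
    """Pool per-token encoder states (T, D) into one sentence vector (D,),
    weighting each token by a softmax over its dot product with the image
    vector. Dot-product scoring is an assumption of this sketch."""
    scores = hidden_states @ image_vec
    scores = scores - scores.max()  # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ hidden_states, weights

def cosine(a, b, eps=1e-8):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def joint_loss(translation_nll, sent_vec, image_vec, negative_image_vec,
               margin=0.1, alpha=0.5):
    """Weighted sum of a translation loss and a hinge-style grounding loss.
    The grounding term asks the paired image to be closer to the sentence
    than a mismatched image by at least `margin` (hypothetical values)."""
    pos = cosine(sent_vec, image_vec)
    neg = cosine(sent_vec, negative_image_vec)
    grounding = max(0.0, margin - pos + neg)
    return alpha * translation_nll + (1.0 - alpha) * grounding
```

In this sketch the grounding term is zero whenever the paired image is already more similar to the sentence than the mismatched one by the margin, so only the translation term contributes in that case.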

Topics

Natural Language Processing > Applications > Machine Translation
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

multimodal learning, visual grounding, visual attention, image-text translation, shared embedding, multimodal machine translation
