Latent Variable Model for Multi-modal Translation

Iacer Calixto; Miguel Rios; Wilker Aziz

ACL 2019

Abstract

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also to predict image features. Importantly, our model formulation utilises visual and textual inputs during training but does not require that images be available at test time. We show that our latent variable MMT formulation improves considerably over strong baselines, including a multi-task learning approach (Elliott and Kádár, 2017) and a conditional variational auto-encoder approach (Toyama et al., 2016). Finally, we show improvements due to (i) predicting image features in addition to only conditioning on them, (ii) imposing a constraint on the KL term to promote models with non-negligible mutual information between inputs and latent variable, and (iii) training on additional target-language image descriptions (i.e. synthetic data).
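As a rough illustration of point (ii), the sketch below shows one common way to constrain a KL term so that the latent variable is not ignored. The abstract does not give the exact form of the constraint, so everything here is an assumption: a diagonal Gaussian posterior, a standard normal prior, a simple floor on the KL (in nats), and the hypothetical names `gaussian_kl`, `constrained_kl`, `negative_elbo`, and `min_nats`.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def constrained_kl(mu, log_var, min_nats=2.0):
    """Floor the KL at min_nats (assumed form of the constraint).

    Below the floor the term is constant, so the optimiser gains nothing
    by pushing the posterior all the way back to the prior.
    """
    return max(gaussian_kl(mu, log_var), min_nats)


def negative_elbo(translation_nll, image_nll, mu, log_var, min_nats=2.0):
    """Training loss: translation loss + image-feature loss + constrained KL."""
    return translation_nll + image_nll + constrained_kl(mu, log_var, min_nats)
```

For example, `negative_elbo(10.0, 5.0, [0.0], [0.0])` charges the floor of 2 nats even though the posterior equals the prior, whereas a posterior with mean 3 pays its true KL of 4.5 nats. The `image_nll` argument stands in for the image-feature prediction loss mentioned in point (i).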

Keywords

KL divergence, neural machine translation, mutual information, latent variable model, variational autoencoder, multi-modal translation, multimodal embedding, multimodal translation, image feature
