Feature-level Incongruence Reduction for Multimodal Translation

Zhifeng Li; Yu Hong; Yuchen Pan; Jian Tang; Jianmin Yao; Guodong Zhou

2021 NAACL NAACL 2021

Feature-level Incongruence Reduction for Multimodal Translation

Abstract

AbstractCaption translation aims to translate image annotations (captions for short). Recently, Multimodal Neural Machine Translation (MNMT) has been explored as the essential solution. Besides of linguistic features in captions, MNMT allows visual(image) features to be used. The integration of multimodal features reinforces the semantic representation and considerably improves translation performance. However, MNMT suffers from the incongruence between visual and linguistic features. To overcome the problem, we propose to extend MNMT architecture with a harmonization network, which harmonizes multimodal features(linguistic and visual features)by unidirectional modal space conversion. It enables multimodal translation to be carried out in a seemingly monomodal translation pipeline. We experiment on the golden Multi30k-16 and 17. Experimental results show that, compared to the baseline,the proposed method yields the improvements of 2.2% BLEU for the scenario of translating English captions into German (En→De) at best,7.6% for the case of English-to-French translation(En→Fr) and 1.5% for English-to-Czech(En→Cz). The utilization of harmonization network leads to the competitive performance to the-state-of-the-art.

🧭 Keyword Pioneer — feature harmonization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Zhifeng Li , Yu Hong , Yuchen Pan , Jian Tang , Jianmin Yao , Guodong Zhou

Topics

Natural Language Processing > Applications > Machine Translation

Keywords

visual feature feature harmonization multimodal translation multimodal neural machine translation caption translation

Download PDF

Related papers

Knowledge Router: Learning Disentangled Representations for Knowledge Graphs 2021

Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks 2021

Abstract Meaning Representation Guided Graph Encoding and Decoding for Joint Information Extraction 2021

Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing 2021

Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers 2021