In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model

Yanzhi Tian; Xiang Li; Zeming Liu; Yuhang Guo; Bin Wang

EMNLP 2023

Abstract

In-Image Machine Translation (IIMT) aims to convert images containing text from one language to another. Traditional approaches to this task are cascade methods, which apply optical character recognition (OCR) followed by neural machine translation (NMT) and text rendering. However, cascade methods suffer from compounding OCR and NMT errors, which lowers translation quality. In this paper, we propose an end-to-end model that replaces the OCR, NMT, and text rendering pipeline. Our neural architecture adopts the encoder-decoder paradigm, with segmented pixel sequences as inputs and outputs. Through end-to-end training, our model yields improvements along several dimensions: (i) it achieves higher translation quality by avoiding error propagation, (ii) it is robust to out-of-domain data, and (iii) it is insensitive to incomplete words. To validate the effectiveness of our method and support future research, we construct a dataset containing 4M pairs of De-En images and train our end-to-end model on it. The experimental results show that our approach outperforms both the cascade method and the current end-to-end model.
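To make the phrase "segmented pixel sequences" concrete, the following is a minimal sketch of one way an image of a text line could be turned into a sequence and back. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the segmentation scheme, so the fixed segment width, the grayscale input, the white right-padding, and the function names `segment_image` and `merge_segments` are all hypothetical.

```python
import numpy as np


def segment_image(image: np.ndarray, seg_width: int) -> np.ndarray:
    """Split an (H, W) grayscale image into a sequence of flattened segments.

    Assumption: segments are fixed-width vertical slices of the image.
    Returns an array of shape (n_segments, H * seg_width).
    """
    height, width = image.shape
    # Assumption: pad the right edge with white (255) so that the width
    # becomes a multiple of seg_width.
    pad = (-width) % seg_width
    padded = np.pad(image, ((0, 0), (0, pad)), constant_values=255)
    n_seg = padded.shape[1] // seg_width
    # (H, n_seg, seg_width) -> (n_seg, H, seg_width) -> (n_seg, H * seg_width)
    segments = padded.reshape(height, n_seg, seg_width).transpose(1, 0, 2)
    return segments.reshape(n_seg, height * seg_width)


def merge_segments(sequence: np.ndarray, height: int, seg_width: int) -> np.ndarray:
    """Inverse of segment_image: rebuild the (H, n_segments * seg_width) image."""
    n_seg = sequence.shape[0]
    segments = sequence.reshape(n_seg, height, seg_width).transpose(1, 0, 2)
    return segments.reshape(height, n_seg * seg_width)


if __name__ == "__main__":
    # Hypothetical 32 x 100 image standing in for a rendered line of text.
    image = (np.arange(32 * 100).reshape(32, 100) % 256).astype(np.uint8)
    sequence = segment_image(image, seg_width=8)
    print(sequence.shape)  # (13, 256)
    restored = merge_segments(sequence, height=32, seg_width=8)
    print(np.array_equal(restored[:, :100], image))  # True
```

In a setup like this, an encoder-decoder model would consume the source-image sequence and emit a target-image sequence of the same form, which `merge_segments` would then reassemble into an output image; how the model represents and predicts each segment is left open here.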

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Architectures > Neural Networks
Computer Vision > Generation > Image Translation
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Generation > Machine Translation
Computer Vision > Domain-Specific > Document Analysis
Computer Vision > Processing > Image Processing

Keywords

neural machine translation; image translation; end-to-end learning; sequence-to-sequence model; end-to-end model; optical character recognition
