PROSE: A Pronoun Omission Solution for Chinese-English Spoken Language Translation

Ke Wang; Xiutian Zhao; Yanghui Li; Wei Peng

EMNLP 2023

Abstract

Neural Machine Translation (NMT) systems encounter a significant challenge when translating a pro-drop (‘pronoun-dropping’) language (e.g., Chinese) into a non-pro-drop one (e.g., English), since the pro-drop phenomenon requires NMT systems to recover omitted pronouns. This unique and crucial task, however, lacks sufficient datasets for benchmarking. To bridge this gap, we introduce PROSE, a new benchmark featuring diverse pro-drop instances for document-level Chinese-English spoken language translation. Furthermore, we conduct an in-depth investigation of the pro-drop phenomenon in spoken Chinese on this dataset, reconfirming that pro-drop reduces the performance of NMT systems in Chinese-English translation. To alleviate the negative impact introduced by pro-drop, we propose Mention-Aware Semantic Augmentation, a novel approach that leverages the semantic embedding of dropped pronouns to augment training pairs. Results from experiments on four Chinese-English translation corpora show that our proposed method outperforms existing methods in both omitted pronoun retrieval and overall translation quality.

Topics

Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Understanding > Coreference Resolution
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Generation > Machine Translation
Artificial Intelligence > Core AI > Natural Language Processing
Deep Learning > Models > Language Models

Keywords

neural machine translation, document-level translation, Chinese-English translation, pronoun resolution, spoken language translation, semantic augmentation, pro-drop phenomenon, pronoun recovery, mention-aware augmentation, pronoun retrieval
