Adversarially Improving NMT Robustness to ASR Errors with Confusion Sets

Shuaibo Wang; Yufeng Chen; Songming Zhang; Deyi Xiong; Jinan Xu

AACL 2022

Abstract

Neural machine translation (NMT) models are known to be fragile to noisy inputs from automatic speech recognition (ASR) systems. Existing methods are usually tailored to robustness against homophone errors alone, yet these account for only a small portion of realistic ASR errors. In this paper, we propose an adversarial example generation method for adversarial training of NMT models. The method is based on confusion sets, which contain the words that ASR systems easily confuse with a given target word. Specifically, adversarial examples are generated by drawing on acoustic relations between words, rather than by the traditional uniform or unigram sampling from the confusion sets. Experiments on several test sets with hand-crafted and real-world noise show that our method is more effective than previous methods. Moreover, our approach also improves performance on the clean test set.
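The abstract contrasts uniform and unigram sampling from a confusion set with generation guided by acoustic relations. The sketch below illustrates only that sampling contrast, under loudly stated assumptions: the toy confusion set, the ARPAbet-style phoneme strings, the helper names (`candidate_weights`, `perturb`), the `temperature` parameter, and the use of phoneme edit distance with exponential weighting as a stand-in for "acoustic relations" are all hypothetical choices, not the authors' method. In particular, it does not reproduce the adversarial selection used in the paper.

```python
import math
import random

# HYPOTHETICAL toy data: phoneme strings and a confusion set chosen for illustration only.
PHONES = {
    "ship": "SH IH P",
    "sheep": "SH IY P",
    "chip": "CH IH P",
    "shop": "SH AA P",
    "trip": "T R IH P",
}
CONFUSION_SET = {"ship": ["sheep", "chip", "shop", "trip"]}


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (here, phoneme lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            # deletion, insertion, substitution (cost 0 if tokens match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def candidate_weights(word, strategy, unigram_counts=None, temperature=1.0):
    """Return (candidates, normalized probabilities) for replacing `word`.

    strategy: "uniform", "unigram" (corpus frequency), or "acoustic"
    (assumed proxy: exp(-phoneme edit distance / temperature)).
    """
    cands = CONFUSION_SET[word]
    if strategy == "uniform":
        scores = [1.0] * len(cands)
    elif strategy == "unigram":
        scores = [float(unigram_counts[c]) for c in cands]
    elif strategy == "acoustic":
        src = PHONES[word].split()
        scores = [
            math.exp(-edit_distance(src, PHONES[c].split()) / temperature)
            for c in cands
        ]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(scores)
    return cands, [s / total for s in scores]


def perturb(tokens, strategy, rate=0.3, rng=None, unigram_counts=None):
    """Replace each word that has a confusion set with probability `rate`."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        if tok in CONFUSION_SET and rng.random() < rate:
            cands, probs = candidate_weights(tok, strategy, unigram_counts)
            tok = rng.choices(cands, weights=probs, k=1)[0]
        out.append(tok)
    return out
```

With the "acoustic" strategy, "sheep", "chip", and "shop" (one phoneme away from "ship") receive equal and higher probability than "trip" (two edits away), whereas uniform sampling treats all four alike and unigram sampling favors whichever word is most frequent regardless of how it sounds.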

Keywords

neural machine translation, automatic speech recognition, adversarial training, confusion set, robustness improvement
