ACL 2023
HW-TSC at IWSLT2023: Break the Quality Ceiling of Offline Track via Pre-Training and Domain Adaptation
Abstract
This paper presents HW-TSC’s submissions to the IWSLT 2023 Offline Speech Translation task, which covers speech translation of talks from English into German, Chinese, and Japanese. We participate in all three conditions (constrained training, constrained training with large language models, and unconstrained training) using cascaded architectures. We improve ASR quality through data enhancement, pre-trained models, and other techniques, and we improve translation quality through R-Drop, deep models, domain data selection, and related methods. Compared with last year’s best results, we achieve an improvement of 2.1 BLEU on the MuST-C English-German test set.
Authors
Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Xie YuHao, Guo JiaXin, Daimeng Wei, Hengchao Shang, Wang Minghan, Xiaoyu Chen, Zhengzhe Yu, Li ShaoJun, Lei LiZhi, Hao Yang
Topics
Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Data Augmentation
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Machine Translation
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Deep Learning