Length-Aware NMT and Adaptive Duration for Automatic Dubbing

Zhiqiang Rao; Hengchao Shang; Jinlong Yang; Daimeng Wei; Zongyao Li; Jiaxin Guo; Shaojun Li; Zhengzhe Yu; Zhanglin Wu; Yuhao Xie; Bin Wei; Jiawei Zheng; Lizhi Lei; Hao Yang

ACL 2023

Abstract

This paper presents the submission of Huawei Translation Services Center for the IWSLT 2023 dubbing task in the unconstrained setting. The proposed solution consists of a Transformer-based machine translation model and a phoneme duration predictor. The translation model is a deep Transformer, and multiple target-to-source length-ratio class labels are used to control the length of its output. The variation predictor in FastSpeech2 is used to predict phoneme durations. To optimize isochrony in dubbing, two further steps are applied: re-ranking and scaling. In re-ranking, the source audio duration serves as a reference for comparing the translations produced under different length-ratio labels, and the translation with the minimum time deviation is preferred. In scaling, the predicted phoneme durations are scaled within a defined threshold to narrow the remaining duration gap with the source audio.
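The re-ranking and scaling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` container, the function names, the label strings, and the `max_change` threshold value of 0.1 are all hypothetical, and the phoneme durations are assumed to be already predicted, in seconds, for each candidate translation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One translation decoded under a given length-ratio label (hypothetical container)."""
    label: str                      # length-ratio class label used at decoding time
    text: str                       # translated text
    phoneme_durations: List[float]  # predicted duration of each phoneme, in seconds


def rerank_by_duration(candidates: Sequence[Candidate], source_duration: float) -> Candidate:
    """Return the candidate whose total predicted duration is closest to the source audio."""
    return min(
        candidates,
        key=lambda c: abs(sum(c.phoneme_durations) - source_duration),
    )


def scale_durations(
    durations: Sequence[float],
    source_duration: float,
    max_change: float = 0.1,  # assumed threshold; the abstract does not state a value
) -> List[float]:
    """Scale durations toward the source duration, changing them by at most max_change."""
    total = sum(durations)
    if total <= 0:
        return list(durations)
    factor = source_duration / total
    # Clamp the scaling factor so speech is never stretched or compressed too far.
    factor = max(1.0 - max_change, min(1.0 + max_change, factor))
    return [d * factor for d in durations]


candidates = [
    Candidate("short", "translation A", [0.5, 0.5]),
    Candidate("normal", "translation B", [0.6, 0.6, 0.6]),
    Candidate("long", "translation C", [1.0, 1.0, 1.0]),
]
best = rerank_by_duration(candidates, source_duration=2.0)
scaled = scale_durations(best.phoneme_durations, source_duration=2.0)
```

In this toy case the candidate decoded with the hypothetical "normal" label is selected, because its total duration (1.8 s) deviates least from the 2.0 s source. The ideal scaling factor would be about 1.11, but the clamp limits it to 1.1, so a small gap to the source duration remains rather than distorting the speech rate beyond the threshold.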

Topics

Natural Language Processing > Applications > Machine Translation
Speech & Audio > Synthesis > Speech Synthesis

Keywords

machine translation, speech synthesis, neural machine translation, automatic dubbing, phoneme duration
