INTERSPEECH 2024

RAST: A Reference-Audio Synchronization Tool for Dubbed Content

Abstract

In the film industry, audio-video synchronization issues are considered major quality defects and key drivers of viewer disengagement. This is especially true for dubbed content, which is more prone to these errors due to the added manual process of replacing the original speech with a translated version. Despite their potential benefit for dubbed media production, automatic sync detection methods are seldom explored. In this paper, we propose a Transformer-based Siamese network for dubbed audio synchronization detection. Using a large dataset of dubbed entertainment content, we demonstrate that, compared to previous methods, our approach is more robust in detecting the misalignment introduced by translated speech segments. In addition to addressing the previously studied constant synchronization errors, our model is the first to handle the frequent issue of intermittent offsets.
