Towards Simultaneous Machine Interpretation

Alejandro Pérez-González-de-Martos; Javier Iranzo-Sánchez; Adrià Giménez Pastor; Javier Jorge; Joan-Albert Silvestre-Cerdà; Jorge Civera; Albert Sanchis; Alfons Juan

INTERSPEECH 2021

Abstract

Automatic speech-to-speech translation (S2S) is one of the most challenging speech and language processing tasks, especially in real-time settings. Recent advances in streaming Automatic Speech Recognition (ASR), simultaneous Machine Translation (MT) and incremental neural Text-To-Speech (TTS) make it possible to develop real-time cascade S2S systems with greatly improved accuracy. As a step towards simultaneous machine interpretation, we describe a state-of-the-art cascade streaming S2S system and assess it empirically on the simultaneous interpretation of European Parliament debates. We pay particular attention to the TTS component, evaluating speech naturalness under a variety of response-time settings and speaker similarity for its cross-lingual voice cloning capabilities.
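As a rough illustration of the cascade structure named in the abstract (streaming ASR feeding simultaneous MT feeding incremental TTS), the following is a minimal Python sketch, not the authors' system. All function names, the word-by-word lexicon lookup, and the `words_per_segment` parameter are hypothetical stand-ins: real components would be neural models, and `words_per_segment` only loosely mimics a response-time setting.

```python
from typing import Dict, Iterable, Iterator, List


def streaming_asr(audio_chunks: Iterable[str]) -> Iterator[str]:
    """Placeholder ASR (assumption): each 'audio chunk' is already a source word."""
    for chunk in audio_chunks:
        yield chunk


def simultaneous_mt(source_words: Iterable[str], lexicon: Dict[str, str]) -> Iterator[str]:
    """Placeholder MT (assumption): translate each word as soon as it arrives."""
    for word in source_words:
        yield lexicon.get(word, word)


def incremental_tts(target_words: Iterable[str], words_per_segment: int = 2) -> Iterator[str]:
    """Placeholder TTS (assumption): emit one 'audio' segment every few words."""
    buffer: List[str] = []
    for word in target_words:
        buffer.append(word)
        if len(buffer) == words_per_segment:
            yield "<audio:" + " ".join(buffer) + ">"
            buffer = []
    if buffer:  # flush whatever is left when the input stream ends
        yield "<audio:" + " ".join(buffer) + ">"


def cascade_s2s(audio_chunks: Iterable[str], lexicon: Dict[str, str],
                words_per_segment: int = 2) -> Iterator[str]:
    """Chain the three stages lazily, so output starts before the input ends."""
    recognized = streaming_asr(audio_chunks)
    translated = simultaneous_mt(recognized, lexicon)
    return incremental_tts(translated, words_per_segment)


if __name__ == "__main__":
    toy_lexicon = {"buenos": "good", "días": "morning", "a": "to", "todos": "everyone"}
    for segment in cascade_s2s(["buenos", "días", "a", "todos"], toy_lexicon):
        print(segment)
```

Because every stage is a generator, the first output segment is produced after only `words_per_segment` input chunks have been consumed; a smaller value lowers latency at the cost of giving the synthesis stage less context per segment.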

Topics

Natural Language Processing > Applications > Machine Translation; Speech & Audio > Recognition > Automatic Speech Recognition; Speech & Audio > Recognition > Speech Recognition; Speech & Audio > Synthesis > Text-to-Speech; Natural Language Processing > Generation > Machine Translation

Keywords

machine translation; automatic speech recognition; text-to-speech synthesis; speech-to-speech translation; simultaneous interpretation; simultaneous machine translation; streaming translation; voice cloning
