Neural Speech Completion

Kazuki Tsunematsu; Johanes Effendi; Sakriani Sakti; Satoshi Nakamura

INTERSPEECH 2020

Abstract

During a conversation, humans often predict the end of a sentence before the other person has finished it. In contrast, most current automatic speech recognition systems are limited to passively recognizing what is being said. However, applications such as voice search, simultaneous speech translation, and spoken language communication may require a system that not only recognizes what has been said but also predicts what will be said. This paper proposes a speech completion system based on deep learning and discusses its construction in text-to-text, speech-to-text, and speech-to-speech frameworks. We evaluate our system on domain-specific sentences with synthesized speech utterances that are only 25%, 50%, or 75% complete. Our proposed systems provide more natural suggestions than the Bidirectional Encoder Representations from Transformers (BERT) language representation model.

Topics

Deep Learning > Architectures > Transformers
Speech & Audio > Recognition > Automatic Speech Recognition
Speech & Audio > Synthesis > Speech Enhancement
Deep Learning > Learning Types > Representation Learning

Keywords

automatic speech recognition, deep learning, speech completion, speech prediction, language prediction
