Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation

Elizabeth Salesky; Matthias Sperber; Alan W Black

ACL 2019

Abstract

Previous work on end-to-end translation from speech has primarily used frame-level features as speech representations, which yields longer, sparser sequences than text. We show that a naive method for creating compressed, phoneme-like speech representations is far more effective and efficient for translation than traditional frame-level speech features. Specifically, we generate phoneme labels for speech frames and average consecutive frames with the same label to create shorter, higher-level source sequences for translation. We see improvements of up to 5 BLEU on both our high- and low-resource language pairs, with a 60% reduction in training time. Our improvements hold across multiple data sizes and two language pairs.
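
The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes that one phoneme label per frame is already available (the labeling system itself is not shown), and the function name `average_by_phoneme`, the toy feature values, and the label strings are hypothetical choices for illustration, not the authors' implementation.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def average_by_phoneme(
    frames: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> Tuple[List[List[float]], List[str]]:
    """Collapse each run of consecutive same-label frames into its mean vector."""
    if len(frames) != len(labels):
        raise ValueError("need exactly one label per frame")
    merged: List[List[float]] = []
    merged_labels: List[str] = []
    index = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        segment = frames[index:index + run_length]
        # Column-wise mean over the frames in this run.
        merged.append([sum(col) / run_length for col in zip(*segment)])
        merged_labels.append(label)
        index += run_length
    return merged, merged_labels


# Toy input (assumed values): six 2-dimensional frames with made-up labels.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [0.0, 0.0], [2.0, 2.0]]
labels = ["AH", "AH", "B", "B", "B", "AH"]
reduced, reduced_labels = average_by_phoneme(frames, labels)
```

In this toy case six frames reduce to three vectors, one per run of identical labels. Only adjacent frames are merged, so the two separate "AH" runs stay distinct and the temporal order of the sequence is preserved.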

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Learning Types > Representation Learning
Speech & Audio > Processing > Speech Enhancement
Speech & Audio > Recognition > Speech Translation

Keywords

phoneme recognition; sequence-to-sequence learning; end-to-end learning; speech translation; speech representation; end-to-end speech; speech feature; phoneme representation; frame-level feature
