Text-to-Speech

835 directly classified papers

Papers

Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis INTERSPEECH 2024

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations AAAI 2024

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes INTERSPEECH 2024

FVTTS: Face Based Voice Synthesis for Text-to-Speech INTERSPEECH 2024

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model INTERSPEECH 2024

Word-level Text Markup for Prosody Control in Speech Synthesis INTERSPEECH 2024

MunTTS: A Text-to-Speech System for Mundari EACL 2024

Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling INTERSPEECH 2024

Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration INTERSPEECH 2023

FACTSpeech: Speaking a Foreign Language Pronunciation Using Only Your Native Characters INTERSPEECH 2023

Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis INTERSPEECH 2023

ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation INTERSPEECH 2023

Learning To Dub Movies via Hierarchical Prosody Models CVPR 2023

RAD-MMM: Multilingual Multiaccented Multispeaker Text To Speech INTERSPEECH 2023

Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech INTERSPEECH 2023

Prosody-controllable Gender-ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception INTERSPEECH 2023

Towards Robust FastSpeech 2 by Modelling Residual Multimodality INTERSPEECH 2023

Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations INTERSPEECH 2023

Can Better Perception Become a Disadvantage? Synthetic Speech Perception in Congenitally Blind Users INTERSPEECH 2023

VC-T: Streaming Voice Conversion Based on Neural Transducer INTERSPEECH 2023

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting NeurIPS 2023

MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors INTERSPEECH 2023

Non-parallel Accent Transfer based on Fine-grained Controllable Accent Modelling EMNLP 2023

Evaluating and reducing the distance between synthetic and real speech distributions INTERSPEECH 2023

Cross-lingual Prosody Transfer for Expressive Machine Dubbing INTERSPEECH 2023