Text-to-Speech

835 directly classified papers

Papers

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing CVPR 2025

BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS EMNLP 2025

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation ICCV 2025

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control ACL 2025

Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai ACL 2025

Intoner: For Chinese Poetry Intoning Synthesis IJCAI 2025

LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis ACL 2025

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis AAAI 2025

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training ACL 2025

RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding ACL 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment ACL 2025

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching ACL 2025

A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity CoNLL 2025

Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru) NAACL 2025

TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis ACL 2025

YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric ACL 2025

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? ACL 2024

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes INTERSPEECH 2024

Towards Zero-Shot Text-To-Speech for Arabic Dialects ACL 2024

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding AAAI 2024

MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis AAAI 2024

FVTTS : Face Based Voice Synthesis for Text-to-Speech INTERSPEECH 2024

An Attribute Interpolation Method in Speech Synthesis by Model Merging INTERSPEECH 2024

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems INTERSPEECH 2024

An inclusive approach to creating a palette of synthetic voices for gender diversity INTERSPEECH 2024