Research Explorer
Text-to-Speech
835 directly classified papers
Papers per year
2010: 1
2016: 49
2017: 44
2018: 50
2019: 59
2020: 95
2021: 90
2022: 138
2023: 126
2024: 117
2025: 61
2026: 5
Papers
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
CVPR 2025
BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS
EMNLP 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
ICCV 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai
ACL 2025
Intoner: For Chinese Poetry Intoning Synthesis
IJCAI 2025
LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
ACL 2025
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
AAAI 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
ACL 2025
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
ACL 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
ACL 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
CONLL 2025
Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru)
NAACL 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
ACL 2025
YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric
ACL 2025
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
ACL 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
INTERSPEECH 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
ACL 2024
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
AAAI 2024
MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis
AAAI 2024
FVTTS : Face Based Voice Synthesis for Text-to-Speech
INTERSPEECH 2024
An Attribute Interpolation Method in Speech Synthesis by Model Merging
INTERSPEECH 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
INTERSPEECH 2024
An inclusive approach to creating a palette of synthetic voices for gender diversity
INTERSPEECH 2024