Research Explorer
Text-to-Speech
835 directly classified papers
Papers per year
2010: 1
2016: 49
2017: 44
2018: 50
2019: 59
2020: 95
2021: 90
2022: 138
2023: 126
2024: 117
2025: 61
2026: 5
Papers
Read, Watch and Scream! Sound Generation from Text and Video
AAAI 2025
Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai
ACL 2025
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
NAACL 2025
Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches
IJCNLP 2025
Impacts of Vocoder Selection on Tacotron-based Nepali Text-To-Speech Synthesis
COLING 2025
Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis
EMNLP 2025
UniCoM: A Universal Code-Switching Speech Generator
EMNLP 2025
BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS
EMNLP 2025
Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru)
NAACL 2025
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
ACL 2025
Intoner: For Chinese Poetry Intoning Synthesis
IJCAI 2025
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
AAAI 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
ACL 2025
LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
ACL 2025
Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots
ACL 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
ACL 2025
YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric
ACL 2025
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
EMNLP 2025
PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech
EMNLP 2025
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
CVPR 2025
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
CVPR 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
NAACL 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
NAACL 2025
BridgeVoC: Neural Vocoder with Schrödinger Bridge
IJCAI 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
CONLL 2025