Text-to-Speech

835 directly classified papers

Papers

Learning from Scarcity: Building and Benchmarking Speech Technology for Sukuma EACL 2026

Eyaa-Tom 26, Yodi-Mantissa and Lom Bench: A Community Benchmark for TTS in Local Languages EACL 2026

WenetSpeech-Yue: A Large-Scale Cantonese Speech Corpus with Multi-dimensional Annotation AAAI 2026

BanglaIPA: Towards Robust Text-to-IPA Transcription with Contextual Rewriting in Bengali EACL 2026

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech AAAI 2026

Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots ACL 2025

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training ACL 2025

Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis ACL 2025

O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion EMNLP 2025

UniCoM: A Universal Code-Switching Speech Generator EMNLP 2025

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching ACL 2025

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation ACL 2025

Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai ACL 2025

LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis ACL 2025

Intoner: For Chinese Poetry Intoning Synthesis IJCAI 2025

Continuous Speech Tokenizer in Text To Speech NAACL 2025

RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding ACL 2025

BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting NAACL 2025

Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru) NAACL 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance EMNLP 2025

Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis EMNLP 2025

BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS EMNLP 2025

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles AAAI 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment ACL 2025

Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches IJCNLP 2025