Text-to-Speech

835 directly classified papers

Papers

Read, Watch and Scream! Sound Generation from Text and Video AAAI 2025

Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai ACL 2025

BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting NAACL 2025

Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches IJCNLP 2025

Impacts of Vocoder Selection on Tacotron-based Nepali Text-To-Speech Synthesis COLING 2025

Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis EMNLP 2025

UniCoM: A Universal Code-Switching Speech Generator EMNLP 2025

BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS EMNLP 2025

Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru) NAACL 2025

RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding ACL 2025

Intoner: For Chinese Poetry Intoning Synthesis IJCAI 2025

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles AAAI 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment ACL 2025

LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis ACL 2025

Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots ACL 2025

TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis ACL 2025

YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric ACL 2025

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model EMNLP 2025

PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech EMNLP 2025

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics CVPR 2025

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations CVPR 2025

Gender Bias in Instruction-Guided Speech Synthesis Models NAACL 2025

DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility NAACL 2025

BridgeVoC: Neural Vocoder with Schrödinger Bridge IJCAI 2025

A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity CoNLL 2025