Text-to-Speech

835 directly classified papers

Papers

Lightweight Zero-shot Text-to-Speech with Mixture of Adapters INTERSPEECH 2024

STORiCo: Storytelling TTS for Hindi with Character Voice Modulation EACL 2024

Neural Codec Language Models for Disentangled and Textless Voice Conversion INTERSPEECH 2024

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models INTERSPEECH 2024

Text-aware and Context-aware Expressive Audiobook Speech Synthesis INTERSPEECH 2024

MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance INTERSPEECH 2024

Positional Description for Numerical Normalization INTERSPEECH 2024

Multi-modal Adversarial Training for Zero-Shot Voice Cloning INTERSPEECH 2024

MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech EMNLP 2024

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text CVPR 2024

FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis INTERSPEECH 2024

Enabling Conversational Speech Synthesis using Noisy Spontaneous Data INTERSPEECH 2024

Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice INTERSPEECH 2024

ConnecTone: a modular AAC system prototype with contextual generative text prediction and style-adaptive conversational TTS INTERSPEECH 2024

FVTTS : Face Based Voice Synthesis for Text-to-Speech INTERSPEECH 2024

Assessing the impact of contextual framing on subjective TTS quality INTERSPEECH 2024

GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech INTERSPEECH 2024

MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis AAAI 2024

GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion EMNLP 2024

MunTTS: A Text-to-Speech System for Mundari EACL 2024

H4C-TTS: Leveraging Multi-Modal Historical Context for Conversational Text-to-Speech INTERSPEECH 2024

Open-Source Conversational AI with SpeechBrain 1.0 JMLR 2024

Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations ACL 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners ACL 2024

Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings INTERSPEECH 2024