Speech Synthesis

164 directly classified papers

Papers

Word-Conditioned 3D American Sign Language Motion Generation EMNLP 2024

On the Semantic Latent Space of Diffusion-Based Text-To-Speech Models ACL 2024

Aligning Speech Segments Beyond Pure Semantics ACL 2024

FT-GAN: Fine-Grained Tune Modeling for Chinese Opera Synthesis AAAI 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis AAAI 2024

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing ACL 2024

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling AAAI 2024

Audio Generation with Multiple Conditional Diffusion Model AAAI 2024

Knowledge-Preserving Pluggable Modules for Multilingual Speech Translation Tasks INTERSPEECH 2024

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion ACL 2024

SpeechAlign: Aligning Speech Generation to Human Preferences NeurIPS 2024

G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model AAAI 2024

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation AAAI 2024

Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation AAAI 2024

A Two-Step Approach for Data-Efficient French Pronunciation Learning EMNLP 2024

Contextual Interactive Evaluation of TTS Models in Dialogue Systems INTERSPEECH 2024

PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems EMNLP 2024

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation CVPR 2024

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control EMNLP 2024

CTC-based Non-autoregressive Textless Speech-to-Speech Translation ACL 2024

Speechworthy Instruction-tuned Language Models EMNLP 2024

IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS NeurIPS 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild ACL 2024

Learning To Dub Movies via Hierarchical Prosody Models CVPR 2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment ACL 2023