Speech Synthesis

164 directly classified papers

Papers

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model AAAI 2025

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling ACL 2025

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction ACL 2025

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback ACL 2025

Cauchy Diffusion: A Heavy-tailed Denoising Diffusion Probabilistic Model for Speech Synthesis AAAI 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models ICCV 2025

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering AAAI 2025

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control ACL 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment ACL 2025

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching ACL 2025

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training ACL 2025

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation ACL 2025

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models ACL 2025

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching AAAI 2025

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback ACL 2025

Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation AAAI 2025

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech CVPR 2025

DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech AAAI 2025

PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis AAAI 2025

CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder AAAI 2025

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion ACL 2024

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation ACL 2024

CTC-based Non-autoregressive Textless Speech-to-Speech Translation ACL 2024

Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation INTERSPEECH 2024

On the Semantic Latent Space of Diffusion-Based Text-To-Speech Models ACL 2024