Research Explorer
Speech Synthesis
164 directly classified papers
Papers per year
2007: 1
2012: 2
2013: 1
2016: 1
2017: 5
2018: 3
2019: 10
2020: 14
2021: 7
2022: 23
2023: 24
2024: 28
2025: 45
Papers
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
AAAI 2025
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
ACL 2025
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
ACL 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
ACL 2025
Cauchy Diffusion: A Heavy-tailed Denoising Diffusion Probabilistic Model for Speech Synthesis
AAAI 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
ICCV 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
ACL 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
ACL 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
ACL 2025
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
ACL 2025
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
AAAI 2025
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
ACL 2025
Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
AAAI 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
CVPR 2025
DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech
AAAI 2025
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
AAAI 2025
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder
AAAI 2025
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
ACL 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
ACL 2024
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
ACL 2024
Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation
INTERSPEECH 2024
On the Semantic Latent Space of Diffusion-Based Text-To-Speech Models
ACL 2024