Research Explorer
Speech Synthesis
164 directly classified papers
Papers per year
2007: 1
2012: 2
2013: 1
2016: 1
2017: 5
2018: 3
2019: 10
2020: 14
2021: 7
2022: 23
2023: 24
2024: 28
2025: 45
Papers
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
ACL 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
ACL 2025
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
ACL 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
ACL 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
AAAI 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
AAAI 2025
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
ACL 2025
In-the-wild Audio Spatialization with Flexible Text-guided Localization
ACL 2025
FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation
ACL 2025
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
AAAI 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
AAAI 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
ACL 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
ICCV 2025
DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech
AAAI 2025
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
AAAI 2025
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder
AAAI 2025
Cauchy Diffusion: A Heavy-tailed Denoising Diffusion Probabilistic Model for Speech Synthesis
AAAI 2025
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
AAAI 2025
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
AAAI 2025
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
AAAI 2025
Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
AAAI 2025
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
AAAI 2025
DNASpeech: A Contextualized and Situated Text-to-Speech Dataset with Dialogues, Narratives and Actions
ACL 2025