Speech Enhancement

793 directly classified papers

Papers

Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network INTERSPEECH 2024

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model INTERSPEECH 2024

PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement INTERSPEECH 2024

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion INTERSPEECH 2024

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition INTERSPEECH 2024

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation INTERSPEECH 2024

Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction IJCAI 2024

Personalized Speech Enhancement Without a Separate Speaker Embedding Model INTERSPEECH 2024

Centroid Estimation with Transformer-Based Speaker Embedder for Robust Target Speaker Extraction INTERSPEECH 2024

Lightweight Dynamic Sparse Transformer for Monaural Speech Enhancement INTERSPEECH 2024

Real-Time Gaze-directed speech enhancement for audio-visual hearing-aids INTERSPEECH 2024

All Neural Low-latency Directional Speech Extraction INTERSPEECH 2024

Joint prediction of subjective listening effort and speech intelligibility based on end-to-end learning INTERSPEECH 2024

Novel-view Acoustic Synthesis From 3D Reconstructed Rooms INTERSPEECH 2024

RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement INTERSPEECH 2024

FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching INTERSPEECH 2024

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS INTERSPEECH 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification ACL 2024

Textless Speech-to-Speech Translation With Limited Parallel Data EMNLP 2024

Does the Lombard Effect Matter in Speech Separation? Introducing the Lombard-GRID-2mix Dataset INTERSPEECH 2024

Hear Your Face: Face-based voice conversion with F0 estimation INTERSPEECH 2024

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation ACL 2024

SBAAM! Eliminating Transcript Dependency in Automatic Subtitling ACL 2024

Biophysically-inspired single-channel speech enhancement in the time domain INTERSPEECH 2023

CMU’s IWSLT 2023 Simultaneous Speech Translation System ACL 2023