Speech Enhancement

107 directly classified papers

Papers

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction AAAI 2024

Continual Contrastive Spoken Language Understanding ACL 2024

Unsupervised Discrete Representations of American Sign Language EMNLP 2024

CLASP: Cross-modal Alignment Using Pre-trained Unimodal Models ACL 2024

ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers EMNLP 2024

AV-RIR: Audio-Visual Room Impulse Response Estimation CVPR 2024

OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation EMNLP 2024

Learning Spatially-Aware Language and Audio Embeddings NeurIPS 2024

Acoustic Volume Rendering for Neural Impulse Response Fields NeurIPS 2024

Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation NeurIPS 2024

Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems ACL 2024

VivesDebate-Speech: A Corpus of Spoken Argumentation to Leverage Audio Features for Argument Mining EMNLP 2023

Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement AAAI 2023

PGSS: Pitch-Guided Speech Separation AAAI 2023

DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages EMNLP 2023

Toward Joint Language Modeling for Speech Units and Text EMNLP 2023

Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video AAAI 2023

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling ACL 2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization ACL 2023

Improving Grammatical Error Correction with Multimodal Feature Integration ACL 2023

Speech-to-Speech Translation for a Real-world Unwritten Language ACL 2023

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units ACL 2023

Weighted Von Mises Distribution-based Loss Function for Real-time STFT Phase Reconstruction Using DNN INTERSPEECH 2023

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units EMNLP 2023

Non-parallel Accent Transfer based on Fine-grained Controllable Accent Modelling EMNLP 2023