Speech Recognition

1480 directly classified papers

Papers

QUESPA Submission for the IWSLT 2024 Dialectal and Low-resource Speech Translation Task ACL 2024

JHU IWSLT 2024 Dialectal and Low-resource System Description ACL 2024

Contextual Biasing with Confidence-based Homophone Detector for Mandarin End-to-End Speech Recognition INTERSPEECH 2024

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation INTERSPEECH 2024

Serialized Output Training by Learned Dominance INTERSPEECH 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer INTERSPEECH 2024

RepTor: Re-parameterizable Temporal Convolution for Keyword Spotting via Differentiable Kernel Search INTERSPEECH 2024

Phonological-Level Mispronunciation Detection and Diagnosis INTERSPEECH 2024

Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis INTERSPEECH 2024

Positional Description for Numerical Normalization INTERSPEECH 2024

Modality Translation Learning for Joint Speech-Text Model INTERSPEECH 2024

Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU INTERSPEECH 2024

Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification COLING 2024

Fine-Tuning a Pre-Trained Wav2Vec2 Model for Automatic Speech Recognition- Experiments with De Zahrar Sproche COLING 2024

DECM: Evaluating Bilingual ASR Performance on a Code-switching/mixing Benchmark COLING 2024

Gos 2: A New Reference Corpus of Spoken Slovenian COLING 2024

Lessons from Deploying the First Bilingual Peruvian Sign Language - Spanish Online Dictionary COLING 2024

Correcting Pronoun Homophones with Subtle Semantics in Chinese Speech Recognition COLING 2024

Do VSR Models Generalize Beyond LRS3? WACV 2024

Becoming a High-Resource Language in Speech: The Catalan Case in the Common Voice Corpus COLING 2024

Open-Source Conversational AI with SpeechBrain 1.0 JMLR 2024

Constructing Korean Learners’ L2 Speech Corpus of Seven Languages for Automatic Pronunciation Assessment COLING 2024

SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus COLING 2024

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks NeurIPS 2024

Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading NeurIPS 2024