Speech Recognition

1480 directly classified papers

Papers

Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information ICCV 2025

KIT’s Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization ACL 2025

JU-CSE-NLP’s Cascaded Speech to Text Translation Systems for IWSLT 2025 in Indic Track ACL 2025

Quantum-Infused Whisper: A Framework for Replacing Classical Components IJCNLP 2025

SparQLe: Speech Queries to Text Translation Through LLMs ACL 2025

SpeechEE@XLLM25: Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction ACL 2025

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task EMNLP 2025

CourtNav: Voice-Guided, Anchor-Accurate Navigation of Long Legal Documents in Courtrooms EMNLP 2025

FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones CVPR 2025

Speech-to-Speech Machine Translation for Dialectal Variations of Hindi IJCNLP 2025

VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation COLING 2025

Generative Annotation for ASR Named Entity Correction EMNLP 2025

CA*: Addressing Evaluation Pitfalls in Computation-Aware Latency for Simultaneous Speech Translation NAACL 2025

Visual-Aware Speech Recognition for Noisy Scenarios EMNLP 2025

VALLR: Visual ASR Language Model for Lip Reading ICCV 2025

2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset ACL 2025

Beyond WER: Probing Whisper’s Sub‐token Decoder Across Diverse Language Resource Levels EMNLP 2025

GenPTQ: Green Post-Training Quantization for Large-Scale ASR Models with Mixed-Precision Bit Allocation EMNLP 2025

ASR Under Noise: Exploring Robustness for Sundanese and Javanese EMNLP 2025

UY/CH-CHILD -- A Public Chinese L2 Speech Database of Uyghur Children INTERSPEECH 2024

Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech INTERSPEECH 2024

Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer INTERSPEECH 2024

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition INTERSPEECH 2024

Speech Recognition Models are Strong Lip-readers INTERSPEECH 2024

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting INTERSPEECH 2024