Speech Recognition

1480 directly classified papers

Papers

Fotheidil: an Automatic Transcription System for the Irish Language COLING 2025

Phonotomizer: A Compact, Unsupervised, Online Training Approach to Real-Time, Multilingual Phonetic Segmentation ACL 2025

Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces IJCNLP 2025

Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI COLING 2025

kNN For Whisper And Its Effect On Bias And Speaker Adaptation NAACL 2025

Wenzhou Dialect Speech to Mandarin Text Conversion NAACL 2025

The Role of Prosody in Spoken Question Answering NAACL 2025

Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment NAACL 2025

Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information ICCV 2025

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations ICCV 2025

FFSTC 2: Extending the Fongbe to French Speech Translation Corpus ACL 2025

JU-CSE-NLP’s Cascaded Speech to Text Translation Systems for IWSLT 2025 in Indic Track ACL 2025

GenPTQ: Green Post-Training Quantization for Large-Scale ASR Models with Mixed-Precision Bit Allocation EMNLP 2025

MetaMixSpeech: Meta Task Augmentation for Low-Resource Speech Recognition EMNLP 2025

ASR Under Noise: Exploring Robustness for Sundanese and Javanese EMNLP 2025

Generative Annotation for ASR Named Entity Correction EMNLP 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models ACL 2025

Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks ACL 2025

DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation ACL 2025

LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis ACL 2025

InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model ACL 2025

Slamming: Training a Speech Language Model on One GPU in a Day ACL 2025

Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions ACL 2025

YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric ACL 2025

Visual-Aware Speech Recognition for Noisy Scenarios EMNLP 2025