Speech Recognition

1480 directly classified papers

Papers

YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric ACL 2025

SparQLe: Speech Queries to Text Translation Through LLMs ACL 2025

VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering ACL 2025

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task EMNLP 2025

MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation EMNLP 2025

JU-CSE-NLP’s Cascaded Speech to Text Translation Systems for IWSLT 2025 in Indic Track ACL 2025

Indonesian Speech Content De-Identification in Low Resource Transcripts COLING 2025

Fotheidil: an Automatic Transcription System for the Irish Language COLING 2025

kNN For Whisper And Its Effect On Bias And Speaker Adaptation NAACL 2025

Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment NAACL 2025

Preserving Comorian Linguistic Heritage: Bidirectional Transliteration Between the Latin Alphabet and the Kamar-Eddine System NAACL 2025

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations ICCV 2025

Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information ICCV 2025

Quantum-Infused Whisper: A Framework for Replacing Classical Components IJCNLP 2025

GMU Systems for the IWSLT 2025 Low-Resource Speech Translation Shared Task ACL 2025

Visual-Aware Speech Recognition for Noisy Scenarios EMNLP 2025

Generative Annotation for ASR Named Entity Correction EMNLP 2025

GenPTQ: Green Post-Training Quantization for Large-Scale ASR Models with Mixed-Precision Bit Allocation EMNLP 2025

ASR Under Noise: Exploring Robustness for Sundanese and Javanese EMNLP 2025

2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset ACL 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models ACL 2025

It’s Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text Systems ACL 2025

DEEP: an automatic bidirectional translator leveraging an ASR for translation of Italian sign language ACL 2025

InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model ACL 2025

Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI COLING 2025