Speech Analysis

998 directly classified papers

Papers

Capturing Intra-Dialectal Variation in Qatari Arabic: A Corpus of Cultural and Gender Dimensions EMNLP 2025

Summarizing Speech: A Comprehensive Survey EMNLP 2025

BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS EMNLP 2025

StandUp4AI: A New Multilingual Dataset for Humor Detection in Stand-up Comedy Videos EMNLP 2025

Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders EMNLP 2025

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs EMNLP 2025

MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions EMNLP 2025

Towards Language-Agnostic STIPA: Universal Phonetic Transcription to Support Language Documentation at Scale EMNLP 2025

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations EMNLP 2025

InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model EMNLP 2025

Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models EMNLP 2025

Findings of the IWSLT 2025 Evaluation Campaign ACL 2025

SPACER: A Parallel Dataset of Speech Production And Comprehension of Error Repairs NAACL 2025

Generative Annotation for ASR Named Entity Correction EMNLP 2025

Towards a Real-time Swedish Speech Analyzer for Language Learning Games: A Hybrid AI Approach to Language Assessment ACL 2025

English-based acoustic models perform well in the forced alignment of two English-based Pacific Creoles ACL 2025

VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models ACL 2025

Phonotomizer: A Compact, Unsupervised, Online Training Approach to Real-Time, Multilingual Phonetic Segmentation ACL 2025

STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation ACL 2025

Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction ACL 2025

Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution ACL 2025

Eta-WavLM: Efficient Speaker Identity Removal in Self-Supervised Speech Representations Using a Simple Linear Equation ACL 2025

Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025 ACL 2025

Automatic Phone Alignment of Code-switched Urum–Russian Field Data ACL 2025

Supervising Sound Localization by In-the-wild Egomotion CVPR 2025