Speech Processing

181 directly classified papers

Papers

LLM-driven Multimodal and Multi-Identity Listening Head Generation CVPR 2025

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs EMNLP 2025

Towards Reliable Large Audio Language Model ACL 2025

Context-Aware Lexical Stress Prediction and Phonemization for Ukrainian TTS Systems ACL 2025

LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis ACL 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance EMNLP 2025

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback ACL 2025

Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai ACL 2025

Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution ACL 2025

Mind the Gap: Static and Interactive Evaluations of Large Audio Models ACL 2025

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training ACL 2025

SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models AAAI 2025

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering AAAI 2025

DNASpeech: A Contextualized and Situated Text-to-Speech Dataset with Dialogues, Narratives and Actions ACL 2025

OLMD: Orientation-aware Long-term Motion Decoupling for Continuous Sign Language Recognition AAAI 2025

Phonotomizer: A Compact, Unsupervised, Online Training Approach to Real-Time, Multilingual Phonetic Segmentation ACL 2025

Can LLMs Understand Unvoiced Speech? Exploring EMG-to-Text Conversion with LLMs ACL 2025

Zero-Shot Text-to-Speech for Vietnamese ACL 2025

Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models ACL 2025

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training ACL 2025

MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading AAAI 2025

Distilling an End-to-End Voice Assistant Without Instruction Training Data ACL 2025

MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines ACL 2025

Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning ACL 2025

Proactive Hearing Assistants that Isolate Egocentric Conversations EMNLP 2025