Research Explorer
Speech Processing
181 directly classified papers
Papers per year
2015: 1
2016: 10
2017: 12
2018: 6
2019: 15
2020: 16
2021: 19
2022: 23
2023: 20
2024: 24
2025: 35
Papers
LLM-driven Multimodal and Multi-Identity Listening Head Generation
CVPR 2025
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
EMNLP 2025
Towards Reliable Large Audio Language Model
ACL 2025
Context-Aware Lexical Stress Prediction and Phonemization for Ukrainian TTS Systems
ACL 2025
LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
ACL 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
EMNLP 2025
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
ACL 2025
Scaling Under-Resourced TTS: A Data-Optimized Framework with Advanced Acoustic Modeling for Thai
ACL 2025
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
ACL 2025
Mind the Gap: Static and Interactive Evaluations of Large Audio Models
ACL 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
ACL 2025
SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
AAAI 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
AAAI 2025
DNASpeech: A Contextualized and Situated Text-to-Speech Dataset with Dialogues, Narratives and Actions
ACL 2025
OLMD: Orientation-aware Long-term Motion Decoupling for Continuous Sign Language Recognition
AAAI 2025
Phonotomizer: A Compact, Unsupervised, Online Training Approach to Real-Time, Multilingual Phonetic Segmentation
ACL 2025
Can LLMs Understand Unvoiced Speech? Exploring EMG-to-Text Conversion with LLMs
ACL 2025
Zero-Shot Text-to-Speech for Vietnamese
ACL 2025
Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
ACL 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
ACL 2025
MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading
AAAI 2025
Distilling an End-to-End Voice Assistant Without Instruction Training Data
ACL 2025
MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines
ACL 2025
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
ACL 2025
Proactive Hearing Assistants that Isolate Egocentric Conversations
EMNLP 2025