EMNLP 2025
Spoken Conversational Agents with Large Language Models
Abstract
Spoken conversational agents are converging toward voice-native LLMs. This tutorial distills the path from cascaded ASR/NLU to end-to-end, retrieval- and vision-grounded systems. We frame adaptation of text LLMs to audio, cross-modal alignment, and joint speech–text training; review datasets, metrics, and robustness across accents; and compare design choices (cascaded vs. E2E, post-ASR correction, streaming). We link industrial assistants to current open-domain and task-oriented agents, highlight reproducible baselines, and outline open problems in privacy, safety, and evaluation. Attendees leave with practical recipes and a clear systems-level roadmap.
🌉 Interdisciplinary Bridge: Artificial Intelligence, Deep Learning, Natural Language Processing, Speech & Audio
🧭 Keyword Pioneer: spoken conversational agent
🐝 Cross-Pollinator: Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Topics
Artificial Intelligence > Core AI > Multimodal Learning
Natural Language Processing > Generation > Dialogue Systems
Speech & Audio > Recognition > Automatic Speech Recognition
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Dialogue Systems
Deep Learning > Learning Types > Multimodal Learning
Artificial Intelligence > Core AI > Speech Processing
Artificial Intelligence > Core AI > Dialogue Systems