AAAI 2026

Native Speech Processing with LLMs

Abstract

Recent advances in Large Language Models (LLMs) have led to state-of-the-art performance in Automatic Speech Recognition (ASR), surpassing ASR-only systems such as Whisper. However, their application to other speech processing tasks, particularly speaker diarisation (SD), remains underexplored. This work proposes extending existing speech-aware LLM architectures with diarisation-specific training and context-based prompting to enable joint transcription and segmentation of multi-speaker audio. By exploiting the semantic reasoning and multilingual capabilities of pretrained LLMs, the proposed approach aims to improve diarisation accuracy, thereby enhancing accessibility in assistive technologies and real-time captioning applications that rely on accurate speaker-aware transcriptions.


Authors