ACL 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Abstract
Existing end-to-end speech large language models (LLMs) usually rely on large-scale annotated data for training, while data-efficient training has not been discussed in depth. We focus on two fundamental problems between speech and text: the representation space gap and sequence length inconsistency. We propose Soundwave, which utilizes an efficient training strategy and a novel architecture to address these issues. Results show that Soundwave outperforms other advanced speech LLMs in speech translation and AIR-Bench speech tasks with only a fraction of the training data. Further analysis shows that Soundwave still retains its intelligence during conversation.
Topics
Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Applications > Machine Translation
Speech & Audio > Recognition > Automatic Speech Recognition
Speech & Audio > Recognition > Speech Recognition
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Generation > Machine Translation
Deep Learning > Learning Types > Transfer Learning