ACL 2025
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
Abstract
This paper presents Instituto de Telecomunicações’s submission to the IWSLT 2025 Shared Task on Instruction Following Speech Processing. We submit results for the Short Track, i.e., speech recognition, translation, and spoken question answering. Our model is a unified speech-to-text model that integrates a pretrained continuous speech encoder and a text decoder through a first phase of modality alignment and a second phase of instruction fine-tuning. Crucially, we focus on small-scale language model backbones (< 2B parameters) and restrict ourselves to high-quality, CC-BY data, using synthetic data generation to supplement existing resources.
Topics
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Pretraining
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Applications > Question Answering
Speech & Audio > Recognition > Speech Recognition
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Models > Multi-Modal Learning
Speech & Audio > Recognition > Speech Translation