EACL 2026

Task-Level Instructions Induction for Audio Question Answering from Few Examples

Abstract

Large audio-language models (LALMs) benefit from Chain-of-Thought (CoT) prompting for audio question answering (AQA), but acquiring audio CoT examples is particularly challenging because it requires sequential listening and careful integration of acoustic and linguistic information. Surprisingly, our experiments reveal that standard few-shot prompting yields inconsistent results compared to zero-shot CoT, with several models showing degraded accuracy. Moreover, few-shot prompting incurs substantially higher inference costs because it processes multiple audio demonstrations for every query. We propose Audio-Induct, which induces reusable textual task instructions from a few audio examples once per task and requires no additional demonstrations at inference. Evaluated on nine LALMs across two benchmarks, Audio-Induct outperforms state-of-the-art prompting methods while maintaining low inference costs. The induced task instructions transfer effectively across models, enabling scalable deployment.
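
The inference-cost argument can be illustrated with a rough sketch. It is not the paper's implementation. It only counts how many audio clips a model would have to process under each prompting style, assuming that few-shot prompting re-sends every demonstration with each query and that the induction step listens to each demonstration exactly once. The function names, the prompt template, and the numbers in the usage note are all hypothetical.

```python
def few_shot_audio_inputs(num_queries: int, num_demos: int) -> int:
    """Audio clips processed when every query carries all demonstrations.

    Each query sends its own clip plus num_demos demonstration clips.
    """
    return num_queries * (num_demos + 1)


def induced_audio_inputs(num_queries: int, num_demos: int) -> int:
    """Audio clips processed when instructions are induced once per task.

    Assumption: the induction step hears each demonstration once; afterwards
    each query sends only its own clip plus the textual instructions.
    """
    return num_demos + num_queries


def build_induced_prompt(task_instructions: str, question: str) -> str:
    """Hypothetical text prompt: induced instructions, then the question."""
    return f"{task_instructions}\n\nQuestion: {question}\nAnswer:"
```

Under these assumptions, 1,000 queries with 4 demonstrations would mean 5,000 clips for few-shot prompting and 1,004 clips for the induce-once approach. The gap grows linearly with the number of queries.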
