WACV 2025

AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements

Abstract

In the rapidly evolving landscape of digital marketing, effective customer engagement through advertisements is crucial for brands. Thus, computational understanding of ads is pivotal for recommendation, authoring, and customer behaviour simulation. Despite advancements in knowledge-guided visual question answering (VQA) models, existing frameworks often lack domain-specific responses and suffer from a dearth of benchmark datasets for advertisements. To address this gap, we introduce ADVQA, the first dataset for ad-related VQA, sourced from Facebook and X (Twitter), which facilitates further research in ad comprehension. It comprises open-ended questions and detailed context obtained automatically from web articles. Moreover, we present AdQuestA, a novel multimodal framework for knowledge-guided open-ended question answering tailored to advertisements. AdQuestA leverages Retrieval-Augmented Generation (RAG) to obtain question-aware ad context as explicit knowledge and image-grounded implicit knowledge, effectively exploiting inherent relationships for reasoning. Extensive experiments corroborate its efficacy, yielding state-of-the-art performance on the ADVQA dataset and even surpassing 10x larger models such as GPT-4 on this task. Our framework not only enhances understanding of ad content but also advances the broader landscape of knowledge-guided VQA models.
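
To make the idea of question-aware context retrieval more concrete, the following is a minimal sketch and not the AdQuestA implementation. It assumes the ad context is a list of text passages taken from web articles, ranks them against the question with a simple bag-of-words cosine similarity (a stand-in for whatever retriever the framework actually uses), and combines the top passages with an image caption that here plays the role of image-grounded knowledge. All function names, the prompt layout, and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_context(question, passages, top_k=2):
    """Return the top_k passages most similar to the question (hypothetical helper)."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages, image_caption, top_k=2):
    """Assemble retrieved context, image description, and question into one prompt.

    The layout is an assumption made for illustration only.
    """
    context = retrieve_context(question, passages, top_k)
    lines = ["Context:"]
    lines += [f"- {p}" for p in context]
    lines.append(f"Image: {image_caption}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


# Made-up passages standing in for automatically collected web-article context.
passages = [
    "The brand launched a summer sale on running shoes.",
    "The company was founded in 1990.",
    "Quarterly revenue rose in Europe.",
]
question = "What product is on sale in the ad?"
prompt = build_prompt(question, passages, "A runner on a beach at sunrise.", top_k=1)
print(prompt)
```

In this sketch the retrieved passages serve as explicit knowledge and the caption as a proxy for implicit, image-grounded knowledge; a real system would replace the lexical scorer with a learned retriever and pass the assembled prompt to a multimodal model.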
