AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements

Neha Choudhary; Poonam Goyal; Devashish Siwatch; Atharva Chandak; Harsh Mahajan; Varun Khurana; Yaman Kumar

WACV 2025

Abstract

In the rapidly evolving landscape of digital marketing, effective customer engagement through advertisements is crucial for brands. Computational understanding of ads is therefore pivotal for recommendation, authoring, and customer behaviour simulation. Despite advances in knowledge-guided visual question answering (VQA), existing frameworks often lack domain-specific responses, and benchmark datasets for advertisements are scarce. To address this gap, we introduce ADVQA, the first dataset for ad-related VQA, sourced from Facebook and X (formerly Twitter), which facilitates further research in ad comprehension. It comprises open-ended questions and detailed context obtained automatically from web articles. We also present AdQuestA, a novel multimodal framework for knowledge-guided open-ended question answering tailored to advertisements. AdQuestA uses Retrieval-Augmented Generation (RAG) to obtain question-aware ad context as explicit knowledge, combines it with image-grounded implicit knowledge, and exploits the inherent relationships between the two for reasoning. Extensive experiments corroborate its efficacy: AdQuestA achieves state-of-the-art performance on the ADVQA dataset, surpassing even 10× larger models such as GPT-4 on this task. Our framework not only enhances understanding of ad content but also advances the broader landscape of knowledge-guided VQA models.
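
The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the AdQuestA implementation: it swaps in a plain bag-of-words cosine ranking for whatever retriever the authors use, and every name in it (`retrieve_context`, `build_prompt`, the sample passages, the caption standing in for image-grounded implicit knowledge) is a hypothetical chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_context(question, passages, k=1):
    """Return the k passages most similar to the question (toy retriever)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, explicit_context, implicit_knowledge):
    """Assemble a text prompt from explicit and implicit knowledge (hypothetical layout)."""
    return (
        "Ad context: " + " ".join(explicit_context) + "\n"
        "Image description: " + implicit_knowledge + "\n"
        "Question: " + question + "\n"
        "Answer:"
    )


# Hypothetical web-article passages about one advertisement.
passages = [
    "The campaign promotes a reusable water bottle made from recycled steel.",
    "The brand was founded in 1998 and sells outdoor gear.",
    "Shipping is free on orders above fifty dollars.",
]
question = "What material is the water bottle made from?"

# Hypothetical caption standing in for image-grounded implicit knowledge.
caption = "A silver bottle on a mountain trail with the slogan 'Refill, not landfill'."

top = retrieve_context(question, passages, k=1)
prompt = build_prompt(question, top, caption)
print(prompt)
```

The resulting prompt would be passed to a generator model; a real system would likely replace the bag-of-words ranking with learned dense embeddings and the caption with features from a vision-language model.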

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Natural Language Processing > Applications > Question Answering

Keywords

retrieval-augmented generation, visual question answering, multimodal learning
