Visual Question Answering
219 directly classified papers
Papers per year
2016: 1
2017: 6
2018: 13
2019: 26
2020: 22
2021: 23
2022: 20
2023: 20
2024: 37
2025: 49
2026: 2
Papers
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
COLING 2025
TaiwanVQA: A Benchmark for Visual Question Answering for Taiwanese Daily Life
COLING 2025
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
COLING 2025
OVQA: A Dataset for Visual Question Answering and Multimodal Research in Odia Language
COLING 2025
Visual Question Answering for Peruvian Cuisine in Regional Spanish
AAAI 2025
Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension?
ACL 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
CVPR 2025
InsightEdit: Towards Better Instruction Following for Image Editing
CVPR 2025
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
Visual Robustness Benchmark for Visual Question Answering (VQA)
WACV 2025
MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving
CVPR 2025
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference
EMNLP 2025
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
NAACL 2025
MSR2: A Benchmark for Multi-Source Retrieval and Reasoning in Visual Question Answering
NAACL 2025
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
CVPR 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
CVPR 2025
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
CVPR 2025
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
NAACL 2025
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
CVPR 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
ALLVB: All-in-One Long Video Understanding Benchmark
AAAI 2025
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
ICCV 2025
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation
ICCV 2025
Acknowledging Focus Ambiguity in Visual Questions
ICCV 2025
ChartLens: Fine-grained Visual Attribution in Charts
ACL 2025