Research Explorer
Visual Question Answering
106 directly classified papers
Papers per year
2015: 1
2016: 7
2017: 3
2018: 11
2019: 7
2020: 20
2021: 11
2022: 11
2023: 11
2024: 9
2025: 15
Papers
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
CVPR 2025
SimpleDoc: Multi‐Modal Document Understanding with Dual‐Cue Page Retrieval and Iterative Refinement
EMNLP 2025
ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
EMNLP 2025
What You See is What You Ask: Evaluating Audio Descriptions
EMNLP 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
AAAI 2025
Explicitly Guided Difficulty-Controllable Visual Question Generation
AAAI 2025
Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices
EMNLP 2025
CPIQA: Climate Paper Image Question Answering Dataset for Retrieval-Augmented Generation with Context-based Query Expansion
ACL 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
CVPR 2025
MovieCORE: COgnitive REasoning in Movies
EMNLP 2025
Generating Spatial Knowledge Graphs from Automotive Diagrams for Question Answering
EMNLP 2025
Can Multimodal Large Language Models Understand Spatial Relations?
ACL 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions
ACL 2025
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
AAAI 2025
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
EMNLP 2024
Multi-Level Information Retrieval Augmented Generation for Knowledge-based Visual Question Answering
EMNLP 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
NeurIPS 2024
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems
CVPR 2024
ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
EMNLP 2024
The Illusion of Competence: Evaluating the Effect of Explanations on Users’ Mental Models of Visual Question Answering Systems
EMNLP 2024
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models
AAAI 2024
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
COLING 2024
Exploring Question Guidance and Answer Calibration for Visually Grounded Video Question Answering
EMNLP 2024
MPMQA: Multimodal Question Answering on Product Manuals
AAAI 2023