Visual Question Answering (Computer Vision › Applications)
107 directly classified papers
Papers per year
2016: 2
2017: 5
2018: 8
2019: 12
2020: 14
2021: 7
2022: 5
2023: 12
2024: 20
2025: 22
Papers
Selectively Answering Visual Questions
ACL 2024
Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness
EMNLP 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Answering
ACL 2024
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
ACL 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
NeurIPS 2024
DIEM: Decomposition-Integration Enhancing Multimodal Insights
CVPR 2024
A Robust Dual-debiasing VQA Model based on Counterfactual Causal Effect
EMNLP 2024
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
EMNLP 2024
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
EMNLP 2024
CommVQA: Situating Visual Question Answering in Communicative Contexts
EMNLP 2024
Large Language Models Know What is Key Visual Entity: An LLM-assisted Multimodal Retrieval for VQA
EMNLP 2024
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
EMNLP 2024
Object Attribute Matters in Visual Question Answering
AAAI 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
EMNLP 2024
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
AAAI 2024
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
AAAI 2024
SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM
EMNLP 2024
DePlot: One-shot visual language reasoning by plot-to-table translation
ACL 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
AAAI 2023
Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
EMNLP 2023
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
AAAI 2023
Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA
ACL 2023
Analyzing Modular Approaches for Visual Question Decomposition
EMNLP 2023
LAVIS: A One-stop Library for Language-Vision Intelligence
ACL 2023
Unifying Text, Tables, and Images for Multimodal Question Answering
EMNLP 2023