Research Explorer
Visual Question Answering
219 directly classified papers
Papers per year
2016: 1
2017: 6
2018: 13
2019: 26
2020: 22
2021: 23
2022: 20
2023: 20
2024: 37
2025: 49
2026: 2
Papers
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
CVPR 2024
Rethinking Two-Stage Referring Expression Comprehension: A Novel Grounding and Segmentation Method Modulated by Point
AAAI 2024
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
CVPR 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CVPR 2024
Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering
AAAI 2024
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems
CVPR 2024
A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
NIPS 2024
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
ACL 2024
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
ACL 2024
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
CVPR 2024
Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA
EMNLP 2024
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
EMNLP 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
NIPS 2024
Connecting Vision and Language With Video Localized Narratives
CVPR 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
AAAI 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
AAAI 2023
Referring Expression Comprehension Using Language Adaptive Inference
AAAI 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamic Audio-Visual Scenarios
EMNLP 2023
Variational Causal Inference Network for Explanatory Visual Question Answering
ICCV 2023
GRES: Generalized Referring Expression Segmentation
CVPR 2023
Visual Programming: Compositional Visual Reasoning Without Training
CVPR 2023
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions
EMNLP 2023
You Can Ground Earlier Than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos
CVPR 2023
How To Practice VQA on a Resource-Limited Target Domain
WACV 2023
Large Language Models are Visual Reasoning Coordinators
NIPS 2023