Research Explorer
Visual Question Answering
70 directly classified papers
Papers per year
2015: 1
2016: 2
2017: 1
2018: 1
2019: 9
2020: 11
2021: 9
2022: 7
2023: 5
2024: 12
2025: 12
Papers
Few-shot Personalized Scanpath Prediction
CVPR 2025
Target Scanpath-Guided 360-Degree Image Enhancement
AAAI 2025
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
ACL 2025
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
ACL 2025
Momentum Pseudo-Labeling for Weakly Supervised Phrase Grounding
AAAI 2025
EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics
AAAI 2025
Analyzing the Sensitivity of Vision Language Models in Visual Question Answering
ACL 2025
NLKI: A Lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks
EMNLP 2025
ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs’ Capability via Chart Editing
ACL 2025
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
EMNLP 2025
Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint
EMNLP 2025
Multi-Granular Multimodal Clue Fusion for Meme Understanding
AAAI 2025
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
AAAI 2024
Plot Twist: Multimodal Models Don’t Comprehend Simple Chart Details
EMNLP 2024
Exploiting the Social-Like Prior in Transformer for Visual Reasoning
AAAI 2024
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
EMNLP 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
NeurIPS 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
EMNLP 2024
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
EMNLP 2024
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
NeurIPS 2024
Towards Artwork Explanation in Large-scale Vision Language Models
ACL 2024
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
CVPR 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
EMNLP 2024
ReMI: A Dataset for Reasoning with Multiple Images
NeurIPS 2024
ECHo: A Visio-Linguistic Dataset for Event Causality Inference via Human-Centric Reasoning
EMNLP 2023