Research Explorer
Question Answering
81 directly classified papers
Papers per year
2015: 1
2016: 2
2017: 3
2018: 2
2019: 12
2020: 5
2021: 6
2022: 11
2023: 12
2024: 16
2025: 11
Papers
Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
ACL 2025
Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
EMNLP 2025
Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution
CVPR 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering
ACL 2025
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
ACL 2025
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
ACL 2025
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
EMNLP 2025
Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models
EMNLP 2025
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
ACL 2025
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
ACL 2025
ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts
EMNLP 2025
Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning
AAAI 2024
Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering
AAAI 2024
Cross-Modal Feature Distribution Calibration for Few-Shot Visual Question Answering
AAAI 2024
MISTI: Metadata-Informed Scientific Text and Image Representation through Contrastive Learning
ACL 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
CVPR 2024
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
EMNLP 2024
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
CVPR 2024
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
ACL 2024
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
EMNLP 2024
Learning Musical Representations for Music Performance Question Answering
EMNLP 2024
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
CVPR 2024
GRAM: Global Reasoning for Multi-Page VQA
CVPR 2024
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
ACL 2024
DeVAn: Dense Video Annotation for Video-Language Models
ACL 2024