Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
ACL 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
ACL 2025
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
ACL 2025
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
CVPR 2025
Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
ACL 2025
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
ACL 2025
Aligning VLM Assistants with Personalized Situated Cognition
ACL 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
ACL 2025
Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding
ACL 2025
VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
ACL 2025
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
ACL 2025
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
ACL 2025
BQA: Body Language Question Answering Dataset for Video Large Language Models
ACL 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
ACL 2025
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
ACL 2025
Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
ACL 2025
GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models
ACL 2025
Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning
ACL 2025
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
ACL 2025
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
WACV 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
ACL 2025
Rethinking High-speed Image Reconstruction Framework with Spike Camera
AAAI 2025
Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
ACL 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025
Making LVLMs Look Twice: Contrastive Decoding with Contrast Images
ACL 2025