Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
CVPR 2025
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
AAAI 2025
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025
Exploring the Better Multimodal Synergy Strategy for Vision-Language Models
AAAI 2025
Grounded, or a Good Guesser? A Per-Question Balanced Dataset to Separate Blind from Grounded Models for Embodied Question Answering
ACL 2025
BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning
AAAI 2025
Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data
ACL 2025
A-VL: Adaptive Attention for Large Vision-Language Models
AAAI 2025
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models
CVPR 2025
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
ACL 2025
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
ACL 2025
Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
ACL 2025
Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding
ACL 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
ACL 2025
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
ACL 2025
VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
ACL 2025
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
ACL 2025
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
ACL 2025
Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
ACL 2025
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
ACL 2025
Aligning VLM Assistants with Personalized Situated Cognition
ACL 2025
CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
ACL 2025
Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
ACL 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
ACL 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
CVPR 2025