Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
AAAI 2025
Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
ACL 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI 2025
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
ACL 2025
CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex
AAAI 2025
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
ACL 2025
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
AAAI 2025
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
ACL 2025
Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
AAAI 2025
A Parameter-Efficient and Fine-Grained Prompt Learning for Vision-Language Models
ACL 2025
Position-Aware Guided Point Cloud Completion with CLIP Model
AAAI 2025
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
WACV 2025
Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
AAAI 2025
CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention
ACL 2025
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
AAAI 2025
EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs
ACL 2025
Enhance Vision-Language Alignment with Noise
AAAI 2025
Evaluating Vision-Language Models as Evaluators in Path Planning
CVPR 2025
MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models
AAAI 2025
Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
ACL 2025
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
AAAI 2025
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
CVPR 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
AAAI 2025
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
CVPR 2025
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
CVPR 2025