Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow
AAAI 2025
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
WACV 2025
Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
AAAI 2025
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
AAAI 2025
CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
ACL 2025
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI 2025
Exploring the Better Multimodal Synergy Strategy for Vision-Language Models
AAAI 2025
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
AAAI 2025
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
AAAI 2025
Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
AAAI 2025
Position-Aware Guided Point Cloud Completion with CLIP Model
AAAI 2025
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
AAAI 2025
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
CVPR 2025
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
AAAI 2025
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
AAAI 2025
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
ACL 2025
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
AAAI 2025
Explanation Bottleneck Models
AAAI 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
AAAI 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
ACL 2025
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
WACV 2025
Rethinking High-speed Image Reconstruction Framework with Spike Camera
AAAI 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025
Defining and Evaluating Visual Language Models’ Basic Spatial Abilities: A Perspective from Psychometrics
ACL 2025