Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models
AAAI 2025
A-VL: Adaptive Attention for Large Vision-Language Models
AAAI 2025
CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex
AAAI 2025
Enhance Vision-Language Alignment with Noise
AAAI 2025
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
AAAI 2025
BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning
AAAI 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI 2025
Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
AAAI 2025
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
AAAI 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
AAAI 2025
Explanation Bottleneck Models
AAAI 2025
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
WACV 2025
Exploring the Better Multimodal Synergy Strategy for Vision-Language Models
AAAI 2025
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025
Rethinking High-speed Image Reconstruction Framework with Spike Camera
AAAI 2025
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
AAAI 2025
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
AAAI 2025
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
CVPR 2025
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
AAAI 2025
Position-Aware Guided Point Cloud Completion with CLIP Model
AAAI 2025
Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
AAAI 2025
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
WACV 2025
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
AAAI 2025
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow
AAAI 2025