Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI 2025
CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex
AAAI 2025
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
AAAI 2025
Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
AAAI 2025
Position-Aware Guided Point Cloud Completion with CLIP Model
AAAI 2025
Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
AAAI 2025
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
AAAI 2025
Enhance Vision-Language Alignment with Noise
AAAI 2025
MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models
AAAI 2025
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
AAAI 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
AAAI 2025
Explanation Bottleneck Models
AAAI 2025
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
AAAI 2025
Exploring the Better Multimodal Synergy Strategy for Vision-Language Models
AAAI 2025
BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning
AAAI 2025
A-VL: Adaptive Attention for Large Vision-Language Models
AAAI 2025
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow
AAAI 2025
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
AAAI 2025
Multi-View Empowered Structural Graph Wordification for Language Models
AAAI 2025
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
AAAI 2025
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
AAAI 2025
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
AAAI 2025
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
AAAI 2025
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
AAAI 2025
BQA: Body Language Question Answering Dataset for Video Large Language Models
ACL 2025