Vision-Language Models

685 directly classified papers

Papers

MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models AAAI 2025

A-VL: Adaptive Attention for Large Vision-Language Models AAAI 2025

CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex AAAI 2025

Enhance Vision-Language Alignment with Noise AAAI 2025

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination AAAI 2025

BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning AAAI 2025

Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection AAAI 2025

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis AAAI 2025

Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP AAAI 2025

Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning AAAI 2025

PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures AAAI 2025

Explanation Bottleneck Models AAAI 2025

DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models WACV 2025

Exploring the Better Multimodal Synergy Strategy for Vision-Language Models AAAI 2025

Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation CVPR 2025

Rethinking High-speed Image Reconstruction Framework with Spike Camera AAAI 2025

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection AAAI 2025

LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba AAAI 2025

Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents CVPR 2025

Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision AAAI 2025

Position-Aware Guided Point Cloud Completion with CLIP Model AAAI 2025

Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking AAAI 2025

Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion WACV 2025

KPL: Training-Free Medical Knowledge Mining of Vision-Language Models AAAI 2025

Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow AAAI 2025