Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
NIPS 2024
Vision-Language Models are Strong Noisy Label Detectors
NIPS 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
NIPS 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
NIPS 2024
BendVLM: Test-Time Debiasing of Vision-Language Embeddings
NIPS 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
NIPS 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
NIPS 2024
GraphVis: Boosting LLMs with Visual Knowledge Graph Integration
NIPS 2024
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping
NIPS 2024
SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors
NIPS 2024
SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation
NIPS 2024
Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation
NIPS 2024
What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction
NIPS 2024
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
NIPS 2024
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
NIPS 2024
DevBench: A multimodal developmental benchmark for language learning
NIPS 2024
One-to-Normal: Anomaly Personalization for Few-shot Anomaly Detection
NIPS 2024
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
NIPS 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
NIPS 2024
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM
NIPS 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
NIPS 2024
Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models
NIPS 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
NIPS 2024
What matters when building vision-language models?
NIPS 2024