Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models
NIPS 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
NIPS 2024
DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs
NIPS 2024
Multilingual Diversity Improves Vision-Language Representations
NIPS 2024
CALVIN: Improved Contextual Video Captioning via Instruction Tuning
NIPS 2024
Homology Consistency Constrained Efficient Tuning for Vision-Language Models
NIPS 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
NIPS 2024
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
NIPS 2024
Scaling Language-Image Pre-Training via Masking
CVPR 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining
CVPR 2023
CLIPPING: Distilling CLIP-Based Models With a Student Base for Video-Language Retrieval
CVPR 2023
Local 3D Editing via 3D Distillation of CLIP Knowledge
CVPR 2023
REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory
CVPR 2023
OpenScene: 3D Scene Understanding With Open Vocabularies
CVPR 2023
Test of Time: Instilling Video-Language Models With a Sense of Time
CVPR 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
CVPR 2023
Unifying Vision, Text, and Layout for Universal Document Processing
CVPR 2023
FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training
CVPR 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation
CVPR 2023
Task Residual for Tuning Vision-Language Models
CVPR 2023
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
CVPR 2023
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023
Universal Instance Perception As Object Discovery and Retrieval
CVPR 2023
Turning a CLIP Model Into a Scene Text Detector
CVPR 2023
HierVL: Learning Hierarchical Video-Language Embeddings
CVPR 2023