Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
DeAR: Debiasing Vision-Language Models With Additive Residuals (CVPR 2023)
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering (CVPR 2023)
Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP (AAAI 2022)
Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model (CVPR 2022)
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing (NIPS 2022)
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks (NIPS 2022)
CyCLIP: Cyclic Contrastive Language-Image Pretraining (NIPS 2022)
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models (NIPS 2022)
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models (NIPS 2022)
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning (NIPS 2022)
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP (NIPS 2022)
Contrastive Language-Image Pre-Training with Knowledge Graphs (NIPS 2022)
LAION-5B: An open large-scale dataset for training next generation image-text models (NIPS 2022)
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark (NIPS 2022)
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks (NIPS 2022)
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (NIPS 2022)
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations (NIPS 2022)
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining (NIPS 2022)
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (NIPS 2022)
When does CLIP generalize better than unimodal models? When judging human-centric concepts (ACL 2022)
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks (ACL 2022)
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning (ACL 2022)
Assessing Multilingual Fairness in Pre-trained Multimodal Representations (ACL 2022)
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training (ACL 2022)
Vision-Language Pretraining: Current Trends and the Future (ACL 2022)