Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
An Explainable Toolbox for Evaluating Pre-trained Vision-Language Models
EMNLP 2022
Exploring Compositional Image Retrieval with Hybrid Compositional Learning and Heuristic Negative Mining
EMNLP 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
EMNLP 2022
DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models
EMNLP 2022
Cross-modal Transfer Between Vision and Language for Protest Detection
EMNLP 2022
Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
EMNLP 2022
Towards Multimodal Vision-Language Models Generating Non-generic Text
AAAI 2022
Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective
AAAI 2022
Playing Lottery Tickets with Vision and Language
AAAI 2022
Topological Planning With Transformers for Vision-and-Language Navigation
CVPR 2021
Seeing Out of the Box: End-to-End Pre-Training for Vision-Language Representation Learning
CVPR 2021
Kaleido-BERT: Vision-Language Pre-Training on Fashion Domain
CVPR 2021
Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation
ACL 2021
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
CVPR 2021
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
AAAI 2021
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs
AAAI 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
ACL 2021
Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries
EMNLP 2021
Improving Pre-trained Vision-and-Language Embeddings for Phrase Grounding
EMNLP 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
EMNLP 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
EMNLP 2021
Visually Grounded Reasoning across Languages and Cultures
EMNLP 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
EMNLP 2021
Data Efficient Masked Language Modeling for Vision and Language
EMNLP 2021
MURAL: Multimodal, Multitask Representations Across Languages
EMNLP 2021