Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
Does Vision-and-Language Pretraining Improve Lexical Grounding?
EMNLP 2021
NICE: Neural Image Commenting with Empathy
EMNLP 2021
Investigating Negation in Pre-trained Vision-and-language Models
EMNLP 2021
VisualSem: a high-quality knowledge graph for vision and language
EMNLP 2021
Consensus Graph Representation Learning for Better Grounded Image Captioning
AAAI 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network
AAAI 2021
Recognizing Multimodal Entailment
ACL 2021
Visually Grounded Follow-up Questions: a Dataset of Spatial Questions Which Require Dialogue History
ACL 2021
MultiMET: A Multimodal Dataset for Metaphor Understanding
ACL 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
CVPR 2021
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
EMNLP 2020
The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents
ACL 2020
AI Sensing for Robotics using Deep Learning based Visual and Language Modeling
ACL 2020
They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies
EMNLP 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
NeurIPS 2020
MULE: Multimodal Universal Language Embedding
AAAI 2020
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training
AAAI 2020
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
CVPR 2020
Learning User Representations for Open Vocabulary Image Hashtag Prediction
CVPR 2020
12-in-1: Multi-Task Vision and Language Representation Learning
CVPR 2020
Graph Structured Network for Image-Text Matching
CVPR 2020
Image Search With Text Feedback by Visiolinguistic Attention Learning
CVPR 2020
Faithful Multimodal Explanation for Visual Question Answering
ACL 2019
Auto-Encoding Scene Graphs for Image Captioning
CVPR 2019
Multi-Task Learning of Hierarchical Vision-Language Representation
CVPR 2019