Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
ArtEmis: Affective Language for Visual Art
CVPR 2021
Connecting What To Say With Where To Look by Modeling Human Attention Traces
CVPR 2021
Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
NeurIPS 2021
Cross-Domain Correspondence Learning for Exemplar-Based Image Translation
CVPR 2020
Crisis-DIAS: Towards Multimodal Damage Analysis - Deployment, Challenges and Assessment
AAAI 2020
An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos
AAAI 2020
Swoosh! Rattle! Thump! - Actions that Sound
RSS 2020
A Visually-grounded First-person Dialogue Dataset with Verbal and Non-verbal Responses
EMNLP 2020
Visual Objects As Context: Exploiting Visual Objects for Lexical Entailment
EMNLP 2020
ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT
EMNLP 2020
Modality-Balanced Models for Visual Dialogue
AAAI 2020
Image Enhanced Event Detection in News Articles
AAAI 2020
Multi-Question Learning for Visual Question Answering
AAAI 2020
Learning Cross-Modal Context Graph for Visual Grounding
AAAI 2020
Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval
AAAI 2020
Person Tube Retrieval via Language Description
AAAI 2020
Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching
AAAI 2020
Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data
AAAI 2020
CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines
AAAI 2020
Image-Chat: Engaging Grounded Conversations
ACL 2020
Multimodal Quality Estimation for Machine Translation
ACL 2020
Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts
ACL 2020
What Does BERT with Vision Look At?
ACL 2020
CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning
ACL 2020
Cross-Modality Relevance for Reasoning on Language and Vision
ACL 2020