Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Deep Multimodal Clustering for Unsupervised Audiovisual Learning
CVPR 2019
Image-Question-Answer Synergistic Network for Visual Dialog
CVPR 2019
Answer Them All! Toward Universal Visual Question Answering Models
CVPR 2019
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding
CVPR 2019
Connective Cognition Network for Directional Visual Commonsense Reasoning
NIPS 2019
Visually Grounded Neural Syntax Acquisition
ACL 2019
A Novel Framework for Robustness Analysis of Visual QA Models
AAAI 2019
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
CVPR 2019
Speech2Face: Learning the Face Behind a Voice
CVPR 2019
Multi-Scale Visual Semantics Aggregation with Self-Attention for End-to-End Image-Text Matching
ACML 2019
Cross-channel Communication Networks
NIPS 2019
Unsupervised Cross-Spectral Stereo Matching by Learning to Synthesize
AAAI 2019
Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling
NIPS 2019
Categorizing and Inferring the Relationship between the Text and Image of Twitter Posts
ACL 2019
Multi-grained Attention with Object-level Grounding for Visual Question Answering
ACL 2019
Learning to Communicate and Solve Visual Blocks-World Tasks
AAAI 2019
Cross-Modal Self-Attention Network for Referring Image Segmentation
CVPR 2019
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
ACL 2019
Connecting Touch and Vision via Cross-Modal Prediction
CVPR 2019
Fact-Checking Meets Fauxtography: Verifying Claims About Images
EMNLP 2019
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
EMNLP 2019
Improving Generative Visual Dialog by Answering Diverse Questions
EMNLP 2019
Phrase Grounding by Soft-Label Chain Conditional Random Field
EMNLP 2019
Multi-Task Learning of Hierarchical Vision-Language Representation
CVPR 2019
A Strong and Robust Baseline for Text-Image Matching
ACL 2019