Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
NAACL 2022
Visual Acoustic Matching
CVPR 2022
FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback
CVPR 2022
PointCLIP: Point Cloud Understanding by CLIP
CVPR 2022
SEEG: Semantic Energized Co-Speech Gesture Generation
CVPR 2022
Vector Quantized Diffusion Model for Text-to-Image Synthesis
CVPR 2022
Coreference by Appearance: Visually Grounded Event Coreference Resolution
EMNLP 2021
Can images help recognize entities? A study of the role of images for Multimodal NER
EMNLP 2021
Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser
EMNLP 2021
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering
EMNLP 2021
Visually Grounded Concept Composition
EMNLP 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
EMNLP 2021
Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval
EMNLP 2021
Visually Grounded Reasoning across Languages and Cultures
EMNLP 2021
GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition
CVPR 2021
Looking Into Your Speech: Learning Cross-Modal Affinity for Audio-Visual Speech Separation
CVPR 2021
Seeing Out of the Box: End-to-End Pre-Training for Vision-Language Representation Learning
CVPR 2021
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption
CVPR 2021
Look Before You Speak: Visually Contextualized Utterances
CVPR 2021
Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels
CVPR 2021
Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting
CVPR 2021
Discrete-Continuous Action Space Policy Gradient-Based Attention for Image-Text Matching
CVPR 2021
Separating Skills and Concepts for Novel Visual Question Answering
CVPR 2021
YouRefIt: Embodied Reference Understanding With Language and Gesture
ICCV 2021
Towards Domain Invariant Single Image Dehazing
AAAI 2021