Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
ESCNet: Gaze Target Detection With the Understanding of 3D Scenes
CVPR 2022
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships
CVPR 2022
VALHALLA: Visual Hallucination for Machine Translation
CVPR 2022
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality
EMNLP 2022
LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
EMNLP 2022
Curriculum Learning Meets Weakly Supervised Multimodal Correlation Learning
EMNLP 2022
Concadia: Towards Image-Based Text Generation with a Purpose
EMNLP 2022
Contrastive Learning with Expectation-Maximization for Weakly Supervised Phrase Grounding
EMNLP 2022
MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences
EMNLP 2022
DocFin: Multimodal Financial Prediction and Bias Mitigation using Semi-structured Documents
EMNLP 2022
DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models
EMNLP 2022
Hate-CLIPper: Multimodal Hateful Meme Classification based on Cross-modal Interaction of CLIP Features
EMNLP 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
NIPS 2022
Understanding Aesthetics with Language: A Photo Critique Dataset for Aesthetic Assessment
NIPS 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
NIPS 2022
DUCS at SemEval-2022 Task 6: Exploring Emojis and Sentiments for Sarcasm Detection
SEMEVAL 2022
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-Training
CVPR 2021
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption
CVPR 2021
Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset
CVPR 2021
Towards Visual Question Answering on Pathology Images
ACL 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
EMNLP 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
CVPR 2021
Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution
AAAI 2021
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
AAAI 2021
VisualMRC: Machine Reading Comprehension on Document Images
AAAI 2021