Multi-Modal Learning
115 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 1
2018: 1
2019: 1
2020: 3
2021: 3
2022: 7
2023: 5
2024: 35
2025: 57
Papers
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding
EMNLP 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
NIPS 2022
ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model
NIPS 2022
Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data
AAAI 2022
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
CVPR 2021
Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation
EMNLP 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
ACL 2021
Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts
ACL 2020
Cross-media Structured Common Space for Multimedia Event Extraction
ACL 2020
On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation
ACL 2020
Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension
ACL 2019
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
EMNLP 2018
Beyond Instance-Level Image Retrieval: Leveraging Captions to Learn a Global Visual Representation for Semantic Retrieval
CVPR 2017
Multimodal Residual Learning for Visual QA
NIPS 2016
Deep Correlation for Matching Images and Text
CVPR 2015