Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Interview: Large-scale Modeling of Media Dialog with Discourse Patterns and Knowledge Grounding
EMNLP 2020
VolTAGE: Volatility Forecasting via Text Audio Fusion with Graph Convolution Networks for Earnings Calls
EMNLP 2020
Entity Linking in 100 Languages
EMNLP 2020
Relation-aware Graph Attention Networks with Relational Position Encodings for Emotion Recognition in Conversations
EMNLP 2020
Response Selection for Multi-Party Conversations with Dynamic Topic Tracking
EMNLP 2020
Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning
EMNLP 2020
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements
EMNLP 2020
ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
EMNLP 2020
Visually Grounded Compound PCFGs
EMNLP 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
EMNLP 2020
Sub-Instruction Aware Vision-and-Language Navigation
EMNLP 2020
VD-BERT: A Unified Vision and Dialog Transformer with BERT
EMNLP 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
EMNLP 2020
A Visually-grounded First-person Dialogue Dataset with Verbal and Non-verbal Responses
EMNLP 2020
STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering
EMNLP 2020
Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents
EMNLP 2020
Visually Grounded Continual Learning of Compositional Phrases
EMNLP 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
EMNLP 2020
Modality Shifting Attention Network for Multi-Modal Video Question Answering
CVPR 2020
Attention-over-Attention Field-Aware Factorization Machine
AAAI 2020
Glyph2Vec: Learning Chinese Out-of-Vocabulary Word Embedding from Glyphs
ACL 2020
12-in-1: Multi-Task Vision and Language Representation Learning
CVPR 2020
Hierarchical Human Parsing With Typed Part-Relation Reasoning
CVPR 2020
Don’t Use English Dev: On the Zero-Shot Cross-Lingual Evaluation of Contextual Embeddings
EMNLP 2020
Wish You Were Here: Context-Aware Human Generation
CVPR 2020