Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Reasoning with Heterogeneous Graph Alignment for Video Question Answering
AAAI 2020
Person Tube Retrieval via Language Description
AAAI 2020
Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation
CVPR 2020
Video Object Grounding Using Semantic Roles in Language Description
CVPR 2020
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
CVPR 2020
Multimodal Categorization of Crisis Events in Social Media
CVPR 2020
Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
CVPR 2020
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
CVPR 2020
CoverNet: Multimodal Behavior Prediction Using Trajectory Sets
CVPR 2020
Retouchdown: Releasing Touchdown on StreetLearn as a Public Resource for Language Grounding Tasks in Street View
EMNLP 2020
They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies
EMNLP 2020
CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes
EMNLP 2020
Diverse and Relevant Visual Storytelling with Scene Graph Embeddings
EMNLP 2020
Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech
EMNLP 2020
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension
EMNLP 2020
Language-Conditioned Feature Pyramids for Visual Selection Tasks
EMNLP 2020
Beyond Language: Learning Commonsense from Images for Reasoning
EMNLP 2020
Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Language Reasoning
CVPR 2020
Sketch Less for More: On-the-Fly Fine-Grained Sketch-Based Image Retrieval
CVPR 2020
MultiDM-GCN: Aspect-guided Response Generation in Multi-domain Multi-modal Dialogue System using Graph Convolutional Network
EMNLP 2020
Modeling Intra and Inter-modality Incongruity for Multi-Modal Sarcasm Detection
EMNLP 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
CVPR 2020
Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only
CVPR 2020
RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge
CVPR 2020
Multi^2OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT
EMNLP 2020