Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Semi-Supervised Learning for Video Captioning
EMNLP 2020
Learning Interactions and Relationships Between Movie Characters
CVPR 2020
Two Causal Principles for Improving Visual Dialog
CVPR 2020
Vision-Dialog Navigation by Exploring Cross-Modal Memory
CVPR 2020
Telling Left From Right: Learning Spatial Correspondence of Sight and Sound
CVPR 2020
Few-Shot Video Classification via Temporal Alignment
CVPR 2020
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
CVPR 2020
Iterative Context-Aware Graph Inference for Visual Dialog
CVPR 2020
Visual-Semantic Matching by Exploring High-Order Attention and Distraction
CVPR 2020
Hierarchical Conditional Relation Networks for Video Question Answering
CVPR 2020
Multi-Modality Cross Attention Network for Image and Sentence Matching
CVPR 2020
xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation
CVPR 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
CVPR 2020
GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning
CVPR 2020
Language and Visual Entity Relationship Graph for Agent Navigation
NIPS 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
NIPS 2020
SIRI: Spatial Relation Induced Network For Spatial Description Resolution
NIPS 2020
DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads
CVPR 2020
PnPNet: End-to-End Perception and Prediction With Tracking in the Loop
CVPR 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training
CVPR 2020
Active Speakers in Context
CVPR 2020
HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
EMNLP 2020
Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking
EMNLP 2020
Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation
EMNLP 2020
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
EMNLP 2020