Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model
CVPR 2020
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules
CVPR 2020
JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection
CVPR 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
CVPR 2020
A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis
ACL 2020
Latent Alignment of Procedural Concepts in Multimodal Recipes
ACL 2020
Achieving Common Ground in Multi-modal Dialogue
ACL 2020
ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents
ACL 2020
Multilingual Universal Sentence Encoder for Semantic Retrieval
ACL 2020
GAIA: A Fine-grained Multimedia Knowledge Extraction System
ACL 2020
Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting
ACL 2020
Video-Grounded Dialogues with Pretrained Generation Language Models
ACL 2020
Towards Emotion-aided Multi-modal Dialogue Act Classification
ACL 2020
Dynamic Capsule Attention for Visual Question Answering
AAAI 2019
Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images
CVPR 2019
Progressive Attention Memory Network for Movie Story Question Answering
CVPR 2019
Listen to the Image
CVPR 2019
Speech2Face: Learning the Face Behind a Voice
CVPR 2019
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
CVPR 2019
Cross-Modality Personalization for Retrieval
CVPR 2019
Neural Sequential Phrase Grounding (SeqGROUND)
CVPR 2019
What's to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions
CVPR 2019
Language-Conditioned Graph Networks for Relational Reasoning
ICCV 2019
Image Captioning: Transforming Objects into Words
NeurIPS 2019
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
NeurIPS 2019