Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Referring Image Segmentation via Recurrent Refinement Networks
CVPR 2018
GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints
CVPR 2018
Visual Grounding via Accumulated Attention
CVPR 2018
Do Neural Network Cross-Modal Mappings Really Bridge Modalities?
ACL 2018
SNAG: Spoken Narratives and Gaze Dataset
ACL 2018
Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph
ACL 2018
Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment
ACL 2018
Speaker-Follower Models for Vision-and-Language Navigation
NIPS 2018
Pushing the Limits of Radiology with Joint Modeling of Visual and Textual Information
ACL 2018
Learning to Localize Sound Source in Visual Scenes
CVPR 2018
Textbook Question Answering Under Instructor Guidance With Memory Networks
CVPR 2018
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
CVPR 2018
Finding Beans in Burgers: Deep Semantic-Visual Embedding With Localization
CVPR 2018
Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
CVPR 2018
Visual to Sound: Generating Natural Sound for Videos in the Wild
CVPR 2018
All-Neural Multi-Channel Speech Enhancement
INTERSPEECH 2018
VizWiz Grand Challenge: Answering Visual Questions From Blind People
CVPR 2018
Multimodal Visual Concept Learning With Weakly Supervised Techniques
CVPR 2018
Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering
CVPR 2018
Investigating Audio, Video, and Text Fusion Methods for End-to-End Automatic Personality Prediction
ACL 2018
Multimodal Language Analysis with Recurrent Multistage Fusion
EMNLP 2018
TVQA: Localized, Compositional Video Question Answering
EMNLP 2018
Embedding Multimodal Relational Data for Knowledge Base Completion
EMNLP 2018
Contextual Inter-modal Attention for Multi-modal Sentiment Analysis
EMNLP 2018
Evaluating Textual Representations through Image Generation
EMNLP 2018