Research Explorer
Multi-Modal Learning
1,457 papers directly classified under this topic
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
End-to-end Image Captioning Exploits Distributional Similarity in Multimodal Space
EMNLP 2018
Teaching Machines to Describe Images with Natural Language Feedback
NIPS 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
ACL 2017
Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network
ACL 2017
Image Pivoting for Learning Multilingual Multimodal Representations
EMNLP 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
CVPR 2017
Connecting Look and Feel: Associating the Visual and Tactile Properties of Physical Materials
CVPR 2017
LSTM Self-Supervision for Detailed Behavior Analysis
CVPR 2017
Tracking by Natural Language Specification
CVPR 2017
The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives
CVPR 2017
Top-Down Visual Saliency Guided by Captions
CVPR 2017
Visual Reference Resolution using Attention Memory for Visual Dialog
NIPS 2017
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
CVPR 2016
Unsupervised Learning of Spoken Language with Visual Context
NIPS 2016
“Congruent” and “Opposite” Neurons: Sisters for Multisensory Integration and Segregation
NIPS 2016
SoundNet: Learning Sound Representations from Unlabeled Video
NIPS 2016
Recognizing Micro-Actions and Reactions From Paired Egocentric Videos
CVPR 2016
Yin and Yang: Balancing and Answering Binary Visual Questions
CVPR 2016
Visual7W: Grounded Question Answering in Images
CVPR 2016
Multi-View People Tracking via Hierarchical Trajectory Composition
CVPR 2016
Answer-Type Prediction for Visual Question Answering
CVPR 2016
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
ICCV 2015
Sense Discovery via Co-Clustering on Images and Text
CVPR 2015
Class Consistent Multi-Modal Fusion With Binary Features
CVPR 2015
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
NIPS 2014