Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Chasing Ghosts: Instruction Following as Bayesian State Tracking
NeurIPS 2019
Cross-Modal Relationship Inference for Grounding Referring Expressions
CVPR 2019
Memory Grounded Conversational Reasoning
EMNLP 2019
RUN through the Streets: A New Dataset and Baseline Models for Realistic Urban Navigation
EMNLP 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
EMNLP 2019
Enhancing Context Modeling with a Query-Guided Capsule Network for Document-level Translation
EMNLP 2019
WSLLN: Weakly Supervised Natural Language Localization Networks
EMNLP 2019
Neural Naturalist: Generating Fine-Grained Image Comparisons
EMNLP 2019
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
EMNLP 2019
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
EMNLP 2019
Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation
EMNLP 2019
Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning
AAAI 2019
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
AAAI 2019
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
AAAI 2019
Hierarchical Photo-Scene Encoder for Album Storytelling
AAAI 2019
Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding
AAAI 2019
Few-Shot Image and Sentence Matching via Gated Visual-Semantic Embedding
AAAI 2019
Localizing Natural Language in Videos
AAAI 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
AAAI 2019
Found in Translation: Learning Robust Joint Representations by Cyclic Translations between Modalities
AAAI 2019
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts
AAAI 2019
GIRNet: Interleaved Multi-Task Recurrent State Sequence Models
AAAI 2019
Structured Two-Stream Attention Network for Video Question Answering
AAAI 2019
Revisiting Spatial-Temporal Similarity: A Deep Learning Framework for Traffic Prediction
AAAI 2019
Content Customization for Micro Learning using Human Augmented AI Techniques
ACL 2019