Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation
CVPR 2021
There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge
CVPR 2021
ManipulaTHOR: A Framework for Visual Object Manipulation
CVPR 2021
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
CVPR 2021
Scene-Intuitive Agent for Remote Embodied Visual Grounding
CVPR 2021
Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles
CVPR 2021
Improving Sign Language Translation With Monolingual Data by Sign Back-Translation
CVPR 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
CVPR 2021
TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
CVPR 2021
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback
CVPR 2021
Fingerspelling Detection in American Sign Language
CVPR 2021
Audio-Visual Instance Discrimination with Cross-Modal Agreement
CVPR 2021
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression
CVPR 2021
Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
CVPR 2021
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation
CVPR 2021
Refer-It-in-RGBD: A Bottom-Up Approach for 3D Visual Grounding in RGBD Images
CVPR 2021
iMiGUE: An Identity-Free Video Dataset for Micro-Gesture Understanding and Emotion Analysis
CVPR 2021
Towards More Flexible and Accurate Object Tracking With Natural Language: Algorithms and Benchmark
CVPR 2021
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning
CVPR 2021
Look Before You Speak: Visually Contextualized Utterances
CVPR 2021
VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency
CVPR 2021
Bidirectional Projection Network for Cross Dimension Scene Understanding
CVPR 2021
Neural Feature Search for RGB-Infrared Person Re-Identification
CVPR 2021
Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset
CVPR 2021
CNN-Based Processing of Acoustic and Radio Frequency Signals for Speaker Localization from MAVs
INTERSPEECH 2021