Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Feature Design for Bridging SAM and CLIP toward Referring Image Segmentation
WACV 2025
Temporally Streaming Audio-Visual Synchronization for Real-World Videos
WACV 2025
POS-Aware Neural Approaches for Word Alignment in Dravidian Languages
COLING 2025
Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks
RSS 2025
If I feel smart, I will do the right thing: Combining Complementary Multimodal Information in Visual Language Models
COLING 2025
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis
CVPR 2025
Click&Describe: Multimodal Grounding and Tracking for Aerial Objects
WACV 2025
Multi-Resolution Guided 3D GANs for Medical Image Translation
WACV 2025
VideoGameBunny: Towards Vision Assistants for Video Games
WACV 2025
CIOL at SemEval-2025 Task 11: Multilingual Pre-trained Model Fusion for Text-based Emotion Recognition
SEMEVAL 2025
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-Grained Video-Language Learning
WACV 2025
Zhoumou at SemEval-2025 Task 1: Leveraging Multimodal Data Augmentation and Large Language Models for Enhanced Idiom Understanding
SEMEVAL 2025
HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning
WACV 2025
Event-Guided Fusion-Mamba for Context-Aware 3D Human Pose Estimation
WACV 2025
SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
WACV 2025
PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction
WACV 2025
3D Part Segmentation via Geometric Aggregation of 2D Visual Features
WACV 2025
CTYUN-AI at SemEval-2025 Task 1: Learning to Rank for Idiomatic Expressions
SEMEVAL 2025
Deduce and Select Evidences with Language Models for Training-Free Video Goal Inference
WACV 2025
AIDE: Improving 3D Open-Vocabulary Semantic Segmentation by Aligned Vision-Language Learning
WACV 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
CVPR 2025
Latency Robust Cooperative Perception using Asynchronous Feature Fusion
WACV 2025
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
WACV 2025
Combining Inherent Knowledge of Vision-Language Models with Unsupervised Domain Adaptation through Strong-Weak Guidance
WACV 2025
VCRMNER: Visual Cue Refinement in Multimodal NER using CLIP Prompts
COLING 2025