Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors
WACV 2026
Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
WACV 2026
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
WACV 2026
Countering Multi-modal Representation Collapse through Rank-targeted Fusion
WACV 2026
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
WACV 2026
SuperRivolution: Fine-Scale Rivers from Coarse Temporal Satellite Imagery
WACV 2026
VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction
WACV 2026
DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
WACV 2026
Beyond Faces: A Multimodal Person Clustering for Unconstrained Environments
WACV 2026
DreamCatcher: Efficient Multi-Concept Customization via Representation Finetuning
WACV 2026
Multimodal Graph Representation Learning over Arbitrary Sets of Modalities
WACV 2026
From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
WACV 2026
SOAF: Scene Occlusion-aware Neural Acoustic Field
WACV 2026
Test-Time Consistency in Vision Language Models
WACV 2026
Can We Challenge Open-Vocabulary Object Detectors with Generated Content in Street Scenes?
WACV 2026
RampWatch: An In-the-Wild Dataset and Text-Guided Detection Framework for Recreational Vessels
WACV 2026
MemeTAG: Keyword-Driven Meme Classification through Tag Embedding Reconstruction
WACV 2026
SimForce: Force and Surface Electromyography from Full Body Video with Graph Neural Nets
WACV 2026
SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding
WACV 2026
Learnable Query-Enhanced Pose Transformation
WACV 2026
Reconstructing Realistic and Relightable Eyes
WACV 2026
UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning
WACV 2026
Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships
WACV 2026
Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score
WACV 2026
GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
WACV 2026