Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Learning Compact Video Representations for Efficient Long-form Video Understanding in Large Multimodal Models
WACV 2026
SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination
WACV 2026
Conversational Image Generation: Towards Multi-Round Personalized Generation with Multi-Modal Language Models
WACV 2026
Improvise, Adapt, Overcome -- Telescopic Adapters for Efficient Fine-tuning of Vision Language Models in Medical Imaging
WACV 2026
Improving Language Identification for Code-Switched Speech: The Pivotal Role of Accented English
EACL 2026
Direct Visual Grounding by Directing Attention of Visual Tokens
WACV 2026
VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics
WACV 2026
Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data
WACV 2026
Ordinal-Aware Multimodal Engagement Recognition for Collaborative Learning
WACV 2026
Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling
WACV 2026
Hybrid State Representation for Video Procedure Planning
WACV 2026
Feature-Disentangling RGB-NIR Fusion Network for Remote Driver Physiological Measurement
WACV 2026
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
WACV 2026
Delta-LLaVA: Base-then-Specialize Alignment for Token-Efficient Vision-Language Models
WACV 2026
OpenLVLM-MIA: A Controlled Benchmark Revealing the Limits of Membership Inference Attacks on Large Vision-Language Models
WACV 2026
A-V Representation Learning via Audio Shift Prediction for Multimodal Deepfake Detection and Temporal Localization
WACV 2026
Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios
WACV 2026
WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields
WACV 2026
Fused Similarity Measure Based Alignment with Dual-Scale Adaptive Selection for Weakly Supervised Video Anomaly Detection
WACV 2026
PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models
WACV 2026
T2LF: LLM-Guided Multimodal Diffusion for Text-to-Light Field Synthesis
WACV 2026
MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation
WACV 2026
Multi-Modal Soccer Scene Analysis with Masked Pre-Training
WACV 2026
VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
WACV 2026
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
WACV 2026