Multimodal Learning

13057 directly classified papers

Papers

Learning Compact Video Representations for Efficient Long-form Video Understanding in Large Multimodal Models WACV 2026

SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination WACV 2026

Conversational Image Generation: Towards Multi-Round Personalized Generation with Multi-Modal Language Models WACV 2026

Improvise, Adapt, Overcome -- Telescopic Adapters for Efficient Fine-tuning of Vision Language Models in Medical Imaging WACV 2026

Improving Language Identification for Code-Switched Speech: The Pivotal Role of Accented English EACL 2026

Direct Visual Grounding by Directing Attention of Visual Tokens WACV 2026

VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics WACV 2026

Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data WACV 2026

Ordinal-Aware Multimodal Engagement Recognition for Collaborative Learning WACV 2026

Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling WACV 2026

Hybrid State Representation for Video Procedure Planning WACV 2026

Feature-Disentangling RGB-NIR Fusion Network for Remote Driver Physiological Measurement WACV 2026

VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework WACV 2026

Delta-LLaVA: Base-then-Specialize Alignment for Token-Efficient Vision-Language Models WACV 2026

OpenLVLM-MIA: A Controlled Benchmark Revealing the Limits of Membership Inference Attacks on Large Vision-Language Models WACV 2026

A-V Representation Learning via Audio Shift Prediction for Multimodal Deepfake Detection and Temporal Localization WACV 2026

Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios WACV 2026

WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields WACV 2026

Fused Similarity Measure Based Alignment with Dual-Scale Adaptive Selection for Weakly Supervised Video Anomaly Detection WACV 2026

PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models WACV 2026

T2LF: LLM-Guided Multimodal Diffusion for Text-to-Light Field Synthesis WACV 2026

MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation WACV 2026

Multi-Modal Soccer Scene Analysis with Masked Pre-Training WACV 2026

VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models WACV 2026

BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries WACV 2026