Multimodal Learning

13057 directly classified papers

Papers

Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors WACV 2026

Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos WACV 2026

ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos WACV 2026

Countering Multi-modal Representation Collapse through Rank-targeted Fusion WACV 2026

MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping WACV 2026

SuperRivolution: Fine-Scale Rivers from Coarse Temporal Satellite Imagery WACV 2026

VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction WACV 2026

DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy WACV 2026

Beyond Faces: A Multimodal Person Clustering for Unconstrained Environments WACV 2026

DreamCatcher: Efficient Multi-Concept Customization via Representation Finetuning WACV 2026

Multimodal Graph Representation Learning over Arbitrary Sets of Modalities WACV 2026

From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models WACV 2026

SOAF: Scene Occlusion-aware Neural Acoustic Field WACV 2026

Test-Time Consistency in Vision Language Models WACV 2026

Can We Challenge Open-Vocabulary Object Detectors with Generated Content in Street Scenes? WACV 2026

RampWatch: An In-the-Wild Dataset and Text-Guided Detection Framework for Recreational Vessels WACV 2026

MemeTAG: Keyword-Driven Meme Classification through Tag Embedding Reconstruction WACV 2026

SimForce: Force and Surface Electromyography from Full Body Video with Graph Neural Nets WACV 2026

SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding WACV 2026

Learnable Query-Enhanced Pose Transformation WACV 2026

Reconstructing Realistic and Relightable Eyes WACV 2026

UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning WACV 2026

Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships WACV 2026

Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score WACV 2026

GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction WACV 2026