Multimodal Learning

13057 directly classified papers

Papers

Optical Character Recognition for the International Phonetic Alphabet EACL 2026

MedPEFT-CL: Dual-Phase Parameter-Efficient Continual Learning with Medical Semantic Adapter and Bidirectional Memory Consolidation WACV 2026

Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video? WACV 2026

Similarity-aware Probabilistic Embeddings Modeling for Video-Text Retrieval WACV 2026

Human Knowledge Integrated Multi-modal Learning for Single Source Domain Generalization WACV 2026

UniCalib: Targetless LiDAR-camera Calibration via Probabilistic Flow on Unified Depth Representations WACV 2026

Beyond the Highlights: Video Retrieval with Salient and Surrounding Contexts WACV 2026

Analysis of Text Accuracy and Visual Alignment in Vision-Language Models for Artistic Text Generation WACV 2026

Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis WACV 2026

GateFusion: Hierarchical Gated Cross-Modal Fusion for Active Speaker Detection WACV 2026

FairVLM: Enhancing Fairness and Prompt Sensitivity in Vision Language Models for Medical Image Segmentation WACV 2026

V2XScene: Multi-View Consistent 3D Scene Simulation for Collaborative Perception WACV 2026

CAPE: A CLIP-Aware Pointing Ensemble of Complementary Heatmap Cues for Embodied Reference Understanding WACV 2026

Large Sign Language Models: Toward 3D American Sign Language Translation WACV 2026

PoseGaussian: Pose-Driven Novel View Synthesis for Robust 3D Human Reconstruction WACV 2026

The Correlation Between Emotion in Text and Speech Segments is Limited: A Cross-Modal Study EACL 2026

ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars WACV 2026

SAFER-AiD: Saccade-Assisted Foveal-peripheral vision Enhanced Reconstruction for Adversarial Defense WACV 2026

AuthGuard: Generalizable Deepfake Detection via Language Guidance WACV 2026

Improving Out-of-Distribution Detection Using Segmented Images and Cross-View Attention Fusion WACV 2026

Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization EACL 2026

What Happens When: Learning Temporal Orders of Events in Videos WACV 2026

SENCA-st: Integrating Spatial Transcriptomics and Histopathology with Cross Attention Shared Encoder for Region Identification in Cancer Pathology WACV 2026

Extending Audio Context for Long-Form Understanding in Large Audio-Language Models EACL 2026

UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning WACV 2026