Multimodal Learning

13,057 directly classified papers

Papers

Reconstructing Realistic and Relightable Eyes WACV 2026

Modality and Task Adaptation for Enhanced Zero-shot Composed Image Retrieval AAAI 2026

Dual-Domain Multimodal Hyperbolic Fusion for Cardiopulmonary Disease Diagnosis in Emergency Care WACV 2026

Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs EACL 2026

BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok EACL 2026

Coordinates from Context: Using LLMs to Ground Complex Location References EACL 2026

StarFlow: Generating Structured Workflow Outputs From Sketch Images EACL 2026

A Computational Approach to Visual Metonymy EACL 2026

Multimodal Evaluation of Russian-language Architectures EACL 2026

AfriVox: Probing Multilingual and Accent Robustness of Speech LLMs EACL 2026

PAL: Personal Adaptive Learner AAAI 2026

SmartEyes: Plug-and-Play Event Detection for Retail Loss Prevention AAAI 2026

Docora: A System for Interactive Knowledge Extraction and Visualization from Scientific PDFs AAAI 2026

AirNavigation: Let UAV Navigation Tell Its Own Story AAAI 2026

MemoVision: A Digital Catalog for Everyday Interactions AAAI 2026

MulTiCast: A Multimodal Time Series Forecasting System AAAI 2026

City of Light (COL): A City-Scale, Geo-Anchored Urban Simulator with High-Throughput Multi-Sensor Streams AAAI 2026

VitalDiagnosis: AI-Driven Ecosystem for 24/7 Vital Monitoring and Chronic Disease Management AAAI 2026

ATM: Enhanced Alignment for Text-to-Motion Generation WACV 2026

MR-Pruner: Training-free Multi-resolution Visual Token Pruning for Multi-modal Large Language Models WACV 2026

Can We Challenge Open-Vocabulary Object Detectors with Generated Content in Street Scenes? WACV 2026

DreamCatcher: Efficient Multi-Concept Customization via Representation Finetuning WACV 2026

Training-free Conditional Image Embedding Framework Leveraging Large Vision Language Models WACV 2026

Evaluating the Capability of Video Question Generation for Expert Knowledge Elicitation WACV 2026

Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos WACV 2026