Multimodal Learning

13057 directly classified papers

Papers

A Compliance-Preserving Retrieval System for Aircraft MRO Task Search EACL 2026

code-transformed: The Influence of Large Language Models on Code EACL 2026

QueStER: Query Specification for Generative Keyword-Based Retrieval EACL 2026

DCSN-NLP at MWE-2026 AdMIRe 2: Bridging Literal and Figurative Meaning Through Hierarchical Multimodal Reasoning EACL 2026

SurgXBench: Explainable Vision-Language Model Benchmark for Surgery WACV 2026

See, Think, Learn: A Self-Taught Multimodal Reasoner WACV 2026

RampWatch: An In-the-Wild Dataset and Text-Guided Detection Framework for Recreational Vessels WACV 2026

Beyond Faces: A Multimodal Person Clustering for Unconstrained Environments WACV 2026

Learning Unified Spatio-temporal Representations for Efficient Compressed Video Understanding WACV 2026

PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models WACV 2026

Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial Imagery WACV 2026

SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding WACV 2026

MapVerse: A Benchmark for Geospatial Question Answering on Diverse Real-World Maps WACV 2026

CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video WACV 2026

Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting WACV 2026

Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries WACV 2026

DuPLUS: Dual-Prompt Vision-Language Model for Universal Medical Image Segmentation and Prognosis WACV 2026

DermEVAL: A Dermatologist-Reviewed Benchmark for Multimodal Large Language Models WACV 2026

Histogram Assisted Quality Aware Generative Model for Resolution Invariant NIR Image Colorization WACV 2026

ExDDV: A New Dataset for Explainable Deepfake Detection in Video WACV 2026

ART: Actor-Related Tubelet for Detecting Complex-shaped Action Tubes WACV 2026

Understanding Human-Like Biases in VLMs via Subjective Face Analytics WACV 2026

BanglaProtha: Evaluating Vision Language Models in Underrepresented Long-tail Cultural Contexts WACV 2026

Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression WACV 2026

Gaussian Representations for Video WACV 2026