Multimodal Learning

13057 directly classified papers

Papers

Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models EACL 2026

Kahaani: A Multimodal Co-Creative Storytelling System EACL 2026

ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models WACV 2026

Extreme Amodal Face Detection WACV 2026

Countering Multi-modal Representation Collapse through Rank-targeted Fusion WACV 2026

MarineEval: Assessing the Marine Intelligence of Vision-Language Models WACV 2026

Vision-Language Models Align with Human Neural Representations in Concept Processing EACL 2026

FormGym: Doing Paperwork with Agents EACL 2026

Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models EACL 2026

Rethinking Open-world Prompt Tuning: A Systematic Framework for Evaluation and Optimization AAAI 2026

DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning EACL 2026

Efficient Table Retrieval and Understanding with Multimodal Large Language Models EACL 2026

RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation EACL 2026

Scalpel: Fine-Grained Alignment of Attention Activation Manifolds via Mixture Gaussian Bridges to Mitigate Multimodal Hallucination WACV 2026

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation WACV 2026

DREAM: Dynamic Prompts and GuidedMix for Efficient Continual Adaptation of Visual-Language Models WACV 2026

Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention WACV 2026

VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction WACV 2026

WWE-UIE: A Wavelet & White Balance Efficient Network for Underwater Image Enhancement WACV 2026

Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models WACV 2026

DenseBEV: Transforming BEV Grid Cells into 3D Objects WACV 2026

START: Spatial and Textual Learning for Chart Understanding WACV 2026

SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection WACV 2026

UniTabBank: A Large Scale Multi-Lingual, Multi-Layout, Multi-Type, Multi-Format Dataset for Table Detection WACV 2026

Temporal Object Captioning for Street Scene Videos from LiDAR Tracks WACV 2026