Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models
EACL 2026
Kahaani: A Multimodal Co-Creative Storytelling System
EACL 2026
ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
WACV 2026
Extreme Amodal Face Detection
WACV 2026
Countering Multi-modal Representation Collapse through Rank-targeted Fusion
WACV 2026
MarineEval: Assessing the Marine Intelligence of Vision-Language Models
WACV 2026
Vision-Language Models Align with Human Neural Representations in Concept Processing
EACL 2026
FormGym: Doing Paperwork with Agents
EACL 2026
Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models
EACL 2026
Rethinking Open-world Prompt Tuning: A Systematic Framework for Evaluation and Optimization
AAAI 2026
DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning
EACL 2026
Efficient Table Retrieval and Understanding with Multimodal Large Language Models
EACL 2026
RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation
EACL 2026
Scalpel: Fine-Grained Alignment of Attention Activation Manifolds via Mixture Gaussian Bridges to Mitigate Multimodal Hallucination
WACV 2026
FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation
WACV 2026
DREAM: Dynamic Prompts and GuidedMix for Efficient Continual Adaptation of Visual-Language Models
WACV 2026
Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention
WACV 2026
VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction
WACV 2026
WWE-UIE: A Wavelet & White Balance Efficient Network for Underwater Image Enhancement
WACV 2026
Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models
WACV 2026
DenseBEV: Transforming BEV Grid Cells into 3D Objects
WACV 2026
START: Spatial and Textual Learning for Chart Understanding
WACV 2026
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
WACV 2026
UniTabBank: A Large Scale Multi-Lingual, Multi-Layout, Multi-Type, Multi-Format Dataset for Table Detection
WACV 2026
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks
WACV 2026