Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension?
ACL 2025
FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning
ACL 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
ACL 2025
Unbiased Missing-modality Multimodal Learning
ICCV 2025
WAFFLE: Fine-tuning Multi-Modal Model for Automated Front-End Development
ACL 2025
MNLP@DravidianLangTech 2025: A Deep Multimodal Neural Network for Hate Speech Detection in Dravidian Languages
NAACL 2025
Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
ACL 2025
IMOL: Incomplete-Modality-Tolerant Learning for Multi-Domain Fake News Video Detection
ACL 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
ACL 2025
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
ACL 2025
MNLP@DravidianLangTech 2025: Transformer-based Multimodal Framework for Misogyny Meme Detection
NAACL 2025
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
ACL 2025
Ambiguity-aware Multi-level Incongruity Fusion Network for Multi-Modal Sarcasm Detection
COLING 2025
Query-LIFE: Query-aware Language Image Fusion Embedding for E-Commerce Relevance
COLING 2025
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
COLING 2025
Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output
COLING 2025
Team ML_Forge@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
NAACL 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
ACL 2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
ACL 2025
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
ACL 2025
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
ICCV 2025
MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
ACL 2025
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
ACL 2025
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
ACL 2025