Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
WACV 2026
RegionAligner: Bridging Ego-Exo Views for Object Correspondence via Unified Text-Visual Learning
WACV 2026
Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?
WACV 2026
Countering Multi-modal Representation Collapse through Rank-targeted Fusion
WACV 2026
Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations
WACV 2026
IMPACT: Interpretable Most Important Person Analysis and Classification using Transformer-based Models
WACV 2026
Cross-Modal Event Encoder: Bridging Image-Text Knowledge to Event Streams
WACV 2026
HOLO: Holistic Lightweight Optimization for Scene Understanding with Auto-Annotation and Multimodal Learning
WACV 2026
Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment
ACL 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
ACL 2025
Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times
ACL 2025
Query-LIFE: Query-aware Language Image Fusion Embedding for E-Commerce Relevance
COLING 2025
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
ACL 2025
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
ACL 2025
MemeQA: Holistic Evaluation for Meme Understanding
ACL 2025
HerWILL@DravidianLangTech 2025: Ensemble Approach for Misogyny Detection in Memes Using Pre-trained Text and Vision Transformers
NAACL 2025
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
IJCNLP 2025
MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
ACL 2025
DLRG@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
NAACL 2025
Team ML_Forge@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
NAACL 2025
Unbiased Missing-modality Multimodal Learning
ICCV 2025
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
ICCV 2025
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
COLING 2025
MNLP@DravidianLangTech 2025: Transformer-based Multimodal Framework for Misogyny Meme Detection
NAACL 2025