Research Explorer
Multi-Modal Learning
1213 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 2
2012: 5
2013: 5
2014: 1
2015: 5
2016: 8
2017: 21
2018: 42
2019: 42
2020: 69
2021: 72
2022: 149
2023: 143
2024: 258
2025: 370
2026: 17
Papers
Findings of the Shared Task on Misogyny Meme Detection: DravidianLangTech@NAACL 2025
NAACL 2025
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
ACL 2025
StuD: A Multimodal Approach for Stuttering Detection with RAG and Fusion Strategies
IJCNLP 2025
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
ACL 2025
ReEdit: Multimodal Exemplar-Based Image Editing
WACV 2025
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis
ACL 2025
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
CVPR 2025
AGRec: Adapting Autoregressive Decoders with Graph Reasoning for LLM-based Sequential Recommendation
ACL 2025
Enhancing Novel Object Detection via Cooperative Foundational Models
WACV 2025
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
ACL 2025
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
CVPR 2025
MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching
ACL 2025
Cross-Aligned Fusion for Multimodal Understanding
WACV 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
ACL 2025
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
CVPR 2025
Sign2Vis: Automated Data Visualization from Sign Language
ACL 2025
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series
WACV 2025
Vision-Language Models Struggle to Align Entities across Modalities
ACL 2025
SIDE: Socially Informed Drought Estimation Toward Understanding Societal Impact Dynamics of Environmental Crisis
AAAI 2025
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
CVPR 2025
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
CVPR 2025
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
CVPR 2025
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
CVPR 2025
Flexible Frame Selection for Efficient Video Reasoning
CVPR 2025
Cause-Effect Driven Optimization for Robust Medical Visual Question Answering with Language Biases
IJCAI 2025