Research Explorer
Multi-Modal Learning
3,194 papers classified directly under this topic
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
R-Bind: Unified Enhancement of Attribute and Relation Binding in Text-to-Image Diffusion Models
EMNLP 2025
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
EMNLP 2025
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
EMNLP 2025
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
EMNLP 2025
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
CVPR 2025
T-MAD: Target-driven Multimodal Alignment for Stance Detection
EMNLP 2025
TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration
EMNLP 2025
LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL
EMNLP 2025
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
CVPR 2025
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
EMNLP 2025
From Shortcuts to Balance: Attribution Analysis of Speech-Text Feature Utilization in Distinguishing Original from Machine-Translated Texts
EMNLP 2025
G2SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection
ICCV 2025
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
CVPR 2025
VEU-Bench: Towards Comprehensive Understanding of Video Editing
CVPR 2025
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation
CVPR 2025
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
CVPR 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
CVPR 2025
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
CVPR 2025
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
CVPR 2025
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
CVPR 2025
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
CVPR 2025
Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
CVPR 2025
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
CVPR 2025
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
CVPR 2025
MotionMap: Representing Multimodality in Human Pose Forecasting
CVPR 2025