Multi-Modal Learning
115 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 1
2018: 1
2019: 1
2020: 3
2021: 3
2022: 7
2023: 5
2024: 35
2025: 57
Papers
Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
ACL 2025
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
ACL 2025
Redundancy Principles for MLLMs Benchmarks
ACL 2025
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
ACL 2025
Shadow-Activated Backdoor Attacks on Multimodal Large Language Models
ACL 2025
From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
ACL 2025
Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models
ACL 2025
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
ACL 2025
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
ACL 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
ACL 2025
Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning
ACL 2025
Text2midi: Generating Symbolic Music from Captions
AAAI 2025
Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders
AAAI 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
ICCV 2025
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
ICCV 2025
MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
ACL 2025
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
ICCV 2025
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
ACL 2025
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
NAACL 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
ACL 2025
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
AAAI 2025
Error-driven Data-efficient Large Multimodal Model Tuning
ACL 2025
GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
AAAI 2025
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
EMNLP 2025
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web
ACL 2025