Research Explorer
Multi-Modal Learning
115 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 1
2018: 1
2019: 1
2020: 3
2021: 3
2022: 7
2023: 5
2024: 35
2025: 57
Papers
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
ACL 2025
MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
ACL 2025
Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models
ACL 2025
Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning
ACL 2025
Redundancy Principles for MLLMs Benchmarks
ACL 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
ACL 2025
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
ACL 2025
Error-driven Data-efficient Large Multimodal Model Tuning
ACL 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
ACL 2025
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
ACL 2025
Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
ACL 2025
Shadow-Activated Backdoor Attacks on Multimodal Large Language Models
ACL 2025
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
ACL 2025
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web
ACL 2025
Challenges for AI in Multimodal STEM Assessments: a Human-AI Comparison
ACL 2025
Gradient Flush at Slavic NLP 2025 Task: Leveraging Slavic BERT and Translation for Persuasion Techniques Classification
ACL 2025
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
ACL 2025
Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning
ACL 2025
PALI-NLP at SemEval 2025 Task 1: Multimodal Idiom Recognition and Alignment
ACL 2025
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
CVPR 2025
Yo'Chameleon: Personalized Vision and Language Generation
CVPR 2025
GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
AAAI 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
CVPR 2025
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
EMNLP 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
EMNLP 2025