Research Explorer
Multimodal Learning
85 directly classified papers
Papers per year
2017: 2
2019: 3
2020: 4
2021: 4
2022: 8
2023: 12
2024: 15
2025: 37
Papers
Multimodal Prior Learning with Double Constraint Alignment for Snapshot Spectral Compressive Imaging
IJCAI 2025
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity
CVPR 2025
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
ICCV 2025
Streaming VideoLLMs for Real-Time Procedural Video Understanding
ICCV 2025
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
ICCV 2025
HumorDB: Can AI understand graphical humor?
ICCV 2025
Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios
EMNLP 2025
AI Knows Where You Are: Exposure, Bias, and Inference in Multimodal Geolocation with KoreaGEO
EMNLP 2025
M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
EMNLP 2025
5cNLP at BioLaySumm2025: Prompts, Retrieval, and Multimodal Fusion
ACL 2025
CLEAR: Character Unlearning in Textual and Visual Modalities
ACL 2025
Multimodal Invariant Sentiment Representation Learning
ACL 2025
AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents
EACL 2024
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
ACL 2024
YYama@Multimodal Hate Speech Event Detection 2024: Simpler Prompts, Better Results - Enhancing Zero-shot Detection with a Large Multimodal Model
EACL 2024
CLTL@Multimodal Hate Speech Event Detection 2024: The Winning Approach to Detecting Multimodal Hate Speech and Its Targets
EACL 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
COLING 2024
Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
ACL 2024
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
ACL 2024
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
CVPR 2024
Data-Efficient Multimodal Fusion on a Single GPU
CVPR 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
CVPR 2024
EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning
NeurIPS 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
NeurIPS 2024
What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction
NeurIPS 2024