Research Explorer
Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
Multi-Granular Multimodal Clue Fusion for Meme Understanding
AAAI 2025
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
ACL 2025
Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening
EMNLP 2025
CTYUN-AI at SemEval-2025 Task 1: Learning to Rank for Idiomatic Expressions
ACL 2025
Coreference as an indicator of context scope in multimodal narrative
ACL 2025
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
CVPR 2025
External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding
IJCAI 2025
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
CVPR 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
CVPR 2024
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
CVPR 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
CVPR 2024
Text-Driven Image Editing via Learnable Regions
CVPR 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
CVPR 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
EMNLP 2024
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
CVPR 2024
Plot Twist: Multimodal Models Don’t Comprehend Simple Chart Details
EMNLP 2024
Revisiting Multimodal Transformers for Tabular Data with Text Fields
ACL 2024
Visual Pivoting Unsupervised Multimodal Machine Translation in Low-Resource Distant Language Pairs
EMNLP 2024
PRISM: A New Lens for Improved Color Understanding
EMNLP 2024
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention
EMNLP 2024
Multimodal Instruction Tuning with Conditional Mixture of LoRA
ACL 2024
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
CVPR 2024
ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-Guided Optimization
AAAI 2024
FocalDreamer: Text-Driven 3D Editing via Focal-Fusion Assembly
AAAI 2024
Large Language Models can Share Images, Too!
ACL 2024