Research Explorer
Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
ACL 2025
Big Escape Benchmark: Evaluating Human-Like Reasoning in Language Models via Real-World Escape Room Challenges
ACL 2025
CPIQA: Climate Paper Image Question Answering Dataset for Retrieval-Augmented Generation with Context-based Query Expansion
ACL 2025
AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
ACL 2025
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
ACL 2025
Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models
ACL 2025
Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
ACL 2025
YNU-HPCC at SemEval-2025 Task 1: Enhancing Multimodal Idiomaticity Representation via LoRA and Hybrid Loss Optimization
SEMEVAL 2025
A Survey on Patent Analysis: From NLP to Multimodal AI
ACL 2025
Zhoumou at SemEval-2025 Task 1: Leveraging Multimodal Data Augmentation and Large Language Models for Enhanced Idiom Understanding
ACL 2025
Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation
ACL 2025
ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
AAAI 2025
EssayDetect at GenAI Detection Task 2: Guardians of Academic Integrity: Multilingual Detection of AI-Generated Essays
COLING 2025
Cross-Aligned Fusion for Multimodal Understanding
WACV 2025
SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation
CVPR 2025
Instruction-based Image Manipulation by Watching How Things Move
CVPR 2025
Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization
AAAI 2025
Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset
AAAI 2025
BIG-FUSION: Brain-Inspired Global-Local Context Fusion Framework for Multimodal Emotion Recognition in Conversations
AAAI 2025
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
CVPR 2025
AudioGenX: Explainability on Text-to-Audio Generative Models
AAAI 2025
Explanation Bottleneck Models
AAAI 2025
BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning
AAAI 2025
Zero-Shot Image Captioning with Multi-type Entity Representations
AAAI 2025
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025