Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
DialogDraw: Image Generation and Editing System Based on Multi-Turn Dialogue
AAAI 2025
GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art
ACL 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
AAAI 2025
Graphic Design with Large Multimodal Model
AAAI 2025
External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection
AAAI 2025
Cross-Domain Trajectory Association Based on Hierarchical Spatiotemporal Enhanced Attention Hypergraph
AAAI 2025
Debiased Multimodal Understanding for Human Language Sequences
AAAI 2025
Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition
AAAI 2025
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
AAAI 2025
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
AAAI 2025
Towards Audio-Visual Navigation in Noisy Environments: A Large-Scale Benchmark Dataset and an Architecture Considering Multiple Sound-Sources
AAAI 2025
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
AAAI 2025
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
AAAI 2025
Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update
AAAI 2025
Retention Score: Quantifying Jailbreak Risks for Vision Language Models
AAAI 2025
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection
AAAI 2025
Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph
EMNLP 2025
Read, Watch and Scream! Sound Generation from Text and Video
AAAI 2025
Semi-Supervised Multi-View Multi-Label Learning with View-Specific Transformer and Enhanced Pseudo-Label
AAAI 2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
ACL 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
ACL 2025
Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
ACL 2025
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
ACL 2025
Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding
ACL 2025
Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts
ACL 2025