Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations
AAAI 2025
Partial Point Cloud Registration with Multi-view 2D Image Learning
AAAI 2025
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
AAAI 2025
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI 2025
Cross-modal Multi-task Learning for Multimedia Event Extraction
AAAI 2025
Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
AAAI 2025
M^3EL: A Multi-task Multi-topic Dataset for Multi-modal Entity Linking
AAAI 2025
Cross-Domain Trajectory Association Based on Hierarchical Spatiotemporal Enhanced Attention Hypergraph
AAAI 2025
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
AAAI 2025
Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition
AAAI 2025
Debiased Multimodal Understanding for Human Language Sequences
AAAI 2025
JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation
AAAI 2025
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
AAAI 2025
Towards Audio-Visual Navigation in Noisy Environments: A Large-Scale Benchmark Dataset and an Architecture Considering Multiple Sound-Sources
AAAI 2025
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
AAAI 2025
Read, Watch and Scream! Sound Generation from Text and Video
AAAI 2025
Semi-Supervised Multi-View Multi-Label Learning with View-Specific Transformer and Enhanced Pseudo-Label
AAAI 2025
MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models
AAAI 2025
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
AAAI 2025
Noisy Correspondence Rectification via Asymmetric Similarity Learning
AAAI 2025
Towards Multimodal Sentiment Analysis via Hierarchical Correlation Modeling with Semantic Distribution Constraints
AAAI 2025
CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception
AAAI 2025
CoPEFT: Fast Adaptation Framework for Multi-Agent Collaborative Perception with Parameter-Efficient Fine-Tuning
AAAI 2025
Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation
AAAI 2025
Howard University-AI4PC at SemEval-2025 Task 1: Using GPT-4o and CLIP-ViLT to Decode Figurative Language Across Text and Images
ACL 2025