Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
SAR: A Structure-Aligned Reasoning Framework for Temporal Knowledge Graph Question Answering
AAAI 2026
BIG-FUSION: Brain-Inspired Global-Local Context Fusion Framework for Multimodal Emotion Recognition in Conversations
AAAI 2025
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
NAACL 2025
Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency
AAAI 2025
Asymmetric Cross-Modal Hashing Based on Formal Concept Analysis
AAAI 2025
Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset
AAAI 2025
External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection
AAAI 2025
mmFAS: Multimodal Face Anti-Spoofing Using Multi-Level Alignment and Switch-Attention Fusion
AAAI 2025
DAMMFND: Domain-Aware Multimodal Multi-view Fake News Detection
AAAI 2025
COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks
AAAI 2025
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
CVPR 2025
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
ICCV 2025
EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics
AAAI 2025
Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning
AAAI 2025
APIRL: Deep Reinforcement Learning for REST API Fuzzing
AAAI 2025
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
CVPR 2025
EgoLM: Multi-Modal Language Model of Egocentric Motions
CVPR 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
CVPR 2025
S3E: Self-Supervised State Estimation for Radar-Inertial System
ICCV 2025
WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network
AAAI 2025
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
ICCV 2025
See Through Their Minds: Learning Transferable Brain Decoding Models from Cross-Subject fMRI
AAAI 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
ICCV 2025
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
AAAI 2025
MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models
AAAI 2025