Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks
AAAI 2025
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries
AAAI 2025
CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex
AAAI 2025
Visual Perturbation for Text-Based Person Search
AAAI 2025
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations
AAAI 2025
Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection
AAAI 2025
External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection
AAAI 2025
APIRL: Deep Reinforcement Learning for REST API Fuzzing
AAAI 2025
WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network
AAAI 2025
MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models
AAAI 2025
Asymmetric Cross-Modal Hashing Based on Formal Concept Analysis
AAAI 2025
Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning
AAAI 2025
Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset
AAAI 2025
S3E: Self-Supervised State Estimation for Radar-Inertial System
ICCV 2025
Cross-View Referring Multi-Object Tracking
AAAI 2025
EgoLM: Multi-Modal Language Model of Egocentric Motions
CVPR 2025
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
CVPR 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
CVPR 2025
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
CVPR 2025
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network
AAAI 2025
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
ICCV 2025
Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples
AAAI 2025
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
AAAI 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
AAAI 2025