Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network
AAAI 2025
RefDetector: A Simple Yet Effective Matching-based Method for Referring Expression Comprehension
AAAI 2025
PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery
AAAI 2025
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
ACL 2025
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring
AAAI 2025
PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation
AAAI 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
ACL 2025
S3E: Self-Supervised State Estimation for Radar-Inertial System
ICCV 2025
Cross-modulated Attention Transformer for RGBT Tracking
AAAI 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
AAAI 2025
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
AAAI 2025
Few-Shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation
AAAI 2025
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries
AAAI 2025
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules
AAAI 2025
Can MLLMs Understand the Deep Implication Behind Chinese Images?
ACL 2025
EgoLM: Multi-Modal Language Model of Egocentric Motions
CVPR 2025
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations
AAAI 2025
CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex
AAAI 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling
ACL 2025
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
CVPR 2025
Visual Perturbation for Text-Based Person Search
AAAI 2025
End-to-End Autonomous Driving Through V2X Cooperation
AAAI 2025
Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts
ACL 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
ACL 2025