Multi-Modal Learning

1457 directly classified papers

Papers

Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition AAAI 2025

JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation AAAI 2025

Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration AAAI 2025

S3E: Self-Supervised State Estimation for Radar-Inertial System ICCV 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing AAAI 2025

IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities AAAI 2025

Noisy Correspondence Rectification via Asymmetric Similarity Learning AAAI 2025

EgoLM: Multi-Modal Language Model of Egocentric Motions CVPR 2025

CoPEFT: Fast Adaptation Framework for Multi-Agent Collaborative Perception with Parameter-Efficient Fine-Tuning AAAI 2025

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation CVPR 2025

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models AAAI 2025

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models CVPR 2025

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models ICCV 2025

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method CVPR 2025

LRM-LLaVA: Overcoming the Modality Gap of Multilingual Large Language-Vision Model for Low-Resource Languages AAAI 2025

Multi-View Empowered Structural Graph Wordification for Language Models AAAI 2025

Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines AAAI 2025

Tensorized Attention for Understanding Multi-Object Relationships AAAI 2025

GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs AAAI 2025

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives ICCV 2025

Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency AAAI 2025

RefDetector: A Simple Yet Effective Matching-based Method for Referring Expression Comprehension AAAI 2025

UniMuMo: Unified Text, Music, and Motion Generation AAAI 2025

Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration AAAI 2025

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor AAAI 2025