Multi-Modal Learning

1457 directly classified papers

Papers

See Through Their Minds: Learning Transferable Brain Decoding Models from Cross-Subject fMRI AAAI 2025

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow ICCV 2025

COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks AAAI 2025

Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition AAAI 2025

Towards Audio-Visual Navigation in Noisy Environments: A Large-Scale Benchmark Dataset and an Architecture Considering Multiple Sound-Sources AAAI 2025

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis AAAI 2025

Semi-Supervised Multi-View Multi-Label Learning with View-Specific Transformer and Enhanced Pseudo-Label AAAI 2025

MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models AAAI 2025

Towards Multimodal Sentiment Analysis via Hierarchical Correlation Modeling with Semantic Distribution Constraints AAAI 2025

CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception AAAI 2025

Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation AAAI 2025

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models AAAI 2025

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation AAAI 2025

External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection AAAI 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing AAAI 2025

WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network AAAI 2025

MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models AAAI 2025

Asymmetric Cross-Modal Hashing Based on Formal Concept Analysis AAAI 2025

Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning AAAI 2025

Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset AAAI 2025

Cross-View Referring Multi-Object Tracking AAAI 2025

FIRM: Flexible Interactive Reflection ReMoval AAAI 2025

PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery AAAI 2025

Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network AAAI 2025

Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing AAAI 2025