Multi-Modal Learning

1457 directly classified papers

Papers

FIRM: Flexible Interactive Reflection ReMoval AAAI 2025

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models ICCV 2025

A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation AAAI 2025

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method CVPR 2025

A Video-grounded Dialogue Dataset and Metric for Event-driven Activities AAAI 2025

DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis AAAI 2025

Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning ACL 2025

Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection AAAI 2025

Dynamic Syntactic Feature Filtering and Injecting Networks for Cross-lingual Dependency Parsing AAAI 2025

Cross-View Referring Multi-Object Tracking AAAI 2025

Probing Relative Interaction and Dynamic Calibration in Multi-modal Entity Alignment ACL 2025

CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization AAAI 2025

GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs AAAI 2025

Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech AAAI 2025

SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models ACL 2025

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives ICCV 2025

Visual Perturbation for Text-Based Person Search AAAI 2025

External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection AAAI 2025

mmFAS: Multimodal Face Anti-Spoofing Using Multi-Level Alignment and Switch-Attention Fusion AAAI 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing AAAI 2025

WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network AAAI 2025

DAMMFND: Domain-Aware Multimodal Multi-view Fake News Detection AAAI 2025

Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency AAAI 2025

MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models AAAI 2025

Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions ACL 2025