Research Explorer
Multi-Modal Learning
29 directly classified papers
Papers per year
2013: 1
2015: 1
2017: 1
2018: 1
2019: 1
2020: 2
2021: 3
2022: 6
2023: 4
2024: 3
2025: 6
Papers
Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
ACL 2025
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
EMNLP 2025
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking
CVPR 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
AAAI 2025
One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception
CVPR 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
ACL 2025
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
AAAI 2024
Single Image Reflection Separation via Dual-Stream Interactive Transformers
NeurIPS 2024
Event-assisted Low-Light Video Object Segmentation
CVPR 2024
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
CVPR 2023
A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship
NeurIPS 2023
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction
CVPR 2023
iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition
CVPR 2023
A Unified Sequence Interface for Vision Tasks
NeurIPS 2022
Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning
WACV 2022
Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection
CVPR 2022
Cross-Modal Transferable Adversarial Attacks From Images to Videos
CVPR 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
NeurIPS 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
NeurIPS 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
CVPR 2021
Binaural Audio-Visual Localization
AAAI 2021
Robust Multi-Modality Person Re-identification
AAAI 2021
Fusing Wearable IMUs With Multi-View Images for Human Pose Estimation: A Geometric Approach
CVPR 2020
Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching
AAAI 2020
Generating Question Relevant Captions to Aid Visual Question Answering
ACL 2019