Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
ICCV 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
ICCV 2025
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
ICCV 2025
SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis
ICCV 2025
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
ICCV 2025
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
ICCV 2025
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
ICCV 2025
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
ICCV 2025
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
CVPR 2025
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
ICCV 2025
OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
ICCV 2025
Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
ICCV 2025
Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion
ICCV 2025
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection
ICCV 2025
EgoM2P: Egocentric Multimodal Multitask Pretraining
ICCV 2025
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
ICCV 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
PanSt3R: Multi-view Consistent Panoptic Segmentation
ICCV 2025
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
ICCV 2025
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
CVPR 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
CVPR 2025
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
CVPR 2025
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion
CVPR 2025
Multimodal Neural Machine Translation: A Survey of the State of the Art
EMNLP 2025
Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models
EMNLP 2025