Research Explorer
Multi-Modal Learning
115 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 1
2018: 1
2019: 1
2020: 3
2021: 3
2022: 7
2023: 5
2024: 35
2025: 57
Papers
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
EMNLP 2024
AnyTrans: Translate AnyText in the Image with Large Scale Models
EMNLP 2024
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
EMNLP 2024
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
CVPR 2024
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
CVPR 2024
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
CVPR 2024
LangSplat: 3D Language Gaussian Splatting
CVPR 2024
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
CVPR 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
CVPR 2024
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
CVPR 2024
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
CVPR 2024
Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation
EMNLP 2024
PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology
AAAI 2024
ReMI: A Dataset for Reasoning with Multiple Images
NIPS 2024
Efficient Large Multi-modal Models via Visual Context Compression
NIPS 2024
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE
NIPS 2024
Multimodal Large Language Models Make Text-to-Image Generative Models Align Better
NIPS 2024
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
ACL 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NIPS 2023
Unifying Text, Tables, and Images for Multimodal Question Answering
EMNLP 2023
Retrieving Multimodal Information for Augmented Generation: A Survey
EMNLP 2023
i-Code: An Integrative and Composable Multimodal Learning Framework
AAAI 2023
Translation between Molecules and Natural Language
EMNLP 2022
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
CVPR 2022
Lexi: Self-Supervised Learning of the UI Language
EMNLP 2022