Multi-Modal Learning

115 directly classified papers

Papers

MIBench: Evaluating Multimodal Large Language Models over Multiple Images EMNLP 2024

AnyTrans: Translate AnyText in the Image with Large Scale Models EMNLP 2024

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models EMNLP 2024

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features CVPR 2024

GPT4Point: A Unified Framework for Point-Language Understanding and Generation CVPR 2024

CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation CVPR 2024

LangSplat: 3D Language Gaussian Splatting CVPR 2024

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model CVPR 2024

Video ReCap: Recursive Captioning of Hour-Long Videos CVPR 2024

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks CVPR 2024

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection CVPR 2024

Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation EMNLP 2024

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology AAAI 2024

ReMI: A Dataset for Reasoning with Multiple Images NIPS 2024

Efficient Large Multi-modal Models via Visual Context Compression NIPS 2024

Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE NIPS 2024

Multimodal Large Language Models Make Text-to-Image Generative Models Align Better NIPS 2024

Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding ACL 2023

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark NIPS 2023

Unifying Text, Tables, and Images for Multimodal Question Answering EMNLP 2023

Retrieving Multimodal Information for Augmented Generation: A Survey EMNLP 2023

i-Code: An Integrative and Composable Multimodal Learning Framework AAAI 2023

Translation between Molecules and Natural Language EMNLP 2022

Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation CVPR 2022

Lexi: Self-Supervised Learning of the UI Language EMNLP 2022