Multimodal Learning

24 directly classified papers

Papers

Model-free Domain Adaptation for Concealed Multimodal Large-Language Models WACV 2026

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs’ Capability via Chart Editing ACL 2025

MLAN: Language-Based Instruction Tuning Preserves and Transfers Knowledge in Multimodal Language Models ACL 2025

Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR ACL 2025

On Domain-Adaptive Post-Training for Multimodal Large Language Models EMNLP 2025

The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers CVPR 2025

GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model CVPR 2025

UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks EMNLP 2025

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus ACL 2025

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension ACL 2025

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models NeurIPS 2024

Multimodal Instruction Tuning with Conditional Mixture of LoRA ACL 2024

L+M-24: Building a Dataset for Language+Molecules @ ACL 2024 ACL 2024

ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation ACL 2024

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes EMNLP 2024

MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding EMNLP 2024

Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior AAAI 2024

Mind Reader: Reconstructing complex images from brain activities NeurIPS 2022

CapOnImage: Context-driven Dense-Captioning on Image EMNLP 2022

ViLMedic: a framework for research at the intersection of vision and language in medical AI ACL 2022

ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT EMNLP 2020

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning EMNLP 2020

Learning to Represent Image and Text with Denotation Graph EMNLP 2020

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors AAAI 2019