Papers
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye, Anwen Hu, Haiyang Xu et al.
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
Yuyang Ye, Zhi Zheng, Yishan Shen et al.
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen, Zhijie Deng, Kening Zheng et al.
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li, Renping Zhou, Jiawei Zhou et al.
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Ziyue Wang, Yurui Dong, Fuwen Luo et al.
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
Shuo Chen, Zhen Han, Bailan He et al.
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen, Gongwei Chen, Rui Shao et al.
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si et al.
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Xianfu Cheng, Wei Zhang, Shiwei Zhang et al.
SignAlignLM: Integrating Multimodal Sign Language Processing into Large Language Models
Mert Inan, Anthony Sicilia, Malihe Alikhani
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
Haochen Xue, Feilong Tang, Ming Hu et al.
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities
Zhiyuan Li, Heng Wang, Dongnan Liu et al.
Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models
Boyu Jia, Junzhe Zhang, Huixuan Zhang et al.
Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression
Roy H. Jennings, Genady Paikin, Roy Shaul et al.
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
Teng Ma, Xiaojun Jia, Ranjie Duan et al.
Looking Beyond Text: Reducing Language Bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance
Haozhe Zhao, Shuzheng Si, Liang Chen et al.
Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output
Zusheng Tan, Xinyi Zhong, Jing-Yu Ji et al.
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu, Muyan Zhong, Sen Xing et al.
MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models
Jiahao Huo, Yibo Yan, Xu Zheng et al.
Can We Trust AI Doctors? A Survey of Medical Hallucination in Large Language and Large Vision-Language Models
Zhihong Zhu, Yunyan Zhang, Xianwei Zhuang et al.
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
Zhenyue Qin, Yu Yin, Dylan Campbell et al.
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
Hengyi Wang, Haizhou Shi, Shiwei Tan et al.
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu, Zeyang Zhou, Kexin Huang et al.