Papers
Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?
Simeon Junker, Manar Ali, Larissa Koch et al.
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Yingjin Song, Yupei Du, Denis Paperno et al.
MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models
Gio Paik, Geewook Kim, Jinbae Im
Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models
Atharva Bhargude, Ishan Gonehal, Dave Yoon et al.
Coling-UniA at SciVQA 2025: Few-Shot Example Retrieval and Confidence-Informed Ensembling for Multimodal Large Language Models
Christian Jaumann, Annemarie Friedrich, Rainer Lienhart
Probing Multimodal Large Language Models for Global and Local Semantic Representations
Mingxu Tao, Quzhe Huang, Kun Xu et al.
MLLM-I2W: Harnessing Multimodal Large Language Model for Zero-Shot Composed Image Retrieval
Tong Bao, Che Liu, Derong Xu et al.
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
Zijun Chen, Wenbo Hu, Guande He et al.
Context-Informed Machine Translation of Manga using Multimodal Large Language Models
Philip Lippmann, Konrad Skublicki, Joshua Tanner et al.
RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback
Guoqing Chen, Fu Zhang, Jinghao Lin et al.
Unveiling Fake News with Adversarial Arguments Generated by Multimodal Large Language Models
Xiaofan Zheng, Minnan Luo, Xinghao Wang
LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model
Tao Sun, Oliver Liu, JinJin Li et al.
A Multimodal Large Language Model “Foresees” Objects Based on Verb Information but Not Gender
Shuqi Wang, Xufeng Duan, Zhenguang Cai
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen, Leyang Shen, Rui Shao et al.
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain, Jianwei Yang, Humphrey Shi
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang, Haiyang Xu, Mengfan Dong et al.
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang, Liangbin Xie, Xintao Wang et al.
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren, Linli Yao, Shicheng Li et al.
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu et al.
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia, Dongchen Han, Yizeng Han et al.
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li et al.
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Ziang Yan, Zhilin Li, Yinan He et al.