Papers
Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?
Simeon Junker, Manar Ali, Larissa Koch et al.
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Yingjin Song, Yupei Du, Denis Paperno et al.
MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models
Gio Paik, Geewook Kim, Jinbae Im
Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models
Atharva Bhargude, Ishan Gonehal, Dave Yoon et al.
Coling-UniA at SciVQA 2025: Few-Shot Example Retrieval and Confidence-Informed Ensembling for Multimodal Large Language Models
Christian Jaumann, Annemarie Friedrich, Rainer Lienhart
Probing Multimodal Large Language Models for Global and Local Semantic Representations
Mingxu Tao, Quzhe Huang, Kun Xu et al.
MLLM-I2W: Harnessing Multimodal Large Language Model for Zero-Shot Composed Image Retrieval
Tong Bao, Che Liu, Derong Xu et al.
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
Zijun Chen, Wenbo Hu, Guande He et al.
Context-Informed Machine Translation of Manga using Multimodal Large Language Models
Philip Lippmann, Konrad Skublicki, Joshua Tanner et al.
RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback
Guoqing Chen, Fu Zhang, Jinghao Lin et al.
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
Jinfa Huang, Jinsheng Pan, Zhongwei Wan et al.
Unveiling Fake News with Adversarial Arguments Generated by Multimodal Large Language Models
Xiaofan Zheng, Minnan Luo, Xinghao Wang
LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model
Tao Sun, Oliver Liu, JinJin Li et al.
A Multimodal Large Language Model “Foresees” Objects Based on Verb Information but Not Gender
Shuqi Wang, Xufeng Duan, Zhenguang Cai
CIMI4D: A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions
Ming Yan, Xin Wang, Yudi Dai et al.
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen, Leyang Shen, Rui Shao et al.
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain, Jianwei Yang, Humphrey Shi
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Karthik Mustikovela et al.
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang, Haiyang Xu, Mengfan Dong et al.
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji et al.
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang, Liangbin Xie, Xintao Wang et al.
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren, Linli Yao, Shicheng Li et al.
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si et al.