Papers
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao, Qiuna Tan, Guanting Dong et al.
Error-driven Data-efficient Large Multimodal Model Tuning
Barry Menglong Yao, Qifan Wang, Lifu Huang
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
Weikai Lu, Hao Peng, Huiping Zhuang et al.
Do Multimodal Large Language Models Truly See What We Point At? Investigating Indexical, Iconic, and Symbolic Gesture Comprehension
Noriki Nishida, Koji Inoue, Hideki Nakayama et al.
WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models
Zheng Hui, Yinheng Li, Dan Zhao et al.
UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging
Huaizhi Qu, Xinyu Zhao, Jie Peng et al.
Harnessing PDF Data for Improving Japanese Large Multimodal Models
Jeonghun Baek, Akiko Aizawa, Kiyoharu Aizawa
Shadow-Activated Backdoor Attacks on Multimodal Large Language Models
Ziyi Yin, Muchao Ye, Yuanpu Cao et al.
Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models
Kening Zheng, Junkai Chen, Yibo Yan et al.
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
Jiamin Su, Yibo Yan, Fangteng Fu et al.
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web
Hongcheng Guo, Wei Zhang, Junhao Chen et al.
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Jaewoo Lee, Keyang Xuan, Chanakya Ekbote et al.
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
You Li, Heyu Huang, Chi Chen et al.
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges
Yibo Yan, Jiamin Su, Jianxiang He et al.
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
William Rudman, Michal Golovanevsky, Amir Bar et al.
MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models
Bohan Jin, Shuhan Qi, Kehai Chen et al.
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Zichen Wen, Yifeng Gao, Weijia Li et al.
WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code
Zhiyu Lin, Zhengda Zhou, Zhiyuan Zhao et al.
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
Yuhan Fu, Ruobing Xie, Xingwu Sun et al.
AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
Zhibin Lan, Liqiang Niu, Fandong Meng et al.
Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation
Yunsoo Kim, Jinge Wu, Su Hwan Kim et al.
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA
Qianqi Yan, Xuehai He, Xiang Yue et al.
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
Run Luo, Haonan Zhang, Longze Chen et al.
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu, Tongkun Guan, Zining Wang et al.