Papers
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
Jiaqi Li, Qianshan Wei, Chuanyi Zhang et al.
ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model
Yiming Sun, Fan Yu, Shaoxiang Chen et al.
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Mingrui Wu, Xinyue Cai, Jiayi Ji et al.
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
Ziqiang Liu, Feiteng Fang, Xi Feng et al.
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
Yichi Zhang, Yao Huang, Yitong Sun et al.
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue, Yulin Wang, Bingyi Kang et al.
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu, Muyan Zhong, Sen Xing et al.
Multimodal Large Language Models Make Text-to-Image Generative Models Align Better
Xun Wu, Shaohan Huang, Guolong Wang et al.
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Yang Jiao, Shaoxiang Chen, Zequn Jie et al.
WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
Junlin Xie, Ruifei Zhang, Zhihong Chen et al.
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
Ye Fang, Zeyi Sun, Tong Wu et al.
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
Chaoya Jiang, Hongrui Jia, Haiyang Xu et al.
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models
Baao Xie, Qiuyu Chen, Yunnan Wang et al.
RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
Haoyu Chen, Wenbo Li, Jinjin Gu et al.
A Concept-Based Explainability Framework for Large Multimodal Models
Jayneel Parekh, Pegah Khayatan, Mustafa Shukor et al.
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
Junxian Li, Di Zhang, Xunzhi Wang et al.
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
Daechul Ahn, Yura Choi, San Kim et al.
Graphic Design with Large Multimodal Model
Yutao Cheng, Zhao Zhang, Maoke Yang et al.
AIM: Let Any Multimodal Large Language Models Embrace Efficient In-Context Learning
Jun Gao, Qian Qiao, Tianxiang Wu et al.
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.
Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
Xijie Huang, Xinyuan Wang, Hantao Zhang et al.
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin, Mingbao Lin, Luxi Lin et al.
Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
Chutian Meng, Fan Ma, Jiaxu Miao et al.
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
Yeji Park, Deokyeong Lee, Junsuk Choe et al.
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution
Wentao Tan, Qiong Cao, Yibing Zhan et al.