PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models

Minsung Kim

WACV 2026

Abstract

Personalization, which uses user-specific data to generate tailored responses, has been increasingly adopted in user-centric domains in recent years. However, while personalization has been actively studied for Large Language Models (LLMs), research on the personalization capabilities of Large Vision-Language Models (LVLMs) remains limited. To systematically evaluate these capabilities, we introduce PerVL-Bench, a synthetic benchmark designed for this purpose. PerVL-Bench incorporates user-specific data, including multiple images and long text information, and provides two types of QA pairs. We use PerVL-Bench to comprehensively evaluate the capabilities essential for personalization in current state-of-the-art LVLMs. This evaluation reveals the limitations of current models in multimodal personalization and provides insights for the development of personalized LVLMs. We publicly release PerVL-Bench and the accompanying experimental code to advance future research: https://github.com/MSungK/PerVL-Bench
