2025 WACV WACV 2025

Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models

Abstract

We present a comprehensive three-phase study to examine (1) the cultural understanding of Large Multimodal Models (LMMs) by introducing DALLE STREET a large-scale dataset generated by DALL-E 3 and validated by humans containing 9935 images of 67 countries and 10 concept classes; (2) the underlying implicit and potentially stereotypical cultural associations with a cultural artifact extraction task; and (3) an approach to adapt cultural representation in an image based on extracted associations using a modular pipeline CULTUREADAPT. We find disparities in cultural understanding at geographic sub-region levels with both open-source (LLaVA) and closed-source (GPT-4V) models on DALLE STREET and other existing benchmarks which we try to understand using over 18000 artifacts that we identify in association to different countries. Our findings reveal a nuanced picture of the cultural competence of LMMs highlighting the need to develop culture-aware systems.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio