Colorism in Multimodal AI: An Empirical Exploration of Socioeconomic Linguistic Bias in Text-to-Image Generation

Raj Gaurav Maurya; Vaibhav Shukla; Sreedath Panat

EACL 2026

Abstract

The recent rapid real-world adoption of vision-language models (VLMs) raises concerns about how social biases encoded in language may propagate into visual generation. In this work, we examine whether socioeconomic stereotypes, expressed through occupation and income-related linguistic cues in prompts, systematically influence skin-tone representations in text-to-image (T2I) generation, with a focus on colorism as a visual marker of social inequality. We first benchmark 3 small VLMs and 60 human annotators on the Monk Skin Tone (MST) scale using the MST-E dataset. We then conduct a large-scale T2I generation study in which we systematically vary the linguistic framing of income in prompts describing 210 occupations, producing over 2,500 portraits across 3 large VLMs. The skin-tone audit of the portraits by the best-performing annotator (GPT-5 mini) reveals strong color bias: high-income prompts consistently produce lighter-skinned faces, with prompt constraints only modestly attenuating this effect. Bias magnitude varies across generators, with GPT-5 Image-mini and Gemini-2.5 Flash-Image exhibiting more pronounced shifts in MST than Grok-2 Image. Our findings indicate that VLMs encode and amplify ethnoracialized socioeconomic stereotypes in language-conditioned image generation, underscoring the need for cross-modal fairness audits and human-centered evaluations.
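The comparison described in the abstract, MST scores for high-income versus low-income prompt framings broken down by generator, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the record layout, the generator names, the function `mst_shift_by_generator`, and the toy scores are hypothetical and are not the authors' pipeline or results. The only fixed fact used is that the MST scale runs from 1 (lightest) to 10 (darkest).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical audit records: (generator, income framing, annotated MST score).
# The MST scale runs from 1 (lightest) to 10 (darkest). These values are
# invented for illustration and are NOT results from the paper.
records = [
    ("generator_a", "high", 3), ("generator_a", "high", 4),
    ("generator_a", "low", 6), ("generator_a", "low", 5),
    ("generator_b", "high", 5), ("generator_b", "high", 4),
    ("generator_b", "low", 5), ("generator_b", "low", 6),
]


def mst_shift_by_generator(rows):
    """Return mean MST (low-income) minus mean MST (high-income) per generator.

    A positive value means high-income prompts yielded lighter skin tones.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for generator, framing, mst in rows:
        if not 1 <= mst <= 10:
            raise ValueError(f"MST score out of range: {mst}")
        scores[generator][framing].append(mst)
    return {
        gen: mean(by_framing["low"]) - mean(by_framing["high"])
        for gen, by_framing in scores.items()
    }


shifts = mst_shift_by_generator(records)
for generator, shift in sorted(shifts.items()):
    print(f"{generator}: mean MST shift (low - high) = {shift:+.2f}")
```

A difference of means is only one possible summary; a real audit would also need a significance test and a check on annotator reliability, neither of which this sketch attempts.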

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Fairness
Deep Learning > Architectures > Transformers

Keywords

text-to-image generation, vision-language model, socioeconomic bias, skin tone, facial recognition bias
