DiversityMedQA: A Benchmark for Assessing Demographic Biases in Medical Diagnosis using Large Language Models

Rajat Rawat; Hudson McBride; Rajarshi Ghosh; Dhiyaan Nirmal; Jong Moon; Dhruv Alamuri; Sean O'Brien; Kevin Zhu

EMNLP 2024

Abstract

As large language models (LLMs) gain traction in healthcare, concerns about their susceptibility to demographic biases are growing. We introduce DiversityMedQA, a novel benchmark designed to assess LLM responses to medical queries across diverse patient demographics, such as gender and ethnicity. By perturbing questions from the MedQA dataset, which comprises medical board exam questions, we created a benchmark that captures the nuanced differences in medical diagnosis across varying patient profiles. To ensure that our perturbations did not alter the clinical outcomes, we implemented a filtering strategy to validate each perturbation, so that any performance discrepancies would be indicative of bias. Our findings reveal notable discrepancies in model performance when tested against these demographic variations. By releasing DiversityMedQA, we provide a resource for evaluating and mitigating demographic bias in LLM medical diagnoses.

Topics

Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Application Areas > Fairness

Keywords

medical diagnosis, demographic bias, large language model
