Accepted to AIM-FM @ NeurIPS 2024

DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using LLMs

Rajat Rawat, Hudson McBride, Rajarshi Ghosh, Dhiyaan Nirmal, Jong Moon, Dhruv Alamuri

Abstract

As large language models (LLMs) gain traction in healthcare, concerns about their susceptibility to demographic biases are growing. We introduce DiversityMedQA, a novel benchmark designed to assess LLM responses to medical queries across diverse patient demographics, such as gender and ethnicity. By perturbing questions from the MedQA dataset, which comprises medical board exam questions, we created a benchmark that captures the nuanced differences in medical diagnosis across varying patient profiles. To ensure that our perturbations did not alter the clinical outcomes, we implemented a filtering strategy to validate each perturbation, so that any performance discrepancies would be indicative of bias. Our findings reveal notable discrepancies in model performance when tested against these demographic variations. By releasing DiversityMedQA, we provide a resource for evaluating and mitigating demographic bias in LLM medical diagnoses.
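The perturbation step described above can be sketched in a few lines. The snippet below is an illustrative example only, not the paper's implementation: it swaps gendered terms in a MedQA-style question stem, producing a demographically perturbed variant whose model answer can then be compared against the original. The term list and function name are assumptions for demonstration.

```python
import re

# Hypothetical term map for a gender perturbation (illustrative, not from the paper).
GENDER_SWAPS = {
    "man": "woman", "woman": "man",
    "male": "female", "female": "male",
    "he": "she", "she": "he",
    "his": "her", "her": "his",
}

def perturb_gender(question: str) -> str:
    """Swap gendered terms in a question stem, preserving capitalization."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = GENDER_SWAPS[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl

    pattern = re.compile(
        r"\b(" + "|".join(GENDER_SWAPS) + r")\b", flags=re.IGNORECASE
    )
    return pattern.sub(swap, question)

original = "A 45-year-old man presents with chest pain. He says his symptoms began an hour ago."
perturbed = perturb_gender(original)
print(perturbed)
# → A 45-year-old woman presents with chest pain. She says her symptoms began an hour ago.
```

In the benchmark itself, each perturbed question would additionally pass through the validation filter mentioned in the abstract, so that only perturbations which leave the correct clinical answer unchanged are kept.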

Citation

Rajat Rawat, Hudson McBride, Rajarshi Ghosh, Dhiyaan Nirmal, Jong Moon, Dhruv Alamuri. "DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using LLMs". Accepted to AIM-FM @ NeurIPS 2024.

