Abstract
This study investigates skin tone bias in large language models when they evaluate common dermatological conditions. We systematically analyze model performance across diverse skin tones and find significant disparities that could lead to inequitable healthcare outcomes. Our findings highlight the need for more representative training data and for bias mitigation strategies in medical AI systems. [arXiv link TBA]