Accepted to Building Trust in LLMs @ ICLR 2025
Authors: Abhay Gupta, Jacob Cheung, Philip Meng, Shayan Sayyed
We present EnDive (English Dialect Variability Evaluation), a benchmark designed to assess the fairness and robustness of large language models across English dialects. EnDive spans five major English dialects—African American Vernacular English (AAVE), Indian English, British English, Australian English, and Standard American English—covering tasks including sentiment analysis, natural language inference, and question answering. We find significant performance disparities across dialects, with models consistently underperforming on AAVE and Indian English inputs.

