EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models

December 1, 2025

Accepted to Building Trust in LLMs @ ICLR 2025

Authors: Abhay Gupta, Jacob Cheung, Philip Meng, Shayan Sayyed

We present EnDive (English Dialect Variability Evaluation), a benchmark designed to assess the fairness and robustness of large language models across English dialects. EnDive spans five major English dialects—African American Vernacular English (AAVE), Indian English, British English, Australian English, and Standard American English—covering tasks including sentiment analysis, natural language inference, and question answering. We find significant performance disparities across dialects, with models consistently underperforming on AAVE and Indian English inputs.
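As a rough illustration of what a cross-dialect performance disparity means in practice, the sketch below computes per-dialect accuracy and each dialect's gap to a reference dialect. It is a minimal sketch and not the authors' evaluation code: the function names, the `(dialect, gold, predicted)` record layout, the `"SAE"` reference key, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_dialect(records):
    """Return {dialect: accuracy} from (dialect, gold, predicted) tuples.

    The record layout is a hypothetical one chosen for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dialect, gold, predicted in records:
        total[dialect] += 1
        correct[dialect] += int(gold == predicted)
    return {d: correct[d] / total[d] for d in total}


def gap_to_reference(accuracy, reference="SAE"):
    """Return {dialect: reference accuracy minus dialect accuracy}.

    A positive value means the model scores lower on that dialect
    than on the reference dialect.
    """
    ref = accuracy[reference]
    return {d: ref - a for d, a in accuracy.items() if d != reference}


# Toy, made-up records purely to show the call pattern (not real results).
toy_records = [
    ("SAE", "pos", "pos"),
    ("SAE", "neg", "neg"),
    ("AAVE", "pos", "neg"),
    ("AAVE", "neg", "neg"),
]
toy_accuracy = accuracy_by_dialect(toy_records)
toy_gaps = gap_to_reference(toy_accuracy)
```

Reporting the gap against a single reference dialect is only one possible design choice; comparing every pair of dialects, or reporting the worst-case dialect, would be equally reasonable summaries.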
