Abstract
We present EnDive (English Dialect Variability Evaluation), a benchmark for assessing the fairness and robustness of large language models across English dialects. EnDive spans five major dialects—African American Vernacular English (AAVE), Indian English, British English, Australian English, and Standard American English—and covers tasks including sentiment analysis, natural language inference, and question answering. We find significant performance disparities across dialects, with models consistently underperforming on AAVE and Indian English inputs relative to Standard American English.