Accepted to High School Track @ NeurIPS 2024
Authors: Abhay Gupta, Philip Meng, Ece Yurtseven
Large language models (LLMs) frequently generate plausible-sounding but factually incorrect outputs, known as hallucinations. We introduce AAVENUE (Activation Analysis for Verifying Extensive Neural Unit Explanations), a novel approach that detects hallucinations by analyzing internal model activations during generation. Our method identifies characteristic activation patterns associated with hallucinated content, enabling real-time detection without requiring external knowledge bases. AAVENUE achieves 87% accuracy on hallucination detection across diverse domains, significantly outperforming baseline approaches. We release our trained detection models and a benchmark dataset of labeled hallucinations.
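The core idea of the paper, training a lightweight classifier (a linear probe) on internal activations to separate faithful from hallucinated generations, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the activation vectors below are synthetic Gaussian stand-ins for real LLM hidden states, and the dimensions, cluster shift, and training hyperparameters are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # stand-in for the hidden-state dimension of an LLM layer

# Synthetic activations (illustrative only): faithful and hallucinated
# generations are modeled as two Gaussian clusters whose means differ,
# mimicking a linearly separable activation signature.
faithful = rng.normal(0.0, 1.0, size=(500, DIM))
hallucinated = rng.normal(0.4, 1.0, size=(500, DIM))

X = np.vstack([faithful, hallucinated])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = hallucinated

# Shuffle, then split into train and held-out test sets
idx = rng.permutation(len(X))
X, y = X[idx], y[idx]
X_tr, X_te, y_tr, y_te = X[:800], X[800:], y[:800], y[800:]

# Linear probe: logistic regression fit by full-batch gradient descent
w = np.zeros(DIM)
b = 0.0
lr = 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))  # sigmoid probabilities
    w -= lr * (X_tr.T @ (p - y_tr)) / len(y_tr)
    b -= lr * np.mean(p - y_tr)

preds = (X_te @ w + b) > 0
accuracy = np.mean(preds == y_te)
print(f"probe accuracy on held-out set: {accuracy:.2f}")
```

In a real pipeline, `faithful` and `hallucinated` would instead hold hidden states extracted from the model during generation (for example, per-token activations from an intermediate transformer layer, mean-pooled over the generated span), with labels coming from a human- or retrieval-annotated dataset such as the benchmark the paper releases.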

