Accepted to High School Track @ NeurIPS 2024

AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark

Abhay Gupta, Philip Meng, Ece Yurtseven

Abstract

Large language models (LLMs) frequently generate plausible-sounding but factually incorrect outputs, known as hallucinations. We introduce AAVENUE (Activation Analysis for Verifying Extensive Neural Unit Explanations), a novel approach that detects hallucinations by analyzing internal model activations during generation. Our method identifies characteristic activation patterns associated with hallucinated content, enabling real-time detection without requiring external knowledge bases. AAVENUE achieves 87% accuracy on hallucination detection across diverse domains, significantly outperforming baseline approaches. We release our trained detection models and a benchmark dataset of labeled hallucinations.

Citation

Abhay Gupta, Philip Meng, Ece Yurtseven. "AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark". Accepted to High School Track @ NeurIPS 2024.

