Accepted to High School Track @ NeurIPS 2024

AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark

Abhay Gupta, Philip Meng, Ece Yurtseven

Abstract

Large language models (LLMs) frequently generate plausible-sounding but factually incorrect outputs, known as hallucinations. We introduce AAVENUE (Activation Analysis for Verifying Extensive Neural Unit Explanations), a novel approach that detects hallucinations by analyzing internal model activations during generation. Our method identifies characteristic activation patterns associated with hallucinated content, enabling real-time detection without requiring external knowledge bases. AAVENUE achieves 87% accuracy on hallucination detection across diverse domains, significantly outperforming baseline approaches. We release our trained detection models and a benchmark dataset of labeled hallucinations.

Citation

Abhay Gupta, Philip Meng, Ece Yurtseven. "AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark". Accepted to High School Track @ NeurIPS 2024.
