Accepted to High School Track @ NeurIPS 2024
Authors: Abhay Gupta, Philip Meng, Ece Yurtseven
Large language models (LLMs) frequently generate plausible-sounding but factually incorrect outputs, known as hallucinations. We introduce AAVENUE (Activation Analysis for Verifying Extensive Neural Unit Explanations), a novel approach that detects hallucinations by analyzing internal model activations during generation. Our method identifies characteristic activation patterns associated with hallucinated content, enabling real-time detection without requiring external knowledge bases. AAVENUE achieves 87% accuracy on hallucination detection across diverse domains, significantly outperforming baseline approaches. We release our trained detection models and a benchmark dataset of labeled hallucinations.
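The core idea of the paper, training a lightweight classifier (a linear probe) on internal activations to separate faithful from hallucinated generations, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the activation vectors below are synthetic Gaussian stand-ins for real LLM hidden states, and the dimensions, cluster shift, and training hyperparameters are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # stand-in for the hidden-state dimension of an LLM layer

# Synthetic activations (illustrative only): faithful and hallucinated
# generations are modeled as two Gaussian clusters whose means differ,
# mimicking a linearly separable activation signature.
faithful = rng.normal(0.0, 1.0, size=(500, DIM))
hallucinated = rng.normal(0.4, 1.0, size=(500, DIM))

X = np.vstack([faithful, hallucinated])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = hallucinated

# Shuffle, then split into train and held-out test sets
idx = rng.permutation(len(X))
X, y = X[idx], y[idx]
X_tr, X_te, y_tr, y_te = X[:800], X[800:], y[:800], y[800:]

# Linear probe: logistic regression fit by full-batch gradient descent
w = np.zeros(DIM)
b = 0.0
lr = 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))  # sigmoid probabilities
    w -= lr * (X_tr.T @ (p - y_tr)) / len(y_tr)
    b -= lr * np.mean(p - y_tr)

preds = (X_te @ w + b) > 0
accuracy = np.mean(preds == y_te)
print(f"probe accuracy on held-out set: {accuracy:.2f}")
```

In a real pipeline, `faithful` and `hallucinated` would instead hold hidden states extracted from the model during generation (for example, per-token activations from an intermediate transformer layer, mean-pooled over the generated span), with labels coming from a human- or retrieval-annotated dataset such as the benchmark the paper releases.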

