Accepted to NAACL SRW 2025
Authors: Joshua Liu, Aarav Jain, Srihan Vege, Soham T
Rapid improvements in large language models have unveiled a critical challenge in human-AI interaction: sycophancy. In this context, sycophancy refers to the tendency of models to excessively agree with or flatter users, often at the expense of factual accuracy. While previous studies have primarily analyzed this behavior in single-turn interactions, its persistence and evolution in multi-step conversations remain largely unexplored. We introduce Truth Decay, a benchmark specifically designed to evaluate sycophancy in extended dialogues, where language models must navigate iterative user feedback, challenges, and persuasion. By subjecting language models to multi-turn interactions, we find that sycophantic behaviors can cause accuracy drops of up to 47%, with models progressively drifting away from factual correctness under persistent user influence. We further prompt models to elicit four types of sycophantic bias and propose strategies for reducing sycophancy.

