Accepted to NAACL SRW 2025

Truth Decay: Quantifying Multi-Turn Sycophancy in Language Models

Joshua Liu, Aarav Jain, Srihan Vege, Soham T

Abstract

Rapid improvements in large language models have unveiled a critical challenge in human-AI interaction: sycophancy. In this context, sycophancy refers to the tendency of models to excessively agree with or flatter users, often at the expense of factual accuracy. While previous studies have primarily analyzed this behavior in single-turn interactions, its persistence and evolution in multi-turn conversations remain largely unexplored. We introduce Truth Decay, a benchmark specifically designed to evaluate sycophancy in extended dialogues, where language models must navigate iterative user feedback, challenges, and persuasion. By subjecting language models to multi-turn interactions, we find that sycophantic behavior can cause accuracy drops of up to 47%, with models progressively drifting away from factual correctness under persistent user influence. We further design prompts that elicit four types of sycophantic bias and propose strategies for reducing sycophancy.
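The multi-turn protocol the abstract describes can be pictured as a simple loop: ask a factual question, record the model's answer, push back with a contradictory claim, and repeat, tracking correctness per turn. The sketch below illustrates that bookkeeping only; `query_model` is a hypothetical stand-in for a real LLM call (here it simulates a model that caves after two pushbacks), and the question, dialogue phrasing, and turn count are illustrative assumptions, not the paper's actual setup.

```python
def query_model(history):
    # Hypothetical model stub: answers correctly at first, then agrees
    # with the user's wrong claim once it has been challenged twice.
    # A real evaluation would call an LLM API here instead.
    pushbacks = sum(1 for turn in history
                    if turn["role"] == "user"
                    and "are you sure" in turn["content"].lower())
    return "Paris" if pushbacks < 2 else "Lyon"

def run_dialogue(question, correct, wrong, turns=4):
    """Return per-turn correctness for one question under repeated pushback."""
    history = [{"role": "user", "content": question}]
    record = []
    for _ in range(turns):
        answer = query_model(history)
        history.append({"role": "assistant", "content": answer})
        record.append(answer == correct)
        # Challenge the model with the incorrect claim before the next turn.
        history.append({"role": "user",
                        "content": f"Are you sure? I read that it is {wrong}."})
    return record

record = run_dialogue("What is the capital of France?", "Paris", "Lyon")
print(record)  # correctness per turn: [True, True, False, False]
```

Averaging such per-turn records over a question set yields an accuracy-versus-turn curve, whose decline is the kind of degradation (up to 47% in the paper's experiments) that the benchmark quantifies.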

Citation

Joshua Liu, Aarav Jain, Srihan Vege, Soham T. "Truth Decay: Quantifying Multi-Turn Sycophancy in Language Models". Accepted to NAACL SRW 2025.

Details

Conference
Accepted to NAACL SRW 2025
Authors
4 authors
