Accepted to NAACL SRW 2025

Truth Decay: Quantifying Multi-Turn Sycophancy in Language Models

Joshua Liu, Aarav Jain, Srihan Vege, Soham T

Abstract

Rapid improvements in large language models have unveiled a critical challenge in human-AI interaction: sycophancy. In this context, sycophancy refers to the tendency of models to excessively agree with or flatter users, often at the expense of factual accuracy. While previous studies have primarily analyzed this behavior in single-turn interactions, its persistence and evolution in multi-turn conversations remain largely unexplored. We introduce Truth Decay, a benchmark specifically designed to evaluate sycophancy in extended dialogues, where language models must navigate iterative user feedback, challenges, and persuasion. By subjecting language models to multi-turn interactions, we find that sycophantic behavior can cause accuracy drops of up to 47%, with models progressively drifting away from factual correctness under persistent user influence. We further design prompts that elicit four types of sycophantic bias and propose strategies for reducing sycophancy.
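The multi-turn protocol the abstract describes can be pictured as a simple loop: ask a factual question, record the model's answer, push back with a contradictory claim, and repeat, tracking correctness per turn. The sketch below illustrates that bookkeeping only; `query_model` is a hypothetical stand-in for a real LLM call (here it simulates a model that caves after two pushbacks), and the question, dialogue phrasing, and turn count are illustrative assumptions, not the paper's actual setup.

```python
def query_model(history):
    # Hypothetical model stub: answers correctly at first, then agrees
    # with the user's wrong claim once it has been challenged twice.
    # A real evaluation would call an LLM API here instead.
    pushbacks = sum(1 for turn in history
                    if turn["role"] == "user"
                    and "are you sure" in turn["content"].lower())
    return "Paris" if pushbacks < 2 else "Lyon"

def run_dialogue(question, correct, wrong, turns=4):
    """Return per-turn correctness for one question under repeated pushback."""
    history = [{"role": "user", "content": question}]
    record = []
    for _ in range(turns):
        answer = query_model(history)
        history.append({"role": "assistant", "content": answer})
        record.append(answer == correct)
        # Challenge the model with the incorrect claim before the next turn.
        history.append({"role": "user",
                        "content": f"Are you sure? I read that it is {wrong}."})
    return record

record = run_dialogue("What is the capital of France?", "Paris", "Lyon")
print(record)  # correctness per turn: [True, True, False, False]
```

Averaging such per-turn records over a question set yields an accuracy-versus-turn curve, whose decline is the kind of degradation (up to 47% in the paper's experiments) that the benchmark quantifies.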

Citation

Joshua Liu, Aarav Jain, Srihan Vege, Soham T. "Truth Decay: Quantifying Multi-Turn Sycophancy in Language Models". Accepted to NAACL SRW 2025.

Details

Conference
Accepted to NAACL SRW 2025
Authors
4 authors
