Accepted to Mech Interp @ NeurIPS 2025

Mitigating Sycophancy in Language Models via Sparse Activation Fusion and Multi-Layer Activation Steering

Pyae Phoo Min, Avigya Paudel, Naufal Adityo, Arthur Zhu, Andrew Rufail

Abstract

Abstract coming soon. This paper has been accepted but the arXiv preprint is not yet available.

Citation

Pyae Phoo Min, Avigya Paudel, Naufal Adityo, Arthur Zhu, Andrew Rufail. "Mitigating Sycophancy in Language Models via Sparse Activation Fusion and Multi-Layer Activation Steering". Accepted to Mech Interp @ NeurIPS 2025.
