
Accepted to ACL SRW 2025

Causal Language Control in Multilingual Transformers via Sparse Feature Steering

Tim Chou, George Liu

Abstract

Deterministically controlling the target generation language of multilingual large language models (LLMs) remains a fundamental challenge, particularly in zero-shot settings where neither explicit language prompts nor fine-tuning is available. We investigate whether sparse autoencoder (SAE) features can be leveraged to steer the generated language of LLMs during inference. Using pretrained SAEs on the residual streams of Gemma-2B and Gemma-9B, we identify features whose activations differ most significantly between English and four target languages: Chinese, Japanese, Spanish, and French. By modifying just a single SAE feature at one transformer layer, we achieve controlled language shifts with up to 90% success, as measured by FastText language classification, while preserving semantic fidelity according to LaBSE similarity.
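The steering recipe described in the abstract — pick the SAE feature whose activation differs most between English and the target language, then add that feature's decoder direction to the residual stream at one layer — can be sketched as follows. This is a minimal NumPy illustration: the function names, the mean-activation-difference selection rule, and the unit-normalized steering step are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def select_language_feature(acts_en: np.ndarray, acts_target: np.ndarray) -> int:
    """Pick the SAE feature index whose mean activation differs most
    between English inputs and target-language inputs.

    acts_en, acts_target: arrays of shape (num_tokens, num_features)
    holding SAE feature activations for each language's inputs.
    (A hypothetical selection criterion; the paper may rank features differently.)
    """
    diff = acts_target.mean(axis=0) - acts_en.mean(axis=0)
    return int(np.argmax(diff))

def steer_residual(residual: np.ndarray, decoder: np.ndarray,
                   feature_idx: int, alpha: float) -> np.ndarray:
    """Add alpha times the chosen feature's decoder direction to the
    residual-stream activation at a single transformer layer.

    residual: shape (d_model,); decoder: shape (num_features, d_model),
    the SAE decoder matrix whose rows are feature directions.
    """
    direction = decoder[feature_idx]
    direction = direction / np.linalg.norm(direction)  # unit-norm (an assumption)
    return residual + alpha * direction
```

In practice this intervention would run inside a forward hook on the chosen layer of Gemma-2B or Gemma-9B, applying `steer_residual` to every token position during generation.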

Citation

Tim Chou, George Liu. "Causal Language Control in Multilingual Transformers via Sparse Feature Steering". Accepted to ACL SRW 2025.

