Abstract
We investigate neuron universality in independently trained GPT-2 Small models, examining how universal neurons (neurons whose activations are consistently correlated across models) emerge and evolve throughout training. Analyzing five GPT-2 models at three checkpoints (100k, 200k, and 300k steps), we identify universal neurons through pairwise correlation analysis of activations over a dataset of 5 million tokens. Universal neurons emerge early and increase steadily in number throughout training, most notably in deeper layers. They are also highly stable across checkpoints, especially in later layers. Ablating universal neurons significantly increases both loss and the KL divergence between the ablated and baseline output distributions, confirming their causal importance to model predictions. Layer-wise ablation further shows that removing universal neurons in the first layer produces a disproportionately large increase in both loss and KL divergence.
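For concreteness, the sketch below illustrates one way the pairwise-correlation identification step could be implemented. The function names, the 0.5 threshold, and the max-over-neurons matching rule are illustrative assumptions, not the paper's exact procedure; activations are assumed to be precomputed over a shared token set.

```python
import numpy as np

def pairwise_neuron_correlations(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Pearson correlation between every neuron in model A and every neuron in model B.

    acts_a: (n_tokens, n_neurons_a) activations of model A over a shared token set.
    acts_b: (n_tokens, n_neurons_b) activations of model B over the same tokens.
    Returns an (n_neurons_a, n_neurons_b) correlation matrix.
    """
    # Standardize each neuron's activations; the dot product of z-scored
    # columns divided by n_tokens is the Pearson correlation.
    a = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + 1e-8)
    b = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + 1e-8)
    return (a.T @ b) / acts_a.shape[0]

def universal_neurons(corr_matrices: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Indices of reference-model neurons that count as universal.

    corr_matrices: one (n_ref_neurons, n_other_neurons) correlation matrix
    per comparison model. A neuron is flagged as universal here if its
    best-matching neuron in *every* other model correlates above the
    threshold (an assumed criterion for this sketch).
    """
    best_per_model = np.stack([c.max(axis=1) for c in corr_matrices])
    return np.where(best_per_model.min(axis=0) > threshold)[0]
```

Under this formulation, identifying universal neurons across five models reduces to computing four correlation matrices against a chosen reference model and thresholding the per-neuron minimum of the best matches.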