Accepted to ER @ NeurIPS 2025
Authors: Nathan Egbuna, Saatvik Gaur
Current test-time optimization methods require 10-100x more compute per query than standard decoding. We propose Amortized Latent Steering (ALS), which collapses iterative test-time optimization into a single offline-computed vector applied at constant cost during inference. ALS computes the mean difference between hidden states from successful versus unsuccessful generations, then uses this direction to calibrate the model's hidden representations at inference time. Across the GSM8K and MATH-500 benchmarks, ALS achieves a 2-5x speedup over iterative methods while matching or surpassing greedy Chain-of-Thought and Self-Consistency baselines, yielding up to a 101% improvement in the efficiency-accuracy trade-off.
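The core operation described above can be sketched in a few lines: offline, average the hidden states of successful and unsuccessful generations and take their difference as a steering direction; online, shift each hidden state along that direction at constant cost. This is a minimal NumPy illustration, not the paper's implementation; the function names, the unit normalization, and the `alpha` scale are assumptions for the sketch.

```python
import numpy as np

def compute_steering_vector(success_states, failure_states):
    # Offline step (assumed form): mean hidden state over successful
    # generations minus mean over unsuccessful ones, unit-normalized.
    delta = success_states.mean(axis=0) - failure_states.mean(axis=0)
    return delta / np.linalg.norm(delta)

def apply_steering(hidden, v, alpha=0.5):
    # Online step: a single vector addition per token, so the
    # per-query cost is constant rather than 10-100x decoding.
    return hidden + alpha * v

# Toy data standing in for collected hidden states (hidden size 8).
rng = np.random.default_rng(0)
success = rng.normal(1.0, 0.1, size=(32, 8))
failure = rng.normal(-1.0, 0.1, size=(32, 8))

v = compute_steering_vector(success, failure)
h = rng.normal(size=8)          # one hidden state during decoding
h_steered = apply_steering(h, v)
```

Because the steering vector is computed once offline, inference adds only a vector addition per step, which is where the claimed 2-5x speedup over iterative test-time optimization comes from.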

