Accepted to Sparsity in LLMs @ ICLR 2025
Authors: Sam Mikhak, Venkata Sai Gummidi
This paper explores the use of matrix product operators (MPOs) to compress transformer-based architectures. By factorizing full-rank weight matrices into tensor-train products, MPOs reduce both memory footprint and computational cost, which is critical for deployment on resource-constrained devices. Experiments on speaker identification using the LibriSpeech train-clean-360 subset show that MPO-based models, and even their pruned variants, maintain high performance with far fewer parameters than full-rank transformers.
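To make the compression mechanism concrete, the following is a minimal sketch of how a dense weight matrix can be factorized into MPO cores via sequential truncated SVDs (a standard TT-SVD-style construction; the function names, dimension choices, and rank are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims, rank):
    """Factor W (prod(in_dims) x prod(out_dims)) into a chain of MPO cores
    of shape (r_prev, i_k, j_k, r_k) using truncated SVDs (illustrative sketch)."""
    n = len(in_dims)
    # Reshape W so each site's input/output indices sit next to each other:
    # (i1, i2, ..., j1, j2, ...) -> (i1, j1, i2, j2, ...)
    T = W.reshape(*in_dims, *out_dims)
    perm = [k for pair in zip(range(n), range(n, 2 * n)) for k in pair]
    T = T.transpose(perm)
    cores, r_prev = [], 1
    for k in range(n - 1):
        # Split off one (i_k, j_k) pair at a time with a truncated SVD
        T = T.reshape(r_prev * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, in_dims[k], out_dims[k], r))
        T = S[:r, None] * Vt[:r]
        r_prev = r
    cores.append(T.reshape(r_prev, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_to_matrix(cores, in_dims, out_dims):
    """Contract the MPO cores back into a dense matrix (for checking error)."""
    n = len(cores)
    T = cores[0]
    for c in cores[1:]:
        T = np.tensordot(T, c, axes=([-1], [0]))
    T = T.squeeze(0).squeeze(-1)  # drop the boundary bond dimensions of size 1
    # Undo the (i1, j1, i2, j2, ...) interleaving
    inv_perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    T = T.transpose(inv_perm)
    return T.reshape(int(np.prod(in_dims)), int(np.prod(out_dims)))
```

For example, factoring a 16x16 matrix with `in_dims = out_dims = (4, 4)` and bond rank 4 stores two cores totaling 128 parameters versus 256 for the dense matrix; smaller ranks trade reconstruction accuracy for compression.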

