Accepted to Sparsity in LLMs @ ICLR 2025
Authors: Sam Mikhak, Venkata Sai Gummidi
This paper explores the use of matrix product operators (MPOs) to compress transformer-based architectures. By factorizing full-rank weight matrices into tensor-train products, MPOs reduce both memory footprint and computational cost, which is critical for deployment on resource-constrained devices. Experiments on speaker identification using the LibriSpeech train-clean-360 subset show that MPO-based models, and even their pruned variants, maintain high performance with far fewer parameters than full-rank transformers.
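To make the compression mechanism concrete, the following is a minimal sketch of how a dense weight matrix can be factorized into MPO cores via sequential truncated SVDs (a standard TT-SVD-style construction; the function names, dimension choices, and rank are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims, rank):
    """Factor W (prod(in_dims) x prod(out_dims)) into a chain of MPO cores
    of shape (r_prev, i_k, j_k, r_k) using truncated SVDs (illustrative sketch)."""
    n = len(in_dims)
    # Reshape W so each site's input/output indices sit next to each other:
    # (i1, i2, ..., j1, j2, ...) -> (i1, j1, i2, j2, ...)
    T = W.reshape(*in_dims, *out_dims)
    perm = [k for pair in zip(range(n), range(n, 2 * n)) for k in pair]
    T = T.transpose(perm)
    cores, r_prev = [], 1
    for k in range(n - 1):
        # Split off one (i_k, j_k) pair at a time with a truncated SVD
        T = T.reshape(r_prev * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, in_dims[k], out_dims[k], r))
        T = S[:r, None] * Vt[:r]
        r_prev = r
    cores.append(T.reshape(r_prev, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_to_matrix(cores, in_dims, out_dims):
    """Contract the MPO cores back into a dense matrix (for checking error)."""
    n = len(cores)
    T = cores[0]
    for c in cores[1:]:
        T = np.tensordot(T, c, axes=([-1], [0]))
    T = T.squeeze(0).squeeze(-1)  # drop the boundary bond dimensions of size 1
    # Undo the (i1, j1, i2, j2, ...) interleaving
    inv_perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    T = T.transpose(inv_perm)
    return T.reshape(int(np.prod(in_dims)), int(np.prod(out_dims)))
```

For example, factoring a 16x16 matrix with `in_dims = out_dims = (4, 4)` and bond rank 4 stores two cores totaling 128 parameters versus 256 for the dense matrix; smaller ranks trade reconstruction accuracy for compression.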

