Accepted to Sparsity in LLMs @ ICLR 2025

Efficient Transformers via MPO-Based Low-Rank Factorization and Pruning

Sam Mikhak, Venkata Sai Gummidi

Abstract

This paper explores the use of matrix product operators (MPOs) to compress transformer-based architectures. By factorizing full-rank weight matrices into tensor-train products, MPOs reduce both memory footprint and computational cost, which is critical for deployment on resource-constrained devices. Experiments on speaker identification using the LibriSpeech train-clean-360 subset show that MPO-based models, and even their pruned variants, maintain high performance with far fewer parameters than full-rank transformers.
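To make the factorization concrete, the sketch below shows a minimal two-core MPO (tensor-train) decomposition of a dense weight matrix built from a single truncated SVD, plus a routine that applies the compressed layer without rebuilding the full matrix. The matrix size, index splits, and bond rank are illustrative assumptions, not values from the paper; practical MPO compression typically uses more cores and tunes the rank per layer.

```python
# Minimal sketch: two-core MPO (tensor-train) factorization of a weight
# matrix via truncated SVD. All shapes and the rank are illustrative
# assumptions, not settings reported in the paper.
import numpy as np

def mpo_factorize_2core(W, in_dims, out_dims, rank):
    """Split W of shape (prod(in_dims), prod(out_dims)) into two MPO cores."""
    i1, i2 = in_dims
    j1, j2 = out_dims
    # View W as a 4-way tensor and group the (i1, j1) indices against (i2, j2).
    T = W.reshape(i1, i2, j1, j2).transpose(0, 2, 1, 3)   # (i1, j1, i2, j2)
    M = T.reshape(i1 * j1, i2 * j2)
    # A truncated SVD supplies the low-rank "bond" linking the two cores.
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    r = min(rank, len(S))
    core1 = (U[:, :r] * S[:r]).reshape(i1, j1, r)         # (i1, j1, r)
    core2 = Vt[:r].reshape(r, i2, j2)                     # (r, i2, j2)
    return core1, core2

def mpo_apply(core1, core2, x):
    """Compute y ~= x @ W by contracting x against the cores directly."""
    i1, j1, r = core1.shape
    _, i2, j2 = core2.shape
    X = x.reshape(i1, i2)
    # Contract both input indices; the output indices (j1, j2) remain.
    y = np.einsum('ab,acr,rbd->cd', X, core1, core2)
    return y.reshape(j1 * j2)

# Example: compress a 512x512 weight into two cores with bond rank 64.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
c1, c2 = mpo_factorize_2core(W, (16, 32), (16, 32), rank=64)
print(c1.size + c2.size, "params vs", W.size)   # 81920 vs 262144

x = rng.standard_normal(512)
y = mpo_apply(c1, c2, x)                        # y.shape == (512,)
```

The savings come from the parameter count: the two cores store i1*j1*r + r*i2*j2 values instead of the full i1*i2*j1*j2, and shrinking (or pruning) the bond rank r reduces both memory and the cost of the contraction.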

Citation

Sam Mikhak, Venkata Sai Gummidi. "Efficient Transformers via MPO-Based Low-Rank Factorization and Pruning". Accepted to Sparsity in LLMs @ ICLR 2025.
