Accepted to Compression @ NeurIPS 2024
Authors: Zhumazhan Balapanov, Edward Magongo, Vanessa Matvei, Olivia Holmberg
We introduce QIANets (Quantum-Inspired Attention Networks), an architecture that applies quantum-inspired computational principles to make attention computation efficient. By reformulating the attention mechanism with tensor-network decompositions drawn from quantum many-body physics, we achieve sub-quadratic complexity in sequence length while preserving model expressiveness. On long-context tasks, our experiments show 3-5x inference speedups over standard transformers at sequence lengths of 8K+ tokens.
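The abstract does not specify the tensor-network decomposition itself. As a rough, hypothetical illustration of how attention can be brought below quadratic cost in sequence length, the sketch below uses a kernelized linear-attention formulation (a positive feature map applied to queries and keys, in the spirit of linear transformers); this is an assumed stand-in, not QIANets' actual method:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Sub-quadratic attention sketch via a kernel feature map.

    Standard attention forms an (n x n) score matrix: O(n^2 * d).
    Here we compute phi(Q) @ (phi(K)^T @ V), which summarizes keys and
    values in a (d x d_v) matrix first: O(n * d * d_v), linear in n.
    NOTE: this is an illustrative approximation, not softmax attention.
    """
    # Simple positive feature map (assumed for illustration).
    phi = lambda x: np.maximum(x, 0.0) + 1.0

    Qp, Kp = phi(Q), phi(K)          # (n, d) each, all entries > 0
    KV = Kp.T @ V                    # (d, d_v) summary, independent of n
    Z = Qp @ Kp.sum(axis=0) + eps    # (n,) per-query normalization
    return (Qp @ KV) / Z[:, None]    # (n, d_v) attended output
```

Because the key/value summary `KV` is fixed-size, the per-token cost no longer grows with sequence length, which is the general mechanism behind the class of sub-quadratic attention methods the abstract alludes to.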

