Accepted to Compression @ NeurIPS 2024

QIANets for Reduced Latency and Improved Inference Times in CNN Models

Zhumazhan Balapanov, Edward Magongo, Vanessa Matvei, Olivia Holmberg

Abstract

We introduce QIANets (Quantum-Inspired Attention Networks), a novel architecture that leverages quantum-inspired computational principles to achieve efficient attention computation. By reformulating the attention mechanism using tensor network decompositions inspired by quantum many-body physics, we achieve sub-quadratic complexity in sequence length while maintaining model expressiveness. Our approach demonstrates significant speedups on long-context tasks, with experiments showing 3-5x inference acceleration compared to standard transformers on sequences of 8K+ tokens.
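The paper itself is not reproduced on this page, so as a rough illustration of the general idea the abstract describes, the sketch below shows one standard way a low-rank factorization can cut attention's quadratic cost: projecting keys and values down to k landmark rows so the score matrix is n×k rather than n×n. The function names, the projection matrices `E` and `F`, and the Linformer-style construction are assumptions for illustration, not the authors' tensor-network method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # O(n^2 * d): materializes the full n x n attention matrix
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def low_rank_attention(Q, K, V, E, F):
    # Project K and V down to k "landmark" rows, so the score
    # matrix is n x k instead of n x n -> O(n * k * d) overall
    K_proj = E @ K   # (k, d)
    V_proj = F @ V   # (k, d)
    scores = Q @ K_proj.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V_proj

rng = np.random.default_rng(0)
n, d, k = 512, 64, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))

out_full = standard_attention(Q, K, V)    # quadratic in n
out_lr = low_rank_attention(Q, K, V, E, F)  # linear in n
print(out_full.shape, out_lr.shape)  # both (512, 64)
```

Both paths return an output of the same shape; the low-rank path trades an n×n score matrix for an n×k one, which is where the sub-quadratic scaling in sequence length comes from.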

Citation

Zhumazhan Balapanov, Edward Magongo, Vanessa Matvei, Olivia Holmberg. "QIANets for Reduced Latency and Improved Inference Times in CNN Models". Accepted to Compression @ NeurIPS 2024.
