Abstract
Large language models (LLMs) have demonstrated remarkable capabilities on natural language processing tasks that require multi-step logical reasoning, such as automated theorem proving. We present KG-Prover, a novel framework that leverages knowledge graphs mined from reputable mathematical texts to augment general-purpose LLMs for constructing and formalizing mathematical proofs. We build a knowledge graph of over 60,000 nodes and 300,000 edges representing mathematical concepts and their interrelations. We study the effects of scaling graph-based, test-time compute with KG-Prover, demonstrating significant performance improvements over baselines. General-purpose LLMs improve by up to 21% on miniF2F-test when combined with KG-Prover, with consistent improvements of 2–11% on the ProofNet, miniF2F-test, and MUSTARD datasets. Furthermore, KG-Prover with o4-mini achieves over 50% on miniF2F-test.