Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs

December 1, 2025

Accepted to NAACL SRW 2025

Authors: Vincent Li, Tim Knappe, Yule Fu

Large language models have demonstrated remarkable capabilities on natural language processing tasks that require multi-step logical reasoning, such as automated theorem proving. We present KG-Prover, a novel framework that augments general-purpose LLMs with knowledge graphs mined from reputable mathematical texts to construct and formalize mathematical proofs. We built a knowledge graph of over 60,000 nodes and 300,000 edges representing mathematical concepts and their interrelations. We study the effects of scaling graph-based, test-time compute with KG-Prover and demonstrate significant performance improvements over baselines. Combined with KG-Prover, general-purpose LLMs improve by up to 21% on miniF2F-test, with consistent gains of 2–11% on the ProofNet, miniF2F-test, and MUSTARD datasets. Furthermore, KG-Prover with o4-mini achieves over 50% on miniF2F-test.
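To make the retrieval step concrete, here is a minimal sketch of how concepts mentioned in a theorem statement might seed a neighborhood query over a mathematical knowledge graph, with the retrieved concepts used to augment a prover LLM's prompt. This is an illustrative toy, not the paper's implementation: the graph, function names, and prompt format are all assumptions, and the actual KG-Prover graph contains over 60,000 nodes and 300,000 edges.

```python
# Hypothetical sketch of knowledge-graph-augmented prompting for proof
# generation. All concept names and helper functions are illustrative.
from collections import deque

# Toy concept graph: each node maps to related mathematical concepts.
KG = {
    "even number": ["divisibility", "parity"],
    "parity": ["even number", "odd number"],
    "divisibility": ["prime number", "modular arithmetic"],
    "modular arithmetic": ["congruence"],
}

def retrieve_concepts(statement: str, depth: int = 2) -> list[str]:
    """Breadth-first expansion from graph concepts found in the statement."""
    seeds = [c for c in KG if c in statement.lower()]
    seen, queue = set(seeds), deque((s, 0) for s in seeds)
    while queue:
        node, d = queue.popleft()
        if d == depth:  # stop expanding at the depth limit
            continue
        for neighbor in KG.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, d + 1))
    return sorted(seen)

def build_prompt(statement: str) -> str:
    """Prepend retrieved concepts as context for the prover LLM."""
    context = ", ".join(retrieve_concepts(statement))
    return f"Relevant concepts: {context}\nProve: {statement}"

print(build_prompt("The sum of two even numbers is an even number."))
```

Scaling the `depth` parameter (or the number of retrieved neighbors) is one simple way to vary graph-based, test-time compute in the spirit of the experiments described above.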
