Automatic Textbook Formalization

Why This Matters

RepoProver is a multi-agent framework for automating the formalization of mathematics textbooks in Lean. By automating statement translation, proof generation, and quality review, it aims to make large-scale formalization projects more efficient and more accessible to industry and education, and to make formalized mathematical documentation more reliable and comprehensive.

RepoProver

Code for Automatic Textbook Formalization (Gloeckle, Rammal, Arnal, Munos, Cabannes, Synnaeve, Hayat, 2026).

RepoProver is a multi-agent scaffold for large-scale formalization of mathematics textbooks in Lean. It orchestrates multiple LLM agents that collaborate on a shared Git repository containing the Lean project: sketcher agents translate definitions and theorem statements, prover agents fill in proofs, and reviewer agents enforce quality via pull request reviews. Coordination happens through a lightweight file-system-based issue tracker and a merge queue that ensures the main branch always builds.

This code produced an automatic formalization of the graduate textbook Algebraic Combinatorics by Darij Grinberg.
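
The merge-queue idea described above (main only advances when the result still builds) can be illustrated with a small toy model. This is a sketch under stated assumptions, not RepoProver's implementation: the MergeQueue class, its fields, and the builds callback are hypothetical names, and the callback stands in for whatever build check a real setup would run on a trial merge.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MergeQueue:
    """Toy merge queue: main only advances when the trial build passes."""

    # Hypothetical stand-in for a real build check on the trial merge.
    builds: Callable[[list[str]], bool]
    main: list[str] = field(default_factory=list)       # changes already merged
    pending: deque[str] = field(default_factory=deque)  # submissions awaiting a build
    rejected: list[str] = field(default_factory=list)   # submissions that broke the build

    def submit(self, change: str) -> None:
        self.pending.append(change)

    def process(self) -> None:
        # Handle one submission at a time so each is tested against the latest main.
        while self.pending:
            change = self.pending.popleft()
            candidate = self.main + [change]  # trial merge; main itself is untouched
            if self.builds(candidate):
                self.main = candidate
            else:
                self.rejected.append(change)


# Illustrative run: the build "fails" whenever the broken change is included.
queue = MergeQueue(builds=lambda changes: "broken-proof" not in changes)
for change in ["sketch-ch1", "broken-proof", "prove-ch1"]:
    queue.submit(change)
queue.process()
print(queue.main)      # ['sketch-ch1', 'prove-ch1']
print(queue.rejected)  # ['broken-proof']
```

Processing submissions strictly one at a time is the simplest way to keep the invariant: every state of main has passed the build check, at the cost of serializing merges.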

Setup

Requires Python 3.10+. Install in editable mode:

pip install -e .

Preparing a formalization project

RepoProver operates on a Lean project repository. Before running, you need to set up:

1. Create a Lean project with Mathlib and build it:

       lake init MyProject math
       lake update
       lake build

2. Add LaTeX source files under a tex/ directory inside the project, organized by topic:

       MyProject/
       ├── lakefile.lean
       ├── lean-toolchain
       ├── lake-manifest.json
       ├── MyProject.lean            # root import file
       ├── MyProject/
       │   └── tex/                  # LaTeX source chapters
       │       ├── all.tex           # full textbook source (optional)
       │       ├── Topic1/
       │       │   ├── Chapter1.tex
       │       │   └── Chapter2.tex
       │       └── Topic2/
       │           └── ...
       ├── manifest.json             # chapter manifest (see below)
       ├── CONTENTS.md               # structure documentation (see below)
       └── issues/                   # issue tracker (see below)

   The tex files should be split by chapter/section so each can be assigned to a sketcher agent independently. An all.tex with the full source can be included for reference. Note that tex files are read-only: agents can read them but never modify source material.

3. Create a CONTENTS.md at the project root documenting the structure of tex sources and corresponding Lean files. The coordinator generates an initial version from the manifest, and agents update it as the Lean codebase evolves. It serves as the central reference for project structure, proof status and architecture notes.

4. Create a manifest.json at the project root listing the chapters to formalize and their target theorems/definitions. Each chapter entry has:

   id: unique identifier for the chapter
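
The project layout described above can be sanity-checked before a run with a few lines of Python. This is a minimal sketch, not part of RepoProver: check_layout is a hypothetical helper, and it assumes the root directory is named after the Lean package (as in the MyProject example) and that every file shown in the tree is expected to exist up front.

```python
from pathlib import Path

# Paths taken from the example tree, relative to the project root.
REQUIRED_FILES = [
    "lakefile.lean",
    "lean-toolchain",
    "lake-manifest.json",
    "manifest.json",
    "CONTENTS.md",
]
REQUIRED_DIRS = ["issues"]


def check_layout(root: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    root = root.resolve()
    name = root.name  # assumption: directory name matches the Lean package name
    problems = []
    for rel in REQUIRED_FILES + [f"{name}.lean"]:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for rel in REQUIRED_DIRS + [f"{name}/tex"]:
        if not (root / rel).is_dir():
            problems.append(f"missing directory: {rel}")
    tex_dir = root / name / "tex"
    # Chapters may sit in topic subdirectories, so search recursively.
    if tex_dir.is_dir() and not any(tex_dir.rglob("*.tex")):
        problems.append("no .tex files found under tex/")
    return problems
```

Calling check_layout(Path("MyProject")) and printing each returned string gives a quick list of what is still missing; the check only looks at file presence and does not validate the contents of manifest.json or CONTENTS.md.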
