AI agents have proven highly capable at code generation. Yet as we push these models into high-stakes domains, ranging from frontier research mathematics to mission-critical software, we encounter a scaling bottleneck: human review. The time and specialized expertise required to manually verify their output become the primary impediment to engineering velocity.
We envision a more helpful generation of coding agents that both carry out their tasks and formally prove their implementations correct against strict specifications. Instead of debugging machine-generated logic, humans dictate what they want. Today, we are taking the first major step toward that vision.
Introducing Leanstral
We release Leanstral, the first open-source code agent designed for Lean 4. Lean 4 is a proof assistant capable of expressing complex mathematical objects such as perfectoid spaces, as well as software specifications such as properties of Rust fragments. Unlike existing proving systems that act as wrappers around large generalist models or focus on single math problems, Leanstral is designed to be highly efficient (with 6B active parameters) and is trained to operate in realistic formal repositories.
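To make "specification" and "proof" concrete, here is a minimal Lean 4 sketch. It is purely illustrative and not taken from Leanstral or the FLT project: the function `double` and the theorem name are hypothetical, and the proof assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Implementation: a small function we want to reason about (hypothetical example).
def double (n : Nat) : Nat := n + n

-- Specification: `double n` always equals `2 * n`.
-- Lean accepts the file only if the proof below actually establishes this statement.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double  -- goal becomes `n + n = 2 * n`
  omega          -- decision procedure for linear arithmetic over Nat/Int
```

The human-facing part is the theorem statement; the proof script is what an agent would be expected to produce and what Lean checks mechanically.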
Open and accessible: We release Leanstral's weights under an Apache 2.0 license, and make the model available in an agent mode within Mistral vibe and through a free API endpoint. We will also release a tech report detailing our training approach and a new evaluation suite, FLTEval, to move evaluations beyond their focus on competition math.
Efficient and mighty: We use a highly sparse architecture for Leanstral and optimise it for proof engineering tasks. By leveraging parallel inference with Lean as a perfect verifier, Leanstral is both performant and cost-efficient compared with existing closed-source competitors.
Upgradable via MCP: Leanstral supports arbitrary MCPs through vibe, and was specifically trained to achieve maximal performance with the frequently used lean-lsp-mcp.
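The idea of pairing parallel inference with a perfect verifier can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Leanstral's actual pipeline: `first_verified` and `toy_verify` are hypothetical names, and `toy_verify` is a trivial stand-in for what would really be a call to the Lean checker on each candidate proof.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def first_verified(
    candidates: Iterable[str],
    verify: Callable[[str], bool],
    max_workers: int = 4,
) -> Optional[str]:
    """Verify all candidates in parallel; return the first one (in input order) that passes."""
    candidates = list(candidates)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results[i] belongs to candidates[i].
        results = list(pool.map(verify, candidates))
    for candidate, ok in zip(candidates, results):
        if ok:
            return candidate
    return None  # no candidate was accepted by the verifier


def toy_verify(candidate: str) -> bool:
    # Hypothetical stand-in for the Lean checker: a real verifier would run the
    # Lean toolchain on the candidate and accept it only if it compiles without `sorry`.
    return "sorry" not in candidate and candidate.strip().endswith("omega")


if __name__ == "__main__":
    # In practice these would be proof attempts sampled from the model.
    attempts = ["by sorry", "by simp", "by omega"]
    print(first_verified(attempts, toy_verify))
```

Because the verifier never accepts an incorrect proof, sampling more candidates can only raise the chance of finding a valid one; the cost is extra inference and checking time, which is where a small number of active parameters helps.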
Evaluation
To reflect usefulness in realistic proof engineering scenarios, we benchmark Leanstral on completing all formal proofs and correctly defining new mathematical concepts in each PR to the FLT project, rather than on isolated mathematical problems. We compare Leanstral against leading coding agents (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5) and open-source models (Qwen3.5 397B-A17B, Kimi-K2.5 1T-A32B, GLM5 744B-A40B).
Leanstral vs. OSS Models