I have recently been playing around with AI and formal methods and built Provepy. It's an experiment to see if we can use frontier LLMs to bridge the gap between everyday Python code and formal verification in Lean.
The idea is to make formal methods more accessible by burying the complex Lean syntax behind a standard Python decorator. You add @provable to your function, give it a plain English claim, and run your code.
from provepy import provable

@provable(claim="This function returns the sum of its inputs")
def add(a: int, b: int) -> int:
    return a + b
Under the hood, when the script runs, Provepy grabs the function and your claim and passes them to an LLM (it defaults to Gemma, but supports OpenRouter or any custom OpenAI-compatible endpoint). The LLM translates the claim into a Lean 4 theorem and then generates a proof; using frontier models is recommended for best results. Theorem generation, code translation, and proof generation are done as separate steps, so the LLM can't quietly prove a different claim or a different function just to make the proof succeed.
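For the add example above, the generated Lean might look roughly like this (illustrative only; the exact encoding Provepy produces may differ):

```lean
-- Illustrative translation of `add` and its claim into Lean 4.
def add (a b : Int) : Int := a + b

-- "This function returns the sum of its inputs"
theorem add_is_sum (a b : Int) : add a b = a + b := by
  rfl
```

Here the proof is trivial (`rfl`), since the claim matches the definition; for less direct claims the LLM has to do real proof search.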
If Lean compiles and accepts the proof, your Python function executes normally. If the proof fails, your program halts with a VerificationError.
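The gating behavior can be sketched in a few lines. This is a simplified mock, not Provepy's actual internals: `verify_with_lean` stands in for the LLM-plus-Lean round trip, and the names are hypothetical.

```python
import functools

class VerificationError(Exception):
    """Raised when Lean rejects the generated proof."""

def verify_with_lean(fn, claim: str) -> bool:
    # Stand-in for: translate claim to a Lean theorem, generate a
    # proof with an LLM, and ask the Lean compiler to check it.
    return True

def provable(claim: str):
    def decorator(fn):
        # Verification happens once, at decoration time.
        if not verify_with_lean(fn, claim):
            raise VerificationError(f"Could not verify: {claim!r}")
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@provable(claim="This function returns the sum of its inputs")
def add(a: int, b: int) -> int:
    return a + b

print(add(2, 3))  # 5
```

The key design point is that verification is a hard gate: if the proof fails, the decorated function is never even constructed, so unverified code can't run.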
You can also pass context to the decorator if your function relies on other Python functions in your codebase. I also added a fallback mechanism: you can configure a smaller, cheaper model to try first, and if it fails to produce a valid proof, Provepy automatically retries with a heavier model.
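The fallback chain is conceptually simple. Here's a hedged sketch of the idea; `attempt_proof` and the model names are illustrative stand-ins, not Provepy's real API:

```python
def attempt_proof(model: str, claim: str) -> bool:
    # Stand-in for: ask `model` for a Lean proof of `claim`
    # and check whether Lean accepts it.
    return model == "big-model"

def prove_with_fallback(claim: str, models: list[str]) -> str:
    """Try each model in order; return the first one whose proof passes."""
    for model in models:
        if attempt_proof(model, claim):
            return model
    raise RuntimeError(f"No model could prove: {claim!r}")

winner = prove_with_fallback(
    "This function returns the sum of its inputs",
    ["small-cheap-model", "big-model"],
)
print(winner)  # big-model
```

Since Lean checks every candidate proof, a cheap model's output is just as trustworthy as an expensive one's when it passes, so trying cheap first costs nothing in soundness.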
To be clear: this is very experimental alpha software. It currently only works reliably on simple functions, and there's a lot left to do, like improving how context is pulled from the AST and resolving name clashes with Lean's mathlib.
Contributions are most welcome!
I'd love to hear your thoughts.