Why am I betting on Lean 4 for agentic coding?

By Harsh Gupta · December 22, 2025

Because Lean 4 can potentially solve the unreliability of AI coding agents.

AI coding agents have become really, really good. But they remain unreliable. And unreliable in ways which are very sinister, hard to detect, and extremely hard to tackle.

They can write API servers, they can write front-ends, they can write features, they can write CLIs. You ask them to add a caching module to a system? They can do that. You ask them to fix CORS issues? They can do that. They can do pretty much everything you ask them to do, but in isolation.

And what it leads to is agents which are extremely hard to rely on and extremely hard to build solid stuff upon.

When you are writing a web server and you have a caching layer written, you add a new API. You do not want to be at a place where you suddenly just do not have caching anymore.
But these things happen quite a lot with AI agents.
They will write one thing and they will fumble something else.
They will forget the whole project semantics.

They are really brilliant and yet extremely stupid in ways which are hard to measure. There is an extreme jaggedness in terms of how they behave.

Lean 4 is an interactive theorem prover. It is extensively used by mathematicians. You define axioms and theorems and do more stuff on top of it. But you can also use Lean 4 for general-purpose programming.

The way you do it is essentially you write a program as a proof.

For example, let us look at a function to sort a list:

The general procedural way: Iterate through the array, compare index i and i + 1, swap if needed.
The declarative way: list.sort(). Trust the library.
The proof way, dependent types: You define a type called SortedPermutation. This type represents a list that is verified to be sorted and contains exactly the same elements as the input. You define the function signature so that it takes a List and must return a SortedPermutation. You write the code. You write a proof that your code has the properties you want.

Here is the magic: In Python or TypeScript, types describe the shape of data, for example, this is a List. In Lean, types can describe the behavior of data, for example, this is a List that is sorted.

To write that sort function in Lean, you have to do double the work:

The implementation: You write the actual sorting logic, Merge Sort, Quick Sort, etc.
The proof: You have to prove to the compiler that your implementation actually results in a SortedPermutation.

For a human, this double work is tedious and fairly hard. But for an AI agent, this is the missing link.

When an agent works with Lean, we fix the type definitions first. We tell it: You must produce a SortedPermutation. If the agent writes code that drops an element or fails to sort correctly, the code will not compile.

We do not need to trust the agent's logic. If the code compiles, we can blindly trust the implementation to be right, because the compiler has verified the proof.

Basically, you can codify all the invariants of the system and you can codify all the rules of the system.

If you forget caching a certain API endpoint, you will know that immediately. Because the compiler and the system keep track of all these invariants.
You can build newer rules, make them part of the system, build newer properties, make them part of the system.

At least in theory, it feels like if we can use Lean 4 well, it will remove all of these issues of agents explicitly doing stupid things.

It will remove a whole class of errors:

Similar to how all memory management errors are gone with modern languages like Rust.
Similar to how works on my machine errors are gone with Docker.
Similar to how type errors go away with strong typing systems.
Similar to how dependency errors go away with good package managers.

A huge class of logical errors and requirement errors will just go away. You will be at a much stronger place and a much more solid base to work from.

Lean 4 exposes all the language internals through LSP, which AI can use very well. Lean 4 tends to be verbose and it tends to be rigid for a human to write. It is a different kind of program writing; there is a lot of information to work with.

But AI is better equipped to deal with those things than a human.

They can deal with much larger contexts.
They can deal with much more text.
They do not get tired writing the boring proofs that humans hate.