Why am I betting on Lean 4 for agentic coding?

December 22, 2025

Because Lean 4 can potentially solve the unreliability of AI coding agents.

AI coding agents have become really, really good. But they remain unreliable. And unreliable in ways which are very sinister, hard to detect, and extremely hard to tackle.

They can write API servers, they can write front-ends, they can write features, they can write CLIs. You ask them to add a caching module to a system? They can do that. You ask them to fix CORS issues? They can do that. They can do pretty much everything you ask them to do, but in isolation.

And what it leads to is agents which are extremely hard to rely on and extremely hard to build solid stuff upon.

They are really brilliant and yet extremely stupid in ways which are hard to measure. There is an extreme jaggedness in terms of how they behave.

Lean 4 is an interactive theorem prover. It is extensively used by mathematicians. You define axioms and theorems and do more stuff on top of it. But you can also use Lean 4 for general-purpose programming.

The way you do it is essentially you write a program as a proof.

For example, let us look at a function to sort a list:

Here is the magic: In Python or TypeScript, types describe the shape of data, for example, this is a List. In Lean, types can describe the behavior of data, for example, this is a List that is sorted.

To write that sort function in Lean, you have to do double the work:

  1. The implementation: You write the actual sorting logic, Merge Sort, Quick Sort, etc.
  2. The proof: You have to prove to the compiler that your implementation actually results in a SortedPermutation.

For a human, this double work is tedious and fairly hard. But for an AI agent, this is the missing link.

When an agent works with Lean, we fix the type definitions first. We tell it: You must produce a SortedPermutation. If the agent writes code that drops an element or fails to sort correctly, the code will not compile.

We do not need to trust the agent's logic. If the code compiles, we can blindly trust the implementation to be right, because the compiler has verified the proof.

Basically, you can codify all the invariants of the system and you can codify all the rules of the system.

At least in theory, it feels like if we can use Lean 4 well, it will remove all of these issues of agents explicitly doing stupid things.

It will remove a whole class of errors:

A huge class of logical errors and requirement errors will just go away. You will be at a much stronger place and a much more solid base to work from.

Lean 4 exposes all the language internals through LSP, which AI can use very well. Lean 4 tends to be verbose and it tends to be rigid for a human to write. It is a different kind of program writing; there is a lot of information to work with.

But AI is better equipped to deal with those things than a human.