Data Story · AI × Formal Verification

When AI Learned to Prove

How we used Claude + Lean 4 to formally verify three mathematical theorems—including one of the hardest IMO problems ever posed.

T. Pavan Kumar · March 2026 · Experiment Report

3
Proof files
22
Theorems verified
0
sorry’s remaining
100%
Machine-checked

In mathematics, a proof is either correct or it is not. There is no “mostly right.” For centuries, verification depended on human reviewers—brilliant but fallible. In 2025, AI systems like Harmonic’s Aristotle proved five of six IMO problems with formal verification in Lean 4, requiring zero human checking. We set out to see how far a single developer could go with Claude Code and the same proof assistant.

The answer: from zero Lean experience to a formally verified proof of IMO 1977 Problem 5—Vieta jumping, one of the most famous and technically demanding competition problems in existence—in a single session.

This is the story of that journey.

Chapter 1

What is Lean 4, and why does it matter?

Lean 4 is an open-source programming language and proof assistant. Every theorem written in Lean must pass a strict type-checking kernel that yields a binary verdict: the statement either checks out as correct, or it doesn’t. There is no ambiguity.

This matters because AI systems hallucinate. Large language models confidently assert false information. But when an answer comes with a Lean 4 proof, you don’t have to trust the AI—you can check it. The proof is the proof.

Lean 4 -- The very first thing we checked: does Lean know -- that addition is commutative? #check Nat.add_comm -- Output: ∀ (n m : Nat), n + m = m + n ✓

Lean knows. And it can verify anything we throw at it—from the trivially obvious to problems that stumped the world’s best mathematicians.

Chapter 2

The toolchain: Claude Code + lean4-skills

Our pipeline was simple. Claude Code, Anthropic’s command-line coding agent, was augmented with the lean4-skills plugin—a structured workflow that gives the AI a prove/review/golf loop, Mathlib search, axiom checking, and safety guardrails.

The workflow for each theorem followed a consistent pattern:

Step 1
Formalize
State the theorem in natural language. Claude translates it into a Lean 4 statement with the correct Mathlib types and imports.
Step 2
Prove
Claude generates candidate proof tactics. The Lean compiler checks each step in real-time, providing error feedback that Claude uses to self-correct.
Step 3
Review
The lean4-skills plugin verifies: zero sorry’s, zero errors, zero warnings. Every declaration is machine-checked.
Step 4
Strengthen
Claude suggests generalizations and additional lemmas. Each new theorem goes through the same verify cycle.
Claude Code -- The command that started each proof: /lean4:formalize "Prove that the fraction (21n + 4) / (14n + 3) is irreducible for every natural number n." -- Claude Code's response: theorem imo1959_p1 (n : ) : Nat.Coprime (21 * n + 4) (14 * n + 3) := ... -- ✓ Compiled clean. 0 sorry's.
Chapter 3

Warmup: Digital roots—filling a gap in Mathlib

We started with something no one had formalized. The digital root of a number—what you get when you repeatedly sum its digits until a single digit remains—is a concept every mathematician knows. Yet Mathlib, the million-line mathematical library for Lean 4, had no definition of it.

Claude searched Mathlib systematically, mapped the gaps, and proposed a family of ten theorems:

Definition — New to Mathlib

digitalRoot(n)

Returns 0 for n=0, returns 9 when 9 divides n, otherwise returns n mod 9. Captures the “repeatedly sum digits” operation in a single closed-form expression.

Lean 4 · DigitalRoot.lean def digitalRoot (n : ) : := if n = 0 then 0 else if 9 ∣ n then 9 else n % 9 -- The elegant closed-form: theorem digitalRoot_eq (n : ) : digitalRoot n = if n = 0 then 0 else (n - 1) % 9 + 1 := by simp only [digitalRoot] split_ifs with h1 h2 · rfl · obtain ⟨k, hk⟩ := h2 cases k with | zero => omega | succ k => subst hk; omega · have : n % 9 ≠ 0 := mt Nat.dvd_of_mod_eq_zero h2 omega

The crown jewels of the file are two algebraic structure theorems showing that digital root preserves both addition and multiplication:

Theorem — Multiplicativity

dr(a × b) = dr(dr(a) × dr(b))

The digital root is a ring homomorphism from the naturals to {0, 1, ..., 9}. It preserves both addition and multiplication—meaning you can take digital roots at any stage of a calculation and get the same final answer.

Ten theorems. Zero sorry’s. All machine-verified. None of them existed in Mathlib before this session.

Chapter 4

The first IMO problem ever posed

In 1959, seven countries sent their best young mathematicians to Brașov, Romania, for the very first International Mathematical Olympiad. Problem 1 asked them to prove that the fraction (21n + 4) / (14n + 3) is irreducible for every natural number n.

The proof is a gem of number theory: any common divisor of the numerator and denominator must also divide their linear combination. And there exists a combination that equals exactly 1:

3 × (14n + 3) − 2 × (21n + 4) = 42n + 9 − 42n − 8 = 1
The identity that makes the proof work
Lean 4 · IMO1959P1.lean -- The core lemma: any common divisor divides 1 private lemma aux (n k : ) (h1 : k ∣ 21 * n + 4) (h2 : k ∣ 14 * n + 3) : k ∣ 1 := by have h3 : k ∣ 2 * (21 * n + 4) := h1.mul_left 2 have h4 : k ∣ 3 * (14 * n + 3) := h2.mul_left 3 have h5 : 3 * (14 * n + 3) = 2 * (21 * n + 4) + 1 := by ring exact (Nat.dvd_add_right h3).mp (h5 ▸ h4) -- The theorem: gcd(21n+4, 14n+3) = 1 theorem imo1959_p1 (n : ) : Nat.Coprime (21 * n + 4) (14 * n + 3) := Nat.dvd_one.mp (aux n _ (Nat.gcd_dvd_left _ _) (Nat.gcd_dvd_right _ _))

But Claude didn’t stop at the specific problem. It generalized the argument into a reusable lemma:

Generalization

coprime_linear_combo

For any naturals a, b, c, d: if there exist multipliers p, q with p·c = q·a and q·b + 1 = p·d, then gcd(a·n + b, c·n + d) = 1 for all n. IMO 1959 P1 is the special case p=3, q=2.

Chapter 5

Vieta jumping: the showstopper

Then we went for the hardest target. IMO 1977, Problem 5 is legendary in competition mathematics. It asks: if a·b + 1 divides a² + b² for positive integers a and b, prove the quotient is a perfect square.

The proof technique—called Vieta jumping—is a form of infinite descent that treats a as a root of a quadratic and “jumps” to the other root. It requires working in the integers, careful case analysis, well-founded induction, and transfer back to the naturals. It is, by any measure, a graduate-level formalization challenge.

Claude Code produced a complete proof in a single pass.

Lean 4 · IMO1977P5.lean · Key Lemmas -- The Vieta conjugate satisfies the same equation private lemma vieta_satisfies (a b k : ) (h : a ^ 2 + b ^ 2 = k * (a * b + 1)) : (k * b - a) ^ 2 + b ^ 2 = k * ((k * b - a) * b + 1) := by have key : (k * b - a) ^ 2 + b ^ 2 - k * ((k * b - a) * b + 1) = a ^ 2 + b ^ 2 - k * (a * b + 1) := by ring linarith -- Product formula: (kb - a) · a = b² - k private lemma vieta_prod (a b k : ) (h : a ^ 2 + b ^ 2 = k * (a * b + 1)) : (k * b - a) * a = b ^ 2 - k := by linear_combination -h

The descent argument uses strong induction on the natural-number encoding of a. At each step, the Vieta conjugate a′ = k·b − a is shown to be non-negative, strictly less than b, and satisfying the same equation. When a′ = 0, we get k = b²—a perfect square.

Lean 4 · The Descent -- Base case: a' = 0 implies k = b² by_cases ha'z : a' = 0 · have hkb2 : k = b ^ 2 := by linarith [ha'z ▸ ha'prod, mul_comm (0 : ) a] exact ⟨b, by linarith-- Inductive case: descend to (b, a') · exact ih b.toNat hb_lt_n b a' k rfl hb ha'pos (le_of_lt ha'lt) heq_sym

The final theorem bridges from integers back to natural numbers, handling the symmetry between a and b with a case split:

Lean 4 · The Main Theorem -- IMO 1977 Problem 5: the quotient is a perfect square theorem imo1977_p5 (a b : ) (ha : 0 < a) (hb : 0 < b) (hdvd : (a * b + 1) ∣ (a ^ 2 + b ^ 2)) : IsSquare ((a ^ 2 + b ^ 2) / (a * b + 1)) := ... -- ✓ Compiled clean. 0 sorry's. Fully verified.
• • •
Chapter 6

The Lean tactics that made it possible

Throughout these proofs, Claude leveraged Lean’s tactic language and Mathlib’s extensive library. Here are the key tools it reached for:

Tactic What it does Used in
omega Solves linear arithmetic over naturals and integers automatically Digital Root, IMO 1959
ring Proves equalities in commutative rings by normalization IMO 1959, IMO 1977
linarith Closes goals by linear arithmetic over ordered fields IMO 1977 (Vieta)
nlinarith Nonlinear arithmetic—handles products and squares IMO 1977 (bounds)
linear_combination Proves goals by specifying the exact linear combination of hypotheses IMO 1977 (product formula)
split_ifs Case-splits on if-then-else expressions in the goal Digital Root
Nat.strong_induction_on Well-founded induction: prove P(n) assuming P(m) for all m < n IMO 1977 (descent)

Mathlib’s API was equally critical. Lemmas like Nat.dvd_add_right, Nat.Coprime, IsSquare, and Int.toNat provided the building blocks that Claude assembled into complete proofs.

Chapter 7

Why this matters

“When our system outputs the proof, nobody has to look at it. You know it’s correct.”
Tudor Achim, CEO of Harmonic AI

We are witnessing a phase transition in mathematics. In July 2025, multiple AI systems achieved gold-medal performance at the International Mathematical Olympiad with formally verified solutions in Lean 4. Harmonic’s Aristotle. ByteDance’s Seed-Prover. Google DeepMind’s AlphaProof.

But those systems used hundreds of thousands of CPUs and billion-parameter models. What we demonstrated is different: a single developer, using Claude Code with a lean4-skills plugin, can formally verify competition-level mathematics in real time. The barrier to entry has collapsed.

This has implications far beyond mathematics. Formal verification can guarantee:

Applications

Beyond competition math

Software correctness — AWS uses Lean to verify Cedar, their authorization policy language. Cryptography — Microsoft formally verified its SymCrypt library. AI safety — instead of trusting AI outputs, you can verify them. The proof is the proof.

Summary

What we proved, and what it means

File Result Theorems Difficulty
DigitalRoot.lean Complete theory of digital roots—new to Mathlib 10 Original
IMO1959P1.lean First-ever IMO problem + generalization 6 IMO
IMO1977P5.lean Vieta jumping—perfect square quotient 6+ Legendary

Every theorem is machine-checked. Every proof compiles with zero errors and zero sorry’s. The Lean 4 kernel guarantees absolute correctness.

We did not merely use AI to write code. We used AI to discover and verify mathematical truth—truth that is guaranteed correct not by human review, but by the iron logic of a proof assistant. This is the future of trustworthy AI: not “trust me,” but “check me.”

— — —