In mathematics, a proof is either correct or it is not. There is no “mostly right.” For centuries, verification depended on human reviewers—brilliant but fallible. In 2025, AI systems like Harmonic’s Aristotle proved five of six IMO problems with formal verification in Lean 4, requiring zero human checking. We set out to see how far a single developer could go with Claude Code and the same proof assistant.
The answer: from zero Lean experience to a formally verified proof of IMO 1977 Problem 5—Vieta jumping, one of the most famous and technically demanding competition problems in existence—in a single session.
This is the story of that journey.
What is Lean 4, and why does it matter?
Lean 4 is an open-source programming language and proof assistant. Every theorem written in Lean must pass a strict type-checking kernel that yields a binary verdict: the statement either checks out as correct, or it doesn’t. There is no ambiguity.
This matters because AI systems hallucinate. Large language models confidently assert false information. But when an answer comes with a Lean 4 proof, you don’t have to trust the AI—you can check it. The proof is the proof.
Lean knows. And it can verify anything we throw at it—from the trivially obvious to problems that stumped the world’s best mathematicians.
The toolchain: Claude Code + lean4-skills
Our pipeline was simple. Claude Code, Anthropic’s command-line coding agent, was augmented with the lean4-skills plugin—a structured workflow that gives the AI a prove/review/golf loop, Mathlib search, axiom checking, and safety guardrails.
The workflow for each theorem followed a consistent pattern:
Warmup: Digital roots—filling a gap in Mathlib
We started with something no one had formalized. The digital root of a number—what you get when you repeatedly sum its digits until a single digit remains—is a concept every mathematician knows. Yet Mathlib, the million-line mathematical library for Lean 4, had no definition of it.
Claude searched Mathlib systematically, mapped the gaps, and proposed a family of ten theorems:
digitalRoot(n)
Returns 0 for n=0, returns 9 when 9 divides n, otherwise returns n mod 9. Captures the “repeatedly sum digits” operation in a single closed-form expression.
The crown jewels of the file are two algebraic structure theorems showing that digital root preserves both addition and multiplication:
dr(a × b) = dr(dr(a) × dr(b))
The digital root is a ring homomorphism from the naturals to {0, 1, ..., 9}. It preserves both addition and multiplication—meaning you can take digital roots at any stage of a calculation and get the same final answer.
Ten theorems. Zero sorry’s. All machine-verified. None of them existed in Mathlib before this session.
The first IMO problem ever posed
In 1959, seven countries sent their best young mathematicians to Brașov, Romania, for the very first International Mathematical Olympiad. Problem 1 asked them to prove that the fraction (21n + 4) / (14n + 3) is irreducible for every natural number n.
The proof is a gem of number theory: any common divisor of the numerator and denominator must also divide their linear combination. And there exists a combination that equals exactly 1:
3 × (14n + 3) − 2 × (21n + 4) = 42n + 9 − 42n − 8 = 1The identity that makes the proof work
But Claude didn’t stop at the specific problem. It generalized the argument into a reusable lemma:
coprime_linear_combo
For any naturals a, b, c, d: if there exist multipliers p, q with p·c = q·a and q·b + 1 = p·d, then gcd(a·n + b, c·n + d) = 1 for all n. IMO 1959 P1 is the special case p=3, q=2.
Vieta jumping: the showstopper
Then we went for the hardest target. IMO 1977, Problem 5 is legendary in competition mathematics. It asks: if a·b + 1 divides a² + b² for positive integers a and b, prove the quotient is a perfect square.
The proof technique—called Vieta jumping—is a form of infinite descent that treats a as a root of a quadratic and “jumps” to the other root. It requires working in the integers, careful case analysis, well-founded induction, and transfer back to the naturals. It is, by any measure, a graduate-level formalization challenge.
Claude Code produced a complete proof in a single pass.
The descent argument uses strong induction on the natural-number encoding of a. At each step, the Vieta conjugate a′ = k·b − a is shown to be non-negative, strictly less than b, and satisfying the same equation. When a′ = 0, we get k = b²—a perfect square.
The final theorem bridges from integers back to natural numbers, handling the symmetry between a and b with a case split:
The Lean tactics that made it possible
Throughout these proofs, Claude leveraged Lean’s tactic language and Mathlib’s extensive library. Here are the key tools it reached for:
| Tactic | What it does | Used in |
|---|---|---|
omega |
Solves linear arithmetic over naturals and integers automatically | Digital Root, IMO 1959 |
ring |
Proves equalities in commutative rings by normalization | IMO 1959, IMO 1977 |
linarith |
Closes goals by linear arithmetic over ordered fields | IMO 1977 (Vieta) |
nlinarith |
Nonlinear arithmetic—handles products and squares | IMO 1977 (bounds) |
linear_combination |
Proves goals by specifying the exact linear combination of hypotheses | IMO 1977 (product formula) |
split_ifs |
Case-splits on if-then-else expressions in the goal | Digital Root |
Nat.strong_induction_on |
Well-founded induction: prove P(n) assuming P(m) for all m < n | IMO 1977 (descent) |
Mathlib’s API was equally critical. Lemmas like Nat.dvd_add_right, Nat.Coprime, IsSquare, and Int.toNat provided the building blocks that Claude assembled into complete proofs.
Why this matters
“When our system outputs the proof, nobody has to look at it. You know it’s correct.”Tudor Achim, CEO of Harmonic AI
We are witnessing a phase transition in mathematics. In July 2025, multiple AI systems achieved gold-medal performance at the International Mathematical Olympiad with formally verified solutions in Lean 4. Harmonic’s Aristotle. ByteDance’s Seed-Prover. Google DeepMind’s AlphaProof.
But those systems used hundreds of thousands of CPUs and billion-parameter models. What we demonstrated is different: a single developer, using Claude Code with a lean4-skills plugin, can formally verify competition-level mathematics in real time. The barrier to entry has collapsed.
This has implications far beyond mathematics. Formal verification can guarantee:
Beyond competition math
Software correctness — AWS uses Lean to verify Cedar, their authorization policy language. Cryptography — Microsoft formally verified its SymCrypt library. AI safety — instead of trusting AI outputs, you can verify them. The proof is the proof.
What we proved, and what it means
| File | Result | Theorems | Difficulty |
|---|---|---|---|
DigitalRoot.lean |
Complete theory of digital roots—new to Mathlib | 10 | Original |
IMO1959P1.lean |
First-ever IMO problem + generalization | 6 | IMO |
IMO1977P5.lean |
Vieta jumping—perfect square quotient | 6+ | Legendary |
Every theorem is machine-checked. Every proof compiles with zero errors and zero sorry’s. The Lean 4 kernel guarantees absolute correctness.
We did not merely use AI to write code. We used AI to discover and verify mathematical truth—truth that is guaranteed correct not by human review, but by the iron logic of a proof assistant. This is the future of trustworthy AI: not “trust me,” but “check me.”