The Proof is in the Pudding: Can we formally prove correctness of (AI-generated) code?

Presented at Kubernetes Community Days Helsinki 2026 as a keynote.
Schedule link: https://sessionize.com/api/v2/es08xsn7/view/GridSmart

Recording: TBA

Abstract:
Now that LLMs are capable of generating code like we expect humans to, how will the role of the software engineer evolve? No one knows for sure, but Jevons paradox, which has proven itself time and time again, states that as the unit cost of something (in this case, writing code) goes down, the total demand increases. Thus, if we end up with an unprecedented amount of code, how do we ensure our systems don't collapse under their own weight?

Not only has code gotten easier to generate; so have mathematical proofs. Tools and techniques such as Lean, SAT/SMT solvers, differential testing, and Rust have recently gained popularity as key pillars for making sense of the complexity of the AI era. Mathematical proof of correctness lets the team move fast without breaking things.

Using LLMs to generate Lean proofs is very similar to how Kubernetes lets you focus entirely on the desired state instead of how to get there. In the second decade of Kubernetes, how can the cloud native community answer the challenge of ever-increasing complexity, with possibly tons of newly-contributed code being AI-generated? Can formal verification stand up to the challenge, and can the community rally around it?


Lucas Käldström

May 20, 2026


Transcript

  1. THE PROOF IS IN THE PUDDING: CAN WE FORMALLY PROVE CORRECTNESS OF CODE?
    @luxas.dev
    Illustrations by Katarina Jakobsson
  2. WHO IN THE AUDIENCE HAS AN EXISTENTIAL CRISIS ABOUT YOUR SKILLS BECOMING AUTOMATED?
  3. OR MAYBE NOT, IF HISTORY IS ANYTHING TO GO BY
    When something gets automated/commoditized, we get capacity to go beyond what was previously possible, as the cost-benefit equation changes.
  4. WILL JEVONS PARADOX HOLD?
    When something becomes cheap, we want more of it in aggregate. Even if this holds at macro scale, there can still be unwanted outcomes at smaller scale.
  5. Even in the age of AI, we still have responsibility towards society.
    We must still provide enough confidence, but how do we do this without (necessarily) understanding the code?
  6. How to avoid the plateau of the LLM being such that P(breaking stuff) > P(improving stuff)?
    “Only new mistakes are allowed” - Jessie Frazelle at KCD CERN
    Lots of questions, are there answers as well?
  7. WHOAMI
    Kubernetes / cloud native contributor in various ways since 2015
    Staff Software Engineer at Upbound
    Cedar Policy access control engine maintainer
    Graduating from Aalto University now
  8. OUTCOME ENGINEERING / SPEC-DRIVEN DEVELOPMENT
    We are fast coming up with new names for things
    Focusing on the end goal is definitely the right way to go. That’s what Kubernetes is all about too, at its core
  9. “Writing is nature's way of letting you know how sloppy your thinking is.” - Cartoonist Dick Guindon
  10. “Thinking doesn't guarantee that we won't make mistakes. But not thinking guarantees that we will.” - Leslie Lamport in “Why We Should Build Software Like We Build Houses”
    “Writing is nature's way of letting you know how sloppy your thinking is.” - Cartoonist Dick Guindon
  11. “the ‘naturalness’ with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.” - Dijkstra in “On the foolishness of ‘natural language programming’”
  12. So how?
    0. Avoid coding it in the first place
    1. Abstraction
    2. Property-based testing
    3. Model-based checking
    4. Mathematical proof
  13. 1. Abstraction
    Favor typed languages that eliminate whole classes of bugs
    Make invalid states unrepresentable (e.g. through types, enums or taint analysis)
    Only allow the LLM to build on top of trusted libraries
    Trusted Computing Base: Rust/Go (e.g. memory safety and race conditions)
    LLM-accessible logic
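
As a minimal sketch of "make invalid states unrepresentable", the Python snippet below models an authorization decision as three separate variants instead of one record with optional fields. The names `Pending`, `Allowed`, `Denied`, and `describe` are hypothetical and chosen only for illustration. In Python the exhaustiveness guarantee would come from a static type checker such as mypy rather than from the interpreter; languages like Rust enforce it in the compiler.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pending:
    """A request that has not been decided yet; it carries no result fields."""


@dataclass(frozen=True)
class Allowed:
    """An allowed request; it must name the policy that allowed it."""
    policy_id: str


@dataclass(frozen=True)
class Denied:
    """A denied request; it must carry a human-readable reason."""
    reason: str


# Hypothetical sum type: a decision is exactly one of the three variants,
# so "allowed, but with a denial reason" cannot be constructed.
Decision = Union[Pending, Allowed, Denied]


def describe(decision: Decision) -> str:
    """Handle every variant explicitly; anything else is rejected."""
    if isinstance(decision, Pending):
        return "pending"
    if isinstance(decision, Allowed):
        return f"allowed by {decision.policy_id}"
    if isinstance(decision, Denied):
        return f"denied: {decision.reason}"
    raise TypeError(f"unexpected decision variant: {decision!r}")
```

For example, `describe(Allowed("policy-1"))` returns `"allowed by policy-1"`, while `Allowed()` without a policy raises a `TypeError` at construction time.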
  14. 2. Property-based testing
    Define the properties that always must hold (possibly under given assumptions)
    Use a fuzzer / simulator to find violations probabilistically
    forall text: decode(encode(text)) = text
    hegel.dev (Go, Rust, C++, TypeScript), Hypothesis (Python), go fuzz, cargo fuzz, libFuzzer (deterministic)
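
The round-trip property on slide 14 can be sketched with nothing but the Python standard library. This is a hand-rolled random search standing in for a real tool such as Hypothesis, and the run-length codec, `find_counterexample`, and the deliberately broken decoder are all assumptions made up for the example.

```python
import random
from typing import Callable, List, Optional, Tuple

Runs = List[Tuple[str, int]]


def encode(text: str) -> Runs:
    """Run-length encode text into (character, run length) pairs."""
    runs: Runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def decode(runs: Runs) -> str:
    """Invert encode by expanding every run."""
    return "".join(ch * count for ch, count in runs)


def broken_decode(runs: Runs) -> str:
    """Deliberately buggy decoder that forgets the run lengths."""
    return "".join(ch for ch, _ in runs)


def find_counterexample(
    prop: Callable[[str], bool], trials: int = 1000, seed: int = 0
) -> Optional[str]:
    """Return the first random input that violates prop, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 12)
        text = "".join(rng.choice("abc") for _ in range(length))
        if not prop(text):
            return text
    return None


def round_trip(text: str) -> bool:
    """The property from the slide: decode(encode(text)) = text."""
    return decode(encode(text)) == text


def broken_round_trip(text: str) -> bool:
    """The same property checked against the buggy decoder."""
    return broken_decode(encode(text)) == text
```

Here `find_counterexample(round_trip)` finds no violation, while `find_counterexample(broken_round_trip)` returns a string containing a repeated character. A dedicated property-based testing library would additionally shrink that input to a minimal failing case.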
  15. “Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime.” - Unknown author, but as seen in Lawrie Green’s presentation
  16. “Give a computer a test, and it finds one bug. Teach a computer the properties, and it keeps finding bugs for you.”
    Now, how do we express our properties?
  17. 3. (Bounded) model checking
    Encode abstract functionality into a P/TLA+/Quint/SAT model (formal representation of code, not to be confused with LLM models)
    Use a model checker to exhaustively search for violations*
    p-org.github.io/P foundation.tlapl.us quint.sh cvc5.github.io microsoft.github.io/z3guide
    *Most likely up to some reasonable depth/bound, as the search may grow exponentially in the depth
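
To make the idea of bounded, exhaustive search concrete, here is a toy explicit-state checker in plain Python. It is only a sketch of the principle, not how P, TLA+, or Quint work internally. The model (two processes doing a non-atomic increment on a shared counter), the state layout, and the function names are assumptions invented for the example.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Tuple

# State = (shared counter, program counter per process, local copy per process)
State = Tuple[int, Tuple[int, ...], Tuple[int, ...]]

INITIAL: State = (0, (0, 0), (0, 0))


def next_states(state: State) -> Iterable[State]:
    """Each process first reads the counter, then writes back its copy plus one."""
    counter, pcs, tmps = state
    for p in (0, 1):
        new_pcs, new_tmps = list(pcs), list(tmps)
        if pcs[p] == 0:  # step 1: read the shared counter into a local copy
            new_tmps[p] = counter
            new_pcs[p] = 1
            yield (counter, tuple(new_pcs), tuple(new_tmps))
        elif pcs[p] == 1:  # step 2: write the local copy plus one back
            new_pcs[p] = 2
            yield (tmps[p] + 1, tuple(new_pcs), tuple(new_tmps))


def invariant(state: State) -> bool:
    """Once both processes are done, the counter must equal 2."""
    counter, pcs, _ = state
    return pcs != (2, 2) or counter == 2


def check(
    initial: State,
    step: Callable[[State], Iterable[State]],
    inv: Callable[[State], bool],
    max_depth: int,
) -> Optional[List[State]]:
    """Breadth-first search of all states reachable within max_depth steps.

    Returns a shortest trace to an invariant violation, or None if the
    invariant holds everywhere within the bound.
    """
    frontier = deque([(initial, [initial])])
    seen = {initial}
    while frontier:
        state, trace = frontier.popleft()
        if not inv(state):
            return trace
        if len(trace) - 1 >= max_depth:
            continue  # bound reached: do not expand this state further
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + [nxt]))
    return None
```

With `max_depth=4` the search returns a trace in which both processes read before either writes, leaving the counter at 1. With `max_depth=3` it returns `None`, which illustrates the footnote above: a clean result only speaks for the explored bound.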
  18. 4. Mathematical proof
    E.g. Proof by contradiction, contraposition, and induction
    cvc5.github.io microsoft.github.io/z3guide
    Small, trusted kernel which checks each step follows from the last
    lean-lang.org
    A proof is a program => LLMs can* write it
    *And improving all the time. Unlike normal programming, the kernel instantly verifies the code/proof is correct
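
For a sense of what "a proof is a program" looks like, here is a small Lean 4 proof by induction, written as a sketch under the assumption of a recent Lean 4 toolchain. The theorem name `zero_add_sketch` is hypothetical; core Lean already ships this fact as `Nat.zero_add`.

```lean
-- Proof by induction that 0 is a left identity for addition on Nat.
-- The theorem name is hypothetical; core Lean provides Nat.zero_add.
theorem zero_add_sketch (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                          -- base case: 0 + 0 = 0 by computation
  | succ k ih => rw [Nat.add_succ, ih]   -- step: push succ outward, then use the hypothesis
```

If either case were wrong, the kernel would reject the file, which is the instant feedback the slide refers to.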
  19. Klowden and Tao warn that while AI-generated, brute-force proofs can take us to the goal, we might not learn anything from them
    [1]: Tanya Klowden and Terence Tao, “Mathematical Methods and Human Thought in the age of AI”
    “AI tools are like taking a helicopter to drop you off at the site” - Terence Tao on The Edge of Mathematics
  20. Bridge the gap between the train and the platform code and model
    Lean proof
    Pre- / postconditions
    Aeneas aeneasverif.github.io
    Verus verus-lang.github.io/verus/guide
    Production trace
    PObserve
  21. Lots of hype (🔥) around combining (Deterministic) Automated reasoning + (Nondeterministic) LLMs = Neurosymbolic AI
    Jukka Suomela has used LLMs for proving bounds.
    Axiom Math, Math Inc, Harmonic.fun, Atalanta, Google AlphaProof, Leanstral and natural language formalization are examples
  22. What does this mean for the cloud native community?
    Cedar Policy is formally verified in Lean
    I’m piloting verifying my Conditional Authorization Kubernetes feature in Lean
    etcd uses deterministic simulation and property-based testing
    Where could these methods be beneficial next?
  23. Where to go from here?
    I still think diversity of thought and human connections will be important
    We as a community need to think about how to make it possible for everyone to “upskill”
  24. Where to go from here?
    This talk is a hypothesis that might or might not hold true. Let’s join forces and figure it out together.
    If you want to go fast, go alone. If you want to go far, go together.