The Proof is in the Pudding: Can we formally prove correctness of (AI-generated) code?

THE PROOF IS IN THE PUDDING: CAN WE FORMALLY PROVE CORRECTNESS OF CODE?
@luxas.dev
Illustrations by Katarina Jakobsson

WHO IN THE AUDIENCE USES LLMS TO CODE OR MANAGE INFRA?

WHO IN THE AUDIENCE HAS AN EXISTENTIAL CRISIS ABOUT YOUR SKILLS BECOMING AUTOMATED?

DOES IT MEAN WE CAN JUST START AN AGENT AND RELAX?

OR MAYBE NOT, IF HISTORY IS ANYTHING TO GO BY
When something gets automated/commoditized, we get capacity to go beyond what was previously possible, as the cost-benefit equation changes.

WILL JEVONS PARADOX HOLD?
When something becomes cheap, we want more of it in aggregate.
Even if this holds at macro scale, there can still be unwanted outcomes at smaller scale.

SO WHAT SKILLS WILL BE NEEDED?
Unclear. Predicting is hard, especially the future.

Even in the age of AI, we still have a responsibility towards society.
We must still provide enough confidence, but how do we do this without (necessarily) understanding the code?

How do we avoid the plateau where, for the LLM, P(breaking stuff) > P(improving stuff)?
“Only new mistakes are allowed” - Jessie Frazelle at KCD CERN
Lots of questions, are there answers as well?

WHOAMI
Kubernetes / cloud native contributor in various ways since 2015
Staff Software Engineer at Upbound
Cedar Policy access control engine maintainer
Graduating from Aalto University now

OUTCOME ENGINEERING / SPEC-DRIVEN DEVELOPMENT
We are fast coming up with new names for things
Focusing on the end goal is definitely the right way to go; that’s what Kubernetes is all about too, at its core

“Writing is nature's way of letting you know how sloppy your thinking is.” - Cartoonist Dick Guindon

“Thinking doesn't guarantee that we won't make mistakes. But not thinking guarantees that we will.” - Leslie Lamport in “Why We Should Build Software Like We Build Houses”

“the ‘naturalness’ with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.” - Dijkstra in “On the foolishness of ‘natural language programming’”

So how?
0. Avoid coding it in the first place
1. Abstraction
2. Property-based testing
3. Model-based checking
4. Mathematical proof

1. Abstraction
Favor typed languages that eliminate whole classes of bugs
Make invalid states unrepresentable (e.g. through types, enums or taint analysis)
Only allow the LLM to build on top of trusted libraries
[Diagram layers: Trusted Computing Base; Rust/Go (e.g. memory safety and race conditions); LLM-accessible logic]
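One possible reading of "make invalid states unrepresentable", sketched in Python. The `Disconnected`, `Connected` and `describe` names are hypothetical and not from the talk. Instead of a `connected: bool` flag next to an optional `session_id` (which allows "connected but no session"), each state is its own type and carries only the data that is valid for it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Disconnected:
    """No session exists, so there is no session_id field to misuse."""


@dataclass(frozen=True)
class Connected:
    session_id: str

    def __post_init__(self) -> None:
        # Reject the one invalid value that the type alone cannot rule out.
        if not self.session_id:
            raise ValueError("session_id must be non-empty")


# A connection is exactly one of these variants; "connected without a
# session" cannot be constructed at all.
Connection = Union[Disconnected, Connected]


def describe(conn: Connection) -> str:
    if isinstance(conn, Connected):
        return f"connected (session {conn.session_id})"
    return "disconnected"
```

In Python the variant split is only enforced statically if a type checker is run; in a language such as Rust the compiler enforces it, which is closer to what the slide has in mind.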

2. Property-based testing
Define the properties that always must hold (possibly under given assumptions)
Use a fuzzer / simulator to find violations probabilistically
forall text: decode(encode(text)) = text
hegel.dev (Go, Rust, C++, TypeScript)
Hypothesis (Python)
go fuzz
cargo fuzz
libFuzzer
(deterministic)
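The round-trip property above can be sketched with the standard library alone. This is a minimal illustration, not the talk's code: the run-length `encode`/`decode` pair is a made-up example, and a seeded random loop stands in for a real property-based testing library. With Hypothesis, the loop would typically be replaced by a test decorated with `@given(st.text())`, which also shrinks failing inputs.

```python
import random
from typing import List, Tuple


def encode(text: str) -> List[Tuple[str, int]]:
    """Run-length encode: 'aaab' becomes [('a', 3), ('b', 1)]."""
    runs: List[Tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def decode(runs: List[Tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in runs)


def check_round_trip(trials: int = 1000, seed: int = 0) -> None:
    """Property: for all generated text, decode(encode(text)) == text."""
    rng = random.Random(seed)
    for _ in range(trials):
        # A small alphabet makes repeated characters (the interesting case) likely.
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab ") for _ in range(length))
        assert decode(encode(text)) == text, f"round trip failed for {text!r}"
```

Calling `check_round_trip()` raises an `AssertionError` naming the offending input if the property is ever violated, and returns silently otherwise.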

“Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime”. - Unknown author, but as seen in Lawrie Green’s presentation

“Give a computer a test, and it finds one bug. Teach a computer the properties, and it keeps finding bugs for you”.
Now, how do we express our properties?

3. (Bounded) model checking
Encode abstract functionality into a P/TLA+/Quint/SAT model (formal representation of code, not to be confused with LLM models)
Use a model checker to exhaustively search for violations*
p-org.github.io/P
foundation.tlapl.us
quint.sh
cvc5.github.io
microsoft.github.io/z3guide
*Most likely up to some reasonable depth/bound, as the search may grow exponentially in the depth
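To make "exhaustively search for violations up to a bound" concrete, here is a toy explicit-state checker in plain Python. It is a sketch under stated assumptions, not how P, TLA+ or Quint work internally: the model (two processes incrementing a shared counter with a non-atomic read and write) and every name in it are invented for illustration.

```python
from collections import deque
from typing import Iterator, List, Optional, Tuple

# State: (counter, (pc0, tmp0), (pc1, tmp1)).
# pc: 0 = about to read, 1 = about to write, 2 = done.
State = Tuple[int, Tuple[int, int], Tuple[int, int]]
INITIAL: State = (0, (0, 0), (0, 0))


def next_states(state: State) -> Iterator[State]:
    """Yield every state reachable by letting one process take one step."""
    counter, *procs = state
    for i, (pc, tmp) in enumerate(procs):
        if pc == 0:      # read the shared counter into the local tmp
            new_proc, new_counter = (1, counter), counter
        elif pc == 1:    # write tmp + 1 back to the shared counter
            new_proc, new_counter = (2, tmp), tmp + 1
        else:            # this process is done
            continue
        updated = list(procs)
        updated[i] = new_proc
        yield (new_counter, updated[0], updated[1])


def invariant(state: State) -> bool:
    """Once both processes are done, the counter must equal 2."""
    counter, (pc0, _), (pc1, _) = state
    return not (pc0 == 2 and pc1 == 2) or counter == 2


def check(max_depth: int) -> Optional[List[State]]:
    """Breadth-first search; return a shortest violating trace, or None within the bound."""
    queue = deque([(INITIAL, [INITIAL])])
    seen = {INITIAL}
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        if len(trace) - 1 >= max_depth:
            continue  # bound reached: do not explore deeper
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [nxt]))
    return None
```

With a bound of 4 steps, `check(4)` returns the classic lost-update interleaving (both processes read 0, then both write 1). With a bound of 3 it returns `None`, which only means no violation exists within that depth, not that the model is correct.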

4. Mathematical proof
E.g. proof by contradiction, contraposition, and induction
cvc5.github.io
microsoft.github.io/z3guide
Small, trusted kernel which checks that each step follows from the last
lean-lang.org
A proof is a program => LLMs can* write it
*And improving all the time. Unlike normal programming, the kernel instantly verifies that the code/proof is correct
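As a small illustration of "a proof is a program", here is a proof by induction in Lean 4. The theorem name `my_zero_add` is hypothetical and the statement is a textbook exercise rather than anything from the talk; the exact rewrite lemmas that apply can vary between Lean versions, so treat it as a sketch.

```lean
-- Adding zero on the left leaves a natural number unchanged.
theorem my_zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                           -- base case: 0 + 0 reduces to 0
  | succ k ih => rw [Nat.add_succ, ih]    -- step: push succ outward, then apply the hypothesis
```

If any step did not follow, the kernel would reject the proof, whoever (or whatever) wrote it.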

[1]: CSLib: The Lean Computer Science Library

Klowden and Tao warn that while AI-generated, brute-force proofs can take us to the goal, we might not learn anything from them
[1]: Tanya Klowden and Terence Tao, “Mathematical Methods and Human Thought in the age of AI”
“AI tools are like taking a helicopter to drop you off at the site” - Terence Tao on The Edge of Mathematics

Bridge the gap between the train and the platform
code and model
Lean proof: Aeneas, aeneasverif.github.io
Pre- / postconditions: Verus, verus-lang.github.io/verus/guide
Production trace: PObserve

Lots of hype (🔥) around combining
(Deterministic) Automated reasoning + (Nondeterministic) LLMs = Neurosymbolic AI
Jukka Suomela has used LLMs for proving bounds.
Axiom Math, Math Inc, Harmonic.fun, Atalanta, Google AlphaProof, Leanstral and natural language formalization are examples

What does this mean for the cloud native community?
Cedar Policy is formally verified in Lean
I’m piloting verifying my Conditional Authorization Kubernetes feature in Lean
etcd uses deterministic simulation and property-based testing
Where could these methods be beneficial next?

Where to go from here?
I still think diversity of thought and human connections will be important
We as a community need to think about how to make it possible for everyone to “upskill”

Where to go from here?
This talk is a hypothesis that might or might not hold true. Let’s join forces and figure it out together.
If you want to go fast, go alone. If you want to go far, go together.

Thanks!

Lucas Käldström