A Verified Compiler from Isabelle/HOL to CakeML

Lars Hupel, Technische Universität München, February 19th, 2018

Isabelle

Isabelle
▶ interactive proof assistant
▶ powerful automation
  ▶ classical and equational reasoning
  ▶ decision procedures (e.g. linear arithmetic)
  ▶ integration with external automated theorem provers
  ▶ ...
▶ supports functional programming

Low-level Isabelle
▶ generic proof assistant
▶ supports multiple object logics
▶ kernel: intuitionistic higher-order logic with natural deduction
▶ Isabelle/HOL built on top of the kernel
▶ ML code can be embedded almost everywhere
▶ theory syntax (commands) and term syntax (logic) can be extended

If you want to make an apple pie from scratch ...

Isabelle/HOL for Users
People say Isabelle and mean Isabelle/HOL.
▶ inductive predicates
▶ datatypes and “refinement” types
▶ recursive functions
▶ pattern matching
▶ type classes (actually a feature of the kernel)

Defining Datatypes
    datatype α list = Nil | Cons α (α list)
This specification introduces:
▶ an induction principle
▶ a recursor rec_list :: α ⇒ (β ⇒ β list ⇒ α ⇒ α) ⇒ β list ⇒ α
▶ injective, non-overlapping constructors (“freely constructed”)
▶ a lot more

Defining Functions
▶ HOL has a definition principle: x = λy₁ … yₙ. t
  ▶ introduces a new constant and an axiom
  ▶ t must not depend on x itself
▶ primrec defines a function using a recursor (“expressible as a fold”)
▶ fun allows more flexible recursion, but termination must be proved
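A termination proof for `fun` typically exhibits a measure into the natural numbers that strictly decreases on every recursive call. The Python sketch below illustrates that idea with Euclid's algorithm, a hypothetical example that is not from the slides. Its recursion is not structural on either argument, so a plain fold does not express it directly. The runtime assertion only checks the measure on the calls that are actually made. It is a sanity check, not a proof.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm on natural numbers, with its termination measure checked.

    The recursive call gcd(b, a % b) is not on a structural subterm, so a
    fun-style definition needs a measure. Here the measure is the second argument b.
    """
    assert a >= 0 and b >= 0, "defined on natural numbers only"
    if b == 0:
        return a
    next_b = a % b
    # Termination argument: the measure strictly decreases, since a mod b < b.
    assert 0 <= next_b < b
    return gcd(b, next_b)
```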

Functional Programming in Isabelle
Definitions
    datatype α list = Nil | Cons α (α list)

    primrec append where
      append Nil ys = ys
    | append (Cons x xs) ys = Cons x (append xs ys)
(fun append accepts the same equations.)
Proofs
    lemma append xs (append ys zs) = append (append xs ys) zs
      by (induction xs) simp+
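Isabelle proves associativity once and for all, by induction on `xs`. The Python sketch below only tests the same statement exhaustively on small lists, which is evidence rather than proof. It assumes a hypothetical encoding in which `None` stands for `Nil` and the pair `(x, xs)` stands for `Cons x xs`. `append` follows the two defining equations directly.

```python
from itertools import product

# Hypothetical encoding: None is Nil, the pair (x, xs) is Cons x xs.
Nil = None


def cons(x, xs):
    return (x, xs)


def append(xs, ys):
    """append Nil ys = ys; append (Cons x xs) ys = Cons x (append xs ys)."""
    if xs is Nil:
        return ys
    x, rest = xs
    return cons(x, append(rest, ys))


def from_py(items):
    """Helper: build a cons list from a Python list."""
    result = Nil
    for item in reversed(items):
        result = cons(item, result)
    return result


def all_lists(alphabet, max_len):
    """Every list over `alphabet` with at most `max_len` elements."""
    for n in range(max_len + 1):
        for items in product(alphabet, repeat=n):
            yield from_py(list(items))


def check_assoc(alphabet=(0, 1), max_len=3):
    """Test append xs (append ys zs) = append (append xs ys) zs on all small inputs."""
    lists = list(all_lists(alphabet, max_len))
    return all(
        append(xs, append(ys, zs)) == append(append(xs, ys), zs)
        for xs in lists
        for ys in lists
        for zs in lists
    )
```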

Advanced Functional Programming
Automatic termination proof
    fun fib where
      fib 0 = 1
    | fib (Suc 0) = 1
    | fib (Suc (Suc n)) = fib n + fib (Suc n)
Manual termination proof
    function f91 where
      f91 n = (if 100 < n then n − 10 else f91 (f91 (n + 11)))
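The two functions can be transcribed directly into Python, with Peano patterns replaced by integer tests. In `fib` every recursive call is on a smaller argument. In `f91`, by contrast, the inner call's argument `n + 11` grows, and the outer call's argument is a result of `f91` itself. Our reading is that this is why its termination argument cannot be found automatically. The sketch also checks the well-known closed form of McCarthy's 91 function: 91 for n ≤ 100, and n − 10 otherwise.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """fib 0 = 1; fib (Suc 0) = 1; fib (Suc (Suc n)) = fib n + fib (Suc n)."""
    if n < 2:
        return 1
    return fib(n - 2) + fib(n - 1)


def f91(n: int) -> int:
    """McCarthy's 91 function, with the nested recursive call from the slide."""
    if 100 < n:
        return n - 10
    return f91(f91(n + 11))
```

With this indexing, `fib(10)` is 89.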

Evaluating Expressions
We want to evaluate functions for concrete inputs, e.g. fib 10.
1. using term rewriting (by simp)
   ▶ certified, but slow
2. using code generation (by eval)
   ▶ fast, but not certified

Code Generation
Isabelle can generate code for ML, Haskell, Scala and OCaml.
ML
    datatype 'a list = Nil | Cons of 'a * 'a list;
    fun append Nil xs = xs
      | append (Cons (y, ys)) xs = Cons (y, append ys xs);
Scala
    abstract sealed class list[A]
    final case class Nila[A]() extends list[A]
    final case class Cons[A](a: A, b: list[A]) extends list[A]
    def append[A](x0: list[A], xs: list[A]): list[A] = (x0, xs) match {
      case (Nila(), xs) => xs
      case (Cons(y, ys), xs) => Cons[A](y, append[A](ys, xs))
    }

Code Generation Pipeline
1. input: set of equations
2. preprocess
3. build dependency graph, compute SCCs
4. translate to intermediate language
5. serialize to target language
6. output: source text
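Step 3 presumably exists so that mutually recursive constants are emitted together, and so that each group appears after the groups it depends on. The Python sketch below implements Tarjan's SCC algorithm, which is a standard choice but not necessarily the one Isabelle uses. It runs on a hypothetical dependency graph that maps each constant to the constants its code equations mention. Tarjan's algorithm emits a component only after every component reachable from it, which yields exactly that dependencies-first order.

```python
def strongly_connected_components(graph: dict[str, set[str]]) -> list[list[str]]:
    """Tarjan's algorithm; components come out dependencies-first."""
    index_of: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    components: list[list[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in sorted(graph.get(node, ())):
            if succ not in index_of:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index_of[succ])
        # node is the root of a component: pop the whole component off the stack.
        if lowlink[node] == index_of[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in sorted(graph):
        if node not in index_of:
            visit(node)
    return components


# Hypothetical dependency graph: even/odd are mutually recursive.
deps = {
    "even": {"odd"},
    "odd": {"even"},
    "plus": set(),
    "sum_evens": {"even", "plus"},
}
order = strongly_connected_components(deps)
```

Here `order` is `[["even", "odd"], ["plus"], ["sum_evens"]]`. The mutually recursive pair forms one group, and `sum_evens` comes last because it uses both of the other groups.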

Certifying Code Generation
Idea: Transform equations into an intermediate formal object.
The intermediate AST is a value in the logic.
Magnus O. Myreen and Scott Owens. Proof-producing synthesis of ML from higher-order logic. ICFP 2012.
Magnus O. Myreen and Scott Owens. Proof-producing translation of higher-order logic into pure and stateful ML. JAR 2014.

Certifying Code Generation
Approach by Myreen & Owens
▶ define a datatype for ML syntax, formalize semantics (specified in Lem)
▶ define relators between HOL values and ML values, e.g. rel_int :: ML_val ⇒ int ⇒ bool
▶ when the code generator is invoked on constant f,
  ▶ define a logical constant f_ML containing the AST
  ▶ prove a theorem relating f to f_ML using the type’s relator
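A relator states when an ML value represents a given HOL value. The Python sketch below is written under stated assumptions. It uses a hypothetical encoding of ML values as tagged tuples: `("int", n)` for integer literals and `("con", name, [args])` for constructor applications. These are illustrative names, not the actual Lem or CakeML definitions. `rel_int` mirrors `rel_int :: ML_val ⇒ int ⇒ bool`. The hypothetical `rel_list` lifts an element relator to lists, which is one way relators for compound types can be built from the relators of their components.

```python
from typing import Any, Callable

# Hypothetical ML value encoding (illustration only):
#   ("int", n)              an integer literal
#   ("con", name, [args])   a constructor applied to argument values
MLValue = tuple
Relator = Callable[[MLValue, Any], bool]


def rel_int(v: MLValue, i: int) -> bool:
    """v represents the HOL integer i."""
    return v == ("int", i)


def rel_list(rel_elem: Relator) -> Relator:
    """Lift an element relator to lists (HOL lists modelled as Python lists)."""
    def rel(v: MLValue, xs: list) -> bool:
        if not xs:
            return v == ("con", "Nil", [])
        if len(v) != 3 or v[0] != "con" or v[1] != "Cons" or len(v[2]) != 2:
            return False
        head_v, tail_v = v[2]
        return rel_elem(head_v, xs[0]) and rel(tail_v, xs[1:])
    return rel


# The ML value Cons 1 (Cons 2 Nil) in the hypothetical encoding.
ml_value = ("con", "Cons", [("int", 1), ("con", "Cons", [("int", 2), ("con", "Nil", [])])])
```

With these definitions, `rel_list(rel_int)(ml_value, [1, 2])` holds and `rel_list(rel_int)(ml_value, [2, 1])` does not.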

Certified Code Generation
Our Approach, Stage 1 (certifying)
▶ define a higher-order lambda calculus with term-rewriting semantics
▶ define relators between HOL values and lambda terms, e.g. rel_int :: term ⇒ int ⇒ bool
▶ when the code generator is invoked on constant f,
  ▶ define a logical constant f_λ containing the TRS
  ▶ prove a theorem relating f to f_λ using the type’s relator
λ-terms are conceptually much simpler!
This requires type class elimination.
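The Python sketch below shows one-step rewriting R ⊢ t −→ t′ with a rule set R :: (term × term) set. It assumes a hypothetical first-order term encoding: strings starting with `?` are pattern variables, other strings are constants, and tuples are applications. It deliberately omits λ-abstractions and bound variables, so it illustrates only the rewriting step, not the higher-order calculus on the slide. Because R is a set, rules are tried in no particular order. The result is therefore deterministic only when the rules do not overlap.

```python
from typing import Optional, Union

Term = Union[str, tuple]  # a name, or (head, arg1, ..., argN)


def match_pattern(pattern: Term, term: Term, subst: dict) -> Optional[dict]:
    """Extend `subst` so that `pattern` instantiates to `term`, or return None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if len(pattern) != len(term):
        return None
    for p, t in zip(pattern, term):
        subst = match_pattern(p, t, subst)
        if subst is None:
            return None
    return subst


def instantiate(term: Term, subst: dict) -> Term:
    """Replace pattern variables by their bindings."""
    if isinstance(term, str):
        return subst.get(term, term)
    return tuple(instantiate(t, subst) for t in term)


def step(rules: set, term: Term) -> Optional[Term]:
    """One step R ⊢ t −→ t′: rewrite at the root if possible, else inside an argument."""
    for lhs, rhs in rules:
        subst = match_pattern(lhs, term, {})
        if subst is not None:
            return instantiate(rhs, subst)
    if isinstance(term, tuple):
        for i, arg in enumerate(term):
            reduced = step(rules, arg)
            if reduced is not None:
                return term[:i] + (reduced,) + term[i + 1:]
    return None


def normalize(rules: set, term: Term) -> Term:
    """Rewrite until no rule applies (assumes the rules terminate)."""
    while (reduced := step(rules, term)) is not None:
        term = reduced
    return term


# Hypothetical rule set: Peano addition.
rules = {
    (("plus", "0", "?n"), "?n"),
    (("plus", ("Suc", "?m"), "?n"), ("Suc", ("plus", "?m", "?n"))),
}
two_plus_one = ("plus", ("Suc", ("Suc", "0")), ("Suc", "0"))
```

Normalizing `two_plus_one` with these rules yields `("Suc", ("Suc", ("Suc", "0")))`.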

Certified Code Generation
Our Approach, Stage 2 (certified)
▶ reuse ML syntax and semantics (export Lem to Isabelle)
▶ define a HOL function compile :: (term × term) set ⇒ ML_val
▶ prove it correct once and for all

Challenges
▶ Isabelle supports type classes, ML doesn’t → certifying dictionary construction
▶ users can specify custom equations → need to figure out termination and induction principles (like fun)
▶ generation of relators for complex data types → complex proof tactics to accommodate non-standard recursion
▶ set of code equations is unordered → need to specify well-formedness conditions
▶ transformation from term rewriting to big-step semantics → multiple compiler phases

Challenge: Custom Code Equations
What the user specified:
    sum_by f = sum ∘ map f
What the user proved:
    sum_by f [] = 0
    sum_by f (x # xs) = f x + sum_by f xs
What the system needs:
    sum_by monoid_β f [] = zero monoid_β
    sum_by monoid_β f (x # xs) = plus monoid_β (f x) (sum_by monoid_β f xs)
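Type class elimination turns the implicit monoid instance into an explicit dictionary argument, as in the equations the system needs. The Python sketch below is written under stated assumptions. It uses a hypothetical `Monoid` record with `zero` and `plus` fields, and `sum_by` follows the two equations above with the dictionary passed first. The instances `int_add` and `str_concat` are also hypothetical. They are passed explicitly where Isabelle would infer the instance from the type.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    """An explicit type class dictionary holding the monoid operations."""
    zero: Any
    plus: Callable[[Any, Any], Any]


def sum_by(monoid: Monoid, f: Callable[[Any], Any], xs: list) -> Any:
    """sum_by monoid f [] = zero monoid
    sum_by monoid f (x # xs) = plus monoid (f x) (sum_by monoid f xs)"""
    if not xs:
        return monoid.zero
    return monoid.plus(f(xs[0]), sum_by(monoid, f, xs[1:]))


# Hypothetical instances, passed explicitly.
int_add = Monoid(zero=0, plus=lambda a, b: a + b)
str_concat = Monoid(zero="", plus=lambda a, b: a + b)
```

For example, `sum_by(int_add, lambda x: x * x, [1, 2, 3])` evaluates to `14`, and `sum_by(str_concat, str.upper, ["a", "b"])` evaluates to `"AB"`.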

Challenge: Term Rewriting to Big-Step
(Diagram: the compiler phases, the semantics belonging to each phase, and the refinements between them.)
▶ de Bruijn terms: R :: (term × term) set, t, t′ :: term; R ⊢ t −→ t′
▶ Named bound variables: R :: (term × nterm) set, t, t′ :: nterm; R ⊢ t −→ t′
▶ Explicit pattern matching: R :: (string × pterm) set, t, t′ :: pterm; R ⊢ t −→ t′
▶ Sequential clauses: rs :: (string × sterm) list, t, t′ :: sterm; rs ⊢ t −→ t′
  ▶ big-step: rs :: (string × sterm) list, σ :: string ⇀ sterm, t, u :: sterm; rs, σ ⊢ t ↓ u
▶ Evaluation semantics: rs :: (string × value) list, σ :: string ⇀ value, t :: sterm, u :: value; rs, σ ⊢ t ↓ u
  ▶ σ :: string ⇀ value, t :: sterm, u :: value; σ ⊢ t ↓ u

Key Insights
▶ reusable Lem specifications are a game changer
▶ less certifying code, more certified proofs
▶ feature parity is challenging
▶ performance is a significant issue

Q & A
lars.hupel.info, larsrh, larsr_h
