A Verified Compiler from Isabelle/HOL to CakeML

Lars Hupel
February 19, 2018

Transcript

  1. A Verified Compiler from Isabelle/HOL to CakeML
    Lars Hupel, Technische Universität München
    February 19th, 2018
  2. Isabelle

  3. Isabelle
    ▶ interactive proof assistant
    ▶ powerful automation
    ▶ classical and equational reasoning
    ▶ decision procedures (e.g. linear arithmetic)
    ▶ integration with external automated theorem provers
    ▶ ...
    ▶ supports functional programming
  4. Low-level Isabelle
    ▶ generic proof assistant
    ▶ supports multiple object logics
    ▶ kernel: intuitionistic higher-order logic with natural deduction
    ▶ Isabelle/HOL built on top of the kernel
    ▶ ML code can be embedded almost everywhere
    ▶ theory syntax (commands) and term syntax (logic) can be extended
  5. If you want to make an apple pie from scratch ...
  6. Isabelle/HOL for Users
    People say Isabelle and mean Isabelle/HOL.
    ▶ inductive predicates
    ▶ datatypes and “refinement” types
    ▶ recursive functions
    ▶ pattern matching
    ▶ type classes
  7. Isabelle/HOL for Users
    People say Isabelle and mean Isabelle/HOL.
    ▶ inductive predicates
    ▶ datatypes and “refinement” types
    ▶ recursive functions
    ▶ pattern matching
    ▶ type classes (actually a feature of the kernel)
  8. Defining Datatypes
    datatype α list = Nil | Cons α (α list)
    This specification introduces:
    ▶ an induction principle
    ▶ a recursor rec_list :: α ⇒ (β ⇒ β list ⇒ α ⇒ α) ⇒ β list ⇒ α
    ▶ injective, non-overlapping constructors
  9. Defining Datatypes
    datatype α list = Nil | Cons α (α list)
    This specification introduces:
    ▶ an induction principle
    ▶ a recursor rec_list :: α ⇒ (β ⇒ β list ⇒ α ⇒ α) ⇒ β list ⇒ α
    ▶ injective, non-overlapping constructors (“freely constructed”)
  10. Defining Datatypes
    datatype α list = Nil | Cons α (α list)
    This specification introduces:
    ▶ an induction principle
    ▶ a recursor rec_list :: α ⇒ (β ⇒ β list ⇒ α ⇒ α) ⇒ β list ⇒ α
    ▶ injective, non-overlapping constructors (“freely constructed”)
    ▶ a lot more
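    As a rough illustration of what the datatype specification provides (this is not Isabelle output), the sketch below models the list type in Python with two constructor classes and a hand-written recursor. The names Nil, Cons and rec_list mirror the slide and the argument order follows the type shown there; length is a hypothetical example function, and Python is used purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nil:
    """The empty list constructor."""


@dataclass(frozen=True)
class Cons:
    """A list cell holding one element and the remaining list."""
    head: object
    tail: object


def rec_list(nil_case, cons_case, xs):
    """Recursor: return nil_case for Nil; for Cons, combine the head,
    the tail and the recursive result on the tail using cons_case."""
    if isinstance(xs, Nil):
        return nil_case
    return cons_case(xs.head, xs.tail, rec_list(nil_case, cons_case, xs.tail))


def length(xs):
    """Hypothetical example: list length expressed through the recursor."""
    return rec_list(0, lambda _head, _tail, acc: acc + 1, xs)


example = Cons(1, Cons(2, Cons(3, Nil())))
```

    Dataclass equality compares constructor and fields, so two values built from different constructors are never equal and Cons values are equal only when both fields agree. That loosely mirrors "injective, non-overlapping constructors", although here it is a runtime convention rather than a proved theorem.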
  11. Defining Functions
    ▶ HOL has a definition principle: x = λy₁ … yₙ. t
    ▶ introduces a new constant and an axiom
  12. Defining Functions
    ▶ HOL has a definition principle: x = λy₁ … yₙ. t (t must not depend on x itself)
    ▶ introduces a new constant and an axiom
  13. Defining Functions
    ▶ HOL has a definition principle: x = λy₁ … yₙ. t (t must not depend on x itself)
    ▶ introduces a new constant and an axiom
    ▶ primrec defines a function using a recursor
  14. Defining Functions
    ▶ HOL has a definition principle: x = λy₁ … yₙ. t (t must not depend on x itself)
    ▶ introduces a new constant and an axiom
    ▶ primrec defines a function using a recursor (“expressible as a fold”)
  15. Defining Functions
    ▶ HOL has a definition principle: x = λy₁ … yₙ. t (t must not depend on x itself)
    ▶ introduces a new constant and an axiom
    ▶ primrec defines a function using a recursor (“expressible as a fold”)
    ▶ fun allows more flexible recursion, but termination needs to be proved
  16. Functional Programming in Isabelle
    Definitions
      datatype α list = Nil | Cons α (α list)
      primrec append where
        append Nil ys = ys
        append (Cons x xs) ys = Cons x (append xs ys)
  17. Functional Programming in Isabelle
    Definitions
      datatype α list = Nil | Cons α (α list)
      primrec append where
        append Nil ys = ys
        append (Cons x xs) ys = Cons x (append xs ys)
    Proofs
      lemma append xs (append ys zs) = append (append xs ys) zs
        by (induction xs) simp+
  18. Functional Programming in Isabelle
    Definitions
      datatype α list = Nil | Cons α (α list)
      fun append where
        append Nil ys = ys
        append (Cons x xs) ys = Cons x (append xs ys)
    Proofs
      lemma append xs (append ys zs) = append (append xs ys) zs
        by (induction xs) simp+
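    The two append equations can be mirrored in Python as a sketch. The lemma on the slide is proved by induction in Isabelle; the spot check below only tests associativity on a few sample lists and proves nothing. The helper from_python is hypothetical and exists only to build test data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nil:
    """The empty list constructor."""


@dataclass(frozen=True)
class Cons:
    """A list cell holding one element and the remaining list."""
    head: object
    tail: object


def append(xs, ys):
    """Structural recursion on the first argument, one case per equation."""
    if isinstance(xs, Nil):
        return ys                                # append Nil ys = ys
    return Cons(xs.head, append(xs.tail, ys))    # append (Cons x xs) ys = ...


def from_python(items):
    """Hypothetical helper: build a Cons/Nil list from a Python list."""
    result = Nil()
    for item in reversed(items):
        result = Cons(item, result)
    return result


def associative_on(xs, ys, zs):
    """Check the lemma's statement for one concrete triple of lists."""
    return append(xs, append(ys, zs)) == append(append(xs, ys), zs)
```

    For instance, associative_on(from_python([1]), from_python([2, 3]), from_python([])) evaluates both sides of the lemma on concrete inputs.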
  19. Advanced Functional Programming
    Automatic termination proof
      fun fib where
        fib 0 = 1
        fib (Suc 0) = 1
        fib (Suc (Suc n)) = fib n + fib (Suc n)
  20. Advanced Functional Programming
    Automatic termination proof
      fun fib where
        fib 0 = 1
        fib (Suc 0) = 1
        fib (Suc (Suc n)) = fib n + fib (Suc n)
    Manual termination proof
      function f91 where
        f91 n = (if 100 < n then n − 10 else f91 (f91 (n + 11)))
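    Both definitions translate directly into Python, which makes the difference in recursion structure easy to see. This is only a sketch: Python accepts any recursive definition without a termination argument, whereas Isabelle requires one. The recursive calls of fib are on structurally smaller arguments; f91 is nested recursion, where the argument of the outer call depends on the result of the inner call.

```python
def fib(n):
    """fib as on the slide, with fib 0 = 1 and fib 1 = 1."""
    if n == 0:
        return 1
    if n == 1:
        return 1
    # fib (Suc (Suc n)) = fib n + fib (Suc n)
    return fib(n - 2) + fib(n - 1)


def f91(n):
    """Nested recursion: the inner call f91(n + 11) feeds the outer call."""
    if 100 < n:
        return n - 10
    return f91(f91(n + 11))
```

    Running f91 on inputs up to 100 always yields 91, which is the property a manual termination proof has to exploit indirectly.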
  21. Evaluating Expressions
    We want to evaluate functions for concrete inputs, e.g. fib 10.
    1. using term rewriting (by simp)
      ▶ certified, but slow
    2. using code generation (by eval)
      ▶ fast, but not certified
  23. Code Generation
    Isabelle can generate code for ML, Haskell, Scala and OCaml
    ML
      datatype 'a list = Nil | Cons of 'a * 'a list;
      fun append Nil xs = xs
        | append (Cons (y, ys)) xs = Cons (y, append ys xs);
  24. Code Generation
    Isabelle can generate code for ML, Haskell, Scala and OCaml
    Scala
      abstract sealed class list[A]
      final case class Nila[A]() extends list[A]
      final case class Cons[A](a: A, b: list[A]) extends list[A]
      def append[A](x0: list[A], xs: list[A]): list[A] = (x0, xs) match {
        case (Nila(), xs) => xs
        case (Cons(y, ys), xs) => Cons[A](y, append[A](ys, xs))
      }
  25. Code Generation Pipeline
    1. input: Set of equations
    2. preprocess
    3. build dependency graph, compute SCCs
    4. translate to intermediate language
    5. serialize to target language
    6. output: Source text
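    Step 3 of the pipeline can be sketched as follows. The slide does not say which SCC algorithm is used; Tarjan's algorithm is an assumption made here for illustration, and the graph of constant names (main, even, odd, append) is hypothetical. Each strongly connected component groups mutually recursive definitions, and the components come out with dependencies first, which is the order a serializer would want to emit them in.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each constant to the constants it uses.
    Returns the components in dependency-first order."""
    index = {}          # discovery number of each visited node
    low = {}            # smallest discovery number reachable from the node
    stack = []          # nodes of components that are not finished yet
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for successor in graph.get(node, ()):
            if successor not in index:
                visit(successor)
                low[node] = min(low[node], low[successor])
            elif successor in on_stack:
                low[node] = min(low[node], index[successor])
        if low[node] == index[node]:
            # node is the root of a component: pop the component off the stack
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical dependency graph: even and odd are mutually recursive.
dependencies = {
    "main": ["even", "append"],
    "even": ["odd"],
    "odd": ["even"],
    "append": [],
}
```

    The recursive formulation is fine for a sketch; a very deep dependency chain would need an iterative variant because of Python's recursion limit.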
  26. Certifying Code Generation
    Idea: Transform equations into intermediate formal object
    Intermediate AST is a value in the logic
  27. Certifying Code Generation
    Idea: Transform equations into intermediate formal object
    Intermediate AST is a value in the logic
    Magnus O. Myreen and Scott Owens. Proof-producing synthesis of ML from higher-order logic. ICFP 2012.
    Magnus O. Myreen and Scott Owens. Proof-producing translation of higher-order logic into pure and stateful ML. JAR 2014.
  28. Certifying Code Generation
    Approach by Myreen & Owens
    ▶ define a datatype for ML syntax, formalize semantics
    ▶ define relators between HOL values and ML values, e.g. rel_int :: ML_val ⇒ int ⇒ bool
    ▶ when code generator is invoked on constant f,
      ▶ define a logical constant f_ML containing the AST
      ▶ prove theorem relating f to f_ML using the type’s relator
  29. Certifying Code Generation
    Approach by Myreen & Owens
    ▶ define a datatype for ML syntax, formalize semantics (specified in Lem)
    ▶ define relators between HOL values and ML values, e.g. rel_int :: ML_val ⇒ int ⇒ bool
    ▶ when code generator is invoked on constant f,
      ▶ define a logical constant f_ML containing the AST
      ▶ prove theorem relating f to f_ML using the type’s relator
  30. Certified Code Generation
    Our Approach, Stage 1 (certifying)
    ▶ define a higher-order lambda calculus with term-rewriting semantics
    ▶ define relators between HOL values and lambda terms, e.g. rel_int :: term ⇒ int ⇒ bool
    ▶ when code generator is invoked on constant f,
      ▶ define a logical constant f_λ containing the TRS
      ▶ prove theorem relating f to f_λ using the type’s relator
  31. Certified Code Generation
    Our Approach, Stage 1 (certifying)
    ▶ define a higher-order lambda calculus with term-rewriting semantics
    ▶ define relators between HOL values and lambda terms, e.g. rel_int :: term ⇒ int ⇒ bool
    ▶ when code generator is invoked on constant f,
      ▶ define a logical constant f_λ containing the TRS
      ▶ prove theorem relating f to f_λ using the type’s relator
    (λ-terms are conceptually much simpler!)
  32. Certified Code Generation
    Our Approach, Stage 1 (certifying)
    ▶ define a higher-order lambda calculus with term-rewriting semantics
    ▶ define relators between HOL values and lambda terms, e.g. rel_int :: term ⇒ int ⇒ bool
    ▶ when code generator is invoked on constant f,
      ▶ define a logical constant f_λ containing the TRS
      ▶ prove theorem relating f to f_λ using the type’s relator
    (λ-terms are conceptually much simpler!)
    (requires type class elimination)
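    To make the idea of a relator more concrete, here is a minimal sketch in Python. A relator is a predicate that says when a deep term represents a given logical value. The term encoding is entirely hypothetical (constants named zero, suc, nil and cons, natural numbers instead of integers) and is not the encoding used in the talk; rel_list shows how a relator for a container type can be built from a relator for its elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A named constant in the term language."""
    name: str


@dataclass(frozen=True)
class App:
    """Application of one term to another."""
    fun: object
    arg: object


def rel_nat(term, n):
    """True if term is the unary numeral (suc applied n times to zero) for n."""
    if n < 0:
        return False
    if n == 0:
        return term == Const("zero")
    return (isinstance(term, App)
            and term.fun == Const("suc")
            and rel_nat(term.arg, n - 1))


def rel_list(rel_elem):
    """Lift a relator on elements to a relator on lists."""
    def rel(term, values):
        if not values:
            return term == Const("nil")
        # A non-empty list must look like: cons element rest
        return (isinstance(term, App)
                and isinstance(term.fun, App)
                and term.fun.fun == Const("cons")
                and rel_elem(term.fun.arg, values[0])
                and rel(term.arg, values[1:]))
    return rel


two = App(Const("suc"), App(Const("suc"), Const("zero")))
singleton = App(App(Const("cons"), two), Const("nil"))
```

    In the certifying stage described on the slide, such a relation is established by a theorem for every translated constant; here it is only a runtime check.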
  33. Certified Code Generation
    Our Approach, Stage 2 (certified)
    ▶ reuse ML syntax and semantics
  34. Certified Code Generation
    Our Approach, Stage 2 (certified)
    ▶ reuse ML syntax and semantics (export Lem to Isabelle)
  35. Certified Code Generation
    Our Approach, Stage 2 (certified)
    ▶ reuse ML syntax and semantics (export Lem to Isabelle)
    ▶ define a HOL function compile :: (term × term) set ⇒ ML_val
    ▶ prove it correct once and for all
  36. Challenges
    ▶ Isabelle supports type classes, ML doesn’t
      → certifying dictionary construction
    ▶ users can specify custom equations
      → need to figure out termination and induction principles (like fun)
    ▶ generation of relators for complex data types
      → complex proof tactics to accommodate non-standard recursion
    ▶ set of code equations is unordered
      → need to specify wellformedness conditions
    ▶ transformation from term rewriting to big-step semantics
      → multiple compiler phases
  37. Challenge: Custom Code Equations
    What the user specified
      sum_by f = sum ◦ map f
  38. Challenge: Custom Code Equations
    What the user specified
      sum_by f = sum ◦ map f
    What the user proved
      sum_by f [] = 0
      sum_by f (x # xs) = f x + sum_by f xs
  39. Challenge: Custom Code Equations
    What the user specified
      sum_by f = sum ◦ map f
    What the user proved
      sum_by f [] = 0
      sum_by f (x # xs) = f x + sum_by f xs
    What the system needs
      sum_by monoid_β f [] = zero monoid_β
      sum_by monoid_β f (x # xs) = plus monoid_β (f x) (sum_by monoid_β f xs)
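    The equations in the "what the system needs" form pass the type class instance around as an ordinary argument. A minimal sketch of this dictionary-passing style in Python follows; the Monoid record and the two instances are hypothetical illustrations, with field names zero and plus taken from the equations above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Monoid:
    """Explicit dictionary standing in for a type class instance."""
    zero: object
    plus: Callable[[object, object], object]


def sum_by(monoid, f, xs):
    """sum_by with the class operations looked up in the dictionary."""
    if not xs:
        return monoid.zero                       # sum_by m f [] = zero m
    x, *rest = xs
    return monoid.plus(f(x), sum_by(monoid, f, rest))


# Hypothetical instances: integers under addition, strings under concatenation.
int_addition = Monoid(zero=0, plus=lambda a, b: a + b)
string_concat = Monoid(zero="", plus=lambda a, b: a + b)
```

    The same sum_by code now serves every instance: sum_by(int_addition, f, xs) adds numbers, while sum_by(string_concat, str, xs) concatenates strings. In the verified setting the construction additionally has to be certified, which this sketch does not attempt.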
  40. Challenge: Term Rewriting to Big-Step
    [Diagram] Compiler phases: de Bruijn terms, Named bound variables, Explicit pattern matching
    Semantics, in slide order:
      R :: (term × term) set, t, t′ :: term, R ⊢ t −→ t′
      R :: (term × nterm) set, t, t′ :: nterm, R ⊢ t −→ t′
      R :: (string × pterm) set, t, t′ :: pterm, R ⊢ t −→ t′
    Legend: compiler phase; semantics refinement; semantics belonging to the phase
  41. Challenge: Term Rewriting to Big-Step
    [Diagram] Compiler phases: Explicit pattern matching, Sequential clauses
    Semantics, in slide order:
      R :: (string × pterm) set, t, t′ :: pterm, R ⊢ t −→ t′
      rs :: (string × sterm) list, t, t′ :: sterm, rs ⊢ t −→ t′
      rs :: (string × sterm) list, σ :: string ⇀ sterm, t, u :: sterm, rs, σ ⊢ t ↓ u
    Legend: compiler phase; semantics refinement; semantics belonging to the phase
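    The step from a set R of rules to a list rs of clauses matters because a list is tried in order. The sketch below, in Python, implements first-match rewriting for first-order patterns; the term representation and the is_zero clauses are hypothetical and deliberately overlapping, to show that the result then depends on the order. This is one way to read why an unordered set of equations needs wellformedness conditions before it can be turned into sequential clauses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A pattern variable."""
    name: str


@dataclass(frozen=True)
class Con:
    """A constant or constructor applied to zero or more arguments."""
    name: str
    args: tuple = ()


def match_pattern(pattern, term, subst):
    """Return an extended substitution if pattern matches term, else None."""
    if isinstance(pattern, Var):
        if pattern.name in subst:
            return subst if subst[pattern.name] == term else None
        return {**subst, pattern.name: term}
    if (not isinstance(term, Con) or pattern.name != term.name
            or len(pattern.args) != len(term.args)):
        return None
    for sub_pattern, sub_term in zip(pattern.args, term.args):
        subst = match_pattern(sub_pattern, sub_term, subst)
        if subst is None:
            return None
    return subst


def instantiate(term, subst):
    """Replace the variables in term according to subst."""
    if isinstance(term, Var):
        return subst[term.name]
    return Con(term.name, tuple(instantiate(a, subst) for a in term.args))


def rewrite_first(clauses, term):
    """Apply the first clause whose left-hand side matches term."""
    for lhs, rhs in clauses:
        subst = match_pattern(lhs, term, {})
        if subst is not None:
            return instantiate(rhs, subst)
    return None


zero = Con("Zero")


def suc(n):
    return Con("Suc", (n,))


def is_zero(n):
    return Con("is_zero", (n,))


# Overlapping clauses: the second one also matches is_zero Zero.
clauses = [(is_zero(zero), Con("True")), (is_zero(Var("n")), Con("False"))]
```

    With the clauses in the given order, is_zero applied to Zero rewrites to True; with the order reversed, the catch-all clause fires first and the answer changes.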
  42. Challenge: Term Rewriting to Big-Step
    [Diagram] Compiler phases: Sequential clauses, Evaluation semantics
    Semantics, in slide order:
      rs :: (string × sterm) list, σ :: string ⇀ sterm, t, u :: sterm, rs, σ ⊢ t ↓ u
      rs :: (string × value) list, σ :: string ⇀ value, t :: sterm, u :: value, rs, σ ⊢ t ↓ u
      σ :: string ⇀ value, t :: sterm, u :: value, σ ⊢ t ↓ u
    Legend: compiler phase; semantics refinement; semantics belonging to the phase
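    The last judgement, σ ⊢ t ↓ u, evaluates a term t to a value u under an environment σ that maps names to values. A minimal big-step evaluator in that style is sketched below in Python. The term language (literals, variables, lambda abstractions, applications) and the closure representation are assumptions made for illustration and are much smaller than the sterm and value types of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """An integer literal."""
    value: int


@dataclass(frozen=True)
class Var:
    """A variable, looked up in the environment."""
    name: str


@dataclass(frozen=True)
class Lam:
    """A lambda abstraction with one named parameter."""
    param: str
    body: object


@dataclass(frozen=True)
class App:
    """Application of a function term to an argument term."""
    fun: object
    arg: object


@dataclass
class Closure:
    """A function value: code plus the environment it was created in."""
    param: str
    body: object
    env: dict


def evaluate(env, term):
    """Big-step evaluation: env maps names to values, the result is a value."""
    if isinstance(term, Lit):
        return term.value
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Lam):
        return Closure(term.param, term.body, env)
    if isinstance(term, App):
        function = evaluate(env, term.fun)
        argument = evaluate(env, term.arg)
        if isinstance(function, Closure):
            extended = {**function.env, function.param: argument}
            return evaluate(extended, function.body)
        return function(argument)    # primitive supplied as a Python callable
    raise TypeError(f"not a term: {term!r}")


# Hypothetical example: twice = λf. λx. f (f x), with succ as a primitive.
initial_env = {"succ": lambda n: n + 1}
twice = Lam("f", Lam("x", App(Var("f"), App(Var("f"), Var("x")))))
program = App(App(twice, Var("succ")), Lit(5))
```

    Unlike a rewriting semantics, the evaluator never substitutes into terms: bindings live in the environment, and function values carry their own environment as closures.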
  43. Key Insights
    ▶ reusable Lem specifications are a game changer
    ▶ less certifying code, more certified proofs
    ▶ feature parity is challenging
    ▶ performance is a significant issue
  44. Q & A  lars.hupel.info  larsrh  larsr_h