Slide 1

Slide 1 text

Annotating Deeply Embedded Languages
Robbert van der Helm, Trevor L. McDonell, Gabriele Keller

Slide 2

Slide 2 text

Embedded Languages
• A language written inside of another language
• May be domain specific
• That other language is called the host language

Slide 3

Slide 3 text

Deeply Embedded Languages

    data Exp a where
      Lit  :: Int -> Exp Int
      Succ :: Exp Int -> Exp Int
      ...

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/gadt.html

Slide 4

Slide 4 text

Deeply Embedded Languages

    data Exp a where
      Lit  :: Int -> Exp Int
      Succ :: Exp Int -> Exp Int
      ...

    ans :: Exp Int
    ans = Succ (Lit 41)

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/gadt.html

Slide 5

Slide 5 text

Deeply Embedded Languages

    data Exp a where
      Lit  :: Int -> Exp Int
      Succ :: Exp Int -> Exp Int
      ...

    ans :: Exp Int
    ans = Succ (Lit 41)

    eval :: Exp Int -> Int
    eval (Lit i)  = i
    eval (Succ x) = 1 + eval x
    ...

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/gadt.html

Slide 8

Slide 8 text

Deeply Embedded Languages

    data Exp a where
      Lit  :: Int -> Exp Int
      Succ :: Exp Int -> Exp Int
      ...

    ans :: Exp Int
    ans = Succ (Lit 41)

    eval :: Exp Int -> Int
    eval (Lit i)  = i
    eval (Succ x) = 1 + eval x
    ...

academics

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/gadt.html

Slide 9

Slide 9 text

Deeply Embedded Languages

    data Exp a where
      Lit  :: Int -> Exp Int
      Succ :: Exp Int -> Exp Int
      ...

    ans :: Exp Int
    ans = Succ (Lit 41)

    eval :: Exp Int -> Int
    eval (Lit i)  = i
    eval (Succ x) = 1 + eval x
    ...

academics
me

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/gadt.html

Slide 10

Slide 10 text

Deeply Embedded Languages

Slide 11

Slide 11 text

Deeply Embedded Languages
Advantages:
• Integrates with the host language
• No separate parsing step

Slide 12

Slide 12 text

Deeply Embedded Languages
Advantages:
• Integrates with the host language
• No separate parsing step
Disadvantages:
• No separate parsing step

Slide 13

Slide 13 text

Deeply Embedded Languages
Advantages:
• Integrates with the host language
• No separate parsing step
Disadvantages:
• No separate parsing step
• Deep embeddings lack context information
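
To make the last disadvantage concrete, here is a minimal sketch that reuses the two constructors shown on the earlier slides. The names `render`, `writtenHere` and `writtenThere` are made up for this illustration. Two terms written at different places in the host program produce identical ASTs, so nothing in the tree tells a backend where either one came from.

```haskell
{-# LANGUAGE GADTs #-}

-- The deep embedding from the earlier slides, limited to the two
-- constructors that are shown there.
data Exp a where
  Lit  :: Int -> Exp Int
  Succ :: Exp Int -> Exp Int

-- Render the AST as text. This is all the information a backend receives.
render :: Exp a -> String
render (Lit i)  = "Lit " ++ show i
render (Succ x) = "Succ (" ++ render x ++ ")"

-- Two hypothetical user definitions, written at different source locations.
writtenHere :: Exp Int
writtenHere = Succ (Lit 41)

writtenThere :: Exp Int
writtenThere = Succ (Lit 41)
```

Both definitions render to the same string, which is why diagnostics or profiles produced from the AST alone cannot point back to user code.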

Slide 14

Slide 14 text

The Problem
• Let’s write an embedded program!
https://github.com/tmcdonell/lulesh-accelerate

Slide 15

Slide 15 text

The Problem
• Let’s write an embedded program!
  1. Write some code
https://github.com/tmcdonell/lulesh-accelerate

Slide 18

Slide 18 text

The Problem
• Let’s write an embedded program!
  1. Write some code
  2. Run it!
https://github.com/tmcdonell/lulesh-accelerate

Slide 22

Slide 22 text

The Problem
• Let’s write an embedded program!
  1. Write some code
  2. Run it!
https://github.com/tmcdonell/lulesh-accelerate
👍

Slide 26

Slide 26 text

The Problem
• Let’s write an embedded program!
  1. Write some code
  2. Run it!
  3. …
https://github.com/tmcdonell/lulesh-accelerate
👍

Slide 33

Slide 33 text

The Problem
• Let’s write an embedded program!
  1. Write some code
  2. Run it!
  3. …
  4. Profit?
https://github.com/tmcdonell/lulesh-accelerate
👍 🤔

Slide 34

Slide 34 text

The Problem
There is a disconnect between:
• the embedded program the user writes;
• the abstract syntax tree generated for that program; and
• the optimised code that is eventually executed

Slide 35

Slide 35 text

Objective
1. Recover context in deeply embedded programs
2. Annotate the embedded program with that information
3. Find other uses for the annotation system

Slide 36

Slide 36 text

The Idea
• The program is described by some abstract syntax tree
• This AST is built via smart constructors
• These smart constructors should generate and store the necessary annotations

Slide 37

Slide 37 text

The Idea
These smart constructors can be encountered as:
• Regular functions
• Type class methods
• Pattern synonyms

Embedded Pattern Matching, McDonell T.L., Meredith J.D., and Keller G.

Slide 38

Slide 38 text

The Idea
These smart constructors can be encountered as:
• Regular functions
• Type class methods
• Pattern synonyms

    constant :: Int -> Exp Int
    constant = Lit

Embedded Pattern Matching, McDonell T.L., Meredith J.D., and Keller G.

Slide 39

Slide 39 text

The Idea
These smart constructors can be encountered as:
• Regular functions
• Type class methods
• Pattern synonyms

    constant :: Int -> Exp Int
    constant = Lit

    instance Num (Exp a) where
      (+) = PrimApp PrimAdd

Embedded Pattern Matching, McDonell T.L., Meredith J.D., and Keller G.

Slide 40

Slide 40 text

The Idea
These smart constructors can be encountered as:
• Regular functions
• Type class methods
• Pattern synonyms

    constant :: Int -> Exp Int
    constant = Lit

    instance Num (Exp a) where
      (+) = PrimApp PrimAdd

    pattern Maybe_ :: Exp a -> Exp (Maybe a)

Embedded Pattern Matching, McDonell T.L., Meredith J.D., and Keller G.
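
The three kinds of smart constructor can be sketched in one small module. This is an illustration only, not Accelerate's code: the `Add` constructor stands in for the slide's `PrimApp PrimAdd`, the instance is restricted to `Exp Int` for brevity, and `Succ_`, `unSucc` and `peel` are hypothetical names.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs             #-}
{-# LANGUAGE PatternSynonyms   #-}
{-# LANGUAGE ViewPatterns      #-}

data Exp a where
  Lit  :: Int -> Exp Int
  Succ :: Exp Int -> Exp Int
  Add  :: Exp Int -> Exp Int -> Exp Int  -- assumed stand-in for PrimApp PrimAdd

-- 1. A regular function as smart constructor.
constant :: Int -> Exp Int
constant = Lit

-- 2. Type class methods as smart constructors.
instance Num (Exp Int) where
  (+)         = Add
  fromInteger = Lit . fromInteger
  (*)         = error "(*): not part of this sketch"
  abs         = error "abs: not part of this sketch"
  signum      = error "signum: not part of this sketch"
  negate      = error "negate: not part of this sketch"

-- 3. A bidirectional pattern synonym: it builds a node when used as an
--    expression and takes one apart when used as a pattern.
pattern Succ_ :: Exp Int -> Exp Int
pattern Succ_ x <- (unSucc -> Just x)
  where
    Succ_ x = Succ x

unSucc :: Exp Int -> Maybe (Exp Int)
unSucc (Succ x) = Just x
unSucc _        = Nothing

eval :: Exp Int -> Int
eval (Lit i)   = i
eval (Succ x)  = 1 + eval x
eval (Add x y) = eval x + eval y

-- All three entry points used together.
example :: Exp Int
example = Succ_ (constant 40 + 1)

-- Remove an outer Succ node, if there is one.
peel :: Exp Int -> Exp Int
peel (Succ_ x) = x
peel e         = e
```

Every AST node in `example` is created through one of these three entry points, so each of them is a place where an annotation could be generated.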

Slide 41

Slide 41 text

Annotations
• Store metadata for an AST node
• Should be easily extensible
• Adding them shouldn’t change the user-facing language

Slide 42

Slide 42 text

Storing Annotations

    data Ann = Ann { ... }

    data Exp a where
      Lit  :: Ann -> Int -> Exp Int
      Succ :: Ann -> Exp Int -> Exp Int
      ...

    mkAnn :: ... => Ann
    mkAnn = Ann { ... }

    constant :: Int -> Exp Int
    constant = Lit mkAnn

Trees that Grow, Najd S. and Peyton Jones S.
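
A minimal, self-contained version of this layout, assuming a made-up `hints` field in place of the elided record contents and dropping the elided constraint on `mkAnn`. The helper names `suc` and `annotations` are also hypothetical. It shows that the annotation travels with every node while the user-facing smart constructors keep their original types.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical annotation record; the real fields are elided on the slide.
newtype Ann = Ann { hints :: [String] }
  deriving (Eq, Show)

-- Every constructor takes the annotation as its first field.
data Exp a where
  Lit  :: Ann -> Int -> Exp Int
  Succ :: Ann -> Exp Int -> Exp Int

-- Builds an empty annotation (no constraint in this sketch).
mkAnn :: Ann
mkAnn = Ann { hints = [] }

-- The smart constructors hide the extra field from the user.
constant :: Int -> Exp Int
constant = Lit mkAnn

suc :: Exp Int -> Exp Int
suc = Succ mkAnn

-- Evaluation ignores annotations.
eval :: Exp Int -> Int
eval (Lit _ i)  = i
eval (Succ _ x) = 1 + eval x

-- Collect the annotations of all nodes, outermost first.
annotations :: Exp a -> [Ann]
annotations (Lit ann _)  = [ann]
annotations (Succ ann x) = ann : annotations x
```

User code such as `suc (constant 41)` looks the same as it would without annotations; only the constructors and the functions that traverse the AST see the extra field.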

Slide 47

Slide 47 text

Source Locations
• Associate AST fragments back to their original source location
• Use that for diagnostics, profiling, debugging…

Slide 48

Slide 48 text

Source Locations
1. GHC Call Stacks: GHC.Stack
2. RTS Execution Stacks: GHC.ExecutionStack

Slide 49

Slide 49 text

Source Locations
1. GHC Call Stacks: GHC.Stack
   • Created at compile time
   • Functions require a HasCallStack constraint
2. RTS Execution Stacks: GHC.ExecutionStack

Slide 50

Slide 50 text

Source Locations
1. GHC Call Stacks: GHC.Stack
   • Created at compile time
   • Functions require a HasCallStack constraint
2. RTS Execution Stacks: GHC.ExecutionStack
   • Runtime backtraces!
   • No changes to user code required!

Slide 51

Slide 51 text

Source Locations
1. GHC Call Stacks: GHC.Stack
   • Created at compile time
   • Functions require a HasCallStack constraint
2. RTS Execution Stacks: GHC.ExecutionStack
   • Runtime backtraces!
   • No changes to user code required!
   • Currently unusable 🙁

Slide 52

Slide 52 text

Source Locations
Three scenarios:
• Regular functions: GHC Call Stacks (existing)
• Type class methods: RTS Execution Stacks
• Pattern synonyms: GHC Call Stacks (plus some trickery)
https://gitlab.haskell.org/ghc/ghc/-/issues/19289

Slide 53

Slide 53 text

HasCallStack

    printError :: HasCallStack => String -> IO ()
    printError msg = putStrLn msg >> print callStack

Slide 54

Slide 54 text

HasCallStack

    printError :: HasCallStack => String -> IO ()
    printError msg = putStrLn msg >> print callStack

desugars to…

    printError :: (?callStack :: CallStack) => String -> IO ()
    printError msg = putStrLn msg >> print ?callStack

Slide 55

Slide 55 text

HasCallStack

    main :: HasCallStack => IO ()
    main = foo

    foo :: IO ()  -- silent error: no HasCallStack constraint!
    foo = bar

    bar :: HasCallStack => IO ()  -- only prints ‘bar’
    bar = print callStack
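
The broken chain can be reproduced with a small pure variant of the slide's functions. The names `viaFoo` and `baz` are added for this sketch, and the functions return the recorded function names instead of printing the stack, so the result is easy to inspect.

```haskell
import GHC.Stack (HasCallStack, callStack, getCallStack)

-- Names recorded on the call stack, innermost call first.
bar :: HasCallStack => [String]
bar = map fst (getCallStack callStack)

-- No HasCallStack constraint: GHC starts a fresh stack at the call to bar.
foo :: [String]
foo = bar

-- The caller's stack is dropped as soon as the chain passes through foo.
viaFoo :: HasCallStack => [String]
viaFoo = foo

-- With the constraint on every link, the chain stays intact.
baz :: HasCallStack => [String]
baz = bar
```

Called from a function without the constraint, `viaFoo` yields `["bar"]` while `baz` yields `["bar","baz"]`. The missing constraint on `foo` causes no warning or error; the stack is just shorter.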

Slide 58

Slide 58 text

SourceMapped

    data OpaqueType = NotExported

    type SourceMapped = ( ?requiresSourceMapping :: OpaqueType, HasCallStack )

    -- Throws an error if the caller did not have the HasCallStack constraint
    sourceMap :: HasCallStack => (SourceMapped => a) -> a
    sourceMap k = ...

Slide 59

Slide 59 text

SourceMapped

    data OpaqueType = NotExported

    type SourceMapped = ( ?requiresSourceMapping :: OpaqueType, HasCallStack )

    -- Throws an error if the caller did not have the HasCallStack constraint
    sourceMap :: HasCallStack => (SourceMapped => a) -> a
    sourceMap k = ...

The only way to satisfy the SourceMapped constraint

Slide 60

Slide 60 text

SourceMapped

    main :: HasCallStack => IO ()
    main = foo

    foo :: IO ()  -- silent error: no HasCallStack constraint
    foo = bar

    bar :: HasCallStack => IO ()
    bar = print callStack

    qux :: HasCallStack => IO ()
    qux = sourceMap bar

Callouts on the slide:
    -- Runtime error: no HasCallStack

Slide 61

Slide 61 text

SourceMapped

    main :: HasCallStack => IO ()
    main = foo

    foo :: IO ()  -- silent error: no HasCallStack constraint
    foo = bar

    bar :: HasCallStack => IO ()
    bar = print callStack

    qux :: HasCallStack => IO ()
    qux = sourceMap bar

Callouts on the slide:
    SourceMapped
    -- Runtime error: no HasCallStack

Slide 62

Slide 62 text

SourceMapped

    main :: HasCallStack => IO ()
    main = foo

    foo :: IO ()  -- silent error: no HasCallStack constraint
    foo = bar

    bar :: HasCallStack => IO ()
    bar = print callStack

    qux :: HasCallStack => IO ()
    qux = sourceMap bar

Callouts on the slide:
    SourceMapped
    -- Compilation error: unbound implicit parameter
    -- Runtime error: no HasCallStack

Slide 64

Slide 64 text

SourceMapped

    main :: HasCallStack => IO ()
    main = foo

    foo :: IO ()  -- silent error: no HasCallStack constraint
    foo = bar

    bar :: HasCallStack => IO ()
    bar = print callStack

    qux :: HasCallStack => IO ()  -- Works!
    qux = sourceMap bar

Callouts on the slide:
    SourceMapped
    -- Compilation error: unbound implicit parameter

Slide 65

Slide 65 text

Putting it together

    data Ann = Ann { locations :: HashSet CallStack, ... }

    mkAnn :: SourceMapped => Ann
    mkAnn = Ann { locations = capture ?callStack }
      where capture = ...

    constant :: HasCallStack => Int -> Exp Int
    constant = sourceMap $ Lit mkAnn

Slide 66

Slide 66 text

Putting it together

    data Ann = Ann { locations :: HashSet CallStack, ... }

    mkAnn :: SourceMapped => Ann
    mkAnn = Ann { locations = capture ?callStack }
      where capture = ...

    constant :: HasCallStack => Int -> Exp Int
    constant = sourceMap $ Lit mkAnn

• inlining
• loop unrolling
• [no] fast math
• …
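
The pieces can be assembled into a compilable sketch under a few stated assumptions. The slides elide the body of `sourceMap`; the one below is only one possible way to discharge the implicit parameter, and it omits the runtime check mentioned on the SourceMapped slide. A plain list replaces `HashSet CallStack`, the slide's `capture` helper is replaced by wrapping the stack in a list, and `annOf` and `frameNames` are hypothetical helpers. The sketch applies `sourceMap` with parentheses rather than `$`.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE GADTs           #-}
{-# LANGUAGE ImplicitParams  #-}
{-# LANGUAGE RankNTypes      #-}

import GHC.Stack (CallStack, HasCallStack, callStack, getCallStack)

-- In a real library the constructor would not be exported, so users
-- cannot bind ?requiresSourceMapping themselves.
data OpaqueType = NotExported

type SourceMapped = (?requiresSourceMapping :: OpaqueType, HasCallStack)

-- Assumed body: bind the opaque implicit parameter and continue.
sourceMap :: HasCallStack => (SourceMapped => a) -> a
sourceMap k = let ?requiresSourceMapping = NotExported in k

-- Simplification: a list instead of the slide's HashSet CallStack.
newtype Ann = Ann { locations :: [CallStack] }

-- Capture the call stack that is in scope where the annotation is created.
mkAnn :: SourceMapped => Ann
mkAnn = Ann { locations = [callStack] }

data Exp a where
  Lit  :: Ann -> Int -> Exp Int
  Succ :: Ann -> Exp Int -> Exp Int

constant :: HasCallStack => Int -> Exp Int
constant = sourceMap (Lit mkAnn)

-- Annotation stored at the root of a term.
annOf :: Exp a -> Ann
annOf (Lit ann _)  = ann
annOf (Succ ann _) = ann

-- Function names recorded in the root annotation.
frameNames :: Exp a -> [String]
frameNames = concatMap (map fst . getCallStack) . locations . annOf
```

Because `mkAnn` demands `SourceMapped`, calling it outside of `sourceMap` fails to compile with an unbound implicit parameter, which is the compile-time guarantee the slides aim for.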

Slide 67

Slide 67 text

Case study: Accelerate
• Deeply embedded language for data-parallel array computations
  - Multiple backends (CPU, GPU, …)
  - Multiple expression types (collective array and scalar expression)
  - Multiple AST types (surface language in HOAS, internal language is first-order)

Slide 68

Slide 68 text

Accelerate: Sharing recovery
• Finds shared parts of the program based on stable names; moves those parts to explicit binders
• NEW:
  - Keep track of annotation state during sharing recovery
  - Enable terms to be inlined by ignoring sharing

Slide 69

Slide 69 text

Accelerate: Array fusion
• Producer/consumer fusion rewrites the program to combine operations

Slide 70

Slide 70 text

Accelerate: Array fusion
• Producer/consumer fusion rewrites the program to combine operations

    map f . map g

Slide 71

Slide 71 text

Accelerate: Array fusion
• Producer/consumer fusion rewrites the program to combine operations

    map f . map g

rewrites into…

    map (f . g)

Slide 72

Slide 72 text

Accelerate: Array fusion
• Producer/consumer fusion rewrites the program to combine operations

    map f . map g

rewrites into…

    map (f . g)

• NEW:
  - Multiple AST nodes may be merged; new annotation contains the union of the call stack sets
  - Source locations may be disjoint
  - Optimisation flags might differ…
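
A toy model of annotation merging during fusion, not Accelerate's implementation. The `Arr` type, the string-based locations, the `unroll` flag and the conflict policy (keep the smaller factor) are all assumptions made for this sketch.

```haskell
import Data.List (union)

-- Hypothetical annotation: source locations as plain strings and an
-- optional loop unrolling factor as an example optimisation flag.
data Ann = Ann
  { locations :: [String]
  , unroll    :: Maybe Int
  } deriving (Eq, Show)

-- Merged nodes keep the union of both location sets. Conflicting flags
-- need some policy; taking the smaller factor is an arbitrary choice.
mergeAnn :: Ann -> Ann -> Ann
mergeAnn a b = Ann
  { locations = locations a `union` locations b
  , unroll    = case (unroll a, unroll b) of
      (Just x, Just y) -> Just (min x y)
      (x, Nothing)     -> x
      (Nothing, y)     -> y
  }

-- A tiny array language with annotated maps.
data Arr
  = Use [Int]
  | Map Ann (Int -> Int) Arr

-- map f . map g  ==>  map (f . g), merging the two annotations.
fuse :: Arr -> Arr
fuse (Map a f (Map b g xs)) = fuse (Map (mergeAnn a b) (f . g) xs)
fuse (Map a f xs)           = Map a f (fuse xs)
fuse e                      = e

run :: Arr -> [Int]
run (Use xs)     = xs
run (Map _ f xs) = map f (run xs)
```

After fusion a single node can point at several, possibly unrelated, source locations, so any tool that reports locations has to be prepared to show a set rather than one position.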

Slide 73

Slide 73 text

DEMO

Slide 74

Slide 74 text

Profiling LULESH
• One kernel accounts for 66% of the runtime
  - A good candidate to experiment with loop unrolling…

Ryzen 9 5900x: 8.66 s -> 9.6 s
RTX 2080 Super: 3.13 s -> 3.07 ± 0.04 s

Slide 76

Slide 76 text

Profiling LULESH
• One kernel accounts for 66% of the runtime
  - A good candidate to experiment with loop unrolling…

Ryzen 9 5900x: 8.66 s -> 9.6 s
RTX 2080 Super: 3.13 s -> 3.07 ± 0.04 s

• 14x higher L1 inst. cache miss rate
• 7x higher LL inst. cache miss rate

Slide 77

Slide 77 text

Summary
New annotation system opens the door to a better developer experience
Future work:
1. Expression-level debugging and profiling
2. Granular loop optimisations
3. …?
4. Merge it

Slide 78

Slide 78 text

AccelerateHS.org
https://github.com/AccelerateHS/
Robbert van der Helm, Trevor L. McDonell, Gabriele Keller