Slide 1

Slide 1 text

Type-safe Runtime Code Generation: Accelerate to LLVM
Trevor L. McDonell (1), Manuel M. T. Chakravarty (2), Vinod Grover (3), Ryan R. Newton (1)
(1) Indiana University, (2) University of New South Wales, (3) NVIDIA Corporation
tmcdonell

Slide 2

Slide 2 text

https://xkcd.com/378/

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

https://commons.wikimedia.org/wiki/File:FortranCardPROJ039.agr.jpg https://commons.wikimedia.org/wiki/File:Motorola_6800_Assembly_Language.png

Slide 8

Slide 8 text

Compilers are complex…

Slide 10

Slide 10 text

Can you trust your compiler?

Slide 11

Slide 11 text

Option #1: Formal verification (seL4, CompCert)

Slide 13

Slide 13 text

Option #2: Extensive testing (GCC, LLVM)

Slide 15

Slide 15 text

What about “young” languages?

Slide 17

Slide 17 text

improve assurance or add new features

Slide 18

Slide 18 text

Parsing & Lexing
Semantic analysis
Optimisation
Code generation
Intermediate representation

Slide 20

Slide 20 text

Parsing & Lexing
Semantic analysis
Optimisation
Code generation
Intermediate representation ( Guillemette & Monnier, ICFP’08 )

Slide 22

Slide 22 text

Accelerate: an embedded language for high-performance computing

Slide 26

Slide 26 text

[Figure: N-Body. Speedup vs. Repa @ 1 Thread against # Threads; series: Repa, Accelerate (LLVM-CPU); annotations: socket #1, socket #2, hyper-threads]
NEW! vectorising, multicore CPU backend for Accelerate

Slide 28

Slide 28 text

Accelerate-LLVM
an embedded array language with runtime compiler
static type preservation for the entire compiler pipeline ensures it can never go wrong*
GADT and type family techniques, scaled up to a realistic language

Slide 31

Slide 31 text

Intermediate representation

Slide 36

Slide 36 text

inc :: Acc (Vector Float) -> Acc (Vector Float)
inc arr = map (+1) arr

map: from Accelerate
(+1): overload standard type classes
Acc: GADT
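How can `(+1)` build a program instead of computing a number? A minimal sketch, assuming a toy deep embedding: the type `Exp`, its constructors, and `reify` are hypothetical names for illustration, not Accelerate's real API. Giving `Exp a` a `Num` instance makes `+` and numeric literals construct syntax-tree nodes.

```haskell
-- Toy deep embedding (hypothetical; not Accelerate's actual Exp type).
data Exp a
  = Arg                   -- placeholder for the function's argument
  | Const a
  | Add (Exp a) (Exp a)
  | Mul (Exp a) (Exp a)
  | Negate (Exp a)
  | Abs (Exp a)
  | Signum (Exp a)
  deriving Show

-- Overloading the standard Num class: arithmetic builds AST nodes.
instance Num a => Num (Exp a) where
  (+)         = Add
  (*)         = Mul
  negate      = Negate
  abs         = Abs
  signum      = Signum
  fromInteger = Const . fromInteger

-- Apply an ordinary Haskell function to a placeholder to obtain its syntax tree.
reify :: (Exp a -> Exp a) -> Exp a
reify f = f Arg
```

For example, `reify (+1) :: Exp Float` evaluates to `Add Arg (Const 1.0)`; a backend can then inspect and compile that tree.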

Slide 49

Slide 49 text

Type Safe Interpreter [ GHC User’s Guide ]

data Expr a where
  Lit    :: Int -> Expr Int
  Succ   :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  Pair   :: Expr a -> Expr b -> Expr (a, b)

eval :: Expr a -> a
eval (Succ n) = 1 + eval n
...

Constructors can require more specific types
Pattern matching causes type refinement
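The slide elides most of `eval`. A complete version of this interpreter, following the GHC User's Guide example, shows the type refinement at work: in each equation the result type `a` is specialised by the constructor matched, so `eval (IsZero n)` may return a `Bool` while `eval (Lit i)` returns an `Int`.

```haskell
{-# LANGUAGE GADTs #-}

data Expr a where
  Lit    :: Int -> Expr Int
  Succ   :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  Pair   :: Expr a -> Expr b -> Expr (a, b)

-- Matching on a constructor refines 'a', so every equation
-- returns a value of exactly the type its constructor promises.
eval :: Expr a -> a
eval (Lit i)    = i
eval (Succ n)   = 1 + eval n
eval (IsZero n) = eval n == 0
eval (If c t e) = if eval c then eval t else eval e
eval (Pair a b) = (eval a, eval b)
```

An ill-typed term such as `Succ (IsZero (Lit 0))` is rejected by the Haskell type checker, so `eval` never needs a runtime type check.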

Slide 54

Slide 54 text

inc :: Acc (Vector Float) -> Acc (Vector Float)
inc = Map (\x -> x + 1)

Map :: (Shape sh, Elt a, Elt b)
    => Fun aenv (a -> b)
    -> OpenAcc aenv (Array sh a)
    -> OpenAcc aenv (Array sh b)

aenv: environment of free array variables
OpenAcc aenv (Array sh b): indexed by type of result

Slide 58

Slide 58 text

inc :: Acc (Vector Float) -> Acc (Vector Float)
inc = Map (Lam (Body (PrimAdd (IsNum Float dictionary)
                        `PrimApp` Tuple (NilTup `SnocTup` (Var ZeroIdx)
                                                `SnocTup` (Const 1)))))

Lam: introduce new binder
ZeroIdx: typed de Bruijn index
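A minimal sketch of typed de Bruijn indices, assuming a tiny hypothetical expression language with only `Float` constants and addition. It reuses the slide's constructor names (`ZeroIdx`, `SuccIdx`, `Var`, `Lam`, `Body`), but `Val`, `prj`, `evalExp`, `evalFun`, and `incFun` are illustrative names; Accelerate's real definitions are considerably larger. The point is that an index's type proves the variable exists in the environment, so lookup cannot fail.

```haskell
{-# LANGUAGE GADTs #-}

-- Typed de Bruijn index: evidence that a value of type t lives in env.
-- Environments are nested pairs with the innermost binding on the right.
data Idx env t where
  ZeroIdx :: Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

-- Runtime environment whose shape matches the type 'env'.
data Val env where
  Empty :: Val ()
  Push  :: Val env -> t -> Val (env, t)

-- Total lookup: the index type rules out an out-of-range access.
prj :: Idx env t -> Val env -> t
prj ZeroIdx      (Push _   v) = v
prj (SuccIdx ix) (Push env _) = prj ix env

-- Scalar expressions and functions (hypothetical, tiny subset).
data OpenExp env t where
  Var   :: Idx env t -> OpenExp env t
  Const :: Float -> OpenExp env Float
  Add   :: OpenExp env Float -> OpenExp env Float -> OpenExp env Float

data OpenFun env t where
  Body :: OpenExp env t -> OpenFun env t
  Lam  :: OpenFun (env, a) t -> OpenFun env (a -> t)

evalExp :: Val env -> OpenExp env t -> t
evalExp env (Var ix)  = prj ix env
evalExp _   (Const c) = c
evalExp env (Add x y) = evalExp env x + evalExp env y

-- 'Lam' extends the environment with the new argument.
evalFun :: Val env -> OpenFun env t -> t
evalFun env (Body e) = evalExp env e
evalFun env (Lam f)  = \x -> evalFun (Push env x) f

-- \x -> x + 1, where x is referred to as ZeroIdx
incFun :: OpenFun () (Float -> Float)
incFun = Lam (Body (Add (Var ZeroIdx) (Const 1)))
```

With this, `evalFun Empty incFun 41` yields `42.0`. Writing `Var (SuccIdx ZeroIdx)` inside `incFun` would be a type error, because its environment holds only one binding.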

Slide 60

Slide 60 text

inc :: Acc (Vector Float) -> Acc (Vector Float)
inc = Map (Lam (Body (PrimAdd (IsNum Float dictionary)
                        `PrimApp` Tuple (NilTup `SnocTup` (Var ZeroIdx)
                                                `SnocTup` (Const 1)))))

PrimAdd (IsNum Float dictionary): overloaded functions carry explicit dictionaries
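One way to make a primitive "carry an explicit dictionary" is to store a `Num` instance inside a GADT constructor. A minimal sketch, assuming hypothetical names `NumDict`, `PrimFun`, and `evalPrim`; the slide's `IsNum Float dictionary` is Accelerate's own representation, which this generic version only imitates. Matching on `NumDict` brings `Num a` back into scope, so the evaluator can use `+` without any class constraint in its signature.

```haskell
{-# LANGUAGE GADTs #-}

-- A reified type-class dictionary: constructing it captures 'Num a',
-- pattern matching on it makes 'Num a' available again.
data NumDict a where
  NumDict :: Num a => NumDict a

-- Primitives take their arguments as a tuple, as with PrimApp/Tuple above.
data PrimFun f where
  PrimAdd :: NumDict a -> PrimFun ((a, a) -> a)
  PrimMul :: NumDict a -> PrimFun ((a, a) -> a)

evalPrim :: PrimFun f -> f
evalPrim (PrimAdd NumDict) = \(x, y) -> x + y
evalPrim (PrimMul NumDict) = \(x, y) -> x * y
```

For instance, `evalPrim (PrimAdd NumDict) (1.5 :: Float, 2)` returns `3.5`. Because the dictionary is an ordinary value in the syntax tree, later passes can inspect it rather than relying on Haskell's implicit instance resolution.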

Slide 61

Slide 61 text

inc :: Acc (Vector Float) -> Acc (Vector Float)
inc arr = map (+1) arr

Slide 62

Slide 62 text

inc :: ( ) => Acc (Vector a) -> Acc (Vector a)
inc arr = map (+1) arr

Slide 66

Slide 66 text

inc :: (Elt a, IsNum a) => Acc (Vector a) -> Acc (Vector a)
inc arr = map (+1) arr

IsNum a: reifies dictionary of Num class
Elt a: extensible set of surface types

type family EltRepr a :: *
type instance EltRepr Int     = Int
type instance EltRepr Float   = Float
type instance EltRepr (a,b)   = (((), EltRepr a), EltRepr b)
type instance EltRepr (a,b,c) = ((((), EltRepr a), ...)

EltRepr: closed set of representation types
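A minimal sketch of how an open set of surface types can be mapped onto a closed set of representation types, using the `EltRepr` instances from the slide plus a hypothetical `Elt` class with conversion methods `fromElt` and `toElt` (Accelerate's real `Elt` class has more members). Tuples become nested pairs terminated by `()`, so a backend only ever needs to handle unit, scalars, and pairs.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Open type family: new surface types can add instances,
-- but every right-hand side uses only (), scalars, and pairs.
type family EltRepr a
type instance EltRepr Int    = Int
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Conversions between a surface value and its representation.
class Elt a where
  fromElt :: a -> EltRepr a
  toElt   :: EltRepr a -> a

instance Elt Int where
  fromElt = id
  toElt   = id

instance Elt Float where
  fromElt = id
  toElt   = id

instance (Elt a, Elt b) => Elt (a, b) where
  fromElt (a, b)     = (((), fromElt a), fromElt b)
  toElt (((), a), b) = (toElt a, toElt b)
```

For example, `fromElt (1 :: Int, 2.5 :: Float)` produces `(((), 1), 2.5)`, and nested tuples nest their representations in the same way.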

Slide 67

Slide 67 text

Optimisation

Slide 68

Slide 68 text

simple xs = map f ( map g xs )

Slide 69

Slide 69 text

simple xs = map f (map g xs)
  becomes
simple xs = map (f . g) xs

Slide 70

Slide 70 text

Fusion [ McDonell, ICFP 2013 ]
[Figure: dataflow graph with nodes p1 to p7, c1, c2]

Slide 77

Slide 77 text

data Cunctation aenv a where
  Done  :: Arrays a
        => Idx aenv a
        -> Cunctation aenv a
  Yield :: (Shape sh, Elt e)
        => Exp aenv sh
        -> Fun aenv (sh -> e)
        -> Cunctation aenv (Array sh e)

Done: manifest array
Yield: construct element at each index
Exp aenv sh, Fun aenv (sh -> e): not defined in terms of array computations

cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun) The action of delaying or putting off something. A tardy action.

Slide 80

Slide 80 text

mapD :: Fun aenv (a -> b)
     -> Cunctation aenv (Array sh a)
     -> Cunctation aenv (Array sh b)
mapD f (Done arr)   = Yield (shape arr) (f `compose` index arr)
mapD f (Yield sh g) = Yield sh (f `compose` g)

( see paper for details )
environment types must be the same
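To see why `mapD` fuses, here is a shallow analogue under simplifying assumptions: plain Haskell functions stand in for `Fun aenv`, there is no typed environment, the shape is just a length, and a list stands in for a manifest array. `Delayed` and `force` are hypothetical names. A delayed array is either manifest data or an extent plus a generator, and mapping only composes functions, so `map f (map g xs)` is computed in one traversal.

```haskell
-- Shallow analogue of Cunctation (hypothetical, simplified).
data Delayed e
  = Done [e]                -- manifest array
  | Yield Int (Int -> e)    -- extent and element generator

-- Mapping never allocates: it only composes with the generator.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f (Done xs)   = Yield (length xs) (f . (xs !!))
mapD f (Yield n g) = Yield n (f . g)

-- Only 'force' materialises the elements, in a single pass.
force :: Delayed e -> [e]
force (Done xs)   = xs
force (Yield n g) = map g [0 .. n - 1]
```

For example, `force (mapD (*2) (mapD (+1) (Done [1, 2, 3])))` gives `[4, 6, 8]` without building the intermediate list. List indexing with `(!!)` is linear-time; a real array would index in constant time.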

Slide 84

Slide 84 text

complex = map f $ let xs = use (Array ...)
                  in  map g xs

use (Array ...): input data
map f: environment type ‘aenv’
map g xs: type includes base environment ‘aenv’ plus extra binding ‘xs’

Slide 87

Slide 87 text

complex = let xs = use (Array ...)
          in  map f $ map g xs

‘mapD’ rule can now be applied
( see paper for details )
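Floating the `let` outward moves `map f`, originally typed in environment `aenv`, underneath the new binding `xs`, so its de Bruijn indices must be shifted into `(aenv, xs)` before the environment types agree. A minimal scalar-level sketch of that weakening step, assuming a hypothetical `Exp` with `Let`, and illustrative names `rebuild`, `shift`, `weaken`, and `render`; Accelerate performs the analogous operation on array terms, with more machinery. Under a binder, index 0 refers to the local variable and must stay put, which is why `shift` lifts the renaming.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}

data Idx env t where
  ZeroIdx :: Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

data Exp env t where
  Var   :: Idx env t -> Exp env t
  Const :: Int -> Exp env Int
  Add   :: Exp env Int -> Exp env Int -> Exp env Int
  Let   :: Exp env s -> Exp (env, s) t -> Exp env t

-- Apply a type-preserving renaming of free variables.
rebuild :: (forall u. Idx env u -> Idx env' u) -> Exp env t -> Exp env' t
rebuild v (Var ix)  = Var (v ix)
rebuild _ (Const n) = Const n
rebuild v (Add a b) = Add (rebuild v a) (rebuild v b)
rebuild v (Let b e) = Let (rebuild v b) (rebuild (shift v) e)

-- Lift a renaming under one binder: the bound variable is unchanged.
shift :: (forall u. Idx env u -> Idx env' u) -> Idx (env, s) t -> Idx (env', s) t
shift _ ZeroIdx      = ZeroIdx
shift v (SuccIdx ix) = SuccIdx (v ix)

-- Move a term under one extra binding, as when a 'let' is floated out.
weaken :: Exp env t -> Exp (env, s) t
weaken = rebuild SuccIdx

idxToInt :: Idx env t -> Int
idxToInt ZeroIdx      = 0
idxToInt (SuccIdx ix) = 1 + idxToInt ix

-- Print terms with numeric indices, e.g. v0 for ZeroIdx.
render :: Exp env t -> String
render (Var ix)  = 'v' : show (idxToInt ix)
render (Const n) = show n
render (Add a b) = "(" ++ render a ++ " + " ++ render b ++ ")"
render (Let b e) = "let " ++ render b ++ " in " ++ render e
```

Weakening `let 1 in (v0 + v1)` gives `let 1 in (v0 + v2)`: the free variable moves one slot further out, while the let-bound `v0` is untouched. The types guarantee that forgetting this shift would not compile.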

Slide 88

Slide 88 text

Code generation

Slide 94

Slide 94 text

data Instruction a where
  Add :: NumType a -> Operand a -> Operand a -> Instruction a

NumType a: reified dictionaries provide a type witness
Operand a: constants and local references

%2 = getelementptr float* %xs, i64 %1
%3 = load float* %2
%4 = fadd float %3, 1.000000e+00

http://hackage.haskell.org/package/llvm-general
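How does the `NumType a` witness help code generation? A minimal sketch, assuming hypothetical witnesses `TypeInt32` and `TypeFloat` and helper names `opcode`, `llvmType`, `operand`, and `render`. It prints textual LLVM for illustration only; llvm-general builds an LLVM AST rather than strings. Matching on the witness chooses between integer `add` and floating-point `fadd`, and also refines `a` so that constants can be shown.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Int (Int32)

-- Type witness: one constructor per supported numeric type.
data NumType a where
  TypeInt32 :: NumType Int32
  TypeFloat :: NumType Float

data Operand a
  = LocalRef String   -- local register, e.g. %3
  | ConstantOp a      -- immediate constant

data Instruction a where
  Add :: NumType a -> Operand a -> Operand a -> Instruction a

-- The witness selects the opcode for the operand type.
opcode :: NumType a -> String
opcode TypeInt32 = "add"
opcode TypeFloat = "fadd"

llvmType :: NumType a -> String
llvmType TypeInt32 = "i32"
llvmType TypeFloat = "float"

-- Matching on the witness refines 'a', supplying the Show instance.
operand :: NumType a -> Operand a -> String
operand _         (LocalRef n)   = '%' : n
operand TypeInt32 (ConstantOp c) = show c
operand TypeFloat (ConstantOp c) = show c

render :: Instruction a -> String
render (Add t x y) =
  opcode t ++ " " ++ llvmType t ++ " " ++ operand t x ++ ", " ++ operand t y
```

Here `render (Add TypeFloat (LocalRef "3") (ConstantOp 1))` yields `fadd float %3, 1.0`; LLVM accepts `1.0` and `1.000000e+00` as the same constant. An instruction mixing an `Int32` operand with a `Float` operand cannot be constructed.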

Slide 103

Slide 103 text

Frontend: Exp a   (x + 1 :: Float)
Backend:  IR a    (%4 = fadd float %3, 1.000000e+00)

data IR a where
  IR :: Operands (EltRepr a) -> IR a

data family Operands a :: *
data instance Operands Float = ...

( see paper for details )
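A minimal sketch of how `IR a` can connect frontend types to backend operands, assuming the `EltRepr` instances shown earlier. The `Operands` instances (`OP_Unit`, `OP_Float`, `OP_Pair`, each leaf holding an LLVM register name) and the helpers `fadd` and `unpair` are hypothetical; the paper's definitions differ. The type family computes the representation type, and the data family then fixes exactly which operands a value of that type is made of.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

type family EltRepr a
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- One operand per scalar leaf of the representation type.
data family Operands a
data instance Operands ()     = OP_Unit
data instance Operands Float  = OP_Float String   -- name of a local register
data instance Operands (a, b) = OP_Pair (Operands a) (Operands b)

data IR a where
  IR :: Operands (EltRepr a) -> IR a

-- Emit one instruction; the result is again an IR Float.
fadd :: String -> IR Float -> IR Float -> (String, IR Float)
fadd dst (IR (OP_Float x)) (IR (OP_Float y)) =
  ('%' : dst ++ " = fadd float %" ++ x ++ ", %" ++ y, IR (OP_Float dst))

-- Split a pair by following its nested representation.
unpair :: IR (a, b) -> (IR a, IR b)
unpair (IR (OP_Pair (OP_Pair OP_Unit a) b)) = (IR a, IR b)
```

For example, adding registers `%3` and `%c` with `fadd "4"` emits `%4 = fadd float %3, %c`. Because `IR Float` can only contain `OP_Float`, there is no case in which the generator could receive operands of the wrong shape.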

Slide 104

Slide 104 text

Safety and performance?

Slide 105

Slide 105 text

[Figure: Mandelbrot. Speedup vs. Repa @ 1 Thread against # Threads; series: Repa, Accelerate (LLVM-CPU)]

Slide 106

Slide 106 text

[Figure: Ray Tracer. Speedup vs. Repa @ 1 Thread against # Threads; series: Repa, Accelerate (LLVM-CPU)]

Slide 107

Slide 107 text

[Figure: Black-Scholes. Speedup vs. Repa @ 1 Thread against # Threads; series: Repa, Accelerate (LLVM-CPU)]

Slide 108

Slide 108 text

[Figure: MD5 Hash. Speedup vs. Hashcat @ 1 Thread against # Threads; series: Hashcat, Accelerate (LLVM-CPU)]

Slide 109

Slide 109 text

Summary
We can have both safety and performance
while balancing correctness and effort,
in a reusable framework targeting CPUs & GPUs
https://github.com/AccelerateHS/accelerate-llvm