Type-safe Runtime Code Generation: Accelerate to LLVM

Presented at Haskell Symposium 2015: https://www.haskell.org/haskell-symposium/2015/
Paper: https://github.com/tmcdonell/tmcdonell.github.io/raw/master/papers/acc-llvm-haskell2015.pdf
Video: https://www.youtube.com/watch?v=snXhXA5noVc

Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and the implementation of the embedded language compiler itself.

In this paper, we focus on the safety guarantees achieved by type preserving compilation. We discuss the compilation pipeline of Accelerate, a high-performance array language targeting both multicore CPUs and GPUs, where we are able to preserve types from the source language down to a low-level register language in SSA form. Specifically, we demonstrate the practicability of our approach by creating a new type-safe interface to the industrial-strength LLVM compiler infrastructure, which we used to build two new Accelerate backends that show competitive runtimes on a set of benchmarks across both CPUs and GPUs.

Trevor L. McDonell

September 04, 2015

Transcript

  1. Type-safe Runtime Code Generation: Accelerate to LLVM

    Trevor L. McDonell (1), Manuel M. T. Chakravarty (2), Vinod Grover (3), Ryan R. Newton (1)
    (1) Indiana University, (2) University of New South Wales, (3) NVIDIA Corporation
    tmcdonell
  2. https://xkcd.com/378/

  5. https://commons.wikimedia.org/wiki/File:FortranCardPROJ039.agr.jpg https://commons.wikimedia.org/wiki/File:Motorola_6800_Assembly_Language.png

  8. Compilers are complex…

  10. Can you trust your compiler?

  11. Option #1: Formal verification (seL4, CompCert)

  13. Option #2: Extensive testing (GCC, LLVM)

  15. What about “young” languages?

  17. improve assurance or add new features

  18. Parsing & Lexing, Semantic analysis, Optimisation, Code generation, Intermediate representation

  20. Parsing & Lexing, Semantic analysis, Optimisation, Code generation, Intermediate representation

    ( Guillemette & Monnier, ICFP’08 )
  22. Accelerate: an embedded language for high-performance computing

  24. [Figure: N-Body benchmark. Axes: Speedup vs. Repa @ 1 Thread, # Threads. Series: Repa, Accelerate (LLVM-CPU). Annotations: NEW! vectorising, multicore CPU backend for Accelerate; socket #1, socket #2, hyper-threads.]
  28. Accelerate-LLVM

    an embedded array language with runtime compiler
    static type preservation for the entire compiler pipeline ensures it can never go wrong*
    GADT and type family techniques, scaled up to a realistic language
  31. Intermediate representation

  36. inc :: Acc (Vector Float) -> Acc (Vector Float)
      inc arr = map (+1) arr

      Annotations: GADT; from Accelerate; overload standard type classes
  49. Type Safe Interpreter [ GHC User’s Guide ]

      data Expr a where
        Lit    :: Int -> Expr Int
        Succ   :: Expr Int -> Expr Int
        IsZero :: Expr Int -> Expr Bool
        If     :: Expr Bool -> Expr a -> Expr a -> Expr a
        Pair   :: Expr a -> Expr b -> Expr (a, b)

      eval :: Expr a -> a
      eval (Succ n) = 1 + eval n
      ...

      Constructors can require more specific types
      Pattern matching causes type refinement
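
The slide elides every clause of eval except the one for Succ. A minimal sketch of the complete interpreter follows, modelled on the GHC User's Guide example; the remaining clauses and the `example` term are filled in here for illustration and are not taken from the slides.

```haskell
{-# LANGUAGE GADTs #-}

-- The expression type from the slide: the type index records the
-- Haskell type that each expression evaluates to.
data Expr a where
  Lit    :: Int -> Expr Int
  Succ   :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  Pair   :: Expr a -> Expr b -> Expr (a, b)

-- Matching on a constructor refines 'a', so each clause can return a
-- value of the specific result type without tags or runtime checks.
eval :: Expr a -> a
eval (Lit n)    = n
eval (Succ n)   = 1 + eval n
eval (IsZero n) = eval n == 0
eval (If c t e) = if eval c then eval t else eval e
eval (Pair a b) = (eval a, eval b)

-- Hypothetical example term (not from the slides).
example :: Expr (Int, Bool)
example = Pair (If (IsZero (Lit 0)) (Succ (Lit 41)) (Lit 0)) (IsZero (Lit 1))
```

An ill-typed term such as `Succ (IsZero (Lit 0))` is rejected by the Haskell type checker, which is the property the talk relies on for the embedded language.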
  54. inc :: Acc (Vector Float) -> Acc (Vector Float)
      inc = Map (\x -> x + 1)

      Map :: (Shape sh, Elt a, Elt b)
          => Fun aenv (a -> b)
          -> OpenAcc aenv (Array sh a)
          -> OpenAcc aenv (Array sh b)

      Annotations: environment of free array variables (aenv); indexed by type of result
  58. inc :: Acc (Vector Float) -> Acc (Vector Float)
      inc = Map (Lam (Body PrimAdd (IsNum Float dictionary) `PrimApp`
                Tuple (NilTup `SnocTup` (Var ZeroIdx) `SnocTup` (Const 1))))

      Annotations: introduce new binder (Lam); typed de Bruijn index (ZeroIdx)
  60. inc :: Acc (Vector Float) -> Acc (Vector Float)
      inc = Map (Lam (Body PrimAdd (IsNum Float dictionary) `PrimApp`
                Tuple (NilTup `SnocTup` (Var ZeroIdx) `SnocTup` (Const 1))))

      Annotation: overloaded functions carry explicit dictionaries
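
To make the "typed de Bruijn index" annotation concrete, here is a minimal sketch of the technique on a toy term language. `Term`, `Val`, `prj` and `evalTerm` are hypothetical names chosen for this illustration; Accelerate's real AST is far larger and is not reproduced here.

```haskell
{-# LANGUAGE GADTs #-}

-- A typed de Bruijn index: evidence that environment 'env' holds a value
-- of type 't' at this position. Environments are nested pairs.
data Idx env t where
  ZeroIdx ::              Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

-- A toy term language (hypothetical, much smaller than Accelerate's).
data Term env t where
  Var   :: Idx env t -> Term env t
  Const :: Float -> Term env Float
  Add   :: Term env Float -> Term env Float -> Term env Float
  Lam   :: Term (env, a) b -> Term env (a -> b)   -- introduces a new binder
  App   :: Term env (a -> b) -> Term env a -> Term env b

-- Runtime environments mirror the type-level shape.
data Val env where
  Empty :: Val ()
  Push  :: Val env -> t -> Val (env, t)

-- Projection cannot fail: the index type guarantees the slot exists.
prj :: Idx env t -> Val env -> t
prj ZeroIdx      (Push _   v) = v
prj (SuccIdx ix) (Push env _) = prj ix env

evalTerm :: Val env -> Term env t -> t
evalTerm env (Var ix)  = prj ix env
evalTerm _   (Const c) = c
evalTerm env (Add a b) = evalTerm env a + evalTerm env b
evalTerm env (Lam b)   = \x -> evalTerm (Push env x) b
evalTerm env (App f a) = evalTerm env f (evalTerm env a)

-- The scalar function \x -> x + 1 from the slide, in this encoding.
incTerm :: Term () (Float -> Float)
incTerm = Lam (Add (Var ZeroIdx) (Const 1))
```

Because the index carries both the environment and the result type, a dangling or mistyped variable reference is a compile-time error in the compiler's own source rather than a crash at application runtime.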
  61. inc :: Acc (Vector Float) -> Acc (Vector Float)
      inc arr = map (+1) arr
  66. inc :: (Elt a, IsNum a) => Acc (Vector a) -> Acc (Vector a)
      inc arr = map (+1) arr

      type family EltRepr a :: *
      type instance EltRepr Int     = Int
      type instance EltRepr Float   = Float
      type instance EltRepr (a,b)   = (((), EltRepr a), EltRepr b)
      type instance EltRepr (a,b,c) = ((((), EltRepr a), ...)

      Annotations: reifies dictionary of Num class (IsNum); extensible set of surface types (Elt); closed set of representation types (EltRepr)
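
The split between an open set of surface types and a closed set of representation types can be sketched as follows. This is a simplified illustration, assuming an `Elt` class with only two conversion methods; the `Complex` type is a hypothetical user-defined example and is not part of the talk.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Closed set of representation types: unit, primitives and nested pairs.
type family EltRepr a
type instance EltRepr Int    = Int
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Open class of surface types. New instances may be added as long as
-- they map onto the representation types above.
class Elt a where
  fromElt :: a -> EltRepr a
  toElt   :: EltRepr a -> a

instance Elt Int where
  fromElt = id
  toElt   = id

instance Elt Float where
  fromElt = id
  toElt   = id

instance (Elt a, Elt b) => Elt (a, b) where
  fromElt (a, b)     = (((), fromElt a), fromElt b)
  toElt (((), a), b) = (toElt a, toElt b)

-- Hypothetical user-defined surface type that reuses the pair representation.
data Complex = Complex Float Float
  deriving (Eq, Show)

type instance EltRepr Complex = EltRepr (Float, Float)

instance Elt Complex where
  fromElt (Complex r i) = fromElt (r, i)
  toElt p               = let (r, i) = toElt p :: (Float, Float)
                          in  Complex r i
```

Under this scheme a backend only ever has to handle the representation types, while users remain free to extend the surface language.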
  67. Optimisation

  69. simple xs = map f ( map g xs )

    map (f . g) xs
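
The rewrite on this slide is the classic map/map fusion law. A small sketch using ordinary Haskell lists as a stand-in for Accelerate arrays (an assumption made to keep the example self-contained) shows the two forms side by side.

```haskell
-- Unfused form: the inner map builds an intermediate list that the
-- outer map then consumes.
unfused :: (b -> c) -> (a -> b) -> [a] -> [c]
unfused f g xs = map f (map g xs)

-- Fused form: a single map over the composed function, with no
-- intermediate structure.
fused :: (b -> c) -> (a -> b) -> [a] -> [c]
fused f g xs = map (f . g) xs
```

Both functions compute the same result for any `f`, `g` and `xs`; the fused form simply avoids materialising the intermediate collection, which matters a great deal when that collection is a large array.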
  70. Fusion [ McDonell, ICFP 2013 ]

    [Diagram: graph with nodes p1 to p7 and c1, c2]
  77. data Cunctation aenv a where
        Done  :: Arrays a
              => Idx aenv a
              -> Cunctation aenv a

        Yield :: (Shape sh, Elt e)
              => Exp aenv sh
              -> Fun aenv (sh -> e)
              -> Cunctation aenv (Array sh e)

      cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun): The action of delaying or putting off something. A tardy action.

      Annotations: manifest array (Done); construct element at each index (Yield); not defined in terms of array computations
  80. mapD :: Fun aenv (a -> b)
           -> Cunctation aenv (Array sh a)
           -> Cunctation aenv (Array sh b)
      mapD f (Done arr)   = Yield (shape arr) (f `compose` index arr)
      mapD f (Yield sh g) = Yield sh (f `compose` g)

      ( see paper for details )

      Annotation: environment types must be the same
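
The idea behind Cunctation and mapD can be sketched without the typed environments. The following is a deliberately simplified, one-dimensional stand-in: `Delayed` and `compute` are hypothetical names, lists play the role of manifest arrays, and the array environment `aenv` is dropped entirely.

```haskell
-- Simplified stand-in for Cunctation (hypothetical; the real type also
-- tracks the array environment and arbitrary shapes).
data Delayed e
  = Done  [e]             -- manifest array, already in memory
  | Yield Int (Int -> e)  -- extent plus a function from index to element

-- Mapping does no work on the data: it only composes onto the index function.
-- List indexing with (!!) is linear time and is used purely for illustration.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f (Done xs)   = Yield (length xs) (f . (xs !!))
mapD f (Yield n g) = Yield n (f . g)

-- Force a delayed array into a manifest one, evaluating each index once.
compute :: Delayed e -> [e]
compute (Done xs)   = xs
compute (Yield n g) = map g [0 .. n - 1]
```

For example, `compute (mapD (+ 1) (mapD (* 2) (Done [1, 2, 3])))` applies both functions in a single pass over the three indices, because the second `mapD` only composes onto the `Yield` produced by the first.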
  84. complex = map f $
        let xs = use (Array ...)
        in  map g xs

      Annotations: input data; environment type ‘aenv’; type includes base environment ‘aenv’ plus extra binding ‘xs’
  87. complex = let xs = use (Array ...)
                in  map f $ map g xs

      ‘mapD’ rule can now be applied
      ( see paper for details )
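
A rough sketch of why floating the let binding outwards enables fusion, on a toy array language with typed environments. All names here (`Acc`, `Alet`, `simplify`, `countMaps`, `evalAcc`) are hypothetical. The mapped functions are plain Haskell functions, so no environment weakening is needed; in Accelerate the function is itself a term over `aenv` and must be adjusted when it moves under the new binder, which the paper covers.

```haskell
{-# LANGUAGE GADTs #-}

data Idx env t where
  ZeroIdx ::              Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

-- Toy array language: 'env' is the environment of bound array variables.
data Acc env t where
  Avar :: Idx env t -> Acc env t
  Use  :: [Float] -> Acc env [Float]
  Map  :: (Float -> Float) -> Acc env [Float] -> Acc env [Float]
  Alet :: Acc env a -> Acc (env, a) b -> Acc env b

-- Float a let out of a map argument, then fuse adjacent maps. After
-- floating, both maps live in the same (extended) environment.
simplify :: Acc env t -> Acc env t
simplify (Map f (Alet bnd body)) = Alet bnd (simplify (Map f body))
simplify (Map f (Map g a))       = simplify (Map (f . g) a)
simplify t                       = t

countMaps :: Acc env t -> Int
countMaps (Map _ a)  = 1 + countMaps a
countMaps (Alet b a) = countMaps b + countMaps a
countMaps _          = 0

data Val env where
  Empty :: Val ()
  Push  :: Val env -> t -> Val (env, t)

prj :: Idx env t -> Val env -> t
prj ZeroIdx      (Push _   v) = v
prj (SuccIdx ix) (Push env _) = prj ix env

evalAcc :: Val env -> Acc env t -> t
evalAcc env (Avar ix)  = prj ix env
evalAcc _   (Use xs)   = xs
evalAcc env (Map f a)  = map f (evalAcc env a)
evalAcc env (Alet b a) = evalAcc (Push env (evalAcc env b)) a

-- The shape of 'complex' from the slides, with f = (+ 1) and g = (* 2).
complex :: Acc () [Float]
complex = Map (+ 1) (Alet (Use [1, 2, 3]) (Map (* 2) (Avar ZeroIdx)))
```

Here `simplify complex` yields a term containing a single `Map`, and the type checker confirms that the rewritten term still has type `Acc () [Float]`.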
  88. Code generation

  90. inc :: Acc (Vector Float) -> Acc (Vector Float)
      inc = Map (Lam (Body PrimAdd (IsNum Float dictionary) `PrimApp`
                Tuple (NilTup `SnocTup` (Var ZeroIdx) `SnocTup` (Const 1))))
  94. data Instruction a where
        Add :: NumType a -> Operand a -> Operand a -> Instruction a

      Annotations: reified dictionaries provide a type witness; constants and local references

      %2 = getelementptr float* %xs, i64 %1
      %3 = load float* %2
      %4 = fadd float %3, 1.000000e+00

      http://hackage.haskell.org/package/llvm-general
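
A minimal sketch of how a reified dictionary can drive instruction selection. The two-constructor `NumType`, the `Operand` constructors and the pretty-printer are hypothetical simplifications for illustration, not the accelerate-llvm or llvm-general API; the sketch also assumes Haskell's `Int` is 64 bits wide.

```haskell
{-# LANGUAGE GADTs #-}

-- Reified numeric dictionary: a value that witnesses which type 'a' is.
-- (Hypothetical two-constructor version; the real one covers more types.)
data NumType a where
  NumInt   :: NumType Int
  NumFloat :: NumType Float

-- Operands are either constants or references to local SSA names.
data Operand a
  = Constant a
  | Local String

data Instruction a where
  Add :: NumType a -> Operand a -> Operand a -> Instruction a

-- Matching on the witness fixes 'a', which makes 'show' available.
ppOperand :: NumType a -> Operand a -> String
ppOperand _        (Local n)    = '%' : n
ppOperand NumInt   (Constant c) = show c
ppOperand NumFloat (Constant c) = show c

-- The witness also selects the opcode and the LLVM type name.
-- Assumption: Int is 64 bits wide, hence i64.
ppInstruction :: Instruction a -> String
ppInstruction (Add t x y) = case t of
  NumInt   -> "add i64 "    ++ args
  NumFloat -> "fadd float " ++ args
  where
    args = ppOperand t x ++ ", " ++ ppOperand t y
```

With these definitions, `Add NumFloat (Local "3") (Constant 1)` prints as an `fadd` on `float`, whereas mixing operand types, for instance `Add NumFloat (Local "3") (Constant (1 :: Int))`, is rejected by the Haskell type checker before any LLVM code is produced.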
  103. Exp a (Frontend): x + 1 :: Float
       IR a (Backend): %4 = fadd float %3, 1.000000e+00

       data IR a where
         IR :: Operands (EltRepr a) -> IR a

       data family Operands e :: *
       data instance Operands Float = ...

       ( see paper for details )
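
The slide elides the data instances. One way they could look is sketched below, with the instance bodies, the string-based `Operand` stand-in and the helper functions all being assumptions made for illustration.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

-- Surface-to-representation mapping (subset of the earlier slide).
type family EltRepr a
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Stand-in for an LLVM operand: just the textual SSA name or literal.
newtype Operand a = Operand String

-- One data instance per representation type (hypothetical bodies).
data family Operands e
data instance Operands ()     = OP_Unit
data instance Operands Float  = OP_Float (Operand Float)
data instance Operands (a, b) = OP_Pair (Operands a) (Operands b)

-- Generated code for a surface value of type 'a'.
data IR a where
  IR :: Operands (EltRepr a) -> IR a

-- A surface pair is stored as nested representation pairs.
pairIR :: Operand Float -> Operand Float -> IR (Float, Float)
pairIR x y = IR (OP_Pair (OP_Pair OP_Unit (OP_Float x)) (OP_Float y))

-- Projecting the first component is checked against the same layout.
fstIR :: IR (Float, Float) -> IR Float
fstIR (IR (OP_Pair (OP_Pair OP_Unit x) _)) = IR x

-- Recover the SSA name of a scalar float value.
nameOf :: IR Float -> String
nameOf (IR (OP_Float (Operand n))) = n
```

The point of the indirection through `EltRepr` is that backend code generators only pattern match on the closed set of `Operands` instances, yet every `IR a` still carries the surface type `a` that the frontend expression had.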
  104. Safety and performance?

  105. [Figure: Mandelbrot benchmark. Axes: Speedup vs. Repa @ 1 Thread, # Threads. Series: Repa, Accelerate (LLVM-CPU).]
  106. [Figure: Ray Tracer benchmark. Axes: Speedup vs. Repa @ 1 Thread, # Threads. Series: Repa, Accelerate (LLVM-CPU).]
  107. [Figure: Black-Scholes benchmark. Axes: Speedup vs. Repa @ 1 Thread, # Threads. Series: Repa, Accelerate (LLVM-CPU).]
  108. [Figure: MD5 Hash benchmark. Axes: Speedup vs. Hashcat @ 1 Thread, # Threads. Series: Hashcat, Accelerate (LLVM-CPU).]
  109. Summary

    We can have both safety and performance while balancing correctness and effort, in a reusable framework targeting CPUs & GPUs.

    https://github.com/AccelerateHS/accelerate-llvm