Type-safe Runtime Code Generation: Accelerate to LLVM

Presented at Haskell Symposium 2015: https://www.haskell.org/haskell-symposium/2015/
Paper: https://github.com/tmcdonell/tmcdonell.github.io/raw/master/papers/acc-llvm-haskell2015.pdf
Video: https://www.youtube.com/watch?v=snXhXA5noVc

Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and the implementation of the embedded language compiler itself.

In this paper, we focus on the safety guarantees achieved by type preserving compilation. We discuss the compilation pipeline of Accelerate, a high-performance array language targeting both multicore CPUs and GPUs, where we are able to preserve types from the source language down to a low-level register language in SSA form. Specifically, we demonstrate the practicability of our approach by creating a new type-safe interface to the industrial-strength LLVM compiler infrastructure, which we used to build two new Accelerate backends that show competitive runtimes on a set of benchmarks across both CPUs and GPUs.

Trevor L. McDonell

September 04, 2015

Transcript

  1. 1.

    Type-safe Runtime Code Generation: Accelerate to LLVM

    Trevor L. McDonell (1), Manuel M. T. Chakravarty (2), Vinod Grover (3), Ryan R. Newton (1)

    (1) Indiana University, (2) University of New South Wales, (3) NVIDIA Corporation

    tmcdonell
  2. 3.
  3. 4.
  4. 24.

    [Plot: N-Body. Axes: Speedup vs. Repa @ 1 Thread against # Threads. Series: Repa, Accelerate (LLVM-CPU).]
  5. 25.

    [Plot: N-Body. Axes: Speedup vs. Repa @ 1 Thread against # Threads. Series: Repa, Accelerate (LLVM-CPU).]

    NEW! vectorising, multicore CPU backend for Accelerate
  6. 26.

    [Plot: N-Body. Axes: Speedup vs. Repa @ 1 Thread against # Threads. Series: Repa, Accelerate (LLVM-CPU). Plot annotations: socket #1, socket #2, hyper-threads.]

    NEW! vectorising, multicore CPU backend for Accelerate
  8. 28.

    Accelerate-LLVM

    an embedded array language with runtime compiler

    static type preservation for the entire compiler pipeline ensures it can never go wrong*

    GADT and type family techniques, scaled up to a realistic language
  11. 35.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc arr = map (+1) arr

    Annotations: from Accelerate; overload standard type classes
  12. 36.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc arr = map (+1) arr

    Annotations: GADT; from Accelerate; overload standard type classes
  13. 39.
  14. 40.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit  :: Int -> Expr Int
      Succ :: Expr Int -> Expr Int
  15. 41.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
  16. 42.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  17. 43.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
      Pair   :: Expr a -> Expr b -> Expr (a, b)
  18. 44.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
      Pair   :: Expr a -> Expr b -> Expr (a, b)

    Annotation: Constructors can require more specific types
  21. 47.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
      Pair   :: Expr a -> Expr b -> Expr (a, b)

    eval :: Expr a -> a

    Annotation: Constructors can require more specific types
  22. 48.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
      Pair   :: Expr a -> Expr b -> Expr (a, b)

    eval :: Expr a -> a
    eval (Succ n) = 1 + eval n
    ...

    Annotation: Constructors can require more specific types
  23. 49.

    Type Safe Interpreter [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
      Pair   :: Expr a -> Expr b -> Expr (a, b)

    eval :: Expr a -> a
    eval (Succ n) = 1 + eval n
    ...

    Annotations: Pattern matching causes type refinement; Constructors can require more specific types
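The interpreter on these slides can be completed into a small program. The sketch below is one plausible completion: the slides show only the `Succ` equation of `eval`, so the other equations are filled in here as an assumption, and `example` is a hypothetical term that does not appear in the talk.

```haskell
{-# LANGUAGE GADTs #-}

-- The expression type from the slides, indexed by the type of its result.
data Expr a where
  Lit    :: Int -> Expr Int
  Succ   :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  Pair   :: Expr a -> Expr b -> Expr (a, b)

-- Matching on a constructor refines the type variable 'a', so each
-- equation can return an ordinary Haskell value of the right type.
-- Only the Succ equation is shown in the talk; the rest are assumed.
eval :: Expr a -> a
eval (Lit i)    = i
eval (Succ n)   = 1 + eval n
eval (IsZero n) = eval n == 0
eval (If c t e) = if eval c then eval t else eval e
eval (Pair x y) = (eval x, eval y)

-- Hypothetical example term, not taken from the slides.
example :: Expr (Int, Bool)
example = Pair (If (IsZero (Lit 0)) (Succ (Lit 41)) (Lit 0)) (IsZero (Lit 1))
```

Here `eval example` yields a pair of an `Int` and a `Bool` with no tags or runtime type checks, and an ill-typed term such as `Succ (IsZero (Lit 0))` is rejected by the Haskell type checker.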
  24. 50.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc arr = map (+1) arr

    Annotations: GADT; from Accelerate; overload standard type classes
  25. 51.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (\x -> x + 1)
  26. 52.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (\x -> x + 1)

    Map :: (Shape sh, Elt a, Elt b)
        => Fun aenv (a -> b)
        -> OpenAcc aenv (Array sh a)
        -> OpenAcc aenv (Array sh b)
  27. 53.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (\x -> x + 1)

    Map :: (Shape sh, Elt a, Elt b)
        => Fun aenv (a -> b)
        -> OpenAcc aenv (Array sh a)
        -> OpenAcc aenv (Array sh b)

    Annotation: indexed by type of result
  28. 54.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (\x -> x + 1)

    Map :: (Shape sh, Elt a, Elt b)
        => Fun aenv (a -> b)
        -> OpenAcc aenv (Array sh a)
        -> OpenAcc aenv (Array sh b)

    Annotations: environment of free array variables; indexed by type of result
  29. 55.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (Lam (Body
            PrimAdd (IsNum Float dictionary)
              `PrimApp`
            Tuple (NilTup `SnocTup` (Var ZeroIdx)
                          `SnocTup` (Const 1))))
  31. 57.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (Lam (Body
            PrimAdd (IsNum Float dictionary)
              `PrimApp`
            Tuple (NilTup `SnocTup` (Var ZeroIdx)
                          `SnocTup` (Const 1))))

    Annotation: introduce new binder
  32. 58.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (Lam (Body
            PrimAdd (IsNum Float dictionary)
              `PrimApp`
            Tuple (NilTup `SnocTup` (Var ZeroIdx)
                          `SnocTup` (Const 1))))

    Annotations: introduce new binder; typed de Bruijn index
  34. 60.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (Lam (Body
            PrimAdd (IsNum Float dictionary)
              `PrimApp`
            Tuple (NilTup `SnocTup` (Var ZeroIdx)
                          `SnocTup` (Const 1))))

    Annotation: overloaded functions carry explicit dictionaries
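The annotations on these slides (new binder, typed de Bruijn index) can be illustrated with a much smaller term language. The sketch below is an assumption-laden miniature, not Accelerate's AST: `Term`, `prj`, `evalT` and `incBody` are hypothetical names, and only `Var`, `Lam`, `Const` and `ZeroIdx` echo constructors visible on the slide.

```haskell
{-# LANGUAGE GADTs #-}

-- Typed de Bruijn index: a position in environment 'env' that holds a 't'.
-- Environments are modelled as nested pairs, growing to the right.
data Idx env t where
  ZeroIdx ::              Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

-- A tiny open term language (hypothetical, far smaller than Accelerate's).
data Term env t where
  Var   :: Idx env t -> Term env t
  Const :: Float -> Term env Float
  Add   :: Term env Float -> Term env Float -> Term env Float
  Lam   :: Term (env, a) b -> Term env (a -> b)  -- introduces a new binder

-- Look up an index in a nested-pair environment; the types guarantee
-- the slot exists and has the expected type.
prj :: Idx env t -> env -> t
prj ZeroIdx      (_, v)   = v
prj (SuccIdx ix) (env, _) = prj ix env

evalT :: Term env t -> env -> t
evalT (Var ix)  env = prj ix env
evalT (Const c) _   = c
evalT (Add x y) env = evalT x env + evalT y env
evalT (Lam b)   env = \x -> evalT b (env, x)

-- Counterpart of (\x -> x + 1) in this miniature language.
incBody :: Term () (Float -> Float)
incBody = Lam (Add (Var ZeroIdx) (Const 1))
```

Because the index type mentions both the environment and the element type, a variable that refers to a missing or wrongly typed binder does not type check, so `prj` needs no failure case.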
  35. 62.

    inc :: ( ) => Acc (Vector a) -> Acc (Vector a)
    inc arr = map (+1) arr
  36. 63.

    inc :: (Elt a, IsNum a) => Acc (Vector a) -> Acc (Vector a)
    inc arr = map (+1) arr

    Annotation: reifies dictionary of Num class
  38. 65.

    inc :: (Elt a, IsNum a) => Acc (Vector a) -> Acc (Vector a)
    inc arr = map (+1) arr

    Annotations: reifies dictionary of Num class; extensible set of surface types
  39. 66.

    inc :: (Elt a, IsNum a) => Acc (Vector a) -> Acc (Vector a)
    inc arr = map (+1) arr

    type family EltRepr a :: *
    type instance EltRepr Int     = Int
    type instance EltRepr Float   = Float
    type instance EltRepr (a,b)   = (((), EltRepr a), EltRepr b)
    type instance EltRepr (a,b,c) = ((((), EltRepr a), ...)

    Annotations: reifies dictionary of Num class; extensible set of surface types; closed set of representation types
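The split between an extensible set of surface types and a closed set of representation types can be sketched with a type family plus a class that converts between the two. This is a simplified illustration under stated assumptions: the class methods and the `Complex` type below are written for this example and are not claimed to match Accelerate's actual definitions.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Representation types: a closed vocabulary of (), pairs and primitives.
type family EltRepr a
type instance EltRepr Int    = Int
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Surface types: an open class, so new instances can be added freely.
-- (Method names are assumptions made for this sketch.)
class Elt a where
  fromElt :: a -> EltRepr a
  toElt   :: EltRepr a -> a

instance Elt Int   where { fromElt = id; toElt = id }
instance Elt Float where { fromElt = id; toElt = id }

instance (Elt a, Elt b) => Elt (a, b) where
  fromElt (a, b)     = (((), fromElt a), fromElt b)
  toElt (((), a), b) = (toElt a, toElt b)

-- Hypothetical user-defined surface type, stored like a pair of Floats.
data Complex = Complex Float Float deriving (Eq, Show)

type instance EltRepr Complex = (((), Float), Float)

instance Elt Complex where
  fromElt (Complex r i) = fromElt (r, i)
  toElt p = case (toElt p :: (Float, Float)) of
              (r, i) -> Complex r i
```

A backend written against this scheme only has to handle the representation types, while users keep working with `Complex` and other surface types.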
  40. 69.
  41. 73.
  42. 74.

    data Cunctation aenv a where
      Done  :: Arrays a
            => Idx aenv a
            -> Cunctation aenv a

      Yield :: (Shape sh, Elt e)
            => Exp aenv sh
            -> Fun aenv (sh -> e)
            -> Cunctation aenv (Array sh e)

    cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun) The action of delaying or putting off something. A tardy action.
  43. 75.

    data Cunctation aenv a where
      Done  :: Arrays a
            => Idx aenv a
            -> Cunctation aenv a

      Yield :: (Shape sh, Elt e)
            => Exp aenv sh
            -> Fun aenv (sh -> e)
            -> Cunctation aenv (Array sh e)

    cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun) The action of delaying or putting off something. A tardy action.

    Annotation: manifest array
  44. 76.

    data Cunctation aenv a where
      Done  :: Arrays a
            => Idx aenv a
            -> Cunctation aenv a

      Yield :: (Shape sh, Elt e)
            => Exp aenv sh
            -> Fun aenv (sh -> e)
            -> Cunctation aenv (Array sh e)

    cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun) The action of delaying or putting off something. A tardy action.

    Annotations: construct element at each index; manifest array
  45. 77.

    data Cunctation aenv a where
      Done  :: Arrays a
            => Idx aenv a
            -> Cunctation aenv a

      Yield :: (Shape sh, Elt e)
            => Exp aenv sh
            -> Fun aenv (sh -> e)
            -> Cunctation aenv (Array sh e)

    cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun) The action of delaying or putting off something. A tardy action.

    Annotations: construct element at each index; manifest array; not defined in terms of array computations
  46. 78.

    mapD :: Fun aenv (a -> b)
         -> Cunctation aenv (Array sh a)
         -> Cunctation aenv (Array sh b)
    mapD f (Done arr)   = Yield (shape arr) (f `compose` index arr)
    mapD f (Yield sh g) = Yield sh (f `compose` g)
  47. 79.

    mapD :: Fun aenv (a -> b)
         -> Cunctation aenv (Array sh a)
         -> Cunctation aenv (Array sh b)
    mapD f (Done arr)   = Yield (shape arr) (f `compose` index arr)
    mapD f (Yield sh g) = Yield sh (f `compose` g)

    ( see paper for details )
  48. 80.

    mapD :: Fun aenv (a -> b)
         -> Cunctation aenv (Array sh a)
         -> Cunctation aenv (Array sh b)
    mapD f (Done arr)   = Yield (shape arr) (f `compose` index arr)
    mapD f (Yield sh g) = Yield sh (f `compose` g)

    ( see paper for details )

    Annotation: environment types must be the same
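The effect of the `mapD` rule can be seen in a stripped-down model of delayed arrays. The sketch below drops the typed environments and shapes entirely, which is a significant simplification of what the slides show: `Vec`, `Delayed`, `compute` and `toList` are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- Simplified stand-in for a manifest array: a length and its elements.
data Vec e = Vec Int [e]

-- Simplified delayed representation: no environments, 1D indices only.
data Delayed e where
  Done  :: Vec e -> Delayed e              -- manifest array
  Yield :: Int -> (Int -> e) -> Delayed e  -- size, and element at each index

-- Mapping never builds an intermediate array; it composes functions.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f (Done (Vec n xs)) = Yield n (f . (xs !!))
mapD f (Yield n g)       = Yield n (f . g)

-- Materialise the delayed array in a single traversal.
compute :: Delayed e -> Vec e
compute (Done v)    = v
compute (Yield n g) = Vec n (map g [0 .. n - 1])

toList :: Vec e -> [e]
toList (Vec _ xs) = xs
```

With these definitions, `compute (mapD (*2) (mapD (+1) (Done v)))` traverses the data once, applying the composed function at each index. List indexing with `!!` is used only to keep the sketch short; a real manifest array would offer constant-time indexing.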
  49. 82.

    complex = map f
            $ let xs = use (Array ...)
              in  map g xs

    Annotation: input data
  50. 83.

    complex = map f
            $ let xs = use (Array ...)
              in  map g xs

    Annotations: environment type ‘aenv’; input data
  51. 84.

    complex = map f
            $ let xs = use (Array ...)
              in  map g xs

    Annotations: environment type ‘aenv’; type includes base environment ‘aenv’ plus extra binding ‘xs’; input data
  52. 86.

    complex = let xs = use (Array ...)
              in  map f $ map g xs

    Annotation: ‘mapD’ rule can now be applied
  53. 87.

    complex = let xs = use (Array ...)
              in  map f $ map g xs

    Annotation: ‘mapD’ rule can now be applied

    ( see paper for details )
  54. 90.

    inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc = Map (Lam (Body
            PrimAdd (IsNum Float dictionary)
              `PrimApp`
            Tuple (NilTup `SnocTup` (Var ZeroIdx)
                          `SnocTup` (Const 1))))
  55. 91.
  56. 92.

    data Instruction a where
      Add :: NumType a
          -> Operand a
          -> Operand a
          -> Instruction a

    Annotation: constants and local references
  57. 93.

    data Instruction a where
      Add :: NumType a
          -> Operand a
          -> Operand a
          -> Instruction a

    Annotations: reified dictionaries provide a type witness; constants and local references
  58. 94.

    data Instruction a where
      Add :: NumType a
          -> Operand a
          -> Operand a
          -> Instruction a

    Annotations: reified dictionaries provide a type witness; constants and local references

    %2 = getelementptr float* %xs, i64 %1
    %3 = load float* %2
    %4 = fadd float %3, 1.000000e+00

    http://hackage.haskell.org/package/llvm-general
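How a reified dictionary acts as a type witness can be shown with a self-contained miniature of the `Instruction` type. Everything beyond the `Add` constructor's shape is an assumption made for this sketch: the `NumType` and `Operand` constructors, the `i64` spelling for `Int`, and the printer are hypothetical and do not reproduce llvm-general or the accelerate-llvm sources.

```haskell
{-# LANGUAGE GADTs #-}

-- Reified dictionary: matching on the constructor reveals what 'a' is.
-- (Constructor names are hypothetical.)
data NumType a where
  IntType   :: NumType Int
  FloatType :: NumType Float

-- Operands are constants or references to local registers.
data Operand a where
  Constant :: a -> Operand a
  Local    :: String -> Operand a  -- register name such as "%3"

data Instruction a where
  Add :: NumType a -> Operand a -> Operand a -> Instruction a

-- The witness selects the opcode: integer add or floating-point add.
-- Assumes Int is lowered to i64, which is an illustrative choice.
opcode :: NumType a -> String
opcode IntType   = "add i64"
opcode FloatType = "fadd float"

-- The witness also supplies the Show instance needed to print constants.
ppOperand :: NumType a -> Operand a -> String
ppOperand _         (Local r)    = r
ppOperand IntType   (Constant c) = show c
ppOperand FloatType (Constant c) = show c

ppInstruction :: Instruction a -> String
ppInstruction (Add t x y) =
  opcode t ++ " " ++ ppOperand t x ++ ", " ++ ppOperand t y
```

Both operands of `Add` share the type index `a`, so mixing a `Float` register with an `Int` constant is a Haskell type error rather than a malformed instruction discovered at application runtime.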
  59. 95.
  60. 98.

    Exp a             IR a
    x + 1 :: Float    %4 = fadd float %3, 1.000000e+00
  61. 99.

    Exp a             IR a
    Frontend          Backend
    x + 1 :: Float    %4 = fadd float %3, 1.000000e+00
  62. 100.

    Exp a             IR a
    Frontend          Backend
    x + 1 :: Float    %4 = fadd float %3, 1.000000e+00

    data IR a where
  63. 101.

    Exp a             IR a
    Frontend          Backend
    x + 1 :: Float    %4 = fadd float %3, 1.000000e+00

    data IR a where
      IR :: Operands (EltRepr a) -> IR a
  64. 102.

    Exp a             IR a
    Frontend          Backend
    x + 1 :: Float    %4 = fadd float %3, 1.000000e+00

    data IR a where
      IR :: Operands (EltRepr a) -> IR a

    data family Operands a :: *
    data instance Operands Float = ...
  65. 103.

    Exp a             IR a
    Frontend          Backend
    x + 1 :: Float    %4 = fadd float %3, 1.000000e+00

    data IR a where
      IR :: Operands (EltRepr a) -> IR a

    data family Operands a :: *
    data instance Operands Float = ...

    ( see paper for details )
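The `IR` wrapper and the `Operands` data family can be tried out in isolation. The sketch below is a guess at the general shape, since the slide elides the instance bodies: the constructor names, the use of a register name as the payload, and the helpers `reg`, `pairIR`, `fstIR` and `regName` are all hypothetical.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

-- Representation types, as on the EltRepr slide (subset).
type family EltRepr a
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- One operand per scalar component of a representation type.
-- (Instance bodies are assumptions; the slide shows only "...".)
data family Operands r
data instance Operands ()     = OP_Unit
data instance Operands Float  = OP_Float String  -- register name, simplified
data instance Operands (a, b) = OP_Pair (Operands a) (Operands b)

-- Backend values are indexed by the surface type they represent.
data IR a where
  IR :: Operands (EltRepr a) -> IR a

reg :: String -> IR Float
reg r = IR (OP_Float r)

-- Building and projecting pairs follows the nesting of EltRepr exactly.
pairIR :: IR a -> IR b -> IR (a, b)
pairIR (IR x) (IR y) = IR (OP_Pair (OP_Pair OP_Unit x) y)

fstIR :: IR (a, b) -> IR a
fstIR (IR (OP_Pair (OP_Pair OP_Unit x) _)) = IR x

regName :: IR Float -> String
regName (IR (OP_Float r)) = r
```

For example, `regName (fstIR (pairIR (reg "%3") (reg "%4")))` recovers the first register. The surface type index travels with the generated operands, so code generation for `Exp a` can be given a result of type `IR a`.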
  66. 105.

    [Plot: Mandelbrot. Axes: Speedup vs. Repa @ 1 Thread against # Threads. Series: Repa, Accelerate (LLVM-CPU).]
  67. 106.

    [Plot: Ray Tracer. Axes: Speedup vs. Repa @ 1 Thread against # Threads. Series: Repa, Accelerate (LLVM-CPU).]
  68. 107.

    [Plot: Black-Scholes. Axes: Speedup vs. Repa @ 1 Thread against # Threads. Series: Repa, Accelerate (LLVM-CPU).]
  69. 108.

    [Plot: MD5 Hash. Axes: Speedup vs. Hashcat @ 1 Thread against # Threads. Series: Hashcat, Accelerate (LLVM-CPU).]
  70. 109.

    Summary

    We can have both safety and performance while balancing correctness and effort, in a reusable framework targeting CPUs & GPUs

    https://github.com/AccelerateHS/accelerate-llvm