Type-safe Runtime Code Generation: Accelerate to LLVM

Trevor L. McDonell
September 04, 2015

Presented at Haskell Symposium 2015: https://www.haskell.org/haskell-symposium/2015/
Paper: https://github.com/tmcdonell/tmcdonell.github.io/raw/master/papers/acc-llvm-haskell2015.pdf
Video: https://www.youtube.com/watch?v=snXhXA5noVc

Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and the implementation of the embedded language compiler itself.

In this paper, we focus on the safety guarantees achieved by type preserving compilation. We discuss the compilation pipeline of Accelerate, a high-performance array language targeting both multicore CPUs and GPUs, where we are able to preserve types from the source language down to a low-level register language in SSA form. Specifically, we demonstrate the practicability of our approach by creating a new type-safe interface to the industrial-strength LLVM compiler infrastructure, which we used to build two new Accelerate backends that show competitive runtimes on a set of benchmarks across both CPUs and GPUs.

Transcript

  1. Type-safe Runtime Code Generation: Accelerate to LLVM

    Trevor L. McDonell (1), Manuel M. T. Chakravarty (2), Vinod Grover (3), Ryan R. Newton (1)
    (1) Indiana University, (2) University of New South Wales, (3) NVIDIA Corporation
    tmcdonell
  2. [Plot: N-Body. y-axis: Speedup vs. Repa @ 1 Thread; x-axis: # Threads; series: Repa, Accelerate (LLVM-CPU).]
  3. [Same plot, annotated:] NEW! vectorising, multicore CPU backend for Accelerate
  4-5. [Same plot, with further region labels:] socket #1, socket #2, hyper-threads
  6-8. Accelerate-LLVM

    an embedded array language with runtime compiler
    static type preservation for the entire compiler pipeline ensures it can never go wrong*
    GADT and type family techniques, scaled up to a realistic language
  9-10. Code:

        inc :: Acc (Vector Float) -> Acc (Vector Float)
        inc arr = map (+1) arr

    Annotations: from Accelerate; overload standard type classes; GADT (added on slide 10).
  11-20. Type Safe Interpreter [ GHC User’s Guide ]

        data Expr a where
          Lit    :: Int -> Expr Int
          Succ   :: Expr Int -> Expr Int
          IsZero :: Expr Int -> Expr Bool
          If     :: Expr Bool -> Expr a -> Expr a -> Expr a
          Pair   :: Expr a -> Expr b -> Expr (a, b)

        eval :: Expr a -> a
        eval (Succ n) = 1 + eval n
        ...

    Annotations: Constructors can require more specific types. Pattern matching causes type refinement.
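The interpreter on these slides stops at one `eval` equation. As a minimal sketch, the remaining equations can be filled in the obvious way; the `example` term is hypothetical and not taken from the slides.

```haskell
{-# LANGUAGE GADTs #-}

-- The expression type from the slides: each constructor fixes its result type.
data Expr a where
  Lit    :: Int -> Expr Int
  Succ   :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  Pair   :: Expr a -> Expr b -> Expr (a, b)

-- Matching on a constructor refines `a`, so the evaluator needs no
-- runtime tags and cannot return a value of the wrong type.
eval :: Expr a -> a
eval (Lit n)    = n
eval (Succ n)   = 1 + eval n
eval (IsZero n) = eval n == 0
eval (If c t e) = if eval c then eval t else eval e
eval (Pair x y) = (eval x, eval y)

-- Hypothetical example term (not from the slides).
example :: Expr (Int, Bool)
example = Pair (If (IsZero (Lit 0)) (Succ (Lit 41)) (Lit 0)) (IsZero (Lit 1))
```

Here `eval example` yields `(42, False)`. An ill-typed term such as `Succ (IsZero (Lit 0))` is rejected by GHC at compile time, which is the property the talk scales up to a full compiler pipeline.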
  21. Code:

        inc :: Acc (Vector Float) -> Acc (Vector Float)
        inc arr = map (+1) arr

    Annotations: from Accelerate; overload standard type classes; GADT.
  22-25. Code:

        inc :: Acc (Vector Float) -> Acc (Vector Float)
        inc = Map (\x -> x + 1)

        Map :: (Shape sh, Elt a, Elt b)
            => Fun aenv (a -> b)
            -> OpenAcc aenv (Array sh a)
            -> OpenAcc aenv (Array sh b)

    Annotations: indexed by type of result; environment of free array variables.
  26-31. Code:

        inc :: Acc (Vector Float) -> Acc (Vector Float)
        inc = Map (Lam (Body
                PrimAdd (IsNum Float dictionary)
                  `PrimApp`
                Tuple (NilTup `SnocTup` (Var ZeroIdx)
                              `SnocTup` (Const 1))))

    Annotations: introduce new binder; typed de Bruijn index; overloaded functions carry explicit dictionaries.
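The "typed de Bruijn index" annotation can be illustrated with a small sketch. The term language below (`OpenExp`, `Val`, `evalOpen`, `incBody`) is hypothetical and far smaller than Accelerate's real AST; only the name `ZeroIdx` appears on the slides, and `SuccIdx` is assumed as its natural counterpart.

```haskell
{-# LANGUAGE GADTs #-}

-- A typed de Bruijn index: a position in environment `env` that is
-- statically known to hold a value of type `t`.
data Idx env t where
  ZeroIdx :: Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

-- A value-level environment whose shape mirrors the type-level one.
data Val env where
  Empty :: Val ()
  Push  :: Val env -> t -> Val (env, t)

-- Projection is total: an index can never point outside its environment.
prj :: Idx env t -> Val env -> t
prj ZeroIdx      (Push _   v) = v
prj (SuccIdx ix) (Push env _) = prj ix env

-- Hypothetical open terms; `Lam` extends the environment by one binder.
data OpenExp env t where
  Const :: t -> OpenExp env t
  Var   :: Idx env t -> OpenExp env t
  Add   :: Num t => OpenExp env t -> OpenExp env t -> OpenExp env t
  Lam   :: OpenExp (env, a) b -> OpenExp env (a -> b)

evalOpen :: OpenExp env t -> Val env -> t
evalOpen (Const c) _   = c
evalOpen (Var ix)  env = prj ix env
evalOpen (Add x y) env = evalOpen x env + evalOpen y env
evalOpen (Lam b)   env = \x -> evalOpen b (Push env x)

-- The body of `inc` in this reduced language: \x -> x + 1
incBody :: OpenExp () (Float -> Float)
incBody = Lam (Add (Var ZeroIdx) (Const 1))
```

Applying `evalOpen incBody Empty` to `41` gives `42.0`. Because the environment type is part of every term's type, a variable that refers to a missing or wrongly typed binder is a compile-time error rather than a runtime lookup failure.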
  32. Code:

        inc :: ( ) => Acc (Vector a) -> Acc (Vector a)
        inc arr = map (+1) arr

  33-36. Code:

        inc :: (Elt a, IsNum a) => Acc (Vector a) -> Acc (Vector a)
        inc arr = map (+1) arr

        type family EltRepr a :: *
        type instance EltRepr Int     = Int
        type instance EltRepr Float   = Float
        type instance EltRepr (a,b)   = (((), EltRepr a), EltRepr b)
        type instance EltRepr (a,b,c) = ((((), EltRepr a), ...)

    Annotations: reifies dictionary of Num class; extensible set of surface types; closed set of representation types.
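The split between an extensible set of surface types and a closed set of representation types can be sketched as follows. The class methods `fromElt` and `toElt` and the `Point` type are assumptions made for illustration; the slides show only the `EltRepr` instances.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Map each surface type to a representation built only from
-- primitives, unit and pairs.
type family EltRepr a
type instance EltRepr Int    = Int
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Hypothetical conversion class (method names are assumptions).
class Elt a where
  fromElt :: a -> EltRepr a
  toElt   :: EltRepr a -> a

instance Elt Int where
  fromElt = id
  toElt   = id

instance Elt Float where
  fromElt = id
  toElt   = id

instance (Elt a, Elt b) => Elt (a, b) where
  fromElt (a, b)     = (((), fromElt a), fromElt b)
  toElt (((), a), b) = (toElt a, toElt b)

-- A user-defined surface type joins the open set by reusing an
-- existing representation; the compiler backend never sees `Point`.
data Point = Point Float Float deriving (Eq, Show)

type instance EltRepr Point = EltRepr (Float, Float)

instance Elt Point where
  fromElt (Point x y) = fromElt (x, y)
  toElt r             = let (x, y) = toElt r :: (Float, Float) in Point x y
```

With these definitions `toElt (fromElt (Point 1 2))` returns `Point 1.0 2.0`. The design choice is that later compiler stages only need cases for the few representation types, while users can keep adding surface types.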
  37-40. Code:

        data Cunctation aenv a where
          Done  :: Arrays a
                => Idx aenv a
                -> Cunctation aenv a
          Yield :: (Shape sh, Elt e)
                => Exp aenv sh
                -> Fun aenv (sh -> e)
                -> Cunctation aenv (Array sh e)

    cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun) The action of delaying or putting off something. A tardy action.

    Annotations: manifest array; construct element at each index; not defined in terms of array computations.
  41-43. Code:

        mapD :: Fun aenv (a -> b)
             -> Cunctation aenv (Array sh a)
             -> Cunctation aenv (Array sh b)
        mapD f (Done arr)   = Yield (shape arr) (f `compose` index arr)
        mapD f (Yield sh g) = Yield sh (f `compose` g)

    Annotations: ( see paper for details ); environment types must be the same.
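The `mapD` rule fuses a map into a delayed array by composing functions instead of building an intermediate array. A minimal sketch of the same idea over plain Haskell values follows; the `Delayed` type, the one-dimensional `Int` size and the list-backed `Done` case are simplifying assumptions, not Accelerate's typed AST.

```haskell
-- Simplified, one-dimensional stand-in for the Cunctation type.
-- Done wraps a manifest array (a list here); Yield describes an array
-- by its size and a function that builds the element at each index.
data Delayed a
  = Done [a]
  | Yield Int (Int -> a)

-- Mapping never allocates: it only composes onto the index function.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f (Done xs)   = Yield (length xs) (f . (xs !!))  -- list indexing is linear; fine for a sketch
mapD f (Yield n g) = Yield n (f . g)

-- Materialise the array once, at the end of the fused pipeline.
force :: Delayed a -> [a]
force (Done xs)   = xs
force (Yield n g) = [ g i | i <- [0 .. n - 1] ]
```

For example, `force (mapD (*2) (mapD (+1) (Done [1, 2, 3])))` evaluates to `[4, 6, 8]` and applies both functions in a single traversal. In the typed version on the slides, the shared `aenv` parameter additionally guarantees that the composed functions refer to the same set of free array variables.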
  44-46. Code:

        complex = map f
                $ let xs = use (Array ...)
                  in  map g xs

    Annotations: input data; environment type ‘aenv’; type includes base environment ‘aenv’ plus extra binding ‘xs’.
  47-48. Code:

        complex = let xs = use (Array ...)
                  in  map f $ map g xs

    Annotations: ‘mapD’ rule can now be applied; ( see paper for details ).
  49. Code:

        inc :: Acc (Vector Float) -> Acc (Vector Float)
        inc = Map (Lam (Body
                PrimAdd (IsNum Float dictionary)
                  `PrimApp`
                Tuple (NilTup `SnocTup` (Var ZeroIdx)
                              `SnocTup` (Const 1))))
  50-52. Code:

        data Instruction a where
          Add :: NumType a
              -> Operand a
              -> Operand a
              -> Instruction a

    Annotations: constants and local references; reified dictionaries provide a type witness.

        %2 = getelementptr float* %xs, i64 %1
        %3 = load float* %2
        %4 = fadd float %3, 1.000000e+00

    http://hackage.haskell.org/package/llvm-general
  53-58. Frontend and backend representations:

        Frontend:  Exp a    x + 1 :: Float
        Backend:   IR a     %4 = fadd float %3, 1.000000e+00

        data IR a where
          IR :: Operands (EltRepr a)
             -> IR a

        data family Operands e :: *
        data instance Operands Float = ...

    ( see paper for details )
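The `Operands` data family gives each representation type its own operand layout. The sketch below is one possible reading of that idea: register names stand in for real LLVM operands, and the constructor names, the `Regs` class and `pairIR` are hypothetical.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}

-- Names of SSA registers stand in for real LLVM operands (assumption).
type Reg = String

-- Surface-to-representation mapping, as on the earlier slides.
type family EltRepr a
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- One operand layout per representation type.
data family Operands e
data instance Operands ()     = OP_Unit
data instance Operands Float  = OP_Float Reg
data instance Operands (a, b) = OP_Pair (Operands a) (Operands b)

-- Backend values are indexed by the surface type but store operands
-- for its representation type.
newtype IR a = IR (Operands (EltRepr a))

-- Hypothetical helper: list the registers an operand bundle occupies.
class Regs e where
  regs :: Operands e -> [Reg]

instance Regs () where
  regs OP_Unit = []

instance Regs Float where
  regs (OP_Float r) = [r]

instance (Regs a, Regs b) => Regs (a, b) where
  regs (OP_Pair a b) = regs a ++ regs b

registers :: Regs (EltRepr a) => IR a -> [Reg]
registers (IR ops) = regs ops

-- A pair of floats occupies two registers in the backend.
pairIR :: IR (Float, Float)
pairIR = IR (OP_Pair (OP_Pair OP_Unit (OP_Float "x")) (OP_Float "y"))
```

`registers pairIR` evaluates to `["x", "y"]`. Since `IR a` keeps the surface type `a` as an index, a frontend expression of type `Exp a` can only be translated to backend operands with the matching layout.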
  59. [Plot: Mandelbrot. y-axis: Speedup vs. Repa @ 1 Thread; x-axis: # Threads; series: Repa, Accelerate (LLVM-CPU).]
  60. [Plot: Ray Tracer. y-axis: Speedup vs. Repa @ 1 Thread; x-axis: # Threads; series: Repa, Accelerate (LLVM-CPU).]
  61. [Plot: Black-Scholes. y-axis: Speedup vs. Repa @ 1 Thread; x-axis: # Threads; series: Repa, Accelerate (LLVM-CPU).]
  62. [Plot: MD5 Hash. y-axis: Speedup vs. Hashcat @ 1 Thread; x-axis: # Threads; series: Hashcat, Accelerate (LLVM-CPU).]
  63. Summary

    We can have both safety and performance while balancing correctness and effort, in a reusable framework targeting CPUs & GPUs.
    https://github.com/AccelerateHS/accelerate-llvm