Type-safe Runtime Code Generation: Accelerate to LLVM

Trevor L. McDonell
September 04, 2015

Presented at Haskell Symposium 2015: https://www.haskell.org/haskell-symposium/2015/
Paper: https://github.com/tmcdonell/tmcdonell.github.io/raw/master/papers/acc-llvm-haskell2015.pdf
Video: https://www.youtube.com/watch?v=snXhXA5noVc

Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and the implementation of the embedded language compiler itself.

In this paper, we focus on the safety guarantees achieved by type preserving compilation. We discuss the compilation pipeline of Accelerate, a high-performance array language targeting both multicore CPUs and GPUs, where we are able to preserve types from the source language down to a low-level register language in SSA form. Specifically, we demonstrate the practicability of our approach by creating a new type-safe interface to the industrial-strength LLVM compiler infrastructure, which we used to build two new Accelerate backends that show competitive runtimes on a set of benchmarks across both CPUs and GPUs.

Transcript

  1. Type-safe Runtime Code Generation: Accelerate to LLVM
    Trevor L. McDonell (1), Manuel M. T. Chakravarty (2), Vinod Grover (3), Ryan R. Newton (1)
    (1) Indiana University, (2) University of New South Wales, (3) NVIDIA Corporation
    tmcdonell

  2. https://xkcd.com/378/

  3. https://commons.wikimedia.org/wiki/File:FortranCardPROJ039.agr.jpg
    https://commons.wikimedia.org/wiki/File:Motorola_6800_Assembly_Language.png

  6. Compilers are complex…

  8. Can you trust your compiler?

  9. Option #1: Formal verification

    seL4
    CompCert

  11. Option #2: Extensive testing

    GCC
    LLVM

  13. What about “young” languages?

  15. improve assurance
    or
    add new features

  16. Parsing & Lexing
    Semantic analysis
    Optimisation
    Code generation
    Intermediate representation

  18. Parsing & Lexing
    Semantic analysis
    Optimisation
    Code generation
    Intermediate representation
    ( Guillemette & Monnier, ICFP’08 )

  20. Accelerate
    An embedded language for high-performance computing

  22. [Plot: N-Body. Speedup vs. Repa @ 1 Thread (y-axis) against # Threads (x-axis). Series: Repa, Accelerate (LLVM-CPU)]

  23. [Plot: N-Body, same axes and series as slide 22]
    NEW! vectorising, multicore CPU backend for Accelerate

  24. [Plot: N-Body, same axes and series as slide 22]
    NEW! vectorising, multicore CPU backend for Accelerate
    socket #1, socket #2, hyper-threads

  26. Accelerate-LLVM
    an embedded array language with runtime compiler
    static type preservation for the entire compiler pipeline
    ensures it can never go wrong*
    GADT and type family techniques, scaled up to a
    realistic language

  29. Intermediate representation

  34. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc arr = map (+1) arr
    Acc: GADT
    map: from Accelerate
    (+1): overload standard type classes

  47. Type Safe Interpreter
    [ GHC User’s Guide ]
    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
      Pair   :: Expr a -> Expr b -> Expr (a, b)
    eval :: Expr a -> a
    eval (Succ n) = 1 + eval n
    ...
    Constructors can require more specific types
    Pattern matching causes type refinement
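The interpreter on the slide above elides most of `eval`. A minimal completed sketch follows; the remaining `eval` clauses and the `example` term are filled in here for illustration and are not taken from the slides.

```haskell
{-# LANGUAGE GADTs #-}

-- Expression type indexed by the Haskell type of the value it denotes.
data Expr a where
  Lit    :: Int -> Expr Int
  Succ   :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  Pair   :: Expr a -> Expr b -> Expr (a, b)

-- Matching on a constructor refines 'a', so every clause is checked at
-- its own result type and the evaluator needs no runtime type tags.
eval :: Expr a -> a
eval (Lit n)    = n
eval (Succ n)   = 1 + eval n
eval (IsZero n) = eval n == 0
eval (If c t e) = if eval c then eval t else eval e
eval (Pair a b) = (eval a, eval b)

-- Illustrative term (an assumption, not from the talk).
example :: Expr (Int, Bool)
example = Pair (If (IsZero (Lit 0)) (Succ (Lit 41)) (Lit 0)) (IsZero (Lit 1))
```

An ill-typed term such as `Succ (IsZero (Lit 0))` is rejected by GHC when the host program is compiled, which is the property the talk relies on.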

  52. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc =
      Map
        (\x -> x + 1)
    Map :: (Shape sh, Elt a, Elt b)
        => Fun aenv (a -> b)
        -> OpenAcc aenv (Array sh a)
        -> OpenAcc aenv (Array sh b)
    aenv: environment of free array variables
    Array sh b: indexed by type of result

  56. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc =
      Map
        (Lam (Body
          PrimAdd (IsNum Float dictionary)
            `PrimApp`
          Tuple (NilTup
            `SnocTup` (Var ZeroIdx)
            `SnocTup` (Const 1))))
    Lam: introduce new binder
    Var ZeroIdx: typed de Bruijn index

  58. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc =
      Map
        (Lam (Body
          PrimAdd (IsNum Float dictionary)
            `PrimApp`
          Tuple (NilTup
            `SnocTup` (Var ZeroIdx)
            `SnocTup` (Const 1))))
    PrimAdd (IsNum Float dictionary): overloaded functions carry explicit dictionaries
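To make the binder and index annotations concrete, here is a small sketch of typed de Bruijn indices. It is a toy language, far smaller than Accelerate's AST: the names `Val`, `prj`, `Add`, `evalExp`, `evalFun` and `incF` are assumptions made for this example, and an ordinary `Num` constraint stands in for the reified `IsNum` dictionary.

```haskell
{-# LANGUAGE GADTs #-}

-- A typed de Bruijn index: a position in environment 'env' holding a 't'.
data Idx env t where
  ZeroIdx :: Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

-- Runtime environments mirror the nested-pair shape of 'env'.
data Val env where
  Empty :: Val ()
  Push  :: Val env -> t -> Val (env, t)

-- Lookup is total: an index can only exist for a non-empty environment.
prj :: Idx env t -> Val env -> t
prj ZeroIdx      (Push _   v) = v
prj (SuccIdx ix) (Push env _) = prj ix env

-- Scalar expressions; 'Add' stores the Num dictionary it needs.
data Exp env t where
  Const :: t -> Exp env t
  Var   :: Idx env t -> Exp env t
  Add   :: Num t => Exp env t -> Exp env t -> Exp env t

-- Each 'Lam' introduces one binder by extending the environment type.
data Fun env t where
  Body :: Exp env t -> Fun env t
  Lam  :: Fun (env, a) t -> Fun env (a -> t)

evalExp :: Exp env t -> Val env -> t
evalExp (Const c) _   = c
evalExp (Var ix)  env = prj ix env
evalExp (Add x y) env = evalExp x env + evalExp y env

evalFun :: Fun env t -> Val env -> t
evalFun (Body e) env = evalExp e env
evalFun (Lam f)  env = \x -> evalFun f (env `Push` x)

-- \x -> x + 1, with x referenced by index rather than by name.
incF :: Fun () (Float -> Float)
incF = Lam (Body (Var ZeroIdx `Add` Const 1))
```

Because the index type records both the environment and the element type, a dangling or wrongly typed variable reference cannot be constructed.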

  59. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc arr = map (+1) arr

  64. inc :: (Elt a, IsNum a)
        => Acc (Vector a) -> Acc (Vector a)
    inc arr = map (+1) arr
    IsNum a: reifies dictionary of Num class
    Elt a: extensible set of surface types
    type family EltRepr a :: *
    type instance EltRepr Int     = Int
    type instance EltRepr Float   = Float
    type instance EltRepr (a,b)   = (((), EltRepr a), EltRepr b)
    type instance EltRepr (a,b,c) = ((((), EltRepr a), ...)
    EltRepr: closed set of representation types
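A sketch of how an open set of surface types can be mapped onto a closed set of representation types, as the slide above suggests. The class methods `fromElt` and `toElt` and the user type `Point` are assumptions for illustration; the real `Elt` class in Accelerate has more to it.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Representation types are built only from primitives, unit and pairs.
type family EltRepr a
type instance EltRepr Int    = Int
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Surface types convert to and from their representation.
class Elt a where
  fromElt :: a -> EltRepr a
  toElt   :: EltRepr a -> a

instance Elt Int where
  fromElt = id
  toElt   = id

instance Elt Float where
  fromElt = id
  toElt   = id

instance (Elt a, Elt b) => Elt (a, b) where
  fromElt (a, b)     = (((), fromElt a), fromElt b)
  toElt (((), a), b) = (toElt a, toElt b)

-- A user-defined surface type that reuses the representation of a pair,
-- so a backend never has to learn about 'Point'.
data Point = Point Float Float deriving (Eq, Show)

type instance EltRepr Point = (((), Float), Float)

instance Elt Point where
  fromElt (Point x y) = fromElt (x, y)
  toElt r = case (toElt r :: (Float, Float)) of
              (x, y) -> Point x y
```

The compiler backend only ever handles the right-hand sides of `EltRepr`, while users remain free to add new instances.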

  65. Optimisation

  67. simple xs = map f ( map g xs )
    map (f . g) xs

  68. Fusion
    [ McDonell, ICFP 2013 ]
    [Diagram: graph with nodes labelled p1 to p7, c1 and c2]

  75. data Cunctation aenv a where
      Done  :: Arrays a
            => Idx aenv a
            -> Cunctation aenv a
      Yield :: (Shape sh, Elt e)
            => Exp aenv sh
            -> Fun aenv (sh -> e)
            -> Cunctation aenv (Array sh e)
    Done: manifest array
    Yield: construct element at each index; not defined in terms of array computations
    cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun)
    The action of delaying or putting off something. A tardy action.

  78. mapD :: Fun aenv (a -> b)
         -> Cunctation aenv (Array sh a)
         -> Cunctation aenv (Array sh b)
    mapD f (Done arr)
      = Yield (shape arr) (f `compose` index arr)
    mapD f (Yield sh g)
      = Yield sh (f `compose` g)
    ( see paper for details )
    environment types must be the same
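The idea behind `mapD` can be tried out with a heavily simplified stand-in: one-dimensional arrays as lists, ordinary Haskell functions instead of embedded `Fun` terms, and no environment types. The names `Delayed`, `compute` and `fused` are assumptions for this sketch and do not appear in Accelerate.

```haskell
-- Simplified stand-in for Cunctation.
data Delayed e
  = Done [e]              -- manifest array, already in memory
  | Yield Int (Int -> e)  -- extent plus a function from index to element

-- Mapping never builds an intermediate array: it only composes the
-- mapped function with the existing index function.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f (Done xs)   = Yield (length xs) (f . (xs !!))
mapD f (Yield n g) = Yield n (f . g)

-- Force a delayed array into a manifest one with a single traversal.
compute :: Delayed e -> [e]
compute (Done xs)   = xs
compute (Yield n g) = [ g i | i <- [0 .. n - 1] ]

-- map (* 2) (map (+ 1) xs) becomes one pass applying ((* 2) . (+ 1)).
fused :: [Int]
fused = compute (mapD (* 2) (mapD (+ 1) (Done [1, 2, 3])))
```

Here `fused` evaluates to `[4, 6, 8]`. List indexing with `(!!)` is used only to keep the sketch dependency-free; it is not how a real array backend would read elements.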

  82. complex = map f
            $ let xs = use (Array ...) in
              map g xs
    use (Array ...): input data
    map f: environment type ‘aenv’
    map g xs: type includes base environment ‘aenv’ plus extra binding ‘xs’

  85. complex = let xs = use (Array ...) in
              map f
            $ map g xs
    ‘mapD’ rule can now be applied
    ( see paper for details )
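Floating the `let` outwards moves `map f` under a new binder, so a term typed in environment `aenv` has to be re-typed in the extended environment. One common way to do this with typed de Bruijn indices is weakening by renaming, sketched below on a toy scalar language. This is an assumption about the general technique, not the paper's code: `rename`, `under`, `weaken`, `eval` and `body` are names invented for this example.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}

data Idx env t where
  ZeroIdx :: Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

data Exp env t where
  Const :: Int -> Exp env Int
  Var   :: Idx env t -> Exp env t
  Add   :: Exp env Int -> Exp env Int -> Exp env Int
  Let   :: Exp env s -> Exp (env, s) t -> Exp env t

-- Extend a renaming so that it leaves a newly bound variable untouched.
under :: (forall t. Idx env t -> Idx env' t)
      -> Idx (env, s) t' -> Idx (env', s) t'
under _ ZeroIdx      = ZeroIdx
under k (SuccIdx ix) = SuccIdx (k ix)

-- Apply a renaming to every variable, going under binders as needed.
rename :: (forall t'. Idx env t' -> Idx env' t') -> Exp env t -> Exp env' t
rename _ (Const c) = Const c
rename k (Var ix)  = Var (k ix)
rename k (Add x y) = Add (rename k x) (rename k y)
rename k (Let b e) = Let (rename k b) (rename (under k) e)

-- A term valid in 'env' is also valid with one extra binding in scope.
weaken :: Exp env t -> Exp (env, s) t
weaken = rename SuccIdx

-- Evaluator, used only to check that weakening preserves meaning.
data Val env where
  Empty :: Val ()
  Push  :: Val env -> t -> Val (env, t)

prj :: Idx env t -> Val env -> t
prj ZeroIdx      (Push _   v) = v
prj (SuccIdx ix) (Push env _) = prj ix env

eval :: Exp env t -> Val env -> t
eval (Const c) _   = c
eval (Var ix)  env = prj ix env
eval (Add x y) env = eval x env + eval y env
eval (Let b e) env = eval e (env `Push` eval b env)

-- x + 1 in an environment with a single Int variable.
body :: Exp ((), Int) Int
body = Add (Var ZeroIdx) (Const 1)
```

The type of `weaken` forces every variable to be shifted consistently; forgetting to adjust an index is a compile-time error in the compiler itself rather than a wrong program at runtime.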

  86. Code generation

  88. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc =
      Map
        (Lam (Body
          PrimAdd (IsNum Float dictionary)
            `PrimApp`
          Tuple (NilTup
            `SnocTup` (Var ZeroIdx)
            `SnocTup` (Const 1))))

  92. data Instruction a where
      Add :: NumType a
          -> Operand a
          -> Operand a
          -> Instruction a
    NumType a: reified dictionaries provide a type witness
    Operand a: constants and local references
    %2 = getelementptr float* %xs, i64 %1
    %3 = load float* %2
    %4 = fadd float %3, 1.000000e+00
    http://hackage.haskell.org/package/llvm-general
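A sketch of how a type witness can select the right LLVM opcode for a typed instruction. Only `Instruction`, `Add`, `NumType` and `Operand` come from the slide; the constructors `TypeInt32`, `TypeFloat`, `Local`, `Constant` and the printing functions are assumptions for this example and are not the llvm-general or accelerate-llvm API.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Int (Int32)

-- Reified evidence that 'a' is a supported numeric type (two cases only).
data NumType a where
  TypeInt32 :: NumType Int32
  TypeFloat :: NumType Float

-- Operands are constants or references to local (SSA) names.
data Operand a where
  Local    :: NumType a -> String -> Operand a
  Constant :: NumType a -> a -> Operand a

-- Both operands and the result must share the same element type.
data Instruction a where
  Add :: NumType a -> Operand a -> Operand a -> Instruction a

llvmType :: NumType a -> String
llvmType TypeInt32 = "i32"
llvmType TypeFloat = "float"

-- Matching on the witness brings the Show instance for 'a' into scope.
ppOperand :: Operand a -> String
ppOperand (Local _ name)         = '%' : name
ppOperand (Constant TypeInt32 c) = show c
ppOperand (Constant TypeFloat c) = show c

-- Integer and floating-point addition use different LLVM opcodes.
ppInstruction :: Instruction a -> String
ppInstruction (Add t x y) =
  opcode t ++ " " ++ llvmType t ++ " " ++ ppOperand x ++ ", " ++ ppOperand y
  where
    opcode :: NumType b -> String
    opcode TypeInt32 = "add"
    opcode TypeFloat = "fadd"
```

Mixing an `Operand Int32` with an `Operand Float` in one `Add` does not type check, so that class of malformed instruction is ruled out before any LLVM code is emitted.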

  100. Frontend: Exp a
      x + 1 :: Float
    Backend: IR a
      %4 = fadd float %3, 1.000000e+00
    data IR a where
      IR :: Operands (EltRepr a) -> IR a
    data family Operands e :: *
    data instance Operands Float = ...
    ( see paper for details )
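A sketch of how a data family indexed by representation type can tie backend operands to surface types. The slide elides the instance bodies, so the constructors below (`OpUnit`, `OpInt`, `OpFloat`, `OpPair`, each holding a register name as a `String`) are assumptions for this example, as are `pairIR` and `registers`.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

-- Surface types map to nested-pair representation types.
type family EltRepr a
type instance EltRepr Int    = Int
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- One data instance per representation type; scalars hold a register name.
data family Operands e
data instance Operands ()     = OpUnit
data instance Operands Int    = OpInt String
data instance Operands Float  = OpFloat String
data instance Operands (a, b) = OpPair (Operands a) (Operands b)

-- Backend values are indexed by the surface type they implement.
data IR a where
  IR :: Operands (EltRepr a) -> IR a

-- A pair of scalars occupies two separate registers.
pairIR :: IR (Int, Float)
pairIR = IR (OpPair (OpPair OpUnit (OpInt "%1")) (OpFloat "%2"))

-- The index fixes the shape, so this single pattern is exhaustive.
registers :: IR (Int, Float) -> [String]
registers (IR (OpPair (OpPair OpUnit (OpInt a)) (OpFloat b))) = [a, b]
```

With this arrangement a code generator of type `Exp a -> IR a` cannot return operands of the wrong shape for `a`.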

  101. Safety and performance?

  102. [Plot: Mandelbrot. Speedup vs. Repa @ 1 Thread (y-axis) against # Threads (x-axis). Series: Repa, Accelerate (LLVM-CPU)]

  103. [Plot: Ray Tracer. Speedup vs. Repa @ 1 Thread (y-axis) against # Threads (x-axis). Series: Repa, Accelerate (LLVM-CPU)]

  104. [Plot: Black-Scholes. Speedup vs. Repa @ 1 Thread (y-axis) against # Threads (x-axis). Series: Repa, Accelerate (LLVM-CPU)]

  105. [Plot: MD5 Hash. Speedup vs. Hashcat @ 1 Thread (y-axis) against # Threads (x-axis). Series: Hashcat, Accelerate (LLVM-CPU)]

  106. Summary
    We can have both safety and performance
    while balancing correctness and effort,
    in a reusable framework targeting CPUs & GPUs
    https://github.com/AccelerateHS/accelerate-llvm