Type-safe Runtime Code Generation: Accelerate to LLVM

Trevor L. McDonell
September 04, 2015

Presented at Haskell Symposium 2015: https://www.haskell.org/haskell-symposium/2015/
Paper: https://github.com/tmcdonell/tmcdonell.github.io/raw/master/papers/acc-llvm-haskell2015.pdf
Video: https://www.youtube.com/watch?v=snXhXA5noVc

Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and the implementation of the embedded language compiler itself.

In this paper, we focus on the safety guarantees achieved by type preserving compilation. We discuss the compilation pipeline of Accelerate, a high-performance array language targeting both multicore CPUs and GPUs, where we are able to preserve types from the source language down to a low-level register language in SSA form. Specifically, we demonstrate the practicability of our approach by creating a new type-safe interface to the industrial-strength LLVM compiler infrastructure, which we used to build two new Accelerate backends that show competitive runtimes on a set of benchmarks across both CPUs and GPUs.

Transcript

  1. Type-safe Runtime Code Generation: Accelerate to LLVM

    Trevor L. McDonell (1), Manuel M. T. Chakravarty (2), Vinod Grover (3), Ryan R. Newton (1)
    (1) Indiana University, (2) University of New South Wales, (3) NVIDIA Corporation
    tmcdonell

  2. https://xkcd.com/378/

  5. https://commons.wikimedia.org/wiki/File:FortranCardPROJ039.agr.jpg
    https://commons.wikimedia.org/wiki/File:Motorola_6800_Assembly_Language.png

  8. Compilers are complex…

  10. Can you trust your compiler?

  11. Option #1: Formal verification

    seL4
    CompCert

  13. Option #2: Extensive testing

    GCC
    LLVM

  15. What about “young” languages?

  17. improve assurance
    or
    add new features

  20. Parsing & Lexing
    Semantic analysis
    Optimisation
    Code generation
    Intermediate representation
    ( Guillemette & Monnier, ICFP’08 )

  22. Accelerate
    An embedded language for high-performance computing

  27. [Plot: N-Body. x-axis: # Threads (0 to 50); y-axis: Speedup vs. Repa @ 1 Thread (0 to 350); series: Repa, Accelerate (LLVM-CPU); x-axis regions: socket #1, socket #2, hyper-threads]
    NEW! vectorising, multicore CPU backend for Accelerate

  28. Accelerate-LLVM
    an embedded array language with runtime compiler
    static type preservation for the entire compiler pipeline ensures it can never go wrong*
    GADT and type family techniques, scaled up to a realistic language

  31. Intermediate representation

  36. inc arr = map (+1) arr
    inc :: Acc (Vector Float) -> Acc (Vector Float)
    GADT
    from Accelerate
    overload standard type classes

  49. Type Safe Interpreter
    [ GHC User’s Guide ]

    data Expr a where
      Lit    :: Int -> Expr Int
      Succ   :: Expr Int -> Expr Int
      IsZero :: Expr Int -> Expr Bool
      If     :: Expr Bool -> Expr a -> Expr a -> Expr a
      Pair   :: Expr a -> Expr b -> Expr (a, b)

    eval :: Expr a -> a
    eval (Succ n) = 1 + eval n
    ...

    Constructors can require more specific types
    Pattern matching causes type refinement
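
The slide elides most of `eval`. A minimal sketch of the complete interpreter follows, assuming the remaining equations take the obvious form; the `example` term is hypothetical and not from the talk.

```haskell
{-# LANGUAGE GADTs #-}

-- The expression type from the slide: the index 'a' is the type of the
-- value that the expression evaluates to.
data Expr a where
  Lit    :: Int -> Expr Int
  Succ   :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a
  Pair   :: Expr a -> Expr b -> Expr (a, b)

-- Matching on a constructor refines 'a', so each equation is checked at
-- its own result type and no runtime tag checks are required.
eval :: Expr a -> a
eval (Lit i)    = i
eval (Succ n)   = 1 + eval n
eval (IsZero n) = eval n == 0
eval (If c t e) = if eval c then eval t else eval e
eval (Pair a b) = (eval a, eval b)

-- Hypothetical example term (an assumption for illustration).
example :: Expr (Int, Bool)
example = Pair (If (IsZero (Lit 0)) (Succ (Lit 41)) (Lit 0)) (IsZero (Lit 3))
```

An ill-typed term such as `Succ (IsZero (Lit 0))` is rejected by the Haskell type checker, so `eval` never has to handle it.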

  54. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc =
      Map
        (\x -> x + 1)

    Map :: (Shape sh, Elt a, Elt b)
        => Fun aenv (a -> b)
        -> OpenAcc aenv (Array sh a)
        -> OpenAcc aenv (Array sh b)

    environment of free array variables
    indexed by type of result

  60. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc =
      Map
        (Lam (Body
          PrimAdd (IsNum Float dictionary)
            `PrimApp`
              Tuple (NilTup
                `SnocTup` (Var ZeroIdx)
                `SnocTup` (Const 1))))

    Lam: introduce new binder
    Var ZeroIdx: typed de Bruijn index
    IsNum Float dictionary: overloaded functions carry explicit dictionaries
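
To make the typed de Bruijn index concrete, here is a minimal sketch of the technique in isolation. The `Term` language and `evalTerm` are hypothetical and far smaller than Accelerate's AST; the names `Idx`, `ZeroIdx`, `SuccIdx`, `Val` and `prj` are used in the same spirit as the slides, but the definitions below are assumptions for illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- A typed de Bruijn index: evidence that an environment of type 'env'
-- holds a value of type 't' at this position.
data Idx env t where
  ZeroIdx ::              Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t

-- Environments are nested pairs, extended on the right.
data Val env where
  Empty :: Val ()
  Push  :: Val env -> t -> Val (env, t)

-- Projection is total: the index type rules out unbound variables.
prj :: Idx env t -> Val env -> t
prj ZeroIdx      (Push _   v) = v
prj (SuccIdx ix) (Push env _) = prj ix env

-- A tiny term language in the same style (hypothetical).
data Term env t where
  Var   :: Idx env t -> Term env t
  Const :: t -> Term env t
  Add   :: Num t => Term env t -> Term env t -> Term env t
  Lam   :: Term (env, a) b -> Term env (a -> b)
  App   :: Term env (a -> b) -> Term env a -> Term env b

evalTerm :: Val env -> Term env t -> t
evalTerm env (Var ix)  = prj ix env
evalTerm _   (Const c) = c
evalTerm env (Add x y) = evalTerm env x + evalTerm env y
evalTerm env (Lam b)   = \a -> evalTerm (env `Push` a) b
evalTerm env (App f x) = evalTerm env f (evalTerm env x)

-- \x -> x + 1: 'Lam' extends the environment, 'Var ZeroIdx' refers to
-- the binder it introduced.
incTerm :: Term () (Float -> Float)
incTerm = Lam (Add (Var ZeroIdx) (Const 1))
```

Here the `Num t` constraint stored in `Add` plays the role of the explicit dictionary: pattern matching on the constructor brings the class evidence back into scope.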

  61. inc :: Acc (Vector Float) -> Acc (Vector Float)
    inc arr = map (+1) arr

  66. inc :: (Elt a, IsNum a)
        => Acc (Vector a) -> Acc (Vector a)
    inc arr = map (+1) arr

    IsNum: reifies dictionary of Num class
    Elt: extensible set of surface types

    type family EltRepr a :: *
    type instance EltRepr Int     = Int
    type instance EltRepr Float   = Float
    type instance EltRepr (a,b)   = (((), EltRepr a), EltRepr b)
    type instance EltRepr (a,b,c) = ((((), EltRepr a), ...)

    EltRepr: closed set of representation types
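
The split between surface types and representation types can be sketched as follows. This is a simplified model under stated assumptions: the `Elt` class here has only two hypothetical conversion methods, and `Point` is an invented user type; Accelerate's real class carries more.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Representation types: unit, primitives, and nested pairs only.
type family EltRepr a
type instance EltRepr Int    = Int
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Simplified surface-type class (an assumption for illustration).
class Elt a where
  fromElt :: a -> EltRepr a
  toElt   :: EltRepr a -> a

instance Elt Int where
  fromElt = id
  toElt   = id

instance Elt Float where
  fromElt = id
  toElt   = id

instance (Elt a, Elt b) => Elt (a, b) where
  fromElt (a, b)       = (((), fromElt a), fromElt b)
  toElt   (((), a), b) = (toElt a, toElt b)

-- A new surface type maps onto an existing representation, so the
-- backend never needs to learn about it.
data Point = Point Float Float deriving (Eq, Show)

type instance EltRepr Point = (((), Float), Float)

instance Elt Point where
  fromElt (Point x y) = fromElt (x, y)
  toElt r             = let (x, y) = toElt r :: (Float, Float) in Point x y
```

Adding `Point` extends the set of surface types without adding any new representation type, which is the division of labour the slide describes.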

  67. Optimisation

  69. simple xs = map f ( map g xs )
    map (f . g) xs
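
The rewrite on this slide is the usual map fusion law. A small sketch using ordinary Haskell lists as a stand-in for Accelerate arrays (an assumption made to keep the example self-contained; `simpleFused` is a hypothetical name):

```haskell
-- Unfused: traverses the input twice and builds an intermediate list.
simple :: (b -> c) -> (a -> b) -> [a] -> [c]
simple f g xs = map f (map g xs)

-- Fused: a single traversal with the composed function.
simpleFused :: (b -> c) -> (a -> b) -> [a] -> [c]
simpleFused f g xs = map (f . g) xs
```

Both definitions compute the same result for any `f`, `g` and `xs`; the fused form avoids materialising the intermediate collection.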

  70. Fusion
    [ McDonell, ICFP 2013 ]
    [Diagram: graph with nodes p1, p2, p3, p4, p5, p6, p7, c1, c2]

  77. cunctation | kʌŋ(k)ˈteɪʃ(ə)n | (noun)
    The action of delaying or putting off something. A tardy action.

    data Cunctation aenv a where
      Done  :: Arrays a
            => Idx aenv a
            -> Cunctation aenv a

      Yield :: (Shape sh, Elt e)
            => Exp aenv sh
            -> Fun aenv (sh -> e)
            -> Cunctation aenv (Array sh e)

    Done: manifest array
    Yield: construct element at each index
    not defined in terms of array computations

  80. mapD :: Fun aenv (a -> b)
         -> Cunctation aenv (Array sh a)
         -> Cunctation aenv (Array sh b)
    mapD f (Done arr)
      = Yield (shape arr) (f `compose` index arr)
    mapD f (Yield sh g)
      = Yield sh (f `compose` g)

    environment types must be the same
    ( see paper for details )
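
The idea behind `mapD` can be tried out in a much simpler setting. The sketch below drops the environment index and the embedded `Exp`/`Fun` types entirely and uses one-dimensional, list-backed arrays; `Vec`, `Delayed`, `compute` and `toList` are hypothetical names introduced only for this illustration.

```haskell
-- A manifest array: its size and its elements, already in memory.
data Vec e = Vec Int [e]

-- Simplified delayed representation: no environments, 1D shapes only.
data Delayed e
  = Done  (Vec e)         -- manifest array
  | Yield Int (Int -> e)  -- size, plus a function from index to element

-- Mapping never touches memory: it only composes index functions.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f (Done (Vec n xs)) = Yield n (f . (xs !!))
mapD f (Yield n g)       = Yield n (f . g)

-- Materialise the array, applying the fused function once per index.
compute :: Delayed e -> Vec e
compute (Done v)    = v
compute (Yield n g) = Vec n [ g i | i <- [0 .. n - 1] ]

toList :: Vec e -> [e]
toList (Vec _ xs) = xs
```

For example, `toList (compute (mapD (+1) (mapD (*2) (Done (Vec 3 [1,2,3])))))` builds only one result array. In the real compiler the composed functions are terms indexed by `aenv`, which is why both arguments of `compose` must share the same environment type.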

  84. complex = map f
            $ let xs = use (Array ...) in
              map g xs

    use (Array ...): input data
    map f: environment type ‘aenv’
    map g xs: type includes base environment ‘aenv’ plus extra binding ‘xs’

  87. complex = let xs = use (Array ...) in
              map f
            $ map g xs

    ‘mapD’ rule can now be applied
    ( see paper for details )

  88. Code generation

  94. data Instruction a where
      Add :: NumType a
          -> Operand a
          -> Operand a
          -> Instruction a

    NumType a: reified dictionaries provide a type witness
    Operand a: constants and local references

    %2 = getelementptr float* %xs, i64 %1
    %3 = load float* %2
    %4 = fadd float %3, 1.000000e+00

    http://hackage.haskell.org/package/llvm-general
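
A minimal sketch of how a type witness can drive code generation. The constructors of `NumType` and `Operand` and the printer functions below are assumptions for illustration; they are not the definitions used by accelerate-llvm or llvm-general, and the printed text only approximates LLVM syntax.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Int (Int64)

-- A value of 'NumType a' is runtime evidence of what 'a' is.
data NumType a where
  TypeInt64 :: NumType Int64
  TypeFloat :: NumType Float

-- Constants and local references, both indexed by their type.
data Operand a where
  Constant :: a -> Operand a
  Local    :: String -> Operand a  -- name of an SSA register

data Instruction a where
  Add :: NumType a -> Operand a -> Operand a -> Instruction a

-- The witness selects the opcode and the type name.
opcode :: NumType a -> String
opcode TypeInt64 = "add"
opcode TypeFloat = "fadd"

typeName :: NumType a -> String
typeName TypeInt64 = "i64"
typeName TypeFloat = "float"

-- Matching on the witness refines 'a', which makes 'show' available.
ppOperand :: NumType a -> Operand a -> String
ppOperand _         (Local n)    = '%' : n
ppOperand TypeInt64 (Constant c) = show c
ppOperand TypeFloat (Constant c) = show c

ppInstruction :: Instruction a -> String
ppInstruction (Add t x y) =
  unwords [opcode t, typeName t, ppOperand t x ++ ",", ppOperand t y]
```

Because both operands share the index `a` with the witness, a term such as `Add TypeFloat (Local "3") (Constant (1 :: Int64))` does not type check, so an integer operand can never reach an `fadd`.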

  103. Frontend: Exp a           Backend: IR a
    x + 1 :: Float            %4 = fadd float %3, 1.000000e+00

    data IR a where
      IR :: Operands (EltRepr a) -> IR a

    data family Operands a :: *
    data instance Operands Float = ...

    ( see paper for details )
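
The slide leaves the `Operands` instances elided. The following sketch shows one way a data family over representation types could look; the `Reg` type, the instance constructors, and the helper functions are all hypothetical and are not taken from the paper or from accelerate-llvm.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

type family EltRepr a
type instance EltRepr Float  = Float
type instance EltRepr (a, b) = (((), EltRepr a), EltRepr b)

-- Hypothetical operand: just the name of an SSA register.
newtype Reg = Reg String deriving (Eq, Show)

-- One instance per representation type (a closed, small set).
data family Operands r
data instance Operands ()     = OpUnit
data instance Operands Float  = OpFloat Reg
data instance Operands (a, b) = OpPair (Operands a) (Operands b)

-- Backend values are indexed by the surface type, but store operands
-- for its representation type.
data IR a where
  IR :: Operands (EltRepr a) -> IR a

pairIR :: IR (Float, Float)
pairIR = IR (OpUnit `OpPair` OpFloat (Reg "x") `OpPair` OpFloat (Reg "y"))

-- Projection is checked against the representation of the pair.
fstIR :: IR (Float, Float) -> IR Float
fstIR (IR (OpPair (OpPair OpUnit x) _)) = IR x

regOf :: IR Float -> Reg
regOf (IR (OpFloat r)) = r
```

With this shape, a surface pair of floats is lowered to two registers, and the type index on `IR` keeps frontend and backend views of the same value in agreement.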

  104. Safety and performance?

  105. [Plot: Mandelbrot. x-axis: # Threads (0 to 50); y-axis: Speedup vs. Repa @ 1 Thread (0 to 35); series: Repa, Accelerate (LLVM-CPU)]

  106. [Plot: Ray Tracer. x-axis: # Threads (0 to 50); y-axis: Speedup vs. Repa @ 1 Thread (0 to 25); series: Repa, Accelerate (LLVM-CPU)]

  107. [Plot: Black-Scholes. x-axis: # Threads (0 to 50); y-axis: Speedup vs. Repa @ 1 Thread (0 to 60); series: Repa, Accelerate (LLVM-CPU)]

  108. [Plot: MD5 Hash. x-axis: # Threads (0 to 50); y-axis: Speedup vs. Hashcat @ 1 Thread (0 to 55); series: Hashcat, Accelerate (LLVM-CPU)]

  109. Summary
    We can have both safety and performance
    while balancing correctness and effort,
    in a reusable framework targeting CPUs & GPUs
    https://github.com/AccelerateHS/accelerate-llvm
