Functional High-Performance Computing

Keynote talk at the inaugural workshop on Functional High-Performance and Numerical Computing 2019: https://icfp19.sigplan.org/home/FHPNC-2019

Video: TBD

Trevor L. McDonell

August 18, 2019

Transcript

  1. Functional High-Performance Computing
    Trevor L. McDonell
    Utrecht University
    AccelerateHS
    acceleratehs.org

  2. All modern processors have multiple cores
    Ryzen 7 1800X (8 cores): 4.8B transistors
    A12 Bionic (2+4 cores): 6.9B transistors
    GTX 1080 (2560 cores*): 7.2B transistors

  3. GPU (graphics processing unit)
    GTX 1080 (2560 cores*)
    Application areas: medical imaging, data science, weather & climate, bioinformatics, computational chemistry, machine learning
    Programming concerns: software-programmable caches, data distribution, thread synchronisation, memory access patterns, control flow divergence

  4. [Figure: performance vs. effort (axes only)]

  5. [Figure: performance vs. effort, showing the expected curve]

  6. [Figure: performance vs. effort, showing the expected and actual curves]

  7. [Figure: performance vs. effort, showing the expected, actual, and desired curves]

  8. [Figure: performance vs. effort, showing the expected, actual, and desired curves]
    After expressing available parallelism, I often find that the code has slowed down.
    — Jeff Larkin, NVIDIA Developer Technology
    https://devblogs.nvidia.com/getting-started-openacc/

  9. Can we have parallel programming with less effort?

  10. Part 1: Thinking in parallel

  11. for (int i = 0; i < length; ++i)
    {
        // do something (in parallel)
    }

  12. [Image: multithreaded programming, in theory vs. in practice]

  13. Why is this difficult?
    Concurrency
    Multiple interleaved threads of control

    All threads have effects on the world

    Non-determinism and concurrency control
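
Concurrency control can be made concrete with a small sketch in Haskell, the language used in the rest of this talk, assuming only GHC's base library. Several threads increment one shared counter; an `MVar` serialises the updates, so the final count is deterministic however the threads are interleaved. The function name `countWithLock` is hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newMVar, newEmptyMVar, modifyMVar_, putMVar, takeMVar, readMVar)

-- Start 'n' worker threads that each add 1 to a shared counter 'k' times.
-- modifyMVar_ takes the lock, applies the update, and releases it, so no
-- update is lost and the result is always n * k, whatever the interleaving.
countWithLock :: Int -> Int -> IO Int
countWithLock n k = do
  counter <- newMVar (0 :: Int)
  let worker done = do
        mapM_ (\_ -> modifyMVar_ counter (\x -> return $! x + 1)) [1 .. k]
        putMVar done ()  -- signal that this worker has finished
  dones <- mapM (const newEmptyMVar) [1 .. n]
  mapM_ (forkIO . worker) dones
  mapM_ takeMVar dones  -- wait for every worker before reading the result
  readMVar counter
```

For example, `countWithLock 4 1000` returns 4000. Without the lock (say, an unsynchronised read followed by a write on an `IORef`), concurrent updates could be lost, which is exactly the kind of non-determinism this slide refers to.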

  14. Data parallelism
    Instead of unrestricted concurrency, let’s simplify
    The same operation is applied to different data

    abstracts over concurrency control

    abstracts over indeterminism

    great for developers and hardware
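
As a small illustration of "the same operation is applied to different data", the sketch below splits a list into chunks and maps the same pure function over each. It is plain Haskell with no parallel library assumed, and `chunksOf` and `chunkedMap` are hypothetical helper names. Nothing here actually runs in parallel; the point is that because `f` is pure, the chunks could be processed in any order, or on different cores, without changing the result.

```haskell
-- Split a list into consecutive chunks of at most 'size' elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf size xs
  | size <= 0 = error "chunksOf: size must be positive"
  | null xs   = []
  | otherwise = let (chunk, rest) = splitAt size xs
                in chunk : chunksOf size rest

-- Apply the same pure function to every chunk independently, then
-- concatenate. The result is identical to 'map f xs' for any chunk size,
-- so the order in which chunks are processed cannot affect it.
chunkedMap :: Int -> (a -> b) -> [a] -> [b]
chunkedMap size f = concatMap (map f) . chunksOf size
```

`chunkedMap 3 (* 2) [1 .. 10]` equals `map (* 2) [1 .. 10]`; the chunk size only affects how the work could be distributed, never the answer.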

  15. Energy efficiency

  16. Flat data-parallelism

  17. Nested data-parallelism

  18. Amorphous data-parallelism
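
The slides name these three forms without code. A rough illustration of the first two, under stated assumptions and with lists standing in for arrays: flat data-parallelism applies one operation across a single flat array, while nested data-parallelism applies an operation to each of several irregularly sized sub-arrays. A standard way to run nested programs on flat hardware is flattening: store all values in one flat array plus a segment descriptor of sub-array lengths, and use segmented operations. All names below are hypothetical, and amorphous data-parallelism is not shown.

```haskell
-- Flat data-parallelism: one operation over one flat array.
flatSquares :: [Int] -> [Int]
flatSquares = map (\x -> x * x)

-- Nested data-parallelism: an operation (here, a sum) applied to each of
-- several sub-arrays of different lengths.
nestedSums :: [[Int]] -> [Int]
nestedSums = map sum

-- Flattened representation: every value in one flat array, plus a
-- segment descriptor recording the length of each sub-array.
data Segmented = Segmented
  { segLengths :: [Int]
  , segValues  :: [Int]
  } deriving Show

flatten :: [[Int]] -> Segmented
flatten xss = Segmented (map length xss) (concat xss)

-- Segmented sum: cut the flat values at the segment boundaries and sum
-- each piece, reproducing 'nestedSums' on the flat representation.
segmentedSum :: Segmented -> [Int]
segmentedSum (Segmented lens vals) = go lens vals
  where
    go [] _ = []
    go (n : ns) vs = let (seg, rest) = splitAt n vs in sum seg : go ns rest
```

For `[[1, 2, 3], [], [4, 5]]`, both `nestedSums` and `segmentedSum . flatten` give `[6, 0, 9]`; the empty sub-array is handled simply by its zero length in the descriptor.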

  19. [Figure: languages placed by expressiveness of parallelism (flat, nested, amorphous) and expressiveness of computation (embedded, native): Repa, Futhark, Lift, Data-Parallel Haskell, Nessie, Accelerate]

  20. Accelerate
    [Diagram: Haskell/Accelerate program → reify and optimise Accelerate program → target code → compile and run on the CPU/GPU → copy result back to Haskell]
    An embedded language for data-parallel arrays
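
Because Accelerate is an embedded language, an Accelerate program builds a description of the computation, which is then reified, optimised, and compiled for the target. The toy sketch below illustrates only that idea in plain Haskell; the types and names (`ArrayExp`, `ScalarExp`, `dotpExp`, `runScalar`) are hypothetical and do not reflect Accelerate's actual internals, and the "back end" is a simple interpreter rather than a code generator.

```haskell
-- A toy, deeply embedded array language (hypothetical, not Accelerate's
-- real AST). Programs are ordinary values, so they can be inspected or
-- rewritten before anything is executed.
data BinOp = Add | Mul
  deriving (Show, Eq)

data ArrayExp
  = Use [Int]                        -- embed a Haskell list as an array
  | ZipWith BinOp ArrayExp ArrayExp  -- element-wise combination
  deriving Show

data ScalarExp = Fold BinOp Int ArrayExp  -- reduce an array to one value
  deriving Show

-- Building the program: this returns a description, not a number.
dotpExp :: [Int] -> [Int] -> ScalarExp
dotpExp xs ys = Fold Add 0 (ZipWith Mul (Use xs) (Use ys))

-- A trivial back end that interprets the description on the CPU.
evalOp :: BinOp -> Int -> Int -> Int
evalOp Add = (+)
evalOp Mul = (*)

runArray :: ArrayExp -> [Int]
runArray (Use xs)         = xs
runArray (ZipWith op a b) = zipWith (evalOp op) (runArray a) (runArray b)

runScalar :: ScalarExp -> Int
runScalar (Fold op z a) = foldr (evalOp op) z (runArray a)
```

`runScalar (dotpExp [1, 2, 3, 4] [5, 6, 7, 8])` evaluates to 70. Since `dotpExp` returns data rather than a number, the program can also be printed with `show`, or rewritten by an optimisation pass, before it is run.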

  21. Example: vector dot product
    dotp xs ys =

  22. Example: vector dot product
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
    [Figure: xs = [1, 2, 3, 4] and ys = [5, 6, 7, 8] combined element-wise with (*)]

  23. Example: vector dot product
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
    [Figure: the element-wise results are combined with (+), together with the initial value 0]

  24. Example: vector dot product
    import Prelude
    dotp :: Num a
         => [a] -> [a] -> a
    dotp xs ys = foldl (+) 0 (zipWith (*) xs ys)

  25. Example: vector dot product
    import Prelude hiding (foldl, zipWith)
    import Data.Vector.Unboxed
    dotp :: (Num a, Unbox a)
         => Vector a
         -> Vector a
         -> a
    dotp xs ys = foldl (+) 0 (zipWith (*) xs ys)

  26. Example: vector dot product
    import Data.Array.Accelerate
    dotp :: (Num a, Elt a)
         => Acc (Vector a)
         -> Acc (Vector a)
         -> Acc (Scalar a)
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
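
The list version folds one element at a time, from one end. A parallel reduction instead combines neighbouring elements pairwise in a tree, which gives the same answer only if the combining operator is associative. The sketch below is a sequential model of that tree shape in plain Haskell, with lists standing in for arrays; `treeFold` is a hypothetical name, it additionally assumes that `z` is an identity element of `op`, and it is not how Accelerate implements `fold`.

```haskell
-- A sequential model of a tree-shaped (parallel-style) reduction: combine
-- neighbouring pairs level by level until one value remains. Within a level,
-- every combination is independent and could run at the same time.
-- Assumes 'op' is associative and 'z' is an identity element for it.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _  z []  = z
treeFold _  _ [x] = x
treeFold op z xs  = treeFold op z (pairUp xs)
  where
    pairUp (a : b : rest) = op a b : pairUp rest
    pairUp rest           = rest  -- zero or one element left over

-- The dot product from the slides, reduced in tree order.
dotp :: Num a => [a] -> [a] -> a
dotp xs ys = treeFold (+) 0 (zipWith (*) xs ys)
```

`dotp [1, 2, 3, 4] [5, 6, 7, 8]` gives 70 with either shape, because `(+)` is associative. With a non-associative operator the shapes disagree: `treeFold (-) 0 [1, 2, 3, 4]` gives 0, while `foldl (-) 0 [1, 2, 3, 4]` gives -10.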

  27. Thinking in parallel
    Computers are good at operating on bulk data, not on single elements
    Restrictions can guide the programmer into writing an efficient parallel program
    Parallel programming and functional programming are a natural fit

  28. Part 2: Make it work

  29. @jasper.samoyed

  30. Show, don’t tell
    https://github.com/AccelerateHS/accelerate-examples

  31. LULESH
    Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics
    https://github.com/tmcdonell/lulesh-accelerate

  32. LULESH
    Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics

    Implementation      Lines of code   Runtime @ 64³ (s)
    C (OpenMP)          2400            64
    CUDA                3000            5.2
    Accelerate (CPU)    1200            38
    Accelerate (GPU)    ±1 (vs. CPU)    4.1

    Hardware: i7-6700K @ 3.4GHz / GTX 1080 Ti

  33. Salt marsh creek formation
    https://github.com/tmcdonell/spatial-ecology-accelerate

  34. Salt marsh creek formation
    [Figure: elapsed time (s, log scale from 1 to 1000) vs. grid size (512 to 8192) for Python+OpenCL and Accelerate on a GTX 1080 Ti]
    Accelerate: 7x faster, with 2x fewer lines of code

  35. Use real examples as a working laboratory

  36. Motivating examples
    validate and test your ideas

    improve performance

    what is good about this, what is bad about it?

    as a basis for future work
    Can we take what’s good as a seed and make it into
    something that is better?

  37. Part 3: What’s next?

  38. The tower of abstraction

  39. Sequence of abstraction
    Machine language
    Assembly language
    Fortran / C / C++
    C# / Haskell / JavaScript
    “I am working at a higher level; being smart, saving effort!”

  40. mean :: [Double] -> Double
    mean xs = sum xs / fromIntegral (length xs)
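
This `mean` reads naturally but hides its memory behaviour: `sum` and `length` each traverse `xs`, so the whole list must stay in memory between the two passes. A common remedy is a single strict pass that accumulates the count and the sum together. A minimal sketch, assuming GHC with the `BangPatterns` extension; `mean'` is a hypothetical name.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- The slide's version: two traversals, so the list is retained in memory
-- until 'length' has finished.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Single pass: carry the element count and the running sum together.
-- foldl' forces the accumulator pair at each step, and the bang patterns
-- force its components, so no chain of unevaluated thunks builds up.
mean' :: [Double] -> Double
mean' xs = total / fromIntegral count
  where
    (count, total) = foldl' step (0 :: Int, 0) xs
    step (!n, !s) x = (n + 1, s + x)
```

Both versions give 2.5 for `[1, 2, 3, 4]`; the difference is that `mean'` can consume a lazily generated list such as `[1 .. 1e7]` without retaining it.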

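  41. data T = MkT Int Int
    [Diagram: MkT points to two boxed values, I# Int# and I# Int#]
    data T = MkT !Int !Int
    [Diagram: MkT Int# Int#, with both fields stored unboxed inside the constructor when GHC unpacks the strict fields]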

  42. data Word8x4 = Word8x4 !Word8 !Word8 !Word8 !Word8
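    [Diagram: Word8x4 Word8# Word8# Word8# Word8#]
    size in memory = ?
      = header
      + 4 * 8 bytes (each Word8# field occupies a full 8-byte word)
    ∴ in Haskell: 4 * 8 = 256 (four 8-bit values occupy 256 bits rather than 32)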
    . . .
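
One way around this padding is to pack the four bytes into a single 32-bit word and extract them with shifts and masks, so the payload occupies one machine word instead of four. A minimal sketch using only `Data.Bits` and `Data.Word` from base; `Packed8x4`, `pack8x4`, and `unpack8x4` are hypothetical names. The trade-off is that every field access now costs a shift and a mask.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word32)

-- Hypothetical packed alternative to Word8x4: four bytes in one Word32,
-- with the first byte in the least significant position.
newtype Packed8x4 = Packed8x4 Word32
  deriving (Eq, Show)

pack8x4 :: Word8 -> Word8 -> Word8 -> Word8 -> Packed8x4
pack8x4 a b c d = Packed8x4 $
      fromIntegral a
  .|. (fromIntegral b `shiftL` 8)
  .|. (fromIntegral c `shiftL` 16)
  .|. (fromIntegral d `shiftL` 24)

unpack8x4 :: Packed8x4 -> (Word8, Word8, Word8, Word8)
unpack8x4 (Packed8x4 w) = (byte 0, byte 1, byte 2, byte 3)
  where
    -- shift byte 'i' down to the bottom and mask off everything above it
    byte i = fromIntegral ((w `shiftR` (8 * i)) .&. 0xFF)
```

`pack8x4 1 2 3 4` stores `0x04030201`, and `unpack8x4` recovers `(1, 2, 3, 4)`.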

  43. data T a = MkT a
    [Diagram: MkT a]
    what if I need to know whether ‘a’ has been computed?
    data T a = MkT (IORef (Maybe a))
    [Diagram: MkT → IORef, which holds Nothing until the value is computed, then Just a]

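The `IORef (Maybe a)` representation above can be fleshed out as below. This is a minimal single-threaded sketch: the helper names (`newT`, `isComputed`, `getOrCompute`) are hypothetical, and the computation is passed in explicitly because the slide's type does not store it. It also shows what is given up: evaluation is now managed by hand instead of by the runtime's laziness.

```haskell
import Data.IORef
import Data.Maybe (isJust)

-- The slide's representation: Nothing until computed, then Just the value.
data T a = MkT (IORef (Maybe a))

-- Create a cell whose value has not been computed yet.
newT :: IO (T a)
newT = MkT <$> newIORef Nothing

-- Has the value been computed yet?
isComputed :: T a -> IO Bool
isComputed (MkT ref) = isJust <$> readIORef ref

-- Return the cached value, running 'compute' only if there is none yet.
-- Not thread-safe: two threads could both see Nothing and both compute.
getOrCompute :: T a -> IO a -> IO a
getOrCompute (MkT ref) compute = do
  cached <- readIORef ref
  case cached of
    Just v  -> return v
    Nothing -> do
      v <- compute
      writeIORef ref (Just v)
      return v
```

Calling `getOrCompute` twice runs the computation once; `isComputed` reports `False` before the first call and `True` after. With several threads, the read-then-write in `getOrCompute` would need `atomicModifyIORef'` or an `MVar` to avoid running the computation twice.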

  44. Loss of capability
    Can no longer program in assembly

    Don’t know how values are stored in memory

    Don’t know what the CPU is doing
    The rhetoric is "I shouldn’t have to", but the flip side is the loss of ability to

  45. Functional high-performance computing
    [Figure: performance vs. effort, showing the expected, actual, and desired curves]

  46. Functional high-performance computing
    [Figure: performance vs. effort, showing the expected (now marked "?"), actual, and desired curves]

  47. Functional high-performance computing
    Real-world examples as a laboratory to develop:

    new features, test performance, …
    Working at a high level is good

    but this also entails a loss of capability
    Reality exists at the low level

  48. Functional high-performance computing
    Functional programming languages provide the right set of abstractions
    Instead of: climbing the tower of abstraction
    Better idea? Feet on the ground; reach for the heavens

  49. acceleratehs.org
    https://github.com/AccelerateHS/
    Trevor L. McDonell
    Robert Clifton-Everest
    Manuel M. T. Chakravarty
    Josh Meredith
    Gabriele Keller
    Ben Lippmeier

  50. Image attribution
    Logo designed by Tina Lam http://instagram.com/tinabarbarina
    https://en.wikipedia.org/wiki/Waterman_butterfly_projection
    https://www.instagram.com/p/BbTjiebnaw1
    http://book.realworldhaskell.org/read/profiling-and-optimization.html
    https://researchinprogress.tumblr.com/post/34088637501/fast-vs-exact-solutions
    https://researchinprogress.tumblr.com/post/34627563943/when-somebody-mixes-up-causality-and-correlation
    https://researchinprogress.tumblr.com/post/32886698944/how-is-your-research-useful
    https://en.wikipedia.org/wiki/Tower_of_Babel
    https://www.art.com/products/p46922818644-sa-i10543606/paul-souders-polar-bear-swimming-past-melting-iceberg-near-harbor-islands-canada.htm
    http://unseasonably.blogspot.com/2014/03/a-tangled-ball-of-yarn.html
    https://www.reddit.com/r/aww/comments/2oagj8/multithreaded_programming_theory_and_practice/
