Functional High-Performance Computing

Keynote talk at the inaugural workshop on Functional High-Performance and Numerical Computing 2019: https://icfp19.sigplan.org/home/FHPNC-2019

Video: TBD

Trevor L. McDonell

August 18, 2019

Transcript

  1. All modern processors have multiple cores: Ryzen 7 1800X (8 cores), 4.8B transistors; A12 Bionic (2+4 cores), 6.9B transistors; GTX 1080 (2560 cores*), 7.2B transistors.
  2. GPU (graphics processing unit)

    Application areas: medical imaging, data science, weather & climate, bioinformatics, computational chemistry, machine learning.
    GTX 1080 (2560 cores*). The programmer must manage software-programmable caches, data distribution, thread synchronisation, memory access patterns and control-flow divergence.
  3. [Figure: performance vs. effort, with curves labelled "expected", "actual" and "desired"]

    "After expressing available parallelism, I often find that the code has slowed down." (Jeff Larkin, NVIDIA Developer Technology, https://devblogs.nvidia.com/getting-started-openacc/)
  4. for (int i = 0; i < length; ++i) {
         // do something (in parallel)
     }
  5. Why is this difficult? Concurrency

    Multiple interleaved threads of control, all of which have effects on the world, lead to non-determinism and the need for concurrency control.
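A minimal sketch of what "concurrency control" means in practice, assuming GHC's `base` library (`Control.Concurrent`, `Control.Concurrent.MVar`); the helper `concurrentCount` is hypothetical and not from the talk. The order in which the threads run is non-deterministic, but because every update goes through `modifyMVar_`, the final count is not.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forM, forM_, replicateM_)

-- Hypothetical helper: start nThreads threads that each increment a shared
-- counter perThread times, wait for all of them, and return the final count.
concurrentCount :: Int -> Int -> IO Int
concurrentCount nThreads perThread = do
  counter <- newMVar (0 :: Int)
  dones <- forM [1 .. nThreads] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      -- modifyMVar_ takes the value, applies the update and puts it back
      -- atomically, so increments from different threads never overwrite
      -- each other, whatever order the scheduler picks.
      replicateM_ perThread (modifyMVar_ counter (\n -> return $! n + 1))
      putMVar done ()
    return done
  -- Block until every thread has signalled completion.
  forM_ dones takeMVar
  readMVar counter
```

For example, `concurrentCount 8 1000` should return 8000 on every run; replacing `modifyMVar_` with a separate read followed by a write would make lost updates possible.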
  6. Data parallelism

    Instead of unrestricted concurrency, let's simplify: the same operation is applied to different data. This abstracts over concurrency control and over indeterminism, which is great for both developers and hardware.
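The determinism argument can be sketched in plain Haskell using only the Prelude. The names `chunksOf` and `chunkedMap` are hypothetical, and this version runs sequentially; it only illustrates why a data-parallel runtime is free to split the work as it likes: because the per-element operation is pure, any division into chunks yields the same result as a single `map`.

```haskell
-- Split a list into chunks of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | null xs   = []
  | otherwise = let (c, rest) = splitAt n xs in c : chunksOf n rest

-- Apply the same operation to every chunk independently. Each chunk could be
-- handed to a different core in any order; since f is pure, the result does
-- not depend on how the work was divided.
chunkedMap :: Int -> (a -> b) -> [a] -> [b]
chunkedMap n f = concatMap (map f) . chunksOf n
```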
  7. [Figure: expressiveness of parallelism (flat, nested, amorphous) vs. expressiveness of computation (embedded, native). Systems shown: Repa, Futhark, Lift, Data-Parallel Haskell, Nessie, Accelerate.]
  8. Accelerate: an embedded language for data-parallel arrays

    Haskell/Accelerate program → reify and optimise Accelerate program → target code → compile and run on the CPU/GPU → copy result back to Haskell
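A toy sketch of the embedded-language pipeline, assuming nothing beyond the Prelude. It is not Accelerate's representation (which is typed and much richer); `Arr`, `optimise`, `run` and `traversals` are illustrative names. Building a program only constructs a syntax tree; a separate pass can rewrite it (here, fusing adjacent `Map`s) before it is executed (here, interpreted on the host rather than compiled for a CPU or GPU).

```haskell
-- A toy deep embedding: array programs are built as a syntax tree rather
-- than computed immediately. Not Accelerate's actual AST.
data Arr
  = Use [Double]                                 -- embed a Haskell list
  | Map (Double -> Double) Arr                   -- apply f to every element
  | ZipWith (Double -> Double -> Double) Arr Arr -- combine two arrays pointwise

-- Optimisation pass: fuse adjacent maps so the data is traversed once.
optimise :: Arr -> Arr
optimise (Map f (Map g a)) = optimise (Map (f . g) a)
optimise (Map f a)         = Map f (optimise a)
optimise (ZipWith f a b)   = ZipWith f (optimise a) (optimise b)
optimise a@(Use _)         = a

-- Stand-in for "compile, run and copy the result back": optimise the tree,
-- then interpret it directly on the host.
run :: Arr -> [Double]
run = go . optimise
  where
    go (Use xs)        = xs
    go (Map f a)       = map f (go a)
    go (ZipWith f a b) = zipWith f (go a) (go b)

-- Number of array traversals a program performs, to observe fusion.
traversals :: Arr -> Int
traversals (Use _)         = 0
traversals (Map _ a)       = 1 + traversals a
traversals (ZipWith _ a b) = 1 + traversals a + traversals b
```

For example, `Map (* 2) (Map (+ 1) (ZipWith (*) (Use [1, 2, 3]) (Use [4, 5, 6])))` runs to `[10, 22, 38]`, and optimisation reduces it from three traversals to two.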
  9. Example: vector dot product

    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)

    [Figure: zipWith (*) multiplies the elements of xs = 1, 2, 3, 4, … and ys = 5, 6, 7, 8, … pairwise]
  10. Example: vector dot product

    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)

    [Figure: fold (+) 0 sums the pairwise products, starting from 0]
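The reduction stage is where the parallelism in `fold` comes from. A sketch, assuming plain Haskell lists and the hypothetical names `treeFold` and `dotp`, combines neighbouring elements level by level, as a parallel reduction would, so the depth is about log₂ n rather than n. This reordering only gives the same answer as a sequential left fold when the operator is associative, which `(+)` is (for floating point, only up to rounding).

```haskell
-- Tree-shaped reduction: each level combines neighbouring pairs, and all
-- pairs within a level are independent, so they could run in parallel.
-- Agrees with foldl op z only when op is associative.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold op z = reduce
  where
    reduce []  = z
    reduce [x] = op z x
    reduce ys  = reduce (pairUp ys)

    -- One level: combine elements 0 and 1, 2 and 3, ... keeping their order.
    pairUp (a : b : rest) = op a b : pairUp rest
    pairUp rest           = rest

-- Dot product: pointwise products, then a tree-shaped sum.
dotp :: Num a => [a] -> [a] -> a
dotp xs ys = treeFold (+) 0 (zipWith (*) xs ys)
```

For instance, `dotp [1, 2, 3, 4] [5, 6, 7, 8]` is 70, but `treeFold (-) 0 [1, 2, 3, 4]` evaluates to 0 whereas `foldl (-) 0 [1, 2, 3, 4]` gives -10, because subtraction is not associative.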
  11. Example: vector dot product

    import Prelude

    dotp :: Num a => [a] -> [a] -> a
    dotp xs ys = foldl (+) 0 (zipWith (*) xs ys)
  12. Example: vector dot product

    import Prelude hiding (foldl, zipWith)
    import Data.Vector.Unboxed

    dotp :: (Num a, Unbox a) => Vector a -> Vector a -> a
    dotp xs ys = foldl (+) 0 (zipWith (*) xs ys)
  13. Example: vector dot product

    import Data.Array.Accelerate

    dotp :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
  14. Thinking in parallel

    Computers are good at operating on bulk data, not on single elements.
    Restrictions can guide the programmer into writing an efficient parallel program.
    Parallel programming and functional programming are a natural fit.
  15. LULESH (Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics)

    Implementation      Lines of code   Runtime @ 64³ (s)
    C (OpenMP)          2400            64
    CUDA                3000            5.2
    Accelerate (CPU)    1200            38
    Accelerate (GPU)    ±1 (vs. CPU)    4.1

    Hardware: i7-6700K @ 3.4GHz / GTX 1080 Ti
  16. Salt marsh creek formation

    [Figure: elapsed time (s, log scale, 1 to 1000) vs. grid size (512 to 8192), comparing Python+OpenCL with Accelerate on a GTX 1080 Ti]
    Accelerate: 7x faster, with 2x fewer lines of code.
  17. Motivating examples

    Use them to validate and test your ideas and to improve performance: what is good about this, and what is bad about it?
    Use them as a basis for future work: can we take what's good as a seed and make it into something that is better?
  18. Sequence of abstraction: machine language → assembly language → Fortran / C / C++ → C# / Haskell / JavaScript

    "I am working at a higher level; being smart, saving effort!"
  19. mean :: [Double] -> Double
      mean xs = sum xs / fromIntegral (length xs)
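The definition above traverses `xs` twice (once for `sum`, once for `length`), so the whole list must stay in memory between the two passes; the Real World Haskell profiling chapter linked in the image attributions appears to use this same function as its example. A single-pass sketch, assuming GHC with the `BangPatterns` extension and `foldl'` from `Data.List`, accumulates the sum and the count together and keeps both strict:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Single-pass mean: foldl' forces the accumulator pair at each step, and the
-- bang patterns force its components, so no chain of thunks builds up and
-- each list cell can be garbage-collected as soon as it has been consumed.
mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (!acc, !len) x = (acc + x, len + 1)
```

As with the original, the mean of an empty list is NaN (0/0).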
  20. data T = MkT Int Int
      [Figure: MkT points to two separately allocated boxes, I# Int# and I# Int#]

      data T = MkT !Int !Int
      [Figure: MkT stores Int# Int# directly]
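The two layouts can be written side by side with GHC's `UNPACK` pragma made explicit. This is a sketch; the type and function names are hypothetical. Unpacking typically only happens when compiling with optimisation (`-O`), and with optimisation on GHC already unpacks small strict fields such as `!Int` by default (`-funbox-small-strict-fields`), so the pragma mainly documents intent. Strictness changes behaviour as well as layout: a lazy field can hold an unevaluated error that is never forced.

```haskell
-- Lazy fields: each field is a pointer to a boxed Int (or to a thunk).
data Lazy = MkLazy Int Int

-- Strict, unpacked fields: the raw Int# values are stored inside the
-- constructor itself (when compiling with optimisation).
data Packed = MkPacked {-# UNPACK #-} !Int {-# UNPACK #-} !Int

sumLazy :: Lazy -> Int
sumLazy (MkLazy a b) = a + b

sumPacked :: Packed -> Int
sumPacked (MkPacked a b) = a + b

-- Matching on MkLazy forces only the constructor, not its fields.
secondLazy :: Lazy -> Int
secondLazy (MkLazy _ b) = b
```

Here `secondLazy (MkLazy undefined 7)` returns 7, whereas building `MkPacked undefined 7` would throw as soon as the constructor is evaluated.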
  21. data Word8x4 = Word8x4 !Word8 !Word8 !Word8 !Word8
      [Figure: Word8x4 stores Word8# Word8# Word8# Word8#]

      size in memory = ?
                     = header + 4 * 8 bytes (each field occupies a full 64-bit word)
      ∴ in Haskell: 4 * 8 = 256 . . . (four 8-bit values take 256 bits)
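If the memory cost matters, one low-level workaround is to pack the four bytes into a single `Word32` by hand. A sketch, assuming `Data.Bits` and `Data.Word` from `base`; `PackedWord8x4`, `pack4` and `unpack4` are hypothetical names. All four bytes then share one machine word instead of occupying four, at the cost of the convenience of pattern matching on separate fields.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32, Word8)

-- Four bytes packed into one 32-bit word (hypothetical type).
newtype PackedWord8x4 = PackedWord8x4 Word32
  deriving (Eq, Show)

-- Byte a goes in bits 0-7, b in bits 8-15, c in 16-23, d in 24-31.
pack4 :: Word8 -> Word8 -> Word8 -> Word8 -> PackedWord8x4
pack4 a b c d = PackedWord8x4 (byte a 0 .|. byte b 8 .|. byte c 16 .|. byte d 24)
  where
    byte w s = fromIntegral w `shiftL` s

-- Recover the four bytes by shifting and masking.
unpack4 :: PackedWord8x4 -> (Word8, Word8, Word8, Word8)
unpack4 (PackedWord8x4 w) = (at 0, at 8, at 16, at 24)
  where
    at s = fromIntegral ((w `shiftR` s) .&. 0xff)
```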
  22. data T a = MkT a
      [Figure: MkT points to a]

      What if I need to know whether 'a' has been computed?

      data T a = MkT (IORef (Maybe a))
      [Figure: MkT points to an IORef holding either Nothing or Just a]
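A sketch of the second design, extended with the suspended computation itself so that the cell can fill itself in. It assumes `Data.IORef` and `Data.Maybe` from `base`; `Memo`, `newMemo`, `isComputed` and `force` are hypothetical names, and the cell is not thread-safe (two threads could both run the computation). Evaluation becomes explicit in `IO`, which is exactly the capability a plain lazy field does not offer.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Maybe (isJust)

-- A suspended computation plus a mutable slot recording its result,
-- following the slide's MkT (IORef (Maybe a)) idea.
data Memo a = Memo (IO a) (IORef (Maybe a))

newMemo :: IO a -> IO (Memo a)
newMemo compute = Memo compute <$> newIORef Nothing

-- Has the value been computed yet?
isComputed :: Memo a -> IO Bool
isComputed (Memo _ ref) = isJust <$> readIORef ref

-- Return the cached value, running the computation only on first use.
force :: Memo a -> IO a
force (Memo compute ref) = do
  cached <- readIORef ref
  case cached of
    Just v  -> return v
    Nothing -> do
      v <- compute
      writeIORef ref (Just v)
      return v
```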
  23. Loss of capability

    Can no longer program in assembly.
    Don't know how values are stored in memory.
    Don't know what the CPU is doing.
    The rhetoric is "I shouldn't have to", but the flip side is the loss of the ability to do so.
  24. Functional high-performance computing

    Real-world examples as a laboratory to develop new features, test performance, …
    Working at a high level is good, but it also entails a loss of capability. Reality exists at the low level.
  25. Functional high-performance computing

    Functional programming languages provide the right set of abstractions.
    Instead of climbing the tower of abstraction, a better idea? Keep your feet on the ground, and reach for the heavens.
  26. Image attribution

    Logo designed by Tina Lam (http://instagram.com/tinabarbarina)
    https://en.wikipedia.org/wiki/Waterman_butterfly_projection
    https://www.instagram.com/p/BbTjiebnaw1
    http://book.realworldhaskell.org/read/profiling-and-optimization.html
    https://researchinprogress.tumblr.com/post/34088637501/fast-vs-exact-solutions
    https://researchinprogress.tumblr.com/post/34627563943/when-somebody-mixes-up-causality-and-correlation
    https://researchinprogress.tumblr.com/post/32886698944/how-is-your-research-useful
    https://en.wikipedia.org/wiki/Tower_of_Babel
    https://www.art.com/products/p46922818644-sa-i10543606/paul-souders-polar-bear-swimming-past-melting-iceberg-near-harbor-islands-canada.htm
    http://unseasonably.blogspot.com/2014/03/a-tangled-ball-of-yarn.html
    https://www.reddit.com/r/aww/comments/2oagj8/multithreaded_programming_theory_and_practice/