Functional High-Performance Computing
Trevor L. McDonell
Utrecht University
AccelerateHS
acceleratehs.org
Slide 2
Slide 2 text
All modern processors have multiple cores
Ryzen 7 1800X (8 core)
4.8B transistors
A12 Bionic (2+4 core)
6.9B transistors
GTX 1080 (2560 core*)
7.2B transistors
Slide 3
Slide 3 text
GPU
(graphics processing unit)
medical imaging
data science
weather & climate
bioinformatics
computational chemistry
machine learning
GTX 1080 (2560 cores*)
software programmable caches
data distribution
thread synchronisation
memory access patterns
control flow divergence
Slide 4
Slide 4 text
Performance
Effort
Slide 5
Slide 5 text
Performance
Effort
expected
Slide 6
Slide 6 text
Performance
Effort
expected
actual
Slide 7
Slide 7 text
Performance
Effort
expected
actual
desired
Slide 8
Slide 8 text
Performance
Effort
expected
actual
desired
After expressing available parallelism, I often find
that the code has slowed down.
— Jeff Larkin, NVIDIA Developer Technology
https://devblogs.nvidia.com/getting-started-openacc/
Slide 9
Slide 9 text
Can we have
parallel programming
with
less effort?
Slide 10
Slide 10 text
Part 1: Thinking in parallel
Slide 11
Slide 11 text
for (int i = 0; i < length; ++i)
{
// do something (in parallel)
}
Slide 12
Slide 12 text
Theory Practice
Slide 13
Slide 13 text
Why is this difficult?
Concurrency
Multiple interleaved threads of control
All threads have effects on the world
Non-determinism and concurrency control
Slide 14
Slide 14 text
Data parallelism
Instead of unrestricted concurrency, let’s simplify
The same operation is applied to different data
abstracts over concurrency control
abstracts over non-determinism
great for developers and hardware
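As a rough illustration of how a data-parallel operation can hide concurrency control, the sketch below applies one pure function to every element of a list and spreads the work over a few threads. It uses only base. The names chunks and parMapChunks are hypothetical, made up for this sketch, and are not part of Accelerate or of any library named in the talk.

```haskell
import Control.Concurrent      (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception       (evaluate)

-- Split a list into pieces of at most n elements (n must be positive).
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = let (h, t) = splitAt n xs in h : chunks n t

-- Apply the same function to every element, one thread per chunk.
-- The caller never sees a thread, a lock, or an ordering decision.
parMapChunks :: Int -> (a -> b) -> [a] -> IO [b]
parMapChunks n f xs = do
  vars <- mapM spawn (chunks n xs)
  concat <$> mapM takeMVar vars     -- collect results in input order
  where
    spawn chunk = do
      var <- newEmptyMVar
      _ <- forkIO $ do
        let ys = map f chunk
        mapM_ evaluate ys           -- force each result (to WHNF) in the worker
        putMVar var ys
      return var
```

Because the function is pure and the results are reassembled in input order, the answer is the same as map f xs however the threads interleave. The sketch assumes f does not throw (a failing worker would leave its MVar empty), and the threads only run on several cores when the program is built with -threaded and started with +RTS -N.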
Slide 15
Slide 15 text
Energy efficiency
Slide 16
Slide 16 text
Flat data-parallelism
Slide 17
Slide 17 text
Nested data-parallelism
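The slides give only the titles here. As a hedged illustration of the usual distinction, the sketch below uses plain lists: a flat operation does the same amount of work per element, while a nested one has parallel work of varying size inside each element. Sparse matrix-vector multiplication is a commonly used example of the nested case. The names scale, SparseRow and smvm are chosen for this sketch only.

```haskell
-- Flat: one operation over a regular collection; every element costs the same.
scale :: Double -> [Double] -> [Double]
scale k = map (k *)

-- A sparse row stores only its non-zero entries as (column, value) pairs.
type SparseRow = [(Int, Double)]

-- Nested: the outer traversal runs over rows, and each row contains its own
-- data-parallel reduction whose size differs from row to row.
smvm :: [SparseRow] -> [Double] -> [Double]
smvm rows v = [ sum [ x * (v !! i) | (i, x) <- row ] | row <- rows ]
```

Lists are used only to keep the sketch self-contained. The point is the shape of the computation, not its performance.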
Slide 18
Slide 18 text
Amorphous data-parallelism
Slide 19
Slide 19 text
[Chart: expressiveness of parallelism (flat, nested, amorphous) against expressiveness of computation (embedded, native)]
Languages shown: Repa, Futhark, Lift, Data-Parallel Haskell, Nessie, Accelerate
Slide 20
Slide 20 text
Accelerate
An embedded language for data-parallel arrays
Haskell/Accelerate program
Reify and optimise Accelerate program
Target code
Compile and run on the CPU/GPU
Copy result back to Haskell
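To make "embedded language" and "reify and optimise" concrete, here is a minimal sketch of a deep embedding in Haskell. It is not Accelerate's implementation. Expr, simplify and eval are hypothetical names for a toy arithmetic language, and the interpreter stands in for the step that generates and runs target code.

```haskell
-- A deliberately tiny deep embedding: arithmetic on Expr builds a syntax
-- tree instead of computing a number. That tree is the reified program.
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

instance Num Expr where
  fromInteger = Lit . fromInteger
  (+)         = Add
  (*)         = Mul
  negate e    = Mul (Lit (-1)) e
  abs         = error "abs: not supported in this sketch"
  signum      = error "signum: not supported in this sketch"

-- "Optimise": one bottom-up pass of algebraic simplification.
simplify :: Expr -> Expr
simplify (Add a b) = case (simplify a, simplify b) of
  (Lit 0, e)     -> e
  (e, Lit 0)     -> e
  (Lit x, Lit y) -> Lit (x + y)
  (a', b')       -> Add a' b'
simplify (Mul a b) = case (simplify a, simplify b) of
  (Lit 1, e)     -> e
  (e, Lit 1)     -> e
  (Lit x, Lit y) -> Lit (x * y)
  (a', b')       -> Mul a' b'
simplify e = e

-- "Run": a reference interpreter standing in for generated target code.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b
```

Writing (2 + 0) * 1 + 3 * 4 at type Expr looks like ordinary Haskell arithmetic, yet it produces a tree that simplify can rewrite before anything is evaluated. The embedded program is ordinary host-language data that can be inspected and optimised before it runs.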
Example: vector dot product
import Prelude
dotp :: Num a
     => [a] -> [a] -> a
dotp xs ys = foldl (+) 0 (zipWith (*) xs ys)
Slide 25
Slide 25 text
Example: vector dot product
import Prelude hiding (foldl, zipWith)
import Data.Vector.Unboxed
dotp :: (Num a, Unbox a)
     => Vector a
     -> Vector a
     -> a
dotp xs ys = foldl (+) 0 (zipWith (*) xs ys)
Slide 26
Slide 26 text
Example: vector dot product
import Data.Array.Accelerate
dotp :: (Num a, Elt a)
     => Acc (Vector a)
     -> Acc (Vector a)
     -> Acc (Scalar a)
dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
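A sketch of how the Accelerate version might be run, assuming the accelerate package is installed. It uses the reference interpreter backend that ships with that package. The input vectors and the name example are made up for illustration, and the import is qualified as A so that fold and zipWith do not clash with the Prelude.

```haskell
import Data.Array.Accelerate             as A
import Data.Array.Accelerate.Interpreter (run)   -- reference backend

dotp :: (A.Num a, Elt a)
     => Acc (Vector a)
     -> Acc (Vector a)
     -> Acc (Scalar a)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

-- Embed two Haskell-side arrays with 'use', run the computation,
-- and get an ordinary Haskell array back.
example :: Scalar Float
example = run (dotp (use xs) (use ys))
  where
    xs = fromList (Z :. 5) [1 ..] :: Vector Float   -- illustrative input
    ys = fromList (Z :. 5) [1 ..] :: Vector Float   -- illustrative input
```

The packages accelerate-llvm-native and accelerate-llvm-ptx provide a run function of the same shape in Data.Array.Accelerate.LLVM.Native and Data.Array.Accelerate.LLVM.PTX, so switching between the interpreter, the multicore CPU and the GPU should mostly be a matter of changing that one import.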
Slide 27
Slide 27 text
Computers are good at operating on bulk data,
not on single elements
Restrictions can guide the programmer into
writing an efficient parallel program
Parallel programming and functional programming
are a natural fit
Thinking in parallel
LULESH (Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics)
Implementation      Lines of code   Runtime @ 64^3 (s)
C (OpenMP)          2400            64
CUDA                3000            5.2
Accelerate (CPU)    1200            38
Accelerate (GPU)    ±1              4.1
Hardware: i7-6700K @ 3.4GHz / GTX 1080 Ti
Slide 33
Slide 33 text
Salt marsh creek formation
https://github.com/tmcdonell/spatial-ecology-accelerate
Slide 34
Slide 34 text
Salt marsh creek formation
[Plot: elapsed time (s, log scale from 1 to 1000) against grid size (512, 1024, 1536, 2048, 3072, 4096, 6144, 8192), Python+OpenCL vs. Accelerate, on a GTX 1080 Ti]
7x faster
2x fewer lines of code
Slide 35
Slide 35 text
use
real examples
as a
working laboratory
Slide 36
Slide 36 text
Motivating examples
validate and test your ideas
improve performance
what is good about this, what is bad about it?
as a basis for future work
Can we take what’s good as a seed and make it into
something that is better?
Slide 37
Slide 37 text
Part 3: What’s next?
Slide 38
Slide 38 text
The tower of abstraction
Slide 39
Slide 39 text
Sequence of abstraction
Machine language
Assembly language
Fortran / C / C++
C# / Haskell / Javascript
“I am working at a higher level;
being smart, saving effort!”
Slide 40
Slide 40 text
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)
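The slide leaves the problem with this definition unstated. One well-known issue is that it walks the list twice, once for sum and once for length, so the whole list has to stay in memory between the two passes. A minimal single-pass sketch follows. SumLen and meanOnePass are names chosen for this illustration.

```haskell
import Data.List (foldl')

-- Running sum and element count; the strict fields stop thunks piling up.
data SumLen = SumLen !Double !Int

-- Single-pass mean: the list is consumed once, so the part already
-- visited can be garbage collected as the fold proceeds.
meanOnePass :: [Double] -> Double
meanOnePass xs = s / fromIntegral n
  where
    SumLen s n          = foldl' step (SumLen 0 0) xs
    step (SumLen a k) x = SumLen (a + x) (k + 1)
```

Like the original, this returns NaN for an empty list. Whether the two-line version is actually a problem depends on how the list is produced and on what the optimiser does. Nothing in the high-level source makes that cost visible.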
Slide 41
Slide 41 text
data T = MkT Int Int
MkT
I# Int# I# Int#
MkT Int# Int#
data T = MkT !Int !Int
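The difference between the lazy and the strict declaration can be observed from ordinary code. In this sketch MkLazy, MkStrict and forces are hypothetical names; the two types mirror the two declarations of T above.

```haskell
import Control.Exception (SomeException, catch, evaluate)

data Lazy   = MkLazy   Int  Int    -- fields may hold unevaluated thunks
data Strict = MkStrict !Int !Int   -- fields are evaluated on construction

-- Does evaluating a value to weak head normal form succeed?
forces :: a -> IO Bool
forces x = (evaluate x >> return True) `catch` handler
  where
    handler :: SomeException -> IO Bool
    handler _ = return False
```

Here forces (MkLazy undefined 1) yields True, because the constructor just stores a pointer to the unevaluated field, while forces (MkStrict undefined 1) yields False, because building the value evaluates the field. When compiling with optimisation, GHC will typically also unpack small strict fields such as these, which gives the layout drawn as MkT Int# Int#.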
data T a = MkT a
MkT a
what if I need to know whether ‘a’ has been computed?
data T a = MkT (IORef (Maybe a))
IORef Nothing
Just a
MkT
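A sketch of how the IORef-based representation could be used to answer the question on the slide. The helper names newT, isComputed and demand are hypothetical, and the sketch is single-threaded only: two threads demanding the value at once could both run the computation.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Maybe (isJust)

-- The slide's representation: an explicit, inspectable "computed yet?" cell.
data T a = MkT (IORef (Maybe a))

-- A fresh cell that has not been computed.
newT :: IO (T a)
newT = MkT <$> newIORef Nothing

-- The question that an ordinary lazy field cannot answer.
isComputed :: T a -> IO Bool
isComputed (MkT ref) = isJust <$> readIORef ref

-- Return the cached value, running the computation only on first demand.
demand :: IO a -> T a -> IO a
demand compute (MkT ref) = do
  cached <- readIORef ref
  case cached of
    Just x  -> return x
    Nothing -> do
      x <- compute
      writeIORef ref (Just x)
      return x
```

This is laziness rebuilt by hand in IO. The capability comes back, but so does the bookkeeping the abstraction was meant to hide.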
Slide 44
Slide 44 text
Loss of capability
Can no longer program in assembly
Don’t know how values are stored in memory
Don’t know what the CPU is doing
The rhetoric is “I shouldn’t have to”
but the flip side is the loss of ability to
Slide 45
Slide 45 text
Functional high-performance computing
Performance
Effort
expected
actual
desired
Slide 46
Slide 46 text
Functional high-performance computing
Performance
Effort
? expected
actual
desired
Slide 47
Slide 47 text
Functional high-performance computing
Real-world examples as a laboratory to develop:
new features, test performance, …
Working at a high level is good
but this also entails a loss of capability
Reality exists at the low level
Slide 48
Slide 48 text
Functional high-performance computing
Functional programming languages provide
the right set of abstractions
Instead of:
climbing the tower of abstraction
Better idea?:
feet on the ground; reach for the heavens
Slide 49
Slide 49 text
acceleratehs.org
https://github.com/AccelerateHS/
Trevor L. McDonell
Robert Clifton-Everest
Manuel M. T. Chakravarty
Josh Meredith
Gabriele Keller
Ben Lippmeier
Slide 50
Slide 50 text
Image attribution
Logo designed by Tina Lam http://instagram.com/tinabarbarina
https://en.wikipedia.org/wiki/Waterman_butterfly_projection
https://www.instagram.com/p/BbTjiebnaw1
http://book.realworldhaskell.org/read/profiling-and-optimization.html
https://researchinprogress.tumblr.com/post/34088637501/fast-vs-exact-solutions
https://researchinprogress.tumblr.com/post/34627563943/when-somebody-mixes-up-causality-and-correlation
https://researchinprogress.tumblr.com/post/32886698944/how-is-your-research-useful
https://en.wikipedia.org/wiki/Tower_of_Babel
https://www.art.com/products/p46922818644-sa-i10543606/paul-souders-polar-bear-swimming-past-melting-iceberg-near-harbor-islands-canada.htm
http://unseasonably.blogspot.com/2014/03/a-tangled-ball-of-yarn.html
https://www.reddit.com/r/aww/comments/2oagj8/multithreaded_programming_theory_and_practice/