
A Functional Programming language for GPUs


Presented at the 5th NIRICT Workshop on GPU Computing Research in the Netherlands
http://fmttools.ewi.utwente.nl/NIRICT_GPGPU/events.html

Graphics processing units (GPUs), while primarily designed for the efficient rendering of computer graphics, are increasingly seeing their highly parallel architectures used to tackle demanding computational problems in many non-graphics domains. However, GPU applications typically need to be programmed at a very low level, and the specialised hardware requires expert knowledge to be used effectively. These barriers make it difficult for domain scientists to leverage GPUs in their applications without first becoming GPU programming experts.

This talk discusses our work on the programming language _Accelerate_, in which computations are expressed in a high-level functional style yet compile down to efficient low-level GPU code. While high-level programming abstractions are typically viewed as a barrier to high-performance code, used correctly they can instead be leveraged to guide the user towards efficient parallel implementations of their programs.

Trevor L. McDonell

December 04, 2018



Transcript

  1. A Functional Programming Language for GPUs
    Trevor L. McDonell
    Utrecht University
    AccelerateHS
    acceleratehs.org


  2. https://xkcd.com/378/


  3. (image slide)

  4. (image slide)

  5. GPUs
    software programmable caches
    data distribution
    thread synchronisation
    weak memory model
    memory access patterns
    control flow divergence
    shared-state concurrency

  6. Concrete
    λ
    Abstract
    Compositional Entangled


  7. λ Concrete
    Abstract
    Compositional Entangled


  8. λ
    Polymorphism & generics
    Strictly isolating side-effects
    Higher-order functions & closures
    Expressive type system & inference
    Strong static typing
    Garbage collection
    Boxed values
    ?
    Memory access patterns
    Software programmable caches
    Thread coordination
    Data distribution

  9. Can we have efficient parallel code from a high-level language?

  10. (graph: Performance vs Effort)

  11. (graph: Performance vs Effort, showing the expected curve)

  12. (graph: Performance vs Effort, showing the expected and actual curves)

  13. (graph: Performance vs Effort, showing the expected, actual, and desired curves)

  14. (graph: Performance vs Effort, showing the expected, actual, and desired curves)

  15. How about embedded languages with specialised code generation?

  16. Accelerate
    An embedded language for data-parallel arrays
    Pipeline: Haskell/Accelerate program -> Reify and optimise Accelerate program -> Target code -> Compile and run on the CPU/GPU -> Copy result back to Haskell
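For concreteness, here is a minimal sketch of this pipeline from the Haskell side. It assumes the accelerate package plus the accelerate-llvm-native backend (the PTX backend targets the GPU in the same way); the example itself is illustrative and not taken from the talk.

    import Data.Array.Accelerate                       as A
    import qualified Data.Array.Accelerate.LLVM.Native as CPU

    -- 'use' embeds a plain Haskell array into the embedded language; 'run'
    -- reifies and optimises the Accelerate program, compiles it, executes
    -- it, and copies the result back to Haskell.
    doubled :: Vector Float
    doubled = CPU.run $ A.map (* 2) (use xs)
      where
        xs = fromList (Z :. 10) [0 .. 9] :: Vector Float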

  17. Example: dot product
    dotp xs ys =

  18. Example: dot product
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
    (diagram: zipWith (*) pairs the elements of the two input vectors, shown as 1 2 3 4 and 5 6 7 8, and multiplies them pointwise)

  19. Example: dot product
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
    (diagram: fold (+) 0 sums the intermediate values 6 8 10 12 … together with the initial value 0)

  20. import Prelude
    dotp :: Num a
    => [a] -> [a] -> a
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)


  21. import Data.Vector.Unboxed
    dotp :: (Num a, Unbox a)
    => Vector a
    -> Vector a
    -> a
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)

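For reference, runnable forms of these two host-language versions. This is an editorial sketch: neither Prelude nor Data.Vector.Unboxed exports a fold of the shape shown on the slides, so foldr and a strict foldl' stand in for it.

    import qualified Data.Vector.Unboxed as U

    -- Plain list version: foldr plays the role of 'fold'.
    dotpList :: Num a => [a] -> [a] -> a
    dotpList xs ys = foldr (+) 0 (zipWith (*) xs ys)

    -- Unboxed vector version: a strict left fold over the zipped products.
    dotpVector :: (Num a, U.Unbox a) => U.Vector a -> U.Vector a -> a
    dotpVector xs ys = U.foldl' (+) 0 (U.zipWith (*) xs ys)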

  22. import Data.Array.Accelerate
    dotp :: (Num a, Elt a)
    => Acc (Vector a)
    -> Acc (Vector a)
    -> Acc (Scalar a)
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)


  23. Accelerate
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
    xs, ys :: Acc (Vector Float)    (embedded language arrays)
    fold, zipWith: collective operations from the Accelerate library which compile to parallel code
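As a usage sketch, assuming the dotp of the previous slides and again the accelerate-llvm-native backend: the input vectors are embedded with use, and the whole collective computation is executed in one step.

    import Data.Array.Accelerate                       as A
    import qualified Data.Array.Accelerate.LLVM.Native as CPU

    -- Embed two host vectors and run the dot product; the result is a
    -- one-element (Scalar) array copied back to Haskell.
    example :: Scalar Float
    example = CPU.run $ dotp (use xs) (use ys)
      where
        xs = fromList (Z :. 4) [1, 2, 3, 4] :: Vector Float
        ys = fromList (Z :. 4) [5, 6, 7, 8] :: Vector Float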

  24. Accelerate
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
    Collective operations which compile to parallel code:
    fold :: (Shape sh, Elt e)
         => (Exp e -> Exp e -> Exp e)
         -> Exp e
         -> Acc (Array (sh:.Int) e)
         -> Acc (Array sh e)
    Exp: language of sequential, scalar expressions
    Acc: language of collective, parallel operations
    rank-polymorphic (the shape sh)
    To enforce hardware restrictions, nested parallel computation can't be expressed (almost)
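To illustrate the rank-polymorphic shape variable sh: fold always reduces along the innermost (rightmost) dimension, so the same combining function works at every rank. A small sketch with invented names:

    import Data.Array.Accelerate as A

    -- Folding a vector yields a scalar; folding a matrix along its
    -- innermost dimension yields a vector of row sums; and so on.
    total :: Acc (Vector Float) -> Acc (Scalar Float)
    total = A.fold (+) 0

    rowSums :: Acc (Matrix Float) -> Acc (Vector Float)
    rowSums = A.fold (+) 0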

  25. Array fusion
    dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
    Combines successive element-wise operations (a.k.a. loop fusion):
    Skeleton #1 and Skeleton #2, with an intermediate array between them, become a single combined operation
    benchmarking (single thread):
      vector      14.5 ms
      accelerate   4.8 ms
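A rough sequential sketch of what fusion buys here (illustrative only; the real backends generate parallel LLVM and PTX code): the intermediate array produced by zipWith is never materialised, and each pair of elements is multiplied and accumulated in a single pass.

    {-# LANGUAGE BangPatterns #-}
    import qualified Data.Vector.Unboxed as U

    -- Hand-fused form of fold (+) 0 (zipWith (*) xs ys): one loop, no
    -- intermediate vector of products.
    dotpFused :: U.Vector Float -> U.Vector Float -> Float
    dotpFused xs ys = go 0 0
      where
        n = min (U.length xs) (U.length ys)
        go !i !acc
          | i >= n    = acc
          | otherwise = go (i + 1) (acc + U.unsafeIndex xs i * U.unsafeIndex ys i)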

  26. LULESH
    Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics


  27. LULESH
    (figure: variables on a staggered mesh; thermodynamic variables are represented at element centres, kinematic variables at nodes)
    In a parallel world, imperative is the wrong default: concurrent writes!

  28. LULESH
    Immutable arrays guide us to a more natural parallel solution: node-centric computation
    (figure: the same staggered mesh; thermodynamic variables at element centres, kinematic variables at nodes)
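A toy sketch of the node-centric, gather-style formulation this suggests, in one dimension rather than on the real LULESH mesh (the function and its neighbour rule are invented for illustration): with n elements and n+1 nodes, each node reads the values of its adjacent elements, so no two parallel threads ever write to the same location.

    import Data.Array.Accelerate as A

    -- Each node gathers from its (at most two) neighbouring elements,
    -- instead of each element scattering into, and racing on, the nodes
    -- it touches.
    nodeSums :: Acc (Vector Float) -> Acc (Vector Float)
    nodeSums elems =
      let n = A.size elems
      in  generate (index1 (n + 1)) $ \ix ->
            let i     = unindex1 ix
                left  = i A.> 0 A.? (elems A.! index1 (i - 1), 0)
                right = i A.< n A.? (elems A.! index1 i,       0)
            in  left + right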

  29. LULESH
    Accelerate: high-level language, low-level performance
    (plot: speedup vs. reference @ 1 thread, over 1 to 12 threads, for Accelerate and OpenMP)
    i7-6700K @ 3.4GHz / GTX 1080Ti

  30. LULESH
    Accelerate: high-level language, low-level performance
                          Lines of Code    Runtime @ 64³ (s)
    C (OpenMP)                     2400                   64
    CUDA                           3000                  5.2
    Accelerate (CPU)               1200                   38
    Accelerate (GPU)                 +0                  4.1
    i7-6700K @ 3.4GHz / GTX 1080Ti

  31. Summary
    Abstraction also means that the compiler has more information,
    so we can leverage these abstractions
    to help guide program design
    and generate efficient parallel code

  32. acceleratehs.org
    https://github.com/AccelerateHS/
    Trevor L. McDonell
    Robert Clifton-Everest
    Manuel M. T. Chakravarty
    Josh Meredith
    Gabriele Keller
    Ben Lippmeier


  33. Image attribution
    https://flic.kr/p/XcAjn3
    https://xkcd.com/378
    https://commons.wikimedia.org/wiki/File:Motorola_6800_Assembly_Language.png
    https://commons.wikimedia.org/wiki/File:FortranCardPROJ039.agr.jpg
    https://commons.wikimedia.org/wiki/File:Set_square_Geodreieck.svg
