Slide 1

Slide 1 text

Using Functional Directives to Analyze Code Complexity and Communication
Daniel Rubio Bonilla
HLRS, University of Stuttgart
TMPA 2017

Slide 2

Slide 2 text

CPU Evolution

Slide 3

Slide 3 text

Hazel Hen
CPU:            E5-2680 v3, 12 cores, 30 MiB cache, 2.5 GHz
Node:           2 CPUs (24 cores), 128 GB
Compute nodes:  7712
Total cores:    185,088
Performance:    7420 TFlops
Storage:        ~10 PB
Weight:         61.5 t
Power:          3200 kW

Slide 4

Slide 4 text

Amdahl's Law
[Plot: speedup versus number of processing units, with curves for 100%, 98%, 90%, and 50% parallelizable code]
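
The curves on this slide follow the standard form of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of processing units. The sketch below tabulates that formula for the four fractions shown; the function names `amdahl` and `limitSpeedup` and the sample unit counts are illustrative choices, not taken from the talk.

```haskell
-- Amdahl's law: speedup on n processing units when a fraction p of the
-- work is parallelizable (0 <= p <= 1).
amdahl :: Double -> Int -> Double
amdahl p n = 1 / ((1 - p) + p / fromIntegral n)

-- Upper bound on the speedup as n grows without limit (requires p < 1).
limitSpeedup :: Double -> Double
limitSpeedup p = 1 / (1 - p)

main :: IO ()
main = mapM_ report [1.0, 0.98, 0.90, 0.50]
  where
    -- Print the speedup for a few illustrative unit counts.
    report p =
      putStrLn (show p ++ ": " ++ show [amdahl p n | n <- [1, 16, 1024]])
```

Even at 98% parallelizable code the speedup can never exceed 1 / (1 - 0.98) = 50, regardless of how many units are added.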

Slide 5

Slide 5 text

Real Amdahl's Law
[Plot: speedup versus number of processing units, with curves for 100%, 98%, 90%, and 50% parallelizable code]

Slide 6

Slide 6 text

The Problem
In High Performance Computing…
• Performance is increased by
  • Integrating more cores (millions!?)
  • Using heterogeneous accelerators (GPU, FPGA, ...)
• Issues
  • Programmability
  • Portability

Slide 7

Slide 7 text

New Approaches
Different Programming Model
• Focused on mathematical problems
  • Engineering
  • Science
• To enable:
  • Parallelization and concurrency
  • Portability across different hardware and accelerators

Slide 8

Slide 8 text

Our Approach
To obtain the structural information of the application by annotating the imperative code with functional-like directives (mathematical / algorithmic structure)
• The main difficulties in this approach are:
  • “deriving” the structure of the application
  • matching the structure to the source code

Slide 9

Slide 9 text

Higher Order Functions
• Higher Order Functions are mathematical functions that
  • Take one or more functions as arguments
  • Can return a function as a result
• Clear repetitive execution structure
• These structures can be transformed to equivalent ones
  • But with different non-functional properties

Slide 10

Slide 10 text

Higher Order Functions
• Apply to all:
    map :: (a -> b) -> [a] -> [b]
    map (*2) [1,2,3,4] = [2,4,6,8]

Slide 11

Slide 11 text

Higher Order Functions
• Apply to all:
    map :: (a -> b) -> [a] -> [b]
    map (*2) [1,2,3,4] = [2,4,6,8]
• Reduction:
    foldl :: (a -> b -> a) -> a -> [b] -> a
    foldl (+) 0 [1,2,3,4] = 10

Slide 12

Slide 12 text

Other Higher Order Functions

Slide 13

Slide 13 text

Higher Order Functions Transformations
    total = foldl (+) 0
[Figure: the original reduction vs. one possible transformation]
• Only if the operation is associative and we know its neutral element
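
One way to picture such a transformation is a chunked reduction: reduce each chunk independently (which could run in parallel), then reduce the partial results. The sketch below is only an illustration of that idea; `chunksOf` and `reduceChunked` are hypothetical helpers written here, not part of the talk or of POLCA.

```haskell
-- Split a list into consecutive chunks of at most k elements (k must be > 0).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

-- Sequential reduction, as written on the slide.
total :: [Integer] -> Integer
total = foldl (+) 0

-- Transformed reduction: fold every chunk on its own, then fold the
-- partial results. It equals `foldl op e` only when `op` is associative
-- and `e` is its neutral element.
reduceChunked :: (a -> a -> a) -> a -> Int -> [a] -> a
reduceChunked op e k = foldl op e . map (foldl op e) . chunksOf k

main :: IO ()
main = do
  print (total [1, 2, 3, 4])                      -- 10
  print (reduceChunked (+) 0 2 [1, 2, 3, 4])      -- 10: (+) is associative
  print (foldl (-) 0 [1, 2, 3, 4 :: Integer])     -- -10
  print (reduceChunked (-) 0 2 [1, 2, 3, 4 :: Integer]) -- 10: (-) is not
```

The subtraction case shows why the side condition matters: without associativity the chunked form silently computes a different value.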

Slide 14

Slide 14 text

Transformations
• Changes in the mathematical formulation
  • Or the algorithm execution
• Produce equivalent code
  • Change computing load
  • Change memory distribution
  • Modify communication
• Allow adaptation to different architectures
  • While maintaining correctness!

Slide 15

Slide 15 text

Hierarchical Structures
• Functional annotations allow the construction of multiple structural levels:
  • Emerging complexity of the structural information
• We distinguish between:
  • Output of one Higher Order Function is input of another
    • This can be achieved by analyzing the data dependencies between the functions
  • The operator of one (Higher Order) Function is composed of other functions

Slide 16

Slide 16 text

Flat Structure
[Figure: graph of a complex structure of two same-level Higher Order Functions (HOFs)]
• The output of one HOF is the input for another HOF
    foldl (+) 0 (map (*2) [0..n-1])
    foldl :: (a -> b -> a) -> a -> [b] -> a
    map :: (a -> b) -> [a] -> [b]
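
Because the only dependency between the two HOFs is that the output of `map` feeds `foldl`, the pair can be rewritten as a single pass without changing the result. The sketch below shows the slide's expression next to one such fused form; the names `flat` and `fused` are illustrative, and the fusion is offered as an example of an equivalent structure rather than as the transformation used in the talk.

```haskell
-- The flat composition from the slide: map, then reduce.
flat :: Int -> Int
flat n = foldl (+) 0 (map (* 2) [0 .. n - 1])

-- An equivalent single-pass form: the map operator is folded into the
-- reduction operator, so no intermediate list is needed.
fused :: Int -> Int
fused n = foldl (\acc x -> acc + 2 * x) 0 [0 .. n - 1]

main :: IO ()
main = do
  print (flat 5)   -- 20
  print (fused 5)  -- 20
```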

Slide 17

Slide 17 text

Hierarchical Structure
[Figure: graph of a complex hierarchical structure of two different-level Higher Order Functions (HOFs)]
• The operator of one HOF is another HOF
    map (foldl (+) 0) [[..]..[..]]
    foldl :: (a -> b -> a) -> a -> [b] -> a
    map :: (a -> b) -> [a] -> [b]
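
A concrete instance of this hierarchical form is summing every row of a nested list: the outer `map` treats each row independently, while the inner `foldl` is the operator applied to it. The name `rowSums` and the sample data are illustrative only.

```haskell
-- Outer HOF: map over the rows. Inner HOF (the operator): foldl over a row.
rowSums :: [[Int]] -> [Int]
rowSums = map (foldl (+) 0)

main :: IO ()
main = print (rowSums [[1, 2, 3], [4, 5], []])  -- [6,9,0]
```

Each row is reduced without reference to any other row, which is what makes the outer level a natural candidate for parallel execution.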

Slide 18

Slide 18 text

Other requirements
• Strong binding between directives and code
• Description of memory organization

Slide 19

Slide 19 text

Example - Heat
1-D heat dissipation function
Discretization

Slide 20

Slide 20 text

Complexity
[Annotated code listing; cost labels: O(N_ELEM), O(N_ITER), O(N_ELEM), O(1), O(1), O(1)]

Slide 21

Slide 21 text

Complexity
[Annotated code listing; cost labels: O(N_ELEM), O(N_ITER), O(N_ELEM), O(1), O(1), O(1)]
Total: O(1) + O(1) + O(N_ITER) * (O(N_ELEM)*O(1) + O(N_ELEM)) = O(N_ITER * N_ELEM)
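
The total above is obtained by composing per-construct costs: sequential parts add, and repetition multiplies. A minimal sketch of that bookkeeping follows, assuming a hypothetical `Structure` tree invented for this illustration (it is not the POLCA representation) and counting unit operations instead of tracking asymptotic classes.

```haskell
-- Hypothetical structure tree, for illustration only.
data Structure
  = Op                  -- a constant-cost operation, O(1)
  | Seq [Structure]     -- sequential composition: costs add
  | Map Int Structure   -- body applied to n elements: cost multiplies
  | Itn Int Structure   -- body iterated n times: cost multiplies

-- Count unit operations by walking the tree.
cost :: Structure -> Int
cost Op        = 1
cost (Seq ss)  = sum (map cost ss)
cost (Map n s) = n * cost s
cost (Itn n s) = n * cost s

-- Mirrors the slide's sum: O(1) + O(1) + N_ITER * (N_ELEM * O(1) + N_ELEM).
heat :: Int -> Int -> Structure
heat nIter nElem =
  Seq [Op, Op, Itn nIter (Seq [Map nElem Op, Map nElem Op])]

main :: IO ()
main = print (cost (heat 10 100))  -- 2002 = 1 + 1 + 10 * (100 + 100)
```

The dominant term grows as N_ITER * N_ELEM, matching the slide's result.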

Slide 22

Slide 22 text

Transformations – Partitioning 1
    let heatDiffusion = itn HEATTIMESTEP hm_array N_ITER
    PAR1 v = stencil1D TKernel 1 v
      where TKernel x y z = y + K * (x - 2*y + z)
    HEATTIMESTEP vs = map PAR1 vs
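
To make the partitioning concrete, the sketch below gives one possible executable reading of these definitions in plain Haskell. The semantics of `itn` and `stencil1D` are assumptions (iterate n times; apply a 3-point kernel to interior elements with fixed boundaries, radius fixed at 1), the value of `k` is arbitrary, and `withHalos` and `partitionedStep` are hypothetical helpers that give each sub-vector one neighbouring element on each side.

```haskell
-- Assumed diffusion coefficient (K on the slide; the value is illustrative).
k :: Double
k = 0.1

tKernel :: Double -> Double -> Double -> Double
tKernel x y z = y + k * (x - 2 * y + z)

-- Assumed semantics: 3-point kernel on interior elements, boundaries kept.
stencil1D :: (a -> a -> a -> a) -> [a] -> [a]
stencil1D f v@(_ : _ : _ : _) =
  head v : zipWith3 f v (tail v) (drop 2 v) ++ [last v]
stencil1D _ v = v

-- Assumed semantics: apply f to x, n times.
itn :: (a -> a) -> a -> Int -> a
itn f x n = iterate f x !! n

heatDiffusion :: [Double] -> Int -> [Double]
heatDiffusion hmArray nIter = itn (stencil1D tKernel) hmArray nIter

-- Overlapping sub-vectors: c interior elements plus one halo per side (c > 0).
withHalos :: Int -> [a] -> [[a]]
withHalos c w
  | length w <= 2 = []
  | otherwise     = take (c + 2) w : withHalos c (drop c w)

-- One partitioned time step: stencil per sub-vector, drop halos, reassemble.
partitionedStep :: Int -> [Double] -> [Double]
partitionedStep c v
  | length v < 3 = v
  | otherwise =
      [head v]
        ++ concatMap (init . tail . stencil1D tKernel) (withHalos c v)
        ++ [last v]

main :: IO ()
main = do
  let hm = [0, 0, 10, 0, 0, 3, 1]
  print (heatDiffusion hm 1)
  print (partitionedStep 2 hm == stencil1D tKernel hm)  -- True
```

The partitioned step reproduces the unpartitioned one exactly because every element sees the same three inputs; the halo elements are what a distributed version has to communicate between neighbours.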

Slide 23

Slide 23 text

Transformations – Partitioning 2
    let heatDiffusion = itn HEATTIMESTEP hm_array N_ITER
    PAR2 v = stencil1D TKernel 1 v
      where TKernel x y z = y + K * (x - 2*y + z)
    PAR1 vs = map PAR2 vs
    HEATTIMESTEP vss = map PAR1 vss

Slide 24

Slide 24 text

Platform Specific Transformations
• OpenMP:
  • Relatively straightforward
• MPI:
  • Communication
  • Halos

Slide 25

Slide 25 text

Transformed Code
    if (rank < size - 1)
        MPI_Send(&hm[LOCAL_N_ELEM], 1, MPI_FLOAT, rank + 1, 0, MPI_COMM_WORLD);
    if (rank > 0)
        MPI_Recv(&hm[0], 1, MPI_FLOAT, rank - 1, 0, MPI_COMM_WORLD, &status);
    if (rank > 0)
        MPI_Send(&hm[1], 1, MPI_FLOAT, rank - 1, 1, MPI_COMM_WORLD);
    if (rank < size - 1)
        MPI_Recv(&hm[LOCAL_N_ELEM + 1], 1, MPI_FLOAT, rank + 1, 1, MPI_COMM_WORLD,
                 &status);

    #pragma polca stencil1D 1 G hm hm_tmp
    #pragma omp parallel for
    for(i=1; i

Slide 26

Slide 26 text

Example - NBody
N-Body Problem
Three steps
1) Calculate Forces
2) Update Velocities
3) Calculate Position

Slide 27

Slide 27 text

Structure
[Annotated code listing; cost labels: O(1), O(1), O(1), O(nIters), O(nBodies), O(nBodies), O(nBodies), O(nBodies)]

Slide 28

Slide 28 text

Structure
[Annotated code listing; cost labels: O(nIters), O(nBodies^2), O(nBodies), O(nBodies)]

Slide 29

Slide 29 text

Structure
[Annotated code listing; cost labels: O(nBodies^2), O(nBodies), O(nBodies)]
Total: O(nIters) * (O(nBodies^2) + 2*O(nBodies)) = O(nIters * nBodies^2)
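
The three N-Body steps map naturally onto Higher Order Functions, and the costs above can be read off their nesting. The sketch below is a deliberately simplified 1-D version written for illustration: the `Body` record, the softened force law in `pairForce` (gravitational constant taken as 1), and the time step are all assumptions, not the code analyzed in the talk.

```haskell
data Body = Body { pos :: Double, vel :: Double, mass :: Double }

-- Hypothetical softened 1-D force exerted on body a by body b.
pairForce :: Body -> Body -> Double
pairForce a b = mass a * mass b * d / (abs d ^ 3 + eps)
  where
    d   = pos b - pos a
    eps = 1e-9  -- avoids division by zero when a body meets itself

-- 1) Forces: a map whose operator is a fold over all bodies, O(nBodies^2).
forces :: [Body] -> [Double]
forces bs = map (\a -> foldl (\acc b -> acc + pairForce a b) 0 bs) bs

-- 2) Velocities: element-wise update, O(nBodies).
updateVel :: Double -> [Double] -> [Body] -> [Body]
updateVel dt = zipWith (\f b -> b { vel = vel b + dt * f / mass b })

-- 3) Positions: element-wise update, O(nBodies).
updatePos :: Double -> [Body] -> [Body]
updatePos dt = map (\b -> b { pos = pos b + dt * vel b })

step :: Double -> [Body] -> [Body]
step dt bs = updatePos dt (updateVel dt (forces bs) bs)

-- Outer iteration: nIters sequential steps, each depending on the last.
simulate :: Int -> Double -> [Body] -> [Body]
simulate nIters dt bs = iterate (step dt) bs !! nIters

main :: IO ()
main = mapM_ (print . pos) (simulate 10 0.01 [Body 0 0 1, Body 1 0 1])
```

The force step is the hierarchical case (a fold inside a map), which is where the quadratic term comes from; the time loop has to stay sequential because each step consumes the previous step's output.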

Slide 30

Slide 30 text

Communication
[Annotated code listing; labels: Parallel, Parallel, Sequential, Parallel (with caution)]

Slide 31

Slide 31 text

Conclusion
• Functional semantics can enable code:
  • Transformation
  • Adaptation
• But also...
  • Algorithmic complexity analysis
  • Communication patterns
• This information helps to predict the application's behavior

Slide 32

Slide 32 text

Questions
Thank you!
Contact: [email protected]
Projects:
  POLCA       www.polca-project.eu
  Smart-Dash  www.dash-project.org
  CλaSH       www.clash-lang.org