TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communication

TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow

Using Functional Directives to Analyze Code Complexity and Communication
Daniel Rubio Bonilla, HLRS - University of Stuttgart

For video follow the link: https://youtu.be/ckjR9TWk_Tg
Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa

Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro

Exactpro

March 23, 2017

Transcript

  1. Using Functional Directives to Analyze Code Complexity and Communication
    Daniel Rubio Bonilla
    HLRS – University of Stuttgart
    TMPA 2017
  2. CPU Evolution
  3. Hazel Hen
    CPU: E5-2680 v3, 12 cores, 30 MiB cache, 2.5 GHz
    Node: 2 CPUs (24 cores), 128 GB
    Compute nodes: 7712
    Total cores: 185,088
    Performance: 7420 TFlops
    Storage: ~10 PB
    Weight: 61.5 t
    Power: 3200 kW
  4. Amdahl's Law
    [Plot: speedup versus number of processing units, with curves for code that is 100%, 98%, 90%, and 50% parallelizable]
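The curves on this slide follow Amdahl's law: for a parallelizable fraction p running on n processing units, the speedup is 1 / ((1 - p) + p / n). A minimal sketch of that formula is given here; the helper names `amdahl_speedup` and `print_amdahl_table` are hypothetical and do not come from the talk.

```c
#include <stdio.h>

/* Amdahl's law: speedup of a program whose fraction p (0..1) is
 * parallelizable when it runs on n processing units.
 * amdahl_speedup is a hypothetical helper name, not from the slides. */
double amdahl_speedup(double p, unsigned n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Print one row per parallel fraction shown on the slide. */
void print_amdahl_table(unsigned n)
{
    const double fractions[] = {1.00, 0.98, 0.90, 0.50};
    for (int i = 0; i < 4; i++)
        printf("p = %.2f, n = %u: speedup = %.2f\n",
               fractions[i], n, amdahl_speedup(fractions[i], n));
}
```

For p = 0.90 the serial 10% caps the speedup below 10 regardless of how many units are added, which is why the curves flatten.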
  5. Real Amdahl's Law
    [Plot: speedup versus number of processing units, with curves for code that is 100%, 98%, 90%, and 50% parallelizable]
  6. The Problem
    In High Performance Computing…
    • Performance is increased by
      • Integrating more cores (millions!?)
      • Using heterogeneous accelerators (GPU, FPGA, ...)
    • Issues
      • Programmability
      • Portability
  7. New Approaches
    Different Programming Model
    • Focused on mathematical problems
      • Engineering
      • Science
    • To enable:
      • Parallelization and concurrency
      • Portability across different hardware and accelerators
  8. Our Approach
    Obtain the structural information of the application by annotating the imperative code with functional-like directives (mathematical / algorithmic structure)
    • The main difficulties in this approach are:
      • “deriving” the structure of the application
      • matching the structure to the source code
  9. Higher Order Functions
    • Higher order functions are mathematical functions that:
      • Take one or more functions as arguments
      • Can return a function as a result
    • Clear repetitive execution structure
    • These structures can be transformed to equivalent ones
      • But with different non-functional properties
  10. Higher Order Functions
    • Apply to all:
        map :: (a -> b) -> [a] -> [b]
        map (*2) [1,2,3,4] = [2,4,6,8]
  11. Higher Order Functions
    • Apply to all:
        map :: (a -> b) -> [a] -> [b]
        map (*2) [1,2,3,4] = [2,4,6,8]
    • Reduction:
        foldl :: (a -> b -> a) -> a -> [b] -> a
        foldl (+) 0 [1,2,3,4] = 10
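Since the approach annotates imperative code, it may help to see what these two higher order functions correspond to as C loops. The following is only a sketch for `int` data; the names `map_int`, `foldl_int`, `times_two` and `add` are hypothetical and are not part of the talk or of any library.

```c
#include <stddef.h>

/* map: apply f to every element of in[0..n-1] and store the results in out. */
void map_int(int (*f)(int), const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = f(in[i]);          /* iterations are independent of each other */
}

/* foldl: combine the elements left to right, starting from init. */
int foldl_int(int (*op)(int, int), int init, const int *in, size_t n)
{
    int acc = init;
    for (size_t i = 0; i < n; i++)
        acc = op(acc, in[i]);       /* each iteration depends on the previous one */
    return acc;
}

/* Operators matching the slide's examples (*2) and (+). */
int times_two(int x) { return 2 * x; }
int add(int a, int b) { return a + b; }
```

Calling `map_int(times_two, xs, ys, 4)` on `{1, 2, 3, 4}` corresponds to `map (*2) [1,2,3,4]`, and `foldl_int(add, 0, xs, 4)` corresponds to `foldl (+) 0 [1,2,3,4]`.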
  12. Other Higher Order Functions
  13. Higher Order Functions Transformations
    total = foldl (+) 0 vs [figure]
    One possible transformation
    Only if the operation is associative and we know its neutral element
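The transformed form shown on the slide is not captured in the transcript, so the following is only one plausible reading: a sequential sum next to a version split into independent partial sums that are combined afterwards. The function names and the chunking scheme are assumptions made for illustration.

```c
#include <stddef.h>

/* Sequential left fold with (+) and initial value 0. */
long sum_sequential(const int *v, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* The same reduction split into `chunks` independent partial folds.
 * Each partial fold starts from the neutral element 0 and the partial
 * results are combined with the same operator. This is valid only
 * because integer (+) is associative and 0 is its neutral element. */
long sum_chunked(const int *v, size_t n, size_t chunks)
{
    if (chunks == 0)
        chunks = 1;
    long total = 0;
    for (size_t c = 0; c < chunks; c++) {
        size_t begin = c * n / chunks;
        size_t end = (c + 1) * n / chunks;
        long partial = 0;               /* neutral element */
        for (size_t i = begin; i < end; i++)
            partial += v[i];            /* independent of the other chunks */
        total += partial;               /* combine step */
    }
    return total;
}
```

The chunk loop is written sequentially here; because the partial folds do not depend on each other, each could be handed to a different core or node.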
  14. Transformations
    • Changes in the mathematical formulation
    • Or the algorithm execution
    • Produce equivalent code
    • Change computing load
    • Change memory distribution
    • Modify communication
    • Allow adaptation to different architectures
    • While maintaining correctness!
  15. Hierarchical Structures
    • Functional annotations allow the construction of multiple structural levels:
      • Emerging complexity of the structural information
    • We distinguish between:
      • Output of one Higher Order Function is input of another
        • This can be achieved by analyzing the data dependencies between the functions
      • The operator of one (Higher Order) Function is composed of other functions
  16. Flat Structure
    Graph of a Complex Structure of two same-level Higher Order Functions (HOFs)
    • The output of one HOF is the input for another HOF
        foldl (+) 0 (map (*2) [0..n-1])
        foldl :: (a -> b -> a) -> a -> [b] -> a
        map :: (a -> b) -> [a] -> [b]
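As a sketch of what this flat composition could look like in imperative code, the version below first materializes the output of the map and then folds it; a second version fuses both loops. The fusion is an illustrative rewrite that is not stated on the slide, and both function names are hypothetical.

```c
/* Two-pass version of foldl (+) 0 (map (*2) [0..n-1]).
 * tmp must provide room for n elements. */
long flat_two_pass(int n, long *tmp)
{
    for (int i = 0; i < n; i++)
        tmp[i] = 2L * i;            /* map (*2) over [0..n-1] */
    long acc = 0;
    for (int i = 0; i < n; i++)
        acc += tmp[i];              /* foldl (+) 0 over the map output */
    return acc;
}

/* Fused version: each map result is consumed by the fold immediately,
 * so no intermediate array is needed. */
long flat_fused(int n)
{
    long acc = 0;
    for (int i = 0; i < n; i++)
        acc += 2L * i;
    return acc;
}
```

Both versions compute the same value; they differ in memory traffic, which is the kind of non-functional property the transformations are meant to change.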
  17. Hierarchical Structure
    Graph of a Complex Hierarchical Structure of two different-level Higher Order Functions (HOFs)
    • The operator of one HOF is another HOF
        map (foldl (+) 0) [[..]..[..]]
        foldl :: (a -> b -> a) -> a -> [b] -> a
        map :: (a -> b) -> [a] -> [b]
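In imperative code this hierarchical structure typically appears as a loop nest. A minimal sketch, assuming the list of lists is a row-major matrix (the name `row_sums` and the storage layout are assumptions):

```c
#include <stddef.h>

/* map (foldl (+) 0) over the rows of a rows x cols matrix stored
 * row-major: the outer loop is the map, the inner loop is the fold. */
void row_sums(const int *m, size_t rows, size_t cols, long *out)
{
    for (size_t r = 0; r < rows; r++) {        /* map over the rows */
        long acc = 0;                          /* foldl initial value */
        for (size_t c = 0; c < cols; c++)      /* foldl (+) within one row */
            acc += m[r * cols + c];
        out[r] = acc;
    }
}
```

The outer iterations are independent of each other, while each inner loop carries a dependency through `acc`.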
  18. Other requirements
    • Strong binding between directives and code
    • Description of memory organization
  19. Example - Heat
    1-D heat dissipation function
    Discretization
    [Figure with time axis t]
  20. Complexity
    [Annotated code listing; complexity labels: O(N_ELEM), O(N_ITER), O(N_ELEM), O(1), O(1), O(1)]
  21. Complexity
    [Annotated code listing; complexity labels: O(N_ELEM), O(N_ITER), O(N_ELEM), O(1), O(1), O(1)]
    O(1) + O(1) + O(N_ITER) * (O(N_ELEM)*O(1) + O(N_ELEM))
    = O(N_ITER * N_ELEM)
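The annotated listing itself is not part of the transcript. The sketch below is an assumed reconstruction of a sequential loop nest that would yield the same complexity expression, using the stencil kernel from the transformed code later in the deck. Reading the second O(N_ELEM) term as a copy-back loop is an assumption, as are the function name and the kernel-call counter.

```c
#include <stddef.h>
#include <string.h>

/* Sequential 1-D heat diffusion; returns the number of kernel evaluations.
 * hm and hm_tmp hold n_elem + 2 floats: indices 0 and n_elem + 1 are
 * boundary cells that are never updated. */
unsigned long heat_sequential(float *hm, float *hm_tmp,
                              size_t n_elem, size_t n_iter, float k)
{
    unsigned long kernel_calls = 0;                          /* O(1) */
    for (size_t it = 0; it < n_iter; it++) {                 /* O(N_ITER) */
        for (size_t i = 1; i <= n_elem; i++) {               /* O(N_ELEM) */
            hm_tmp[i] = hm[i]
                      + k * (hm[i - 1] + hm[i + 1] - 2 * hm[i]);  /* O(1) */
            kernel_calls++;
        }
        /* copy the new time step back: O(N_ELEM) (assumed step) */
        memcpy(&hm[1], &hm_tmp[1], n_elem * sizeof *hm);
    }
    return kernel_calls;
}
```

The counter grows as N_ITER * N_ELEM, matching the simplified bound on the slide.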
  22. Transformations – Partitioning 1
    let heatDiffusion = itn HEATTIMESTEP hm_array N_ITER
    PAR1 v = stencil1D TKernel 1 v
      where TKernel x y z = y + K * (x - 2*y + z)
    HEATTIMESTEP vs = map PAR1 vs
  23. Transformations – Partitioning 2
    let heatDiffusion = itn HEATTIMESTEP hm_array N_ITER
    PAR2 v = stencil1D TKernel 1 v
      where TKernel x y z = y + K * (x - 2*y + z)
    PAR1 vs = map PAR2 vs
    HEATTIMESTEP vss = map PAR1 vss
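The partitioning step rewrites one stencil over the whole array as a map over blocks that each run the same stencil. A shared-memory sketch of a single level of partitioning is shown below, under the assumption that a block may read its neighbours' edge cells directly; the function names, the block layout and the value of `K` are illustrative and not taken from the talk.

```c
#include <stddef.h>

#define K 0.1f   /* diffusion constant; the value is an assumption */

/* One stencil pass over the cells first..last of in, written to out. */
static void stencil_range(const float *in, float *out,
                          size_t first, size_t last)
{
    for (size_t i = first; i <= last; i++)
        out[i] = in[i] + K * (in[i - 1] - 2 * in[i] + in[i + 1]);
}

/* Unpartitioned time step over n interior cells (indices 1..n). */
void heat_step(const float *in, float *out, size_t n)
{
    stencil_range(in, out, 1, n);
}

/* Partitioned time step: the stencil is mapped over `parts` contiguous
 * blocks. Each block reads one neighbouring cell on each side (its halo)
 * from `in`, so the blocks can be processed independently. */
void heat_step_partitioned(const float *in, float *out,
                           size_t n, size_t parts)
{
    if (parts == 0)
        parts = 1;
    for (size_t p = 0; p < parts; p++) {
        size_t first = 1 + p * n / parts;
        size_t last = (p + 1) * n / parts;
        if (first <= last)              /* skip empty blocks */
            stencil_range(in, out, first, last);
    }
}
```

Both functions produce the same output array. On distributed memory the halo reads become explicit messages between neighbouring ranks.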
  24. Platform Specific Transformations
    • OpenMP:
      • Relatively straightforward
    • MPI:
      • Communication
      • Halos
  25. Transformed Code
    if (rank < size - 1)
        MPI_Send(&hm[LOCAL_N_ELEM], 1, MPI_FLOAT, rank + 1, 0, MPI_COMM_WORLD);
    if (rank > 0)
        MPI_Recv(&hm[0], 1, MPI_FLOAT, rank - 1, 0, MPI_COMM_WORLD, &status);
    if (rank > 0)
        MPI_Send(&hm[1], 1, MPI_FLOAT, rank - 1, 1, MPI_COMM_WORLD);
    if (rank < size - 1)
        MPI_Recv(&hm[LOCAL_N_ELEM + 1], 1, MPI_FLOAT, rank + 1, 1, MPI_COMM_WORLD,
                 &status);

    #pragma polca stencil1D 1 G hm hm_tmp
    #pragma omp parallel for
    for (i = 1; i < LOCAL_N_ELEM + 1; i++) {
    #pragma polca G
    #pragma polca input (hm[i-1] hm[i] hm[i+1]) output(hm_tmp[i])
        hm_tmp[i] = hm[i] + K * (hm[i-1] + hm[i+1] - 2 * hm[i]);
    }
  26. Example - NBody
    N-Body Problem
    [Figure with time axis t]
    Three steps
    1) Calculate Forces
    2) Update Velocities
    3) Calculate Position
  27. Structure
    [Annotated code listing; complexity labels: O(1), O(1), O(1), O(nIters), O(nBodies), O(nBodies), O(nBodies), O(nBodies)]
  28. Structure
    [Annotated code listing; complexity labels: O(nIters), O(nBodies^2), O(nBodies), O(nBodies)]
  29. Structure
    [Annotated code listing; complexity labels: O(nBodies^2), O(nBodies), O(nBodies)]
    O(nIters) * (O(nBodies^2) + 2*O(nBodies))
    = O(nIters * nBodies^2)
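The N-body listing is not captured in the transcript either. The following sketch shows a loop nest with the same three steps and the same complexity terms, under loudly simplified assumptions: one spatial dimension, unit masses, a unit gravitational constant and a small softening term. None of these details, nor the function name, come from the talk.

```c
#include <stddef.h>

/* Three-step N-body loop in 1-D with unit masses (illustrative only).
 * Returns the number of inner-loop iterations of the force step. */
unsigned long nbody_run(double *pos, double *vel, double *force,
                        size_t n_bodies, size_t n_iters, double dt)
{
    unsigned long inner_iters = 0;
    for (size_t it = 0; it < n_iters; it++) {             /* O(nIters) */
        /* 1) Calculate forces: O(nBodies^2) */
        for (size_t i = 0; i < n_bodies; i++) {
            force[i] = 0.0;
            for (size_t j = 0; j < n_bodies; j++) {
                if (i != j) {
                    double d = pos[j] - pos[i];
                    double r2 = d * d + 1e-9;   /* softening, avoids division by zero */
                    double sign = (double)((d > 0) - (d < 0));
                    force[i] += sign / r2;      /* attraction towards body j */
                }
                inner_iters++;
            }
        }
        /* 2) Update velocities: O(nBodies) */
        for (size_t i = 0; i < n_bodies; i++)
            vel[i] += force[i] * dt;
        /* 3) Calculate positions: O(nBodies) */
        for (size_t i = 0; i < n_bodies; i++)
            pos[i] += vel[i] * dt;
    }
    return inner_iters;
}
```

The returned count equals nIters * nBodies^2, so the force step dominates and the two linear steps vanish in the simplified bound.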
  30. Communication
    [Annotated code listing; labels: Parallel, Parallel, Sequential, Parallel (with caution)]
  31. Conclusion
    • Functional semantics can enable code:
      • Transformation
      • Adaptation
    • But also...
      • Algorithmic complexity analysis
      • Communication patterns
    • This information helps to predict the application’s behavior
  32. Questions
    Thank you!
    Contact: [email protected]
    Projects:
      POLCA www.polca-project.eu
      Smart-Dash www.dash-project.org
      CλaSH www.clash-lang.org