

Why OPS (Optimizing Parallelizing System) May be Useful for Clang
Denis Dubrov, Southern Federal University
CEE-SECR, October 20, 2017

OPS (Optimizing Parallelizing System) is a compiler system for high-performance accelerators. The talk presents several advantages of OPS over LLVM and GCC and compares the high-level and low-level intermediate program representations used by these compiler systems. It is intended for those interested in developing fast C programs, in developing software for graphics accelerators, and in new computational architectures and compilers for them.

Transcript

1. Title
How OPS (Optimizing Parallelizing System) May be Useful for Clang. SECR-2017.
Lev R. Gervith, Sergey A. Guda, Denis V. Dubrov, Ruslan A. Ibragimov, Elena A. Metelitsa, Yury M. Mikhailuts, Artyom E. Paterikin, Victor V. Petrenko, Ilya R. Skapenko, Boris Ya. Steinberg, Oleg B. Steinberg, Vladislav A. Yakovlev, Mikhail V. Yurushkin.
Southern Federal University, Rostov-on-Don, Russia. October 20, 2017.
2. OPS at a glance
- High-level internal representation: "Reprise".
- Frontend: Clang-based.
- Analyses: dependence graph, alias analysis, computation graph, ...
- Transformations: parallelization of recurrent loops, loop unrolling, loop fusion, loop nesting, loop interchange, ... (one of these is illustrated below).
- GUI for testing purposes.
- Data visualization: dependence graph, ...
- Backends: MPI, OpenMP, CUDA, VHDL, Clang, ...
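To make the transformation list concrete, here is a minimal hand-written sketch of loop interchange, one of the transformations named above. It is not OPS output; the function names and the size N are invented for illustration.

    /* Loop interchange: illustration only, not OPS-generated code. */
    #include <stddef.h>

    #define N 1024

    /* Before: the inner loop strides down a column, so consecutive
     * iterations touch memory N doubles apart (C arrays are row-major). */
    void scale_before(double a[N][N], double s)
    {
        for (size_t j = 0; j < N; ++j)
            for (size_t i = 0; i < N; ++i)
                a[i][j] *= s;
    }

    /* After interchange: the inner loop walks contiguous memory,
     * the cache-friendly order for C. Legal here because all
     * iterations are independent. */
    void scale_after(double a[N][N], double s)
    {
        for (size_t i = 0; i < N; ++i)
            for (size_t j = 0; j < N; ++j)
                a[i][j] *= s;
    }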
3. Generating high-level code
- Automatic parallelization: CUDA, OpenMP, MPI, VHDL (a sketch of hypothetical OpenMP output follows the list).
- Optimizing memory usage: tiling.
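As a rough sketch of what the OpenMP path might emit for a loop with no loop-carried dependencies (written by hand for illustration, not produced by OPS; the function name saxpy is invented):

    #include <omp.h>

    /* Each iteration writes a distinct y[i], so the loop is parallel. */
    void saxpy(int n, float alpha, const float *x, float *y)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = alpha * x[i] + y[i];
    }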
4. Clang + OPS integration
Pipeline: source code -> Clang AST -> Reprise (OPS) -> Clang AST -> binary code.
Figure 1: injecting OPS inside Clang.
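From the C side, the entry point of such a pipeline can be approximated with libclang, which exposes the Clang AST to external tools. The walk below is only a stand-in for the point where a Reprise round-trip would plug in; it is not the actual OPS integration code, and "input.c" is a placeholder file name.

    #include <clang-c/Index.h>
    #include <stdio.h>

    /* Print the kind of every AST node, recursing through the tree. */
    static enum CXChildVisitResult visit(CXCursor c, CXCursor parent,
                                         CXClientData data)
    {
        (void)parent; (void)data;
        CXString kind = clang_getCursorKindSpelling(clang_getCursorKind(c));
        printf("node: %s\n", clang_getCString(kind));
        clang_disposeString(kind);
        return CXChildVisit_Recurse;
    }

    int main(void)
    {
        CXIndex idx = clang_createIndex(0, 0);
        CXTranslationUnit tu = clang_parseTranslationUnit(
            idx, "input.c", NULL, 0, NULL, 0, CXTranslationUnit_None);
        if (tu) {
            clang_visitChildren(clang_getTranslationUnitCursor(tu),
                                visit, NULL);
            clang_disposeTranslationUnit(tu);
        }
        clang_disposeIndex(idx);
        return 0;
    }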
5. Low-level IR vs. high-level IR
- Low-level IR: many input languages, few target architectures.
- High-level IR: few input languages, many target architectures.
Figure 2: comparison of low-level and high-level internal representations.
6. Advantages of using OPS
- Code generation for accelerators: GPU, FPGA.
- Block data placement: shared memory, distributed memory.
- Parallel programming visual aid: dependency visualization in terms of the original source code.
- Dialog compilation.
7. Block array placement in shared memory
Figure 3: running times of 2D-FFT, LU decomposition, Floyd, and matrix squaring with and without block placement in shared memory (vertical axis: log(t_N / t_O)).
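The idea behind block placement can be shown with a small copy routine that stores each B × B tile of a matrix contiguously, so a tiled kernel works on one compact region at a time. The layout below is a common generic scheme, assumed for illustration; it is not necessarily the placement OPS uses.

    #include <stddef.h>

    #define N 512
    #define B 64            /* block side; assumes B divides N */

    /* Copy a row-major N x N matrix into block-major order:
     * block (bi, bj) occupies one contiguous run of B*B doubles. */
    void to_block_layout(const double *restrict a, double *restrict blocked)
    {
        size_t nb = N / B;
        for (size_t bi = 0; bi < nb; ++bi)
            for (size_t bj = 0; bj < nb; ++bj) {
                double *dst = blocked + (bi * nb + bj) * B * B;
                for (size_t i = 0; i < B; ++i)
                    for (size_t j = 0; j < B; ++j)
                        dst[i * B + j] = a[(bi * B + i) * N + (bj * B + j)];
            }
    }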
8. Block affine array placement in distributed memory
Result: automatically generated MPI code + block affine data placement.
Table 1: solving the 3D Dirichlet problem for the Poisson equation with the iterative Jacobi algorithm (running time, s).

Size            | 1 node  | 2 nodes | 4 nodes | 8 nodes
128 × 128 × 128 | 38.25   | 19.74   | 10.9    | 5.97
256 × 256 × 256 | 310.19  | 165.64  | 87.11   | 48.64
384 × 384 × 384 | 1078    | 697.95  | 356.9   | 190.12
512 × 512 × 512 | 2786.47 | 1432.2  | 776.14  | 418.38
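For orientation, here is a heavily simplified sketch of the kind of MPI code Table 1 measures: a Jacobi iteration over a block (slab) distribution with halo exchange between neighbouring ranks. It is reduced from 3D to 2D, assumes the rank count divides N, and skips convergence checks and proper boundary handling; the actual generated code is more elaborate.

    #include <mpi.h>
    #include <stdlib.h>

    #define N 256
    #define ITERS 100

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int rows = N / size;                     /* assumes size divides N */
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Local slab plus one halo row on each side, zero-initialized. */
        double (*u)[N]  = calloc((size_t)(rows + 2) * N, sizeof(double));
        double (*un)[N] = calloc((size_t)(rows + 2) * N, sizeof(double));

        for (int it = 0; it < ITERS; ++it) {
            /* Exchange halo rows with the neighbouring ranks;
             * MPI_PROC_NULL turns boundary exchanges into no-ops. */
            MPI_Sendrecv(u[1], N, MPI_DOUBLE, up, 0,
                         u[rows + 1], N, MPI_DOUBLE, down, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(u[rows], N, MPI_DOUBLE, down, 1,
                         u[0], N, MPI_DOUBLE, up, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* Jacobi update on the local slab's interior points. */
            for (int i = 1; i <= rows; ++i)
                for (int j = 1; j < N - 1; ++j)
                    un[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                     + u[i][j-1] + u[i][j+1]);

            double (*tmp)[N] = u; u = un; un = tmp;  /* swap buffers */
        }

        free(u); free(un);
        MPI_Finalize();
        return 0;
    }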
9. Graphics accelerator
Figure 4: comparison of the OPS-based solution with PPCG (http://ppcg.gforge.inria.fr/) on PolyBench kernels: atax, adi, bicg, cholesky, correlation, covariance, doitgen, durbin, dynprog, fdtd-2d, fdtd-apml, floyd-warshall, gemm, gemver, gesummv, gramschmidt, jacobi-1d-imper, jacobi-2d-imper, lu, ludcmp, mvt, reg-detect, seidel-2d, symm, syr2k, syrk, trisolv, trmm, 2mm, 3mm (vertical axis: log(t_P / t_O)).
10. FPGA accelerator
Figure 5: hybrid CPU/FPGA compute systems: a) processor core and programmable logic on one chip, connected via AXI/EMIO; b) CPU (PC) and FPGA on separate devices, connected via Ethernet/USB/PCI-e.
11. Dependency visualization
Figure 6: parallel programmer's training simulator output.
12. Dependency refinement in dialog
Example (Floyd algorithm):

    for (k = 0; k < n; ++k)
        for (i = 0; i < n; ++i)
            for (j = 0; j < n; ++j)
                if (a[i][j] > a[i][k] + a[k][j])
                    a[i][j] = a[i][k] + a[k][j];

No loop in this nest can be parallelized automatically because of a data dependency. In fact, the dependency does not materialize if a[i][i] >= 0, so a dialog compiler may ask the programmer whether that condition holds (a hypothetical outcome of such a dialog is sketched below).
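If the programmer confirms that a[i][i] >= 0 always holds (no negative self-loops), then row k and column k are unchanged during step k, so the two inner loops carry no dependency. A dialog compiler could then, in principle, emit something like the following OpenMP version; this is a hypothetical illustration, not literal OPS output.

    #include <omp.h>

    void floyd(int n, double a[n][n])
    {
        for (int k = 0; k < n; ++k) {   /* k loop stays sequential */
            /* Parallel only under the confirmed a[i][i] >= 0 condition:
             * then a[i][k] and a[k][j] are never written in this step. */
            #pragma omp parallel for collapse(2)
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    if (a[i][j] > a[i][k] + a[k][j])
                        a[i][j] = a[i][k] + a[k][j];
        }
    }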
13. Links
- OpenOPS source code: https://github.com/OpsGroup/open-ops
- OPS website: http://www.ops.rsu.ru/
- Web auto-parallelizer: http://ops.opsgroup.ru/en/