Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs

What is this talk about?

PARTEE, a runtime system that:
■ Runs task-parallel applications efficiently
■ Detects and resolves dependencies
■ Features parallel and hierarchical dynamic dependence analysis

zakkak

May 26, 2016

Transcript

  1. Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs

    IPDPS 2016, 26th of May 2016
    Nikolaos Papakonstantinou, Foivos S. Zakkak, and Polyvios Pratikakis

    Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.
  2. 1 Introduction, 1.1 Contributions: What is this talk about?

    PARTEE, a runtime system that:
    ▪ Runs task-parallel applications efficiently
    ▪ Detects and resolves dependencies
    ▪ Features parallel and hierarchical dynamic dependence analysis

    F. Zakkak - zakkak@ics.forth.gr
  3. 1 Introduction, 1.2 The Programming Model: The OMPSs programming model

    An example

        void baz(int *k, int *l) { *k = *l; }

        void foo(int *x, int *y, int *z) {
            #pragma omp task in(z) out(x)
            baz(x, z);
            #pragma omp task in(z) out(y)
            baz(y, z);
        }

        void bar(int *k, int *l) {
            #pragma omp task in(l) out(k)
            baz(k, l);
        }

        int main(void) {
            // ...
            #pragma omp task in(z) out(x, y)
            foo(x, y, z);
            #pragma omp task in(x) out(k)
            bar(k, x);
            #pragma omp task in(m) out(l)
            baz(l, m);
            // ...
        }

    Spawn Graph (edge types: Spawn, Dependency)
    ▪ Root Task: main(), inout: x, y, z, k, l, m
    ▪ Task 1: foo(x, y, z), out: x, y, in: z
    ▪ Task 1.1: baz(y, z), out: y, in: z
    ▪ Task 1.2: baz(x, z), out: x, in: z
    ▪ Task 2: bar(k, x), out: k, in: x
    ▪ Task 2.1: baz(k, x), out: k, in: x
    ▪ Task 3: baz(l, m), out: l, in: m
  14. 1 Introduction, 1.2 The Programming Model: Properties of Tasks

    ▪ Parents’ memory footprints include their children’s memory footprints
    ▪ Children’s spawns depend on their parents’ scheduling
    ▪ Parents wait for their children to complete (not in OMPSs)
    ▪ Two tasks are dependent when neither is a descendant of the other and the intersection of their memory footprints is not empty

    [Figure: the spawn graph of the example (Root Task, Tasks 1, 1.1, 1.2, 2, 2.1 and 3), with edge types Spawn, Dependency and PM Dependency.]
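
The dependence rule in the last bullet can be sketched in a few lines of C. This is only an illustration, not PARTEE code: the `struct task` layout, the bitmask encoding of a footprint (one bit per tracked object) and the function names are assumptions made for the sketch. It follows the wording of the slide literally, so it ignores access modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record: a parent link plus a footprint bitmask
 * in which bit i stands for the i-th tracked memory object. */
struct task {
    const struct task *parent;   /* NULL for the root task */
    unsigned footprint;
};

/* True if `t` is a proper descendant of `anc` in the spawn tree. */
bool is_descendant(const struct task *t, const struct task *anc)
{
    for (const struct task *p = t->parent; p != NULL; p = p->parent)
        if (p == anc)
            return true;
    return false;
}

/* Rule from the slide: neither task descends from the other,
 * and their memory footprints overlap. */
bool dependent(const struct task *a, const struct task *b)
{
    assert(a != NULL && b != NULL);
    if (a == b || is_descendant(a, b) || is_descendant(b, a))
        return false;
    return (a->footprint & b->footprint) != 0;
}
```

With the tasks of the example encoded this way, `foo(x, y, z)` and `bar(k, x)` come out as dependent because both footprints contain `x`, while `foo` and its own child `baz(x, z)` do not, since one descends from the other.
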
  16. 1 Introduction, 1.2 The Programming Model: Observation

    Dependencies need to be resolved only among sibling tasks!
  17. 2 The algorithm, 2.1 The Base Algorithm: High Level Description

    The algorithm consists of 2 phases

    Phase 1
    ▪ Creates dependencies at task spawns
    ▪ Run by parent task

    Phase 2
    ▪ Resolves dependencies at task completion
    ▪ Run by finishing task
  22. 2 The algorithm, 2.1 The Base Algorithm: Metadata

    For each task we keep:
    ▪ a dependencies counter
    ▪ a resolved dependencies counter
    ▪ a notify list

    For each argument we keep:
    ▪ the last access type
    ▪ the last owner
    ▪ a readers’ list

    [Figure: Task 1, foo(x, y, z), out: x, y, in: z, shown with counter values 1 and 3 and a notify list (NL) holding Task 2, Task 3, …; argument z shown with type: in, an owner, and a readers’ list (RL) holding Task 1, Task 4, …]
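
The two metadata records could be laid out as follows. This is a minimal sketch under stated assumptions: the type and field names, the fixed list capacity `MAX_LIST`, and the readiness test (a task is taken to be runnable once its two counters match) are illustrative choices, not details confirmed by the slides.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIST 16   /* assumed fixed capacity, to keep the sketch short */

enum access_type { ACC_NONE, ACC_IN, ACC_OUT, ACC_INOUT };

/* Per-task metadata. */
struct task_meta {
    int deps;                             /* dependencies counter */
    int resolved;                         /* resolved dependencies counter */
    struct task_meta *notify[MAX_LIST];   /* notify list (NL) */
    size_t n_notify;
};

/* Per-argument metadata. */
struct arg_meta {
    enum access_type last_type;           /* last access type */
    struct task_meta *owner;              /* last owner */
    struct task_meta *readers[MAX_LIST];  /* readers' list (RL) */
    size_t n_readers;
};

void task_meta_init(struct task_meta *t)
{
    assert(t != NULL);
    *t = (struct task_meta){ .deps = 0 };   /* remaining fields are zeroed */
}

void arg_meta_init(struct arg_meta *a)
{
    assert(a != NULL);
    *a = (struct arg_meta){ .last_type = ACC_NONE };
}

/* Assumption: a task may run once every recorded dependency is resolved. */
int task_is_ready(const struct task_meta *t)
{
    return t->resolved == t->deps;
}
```
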
  26. 2 The algorithm, 2.1 The Base Algorithm: Phase 1: Dependency Creation

        x = malloc(...);
        #pragma omp task in(x)
        print(x);       /* 1st */
        #pragma omp task in(x)
        read(x);        /* RAR */
        #pragma omp task out(x)
        write(x);       /* WAR */
        #pragma omp task inout(x)
        readwrite(x);   /* WAW */
        #pragma omp task in(x)
        print(x);       /* RAW */
        #pragma omp task in(x)
        read(x);        /* RAR */

    [Figure: step-by-step animation of the metadata of x (type: in, then out, then inout; owner; readers’ list RL) and of the counters and notify lists (NL) of Task 1 print(x), Task 2 read(x), Task 3 write(x), Task 4 readwrite(x), Task 5 print(x) and Task 6 read(x) as the tasks are spawned.]
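
One plausible reading of Phase 1 is sketched below. It is not the authors' implementation: `register_access`, `add_dependency`, the fixed-capacity arrays and the exact RAW/WAR/WAW rules are assumptions, and the sketch supposes that none of the earlier tasks has finished by the time a later one is spawned. A reader waits for the last writer; a writer waits for the readers recorded since the last write or, if there are none, for the last writer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIST 8   /* assumed fixed capacity */

enum access_type { ACC_NONE, ACC_IN, ACC_OUT, ACC_INOUT };

struct task {
    int deps;                        /* dependencies counter */
    int resolved;                    /* resolved dependencies counter */
    struct task *notify[MAX_LIST];   /* tasks waiting for this one */
    size_t n_notify;
};

struct arg {
    enum access_type last_type;      /* last access type */
    struct task *owner;              /* last writer, NULL if none */
    struct task *readers[MAX_LIST];  /* readers since the last write */
    size_t n_readers;
};

/* Record that `later` must wait for `earlier`. */
void add_dependency(struct task *earlier, struct task *later)
{
    assert(earlier->n_notify < MAX_LIST);
    earlier->notify[earlier->n_notify++] = later;
    later->deps++;
}

/* Run by the parent when it spawns `t` with access `type` on `a`. */
void register_access(struct arg *a, struct task *t, enum access_type type)
{
    if (type == ACC_IN) {
        if (a->owner != NULL)
            add_dependency(a->owner, t);            /* RAW */
        assert(a->n_readers < MAX_LIST);
        a->readers[a->n_readers++] = t;             /* RAR adds no edge */
    } else {                                        /* out or inout */
        if (a->n_readers > 0) {
            for (size_t i = 0; i < a->n_readers; i++)
                add_dependency(a->readers[i], t);   /* WAR */
        } else if (a->owner != NULL) {
            add_dependency(a->owner, t);            /* WAW */
        }
        a->owner = t;          /* t becomes the last owner */
        a->n_readers = 0;      /* earlier readers are now covered by t */
    }
    a->last_type = type;
}
```

Registering the six accesses of the slide in spawn order leaves Tasks 1 and 2 with no dependencies, Task 3 with two (one per earlier reader), and Tasks 4, 5 and 6 with one each.
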
  48. 2 The algorithm, 2.1 The Base Algorithm: Phase 2: Dependency Resolution

        x = malloc(...);
        #pragma omp task in(x)
        print(x);       /* 1st */
        #pragma omp task in(x)
        read(x);        /* RAR */
        #pragma omp task out(x)
        write(x);       /* WAR */
        #pragma omp task inout(x)
        readwrite(x);   /* WAW */
        #pragma omp task in(x)
        print(x);       /* RAW */
        #pragma omp task in(x)
        read(x);        /* RAR */

    [Figure: step-by-step animation of the metadata of x (type: inout; owner; readers’ list RL) and of the counters and notify lists (NL) of Tasks 1 to 6 as the tasks complete and their dependents’ resolved counters are updated.]
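
Phase 2 can be sketched in the same spirit. The code below is an assumption-laden illustration rather than PARTEE's implementation: `finish_task` and `add_dependency` are hypothetical names, a task is taken to become ready when its resolved counter reaches its dependencies counter, and the per-argument cleanup is left out. It is also sequential; a parallel runtime would presumably need atomic counter updates.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIST 8   /* assumed fixed capacity */

struct task {
    int deps;                        /* dependencies counter */
    int resolved;                    /* resolved dependencies counter */
    struct task *notify[MAX_LIST];   /* tasks waiting for this one */
    size_t n_notify;
};

/* Record that `later` must wait for `earlier` (what Phase 1 produces). */
void add_dependency(struct task *earlier, struct task *later)
{
    assert(earlier->n_notify < MAX_LIST);
    earlier->notify[earlier->n_notify++] = later;
    later->deps++;
}

/* Run by the finishing task `done`: notify every waiter and collect
 * those whose dependencies are now all resolved into `ready`.
 * Returns the number of tasks that became ready. */
size_t finish_task(struct task *done, struct task **ready, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < done->n_notify; i++) {
        struct task *waiter = done->notify[i];
        waiter->resolved++;
        if (waiter->resolved == waiter->deps) {
            assert(n < cap);
            ready[n++] = waiter;
        }
    }
    done->n_notify = 0;   /* each waiter is notified only once */
    return n;
}
```

For the six tasks of the slide, with edges from Tasks 1 and 2 to Task 3, from Task 3 to Task 4, and from Task 4 to Tasks 5 and 6, Task 3 becomes ready only after both readers have finished, and finishing Task 4 releases Tasks 5 and 6 together.
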
  53. 3 PARTEE, 3.1 The Runtime System: Metadata handling

    Per-task lookup tables (LUTs)
    ▪ Hold only associations for the arguments accessed by the child tasks
    ▪ Inherent distribution of metadata
    ▪ No contention
    ▪ Increased spatial and temporal locality
    ▪ Implemented as tries
    ▪ Memory address as key

    [Figure: Task 1, foo(x, y, z), out: x, y, in: z, shown with counter values 1 and 3, a LUT and a notify list (NL) holding Task 2, Task 3, …; the LUT entry for z shown with type: in, an owner, and a readers’ list (RL) holding Task 1, Task 4, …]
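
A lookup table implemented as a trie with the memory address as key might look like the sketch below. The slides do not give the trie's shape, so the 16-way fan-out (one level per 4-bit nibble of the address), the node layout and the `lut_*` names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FANOUT 16
#define LEVELS (sizeof(uintptr_t) * 2)   /* one level per 4-bit nibble */

struct trie_node {
    struct trie_node *child[FANOUT];
    void *value;   /* argument metadata; only set at the last level */
};

/* Nibble of `key` used at `level`, most significant nibble first. */
static unsigned nibble(uintptr_t key, size_t level)
{
    size_t shift = (LEVELS - 1 - level) * 4;
    return (unsigned)((key >> shift) & 0xF);
}

static struct trie_node *new_node(void)
{
    struct trie_node *n = malloc(sizeof *n);
    if (n != NULL)
        *n = (struct trie_node){ .value = NULL };   /* children are NULL too */
    return n;
}

/* Returns the value stored for `addr`, or NULL if the address is unknown. */
void *lut_lookup(const struct trie_node *root, const void *addr)
{
    uintptr_t key = (uintptr_t)addr;
    const struct trie_node *n = root;
    for (size_t l = 0; l < LEVELS && n != NULL; l++)
        n = n->child[nibble(key, l)];
    return n != NULL ? n->value : NULL;
}

/* Associates `value` with `addr`; returns 0 on success, -1 if out of memory. */
int lut_insert(struct trie_node *root, const void *addr, void *value)
{
    uintptr_t key = (uintptr_t)addr;
    struct trie_node *n = root;
    for (size_t l = 0; l < LEVELS; l++) {
        unsigned i = nibble(key, l);
        if (n->child[i] == NULL && (n->child[i] = new_node()) == NULL)
            return -1;
        n = n->child[i];
    }
    n->value = value;
    return 0;
}

/* Frees every node below `n`; `n` itself is owned by the caller. */
void lut_free_children(struct trie_node *n)
{
    for (unsigned i = 0; i < FANOUT; i++) {
        if (n->child[i] != NULL) {
            lut_free_children(n->child[i]);
            free(n->child[i]);
            n->child[i] = NULL;
        }
    }
}
```

Because each parent task would own its own root node, lookups for one parent's children never touch another parent's table, which is one way to read the "no contention" bullet.
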
  61. 3 PARTEE 3.1 The Runtime System 10/13 Memory Handling

    ▪ Custom region allocator
    ▪ Each task owns a region in which it allocates its:
      □ Lookup Table (LUT)
      □ Children’s task descriptors
    ▪ Bulk deallocation at task completion

    [Figure: task descriptor for Task 1, foo(x, y, z) (out: x, y; in: z), with counter fields, a LUT and an NL (Task 2, Task 3); the metadata entry for z (type: in) has an owner and an RL (Task 1.1, …); the child descriptor for Task 1.1, read(z) (in: z), has its own counter fields, LUT and NL.]
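
The region idea can be illustrated with a bump allocator whose memory is released in one step. This is a minimal sketch assuming a single fixed-capacity buffer per region; the region_* names and the alignment union are hypothetical, and the slides do not say how PARTEE sizes or grows its regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Conservative alignment unit for anything placed in a region (assumed). */
typedef union { long long ll; long double ld; void *p; void (*fp)(void); } region_align_t;
#define REGION_ALIGN sizeof(region_align_t)

typedef struct {
    unsigned char *base;   /* start of the region's buffer */
    size_t used;           /* bytes handed out so far */
    size_t cap;            /* total capacity in bytes */
} region;

/* Create a region with a fixed capacity. Returns 0 on success, -1 on failure. */
int region_init(region *r, size_t cap)
{
    assert(r != NULL);
    r->base = malloc(cap);
    r->used = 0;
    r->cap = (r->base != NULL) ? cap : 0;
    return (r->base != NULL) ? 0 : -1;
}

/* Bump-allocate size bytes; returns NULL when the region is full.
 * There is no per-object free: objects live until region_destroy. */
void *region_alloc(region *r, size_t size)
{
    size_t start;
    assert(r != NULL);
    start = (r->used + REGION_ALIGN - 1) / REGION_ALIGN * REGION_ALIGN;
    if (start > r->cap || size > r->cap - start)
        return NULL;
    r->used = start + size;
    return r->base + start;
}

/* Bulk deallocation: everything allocated from the region goes at once. */
void region_destroy(region *r)
{
    assert(r != NULL);
    free(r->base);
    r->base = NULL;
    r->used = 0;
    r->cap = 0;
}
```

Under this scheme a task would call region_alloc for its lookup table and for each child descriptor, then region_destroy once at completion, so allocation is a pointer bump and deallocation costs one free regardless of how many children were spawned.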
  66. 4 Evaluation 11/13 Evaluation Setup

    ▪ 4-chip NUMA system
    ▪ Total of 64 AMD Opteron Processor 6272 cores
    ▪ Total of 256 GB RAM
    ▪ Native sequential executions as baseline
    ▪ Geometric mean over 10 runs
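
One plausible reading of the last two bullets, written out as a formula. The symbols are assumptions: T_{seq,i} is the time of the i-th native sequential run, T_{p,i} the time of the i-th run on p cores. The slide does not say whether the mean is taken over times or over speedups; with a geometric mean the two give the same result, as the second equality shows.

```latex
\bar{T}_p = \Bigl(\prod_{i=1}^{10} T_{p,i}\Bigr)^{1/10},
\qquad
\bar{S}_p = \frac{\bar{T}_{\mathrm{seq}}}{\bar{T}_p}
          = \Bigl(\prod_{i=1}^{10} \frac{T_{\mathrm{seq},i}}{T_{p,i}}\Bigr)^{1/10}
```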
  67. 4 Evaluation 12/13 Performance

    [Figure: speedup (0.35 to 64) against number of cores (1 to 64) for Blackscholes and Matrix Multiply. Series: PARTEE, PARTEE ND, Cilk, Nanos++, Linear.]
  68. 4 Evaluation 12/13 Performance

    [Figure: speedup (0.35 to 64) against number of cores (1 to 64) for Heat Diffusion and Mergesort. Series: PARTEE, PARTEE ND, Cilk, Nanos++, Linear.]
  69. 4 Evaluation 12/13 Performance

    [Figure: speedup (0.35 to 64) against number of cores (1 to 64) for Cholesky and LU Decomposition. Series: PARTEE, PARTEE ND, Cilk, Nanos++, Linear.]
  70. 5 Remarks 13/13 Remarks

    ▪ PARTEE brings Cilk-like performance to OMPSs
    ▪ Up to 2× better than Cilk in applications with irregular dependencies
    ▪ PARTEE is licensed under the Apache License v2.0 and can be found at https://github.com/CARV-ICS-FORTH/partee (Shortened: https://is.gd/ipdps_partee)
  74. 6 Backup Slides 13/13 Overhead vs Task granularity

    [Figure: Task Time (μs) against Workload (μs), both axes logarithmic from 10^0 to 10^3. Series: Native, PARTEE, Cilk, Nanos++.]