Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs

What is this talk about?

PARTEE, a runtime system that:
■ Runs task-parallel applications efficiently
■ Detects and resolves dependencies
■ Features parallel and hierarchical dynamic dependence analysis

zakkak

May 26, 2016

Transcript

  1. Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs

    IPDPS 2016, 26th of May 2016
    Nikolaos Papakonstantinou, Foivos S. Zakkak, and Polyvios Pratikakis

    Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.
  2. 1 Introduction, 1.1 Contributions (1/13): What is this talk about?

    PARTEE, a runtime system that:
    ▪ Runs task-parallel applications efficiently
    ▪ Detects and resolves dependencies
    ▪ Features parallel and hierarchical dynamic dependence analysis

    F. Zakkak - [email protected]
  3. 1 Introduction, 1.2 The Programming Model (2/13): The OmpSs programming model

    An example

        void baz(int *k, int *l) { *k = *l; }

        void foo(int *x, int *y, int *z) {
            #pragma omp task in(z) out(x)
            baz(x, z);
            #pragma omp task in(z) out(y)
            baz(y, z);
        }

        void bar(int *k, int *l) {
            #pragma omp task in(l) out(k)
            baz(k, l);
        }

        int main(void) {
            // ...
            #pragma omp task in(z) out(x, y)
            foo(x, y, z);
            #pragma omp task in(x) out(k)
            bar(k, x);
            #pragma omp task in(m) out(l)
            baz(l, m);
            // ...
        }

    Spawn Graph (edge types: Spawn, Dependency)

        Root Task  main()        inout: x, y, z, k, l, m
        Task 1     foo(x, y, z)  out: x, y  in: z
        Task 1.1   baz(y, z)     out: y     in: z
        Task 1.2   baz(x, z)     out: x     in: z
        Task 2     bar(k, x)     out: k     in: x
        Task 2.1   baz(k, x)     out: k     in: x
        Task 3     baz(l, m)     out: l     in: m
  14. 1 Introduction, 1.2 The Programming Model (3/13): Properties of Tasks

    ▪ Parents’ memory footprints include their children’s memory footprints
    ▪ Children’s spawns depend on their parents’ scheduling
    ▪ Parents wait for their children to complete (not in OmpSs)
    ▪ Two tasks are dependent when neither is a descendant of the other and the intersection of their memory footprints is not empty

    Figure: the spawn graph of the example (Root Task main() with Tasks 1, 1.1, 1.2, 2, 2.1, and 3), with edge types Spawn, Dependency, and PM Dependency.
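
The last property can be read as a predicate over pairs of tasks. The following is a minimal sketch under stated assumptions, not PARTEE's code: the `task_t` layout, the representation of a footprint as an array of addresses, and all function names are hypothetical. It follows the slide's footprint-only definition, so it ignores access modes (two readers of the same address count as overlapping).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task descriptor: a parent link plus a footprint of addresses. */
typedef struct task {
    const struct task *parent;     /* NULL for the root task */
    const void *const *footprint;  /* addresses named in in/out/inout clauses */
    size_t footprint_len;
} task_t;

/* True if t is a strict descendant of ancestor in the spawn tree. */
bool is_descendant(const task_t *t, const task_t *ancestor)
{
    for (const task_t *p = t->parent; p != NULL; p = p->parent)
        if (p == ancestor)
            return true;
    return false;
}

/* True if the two footprints share at least one address. */
bool footprints_intersect(const task_t *a, const task_t *b)
{
    for (size_t i = 0; i < a->footprint_len; i++)
        for (size_t j = 0; j < b->footprint_len; j++)
            if (a->footprint[i] == b->footprint[j])
                return true;
    return false;
}

/* Dependent: neither task descends from the other, and footprints overlap. */
bool tasks_dependent(const task_t *a, const task_t *b)
{
    assert(a != NULL && b != NULL);
    if (a == b || is_descendant(a, b) || is_descendant(b, a))
        return false;
    return footprints_intersect(a, b);
}
```

With the tasks of the example, `foo(x, y, z)` and `bar(k, x)` are dependent because both footprints contain `x`, while `foo` and its own child `baz(x, z)` are not, since one descends from the other.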
  16. 2 The algorithm, 2.1 The Base Algorithm (5/13): High Level Description

    The algorithm consists of 2 phases

    Phase 1
    ▪ Creates dependencies at task spawns
    ▪ Run by parent task

    Phase 2
    ▪ Resolves dependencies at task completion
    ▪ Run by finishing task
  21. 2 The algorithm, 2.1 The Base Algorithm (6/13): Metadata

    For each task we keep:
    ▪ a dependencies counter
    ▪ a resolved dependencies counter
    ▪ a notify list

    For each argument we keep:
    ▪ the last access type
    ▪ the last owner
    ▪ a readers’ list

    Figure: Task 1, foo(x, y, z) (out: x, y; in: z), shown with counter values 1 and 3 and a notify list (NL) holding Task 2, Task 3, …; argument z, shown with type: in, an owner, and a readers’ list (RL) holding Task 1, Task 4, …
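
The two metadata records listed above could be declared roughly as follows. This is only an illustrative sketch: the type names, the fixed-capacity list, and `MAX_LIST` are assumptions made for brevity and are not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIST 16  /* illustrative fixed capacity, not from the talk */

typedef enum { ACCESS_NONE, ACCESS_IN, ACCESS_OUT, ACCESS_INOUT } access_t;

struct task_meta;

/* Small fixed-capacity list of tasks, used for notify and readers' lists. */
typedef struct {
    struct task_meta *items[MAX_LIST];
    size_t len;
} task_list_t;

/* Per-task metadata. */
typedef struct task_meta {
    unsigned deps;        /* dependencies counter */
    unsigned resolved;    /* resolved dependencies counter */
    task_list_t notify;   /* notify list (NL) */
} task_meta_t;

/* Per-argument metadata. */
typedef struct {
    access_t last_type;   /* last access type */
    task_meta_t *owner;   /* last owner */
    task_list_t readers;  /* readers' list (RL) */
} arg_meta_t;

/* Appends t to the list; returns 0 on success, -1 if the list is full. */
int task_list_push(task_list_t *l, task_meta_t *t)
{
    assert(l != NULL);
    if (l->len == MAX_LIST)
        return -1;
    l->items[l->len++] = t;
    return 0;
}
```

A real runtime would likely use growable or lock-free lists; the fixed array only keeps the sketch short.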
  25. 2 The algorithm, 2.1 The Base Algorithm (7/13): Phase 1: Dependency Creation

        x = malloc(...);
        #pragma omp task in(x)
        print(x);      /* 1st */
        #pragma omp task in(x)
        read(x);       /* RAR */
        #pragma omp task out(x)
        write(x);      /* WAR */
        #pragma omp task inout(x)
        readwrite(x);  /* WAW */
        #pragma omp task in(x)
        print(x);      /* RAW */
        #pragma omp task in(x)
        read(x);       /* RAR */

    Figure (animation frames overlaid): the metadata of x (type stepping through in, out, inout; owner; RL) and, for each task, two counters starting at 0 0 and a notify list (NL), followed by the values that appear as dependencies are created:

        Task 1  print(x)      in: x     0 0
        Task 2  read(x)       in: x     0 0
        Task 3  write(x)      out: x    0 0  (later values: 1, 2)
        Task 4  readwrite(x)  inout: x  0 0  (later value: 1)
        Task 5  print(x)      in: x     0 0  (later value: 1)
        Task 6  read(x)       in: x     0 0  (later value: 1)
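
One way to read Phase 1 is as a small state machine over the per-argument metadata, run by the parent at every spawn. The sketch below is a guess at that logic, not PARTEE's implementation: all names are hypothetical, lists have a fixed illustrative capacity, it is single-threaded, and it assumes no earlier task has finished yet (as in the slide's example). `last_type` is recorded but not used for decisions here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIST 16  /* illustrative capacity, not from the talk */

typedef enum { ACCESS_NONE, ACCESS_IN, ACCESS_OUT, ACCESS_INOUT } access_t;

typedef struct task {
    unsigned deps;                  /* dependencies counter */
    unsigned resolved;              /* resolved dependencies counter */
    struct task *notify[MAX_LIST];  /* notify list (NL) */
    size_t n_notify;
} task_t;

typedef struct {
    access_t last_type;             /* last access type */
    task_t *owner;                  /* last owner (most recent writer) */
    task_t *readers[MAX_LIST];      /* readers' list (RL) */
    size_t n_readers;
} arg_t;

/* Records that `later` must wait for `earlier`. */
void add_dependency(task_t *earlier, task_t *later)
{
    assert(earlier->n_notify < MAX_LIST);
    earlier->notify[earlier->n_notify++] = later;
    later->deps++;
}

/* Phase 1: the parent registers task t's access to argument a. */
void register_access(arg_t *a, task_t *t, access_t mode)
{
    assert(mode != ACCESS_NONE);
    if (mode == ACCESS_IN) {
        if (a->owner != NULL)             /* RAW: wait for the last writer */
            add_dependency(a->owner, t);
        assert(a->n_readers < MAX_LIST);
        a->readers[a->n_readers++] = t;   /* RAR: no edge between readers */
    } else {
        if (a->n_readers > 0) {           /* WAR: wait for every reader */
            for (size_t i = 0; i < a->n_readers; i++)
                add_dependency(a->readers[i], t);
            a->n_readers = 0;
        } else if (a->owner != NULL) {    /* WAW: wait for the last writer */
            add_dependency(a->owner, t);
        }
        a->owner = t;
    }
    a->last_type = mode;
}
```

Replaying the six spawns of the slide through `register_access` leaves Tasks 1 and 2 with no dependencies, Task 3 with two (one per earlier reader), and Tasks 4, 5, and 6 with one each.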
  47. 2 The algorithm, 2.1 The Base Algorithm (8/13): Phase 2: Dependency Resolution

        x = malloc(...);
        #pragma omp task in(x)
        print(x);      /* 1st */
        #pragma omp task in(x)
        read(x);       /* RAR */
        #pragma omp task out(x)
        write(x);      /* WAR */
        #pragma omp task inout(x)
        readwrite(x);  /* WAW */
        #pragma omp task in(x)
        print(x);      /* RAW */
        #pragma omp task in(x)
        read(x);       /* RAR */

    Figure (animation frames overlaid): the metadata of x (type: inout; owner; RL) and, for each task, its notify list (NL) and the counter values shown as dependencies are resolved:

        Task 1  print(x)      in: x     0 0
        Task 2  read(x)       in: x     0 0
        Task 3  write(x)      out: x    0 1 2 2
        Task 4  readwrite(x)  inout: x  0 1 1
        Task 5  print(x)      in: x     0 1 1
        Task 6  read(x)       in: x     0 1 1
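
Phase 2 can be pictured as the finishing task walking its notify list and bumping each waiting task's resolved counter. The following is a hedged, single-threaded sketch with hypothetical names and a fixed list capacity; a parallel runtime would presumably need atomic counter updates, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIST 16  /* illustrative capacity, not from the talk */

typedef struct task {
    unsigned deps;                  /* dependencies counter */
    unsigned resolved;              /* resolved dependencies counter */
    struct task *notify[MAX_LIST];  /* notify list (NL) */
    size_t n_notify;
} task_t;

/*
 * Phase 2, run by the finishing task: notify every waiting task and collect
 * those whose dependencies are now all resolved into `ready` (capacity `cap`).
 * Returns the number of tasks that became ready.
 */
size_t resolve_dependencies(task_t *finished, task_t **ready, size_t cap)
{
    size_t n_ready = 0;
    for (size_t i = 0; i < finished->n_notify; i++) {
        task_t *succ = finished->notify[i];
        assert(succ->resolved < succ->deps);
        succ->resolved++;
        if (succ->resolved == succ->deps) {
            assert(n_ready < cap);
            ready[n_ready++] = succ;
        }
    }
    finished->n_notify = 0;  /* each waiting task is notified exactly once */
    return n_ready;
}
```

In the slide's example, Task 3 waits on both Task 1 and Task 2, so it only becomes ready after the second of them finishes and its resolved counter reaches 2.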
  52. 3 PARTEE, 3.1 The Runtime System (9/13): Metadata handling

    Per-task lookup tables (LUTs)
    ▪ Hold only associations for the arguments accessed by the children tasks
    ▪ Inherent distribution of metadata
    ▪ No contention
    ▪ Increased spatial and temporal locality
    ▪ Implemented as tries
    ▪ Memory address as key

    Figure: Task 1, foo(x, y, z) (out: x, y; in: z), shown with counter values 1 and 3, a LUT, and a notify list (NL) holding Task 2, Task 3, …; the LUT entry for z, shown with type: in, an owner, and a readers’ list (RL) holding Task 1, Task 4, …
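
To make "tries with the memory address as key" concrete, here is a minimal sketch of a fixed-depth radix trie that maps an address to a metadata pointer. The 4-bit fan-out, the node layout, and the function names are assumptions chosen for illustration; the talk does not specify how PARTEE's tries are laid out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_LEVEL 4
#define FANOUT (1u << BITS_PER_LEVEL)
#define LEVELS (sizeof(uintptr_t) * 8 / BITS_PER_LEVEL)

typedef struct trie_node {
    struct trie_node *child[FANOUT];
    void *value;  /* metadata pointer; only meaningful at the last level */
} trie_node_t;

/*
 * Returns the value slot for addr. When create is nonzero, missing nodes are
 * allocated on the way down; otherwise NULL is returned for unknown addresses.
 * NULL is also returned if an allocation fails.
 */
void **trie_slot(trie_node_t *root, const void *addr, int create)
{
    assert(root != NULL);
    uintptr_t key = (uintptr_t)addr;
    trie_node_t *n = root;
    for (size_t level = 0; level < LEVELS; level++) {
        unsigned shift = (unsigned)((LEVELS - 1 - level) * BITS_PER_LEVEL);
        unsigned idx = (unsigned)((key >> shift) & (FANOUT - 1));
        if (n->child[idx] == NULL) {
            if (!create)
                return NULL;
            n->child[idx] = calloc(1, sizeof(trie_node_t));
            if (n->child[idx] == NULL)
                return NULL;
        }
        n = n->child[idx];
    }
    return &n->value;
}

/* Frees every node below n (n itself is owned by the caller). */
void trie_free_children(trie_node_t *n)
{
    for (unsigned i = 0; i < FANOUT; i++) {
        if (n->child[i] != NULL) {
            trie_free_children(n->child[i]);
            free(n->child[i]);
            n->child[i] = NULL;
        }
    }
}
```

Because each task would own its own root node, lookups by different parents touch disjoint tries, which is one way to picture the "no contention" point on the slide.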
  60. 3 PARTEE 3.1 The Runtime System: Memory Handling

    ▪ Custom region allocator
    ▪ Each task owns a region in which it allocates its:
      □ Lookup Table (LUT)
      □ Children’s task descriptors
    ▪ Bulk deallocation at task completion

    [Diagram: Task 1, foo(x, y, z), out: x, y, in: z, with its LUT and NL (Task 2, Task 3); LUT entry for z, type: in, with owner and RL (Task 1.1, …); child Task 1.1, read(z), in: z, with its own LUT and NL.]
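
The region scheme above (allocate by bumping a pointer, free everything at once) can be sketched as below. This is a generic region allocator written for illustration, not PARTEE's allocator: the chunk size, the 16-byte alignment, and all names (`region`, `region_init`, `region_alloc`, `region_free_all`) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define REGION_ALIGN 16u    /* assumed sufficient for any object on common 64-bit ABIs */
#define REGION_CHUNK 4096u  /* payload bytes per chunk (arbitrary choice) */

typedef struct chunk {
    struct chunk *next;     /* previously filled chunk, if any */
    size_t used;            /* payload bytes handed out so far */
    size_t cap;             /* payload capacity of this chunk */
} chunk;

typedef struct region {
    chunk *head;            /* chunk currently being filled */
    size_t chunks;          /* number of chunks owned by the region */
} region;

/* Round n up to a multiple of REGION_ALIGN (overflow is not checked). */
static size_t round_up(size_t n)
{
    return (n + REGION_ALIGN - 1) / REGION_ALIGN * REGION_ALIGN;
}

void region_init(region *r) { r->head = NULL; r->chunks = 0; }

/* Bump-allocate size bytes; individual objects are never freed. */
void *region_alloc(region *r, size_t size)
{
    assert(r != NULL);
    size = round_up(size ? size : 1);
    chunk *c = r->head;
    if (c == NULL || c->cap - c->used < size) {
        size_t cap = size > REGION_CHUNK ? size : REGION_CHUNK;
        c = malloc(round_up(sizeof(chunk)) + cap);
        if (c == NULL)
            return NULL;
        c->next = r->head;
        c->used = 0;
        c->cap = cap;
        r->head = c;
        r->chunks++;
    }
    void *p = (unsigned char *)c + round_up(sizeof(chunk)) + c->used;
    c->used += size;
    return p;
}

/* Bulk deallocation: release every chunk in one pass. */
void region_free_all(region *r)
{
    chunk *c = r->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    r->head = NULL;
    r->chunks = 0;
}
```

In the setting described on the slide, a task would call something like `region_alloc` for its LUT nodes and for each child's task descriptor, and a single `region_free_all` at task completion would replace many individual `free` calls.
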
  65. 4 Evaluation: Evaluation Setup

    ▪ 4-chip NUMA system
    ▪ Total of 64 AMD Opteron Processor 6272 cores
    ▪ Total of 256GB RAM
    ▪ Native sequential executions as baseline
    ▪ Geometric mean over 10 runs
  66. 4 Evaluation: Performance

    [Figure: speedup vs. number of cores (1 to 64) for Blackscholes and Matrix Multiply; series: PARTEE, PARTEE ND, Cilk, Nanos++, Linear.]
  67. 4 Evaluation: Performance

    [Figure: speedup vs. number of cores (1 to 64) for Heat Diffusion and Mergesort; series: PARTEE, PARTEE ND, Cilk, Nanos++, Linear.]
  68. 4 Evaluation: Performance

    [Figure: speedup vs. number of cores (1 to 64) for Cholesky and LU Decomposition; series: PARTEE, PARTEE ND, Cilk, Nanos++, Linear.]
  69. 5 Remarks

    ▪ PARTEE brings Cilk-like performance to OMPSs
    ▪ Up to 2× better than Cilk in applications with irregular dependencies
    ▪ PARTEE is licensed under the Apache License v2.0 and can be found at https://github.com/CARV-ICS-FORTH/partee (Shortened: https://is.gd/ipdps_partee)
  73. 6 Backup Slides: Overhead vs Task granularity

    [Figure: task time (μs) against workload (μs), both axes logarithmic from 10^0 to 10^3; series: Native, PARTEE, Cilk, Nanos++.]