Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs
What is this talk about?
PARTEE, a runtime system that:
■ Runs task-parallel applications efficiently
■ Detects and resolves dependencies
■ Features parallel and hierarchical dynamic dependence analysis
2016, 26th of May 2016
Nikolaos Papakonstantinou, Foivos S. Zakkak, and Polyvios Pratikakis
Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.
F. Zakkak - zakkak@ics.forth.gr
▪ Parents’ memory footprints include their children’s memory footprints
▪ Children’s spawns depend on their parents’ scheduling
▪ Parents wait for their children to complete (not in OMPSs)
▪ Two tasks are dependent when neither is a descendant task of the other and the intersection of their memory footprints is not the empty set
Figure: example task tree (legend: Spawn Dependency, PM Dependency)
Root Task: main() inout: x,y,z,k,l,m
  Task 1: foo(x, y, z) out: x, y in: z
    Task 1.1: baz(y, z) out: y in: z
    Task 1.2: baz(x, z) out: x in: z
  Task 2: bar(k, x) out: k in: x
    Task 2.1: baz(k, x) out: k in: x
  Task 3: baz(l, m) out: l in: m
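The dependence rule above can be sketched in a few lines. This is only an illustration, not PARTEE code: the `Task` class, `is_descendant`, and `are_dependent` are hypothetical names, variable names stand in for memory addresses, and access modes (in/out) are ignored, so two readers of the same argument also count as overlapping here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Task:
    name: str
    footprint: frozenset            # variable names stand in for addresses
    parent: Optional["Task"] = None


def is_descendant(task, ancestor):
    """True when `ancestor` appears on the parent chain of `task`."""
    node = task.parent
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def are_dependent(a, b):
    """Neither task descends from the other and their footprints overlap."""
    if is_descendant(a, b) or is_descendant(b, a):
        return False
    return bool(a.footprint & b.footprint)


# Part of the task tree from the slide.
root = Task("main", frozenset("xyzklm"))
t1 = Task("foo", frozenset("xyz"), root)
t11 = Task("baz", frozenset("yz"), t1)
t12 = Task("baz", frozenset("xz"), t1)
t2 = Task("bar", frozenset("kx"), root)
t3 = Task("baz", frozenset("lm"), root)
```

With these definitions, `are_dependent(t1, t2)` holds because both touch `x`, while `are_dependent(t1, t11)` does not because Task 1.1 is a descendant of Task 1.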
Description
The algorithm consists of 2 phases
Phase 1
▪ Creates dependencies at task spawns
▪ Run by parent task
Phase 2
▪ Resolves dependencies at task completion
▪ Run by finishing task
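A minimal single-threaded sketch of the two phases, assuming each task carries a dependencies counter, a resolved-dependencies counter, and a notify list. The names `spawn`, `complete`, and `is_ready` are hypothetical, and the predecessors are passed in explicitly instead of being discovered by the analysis. A real runtime would also need atomic updates, which are omitted here.

```python
class Task:
    def __init__(self, name):
        self.name = name
        self.dependencies = 0     # dependencies created at spawn
        self.resolved = 0         # dependencies resolved so far
        self.notify_list = []     # tasks to notify when this one completes
        self.done = False

    def is_ready(self):
        return self.resolved == self.dependencies


def spawn(task, predecessors):
    """Phase 1, run by the parent: create dependencies for a new task."""
    for pred in predecessors:
        if not pred.done:
            task.dependencies += 1
            pred.notify_list.append(task)
    return task.is_ready()


def complete(task):
    """Phase 2, run by the finishing task: resolve dependencies."""
    task.done = True
    released = []
    for waiter in task.notify_list:
        waiter.resolved += 1
        if waiter.is_ready():
            released.append(waiter)
    task.notify_list.clear()
    return released
```

`spawn` returns whether the new task can run immediately, and `complete` returns the tasks that became ready as a result.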
For each task we keep:
▪ a dependencies counter
▪ a resolved dependencies counter
▪ a notify list
For each argument we keep:
▪ the last access type
▪ the last owner
▪ a readers’ list
[Figure: descriptor of Task 1, foo(x, y, z) out: x, y in: z, with counters 1 and 3 and notify list (NL): Task 2, Task 3, …; metadata of argument z with type: in, an owner, and readers’ list (RL): Task 1, Task 4, …]
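One way the per-argument metadata could be used to find the tasks a new access must wait for is sketched below. This is an assumption about how such records are typically used, not a description of PARTEE internals: `ArgMeta` and `predecessors_for` are hypothetical names, and tasks are represented by plain strings.

```python
IN, OUT = "in", "out"


class ArgMeta:
    """Per-argument record: last access type, last owner, readers' list."""

    def __init__(self):
        self.last_type = None
        self.owner = None       # last task that wrote the argument
        self.readers = []       # tasks that read it since the last write


def predecessors_for(task, mode, meta):
    """Return the tasks `task` must wait for, then record the new access."""
    if mode == IN:
        # A reader waits only for the last writer.
        preds = [meta.owner] if meta.owner is not None else []
        meta.readers.append(task)
    else:
        if meta.readers:
            # A writer waits for all readers since the last write; those
            # readers already wait for the previous writer.
            preds = list(meta.readers)
        elif meta.owner is not None:
            preds = [meta.owner]
        else:
            preds = []
        meta.owner = task
        meta.readers = []
    meta.last_type = mode
    return preds
```

For example, after `T1` writes and `T2` and `T3` read, a write by `T4` waits for `T2` and `T3` only.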
Lookup tables (LUTs)
▪ Hold only associations for the arguments accessed by the children tasks
▪ Inherent distribution of metadata
▪ No contention
▪ Increased spatial and temporal locality
▪ Implemented as tries
▪ Memory address as key
[Figure: descriptor of Task 1, foo(x, y, z) out: x, y in: z, with counters 1 and 3, a LUT, and notify list (NL): Task 2, Task 3, …; the LUT entry for z holds type: in, an owner, and readers’ list (RL): Task 1, Task 4, …]
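A trie keyed by memory address can be sketched as follows. The layout is an assumption made for illustration (64-bit addresses, 8 bits per level, Python dicts as nodes); the slides do not state the fan-out or node layout PARTEE uses, and `AddressTrie` is a hypothetical name.

```python
class AddressTrie:
    """Fixed-depth trie over an address, `bits` bits consumed per level."""

    def __init__(self, bits=8, width=64):
        self.bits = bits
        self.levels = width // bits
        self.mask = (1 << bits) - 1
        self.root = {}

    def _chunks(self, addr):
        # Most significant chunk first, so nearby addresses share a path.
        for level in range(self.levels - 1, -1, -1):
            yield (addr >> (level * self.bits)) & self.mask

    def lookup_or_insert(self, addr, factory):
        """Return the record stored at `addr`, creating it if missing."""
        *inner, leaf = self._chunks(addr)
        node = self.root
        for chunk in inner:
            node = node.setdefault(chunk, {})
        if leaf not in node:
            node[leaf] = factory()
        return node[leaf]

    def get(self, addr):
        """Return the record stored at `addr`, or None if absent."""
        node = self.root
        for chunk in self._chunks(addr):
            node = node.get(chunk)
            if node is None:
                return None
        return node
```

Because each task would own its own table, lookups for one task's children never touch another task's trie, which is the source of the "no contention" point above.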
Custom region allocator
▪ Each task owns a region in which it allocates its:
□ Lookup Table (LUT)
□ Children’s task descriptors
▪ Bulk deallocation at task completion
[Figure: Task 1, foo(x, y, z) out: x, y in: z, with values 1, 3, 3, a LUT, and NL: Task 2, Task 3; its LUT entry for z holds type: in, an owner, and RL: Task 1.1, …; child Task 1.1, read(z) in: z, with counters 0 and 0, a LUT, and an NL]
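The idea of a per-task region with bulk deallocation can be illustrated with a bump allocator. This sketch only models the bookkeeping: `Region`, `alloc`, and `release_all` are hypothetical names, the region has a fixed capacity, and allocations are returned as offsets into a buffer instead of real pointers.

```python
class Region:
    """Bump allocator over one preallocated buffer."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)
        self.top = 0                      # offset of the next free byte

    def alloc(self, size, align=8):
        """Reserve `size` bytes and return their offset in the buffer."""
        start = (self.top + align - 1) & ~(align - 1)   # round up
        if start + size > len(self.buffer):
            raise MemoryError("region exhausted")
        self.top = start + size
        return start

    def release_all(self):
        """Bulk deallocation: everything allocated so far becomes free."""
        self.top = 0
```

A task would call `alloc` for its LUT nodes and its children's descriptors, and a single `release_all` at completion frees them together without walking individual objects.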
▪ Total of 64 AMD Opteron Processor 6272 cores
▪ Total of 256GB RAM
▪ Native sequential executions as baseline
▪ Geometric mean over 10 runs
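A speedup over the sequential baseline could be computed from repeated runs as sketched below. The slides do not say whether the geometric mean is taken over run times or over per-run speedups; this sketch assumes run times, and `speedup` is a hypothetical helper.

```python
from statistics import geometric_mean


def speedup(sequential_times, parallel_times):
    """Ratio of geometric-mean run times: baseline over parallel."""
    return geometric_mean(sequential_times) / geometric_mean(parallel_times)
```

Each argument would hold the measured times of the 10 runs of one configuration.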
OMPSs
▪ Up to 2× better than Cilk in applications with irregular dependencies
▪ PARTEE is licensed under the Apache License v2.0 and can be found at https://github.com/CARV-ICS-FORTH/partee (Shortened: https://is.gd/ipdps_partee)
Thank You!