Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs
What is this talk about?
PARTEE, a runtime system that:
■ Runs task-parallel applications efficiently
■ Detects and resolves dependencies
■ Features parallel and hierarchical dynamic dependence analysis
2016, 26th of May 2016
Nikolaos Papakonstantinou, Foivos S. Zakkak, and Polyvios Pratikakis
Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.
F. Zakkak - [email protected]
▪ Parents’ memory footprints include their children’s memory footprints
▪ Children’s spawns depend on their parents’ scheduling
▪ Parents wait for their children to complete (not in OMPSs)
▪ Two tasks are dependent when neither is a descendant task of the other and the intersection of their memory footprints is not empty
Figure: task tree. Legend: Spawn Dependency, PM Dependency
▪ Root Task: main() inout: x,y,z,k,l,m
▪ Task 1: foo(x, y, z) out: x, y in: z
□ Task 1.1: baz(y, z) out: y in: z
□ Task 1.2: baz(x, z) out: x in: z
▪ Task 2: bar(k, x) out: k in: x
□ Task 2.1: baz(k, x) out: k in: x
▪ Task 3: baz(l, m) out: l in: m
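The dependence rule stated on this slide can be written as a small predicate. The following is only a sketch under assumptions: task ids are modelled as tuples (so Task 1.2 is `(1, 2)` and the root task is `()`), footprints are sets of argument names taken from the task tree, and the helper names `is_descendant` and `are_dependent` are hypothetical, not PARTEE functions. It applies the rule literally and does not distinguish `in` from `out` accesses.

```python
def is_descendant(a, b):
    """True if task a is a strict descendant of task b.

    Task ids are tuples (an assumption of this sketch): Task 1.2 is (1, 2),
    the root task is the empty tuple ().
    """
    return len(a) > len(b) and a[:len(b)] == b


def are_dependent(id_a, footprint_a, id_b, footprint_b):
    """Apply the slide's rule: neither task descends from the other,
    and their memory footprints intersect."""
    related = is_descendant(id_a, id_b) or is_descendant(id_b, id_a)
    return not related and bool(footprint_a & footprint_b)


# Footprints taken from the task tree on the slide.
footprints = {
    (1,): {"x", "y", "z"},   # Task 1:   foo(x, y, z)
    (1, 1): {"y", "z"},      # Task 1.1: baz(y, z)
    (1, 2): {"x", "z"},      # Task 1.2: baz(x, z)
    (2,): {"k", "x"},        # Task 2:   bar(k, x)
    (2, 1): {"k", "x"},      # Task 2.1: baz(k, x)
    (3,): {"l", "m"},        # Task 3:   baz(l, m)
}

# Task 1 and Task 2 share x and are not related, so they are dependent.
t1_vs_t2 = are_dependent((1,), footprints[(1,)], (2,), footprints[(2,)])
# Task 1 and its child Task 1.1 overlap, but one descends from the other.
t1_vs_t11 = are_dependent((1,), footprints[(1,)], (1, 1), footprints[(1, 1)])
```

Under this encoding, every task is a descendant of the root task, so the root is never reported as dependent on anything it spawned.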
Description
The algorithm consists of 2 phases
Phase 1
▪ Creates dependencies at task spawns
▪ Run by parent task
Phase 2
▪ Resolves dependencies at task completion
▪ Run by finishing task
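The two phases can be sketched with two counters and a notify list per task. This is a single-threaded illustration under assumptions, not PARTEE's code: the `Task` class and the functions `spawn` and `complete` are hypothetical names, the set of predecessors is assumed to be known already at spawn time, and a real runtime would need atomic updates because the phases run on different threads.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    deps: int = 0                                # dependencies counter
    resolved: int = 0                            # resolved dependencies counter
    notify: list = field(default_factory=list)   # tasks waiting on this one
    done: bool = False

    def ready(self):
        return self.resolved == self.deps


def spawn(child, predecessors):
    """Phase 1 (run by the parent): create one dependency per unfinished
    predecessor and register the child in that predecessor's notify list."""
    for pred in predecessors:
        if not pred.done:
            child.deps += 1
            pred.notify.append(child)
    return child.ready()


def complete(task):
    """Phase 2 (run by the finishing task): resolve the dependencies of the
    waiting tasks and return those that became ready to run."""
    task.done = True
    released = []
    for waiter in task.notify:
        waiter.resolved += 1
        if waiter.ready():
            released.append(waiter)
    task.notify.clear()
    return released
```

In this sketch a task is runnable once its resolved counter catches up with its dependencies counter, so Phase 2 only needs to increment a counter per entry of the notify list.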
For each task we keep:
▪ a dependencies counter
▪ a resolved dependencies counter
▪ a notify list
For each argument we keep:
▪ the last access type
▪ the last owner
▪ a readers’ list
Figure: descriptor of Task 1, foo(x, y, z) out: x, y in: z, with counters 1 and 3 and notify list (NL): Task 2, Task 3, …; metadata of argument z with type: in, owner, and readers’ list (RL): Task 1, Task 4, …
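One way the per-argument metadata could be used to find the tasks a new access must wait for is sketched here. The interpretation is an assumption: the slide only names the three fields, so the `ArgMeta` class, the `predecessors` function, and the exact update rules (a reader waits for the last owner, a writer waits for the pending readers or else for the last owner) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArgMeta:
    last_type: Optional[str] = None               # last access type: "in" or "out"
    owner: Optional[str] = None                   # last owner (last writer, by assumption)
    readers: list = field(default_factory=list)   # readers' list


def predecessors(meta, task, mode):
    """Return the tasks that `task` must wait for when it accesses the
    argument with `mode`, and update the metadata for later accesses."""
    if mode == "in":
        # A reader waits for the last writer, if any, and joins the readers' list.
        preds = [meta.owner] if meta.owner is not None else []
        meta.readers.append(task)
    else:
        # A writer waits for all pending readers; with no readers it waits
        # for the previous writer.
        if meta.readers:
            preds = list(meta.readers)
        else:
            preds = [meta.owner] if meta.owner is not None else []
        meta.owner = task
        meta.readers = []
    meta.last_type = mode
    return preds
```

For example, after `T1` writes an argument and `T2` and `T3` read it, a later writer `T4` would wait for `T2` and `T3`; those readers already wait for `T1`, so the ordering with `T1` follows transitively.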
Lookup tables (LUTs)
▪ Hold only associations for the arguments accessed by the child tasks
▪ Inherent distribution of metadata
▪ No contention
▪ Increased spatial and temporal locality
▪ Implemented as tries
▪ Memory address as key
Figure: descriptor of Task 1, foo(x, y, z) out: x, y in: z, with counters 1 and 3, a LUT, and notify list (NL): Task 2, Task 3, …; LUT entry for z with type: in, owner, and readers’ list (RL): Task 1, Task 4, …
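A trie keyed by memory address can be sketched as a fixed-depth tree that consumes the address one chunk at a time. The details here are assumptions made for illustration: 64-bit addresses, 8-bit chunks, Python dictionaries as nodes, and the class name `AddressTrie` are not taken from PARTEE.

```python
class AddressTrie:
    """Fixed-depth trie over the bytes of a 64-bit address,
    most significant byte first (chunk size is an assumption)."""

    LEVELS = 8  # 8 chunks of 8 bits each

    def __init__(self):
        self.root = {}

    def _chunks(self, addr):
        return [(addr >> (8 * i)) & 0xFF for i in reversed(range(self.LEVELS))]

    def insert(self, addr, meta):
        """Associate `meta` with `addr`, creating inner nodes as needed."""
        *inner, last = self._chunks(addr)
        node = self.root
        for chunk in inner:
            node = node.setdefault(chunk, {})
        node[last] = meta

    def lookup(self, addr):
        """Return the metadata stored for `addr`, or None if absent."""
        node = self.root
        for chunk in self._chunks(addr):
            node = node.get(chunk)
            if node is None:
                return None
        return node


# One LUT per task: each parent only stores the arguments its children access.
lut_task1 = AddressTrie()
lut_task1.insert(0x7FFD1000, "metadata for z")
```

Because nearby addresses share most of their leading chunks, they also share most of the path through the trie, which fits the locality point made on the slide.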
Custom region allocator
▪ Each task owns a region in which it allocates its:
□ Lookup Table (LUT)
□ Children’s task descriptors
▪ Bulk deallocation at task completion
Figure: descriptor of Task 1, foo(x, y, z) out: x, y in: z, with values 1, 3, 3, a LUT, and notify list (NL): Task 2, Task 3; LUT entry for z with type: in, owner, and readers’ list (RL): Task 1.1, …; descriptor of child Task 1.1, read(z) in: z, with counters 0 and 0, a LUT, and an NL
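The idea of a per-task region with bulk deallocation can be illustrated with a bump allocator. This is a minimal sketch, assuming a fixed-capacity buffer and 8-byte alignment; the `Region` class and its methods are hypothetical and say nothing about how PARTEE's allocator is actually built.

```python
class Region:
    """Bump allocator over one contiguous buffer. Individual objects are
    never freed; the whole region is released at once."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.top = 0  # offset of the first free byte

    def alloc(self, size, align=8):
        """Reserve `size` bytes and return their start offset in the buffer."""
        start = (self.top + align - 1) // align * align
        if start + size > len(self.buf):
            raise MemoryError("region exhausted")
        self.top = start + size
        return start

    def release(self):
        """Bulk deallocation: everything allocated in the region is dropped."""
        self.top = 0


# A task allocates its LUT nodes and its children's descriptors in its own
# region and releases the region when it completes.
region = Region(64)
lut_node = region.alloc(10)
child_descriptor = region.alloc(4)
```

Allocation is a single pointer bump, and releasing the region costs the same regardless of how many objects were placed in it.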
▪ Total of 64 AMD Opteron Processor 6272 Cores
▪ Total of 256 GB RAM
▪ Native sequential executions as baseline
▪ Geometric mean over 10 runs
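A speedup against the sequential baseline, summarised with a geometric mean over the runs, could be computed as in the following sketch. How the talk combined the 10 runs is not stated, so taking the geometric mean of each set of timings before dividing is an assumption, and the timings in the example are made-up placeholders, not measurements.

```python
from statistics import geometric_mean


def speedup(sequential_times, parallel_times):
    """Speedup of the parallel runs over the native sequential baseline,
    using the geometric mean of each set of run times (an assumption)."""
    return geometric_mean(sequential_times) / geometric_mean(parallel_times)


# Placeholder timings in seconds for 10 runs each; not measured data.
example = speedup([8.0] * 10, [2.0] * 10)
```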
OMPSs
▪ Up to 2× better than Cilk in applications with irregular dependencies
▪ PARTEE is licensed under the Apache License v2.0 and can be found at https://github.com/CARV-ICS-FORTH/partee (Shortened: https://is.gd/ipdps_partee)
Thank You!