Aleksandar Prokopec
September 25, 2013

# Work-stealing Tree Scheduler

Explanation of the work-stealing tree scheduler used in ScalaBlitz.


## Transcript

1. ### Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

Aleksandar Prokopec, Martin Odersky
2. ### Near Optimal Work-Stealing Tree for Highly *Irregular Data-Parallel* Workloads

Aleksandar Prokopec, Martin Odersky

5. ### Uniform workload

`(0 until 10000000) reduce (_ + _)`: every element costs the same, a single `sum = sum + x` step. (Figure: per-element work in cycles over the N elements.)


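9. ### Irregular workload

`for { x <- 0 until width; y <- 0 until height } image(x, y) = compute(x, y)`

The cost of `compute(x, y)` differs from pixel to pixel. (Figure: per-element work in cycles over the N elements.)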
11. ### Workload function

`workload(n)`: the work spent on element `n`, known only after the data-parallel operation has completed.
12. ### Workload function

It could be *runtime-value dependent*:

`for { x <- 0 until width; y <- 0 until height } img(x, y) = compute(x, y)`
13. ### Workload function

It could be *execution-schedule dependent*:

`for (n <- nodes) n.neighbours += new Node`
14. ### Workload function

It could be *totally random*:

`for ((x, y) <- img.indices) img(x, y) = sample(x + random(), y + random())`
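To make `workload(n)` concrete, here is a small sketch of a runtime-value-dependent workload. It assumes a hypothetical `compute(x, y)` that runs a Mandelbrot-style escape-time loop and returns its iteration count as the work done for that pixel. The image size, coordinate mapping and iteration cap are illustrative assumptions, not values from the talk.

```scala
object WorkloadSketch {
  val width = 64      // assumed image size
  val height = 64
  val maxIters = 1000 // assumed iteration cap

  // Hypothetical compute(x, y): escape-time loop; the iteration count is the work spent on the pixel.
  def compute(x: Int, y: Int): Int = {
    val cr = -2.0 + 3.0 * x / width  // maps x to [-2, 1)
    val ci = -1.5 + 3.0 * y / height // maps y to [-1.5, 1.5)
    var zr = 0.0
    var zi = 0.0
    var i = 0
    while (i < maxIters && zr * zr + zi * zi <= 4.0) {
      val t = zr * zr - zi * zi + cr
      zi = 2 * zr * zi + ci
      zr = t
      i += 1
    }
    i
  }

  // workload(n) for `for { x <- 0 until width; y <- 0 until height }`, flattened as n = x * height + y
  def workload(n: Int): Int = compute(n / height, n % height)

  def main(args: Array[String]): Unit = {
    val w = (0 until width * height).map(workload)
    println(s"cheapest element: ${w.min} iterations, most expensive: ${w.max} iterations")
  }
}
```

Points inside the set run until the cap, while points far outside escape after a few iterations, so equally sized slices of the loop can carry very different amounts of work.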

17. ### Data-parallel scheduler

Goals: (1) linear speedup for the baseline workload; (2) optimal speedup for irregular workloads. The scheduler must assign loop elements to workers without knowledge of the workload function.
19. ### Static batching

Decides on the worker-element assignment before the data-parallel operation begins. With no knowledge of the workload, it divides the elements uniformly, which is not optimal for even mildly irregular workloads. (Figure: per-element work in cycles over the N elements.)
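Static batching can be sketched as a pure function of the element count `n` and the worker count `p`: the ranges are fixed before any element runs, so they cannot react to `workload(n)`. The `partition` name and the choice to give the first `n % p` workers one extra element are assumptions for illustration.

```scala
object StaticBatching {
  // Split [0, n) into p contiguous ranges whose sizes differ by at most one.
  // The assignment is decided before any element is processed.
  def partition(n: Int, p: Int): Seq[Range] = {
    val base = n / p
    val extra = n % p
    (0 until p).map { w =>
      val from = w * base + math.min(w, extra)
      val size = base + (if (w < extra) 1 else 0)
      from until (from + size)
    }
  }

  def main(args: Array[String]): Unit =
    partition(10, 4).zipWithIndex.foreach { case (r, w) =>
      println(s"worker $w: ${r.start} until ${r.end}")
    }
}
```

If the expensive elements cluster in one of these ranges, that worker finishes last while the others sit idle, which is the weakness the slide points out.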

22. ### Fixed-size batching

Workload-driven: the assignment is decided during execution. Workers T0 and T1 claim the next fixed-size batch by advancing a shared progress counter with a CAS; in the example each successful CAS moves the counter forward by 2 (2, 4, 6, 8, 10, 12). (Figure: per-element work in cycles over the N elements.)
28. ### Fixed-size batching

Pros: lightweight. Cons: a minimum batch size is needed, and all workers contend on the shared counter. (Figure: progress over the N elements.)
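The CAS loop behind fixed-size batching can be sketched with one shared `AtomicInteger`: a worker reads the current index and tries to move it forward by one batch with `compareAndSet`; if another worker got there first, it rereads and retries. The object name, batch size and thread count are assumptions for illustration.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicIntegerArray}

object FixedSizeBatching {
  // Process [0, n) with `workers` threads, each claiming `batch` elements per successful CAS.
  def run(n: Int, workers: Int, batch: Int)(body: Int => Unit): Unit = {
    val next = new AtomicInteger(0) // first element not yet claimed by any worker
    def work(): Unit = {
      var p = next.get
      while (p < n) {
        val np = math.min(p + batch, n)
        if (next.compareAndSet(p, np)) { // claimed [p, np); all workers contend on this one counter
          var i = p
          while (i < np) { body(i); i += 1 }
        }
        p = next.get
      }
    }
    val threads = Seq.fill(workers)(new Thread(() => work()))
    threads.foreach(_.start())
    threads.foreach(_.join())
  }

  def main(args: Array[String]): Unit = {
    val counts = new AtomicIntegerArray(1000)
    run(1000, 4, 2)(i => counts.incrementAndGet(i))
    println((0 until 1000).forall(i => counts.get(i) == 1))
  }
}
```

Every claim goes through the same memory location, so smaller batches mean more CAS traffic on it; larger batches reduce that traffic but make the split coarser, which is the trade-off listed on the slide.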

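30. ### Factoring, GSS, TS

In factoring, guided self-scheduling (GSS) and trapezoid self-scheduling (TS), the batch size varies during execution. Pros: lightweight. Cons: contention. (Figure: progress over the N elements.)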
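Guided self-scheduling (GSS) is one of the variable-size schemes named on this slide. In its usual formulation each request receives `ceil(R / P)` elements, where `R` is the number of unclaimed elements and `P` the number of workers, so batches shrink as the loop nears its end. The sketch below only computes that sequence of batch sizes sequentially; factoring and TS use different size rules and are not shown, and the object name is hypothetical.

```scala
object GuidedSelfScheduling {
  // Batch sizes GSS hands out for n elements and p workers: each batch is ceil(remaining / p).
  def batchSizes(n: Int, p: Int): List[Int] = {
    val sizes = List.newBuilder[Int]
    var remaining = n
    while (remaining > 0) {
      val size = (remaining + p - 1) / p // ceil(remaining / p), always at least 1
      sizes += size
      remaining -= size
    }
    sizes.result()
  }

  def main(args: Array[String]): Unit =
    println(batchSizes(100, 4)) // List(25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1)
}
```

Large early batches keep the number of claims low, and small late batches let workers finish close together. Each batch is still claimed from a shared counter, which is why the slide still lists contention as a drawback.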

32. ### Task-based work-stealing

Worker T0's range is split into tasks of growing size: `0..2`, `2..4`, `4..8`, `8..16`. Worker T1 steals a task (stealing is a rare event) and splits the stolen `8..16` further into `8..10`, `10..12` and `12..16`.
35. ### Task-based work-stealing

Pros: can be adaptive, since it uses stealing information. Cons: heavyweight, so the minimum batch size is much larger.
36. ### Task-based work-stealing

Once T0 starts processing a task, that task can no longer be stolen.
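The task boundaries in this example follow a doubling pattern: batch ends lie 2, 4, 8, 16, … elements past the start of the range, capped at its end. That one rule reproduces both T0's `0..2 2..4 4..8 8..16` and T1's re-split of the stolen `8..16` into `8..10 10..12 12..16`. The sketch below is inferred from those numbers only; the doubling rule and the `batches` name are assumptions, and the deques and task machinery of a real work-stealing pool are left out.

```scala
object ExponentialBatches {
  // Split [from, until) into batches ending at from + 2, from + 4, from + 8, ..., capped at until.
  def batches(from: Int, until: Int): List[(Int, Int)] = {
    val result = List.newBuilder[(Int, Int)]
    var start = from
    var offset = 2L // Long, so repeated doubling cannot overflow on large ranges
    while (start < until) {
      val end = math.min(from + offset, until.toLong).toInt
      result += ((start, end))
      start = end
      offset *= 2
    }
    result.result()
  }

  def main(args: Array[String]): Unit = {
    println(batches(0, 16)) // List((0,2), (2,4), (4,8), (8,16))
    println(batches(8, 16)) // List((8,10), (10,12), (12,16))
  }
}
```

Each pair would become one task. The sketch offers no way to split a task once a worker has started it, which is the limitation slide 36 points out.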

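38. ### Work-stealing tree

Each node is shown as `start progress owner until`. The root `0 0 T0 N` is owned by T0, which advances its progress with a CAS, for example to `0 50 T0 N`.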
39. ### Work-stealing tree

T0 keeps advancing the node with CAS operations until it is completed (`0 N T0 N`). What about stealing?
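40. ### Work-stealing tree

While the node is at `0 50 T0 N`, a thief T1 can steal it with a CAS that replaces the progress 50 with the negative value -51, giving `0 -51 T0 N` (stolen). T0 can no longer advance a stolen node.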
41. ### Work-stealing tree

The stolen node `0 -51 T0 N` is then expanded into two children that split the remaining elements: `50 50 T0 M`, owned by T0, and `M M T1 N`, owned by the thief T1, where `M = (50 + N) / 2`. Either T0 or T1 can perform the expansion CAS; afterwards, T0 continues advancing its child with CAS operations.
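The node transitions in this sequence can be replayed with one atomic progress counter per node. The sketch below assumes the encoding suggested by the numbers on the slides: a stolen node stores `-p - 1` instead of its progress `p` (so 50 becomes -51), and expansion creates the children `p p owner M` and `M M thief N` with `M = (p + N) / 2`. The `TreeNode` class and its methods are hypothetical, and the demo runs on a single thread so each state stays visible.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

// Hypothetical work-stealing tree node over [start, until), printed as "start progress owner until".
final class TreeNode(val start: Int, val until: Int, val owner: String) {
  // progress >= 0: next element to process; progress < 0: stolen at element -progress - 1
  val progress = new AtomicInteger(start)
  val children = new AtomicReference[(TreeNode, TreeNode)](null)

  override def toString: String = s"$start ${progress.get} $owner $until"

  // Owner: claim up to `batch` elements with one CAS; None if completed, stolen, or the CAS lost.
  def tryAdvance(batch: Int): Option[(Int, Int)] = {
    val p = progress.get
    if (p < 0 || p >= until) None
    else {
      val np = math.min(p + batch, until)
      if (progress.compareAndSet(p, np)) Some((p, np)) else None
    }
  }

  // Thief: mark an unfinished node as stolen by storing -p - 1 (progress 50 becomes -51).
  def trySteal(): Boolean = {
    val p = progress.get
    p >= 0 && p < until && progress.compareAndSet(p, -p - 1)
  }

  // Owner or thief: one CAS installs both children; the owner keeps [p, M), the thief gets [M, until).
  def tryExpand(thief: String): Boolean = {
    val p = progress.get
    if (p >= 0) false
    else {
      val s = -p - 1
      val m = s + (until - s) / 2 // same as (s + until) / 2, without Int overflow
      children.compareAndSet(null, (new TreeNode(s, m, owner), new TreeNode(m, until, thief)))
    }
  }
}

object TreeNodeDemo {
  def main(args: Array[String]): Unit = {
    val root = new TreeNode(0, 100, "T0") // N = 100 is an arbitrary example size
    println(root)                         // 0 0 T0 100
    root.tryAdvance(50)
    println(root)                         // 0 50 T0 100
    root.trySteal()
    println(root)                         // 0 -51 T0 100
    root.tryExpand("T1")
    val (left, right) = root.children.get
    println(s"$left | $right")            // 50 50 T0 75 | 75 75 T1 100
  }
}
```

Because expansion is a single CAS on the `children` reference, it does not matter whether the owner or the thief gets there first: the loser's CAS simply fails and both see the same children.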

46. ### Work-stealing tree scheduling

1) Find either a non-expanded, non-completed node.
2) If none is found, terminate.
3) If it is not owned, steal and/or expand it, and descend.
4) Advance until the node is completed or stolen.
5) Go to 1).
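The five scheduling steps can be put together into a complete, if simplified, scheduler. The sketch below assumes the node encoding from the earlier slides (a progress counter that turns into `-p - 1` when stolen) and uses left-to-right traversal for step 1. It departs from the slides in a few deliberate ways: the right child created by an expansion starts unowned and is claimed by CAS instead of being assigned to the thief directly, ranges with fewer than 2 remaining elements are never stolen, and all names are hypothetical.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicIntegerArray, AtomicReference}

object WorkStealingTreeSketch {
  final val Unowned = -1

  // A node covers [start, until). progress >= 0: next unclaimed element;
  // progress < 0: stolen when the next unclaimed element was -progress - 1.
  final class Node(val start: Int, val until: Int, initialOwner: Int) {
    val owner = new AtomicInteger(initialOwner)
    val progress = new AtomicInteger(start)
    val children = new AtomicReference[(Node, Node)](null)
  }

  // 1) First non-expanded, non-completed node in left-to-right order, or null if none is left.
  def find(n: Node): Node = {
    val ch = n.children.get
    if (ch != null) { val l = find(ch._1); if (l != null) l else find(ch._2) }
    else if (n.progress.get >= n.until) null // completed; stolen nodes have negative progress
    else n
  }

  // 4) Claim batches with CAS until the node is completed or stolen.
  def advance(n: Node, batch: Int, process: Int => Unit): Unit = {
    var p = n.progress.get
    while (p >= 0 && p < n.until) {
      val np = math.min(p + batch, n.until)
      if (n.progress.compareAndSet(p, np)) { var i = p; while (i < np) { process(i); i += 1 } }
      p = n.progress.get
    }
  }

  // 3a) Mark the node stolen (ranges with fewer than 2 remaining elements are left alone).
  def steal(n: Node): Unit = {
    val p = n.progress.get
    if (p >= 0 && n.until - p >= 2) n.progress.compareAndSet(p, -p - 1)
  }

  // 3b) Expand a stolen node: the left half keeps the old owner, the right half starts unowned.
  def expand(n: Node): Unit = {
    val p = n.progress.get
    if (p < 0 && n.children.get == null) {
      val s = -p - 1
      val m = s + (n.until - s) / 2
      n.children.compareAndSet(null, (new Node(s, m, n.owner.get), new Node(m, n.until, Unowned)))
    }
  }

  def worker(root: Node, me: Int, batch: Int, process: Int => Unit): Unit = {
    var n = find(root)
    while (n != null) {                  // 2) terminate when no node is found
      n.owner.compareAndSet(Unowned, me) // claim the node if nobody owns it yet
      if (n.owner.get == me && n.progress.get >= 0) advance(n, batch, process)
      else {
        steal(n)
        expand(n)
        val ch = n.children.get
        if (ch != null) {                // 3c) descend: the old owner takes the left child, others the right
          val c = if (n.owner.get == me) ch._1 else ch._2
          c.owner.compareAndSet(Unowned, me)
          if (c.owner.get == me) advance(c, batch, process)
        }
      }
      n = find(root)                     // 5) go to 1)
    }
  }

  // Runs `workers` threads over [0, size) and returns how often each element was processed.
  def processAll(size: Int, workers: Int, batch: Int): Array[Int] = {
    val counts = new AtomicIntegerArray(size)
    val root = new Node(0, size, Unowned)
    val threads = (0 until workers).map(id =>
      new Thread(() => worker(root, id, batch, i => counts.incrementAndGet(i))))
    threads.foreach(_.start())
    threads.foreach(_.join())
    Array.tabulate(size)(i => counts.get(i))
  }

  def main(args: Array[String]): Unit =
    println(processAll(100000, 4, 16).forall(_ == 1))
}
```

`processAll` checks that every element is processed exactly once. With left-to-right traversal, idle workers all head for the leftmost unfinished node, which the next slides identify as a poor choice of node to steal.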

49. ### Choosing the node to steal

(Figure: a tree whose leaves still hold 2, 9, 5 and 3 elements.)

Find first, in-order traversal: catastrophic, with a lot of stealing and huge trees.

Find first, random-order traversal: works reasonably well.

Find most elements: generates the fewest nodes and seems to be the best.
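The three strategies differ only in how the candidate leaf is picked. The sketch below models each leaf by the number of elements it still has left (the 2, 9, 5 and 3 in the figure) and compares find-first in-order, find-first in random order, and find-most-elements. The tree shape and the `Leaf`/`Inner` types are assumptions, since the slides show only the leaf counts.

```scala
import scala.util.Random

object StealChoice {
  sealed trait Tree
  final case class Leaf(remaining: Int) extends Tree // hypothetical: only the remaining element count
  final case class Inner(left: Tree, right: Tree) extends Tree

  // Leaves in left-to-right (in-order) order.
  def leaves(t: Tree): List[Leaf] = t match {
    case l: Leaf     => List(l)
    case Inner(a, b) => leaves(a) ++ leaves(b)
  }

  // Find first, in-order traversal: the leftmost leaf that still has work.
  def findFirst(t: Tree): Option[Leaf] = leaves(t).find(_.remaining > 0)

  // Find first, random-order traversal: flip a coin at each inner node for which child to visit first.
  def findRandom(t: Tree, rnd: Random): Option[Leaf] = t match {
    case l: Leaf => if (l.remaining > 0) Some(l) else None
    case Inner(a, b) =>
      val (first, second) = if (rnd.nextBoolean()) (a, b) else (b, a)
      findRandom(first, rnd).orElse(findRandom(second, rnd))
  }

  // Find most elements: the leaf with the most remaining work.
  def findMost(t: Tree): Option[Leaf] = {
    val candidates = leaves(t).filter(_.remaining > 0)
    if (candidates.isEmpty) None else Some(candidates.maxBy(_.remaining))
  }

  def main(args: Array[String]): Unit = {
    val tree = Inner(Inner(Leaf(2), Leaf(9)), Inner(Leaf(5), Leaf(3)))
    println(findFirst(tree))                  // Some(Leaf(2))
    println(findRandom(tree, new Random(42)))
    println(findMost(tree))                   // Some(Leaf(9))
  }
}
```

With the in-order rule every thief lands on the same leftmost leaf even when it holds only 2 elements, while the most-elements rule sends a thief to the leaf with 9, so each steal moves more work per new node.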