Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads
DESCRIPTION
Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads. Aleksandar Prokopec, Martin Odersky.
TRANSCRIPT
![Page 1: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/1.jpg)
1
Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads
Aleksandar Prokopec, Martin Odersky
![Page 2: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/2.jpg)
2
Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads
Aleksandar Prokopec, Martin Odersky
Irregular Data-Parallel
![Page 3: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/3.jpg)
3
Uniform workload
(0 until 10000000) reduce (_ + _)
![Page 4: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/4.jpg)
4
Uniform workload
(0 until 10000000) reduce (_ + _)
sum = sum + x
![Page 5: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/5.jpg)
5
Uniform workload
(0 until 10000000) reduce (_ + _)
sum = sum + x
[Figure: workload plot; axis labels: N, cycles]
![Page 6: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/6.jpg)
6
Baseline workload
for (i <- 0 until 10000000) {}
[Figure: workload plot; axis labels: N, cycles]
![Page 7: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/7.jpg)
7
Irregular workload
![Page 8: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/8.jpg)
8
Irregular workload
[Figure: workload plot; axis labels: N, cycles]
![Page 9: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/9.jpg)
9
Irregular workload
for {
  x <- 0 until width
  y <- 0 until height
} image(x, y) = compute(x, y)
[Figure: workload plot; axis labels: N, cycles]
![Page 10: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/10.jpg)
10
Irregular workload
for {
  x <- 0 until width
  y <- 0 until height
} image(x, y) = compute(x, y)
[Figure: workload plot; axis labels: N, cycles]
![Page 11: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/11.jpg)
11
Workload function
workload(n) – the work that was spent on element n, as observed after the data-parallel operation has completed
![Page 12: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/12.jpg)
12
Workload function
Could be…
Runtime value dependent
for {
  x <- 0 until width
  y <- 0 until height
} img(x, y) = compute(x, y)
workload(n) – the work that was spent on element n, as observed after the data-parallel operation has completed
![Page 13: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/13.jpg)
13
Workload function
Could be…
Execution-schedule dependent
for (n <- nodes) n.neighbours += new Node
workload(n) – the work that was spent on element n, as observed after the data-parallel operation has completed
![Page 14: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/14.jpg)
14
Workload function
Could be…
Totally random
for ((x, y) <- img.indices) img(x, y) = sample(x + random(), y + random())
workload(n) – the work that was spent on element n, as observed after the data-parallel operation has completed
![Page 15: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/15.jpg)
15
Data-parallel scheduler
Assign loop elements to workers without knowledge about the workload function.
![Page 16: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/16.jpg)
16
Data-parallel scheduler
1. Linear speedup for the baseline workload
Assign loop elements to workers without knowledge about the workload function.
![Page 17: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/17.jpg)
17
Data-parallel scheduler
1. Linear speedup for the baseline workload
2. Optimal speedup for irregular workloads
Assign loop elements to workers without knowledge about the workload function.
![Page 18: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/18.jpg)
18
Static batching
Decides on the worker-element assignment before the data-parallel operation begins.
[Figure: workload plot; axis labels: N, cycles]
![Page 19: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/19.jpg)
19
Static batching
Decides on the worker-element assignment before the data-parallel operation begins.
No knowledge → divide uniformly.
Not optimal for even mildly irregular workloads.
[Figure: workload plot; axis labels: N, cycles]
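As a minimal sketch of why uniform up-front division struggles with irregular workloads, the following splits a range evenly and reports the time of the slowest worker. The names `StaticBatching`, `batches`, and `makespan` are hypothetical and not from the talk, and the per-element cost function is supplied only for the purpose of the illustration (a real scheduler does not know it).

```scala
object StaticBatching {
  // Split [0, n) into `workers` contiguous batches of near-equal size.
  // The assignment is fixed before any element is processed.
  def batches(n: Int, workers: Int): Seq[Range] = {
    require(n >= 0 && workers > 0)
    (0 until workers).map { w =>
      val start = (w.toLong * n / workers).toInt
      val end = ((w + 1).toLong * n / workers).toInt
      start until end
    }
  }

  // Completion time is that of the slowest worker, given a per-element cost.
  // `workload` is an assumption made for this illustration only.
  def makespan(n: Int, workers: Int, workload: Int => Long): Long =
    batches(n, workers).map(_.map(workload).sum).max
}
```

With 8 elements and 2 workers, a uniform cost of 1 per element gives a makespan of 4, while a single element costing 100 pushes the makespan to 103 even though the total work is only 107.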
![Page 20: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/20.jpg)
20
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles; marker: progress]
![Page 21: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/21.jpg)
21
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles]
progress = 0
![Page 22: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/22.jpg)
22
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles]
progress = 2 (T0: CAS)
T0
![Page 23: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/23.jpg)
23
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles]
progress = 4 (T1: CAS)
T0 T1
![Page 24: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/24.jpg)
24
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles]
progress = 6 (T0: CAS)
T0 T1
![Page 25: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/25.jpg)
25
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles]
progress = 8 (T0: CAS)
T0 T1
![Page 26: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/26.jpg)
26
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles]
progress = 10 (T0: CAS)
T0 T1
![Page 27: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/27.jpg)
27
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles]
progress = 12 (T0: CAS)
T0 T1
![Page 28: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/28.jpg)
28
Fixed-size batching
Workload-driven – decides during execution.
[Figure: workload plot; axis labels: N, cycles; marker: progress]
Pros: lightweight
Cons: minimum batch size, contention
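The CAS steps shown on the preceding slides can be sketched with a shared counter that every worker advances by one batch at a time. This is an illustration under assumptions rather than the authors' code: `FixedSizeBatching`, `claim`, and `parallelSum` are hypothetical names, and summing `f(i)` stands in for whatever the loop body does.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

object FixedSizeBatching {
  // Claim the next batch [start, end) from the shared counter with a CAS.
  // Returns None once all n elements have been handed out.
  @annotation.tailrec
  def claim(progress: AtomicInteger, n: Int, batchSize: Int): Option[Range] = {
    val start = progress.get()
    if (start >= n) None
    else {
      val end = math.min(start + batchSize, n)
      if (progress.compareAndSet(start, end)) Some(start until end)
      else claim(progress, n, batchSize) // another worker won the race; retry
    }
  }

  // Every worker repeatedly claims a batch and processes it.
  def parallelSum(n: Int, batchSize: Int, workers: Int)(f: Int => Long): Long = {
    val progress = new AtomicInteger(0)
    val total = new AtomicLong(0L)
    val threads = (0 until workers).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          var batch = claim(progress, n, batchSize)
          while (batch.nonEmpty) {
            var local = 0L
            for (i <- batch.get) local += f(i)
            total.addAndGet(local)
            batch = claim(progress, n, batchSize)
          }
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    total.get()
  }
}
```

Every batch costs one successful CAS on the same memory location, so a small `batchSize` means more traffic on the counter; that is the contention and minimum-batch-size trade-off listed in the cons.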
![Page 29: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/29.jpg)
29
Fixed-size batching - contention
![Page 30: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/30.jpg)
30
Factoring, GSS, TS
Batch size varies.
[Figure: workload plot; axis labels: N, cycles; marker: progress]
Pros: lightweight
Cons: contention
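To make "batch size varies" concrete, here is a sketch of the batch sizes produced by guided self-scheduling (GSS) as it is commonly described, where each claim takes the ceiling of the remaining elements divided by the number of workers. The slide does not give the formulas, so this rule and the names `GuidedSelfScheduling` and `batchSizes` are assumptions; factoring and trapezoid self-scheduling (TS) use different decreasing schedules that are not shown here.

```scala
object GuidedSelfScheduling {
  // Sizes of successive batches when each claim takes ceil(remaining / workers).
  def batchSizes(n: Int, workers: Int): List[Int] = {
    require(n >= 0 && workers > 0)
    @annotation.tailrec
    def loop(remaining: Int, acc: List[Int]): List[Int] =
      if (remaining == 0) acc.reverse
      else {
        val size = (remaining + workers - 1) / workers // ceil(remaining / workers)
        loop(remaining - size, size :: acc)
      }
    loop(n, Nil)
  }
}
```

For 100 elements and 4 workers the first batches have sizes 25, 19, 14, and 11, shrinking towards 1 near the end of the range. The batches are still claimed from a single shared counter, which is why contention remains in the cons.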
![Page 31: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/31.jpg)
31
Task-based work-stealing
[Figure: workload plot; axis labels: N, cycles]
0..2 2..4 4..8 8..16
![Page 32: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/32.jpg)
32
Task-based work-stealing
[Figure: workload plot; axis labels: N, cycles]
0..2 2..4 4..8 8..16
2..4
4..8
8..16
T0 T1
0..2
![Page 33: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/33.jpg)
33
Task-based work-stealing
[Figure: workload plot; axis labels: N, cycles]
0..2 2..4 4..8 8..16
2..4
4..8
8..16
T0 T1
0..2
steal – a rare event
![Page 34: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/34.jpg)
34
Task-based work-stealing
[Figure: workload plot; axis labels: N, cycles]
0..2 2..4 4..8 8..16
2..4
4..8
8..16
T0 T1
10..12
12..16
0..2 8..10
![Page 35: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/35.jpg)
35
Task-based work-stealing
Pros: can be adaptive - uses stealing information
Cons: heavyweight - minimum batch size much larger
[Figure: workload plot; axis labels: N, cycles]
0..2 2..4 4..8 8..16
2..4
4..8
8..16
T0 T1
10..12
12..16
0..2 8..10
![Page 36: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/36.jpg)
36
Task-based work-stealing
[Figure: workload plot; axis labels: N, cycles]
0..2 2..4 4..8 8..16
Cannot be stolen after T0 starts processing it
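The batch pattern in the figures (0..2, 2..4, 4..8, 8..16, and 8..10, 10..12, 12..16 after the steal) can be reproduced with a small splitting function. This sketch only mirrors the pattern visible on the slides; it is not claimed to be the splitting policy of any particular task framework, and `ExponentialSplitting` and `split` are hypothetical names.

```scala
object ExponentialSplitting {
  // Split [start, end) into batches of sizes initial, initial, 2*initial, 4*initial, ...
  // with the last batch truncated at `end`.
  def split(start: Int, end: Int, initial: Int): List[Range] = {
    require(initial > 0 && start <= end)
    @annotation.tailrec
    def loop(from: Int, size: Int, acc: List[Range]): List[Range] =
      if (from >= end) acc.reverse
      else {
        val stop = math.min(from.toLong + size, end.toLong).toInt
        // The second batch repeats the initial size; later batches double.
        val nextSize = if (acc.isEmpty) size else size * 2
        loop(stop, nextSize, (from until stop) :: acc)
      }
    loop(start, initial, Nil)
  }
}
```

A worker that steals 8..16 and splits it again with the same initial size obtains 8..10, 10..12, and 12..16, which matches the thief's queue in the figure. Each batch is a separate task object, and a batch that has started executing can no longer be subdivided, which is the limitation this slide points out.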
![Page 37: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/37.jpg)
37
Work-stealing tree
[0, 0, T0, N] owned
![Page 38: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/38.jpg)
38
Work-stealing tree
[0, 0, T0, N] owned
T0: CAS
[0, 50, T0, N] owned
![Page 39: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/39.jpg)
39
Work-stealing tree
[0, 0, T0, N] owned
T0: CAS
[0, 50, T0, N] owned
T0: CAS
…
[0, N, T0, N] completed
What about stealing?
![Page 40: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/40.jpg)
40
Work-stealing tree
[0, 0, T0, N] owned
T0: CAS
[0, 50, T0, N] owned
T0: CAS
…
[0, N, T0, N] completed
T1: CAS
[0, -51, T0, N] stolen
![Page 41: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/41.jpg)
41
Work-stealing tree
[0, 0, T0, N] owned
T0: CAS
[0, 50, T0, N] owned
T0: CAS
…
[0, N, T0, N] completed
T1: CAS
[0, -51, T0, N] stolen
![Page 42: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/42.jpg)
42
Work-stealing tree
[0, 0, T0, N] owned
T0: CAS
[0, 50, T0, N] owned
T0: CAS
…
[0, N, T0, N] completed
[0, -51, T0, N] stolen
[0, -51, T0, N] expanded, with children [50, 50, T0, M] and [M, M, T1, N]
M = (50 + N) / 2
![Page 43: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/43.jpg)
43
Work-stealing tree
[0, 0, T0, N] owned
T0: CAS
[0, 50, T0, N] owned
T0: CAS
…
[0, N, T0, N] completed
[0, -51, T0, N] stolen
[0, -51, T0, N] expanded, with children [50, 50, T0, M] and [M, M, T1, N]
M = (50 + N) / 2
T0 or T1: CAS
![Page 44: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/44.jpg)
44
Work-stealing tree
[0, 0, T0, N] owned
T0: CAS
[0, 50, T0, N] owned
T0: CAS
…
[0, N, T0, N] completed
[0, -51, T0, N] stolen
[0, -51, T0, N] expanded, with children [50, 50, T0, M] and [M, M, T1, N]
T0 or T1: CAS
M = (50 + N) / 2
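The node states on these slides can be sketched with a single atomic progress field. The reading of the four node fields as start, progress, owner, and end, and the encoding of a stolen node as -(progress + 1) (so progress 50 becomes -51), are inferred from the figures; the names `Node`, `advance`, `steal`, and `expand` are hypothetical, and this is a simplified illustration rather than the authors' implementation.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

object WorkStealingNode {
  // A node covers elements [start, end). The value p of `progress` encodes the state:
  //   start <= p < end : owned, elements [start, p) are already claimed by the owner
  //   p == end         : completed
  //   p < 0            : stolen, with -(p + 1) being the progress when it was stolen
  final class Node(val start: Int, val end: Int, val owner: String) {
    val progress = new AtomicInteger(start)
    val children = new AtomicReference[(Node, Node)](null) // set once the node is expanded
  }

  // Owner: claim the next `step` elements with a CAS.
  // Returns None if the node is completed or has been stolen.
  @annotation.tailrec
  def advance(node: Node, step: Int): Option[Range] = {
    val p = node.progress.get()
    if (p < 0 || p >= node.end) None
    else {
      val next = math.min(p + step, node.end)
      if (node.progress.compareAndSet(p, next)) Some(p until next)
      else advance(node, step)
    }
  }

  // Thief: mark the node stolen with a CAS to a negative value.
  // Returns false if the node completed before it could be stolen.
  @annotation.tailrec
  def steal(node: Node): Boolean = {
    val p = node.progress.get()
    if (p < 0) true
    else if (p >= node.end) false
    else if (node.progress.compareAndSet(p, -(p + 1))) true
    else steal(node)
  }

  // Owner or thief: split the unprocessed part [p, end) at M = (p + end) / 2.
  // The left child stays with the old owner, the right child goes to the thief.
  def expand(node: Node, thief: String): (Node, Node) = {
    val p = -(node.progress.get() + 1)
    require(p >= 0, "node must be stolen before it is expanded")
    val m = (p + node.end) / 2
    val proposal = (new Node(p, m, node.owner), new Node(m, node.end, thief))
    node.children.compareAndSet(null, proposal) // only the first proposal is installed
    node.children.get()
  }
}
```

With N = 100, advancing by 50 and then stealing leaves the progress at -51, and expanding yields children covering 50 until 75 for T0 and 75 until 100 for T1. Unlike a task that has started executing, a node the owner is working on can still be stolen, because the owner re-reads the progress field on every advance.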
![Page 45: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/45.jpg)
45
Work-stealing tree - contention
![Page 46: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/46.jpg)
50
Work-stealing tree scheduling
1) find either a non-expanded, non-completed node
2) if not found, terminate
3) if not owned, steal and/or expand, and descend
4) advance until node is completed or stolen
5) go to 1)
![Page 47: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/47.jpg)
51
Work-stealing tree scheduling
1) find either a non-expanded, non-completed node
2) if not found, terminate
3) if not owned, steal and/or expand, and descend
4) advance until node is completed or stolen
5) go to 1)
1) find either a non-expanded, non-completed node
![Page 48: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/48.jpg)
52
Choosing the node to steal
Find first, in-order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
![Page 49: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/49.jpg)
53
Choosing the node to steal
Find first, in-order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
Catastrophic – a lot of stealing, huge trees
![Page 50: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/50.jpg)
54
Choosing the node to steal
Find first, in-order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
Catastrophic – a lot of stealing, huge trees
Find first, random order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
![Page 51: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/51.jpg)
55
Choosing the node to steal
Find first, in-order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
Catastrophic – a lot of stealing, huge trees
Find first, random order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
Works reasonably well.
![Page 52: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/52.jpg)
56
Choosing the node to steal
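Find first, in-order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
Catastrophic – a lot of stealing, huge trees
Find first, random order traversal
[Figure: tree with nodes labeled 2, 9, 5, 3]
Works reasonably well.
Find most elements
[Figure: tree with nodes labeled 2, 9, 5, 3]
Generates the fewest nodes. Seems to be best.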
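The difference between the first and the last strategy can be sketched on a simplified tree in which only leaves hold unprocessed elements. The tree shape below is a guess, since the slide text preserves only the labels 2, 9, 5, and 3; `StealTarget`, `Leaf`, `Expanded`, `findFirst`, and `findMost` are hypothetical names, and the random-order variant is left out.

```scala
object StealTarget {
  sealed trait Tree
  // A non-expanded node with `remaining` unprocessed elements (0 means completed).
  final case class Leaf(name: String, remaining: Int) extends Tree
  // An expanded node; its unprocessed work lives in its children.
  final case class Expanded(left: Tree, right: Tree) extends Tree

  // Leaves in in-order (left to right).
  def leaves(t: Tree): List[Leaf] = t match {
    case l: Leaf        => List(l)
    case Expanded(a, b) => leaves(a) ++ leaves(b)
  }

  // "Find first, in-order traversal": the leftmost node that still has work.
  def findFirst(t: Tree): Option[Leaf] = leaves(t).find(_.remaining > 0)

  // "Find most elements": the node with the most remaining work.
  def findMost(t: Tree): Option[Leaf] = {
    val candidates = leaves(t).filter(_.remaining > 0)
    if (candidates.isEmpty) None else Some(candidates.maxBy(_.remaining))
  }
}
```

On a tree whose leaves hold 2, 9, 5, and 3 elements from left to right, `findFirst` picks the node with 2 elements while `findMost` picks the node with 9. Splitting the larger node hands the thief more work per steal, which is in line with the slide's observation that this strategy generates the fewest nodes.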
![Page 53: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/53.jpg)
57
Comparison with fixed-size batching
![Page 54: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/54.jpg)
58
Comparison with fixed-size batching
![Page 55: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/55.jpg)
59
Comparison with task work-stealing
![Page 56: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/56.jpg)
60
Thank you!
Questions?
![Page 57: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/57.jpg)
61
Finding work
![Page 58: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads](https://reader036.vdocument.in/reader036/viewer/2022062501/56816583550346895dd82a1f/html5/thumbnails/58.jpg)
62
Other workloads