![Page 1: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/1.jpg)
Sound and Precise Analysis ofParallel Programs through
Schedule Specialization
Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng YangColumbia University
1
![Page 2: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/2.jpg)
2
Motivation
soundness (# of analyzed schedules / # of total schedules)
precision Total Schedules
AnalyzedSchedules
StaticAnalysis
DynamicAnalysis
AnalyzedSchedules
?
• Analyzing parallel programs is difficult.
![Page 3: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/3.jpg)
3
• Precision: Analyze the program over a small set of schedules. • Soundness: Enforce these schedules at runtime.
Schedule Specialization
soundness (# of analyzed schedules / # of total schedules)
precision Total Schedules
StaticAnalysis
DynamicAnalysis
AnalyzedSchedules
EnforcedSchedules
ScheduleSpecialization
![Page 4: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/4.jpg)
4
Enforcing Schedules Using Peregrine
• Deterministic multithreading– e.g. DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet
(ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)
– Performance overhead• e.g. Kendo: 16%, Tern & Peregrine: 39.1%
• Peregrine– Record schedules, and reuse them on a wide range of
inputs.– Represent schedules explicitly.
![Page 5: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/5.jpg)
5
• Precision: Analyze the program over a small set of schedules. • Soundness: Enforce these schedules at runtime.
Schedule Specialization
soundness (# of analyzed schedules / # of total schedules)
precision
StaticAnalysis
DynamicAnalysis
AnalyzedSchedules
EnforcedSchedulesSchedule
Specialization
![Page 6: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/6.jpg)
6
Framework
• Extract control flow and data flow enforced by a set of schedules
Schedule
ScheduleSpecializationProgram
C/C++ programwith Pthread
Total order ofsynchronizations
SpecializedProgram
Extra def-usechains
![Page 7: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/7.jpg)
7
Outline
• Example• Control-Flow Specialization• Data-Flow Specialization• Results• Conclusion
![Page 8: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/8.jpg)
Running Example
int results[p_max];int global_id = 0;
int main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}
void *worker(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
8
Thread 0 Thread 1 Thread 2
create
create
join
join
lock
unlocklock
unlock
Race-free?
![Page 9: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/9.jpg)
Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}
9
create
create
join
join
atoi
++i
create
return
i = 0
i < p
++i
join
i < p
i = 0
![Page 10: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/10.jpg)
Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}
10
create
create
join
join
atoi
++i
create
return
i = 0
i < p
++i
join
i < p
i = 0
atoi
create
i = 0
i < p
![Page 11: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/11.jpg)
Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}
11
create
create
join
join
atoi
++i
create
return
i = 0
i < p
++i
join
i < p
i = 0
create
atoi
i = 0
i < p
create
++i
create
i < p
![Page 12: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/12.jpg)
Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}
12
create
create
join
join
atoi
++i
create
return
i = 0
i < p
++i
join
i < p
i = 0
atoi
create
i = 0
i < p
++i
create
i < p
++i
i < p
join
i < p
i = 0
++i
join
i < p
++i
i < p
return
![Page 13: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/13.jpg)
13
Control-Flow Specialized Program
int main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); i = 0; // i < p == true pthread_create(&child[i], 0, worker.clone1, 0); ++i; // i < p == true pthread_create(&child[i], 0, worker.clone2, 0); ++i; // i < p == false i = 0; // i < p == true pthread_join(child[i], 0); ++i; // i < p == true pthread_join(child[i], 0); ++i; // i < p == false return 0;}
atoi
create
i = 0
i < p
++i
create
i < p
++i
i < p
join
i < p
i = 0
++i
join
i < p
++i
i < p
return
![Page 14: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/14.jpg)
14
More Challenges onControl-Flow Specialization
• Ambiguity
call
Caller Callee
call
S1
• A schedule has too many synchronizations
ret
S2
![Page 15: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/15.jpg)
Data-Flow Specialization
int global_id = 0;
void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
15
Thread 0 Thread 1 Thread 2
create
create
join
join
lock
unlock
lock
unlock
global_id = 0
my_id = global_idglobal_id++
my_id = global_idglobal_id++
![Page 16: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/16.jpg)
Data-Flow Specialization
int global_id = 0;
void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
16
Thread 0 Thread 1 Thread 2
create
create
join
join
lock
unlock
lock
unlock
global_id = 0
my_id = global_idglobal_id++
my_id = global_idglobal_id++
![Page 17: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/17.jpg)
Data-Flow Specialization
int global_id = 0;
void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
17
Thread 0 Thread 1 Thread 2
create
create
join
join
lock
unlock
lock
unlock
global_id = 0
my_id = 0global_id = 1
my_id = global_idglobal_id++
![Page 18: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/18.jpg)
Data-Flow Specialization
int global_id = 0;
void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}
18
Thread 0 Thread 1 Thread 2
create
create
join
join
lock
unlock
lock
unlock
global_id = 0
my_id = 0global_id = 1
my_id = 1global_id = 2
![Page 19: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/19.jpg)
Data-Flow Specialization
int global_id = 0;
void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); global_id = 1; pthread_mutex_unlock(&global_id_lock); results[0] = compute(0); return 0;}
void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); global_id = 2; pthread_mutex_unlock(&global_id_lock); results[1] = compute(1); return 0;}
19
Thread 0 Thread 1 Thread 2
create
create
join
join
lock
unlock
lock
unlock
global_id = 0
my_id = 0global_id = 1
my_id = 1global_id = 2
![Page 20: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/20.jpg)
20
More Challenges onData-Flow Specialization
• Must/May alias analysis– global_id
• Reasoning about integers– results[0] = compute(0)– results[1] = compute(1)
• Many def-use chains
![Page 21: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/21.jpg)
21
Evaluation
• Applications– Static race detector– Alias analyzer– Path slicer
• Programs– PBZip2 1.1.5– aget 0.4.1– 8 programs in SPLASH2– 7 programs in PARSEC
![Page 22: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/22.jpg)
22
Program Original Specialized
aget 72 0
PBZip2 125 0
fft 96 0
blackscholes 3 0
swaptions 165 0
streamcluster 4 0
canneal 21 0
bodytrack 4 0
ferret 6 0
raytrace 215 0
cholesky 31 7
radix 53 14
water-spatial 2447 1799
lu-contig 18 18
barnes 370 369
water-nsquared 354 333
ocean 331 292
StaticRaceDetector
# of FalsePositives
![Page 23: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/23.jpg)
23
Program Original Specialized
aget 72 0
PBZip2 125 0
fft 96 0
blackscholes 3 0
swaptions 165 0
streamcluster 4 0
canneal 21 0
bodytrack 4 0
ferret 6 0
raytrace 215 0
cholesky 31 7
radix 53 14
water-spatial 2447 1799
lu-contig 18 18
barnes 370 369
water-nsquared 354 333
ocean 331 292
StaticRaceDetector
# of FalsePositives
![Page 24: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/24.jpg)
24
Program Original Specialized
aget 72 0
PBZip2 125 0
fft 96 0
blackscholes 3 0
swaptions 165 0
streamcluster 4 0
canneal 21 0
bodytrack 4 0
ferret 6 0
raytrace 215 0
cholesky 31 7
radix 53 14
water-spatial 2447 1799
lu-contig 18 18
barnes 370 369
water-nsquared 354 333
ocean 331 292
StaticRaceDetector
# of FalsePositives
![Page 25: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/25.jpg)
25
Program Original Specialized
aget 72 0
PBZip2 125 0
fft 96 0
blackscholes 3 0
swaptions 165 0
streamcluster 4 0
canneal 21 0
bodytrack 4 0
ferret 6 0
raytrace 215 0
cholesky 31 7
radix 53 14
water-spatial 2447 1799
lu-contig 18 18
barnes 370 369
water-nsquared 354 333
ocean 331 292
StaticRaceDetector
# of FalsePositives
![Page 26: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/26.jpg)
26
Static Race Detector: Harmful Races Detected
• 4 in aget• 2 in radix• 1 in fft
![Page 27: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/27.jpg)
27
Precision of Schedule-AwareAlias Analysis
![Page 28: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/28.jpg)
28
Precision of Schedule-AwareAlias Analysis
![Page 29: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/29.jpg)
29
Precision of Schedule-AwareAlias Analysis
![Page 30: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/30.jpg)
30
Conclusion and Future Work
• Designed and implemented schedule specialization framework– Analyzes the program over a small set of schedules– Enforces these schedules at runtime
• Built and evaluated three applications– Easy to use– Precise
• Future work– More applications– Similar specialization ideas on sequential programs
![Page 31: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/31.jpg)
31
Related Work
• Program analysis for parallel programs– Chord (PLDI ’06), RADAR (PLDI ’08), FastTrack (PLDI ’09)
• Slicing– Horgon (PLDI ’90), Bouncer (SOSP ’07), Jhala (PLDI ’05), Weiser
(PhD thesis), Zhang (PLDI ’04)• Deterministic multithreading
– DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)
• Program specialization– Consel (POPL ’93), Gluck (ISPL ’95), Jørgensen (POPL ’92),
Nirkhe (POPL ’92), Reps (PDSPE ’96)
![Page 32: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/32.jpg)
32
Backup Slides
![Page 33: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/33.jpg)
33
Specialization Time
![Page 34: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/34.jpg)
34
Handling Races
• We do not assume data-race freedom. • We could if our only goal is optimization.
![Page 35: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/35.jpg)
35
Input Coverage
• Use runtime verification for the inputs not covered
• A small set of schedules can cover a wide range of inputs
![Page 36: Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University](https://reader035.vdocument.in/reader035/viewer/2022070415/5697c00f1a28abf838cca617/html5/thumbnails/36.jpg)
36