Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang
Columbia University

Motivation

[Figure: precision plotted against soundness (# of analyzed schedules / # of total schedules). Labels: Total Schedules, Analyzed Schedules, Static Analysis, Dynamic Analysis.]

• Analyzing parallel programs is difficult.

• Precision: Analyze the program over a small set of schedules.

• Soundness: Enforce these schedules at runtime.

[Figure: precision plot. Labels: Total Schedules, Static Analysis, Dynamic Analysis, Analyzed Schedules, Enforced Schedules, Schedule Specialization.]

Enforcing Schedules Using Peregrine

• Deterministic multithreading
  – e.g. DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)
  – Performance overhead
    • e.g. Kendo: 16%, Tern & Peregrine: 39.1%

• Peregrine
  – Record schedules, and reuse them on a wide range of inputs.
  – Represent schedules explicitly.


Framework

• Extract control flow and data flow enforced by a set of schedules

[Figure: Schedule Specialization takes two inputs, a Program (C/C++ program with Pthread) and a Schedule (total order of synchronizations), and produces a Specialized Program plus extra def-use chains.]
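The schedule input described above, a total order of synchronizations, could be represented roughly as follows. This is a minimal sketch, not the framework's actual data structure: the type names, the `count_events` helper, and the particular event order for the running example with p = 2 are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical names throughout; the slides only state that a schedule
   is a total order of synchronizations. */
typedef enum { SYNC_CREATE, SYNC_JOIN, SYNC_LOCK, SYNC_UNLOCK } sync_kind;

typedef struct {
    int       thread;  /* thread that performs the operation */
    sync_kind kind;    /* which synchronization it performs  */
} sync_event;

typedef struct {
    const sync_event *events;  /* events[0] happens first, and so on */
    size_t            count;
} schedule;

/* Count how many times `thread` performs `kind` in the schedule. */
static size_t count_events(const schedule *s, int thread, sync_kind kind) {
    size_t n = 0;
    for (size_t i = 0; i < s->count; ++i)
        if (s->events[i].thread == thread && s->events[i].kind == kind)
            ++n;
    return n;
}

/* One possible (assumed) schedule for the running example with p = 2:
   thread 0 is main, threads 1 and 2 are the workers. */
static const sync_event example_events[] = {
    {0, SYNC_CREATE}, {0, SYNC_CREATE},
    {1, SYNC_LOCK},   {1, SYNC_UNLOCK},
    {2, SYNC_LOCK},   {2, SYNC_UNLOCK},
    {0, SYNC_JOIN},   {0, SYNC_JOIN},
};
static const schedule example_schedule = {
    example_events, sizeof example_events / sizeof example_events[0]
};
```

Because the order is total, questions such as "how many threads does main create under this schedule?" reduce to a scan over the array.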

Outline

• Example
• Control-Flow Specialization
• Data-Flow Specialization
• Results
• Conclusion

Running Example

int results[p_max];
int global_id = 0;

int main(int argc, char *argv[]) {
    int i;
    int p = atoi(argv[1]);
    for (i = 0; i < p; ++i)
        pthread_create(&child[i], 0, worker, 0);
    for (i = 0; i < p; ++i)
        pthread_join(child[i], 0);
    return 0;
}

void *worker(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;
    pthread_mutex_unlock(&global_id_lock);
    results[my_id] = compute(my_id);
    return 0;
}

[Figure: a schedule across Thread 0, Thread 1, and Thread 2, showing create, lock, and unlock synchronizations.]

Race-free?

Control-Flow Specialization

int main(int argc, char *argv[]) {
    int i;
    int p = atoi(argv[1]);
    for (i = 0; i < p; ++i)
        pthread_create(&child[i], 0, worker, 0);
    for (i = 0; i < p; ++i)
        pthread_join(child[i], 0);
    return 0;
}

[Figure: schedule events (create, return) shown alongside the code of main.]

Control-Flow Specialized Program

int main(int argc, char *argv[]) {
    int i;
    int p = atoi(argv[1]);
    i = 0;
    // i < p == true
    pthread_create(&child[i], 0, worker.clone1, 0);
    ++i;
    // i < p == true
    pthread_create(&child[i], 0, worker.clone2, 0);
    ++i;
    // i < p == false
    i = 0;
    // i < p == true
    pthread_join(child[i], 0);
    ++i;
    // i < p == true
    pthread_join(child[i], 0);
    ++i;
    // i < p == false
    return 0;
}

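The unrolling in the specialized main follows from how many create and join operations the schedule records for the main thread. The sketch below shows one way that inference could look: it counts the operations of one kind and derives the outcome of every evaluation of the loop condition `i < p`. The names `main_op`, `trip_count`, and `branch_outcomes` are hypothetical, and the real framework works on the program's control-flow graph rather than on a flat array.

```c
#include <stddef.h>

/* Hypothetical: the synchronizations main performs, in schedule order. */
typedef enum { OP_CREATE, OP_JOIN, OP_OTHER } main_op;

/* Number of loop iterations implied by the schedule: one per
   operation of the given kind. */
static size_t trip_count(const main_op *ops, size_t n, main_op kind) {
    size_t trips = 0;
    for (size_t i = 0; i < n; ++i)
        if (ops[i] == kind)
            ++trips;
    return trips;
}

/* Write the outcome of each evaluation of the loop condition:
   1 (true) once per iteration, then a single 0 (false) on exit.
   Returns the number of outcomes written, or 0 if `out` is too small. */
static size_t branch_outcomes(size_t trips, int *out, size_t cap) {
    if (cap < trips + 1)
        return 0;
    for (size_t i = 0; i < trips; ++i)
        out[i] = 1;
    out[trips] = 0;
    return trips + 1;
}
```

For a schedule with two creates, the outcomes are true, true, false, which matches the `// i < p == ...` annotations in the specialized program.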

More Challenges on Control-Flow Specialization

• Ambiguity

[Figure: Caller and Callee.]

• A schedule has too many synchronizations

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;
    pthread_mutex_unlock(&global_id_lock);
    results[my_id] = compute(my_id);
    return 0;
}

[Figure: animation stepping through the schedule (create, unlock). Starting from int global_id = 0, the statements my_id = global_id and global_id++ in worker.clone1 become my_id = 0 and global_id = 1.]

int global_id = 0;

void *worker.clone1(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    global_id = 1;
    pthread_mutex_unlock(&global_id_lock);
    results[0] = compute(0);
    return 0;
}

void *worker.clone2(void *arg) {
    pthread_mutex_lock(&global_id_lock);
    global_id = 2;
    pthread_mutex_unlock(&global_id_lock);
    results[1] = compute(1);
    return 0;
}

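The constants in the specialized workers come from following `global_id` along the schedule's total order: the schedule fixes which clone enters its critical section first, so each read of `global_id` sees a known value. A minimal sketch of that propagation is below. The function name and the `order` encoding are assumptions for illustration; the framework itself derives this from extra def-use chains rather than from a hand-written loop.

```c
#include <stddef.h>

/* order[k] is the index of the worker clone that enters its critical
   section k-th in the schedule (hypothetical encoding).
   my_id[c] receives the constant that clone c reads from global_id.
   Returns the value of global_id after all n critical sections. */
static int propagate_global_id(const size_t *order, size_t n, int *my_id) {
    int global_id = 0;                /* initializer in the running example */
    for (size_t k = 0; k < n; ++k) {
        my_id[order[k]] = global_id;  /* my_id = global_id */
        global_id = global_id + 1;    /* global_id++       */
    }
    return global_id;
}
```

With the order clone1 then clone2, the clones read 0 and 1 and the final value is 2, as in the specialized code. A different schedule would assign the constants differently, which is why the analysis result holds only for the schedules that are enforced at runtime.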

More Challenges on Data-Flow Specialization

• Must/May alias analysis
  – global_id

• Reasoning about integers
  – results[0] = compute(0)
  – results[1] = compute(1)

• Many def-use chains
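The "reasoning about integers" item matters because the two writes `results[0]` and `results[1]` can only be shown not to conflict once their indices are known constants. The sketch below illustrates that check with a hypothetical `index_fact` abstraction (a known constant or unknown); it is an illustration of the idea, not the analysis the authors implemented.

```c
#include <stdbool.h>

/* Hypothetical abstraction of an array index: either a known constant
   or unknown (in which case `value` is ignored). */
typedef struct {
    bool known;
    int  value;
} index_fact;

/* Two accesses to the same array may touch the same element unless
   both indices are known constants and differ. */
static bool may_overlap(index_fact a, index_fact b) {
    if (a.known && b.known)
        return a.value == b.value;
    return true;  /* stay conservative when either index is unknown */
}
```

Before specialization both workers write `results[my_id]` with an unknown `my_id`, so the check must answer "may overlap". After specialization the indices are 0 and 1, and the same check can rule the overlap out.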

Evaluation

• Applications
  – Static race detector
  – Alias analyzer
  – Path slicer

• Programs
  – PBZip2 1.1.5
  – aget 0.4.1
  – 8 programs in SPLASH2
  – 7 programs in PARSEC

Static Race Detector: # of False Positives

Program          Original   Specialized
aget                   72             0
PBZip2                125             0
fft                    96             0
blackscholes            3             0
swaptions             165             0
streamcluster           4             0
canneal                21             0
bodytrack               4             0
ferret                  6             0
raytrace              215             0
cholesky               31             7
radix                  53            14
water-spatial        2447          1799
lu-contig              18            18
barnes                370           369
water-nsquared        354           333
ocean                 331           292


Static Race Detector: Harmful Races Detected

• 4 in aget
• 2 in radix
• 1 in fft

Precision of Schedule-Aware Alias Analysis

Conclusion and Future Work

• Designed and implemented schedule specialization framework
  – Analyzes the program over a small set of schedules
  – Enforces these schedules at runtime

• Built and evaluated three applications
  – Easy to use
  – Precise

• Future work
  – More applications
  – Similar specialization ideas on sequential programs

Related Work

• Program analysis for parallel programs
  – Chord (PLDI ’06), RADAR (PLDI ’08), FastTrack (PLDI ’09)

• Slicing
  – Horgon (PLDI ’90), Bouncer (SOSP ’07), Jhala (PLDI ’05), Weiser (PhD thesis), Zhang (PLDI ’04)

• Deterministic multithreading
  – DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)

• Program specialization
  – Consel (POPL ’93), Gluck (ISPL ’95), Jørgensen (POPL ’92), Nirkhe (POPL ’92), Reps (PDSPE ’96)

Backup Slides

Specialization Time

Handling Races

• We do not assume data-race freedom.

• We could if our only goal is optimization.

Input Coverage

• Use runtime verification for the inputs not covered

• A small set of schedules can cover a wide range of inputs
