Assisting technologies for program parallelization

Masakazu HAYATSU, Chikayama/Taura Lab.
2003/12/5



Page 2: Agenda

• Introduction
• Difficulty of Program Parallelization
• Assistant Tools for Program Parallelization
  • SUIF Explorer
  • S-Check
  • Ursa Minor
• Conclusion

Page 3: Introduction

• Popularization of parallel computers
  • Commercial computers with a very large number of processors
  • Low-end PCs with 2-4 processors
• Performance
  • Uniprocessor speedup is progressing more and more slowly

⇒ The importance of parallel programs is increasing further

Page 4: Difficulty of Program Parallelization

• Dependency
• Deadlock
• Data race

These problems must be avoided.

[Figure: A and B both act on X; labels "100" and "1"; result "100?" or "1?"]

Page 5: Automatic Parallelization

• Low performance
• Parallelization technique is fragile
• Knowledge from outside the code is often required

  :
  for(i=0; i<N; i++){
    a[f(i)] = 0; // A
    a[g(i)] = 1; // B
  }
  :

×?
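Whether the loop above can run in parallel depends on the values returned by f and g, which a compiler usually cannot see. As a minimal sketch (the names `index_fn`, `iterations_conflict`, and the three sample index functions are illustrative assumptions, not part of any tool surveyed here), the following C code checks at run time whether two different iterations ever write the same element of a:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical index function type standing in for f and g on the slide. */
typedef size_t (*index_fn)(size_t i);

/* Sample index functions, for illustration only. */
size_t even_index(size_t i) { return 2 * i; }
size_t odd_index(size_t i)  { return 2 * i + 1; }
size_t half_index(size_t i) { return i / 2; }

/*
 * Returns true if two different iterations of
 *   for (i = 0; i < n; i++) { a[f(i)] = 0; a[g(i)] = 1; }
 * write the same element of a (array length len), so that the final
 * contents of a would depend on the order in which iterations run.
 * Also returns true, conservatively, if an index is out of range or
 * the bookkeeping array cannot be allocated.
 */
bool iterations_conflict(index_fn f, index_fn g, size_t n, size_t len)
{
    assert(f != NULL && g != NULL);

    /* owner[k] = 1 + iteration that wrote a[k], or 0 if untouched */
    size_t *owner = calloc(len ? len : 1, sizeof *owner);
    if (owner == NULL)
        return true;

    bool conflict = false;
    for (size_t i = 0; i < n && !conflict; i++) {
        size_t idx[2] = { f(i), g(i) };
        for (int w = 0; w < 2; w++) {
            size_t k = idx[w];
            if (k >= len || (owner[k] != 0 && owner[k] != i + 1)) {
                conflict = true;
                break;
            }
            owner[k] = i + 1;
        }
    }
    free(owner);
    return conflict;
}
```

With `even_index` and `odd_index` every iteration writes its own pair of elements, so no conflict is reported; with `even_index` and `half_index` iteration 1 overwrites an element already written by iteration 0. A check like this only covers the inputs it was run on, which is why knowledge from outside the code is still needed.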

Page 6: Development Process

[Figure: development-process flowchart. Steps: Design & Improve Model, Manually Optimizing Program, Run, Validity Check, Speedup Evaluation, Finding Problems (Data Race, Dead Lock ...), Done. Branches are marked ○ and ×.]

Page 7: Problem of Manual Parallelization

(define (RayTracing ViewPoint Vscan nref energy rgb)
  (if (<= nref 4)
      (let ((crashed? (tracer ViewPoint Vscan))) ;crashed ?
        (if (and (not crashed?) (!= nref 0))
            (let* ((hl0 (fcsyn (f+ (f* (vector-ref Vscan 0) (vector-ref Light 0))
                                   (f* (vector-ref Vscan 1) (vector-ref Light 1))
                                   (f* (vector-ref Vscan 2) (vector-ref Light 2)))))
                   (hl (if (f< hl0 0.0) 0.0 hl0))
                   (ihl (f* hl hl hl energy (car beam))))
              (begin
                (vector-set! rgb 0 (f+ (vector-ref rgb 0) ihl))
                (vector-set! rgb 1 (f+ (vector-ref rgb 1) ihl))
                (vector-set! rgb 2 (f+ (vector-ref rgb 2) ihl)))))
        (if crashed?
            (let* ((P (cdr crashed?)) ;intersection point
                   (m (car crashed?)) ;crashed object
                   (NV (Get-NVector m Vscan P)))
              (let* ((br (fcsyn (f+ (f* (vector-ref NV 0) (vector-ref Light 0))
                                    (f* (vector-ref NV 1) (vector-ref Light 1))
                                    (f* (vector-ref NV 2) (vector-ref Light 2)))))
                     (br1 (if (f< br 0.0) 0.0 br))
                     (bright (if (and (car sh) (Shadow-Check-One-Or-Matrix (car or-Net) P))
                                 0.0
                                 (f* (f+ br1 0.2) energy (vector-ref m 11)))))
                (begin
                  (utexture m P)
                  (vector-set! rgb 0 (f+ (vector-ref rgb 0) (f* bright (vector-ref m 13))))
                  (vector-set! rgb 1 (f+ (vector-ref rgb 1) (f* bright (vector-ref m 14))))
                  (vector-set! rgb 2 (f+ (vector-ref rgb 2) (f* bright (vector-ref m 15))))

• The user must fully understand many lines of code
• Manual parallelization is error-prone

Page 8: Important factors for an assistant tool

• Assist program parallelization
• Combine the benefits of the automatic and manual approaches
  • Automatic: can extract information by the numbers
  • Manual: can use high-level information
• Extract information, and highlight the important information

Page 9: Extraction of parallelism

;; quick : v: array to be sorted; left, right: range for sort
(define (quick v left right)
  (if (>= left right)
      v
      (let ((new-left left)
            (new-right right)
            (pivot (vector-ref v (floor (/ (+ left right) 2)))))
        (do () ((> new-left new-right))
          (do () ((>= (vector-ref v new-left) pivot))
            (set! new-left (+ new-left 1)))
          (do () ((<= (vector-ref v new-right) pivot))
            (set! new-right (- new-right 1)))
          (if (<= new-left new-right)
              (begin
                (swap v new-left new-right)
                (set! new-left (+ new-left 1))
                (set! new-right (- new-right 1)))))
        (begin
          (quick v left new-right)
          (quick v new-left right)))))

(quick #(4 5 3 1 4 0 5 6) 0 7)

( 0R-05-01, 0R-05-02, 0R-05-03 )
( 0R-0e-01, 0R-0e-02 )
( 0R-0t-02, 0R-0t-03 )
( 0R-0w-01, 0R-0w-02 )

Candidates for parallelization

Page 10: Note

• Different approaches
  • Our work: based on dependency analysis
  • Today's survey: based on profile data
• Profile data? Isn't it enough to know the execution time?

Page 11: Difficulty in Tuning a Parallel Program (1/2)

• Coverage
  • Percentage of total execution time spent in the parallel regions
  • Amdahl's law
• Granularity
  • Average length of computation between synchronizations
  • Overhead of communication and synchronization

[Figure: parallel region; labels "10%" and "100"]
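Coverage bounds the achievable speedup through Amdahl's law. As a small sketch (the function name `amdahl_speedup` and its parameters are illustrative assumptions; perfect scaling of the parallel regions and zero overhead are assumed), the bound can be computed as follows:

```c
#include <assert.h>

/*
 * Amdahl's law: upper bound on speedup when a fraction `coverage`
 * (0.0 to 1.0) of the sequential execution time lies in parallel
 * regions that scale perfectly over `processors` processors.
 * Communication and synchronization overhead are ignored.
 */
double amdahl_speedup(double coverage, int processors)
{
    assert(coverage >= 0.0 && coverage <= 1.0);
    assert(processors >= 1);
    return 1.0 / ((1.0 - coverage) + coverage / processors);
}
```

For example, `amdahl_speedup(0.9, 100)` is about 9.2: with 10% of the time left sequential, 100 processors cannot deliver even a tenfold speedup. Fine granularity makes things worse, because the overhead ignored here grows with the number of synchronizations.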

Page 12: Difficulty in Tuning a Parallel Program (2/2)

• Critical path
  • Top resource-using code segment
  • High resource consumption alone does not mean that there is a corresponding potential for improvement

Page 13: Assistant Tools for Program Parallelization

• SUIF Explorer
  • Coverage and granularity
• S-Check
  • Effect of a change on overall performance
• Ursa Minor
  • Experienced programmers' knowledge


Page 15: SUIF Explorer [Liao, et al. 1999]

• Objective
  • Identify the important loops
• Rules of thumb
  • Most of a program's execution time is spent in a small percentage of the code
  • Most of a program's execution time is spent in loops

Page 16: The SUIF Explorer System

[Figure: system diagram. Components: Sequential Program, Parallelizing Compiler, Execution Analyzers, Parallelization Guru, Rivet Visualizer, User. Steps: 1. Automatic parallelization; 2. Collecting profile & dynamic dependences; 3. Guidance to improving program performance]

Page 17: The Parallelization Guru (1/2)

• Parallelization guidance
  • The coverage and granularity
  • Updates the information as new loops are parallelized
• A list of loops to parallelize
  • Sorted in order of execution time
  • Loops that have no I/O and are not nested under parallel loops
• Dependence information on each loop
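The loop list described above can be sketched as a filter followed by a sort. The record layout and the names `loop_info`, `by_time_desc`, and `rank_candidates` are hypothetical, chosen only for illustration; the slides do not show how the Guru stores its data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-loop profile record; field names are illustrative. */
struct loop_info {
    const char *name;
    double exec_time;          /* measured time spent in the loop */
    bool has_io;               /* loop body performs I/O */
    bool inside_parallel_loop; /* nested under an already parallel loop */
};

/* qsort comparator: longest execution time first. */
static int by_time_desc(const void *a, const void *b)
{
    const struct loop_info *x = a, *y = b;
    return (x->exec_time < y->exec_time) - (x->exec_time > y->exec_time);
}

/*
 * Keeps only loops without I/O that are not nested under a parallel
 * loop, moves them to the front of loops[] (the other entries are
 * overwritten), sorts them by execution time in descending order,
 * and returns how many candidates remain.
 */
size_t rank_candidates(struct loop_info *loops, size_t n)
{
    assert(loops != NULL || n == 0);

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (!loops[i].has_io && !loops[i].inside_parallel_loop)
            loops[kept++] = loops[i];
    }
    if (kept > 0)
        qsort(loops, kept, sizeof *loops, by_time_desc);
    return kept;
}
```

The user would then work through the first `kept` entries from the top, which matches the "start with the most expensive loop" workflow of the Guru.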

Page 18: The Parallelization Guru (2/2)

• User interaction
  • Start with the loop at the top of the list
  • If the loop has many dependences, the user may choose not to attempt it
  • Otherwise, the user determines, using program slices,
    • whether a static dependence can be ignored
    • whether an array can be privatized
    • ... etc.

Page 19: Program slice

• The set of statements that contribute to the value of a given variable at a given program point


Page 22: The Parallelization Guru

• Comment
  • Performance data and dependence information are closely linked ⇒ this cuts development cost
  • It is applicable only to loops

Page 23: Assistant Tools for Program Parallelization

• SUIF Explorer
  • Coverage and granularity
• S-Check
  • Effect of a change on overall performance
• Ursa Minor
  • Experienced programmers' knowledge

Page 24: S-Check [Snelick 1997]

• Objective
  • Identify the parts of the program where changes will significantly improve overall performance
• Effect prediction
  • Determine the effect of changes in the code without actually making the changes

Page 25: Sensitive Checker

• Insert a "delay" into segments of a parallel program and calculate the sensitivity to the perturbation
• Assumption
  • If a code segment is highly sensitive to slight perturbations
    ⇒ comparable improvements to that segment will boost performance correspondingly

Page 26: Program Model

• Code = transfer function
• Taylor expansion

$$R(X_1, X_2, \ldots, X_k) = I + \sum_{j=1}^{k} \beta_j X_j + \sum_{i<j}^{k} \beta_{i,j} X_i X_j + \cdots$$

  • βj := indicates how sensitive execution is
  • βi,j := interactions between code

Page 27

1. Original parallel program

   while(x>y){ }          // A
   send(…);               // B
   ・・・・・・
   do_computation{…};     // C

2. Mark possible bottlenecks (A, B, C)

3. Insert delays (1: ON / 0: OFF)

   while(x>y){            // A
     delay(a);
   }
   delay(b); send(…);     // B
   ・・・・・・
   do_computation{delay(c); …};  // C

4. Generate & run numerous versions of the program

   ・・・ delay(1) ・・・ delay(1) ・・・ delay(0)
   ・・・ delay(0) ・・・ delay(1) ・・・ delay(0)
   ・・・ delay(1) ・・・ delay(0) ・・・ delay(0)
   ・・・
   ・・・ delay(0) ・・・ delay(0) ・・・ delay(0)

5. Analyze results, solve for effects

   Effects  Source
   0.44     A
   4.54     B
   0.07     AB
   1.21     C
   0.02     BC
   0.34     AC
   0.00     ABC
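The "solve for effects" step can be illustrated with the textbook estimate for a full two-level factorial experiment. This is a sketch under stated assumptions: the slides do not give S-Check's exact statistical procedure, and the function name `factorial_effect`, the bit-mask encoding of delay patterns, and the requirement that all 2^k versions were run are choices made here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimates one effect from a full 2^k factorial experiment.
 *
 * times[m] is the measured run time of the program version whose
 * delay pattern is the bit mask m (bit j set = delay in segment j ON).
 * `subset` selects the effect: a single bit gives the main effect of
 * one segment, several bits give the interaction of those segments.
 *
 * Each run contributes with sign +1 or -1, the product over the
 * selected segments of (+1 if its delay is ON, -1 if it is OFF);
 * the signed sum is divided by half the number of runs.
 */
double factorial_effect(const double *times, unsigned k, unsigned subset)
{
    assert(times != NULL && k >= 1 && k < 16);
    assert(subset != 0 && subset < (1u << k));

    unsigned runs = 1u << k;
    double sum = 0.0;
    for (unsigned m = 0; m < runs; m++) {
        unsigned off = subset & ~m;   /* selected segments with delay OFF */
        int sign = 1;
        for (; off != 0; off &= off - 1)
            sign = -sign;             /* flip once per OFF segment */
        sum += sign * times[m];
    }
    return sum / (runs / 2);
}
```

With two segments and run times {10, 11, 14, 15} for the patterns none, A, B, and A+B, the sketch yields an effect of 1 for A, 4 for B, and 0 for the interaction AB, so B would be reported as the more sensitive segment.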

Page 28: User Interaction (1/3)

• Test code locations are selected manually or automatically
  • Information provided by the profiler
  • Programming constructs (e.g. while, for)
  • Certain library function calls (e.g. barrier(), send())

Page 29: User Interaction (2/3)

• Set the parameters
  • Delay perturbation patterns
  • Delay value
• Trade-off: amount of information vs. number of runs

Page 30: User Interaction (3/3)

• Code with a higher effect is more likely to be a bottleneck
• Dependency is not dealt with

Page 31: S-Check

• Comment
  • Identifies the program segments that are directly linked to performance
  • Knowledge about the program is required in order to mark possible bottlenecks
  • As the code size gets bigger, the sensitivity test takes longer
  • Dependence information is not available

Page 32: Assistant Tools for Program Parallelization

• SUIF Explorer
  • Coverage and granularity
• S-Check
  • Effect of a change on overall performance
• Ursa Minor
  • Experienced programmers' knowledge

Page 33: Ursa Minor [Kim, et al. 2000]

• Objective
  × Stopping at pointing to problematic code
  〇 Presenting possible causes and solutions
• Transfer knowledge from experienced programmers to novice programmers

Page 34: Ursa Minor System

[Figure: system diagram. Components: Parallel Program, User, GUI Manager (Table View, Structure View), Merlin Performance Adviser (analyze problem, suggest solution), Database Manager, Database (Static Data, Dynamic Data; stores analyzed data, map file, etc.). Import/export of data files from Polaris or other tools.]

Page 35: Merlin Performance Advisor

• Knowledge database
  • Knowledge on diagnosis and solutions
  • Transfers programming experience from experts to new users (with a "MAP" file)
  • Performance model
  • Architecture
  • ... etc.

Page 36: Merlin

Symptom ⇒ Diagnostic

Suggestions

Page 37: Advisor Map (1/2)

• Advisor Map
  • Problem Domain: general performance problems from the viewpoint of programmers
  • Diagnostics Domain: possible causes of these problems
  • Solution Group: possible remedies

Page 38: Advisor Map (2/2)

Problem            Diagnostics                                           Solution
poor speedup       speedup < 1                                           Serialization
                   # of stride-1 accesses < # of non stride-1 accesses   Loop Interchange
                   speedup < 2.5                                         Loop Fusion
large # of stalls  :                                                     :
:                  :                                                     :
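A map of this kind can be pictured as a table of diagnostic tests, each paired with a remedy. The sketch below is only an illustration: the three pairs are taken from the "poor speedup" entries on this slide, but the struct layouts and the names `loop_stats`, `map_entry`, and `advise` are hypothetical, and the slides do not show how Merlin actually encodes or evaluates its MAP file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-loop measurements; field names are illustrative. */
struct loop_stats {
    double speedup;
    long stride1_accesses;
    long non_stride1_accesses;
};

/* One map entry: a diagnostic test and the remedy it suggests. */
struct map_entry {
    int (*diagnose)(const struct loop_stats *s);
    const char *solution;
};

static int speedup_below_1(const struct loop_stats *s)
{
    return s->speedup < 1.0;
}

static int mostly_non_stride1(const struct loop_stats *s)
{
    return s->stride1_accesses < s->non_stride1_accesses;
}

static int speedup_below_2_5(const struct loop_stats *s)
{
    return s->speedup < 2.5;
}

/* Entries for the "poor speedup" problem, as listed on the slide. */
static const struct map_entry poor_speedup_map[] = {
    { speedup_below_1,    "Serialization"    },
    { mostly_non_stride1, "Loop Interchange" },
    { speedup_below_2_5,  "Loop Fusion"      },
};

/*
 * Writes every suggestion whose diagnostic matches into out[]
 * (at most max entries) and returns the number written.
 */
size_t advise(const struct loop_stats *s, const char **out, size_t max)
{
    assert(s != NULL && (out != NULL || max == 0));

    size_t n = 0;
    size_t entries = sizeof poor_speedup_map / sizeof poor_speedup_map[0];
    for (size_t i = 0; i < entries && n < max; i++) {
        if (poor_speedup_map[i].diagnose(s))
            out[n++] = poor_speedup_map[i].solution;
    }
    return n;
}
```

For a loop with a speedup of 1.8 and more non stride-1 than stride-1 accesses, this sketch suggests "Loop Interchange" and "Loop Fusion". Keeping the rules in a data table rather than in code is what lets an expert extend the map without touching the tool.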

Page 39: Expression Evaluator

• Basic spreadsheet operations
  • Numeric functions: NEG, ADD, SPDUP, PERCO, ARVG, etc.
  • Relational functions: EQ, NE, etc.
  • Query functions: PARALLEL, HASIO, HASCALL, HASDEP, etc.
  • Logical functions: AND, OR, etc.

Page 40: Merlin

• Comment
  • The idea goes a step beyond merely pointing out a bottleneck
  • Who writes the "MAP"?
  • The effectiveness of this technology depends on the quality of the MAP

Page 41: Comparison

• SUIF Explorer vs. S-Check
  • No configuration, dependence information
  • Efficiency?
• These two vs. Ursa Minor
  • Practical
  • Not kind to beginners

Page 42: Conclusion

• Several approaches guide the user with smart information
• Future work
  • Integration
    • Profiler and dependence analyzer
  • Portability
    • Different architectures, OSs, performance