2003/12/5 1 assisting technologies for program parallelization chikayama/taura lab. masakazu hayatsu


2003/12/5 1

Assisting technologies for program parallelization

Chikayama/Taura Lab.
Masakazu HAYATSU
hayatsu@logos.t.u-tokyo.ac.jp

Agenda

- Introduction
- Difficulty of Program Parallelization
- Assistant Tools for Program Parallelization
  - SUIF Explorer
  - S-Check
  - Ursa Minor
- Conclusion

Introduction

- Popularization of parallel computers
  - Commercial computers with a very large number of processors
  - Low-end PCs with 2-4 processors
- Performance
  - Uniprocessor speedup is slowing down

⇒ The importance of parallel programs is increasing further

Difficulty of Program Parallelization

- Dependency
- Deadlock
- Data race

These problems must be avoided.

[Figure: A and B both write to X, one storing 100 and the other 1. Is the result 100 or 1?]
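
The data race in the figure can be replayed deterministically by enumerating the possible orders of the two writes. This is a minimal sketch, assuming two hypothetical writers A and B that store 100 and 1 into a shared variable X without synchronization; the names are illustrative only.

```python
from itertools import permutations

# Hypothetical writers: each stores a different value into the shared variable X.
WRITES = {"A": 100, "B": 1}

def final_value(order):
    """Replay the writes in the given order and return the last value of X."""
    x = None
    for writer in order:
        x = WRITES[writer]
    return x

def possible_outcomes():
    """Without synchronization, every order of the two writes is a legal execution."""
    return {order: final_value(order) for order in permutations(WRITES)}

outcomes = possible_outcomes()
for order, x in outcomes.items():
    print(" then ".join(order), "-> X =", x)
```

Both 100 and 1 appear among the outcomes, which is why the result of the program cannot be predicted from the source alone.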

Automatic Parallelization

- Low performance
- Parallelization techniques are fragile
- Knowledge from outside the code is often required

    :
    for (i = 0; i < N; i++) {
        a[f(i)] = 0;  // A
        a[g(i)] = 1;  // B
    }
    :

Parallelizable? Unknown (×?) without knowing f and g.
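
Whether the loop above can be parallelized depends on whether two different iterations ever write the same element of a. The following sketch checks that for one concrete run; the functions passed as f and g are hypothetical examples, and a check like this only covers the input it was run with, not every possible input.

```python
def has_cross_iteration_conflict(n, f, g):
    """Return True if two different iterations write the same element of a."""
    writer = {}  # array index -> first iteration that wrote it
    for i in range(n):
        for index in (f(i), g(i)):
            # setdefault records the first writer and returns it on later visits
            if writer.setdefault(index, i) != i:
                return True
    return False

# Hypothetical f and g: disjoint even/odd slots, so iterations are independent.
independent = has_cross_iteration_conflict(8, lambda i: 2 * i, lambda i: 2 * i + 1)
# Hypothetical f and g: iteration i+1 overwrites what iteration i wrote.
conflicting = has_cross_iteration_conflict(8, lambda i: i, lambda i: i + 1)
print(independent, conflicting)
```

A compiler that cannot see into f and g has to assume the second case, which is one reason automatic parallelization gives up on such loops.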

Development Process

[Flowchart boxes: Design & Improve Model, Finding Problems, Manually Optimizing Program, Run, Speedup Evaluation / Validity Check, Done. The check leads to Done on success (○) and back to Finding Problems on failure (×): data race, deadlock, …]

Problem of Manual Parallelization

(define (RayTracing ViewPoint Vscan nref energy rgb)
  (if (<= nref 4)
      (let ((crashed? (tracer ViewPoint Vscan))) ;crashed ?
        (if (and (not crashed?) (!= nref 0))
            (let* ((hl0 (fcsyn (f+ (f* (vector-ref Vscan 0) (vector-ref Light 0))
                                   (f* (vector-ref Vscan 1) (vector-ref Light 1))
                                   (f* (vector-ref Vscan 2) (vector-ref Light 2)))))
                   (hl (if (f< hl0 0.0) 0.0 hl0))
                   (ihl (f* hl hl hl energy (car beam))))
              (begin
                (vector-set! rgb 0 (f+ (vector-ref rgb 0) ihl))
                (vector-set! rgb 1 (f+ (vector-ref rgb 1) ihl))
                (vector-set! rgb 2 (f+ (vector-ref rgb 2) ihl)))))
        (if crashed?
            (let* ((P (cdr crashed?)) ;intersection point
                   (m (car crashed?)) ;crashed object
                   (NV (Get-NVector m Vscan P)))
              (let* ((br (fcsyn (f+ (f* (vector-ref NV 0) (vector-ref Light 0))
                                    (f* (vector-ref NV 1) (vector-ref Light 1))
                                    (f* (vector-ref NV 2) (vector-ref Light 2)))))
                     (br1 (if (f< br 0.0) 0.0 br))
                     (bright (if (and (car sh) (Shadow-Check-One-Or-Matrix (car or-Net) P))
                                 0.0
                                 (f* (f+ br1 0.2) energy (vector-ref m 11)))))
                (begin
                  (utexture m P)
                  (vector-set! rgb 0 (f+ (vector-ref rgb 0) (f* bright (vector-ref m 13))))
                  (vector-set! rgb 1 (f+ (vector-ref rgb 1) (f* bright (vector-ref m 14))))
                  (vector-set! rgb 2 (f+ (vector-ref rgb 2) (f* bright (vector-ref m 15))))

- The user must fully understand many lines of code
- This is error-prone

Important Factors for an Assistant Tool

- Assist program parallelization
  - Combine the benefits of automatic and manual parallelization
    - Automatic: can extract information mechanically
    - Manual: can use high-level information
- Extract information and highlight what is important

Extraction of parallelism

;; quick: v = array to be sorted; left, right = range to sort
(define (quick v left right)
  (if (>= left right)
      v
      (let ((new-left left)
            (new-right right)
            (pivot (vector-ref v (floor (/ (+ left right) 2)))))
        (do () ((> new-left new-right))
          (do () ((>= (vector-ref v new-left) pivot))
            (set! new-left (+ new-left 1)))
          (do () ((<= (vector-ref v new-right) pivot))
            (set! new-right (- new-right 1)))
          (if (<= new-left new-right)
              (begin
                (swap v new-left new-right)
                (set! new-left (+ new-left 1))
                (set! new-right (- new-right 1)))))
        (begin
          (quick v left new-right)
          (quick v new-left right)))))

(quick #(4 5 3 1 4 0 5 6) 0 7)

Candidates for parallelization:

( 0R-05-01, 0R-05-02, 0R-05-03 )
( 0R-0e-01, 0R-0e-02 )
( 0R-0t-02, 0R-0t-03 )
( 0R-0w-01, 0R-0w-02 )
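
One natural candidate in this quicksort is the pair of recursive calls, since after partitioning they work on disjoint ranges of v. The sketch below is a Python transcription written for illustration, not the output of the tool; the spawn_depth parameter is an assumption added to bound the number of threads. In standard CPython the global interpreter lock prevents a real speedup, so the code only demonstrates that the two calls can run concurrently without interfering.

```python
import threading

def partition(v, left, right):
    """Partition v[left..right] around the middle element, as in the slide's loops."""
    pivot = v[(left + right) // 2]
    new_left, new_right = left, right
    while new_left <= new_right:
        while v[new_left] < pivot:
            new_left += 1
        while v[new_right] > pivot:
            new_right -= 1
        if new_left <= new_right:
            v[new_left], v[new_right] = v[new_right], v[new_left]
            new_left += 1
            new_right -= 1
    return new_left, new_right  # new_right < new_left, so the two halves are disjoint

def quick(v, left, right, spawn_depth=2):
    """Sort v[left..right] in place; run the two halves concurrently near the top."""
    if left >= right:
        return v
    new_left, new_right = partition(v, left, right)
    if spawn_depth > 0:
        worker = threading.Thread(
            target=quick, args=(v, left, new_right, spawn_depth - 1))
        worker.start()
        quick(v, new_left, right, spawn_depth - 1)
        worker.join()
    else:
        quick(v, left, new_right, 0)
        quick(v, new_left, right, 0)
    return v

result = quick([4, 5, 3, 1, 4, 0, 5, 6], 0, 7)
print(result)
```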

Notice

- Different approaches
  - Our work: based on dependency analysis
  - Today's survey: based on profile data
- Profile data? Isn't knowing the execution time enough?

Difficulty in Tuning a Parallel Program (1/2)

- Coverage
  - Percentage of total execution time spent in the parallel regions
  - Amdahl's law
- Granularity
  - Average length of computation between synchronizations
  - Overhead of communication and synchronization

[Figure: parallel region; labels "10%" and "100"]
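
Amdahl's law makes the role of coverage concrete: the serial part bounds the achievable speedup no matter how many processors are used. A minimal sketch, where the 90% coverage and the processor counts are illustrative values chosen for this example:

```python
def amdahl_speedup(coverage, processors):
    """Upper bound on speedup when a fraction `coverage` of the time runs in parallel."""
    serial = 1.0 - coverage
    return 1.0 / (serial + coverage / processors)

for processors in (2, 4, 100):
    print(processors, "processors:", round(amdahl_speedup(0.9, processors), 2))
```

With 90% coverage the speedup can never reach 10, however many processors are added; 100 processors give a little over 9.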

Difficulty in Tuning a Parallel Program (2/2)

- Critical path
  - Top resource-using code segment
  - High resource consumption alone does not imply a corresponding potential for improvement
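
The point about the critical path can be checked on a small task graph. The graph and durations below are hypothetical: task c is the single largest consumer of time, yet it is not on the critical path, so speeding it up does not shorten the run.

```python
# Hypothetical task graph: a -> b runs in parallel with c; both feed into d.
DURATION = {"a": 5, "b": 5, "c": 8, "d": 1}
PREDECESSORS = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"]}

def finish_time(task, duration):
    """Earliest finish time of `task`, assuming enough processors for all ready tasks."""
    start = max((finish_time(p, duration) for p in PREDECESSORS[task]), default=0)
    return start + duration[task]

def makespan(duration):
    """Length of the critical path through the whole graph."""
    return max(finish_time(task, duration) for task in duration)

baseline = makespan(DURATION)
faster_c = makespan({**DURATION, "c": 4})  # halve the top resource user
faster_b = makespan({**DURATION, "b": 3})  # trim a task on the critical path
print(baseline, faster_c, faster_b)
```

Halving c leaves the total time unchanged, while a smaller saving on b shortens it.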

Assistant Tools for Program Parallelization

- SUIF Explorer
  - Coverage and granularity
- S-Check
  - Effect of a change on overall performance
- Ursa Minor
  - Experienced programmers' knowledge

SUIF Explorer [Liao et al. 1999]

- Objective
  - Identify the important loops
- Rules of thumb
  - Most of a program's execution time is spent on a small percentage of the code
  - Most of a program's execution time is spent on loops

The SUIF Explorer System

[Diagram components: Sequential Program, Parallelizing Compiler, Execution Analyzers, Parallelization Guru, Rivet Visualizer, User]

1. Automatic parallelization
2. Collecting profile and dynamic dependences
3. Guidance for improving program performance

The Parallelization Guru (1/2)

- Parallelization guidance
  - The coverage and granularity
    - Updated as new loops are parallelized
  - A list of loops to parallelize
    - Sorted in order of execution time
    - Only loops that have no I/O and are not nested under a parallel loop
  - Dependence information on each loop
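
The loop list described above amounts to a filter followed by a sort. The sketch below is not the SUIF Explorer implementation; the LoopProfile record, its field names and the sample numbers are all hypothetical, and it assumes that "sorted in order of execution time" means most expensive first and that parallelized loops are not nested in each other.

```python
from dataclasses import dataclass

@dataclass
class LoopProfile:
    name: str
    exec_time: float            # time spent in the loop (hypothetical units)
    has_io: bool
    inside_parallel_loop: bool
    parallel: bool = False      # already parallelized?

def candidate_loops(loops):
    """Loops worth showing to the user, most expensive first."""
    usable = [loop for loop in loops
              if not loop.parallel and not loop.has_io
              and not loop.inside_parallel_loop]
    return sorted(usable, key=lambda loop: loop.exec_time, reverse=True)

def coverage(loops, total_time):
    """Fraction of the run spent in loops that are already parallel."""
    return sum(loop.exec_time for loop in loops if loop.parallel) / total_time

loops = [
    LoopProfile("init", 1.0, has_io=False, inside_parallel_loop=False, parallel=True),
    LoopProfile("update", 1.5, has_io=False, inside_parallel_loop=False),
    LoopProfile("solve", 6.0, has_io=False, inside_parallel_loop=False),
    LoopProfile("dump", 2.0, has_io=True, inside_parallel_loop=False),
    LoopProfile("inner", 0.5, has_io=False, inside_parallel_loop=True),
]
print([loop.name for loop in candidate_loops(loops)], coverage(loops, 10.0))
```

Recomputing both values after each loop is parallelized mirrors the way the guidance is updated as the user makes progress.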

The Parallelization Guru (2/2)

- User interaction
  - Start with the loop at the top of the list
  - If the loop has many dependences, the user does not attempt to parallelize it
  - Otherwise, the user determines, using program slices,
    - whether the static dependence can be ignored
    - whether an array can be privatized
    - … etc.

Program Slice

The statements that contribute to the value of a variable at a given point in the program
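
A backward slice can be computed by walking the program in reverse and keeping only the statements that define a variable still needed. The sketch below handles straight-line code only (no branches or loops) and is an illustration of the idea, not the slicer used by SUIF Explorer; the sample program is hypothetical.

```python
# Hypothetical straight-line program: (statement text, variable defined, variables used).
PROGRAM = [
    ("n = 10",    "n", set()),
    ("s = 0",     "s", set()),
    ("p = 1",     "p", set()),
    ("s = s + n", "s", {"s", "n"}),
    ("p = p * 2", "p", {"p"}),
    ("r = s + 1", "r", {"s"}),
]

def backward_slice(program, variable):
    """Statements that contribute to the final value of `variable`."""
    needed = {variable}
    kept = []
    for index in range(len(program) - 1, -1, -1):
        _text, defined, used = program[index]
        if defined in needed:
            # This statement supplies a needed value; now its inputs are needed instead.
            needed.discard(defined)
            needed |= used
            kept.append(index)
    return [program[index][0] for index in sorted(kept)]

slice_for_r = backward_slice(PROGRAM, "r")
print(slice_for_r)
```

The statements that update p drop out of the slice for r, which is how a slice narrows the code a user has to read when judging a dependence.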

The Parallelization Guru

- Comments
  - Performance data and dependence information are closely linked ⇒ this cuts down development cost
  - It is applicable only to loops

S-Check [Snelick 1997]

- Objective
  - Identify the parts of the program where changes will significantly improve overall performance
- Effect prediction
  - Determine the effect of changes in the code without actually making the changes

Sensitive Checker

- Insert "delay" into segments of a parallel program and calculate the sensitivity to the perturbation
- Assumption
  - If a code segment is highly sensitive to slight perturbations ⇒ comparable improvements to that segment will boost performance correspondingly

Program Model

- Code = transfer function
- Taylor expansion

\[
R(X_1, X_2, \ldots, X_k) = I + \sum_{j=1}^{k} \beta_j X_j + \sum_{i<j}^{k} \beta_{i,j} X_i X_j + \cdots
\]

- \beta_j: indicates how sensitive execution is to code segment j
- \beta_{i,j}: interaction between code segments i and j

1. Original parallel program

    while(x>y){ }            // A
    send(…);                 // B
    ・・・・・・
    do_computation{…};       // C

2. Mark possible bottlenecks and insert delays (1: ON / 0: OFF)

    while(x>y){ delay(a); }            // A
    delay(b); send(…);                 // B
    ・・・・・・
    do_computation{ delay(c); … };     // C

3. Generate and run numerous versions of the program

    ・・・・・・ delay(1) ・・・・・・ delay(1) ・・・・・・ delay(0)
    ・・・・・・ delay(0) ・・・・・・ delay(1) ・・・・・・ delay(0)
    ・・・・・・ delay(1) ・・・・・・ delay(0) ・・・・・・ delay(0)
    ・・・
    ・・・・・・ delay(0) ・・・・・・ delay(0) ・・・・・・ delay(0)

4. Analyze results and solve for effects

    Effects  Source
    0.44     A
    4.54     B
    0.07     AB
    1.21     C
    0.02     BC
    0.34     AC
    0.00     ABC
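
The "solve for effects" step can be illustrated with a standard two-level full factorial analysis: run every on/off combination of the delays, then estimate each main effect and interaction from signed sums of the measured times. This is a sketch of that general technique, not necessarily the exact procedure S-Check uses; run_time is a hypothetical stand-in for actually running the instrumented program, and its coefficients are invented for the example.

```python
from itertools import combinations, product

FACTORS = ["A", "B", "C"]

def run_time(setting):
    """Hypothetical measured execution time for one on/off delay pattern."""
    a, b, c = setting
    return 10.0 + 0.5 * a + 4.0 * b + 1.0 * c + 0.3 * a * c

def estimate_effects(factors, measure):
    """Main effects and interactions from a 2^k full factorial experiment."""
    k = len(factors)
    runs = [(setting, measure(setting)) for setting in product((0, 1), repeat=k)]
    effects = {}
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            total = 0.0
            for setting, elapsed in runs:
                sign = 1
                for index in subset:
                    sign *= 1 if setting[index] else -1  # delay ON = +1, OFF = -1
                total += sign * elapsed
            effects["".join(factors[i] for i in subset)] = total / 2 ** (k - 1)
    return effects

effects = estimate_effects(FACTORS, run_time)
for source, effect in sorted(effects.items(), key=lambda item: -abs(item[1])):
    print(f"{effect:5.2f}  {source}")
```

As in the table above, the source with the largest effect (B in this invented data) is the most likely bottleneck. A full factorial design needs 2^k runs, which is where the trade-off between information and number of runs comes from.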

User Interaction (1/3)

- Test code locations are selected manually or automatically
  - Information provided by a profiler
  - Programming constructs (e.g., while, for)
  - Certain library function calls (e.g., barrier(), send())

User Interaction (2/3)

- Set the parameters
  - Delay perturbation patterns
  - Delay value
- Trade-off: amount of information vs. number of runs

User Interaction (3/3)

- Code with a higher effect is more likely to be a bottleneck
- Dependences are not dealt with

S-Check

- Comments
  - Identifies the program segments that are directly linked to performance
  - Knowledge about the program is required in order to mark possible bottlenecks
  - As code size grows, the sensitivity test takes longer
  - Dependence information is not available

Ursa Minor [Kim, et al. 2000]

- Objective
  - × Stop at pointing to problematic code
  - ○ Present possible causes and solutions
- Transfer knowledge from experienced programmers to novice programmers

Ursa Minor System

[Diagram components]
- User, Parallel Program
- GUI Manager: Table View, Structure View
- Merlin Performance Advisor: analyze problems, suggest solutions
- Database Manager: import/export data files from Polaris or other tools
- Database (static data, dynamic data): stores analyzed data, MAP file, etc.

Merlin Performance Advisor

- Knowledge database
  - Knowledge on diagnosis and solutions
  - Transfers programming experience from experts to new users (with a "MAP" file)
    - Performance model
    - Architecture
    - … etc.

Merlin

Symptom ⇒ Diagnostic

Suggestions

Advisor Map (1/2)

- Advisor Map
  - Problem Domain
    - General performance problems from the viewpoint of programmers
  - Diagnostics Domain
    - Possible causes of these problems
  - Solution Group
    - Possible remedies

Advisor Map (2/2)

Problem             Diagnostics                                            Solution
poor speedup        speedup < 1                                            Serialization
                    # of stride-1 accesses < # of non stride-1 accesses    Loop Interchange
                    speedup < 2.5                                          Loop Fusion
large # of stalls   :                                                      :
:                   :                                                      :
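
An advisor map of this shape can be sketched as a rule table that links a problem to diagnostic tests and each test to a suggested remedy. The code below is not Merlin's MAP file format; the pairing of diagnostics with solutions is an assumption read from the layout of the slide, and the metric names (speedup, stride1, non_stride1) are hypothetical.

```python
# Hypothetical rule table: (problem, diagnostic label, test on metrics, suggested solution).
ADVISOR_MAP = [
    ("poor speedup", "speedup < 1",
     lambda m: m["speedup"] < 1, "Serialization"),
    ("poor speedup", "# of stride-1 accesses < # of non stride-1 accesses",
     lambda m: m["stride1"] < m["non_stride1"], "Loop Interchange"),
    ("poor speedup", "speedup < 2.5",
     lambda m: m["speedup"] < 2.5, "Loop Fusion"),
]

def advise(problem, metrics):
    """Return the (diagnostic, solution) pairs whose test holds for this loop."""
    return [(label, solution)
            for rule_problem, label, test, solution in ADVISOR_MAP
            if rule_problem == problem and test(metrics)]

# Hypothetical measurements for one loop.
loop_metrics = {"speedup": 1.8, "stride1": 10, "non_stride1": 40}
advice = advise("poor speedup", loop_metrics)
for label, solution in advice:
    print(f"{label} => {solution}")
```

Keeping the rules as data rather than code is what lets an experienced programmer extend the map without touching the tool itself.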

Expression Evaluator

- Basic spreadsheet operations
  - Numeric functions: NEG, ADD, SPDUP, PERCO, ARVG, etc.
  - Relational functions: EQ, NE, etc.
  - Query functions: PARALLEL, HASIO, HASCALL, HASDEP, etc.
  - Logical functions: AND, OR, etc.
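
A spreadsheet-style evaluator of this kind can be sketched as a recursive walk over prefix expressions. Only a handful of the listed function names are modeled here, with the obvious meanings assumed; SPDUP, PERCO, ARVG and the query functions are left out because the slide does not define them. The tuple syntax and the cell names are hypothetical and do not reflect Ursa Minor's actual input format.

```python
import operator

# Assumed meanings for a subset of the function names on the slide.
FUNCTIONS = {
    "NEG": operator.neg,
    "ADD": operator.add,
    "EQ": operator.eq,
    "NE": operator.ne,
    "AND": lambda a, b: bool(a) and bool(b),
    "OR": lambda a, b: bool(a) or bool(b),
}

def evaluate(expr, cells):
    """Evaluate a nested ("NAME", arg, ...) expression against named cells."""
    if isinstance(expr, tuple):
        name, *args = expr
        return FUNCTIONS[name](*(evaluate(arg, cells) for arg in args))
    if isinstance(expr, str):
        return cells[expr]      # a string refers to a cell by name
    return expr                 # a plain number is a literal

cells = {"t_compute": 10, "t_overhead": 2}
total_is_12 = evaluate(("EQ", ("ADD", "t_compute", "t_overhead"), 12), cells)
print(total_is_12)
```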

Merlin

- Comments
  - The idea goes a step beyond merely indicating a bottleneck
  - Who writes the "MAP"?
  - The effectiveness of this technology depends on the quality of the MAP

Comparison

- SUIF Explorer vs. S-Check
  - No configuration, dependence information
  - Efficiency?
- These two vs. Ursa Minor
  - Practical
  - Not kind to beginners

Conclusion

- Several approaches guide the user with smart information
- Future work
  - Integration
    - Profiler and dependence analyzer
  - Portability
    - Different architectures, OSes, performance
