on-line parallel tomography


Post on 22-Feb-2016


Page 1: On-line Parallel Tomography

On-line Parallel Tomography

Shava Smallen, UCSD

Page 2: On-line Parallel Tomography

Talk Outline

I) Introduction to On-line Parallel Tomography

II) Tunable On-line Parallel Tomography

III) User-directed application-level scheduler

IV) Experiments

V) Conclusion

Page 3: On-line Parallel Tomography

What is tomography?

• A method for reconstructing the interior of an object from its projections

• At the National Center for Microscopy and Imaging Research (NCMIR), tomography is applied to electron microscopy to study specimens at the cellular and subcellular level

Page 4: On-line Parallel Tomography

Example

[Figure: Tomogram of a spiny dendrite. Images courtesy of Steve Lamont.]

Page 5: On-line Parallel Tomography

Parallel Tomography at NCMIR

• Embarrassingly parallel

[Figure: diagram with X, Y, and Z axes. Labels: specimen, slice, projection, scanline.]

Page 6: On-line Parallel Tomography

NCMIR Usage Scenarios

Off-line parallel tomography (off-line PT)

– Data resides somewhere on secondary storage

– Single, high-quality tomogram

– Reduce turnaround time

– Previous work (HCW '00)

On-line parallel tomography (on-line PT)

– Data streamed from the electron microscope

  • long makespan, configuration errors, etc.

– Iteratively computed tomogram

– Soft real-time execution

Page 7: On-line Parallel Tomography

On-line PT

• Real-time feedback on quality of data acquisition
  1) First projection acquired from microscope
  2) Generate coarse tomogram
  3) Iteratively refine tomogram using subsequent projections (refresh)
     • Update each voxel value
     • Size of tomogram is constant
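The refresh step above can be sketched as follows. This is only an illustration: it uses plain unfiltered backprojection of one slice as a stand-in, since the slides do not say which reconstruction method is used, and the names `refresh`, `slice_sum`, and `new_scanlines` are hypothetical.

```python
import math


def refresh(slice_sum, new_scanlines):
    """Refine one square slice in place with newly acquired scanlines.

    slice_sum     -- n x n list of lists holding the running voxel values
    new_scanlines -- list of (tilt_angle_degrees, scanline) pairs
    Assumption: unfiltered backprojection stands in for the real method.
    """
    n = len(slice_sum)
    centre = (n - 1) / 2.0
    for angle_deg, scanline in new_scanlines:
        c = math.cos(math.radians(angle_deg))
        s = math.sin(math.radians(angle_deg))
        for z in range(n):
            for x in range(n):
                # Detector position that voxel (x, z) projects onto at this tilt.
                t = (x - centre) * c + (z - centre) * s + centre
                idx = int(round(t))
                if 0 <= idx < len(scanline):
                    # Every voxel is updated; the slice never changes size.
                    slice_sum[z][x] += scanline[idx]
    return slice_sum


# Coarse slice from the first projection, then one refinement.
slice_sum = [[0.0] * 4 for _ in range(4)]
refresh(slice_sum, [(0.0, [1.0, 1.0, 1.0, 1.0])])
refresh(slice_sum, [(0.0, [1.0, 1.0, 1.0, 1.0])])
```

Each call touches every voxel but leaves the slice dimensions unchanged, which matches the two sub-points of step 3.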

Page 8: On-line Parallel Tomography

NCMIR Target Platform

• Multi-user, heterogeneous resources
  – NCMIR cluster
    • SGI Indigo2, SGI Octane, SUN ULTRA, SUN Enterprise
    • IRIX, Solaris
  – Meteor cluster
    • Pentium III dual proc
    • Linux, PBS
  – Blue Horizon
    • AIX, LoadLeveler, Maui Scheduler

[Figure: diagram labeled "network".]

Page 9: On-line Parallel Tomography

On-line PT Architecture

[Figure: architecture diagram. Labeled components: preprocessor, five ptomo processes, writer. Labeled data: projection, scanlines, slices, tomogram.]

Page 10: On-line Parallel Tomography

On-line PT Design

1) Frame on-line parallel tomography as a tunable application
  – Resource limitations / dynamic
  – Availability of alternate configurations [Chang, et al.]
    • Each configuration corresponds to a different output quality and resource usage

2) Coupled with user-directed application-level scheduler (AppLeS)
  – Adaptive scheduler
  – Promote application performance

Page 11: On-line Parallel Tomography

On-line PT Configuration

• Triple: (f, r, su)
• Reduction factor (f)
  – Reduce resolution of data → reduce both computation and communication
• Projections per refresh (r)
  – Reduce refinement frequency → reduce communication
• Service units (su)
  – Increase cost of execution → increase computational power

Page 12: On-line Parallel Tomography

User Preferences

• Best configuration: (f, r, su) = (1, 1, 0)
• Several possible configurations → user specifies bounds
  – Projections should be at least size 256x256
    • 1 ≤ f ≤ 4 or 1 ≤ f ≤ 8
  – User could tolerate up to a 10-minute wait
    • 1 ≤ r ≤ 13
  – Reasonable upper bound
    • 0 ≤ su ≤ (50 x acquisition period x c)
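A rough sketch of how such bounds might be derived from the stated preferences. Several inputs are assumptions, not values from the slides: the original projection sizes (1024 and 2048 pixels per side, chosen because they give f bounds of 4 and 8), the acquisition period (45 seconds, chosen because it gives an r bound of 13 for a 10-minute wait), and the value of c, which the slides do not define. The names `Bounds` and `preference_bounds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    f_max: int      # largest allowed reduction factor
    r_max: int      # largest allowed number of projections per refresh
    su_max: float   # largest allowed number of service units


def preference_bounds(projection_size, min_size, max_wait_s,
                      acquisition_period_s, c):
    """Translate user preferences into upper bounds on (f, r, su)."""
    # Reducing by f shrinks each side by f, so keep projection_size / f >= min_size.
    f_max = projection_size // min_size
    # A refresh needs r projections, so r acquisition periods must fit in the wait.
    r_max = int(max_wait_s // acquisition_period_s)
    # Upper bound as written on the slide: 50 x acquisition period x c.
    su_max = 50 * acquisition_period_s * c
    return Bounds(f_max, r_max, su_max)


# Assumed inputs: 1024-pixel projections, 45 s acquisition period, c = 1.0.
small = preference_bounds(1024, 256, 600, 45, 1.0)
large = preference_bounds(2048, 256, 600, 45, 1.0)
```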

Page 13: On-line Parallel Tomography

User-directed

• Feasible?
  – Use dynamic load information
  – If work allocation found
• Better?
  – e.g., with triples written as (reduction factor, projections per refresh, service units):
    1. (1, 6, 4) - best f
    2. (2, 2, 8) - good su/r
    3. (2, 1, 20) - best r

Page 14: On-line Parallel Tomography

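User-directed AppLeS

[Figure: flowchart of the interaction between the User and the User-directed AppLeS. Labeled steps: generate request, process request, find work allocation, display triples, review triples, adjust request, execute on-line PT. Labeled transitions: feasible, infeasible, accepts one, rejects all.]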
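One plausible reading of the flowchart, sketched as a loop. The ordering of the steps is inferred from the labels only, and every name here (`negotiate`, `find_work_allocation`, `choose`, `adjust`, `max_rounds`) is hypothetical.

```python
def negotiate(request, find_work_allocation, choose, adjust, max_rounds=10):
    """Loop between the user and the scheduler until a triple is accepted.

    request              -- iterable of candidate (f, r, su) triples
    find_work_allocation -- returns an allocation, or None if infeasible
    choose               -- user callback: returns one triple, or None to reject all
    adjust               -- user callback: returns a new request
    """
    for _ in range(max_rounds):
        # Process request: keep only triples for which an allocation exists.
        feasible = {}
        for triple in request:
            allocation = find_work_allocation(triple)
            if allocation is not None:
                feasible[triple] = allocation
        # Display triples and let the user review them.
        choice = choose(list(feasible))
        if choice is not None:
            # "accepts one": this is what would be handed to on-line PT.
            return choice, feasible[choice]
        # "rejects all": the user adjusts the request and tries again.
        request = adjust(request)
    return None


# Toy run: only triples with f >= 2 are feasible; the user takes the first offer.
result = negotiate(
    [(1, 1, 0), (2, 2, 8)],
    find_work_allocation=lambda t: {"host": 1.0} if t[0] >= 2 else None,
    choose=lambda triples: triples[0] if triples else None,
    adjust=lambda req: req,
)
```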

Page 15: On-line Parallel Tomography

Triple Search

• Search parameter space
  – If triple satisfies constraints → feasible
• Constrained optimization problem based on soft real-time execution
  – Compute constraint
  – Transfer constraint
• Heuristics to reduce search space
  – e.g. assume user will always choose (1, 2, 1) over (1, 2, 4)
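The pruning heuristic can be sketched as a dominance filter. The slide only gives the case of equal f and r with a lower su; treating lower values of all three parameters as preferable (consistent with (1, 1, 0) being the best configuration) is an assumption made here, and `is_feasible` is a hypothetical stand-in for the compute and transfer constraints.

```python
def dominates(a, b):
    """True if triple a is at least as good as b everywhere and differs from it.

    Assumption: smaller f, r, and su are all preferable.
    """
    return a != b and all(x <= y for x, y in zip(a, b))


def search_triples(candidates, is_feasible):
    """Keep feasible triples that no other feasible triple dominates."""
    feasible = [t for t in candidates if is_feasible(t)]
    return [t for t in feasible
            if not any(dominates(other, t) for other in feasible)]


# (1, 2, 4) is dropped because (1, 2, 1) is never worse.
kept = search_triples([(1, 2, 1), (1, 2, 4), (2, 1, 20)], lambda t: True)
```

Under this assumption, the three example triples (1, 6, 4), (2, 2, 8), and (2, 1, 20) all survive the filter, since each is best in a different parameter.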

Page 16: On-line Parallel Tomography

Work Allocation

[Figure: diagram relating the work allocation to its inputs and constraints. Labels: cost, user constraints, compute constraints, transfer constraints, cpu availability, processor availability, ptomo-to-writer bandwidth, subnet-to-writer bandwidth.]

• Multiple mixed-integer programs → approximate solution
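To give a feel for what a work allocation is, here is a much simpler stand-in: slices are split in proportion to each host's cpu availability, with largest-remainder rounding. This is not the mixed-integer formulation named on the slide, it ignores the bandwidth and cost constraints, and the names `allocate_slices` and `availability` are hypothetical.

```python
def allocate_slices(num_slices, availability):
    """Split num_slices across hosts in proportion to cpu availability.

    availability -- dict mapping host name to a non-negative availability value
    Returns a dict mapping host name to an integer number of slices.
    """
    total = sum(availability.values())
    if total <= 0:
        raise ValueError("no cpu availability on any host")
    exact = {h: num_slices * a / total for h, a in availability.items()}
    alloc = {h: int(share) for h, share in exact.items()}
    # Hand leftover slices to the hosts with the largest fractional shares.
    leftover = num_slices - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for host in by_remainder[:leftover]:
        alloc[host] += 1
    return alloc


shares = allocate_slices(10, {"a": 2.0, "b": 1.0})
```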

Page 17: On-line Parallel Tomography

Experiments

• Impact of dynamic information on scheduler performance

• Usefulness of tunability in Grid environments

• Scheduling latency

Page 18: On-line Parallel Tomography

Dynamic Information

• We fix the triple and let schedulers determine work allocation

                 Infinite bandwidth   Dynamic bandwidth
Dedicated cpu    wwa                  wwa+bw
Dynamic cpu      wwa+cpu              AppLeS

Page 19: On-line Parallel Tomography

Simulation

• Evaluate schedulers
  – Repeatability
  – Long makespan
  – Several resource environments
• Simgrid (Casanova [CCGrid'2001])
  – API for evaluating scheduling algorithms
    • Tasks
    • Resources modeled using traces
  – E.g. parameter sweep applications [HCW'00]
• Simtomo

Page 20: On-line Parallel Tomography

Performance Metric

• Relative refresh lateness

[Figure: diagram with labels: expected refresh period, actual refresh period, relative refresh lateness.]

Page 21: On-line Parallel Tomography

NCMIR experiments

• Traces (8 machines)
  – 8-hour work day on March 8th, 2001
• Ran simulations throughout the day at 10-minute intervals

[Figure: timeline from 8:00 am to 4:00 pm.]

Page 22: On-line Parallel Tomography

Perfect Load Predictions

[Figure: mean relative refresh lateness (log scale, 10^0 to 10^4) versus hours since 3/8/2001 - 8:00 PST (0 to 8), for schedulers wwa, wwa+cpu, wwa+bw, and AppLeS.]

Page 23: On-line Parallel Tomography

Imperfect Load Predictions

[Figure: mean relative refresh lateness (log scale, 10^0 to 10^4) versus hours since 3/8/2001 - 8:00 PST (0 to 8), for schedulers wwa, wwa+cpu, wwa+bw, and AppLeS.]

Page 24: On-line Parallel Tomography

Synthetic Grids

• Bandwidth predictability
  – Average prediction error
  – pi ∈ {L, M, H}
  – p1 p2 p3
    • e.g. LMH
  – 27 types
  – 2510 Grids x 4 schedulers
  – 10,040 simulations

[Figure: topology diagram with labels: writer, cluster1, cluster2, cluster3, p1, p2, p3.]
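The counts on this slide follow from simple enumeration, as the short sketch below shows. The helper name `grid_types` is hypothetical.

```python
from itertools import product

LEVELS = ("L", "M", "H")


def grid_types(num_links):
    """All labelings of num_links links with a level from {L, M, H}."""
    return ["".join(combo) for combo in product(LEVELS, repeat=num_links)]


types = grid_types(3)          # p1 p2 p3, e.g. "LMH"
num_types = len(types)         # 3 ** 3
num_simulations = 2510 * 4     # Grids x schedulers
```

The same helper with five links gives 3^5 = 243 labelings.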

Page 25: On-line Parallel Tomography

Relative Scheduler Performance

[Figure: bar chart of the number of runs (0 to 3000) in which each scheduler (wwa, wwa+cpu, wwa+bw, AppLeS) ranked 1st, 2nd, 3rd, or 4th.]

Values listed on the slide: 705.89 658.91 127.10 1.07

Page 26: On-line Parallel Tomography

Partial Ordering

• Performance vs. bandwidth predictability
• Grid predictability
  – Partial orders using p1 p2 p3
  – Comparable / not comparable
    • e.g. HML is comparable to HLL
    • e.g. HLM is not comparable to LHM
• HHH, HHM, HMM, HLM, MLM, LLM, LLL
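The two examples are consistent with a component-wise order on the labels, with L < M < H in each position. A minimal sketch under that assumption (the function names are hypothetical):

```python
RANK = {"L": 0, "M": 1, "H": 2}


def at_least(a, b):
    """True if Grid type a is >= Grid type b in every position."""
    return all(RANK[x] >= RANK[y] for x, y in zip(a, b))


def comparable(a, b):
    """Two Grid types are comparable if one is >= the other everywhere."""
    return at_least(a, b) or at_least(b, a)


def is_chain(grids):
    """True if each Grid type is >= the next one, so the list is totally ordered."""
    return all(at_least(a, b) for a, b in zip(grids, grids[1:]))


chain = ["HHH", "HHM", "HMM", "HLM", "MLM", "LLM", "LLL"]
```

Under this order the seven Grid types listed on the slide form a chain, so performance can be compared along it from HHH down to LLL.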

Page 27: On-line Parallel Tomography

Example Partial Order

[Figure: relative refresh lateness in seconds (log scale, 10^0 to 10^4) for Grid types HHH, HHM, HMM, HLM, MLM, LLM, LLL, for schedulers wwa, wwa+cpu, wwa+bw, and AppLeS.]

Page 28: On-line Parallel Tomography

Tunability Experiments

• How useful is tunability?
  – Variability
• Fixed topology
  – Categorized traces
    • L, M, H
  – v1 v2 v3 v4 v5
  – 243 Grid types

[Figure: topology diagram with labels: writer, cluster1, cluster2, supercomputer, v1, v2, v3, v4, v5.]

Page 29: On-line Parallel Tomography

Tunability Experiments

• Run over a 2-day period
  – Back-to-back
  – Assume single user model
    • f, r, su
• Set of triples chosen
  – T = {1,…,61}

[Figure: 3-D plot with axes f, r, and su.]

Page 30: On-line Parallel Tomography

Tunability Results

• Count how many times a triple changed per 2-day simulation
• e.g.
  – 12.9%
  – 25.7%

[Figure: fraction of changes (0 to 1) by parameter (f, r, su).]
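A small sketch of how such a count might be computed from the sequence of triples chosen during one simulation. The normalization is an assumption: here a parameter's fraction is the share of consecutive scheduling decisions in which that parameter differed. The function name and the sample sequence are hypothetical.

```python
def fraction_of_changes(triples):
    """Per-parameter fraction of consecutive decisions where the value changed."""
    names = ("f", "r", "su")
    steps = len(triples) - 1
    if steps < 1:
        return {name: 0.0 for name in names}
    changes = [0, 0, 0]
    for previous, current in zip(triples, triples[1:]):
        for i in range(3):
            if previous[i] != current[i]:
                changes[i] += 1
    return {name: count / steps for name, count in zip(names, changes)}


# Made-up sequence of chosen triples, for illustration only.
history = [(1, 2, 4), (1, 3, 4), (1, 4, 4), (2, 4, 4), (2, 4, 4)]
fractions = fraction_of_changes(history)
```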

Page 31: On-line Parallel Tomography

Scheduling Latency

• Time to search for feasible triples
• e.g.
  – 88% under 1 sec
  – 63% under 1 sec

[Figure: histogram of number of experiments (0 to 7000) versus seconds (0 to 10).]

Page 32: On-line Parallel Tomography

Conclusions and Future Work

• Grid-enabled version of on-line parallel tomography
  – Tunable application
• Tunability is useful in Grid environments
  – User-directed AppLeS
• Importance of bandwidth predictability
  – e.g. rescheduling
• Scheduling latency is nominal
• Production use