
Rassul Ayani 1

Performance of parallel and distributed systems

What is the purpose of measurement?
• To evaluate a system (or an architecture)
• To compare two or more systems
• To compare different algorithms

Metric to be used? Speedup


Workload-Driven Evaluation

Approach: Run a workload (trace) and measure the performance of the system

Traces:
• Real trace
• Synthetic trace

Other issues: How representative is the workload?


Type of systems

For existing systems:
• Run a workload and evaluate the performance of the system
• Problem: Is the workload representative?

For future systems (an architectural idea):
• Develop a simulator of the system
• Run a workload and evaluate the system
• Problems:
  • Developing a simulator is difficult and expensive
  • How do we define system parameters, such as memory access time and communication cost?


Measuring Performance

Performance metric most important to the end user:
Performance = Work / Time unit

Performance improvement due to parallelism:
Speedup = Time(1) / Time(p)


Performance evaluation of a parallel computer

Speedup(p) = Time(1) / Time(p)

What is Time(1)?

1. Parallel program on one processor of parallel machine?

2. A sequential algorithm on one processor of the parallel machine?

3. “Best” sequential program on one processor of the parallel machine?

4. “Best” sequential program on agreed-upon standard machine?

Which one is reasonable?
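The choice of Time(1) changes the reported speedup even when Time(p) stays the same. A minimal Python sketch of that effect follows; the timings and the names `speedup` and `baselines` are invented for illustration and are not measurements from these slides.

```python
def speedup(time_1: float, time_p: float) -> float:
    """Speedup(p) = Time(1) / Time(p)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("times must be positive")
    return time_1 / time_p


# Hypothetical timings in seconds (assumed values, for illustration only).
time_p = 10.0
baselines = {
    "parallel program on one processor": 120.0,
    "sequential algorithm on one processor": 100.0,
    "best sequential program on one processor": 80.0,
}

for name, time_1 in baselines.items():
    # Same parallel run, different baseline, different reported speedup.
    print(f"{name}: speedup = {speedup(time_1, time_p):.1f}")
```

A slower baseline inflates the speedup, which is why the definition of Time(1) should always be stated alongside the result.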


Speedup

What is Time(p)?
• The time needed by the parallel machine to run the same workload?
• Is it fair?
• How does the problem size affect our measurement?


Example 1: Our experience

Parallel simulation of a Multistage Interconnection Network (MIN)

d: number of stages

n: number of nodes

n = (d+1) * 2^d
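To get a feeling for how quickly the simulated network grows, the node count can be tabulated per number of stages. This sketch reads the formula above as n = (d + 1) * 2^d, treating the trailing "d" as a superscript (an assumption about the slide's layout); the function name `min_nodes` is hypothetical.

```python
def min_nodes(d: int) -> int:
    """Number of nodes in a MIN with d stages, assuming n = (d + 1) * 2**d."""
    if d < 0:
        raise ValueError("number of stages must be non-negative")
    return (d + 1) * 2 ** d


# Node counts for a few network sizes.
for d in range(11, 16):
    print(f"d = {d:2d}  n = {min_nodes(d)}")
```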


MIN on CM2

[Figure: speedup (log scale, 1 to 10000) versus number of stages (10 to 16)]

Size   Speedup
11     40
12     45
13     70
14     1600
15     2200

Speedup of MIN on CM-2: Speedup = T(1)/T(p), where
• T(1) = execution time of the sequential simulator on a Sun SPARC
• T(p) = execution time of the parallel simulator on a CM-2 with 8K processors


Why is problem size important?

The problem size is too small:
• May be appropriate for a small machine, but not for the parallel machine (PM)
  • Not enough work for the PM
  • Parallelism overheads begin to dominate benefits for the PM
    • Load imbalance
    • Communication-to-computation ratio
  • May even achieve slowdowns
  • Doesn’t reflect real usage, and inappropriate for large machines
• Can exaggerate benefits of architectural improvements, especially when measured as percentage improvement in performance


Size is too large

• May not “fit” in a small machine
  • Can’t run
  • Thrashing to disk
  • Working set doesn’t fit in cache
• May lead to superlinear speedup

What is the right size? How do we find the right size?


Scaling: Example 2
Small and big equation solvers on SGI Origin2000
(from Parallel Computer Architecture, Culler & Singh)

[Figure, two panels, each plotting speedup versus number of processors (1 to 31) together with the ideal speedup: "Ocean: 258 x 258" (speedup axis 0 to 30) and "Grid solver: 12 K x 12 K" (speedup axis 0 to 50)]


Scaling issues

Important issues:
• Reasonable problem size
• Scaling problem size
• Scaling machine size

Example: Consider a dispatcher-based cluster and compare three load balancing algorithms:
• Round Robin (RR)
• Least connection first (LC)
• Least loaded first (LL)
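A dispatcher's choice of server under the first two policies can be sketched as follows. This is a minimal illustration, not the simulator used for the results in these slides; the class names `RoundRobin` and `LeastConnection`, the `pick` method, and the sample connection counts are all assumptions. Least loaded first would need a load measure per server, which the slides do not define, so it is left out.

```python
from itertools import cycle


class RoundRobin:
    """Pick servers in a fixed circular order, ignoring their state."""

    def __init__(self, n_servers: int) -> None:
        self._order = cycle(range(n_servers))

    def pick(self, active_connections: list[int]) -> int:
        return next(self._order)


class LeastConnection:
    """Pick the server with the fewest active connections."""

    def pick(self, active_connections: list[int]) -> int:
        # min() returns the first minimum, so ties go to the lowest index.
        return min(range(len(active_connections)),
                   key=active_connections.__getitem__)


# Hypothetical snapshot: active connections on each of four servers.
active = [3, 0, 2, 5]
rr = RoundRobin(len(active))
lc = LeastConnection()
print("RR picks:", [rr.pick(active) for _ in range(5)])
print("LC picks:", lc.pick(active))
```

RR needs no feedback from the servers, while LC requires the dispatcher to track connection counts.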


Scaling: example
Dispatcher-based web server


Determine the problem size

[Figure: average waiting time (ms, axis 0 to 3500) and average server utilization (axis 0 to 1) versus arrival rate (25 to 400 requests/second)]
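One way to reason about where such a utilization curve saturates is the utilization law: utilization equals arrival rate times mean service time, divided by the number of servers. The sketch below applies it under an assumed mean service time of 3.6 ms; that value, the function name `utilization`, and the cap at 1.0 are assumptions for illustration, not parameters stated on the slide.

```python
def utilization(arrival_rate: float, service_time: float,
                n_servers: int = 1) -> float:
    """Utilization law: load offered to each server, capped at 1.0.

    arrival_rate is in requests/second, service_time in seconds.
    """
    if arrival_rate < 0 or service_time < 0 or n_servers < 1:
        raise ValueError("invalid parameters")
    return min(1.0, arrival_rate * service_time / n_servers)


SERVICE_TIME = 0.0036  # assumed mean service time in seconds (hypothetical)

for rate in (25, 100, 200, 250, 300, 400):
    print(f"{rate:4d} req/s  utilization = {utilization(rate, SERVICE_TIME):.3f}")
```

A problem size (arrival rate) close to the saturation point makes waiting times explode, so the interesting range for evaluation lies below it.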


Scale problem size, but keep machine size fixed

arrival rate     average waiting time (ms)     average response time (ms)    average utilization
(requests/sec)   Baseline  RR        LC        Baseline  RR        LC        Baseline        RR              LC
250              0.2       10.5      0.9       3.8       14.1      4.5       0.226 (0.001)   0.226 (0.002)   0.23 (0.001)
500              1.8       32.99     3.9       5.4       36.6      7.5       0.453 (0.002)   0.453 (0.003)   0.453 (0.002)
750              49.5      127.5     53.1      53.2      131.1     56.7      0.679 (0.001)   0.679 (0.004)   0.680 (0.001)
1000             849.5     1112.3    853.0     853.1     1115.9    856.7     0.906 (0.00)    0.905 (0.006)   0.905 (0.001)
1250             70084     70118     70085     70087     70121     70088     0.998 (0.001)   0.991 (0.006)   0.997 (0.001)

In the utilization columns, the value in parentheses is the second figure listed with each entry on the slide.

Table 3: Performance of a 4-Server Cluster
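The waiting times in Table 3 can be compared across arrival rates by dividing each algorithm's value by the baseline value. The sketch below does this for the first four rows; dividing by the baseline is an assumption about what "normalized" means here, and the names `waiting_ms` and `normalized` are hypothetical.

```python
# Arrival rate (requests/sec) -> (baseline, RR, LC) average waiting time in ms,
# copied from Table 3.
waiting_ms = {
    250: (0.2, 10.5, 0.9),
    500: (1.8, 32.99, 3.9),
    750: (49.5, 127.5, 53.1),
    1000: (849.5, 1112.3, 853.0),
}


def normalized(row: tuple[float, float, float]) -> tuple[float, float]:
    """Return (RR / baseline, LC / baseline) for one table row."""
    baseline, rr, lc = row
    return rr / baseline, lc / baseline


for rate, row in waiting_ms.items():
    rr_norm, lc_norm = normalized(row)
    print(f"{rate:5d} req/s  RR/BL = {rr_norm:6.2f}  LC/BL = {lc_norm:5.2f}")
```

The ratios shrink toward 1 as the arrival rate grows, because near saturation the queueing delay dominates whatever the dispatching policy contributes.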


Scaling problem size (cont’d)

[Figure: normalized waiting time (log scale, 1 to 100) of RR and LC versus arrival rate (250 to 1250 requests/second)]

Conclusion: for low arrival rates LC is much better than RR, but for high arrival rates both converge to the baseline (BL) algorithm

Is it a fair conclusion?


Scaling problem and machine size

no. of    arrival   average response time (ms)    average waiting time (ms)     average utilization
servers   rate      baseline  RR        LC        baseline  RR        LC        baseline        RR              LC
1         250       3467                          3467                          0.906
2         500       1722.0    1913.2    1724.6    1718.4    1909.5    1721.0    0.906 (0.000)   0.905 (0.002)   0.905 (0.000)
4         1000      853.1     1115.9    856.7     849.5     1112.3    853.0     0.906 (0.00)    0.905 (0.006)   0.905 (0.001)
8         2000      419.7     741.4     421.6     416.0     737.8     418.0     0.906 (0.001)   0.905 (0.007)   0.906 (0.001)
16        4000      213.0     608.0     215.6     209.4     604.4     212.0     0.906 (0.001)   0.903 (0.017)   0.905 (0.002)

In the utilization columns, the value in parentheses is the second figure listed with each entry on the slide. The row for one server lists a single value per metric.
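When machine size and problem size are scaled together, the load per server stays constant while the gap between the algorithms changes. The sketch below derives both quantities from the multi-server rows of the table; normalizing by the baseline waiting time is an assumption, and the names `rows` and `summarize` are hypothetical.

```python
# Servers -> (arrival rate in requests/sec, then baseline, RR, LC average
# waiting time in ms), copied from the table above.
rows = {
    2: (500, 1718.4, 1909.5, 1721.0),
    4: (1000, 849.5, 1112.3, 853.0),
    8: (2000, 416.0, 737.8, 418.0),
    16: (4000, 209.4, 604.4, 212.0),
}


def summarize(servers: int,
              row: tuple[float, float, float, float]) -> tuple[float, float, float]:
    """Return (arrival rate per server, RR / baseline, LC / baseline)."""
    rate, baseline, rr, lc = row
    return rate / servers, rr / baseline, lc / baseline


for servers, row in rows.items():
    per_server, rr_norm, lc_norm = summarize(servers, row)
    print(f"{servers:2d} servers  {per_server:.0f} req/s each  "
          f"RR/BL = {rr_norm:.2f}  LC/BL = {lc_norm:.2f}")
```

With the per-server load held at 250 requests/second, the RR ratio grows with the number of servers while the LC ratio stays close to 1.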


Scaling problem and machine size (cont’d)

[Figure: normalized average waiting time (1 to 3.5) of RR and LC versus number of servers (1, 2, 4, 8, 16)]

Conclusion: LC is much better than RR

Is it a fair conclusion?


Questions in Scaling

How should the application be scaled? Look at the web server.

Scaling machine size, e.g., by adding identical nodes, each bringing memory:
• Memory size is increased
• Locality may be changed
• Extra work (e.g., overhead for task scheduling) will be increased

Scaling problem size may change:
• Locality
• Working set size
• Communication cost


Why Scaling?

Two main reasons for scaling:
• To increase performance, e.g. increase the number of transactions per second
  • Of interest to users
• To utilize resources (processor and memory) more efficiently
  • More interesting for managers
  • More difficult

Scaling models:
• Problem constrained (PC)
• Memory constrained (MC)
• Time constrained (TC)


Problem Constrained Scaling

Problem size is kept fixed, but the machine is scaled

Motivation: User wants to solve the same problem, only faster.

Some examples:
• Video compression
• Computer graphics
• Message routing in a router (or switch)

Speedup(p) = Time(1) / Time(p)
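For a fixed problem, the speedup can be tabulated as processors are added. The sketch below also reports efficiency, Speedup(p) / p, a common derived metric that the slide itself does not mention; the timings and the names `pc_speedup` and `efficiency` are invented for illustration.

```python
def pc_speedup(time_1: float, time_p: float) -> float:
    """Problem-constrained speedup: Speedup(p) = Time(1) / Time(p)."""
    return time_1 / time_p


def efficiency(speedup_p: float, p: int) -> float:
    """Fraction of ideal (linear) speedup achieved on p processors."""
    return speedup_p / p


# Hypothetical run times in seconds for the SAME problem on p processors.
times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}

for p, t in times.items():
    s = pc_speedup(times[1], t)
    print(f"p = {p}  speedup = {s:.2f}  efficiency = {efficiency(s, p):.2f}")
```

Because the work stays fixed, overheads take a growing share of the run time and efficiency typically drops as p increases.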


Machine Constrained Scaling

Scale problem size, but the machine (memory) remains fixed

Motivation: It is good to find the limits of a given machine, e.g., what is the maximum problem size that can avoid memory thrashing?

Performance measurement:
• The previous definition of speedup, Time(1) / Time(p), is NOT valid
• New definition: Performance improvement = increase in work / increase in time

How to measure work?
• Work can be defined as the number of instructions, operations, or transactions
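The new definition can be written as a ratio of ratios. This sketch interprets "increase in work" as Work(p) / Work(1) and "increase in time" as Time(p) / Time(1), following the formulation in Culler & Singh; the function name and the sample numbers are hypothetical.

```python
def performance_improvement(work_1: float, time_1: float,
                            work_p: float, time_p: float) -> float:
    """Increase in work divided by increase in time."""
    increase_in_work = work_p / work_1
    increase_in_time = time_p / time_1
    return increase_in_work / increase_in_time


# Hypothetical example: 8x the transactions, finished in 2x the time.
print(performance_improvement(work_1=1e6, time_1=10.0,
                              work_p=8e6, time_p=20.0))
```

When the work is unchanged, the expression reduces to Time(1) / Time(p), so it generalizes the fixed-size speedup.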


Time Constrained Scaling

Time is kept fixed as the machine is scaled

Motivation: User has a fixed time to use the machine (or to wait for a result, as in real-time systems), but wishes to do more work during this time

Performance = Work / Time as usual, and time is fixed, so

SpeedupTC(p) = Work(p) / Work(1)

How does Work(1) affect the result?
• Work(1) must be reasonable to avoid thrashing
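Under time-constrained scaling, the measurement is the amount of work completed in the same time window. A minimal sketch, assuming work is counted as transactions completed in a fixed window; the counts and the name `speedup_tc` are invented for illustration.

```python
def speedup_tc(work_1: float, work_p: float) -> float:
    """Time-constrained speedup: SpeedupTC(p) = Work(p) / Work(1)."""
    if work_1 <= 0:
        raise ValueError("Work(1) must be positive")
    return work_p / work_1


# Hypothetical transactions completed in the same fixed time window.
work_done = {1: 1000, 4: 3600, 16: 12000}

for p, work in work_done.items():
    print(f"p = {p:2d}  SpeedupTC = {speedup_tc(work_done[1], work):.1f}")
```

An unreasonably large Work(1), one that thrashes on a single processor, would depress the denominator and overstate the result.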


Evaluation using Workload

Must consider three major factors:
• Workload characteristics
• Problem size
• Machine size


Impact of Workload

Should adequately represent domains of interest

Easy to mislead with workloads:
• Choose those with features for which the machine is good, avoid others

Some features of interest:
• Working set size and spatial locality
• Fine-grained or coarse-grained tasks
• Synchronization patterns
• Contention and communication patterns

Should have enough work to utilize the processors:
• If load imbalance dominates, there may not be much the machine can do


Problem size

Many critical characteristics depend on problem size:
• Communication pattern (IPC)
• Synchronization pattern
• Load imbalance

Need to choose problem sizes appropriately:
• Insufficient to use a single problem size


Steps in Choosing Problem Sizes

1. Expert view
   • May know that users care only about a few problem sizes

2. Determine range of useful sizes
   • Below which: bad performance or unrealistic time distribution in phases
   • Above which: execution time or memory usage too large

3. Use understanding of inherent characteristics
   • Communication-to-computation ratio, load balance...


Summary

Performance improvement due to parallelism is often measured by speedup

Problem size is important

Scaling is often needed

Scaling models are fundamental to proper evaluation
• Time constrained scaling is a realistic method for many applications
• Scaling only data problem size can yield misleading results

Proper scaling requires understanding the workload
