Rassul Ayani 1
Performance of parallel and distributed systems
What is the purpose of measurement?
• To evaluate a system (or an architecture)
• To compare two or more systems
• To compare different algorithms
Which metric should be used? Speedup
Workload-Driven Evaluation
Approach: run a workload (trace) and measure the performance of the system
Traces:
• Real trace
• Synthetic trace
Other issues: how representative is the workload?
Type of systems
For existing systems:
• Run the workload and evaluate the performance of the system
• Problem: is the workload representative?
For future systems (an architectural idea):
• Develop a simulator of the system, run the workload, and evaluate the system
• Problems:
  • Developing a simulator is difficult and expensive
  • How do we define system parameters, such as memory access time and communication cost?
Measuring Performance
Performance metric most important to the end user:
Performance = Work / Time unit
Performance improvement due to parallelism:
Speedup = Time(1) / Time(p)
Performance evaluation of a parallel computer
Speedup(p) = Time(1) / Time(p)
What is Time(1)?
1. Parallel program on one processor of parallel machine?
2. A sequential algorithm on one processor of the parallel machine?
3. “Best” sequential program on one processor of the parallel machine?
4. “Best” sequential program on agreed-upon standard machine?
Which one is reasonable?
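The choice of Time(1) changes the reported speedup even when Time(p) is identical. A minimal sketch in Python; the timings are invented purely for illustration and are not measurements from these slides:

```python
def speedup(time_1: float, time_p: float) -> float:
    """Speedup(p) = Time(1) / Time(p)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("times must be positive")
    return time_1 / time_p


# Hypothetical timings in seconds (assumptions, not measured data).
time_p = 10.0  # parallel program on p processors
baselines = {
    "parallel program on one processor": 120.0,
    "a sequential algorithm on one processor": 100.0,
    "best sequential program on one processor": 80.0,
}

# Same Time(p), different Time(1): the reported speedup differs.
speedups = {name: speedup(t1, time_p) for name, t1 in baselines.items()}
for name, s in speedups.items():
    print(f"Time(1) = {name}: speedup = {s:.1f}")
```

With these made-up numbers, the slowest baseline gives the most flattering speedup, which is why the baseline should always be stated alongside the result.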
Speedup
What is Time(p)?
• The time needed by the parallel machine to run the same workload?
• Is it fair?
• How does the problem size affect our measurement?
Example 1: Our experience
Parallel simulation of a Multistage Interconnection Network (MIN)
d: number of stages
n: number of nodes
n = (d+1) * 2^d
MIN on CM-2
[Figure: speedup (log scale, 1 to 10000) versus number of stages (10 to 16)]
Size      11    12    13    14     15
Speedup   40    45    70    1600   2200
Speedup of MIN on CM-2
Speedup = T(1) / T(p), where
T(1) = execution time of the sequential simulator on a Sun SPARC
T(p) = execution time of the parallel simulator on a CM-2 with 8k processors
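To relate the reported speedups to the amount of work per processor, the sketch below computes the number of simulated nodes for each size. It assumes that the formula reads n = (d+1) * 2^d, that "Size" in the table is the number of stages d, and that "8k processors" means 8192; none of these readings is confirmed by the slide.

```python
PROCESSORS = 8 * 1024  # assumption: "8k processors" means 8192


def min_nodes(d: int) -> int:
    """Nodes in a MIN with d stages, assuming n = (d + 1) * 2**d."""
    return (d + 1) * 2 ** d


# Speedups as listed in the table: size (assumed to be d) -> speedup
reported_speedup = {11: 40, 12: 45, 13: 70, 14: 1600, 15: 2200}

for d, s in reported_speedup.items():
    n = min_nodes(d)
    print(f"d={d}: n={n}, nodes per processor={n / PROCESSORS:.1f}, speedup={s}")
```

Under these assumptions the smallest size gives each processor only a few nodes to simulate, while the largest gives each processor dozens, which fits the observation that small problems do not provide enough work for the parallel machine.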
Why is problem size important?
The problem size is too small:
• May be appropriate for a small machine, but not for the parallel machine (PM)
• Not enough work for the PM
• Parallelism overheads begin to dominate benefits for the PM
  • Load imbalance
  • Communication-to-computation ratio
• May even achieve slowdowns
• Doesn’t reflect real usage, and is inappropriate for large machines
• Can exaggerate benefits of architectural improvements, especially when measured as percentage improvement in performance
Size is too large
• May not “fit” in a small machine
  • Can’t run
  • Thrashing to disk
  • Working set doesn’t fit in cache
• May lead to superlinear speedup
What is the right size?
How do we find the right size?
Scaling: Example 2
Small and big equation solvers on SGI Origin2000
(from Parallel Computer Architecture, Culler & Singh)
[Figure, two panels: speedup versus number of processors (1 to 31), each compared with the ideal speedup. First panel: Ocean, 258 x 258 (speedup axis 0 to 30). Second panel: Grid solver, 12 K x 12 K (speedup axis 0 to 50).]
Scaling issues
Important issues:
• Reasonable problem size
• Scaling problem size
• Scaling machine size
Example: consider a dispatcher-based cluster and compare three load balancing algorithms
• Round Robin (RR)
• Least connection first (LC)
• Least loaded first (LL)
Scaling example: dispatcher-based web server
Determine the problem size
[Figure: average waiting time (ms, 0 to 3500) and average server utilization (0 to 1) versus arrival rate (25 to 400 requests/second)]
Scale problem size, but keep machine size fixed
arrival rate     average waiting time (ms)       average response time (ms)      average utilization
(requests/sec)   Baseline   RR        LC         Baseline   RR        LC         Baseline        RR              LC
250              0.2        10.5      0.9        3.8        14.1      4.5        0.226 / 0.001   0.226 / 0.002   0.23 / 0.001
500              1.8        32.99     3.9        5.4        36.6      7.5        0.453 / 0.002   0.453 / 0.003   0.453 / 0.002
750              49.5       127.5     53.1       53.2       131.1     56.7       0.679 / 0.001   0.679 / 0.004   0.680 / 0.001
1000             849.5      1112.3    853.0      853.1      1115.9    856.7      0.906 / 0.00    0.905 / 0.006   0.905 / 0.001
1250             70084      70118     70085      70087      70121     70088      0.998 / 0.001   0.991 / 0.006   0.997 / 0.001
Each utilization cell lists the two values stacked in the original table; the second value is not labeled on the slide.
Table 3: Performance of a 4-Server Cluster
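In Table 3 the response time exceeds the waiting time by about 3.6 ms in every row, which suggests a mean service time of roughly that size. The sketch below applies the utilization law (arrival rate times mean service time, divided by the number of servers) under that assumption. The value 3.624 ms is inferred from the table and is not stated on the slides.

```python
def utilization(arrival_rate: float, service_time_s: float, servers: int) -> float:
    """Offered load per server: arrival rate * mean service time / servers."""
    return arrival_rate * service_time_s / servers


SERVICE_TIME_S = 0.003624  # assumption: inferred from Table 3, not given in the source
SERVERS = 4

for rate in (250, 500, 750, 1000, 1250):
    rho = utilization(rate, SERVICE_TIME_S, SERVERS)
    note = " (more than the servers can handle)" if rho >= 1.0 else ""
    print(f"{rate} requests/s: offered load per server = {rho:.3f}{note}")
```

Under this assumption the computed load tracks the reported utilization up to 1000 requests/second, and at 1250 requests/second the offered load exceeds 1, which is consistent with the very large waiting times in the last row.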
Scaling problem size (cont’d)
[Figure: normalized waiting time (log scale, 1 to 100) of RR and LC versus arrival rate (250 to 1250 requests/second)]
Conclusion: for low arrival rates LC is much better than RR, but for high arrival rates both converge to the baseline (BL) algorithm
Is it a fair conclusion?
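The normalized waiting times can be recomputed from Table 3. The sketch below assumes that "normalized" means dividing each algorithm's waiting time by the baseline waiting time at the same arrival rate; the slides do not define the normalization explicitly.

```python
# Average waiting times (ms) as read from Table 3:
# arrival rate (requests/s) -> (baseline, RR, LC)
waiting_ms = {
    250: (0.2, 10.5, 0.9),
    500: (1.8, 32.99, 3.9),
    750: (49.5, 127.5, 53.1),
    1000: (849.5, 1112.3, 853.0),
    1250: (70084.0, 70118.0, 70085.0),
}


def normalized(waiting: dict) -> dict:
    """Divide the RR and LC waiting times by the baseline at each arrival rate."""
    return {rate: (rr / bl, lc / bl) for rate, (bl, rr, lc) in waiting.items()}


norm = normalized(waiting_ms)
for rate, (rr, lc) in norm.items():
    print(f"{rate} requests/s: RR = {rr:.2f}, LC = {lc:.2f}")
```

The ratios are large at low arrival rates, where the absolute waiting times are a fraction of a millisecond to a few milliseconds, and approach 1 once the cluster saturates. The normalized view alone therefore hides how small the absolute differences are at low load.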
Scaling problem and machine size
no. of    arrival   average response time (ms)     average waiting time (ms)      average utilization
servers   rate      baseline   RR       LC         baseline   RR       LC         baseline        RR              LC
1         250       3467       -        -          3467       -        -          0.906           -               -
2         500       1722.0     1913.2   1724.6     1718.4     1909.5   1721.0     0.906 / 0.000   0.905 / 0.002   0.905 / 0.000
4         1000      853.1      1115.9   856.7      849.5      1112.3   853.0      0.906 / 0.00    0.905 / 0.006   0.905 / 0.001
8         2000      419.7      741.4    421.6      416.0      737.8    418.0      0.906 / 0.001   0.905 / 0.007   0.906 / 0.001
16        4000      213.0      608.0    215.6      209.4      604.4    212.0      0.906 / 0.001   0.903 / 0.017   0.905 / 0.002
Scaling problem and machine size (cont’d)
[Figure: normalized average waiting time (1 to 3.5) of RR and LC versus number of servers (1, 2, 4, 8, 16)]
Conclusion: LC is much better than RR
Is it a fair conclusion?
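When the arrival rate grows with the number of servers (250 requests/second per server in the table), the utilization stays near 0.9 and the gap between RR and the baseline widens. A short sketch using the waiting times from the table, again assuming that normalization means dividing by the baseline value:

```python
# Average waiting times (ms) from the scaled experiment:
# number of servers -> (baseline, RR, LC)
scaled_waiting_ms = {
    2: (1718.4, 1909.5, 1721.0),
    4: (849.5, 1112.3, 853.0),
    8: (416.0, 737.8, 418.0),
    16: (209.4, 604.4, 212.0),
}

# Ratio of each algorithm's waiting time to the baseline for the same cluster size.
rr_over_baseline = {m: rr / bl for m, (bl, rr, lc) in scaled_waiting_ms.items()}
lc_over_baseline = {m: lc / bl for m, (bl, rr, lc) in scaled_waiting_ms.items()}

for m in scaled_waiting_ms:
    print(f"{m} servers: RR = {rr_over_baseline[m]:.2f}, LC = {lc_over_baseline[m]:.2f}")
```

In this data LC stays within about 1 percent of the baseline at every cluster size, while the RR ratio increases steadily with the number of servers. This is the opposite trend to the fixed-machine experiment, which is why the scaling method has to be stated with the conclusion.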
Questions in Scaling
How should the application be scaled?
Look at the web server
Scaling machine size, e.g., by adding identical nodes, each bringing memory:
• Memory size is increased
• Locality may be changed
• Extra work (e.g., overhead for task scheduling) will be increased
Scaling problem size may change:
• Locality
• Working set size
• Communication cost
Why Scaling?
Two main reasons for scaling:
• To increase performance, e.g., increase the number of transactions per second
  • Of interest to users
• To utilize resources (processor and memory) more efficiently
  • More interesting for managers
  • More difficult
Scaling models:
• Problem constrained (PC)
• Memory constrained (MC)
• Time constrained (TC)
Problem Constrained Scaling
Problem size is kept fixed, but the machine is scaled
Motivation: the user wants to solve the same problem, only faster.
Some examples:
• Video compression
• Computer graphics
• Message routing in a router (or switch)
Speedup(p) = Time(1) / Time(p)
Machine Constrained Scaling
Scale problem size, but the machine (memory) remains fixed
Motivation: it is useful to find the limits of a given machine, e.g., what is the maximum problem size that can avoid memory thrashing?
Performance measurement:
• The previous definition of speedup, Time(1) / Time(p), is NOT valid
• New definition: Performance improvement = increase in work / increase in time
How to measure work?
• Work can be defined as the number of instructions, operations, or transactions
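A minimal sketch of the new definition, reading "increase" as a ratio (scaled run divided by reference run). That reading is an assumption, and the transaction counts and times below are invented for illustration.

```python
def performance_improvement(work_1: float, time_1: float,
                            work_p: float, time_p: float) -> float:
    """Increase in work divided by increase in time, both taken as ratios."""
    if min(work_1, time_1, work_p, time_p) <= 0:
        raise ValueError("work and time must be positive")
    increase_in_work = work_p / work_1
    increase_in_time = time_p / time_1
    return increase_in_work / increase_in_time


# Hypothetical runs: work counted in transactions, time in seconds.
improvement = performance_improvement(work_1=1000, time_1=10.0,
                                      work_p=6000, time_p=15.0)
print(f"performance improvement = {improvement:.1f}")
```

Read this way, the measure equals the ratio of the two work rates (Work / Time), so it reduces to the ordinary speedup when the work is the same in both runs.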
Time Constrained Scaling
Time is kept fixed as the machine is scaled
Motivation: the user has a fixed time to use the machine (or to wait for a result, as in real-time systems), but wishes to do more work during this time
Performance = Work / Time as usual, and time is fixed, so
SpeedupTC(p) = Work(p) / Work(1)
How does Work(1) affect the result?
• Work(1) must be reasonable to avoid thrashing
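The time-constrained speedup can be sketched as follows. The cost model in `max_work` (a fixed processing rate per processor plus an overhead that grows with the number of processors) is purely hypothetical and only serves to produce Work(1) and Work(p) for the same time budget.

```python
def speedup_tc(work_1: float, work_p: float) -> float:
    """Time-constrained speedup: Work(p) / Work(1) for a fixed time budget."""
    if work_1 <= 0:
        raise ValueError("Work(1) must be positive")
    return work_p / work_1


def max_work(p: int, budget_s: float,
             rate: float = 100.0, overhead_s: float = 0.1) -> float:
    """Largest work that fits in budget_s under a hypothetical cost model:
    time = work / (p * rate) + overhead_s * p."""
    return max(0.0, (budget_s - overhead_s * p) * p * rate)


BUDGET_S = 10.0                  # the fixed time
work_1 = max_work(1, BUDGET_S)   # work done by one processor in the budget
work_8 = max_work(8, BUDGET_S)   # work done by eight processors in the budget
tc_speedup = speedup_tc(work_1, work_8)
print(f"SpeedupTC(8) = {tc_speedup:.2f}")
```

In this toy model the overhead term keeps the result below 8, and a poorly chosen Work(1) (for example, one that already thrashes) would distort the ratio in the same way it distorts ordinary speedup.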
Evaluation using Workload
Must consider three major factors:
• Workload characteristics
• Problem size
• Machine size
Impact of Workload
Should adequately represent domains of interest
Easy to mislead with workloads
• Choose those with features for which the machine is good, avoid others
Some features of interest:
• Working set size and spatial locality
• Fine-grained or coarse-grained tasks
• Synchronization patterns
• Contention and communication patterns
Should have enough work to utilize the processors
• If load imbalance dominates, there may not be much the machine can do
Problem size
Many critical characteristics depend on problem size
• Communication pattern (IPC)
• Synchronization pattern
• Load imbalance
Need to choose problem sizes appropriately
• Insufficient to use a single problem size
Steps in Choosing Problem Sizes
1. Expert view
• May know that users care only about a few problem sizes
2. Determine range of useful sizes
• Below which bad performance or unrealistic time distribution in phases
• Above which execution time or memory usage too large
3. Use understanding of inherent characteristics
• Communication-to-computation ratio, load balance...
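Step 2 can be sketched as a filter over trial runs. Everything here is hypothetical: the `Run` record, the thresholds, and the sample measurements are invented, and a minimum run time is only a crude stand-in for "bad performance or unrealistic time distribution in phases".

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    size: int         # problem size of the trial run
    seconds: float    # measured execution time
    memory_mb: float  # measured memory usage


def useful_sizes(runs: List[Run], min_seconds: float,
                 max_seconds: float, max_memory_mb: float) -> List[int]:
    """Keep the sizes that are neither too small nor too large to be useful."""
    return [
        r.size
        for r in runs
        if min_seconds <= r.seconds <= max_seconds and r.memory_mb <= max_memory_mb
    ]


# Invented trial measurements, for illustration only.
trial_runs = [
    Run(size=128, seconds=0.02, memory_mb=4),
    Run(size=512, seconds=1.5, memory_mb=60),
    Run(size=1024, seconds=12.0, memory_mb=240),
    Run(size=4096, seconds=900.0, memory_mb=3800),
]

selected = useful_sizes(trial_runs, min_seconds=1.0, max_seconds=60.0, max_memory_mb=1024)
print(selected)
```

The sizes that survive the filter would then be examined with step 3, for example by checking the communication-to-computation ratio at each size.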
Summary
Performance improvement due to parallelism is often measured by speedup
Problem size is important
Scaling is often needed
Scaling models are fundamental to proper evaluation
• Time constrained scaling is a realistic method for many applications
• Scaling only the problem (data) size can yield misleading results
Proper scaling requires understanding the workload