![Page 1: A Data Center by Ulrike Talbiersky, Holger Wichert, Christian Lohrengel, André Augustyniak Case Study Source: D. Menasce, V.A. Almeida, L.W. Dowdy Performance](https://reader035.vdocument.in/reader035/viewer/2022081519/56649efd5503460f94c10ba5/html5/thumbnails/1.jpg)
A Data Center
by Ulrike Talbiersky, Holger Wichert, Christian Lohrengel, André Augustyniak
Case Study
Source:
D. Menasce, V.A. Almeida, L.W. Dowdy
Performance by Design: Computer Capacity Planning by Example
Prentice Hall, 2004
Table of Contents:
• Introduction
• The Data Center
• First Model Attempt: Markov Chain
• Tasks
• Second Model Attempt: Two-Device QN
• Cost Analysis
Introduction
• Data centers offer a variety of services
• Trend: service-based data centers
• Problems:
  - compliance with SLAs: fault tolerance, privacy, security (...)
  - too expensive
• How to choose the optimal size (with respect to cost)?
The Data Center
Machine-repair model:
• M machines (functionally identical)
• N repair people
• Diagnostic system:
  - detects machine failures
  - maintains a queue of machines waiting to be repaired
  - logs failure times and records repair times
GSPN-Model
MiO: machines in operation
MBR: machines being repaired
MWR: machines waiting to be repaired
(Sharpe)
Failure rate λ
Repair rate μ
Queueing Model
Machines waiting to be repaired
Machines in operation
Machines being repaired
Parameters
• Failure rate $\lambda = 1/\text{MTTF}$ (MTTF: Mean Time to Failure)
• Repair rate $\mu = 1/(\text{time to repair a machine})$
• MTTR: Mean Time to Repair
• MTBF: Mean Time Between Failures
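Building a Model (1)
Example: Markov chain
• $k$: number of failed machines
• $k \to k+1$: transition when a machine fails
• $k \to k-1$: transition when a machine is repaired
Aggregate failure rate:
$$\lambda_k = (M-k)\lambda, \quad k = 0, \dots, M-1$$
Aggregate repair rate:
$$\mu_k = \begin{cases} k\mu, & k = 1, \dots, N \\ N\mu, & k = N+1, \dots, M \end{cases}$$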
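A minimal Python sketch of the two aggregate rates, assuming the example values used later in the case study (M = 120, λ = 0.002 per min, μ = 0.05 per min) and N = 5 purely for illustration; the function names are hypothetical:

```python
def aggregate_failure_rate(k: int, M: int, lam: float) -> float:
    """lambda_k = (M - k) * lambda: each of the M - k working machines fails at rate lambda."""
    if not 0 <= k < M:
        raise ValueError("lambda_k is defined for k = 0, ..., M-1")
    return (M - k) * lam


def aggregate_repair_rate(k: int, N: int, mu: float) -> float:
    """mu_k = k*mu for k <= N (every failed machine has a repair person), N*mu otherwise."""
    if k < 1:
        raise ValueError("mu_k is defined for k >= 1")
    return min(k, N) * mu


# Example values: M = 120, lambda = 0.002/min, mu = 0.05/min, N = 5
print(aggregate_failure_rate(0, 120, 0.002))  # all 120 machines up
print(aggregate_repair_rate(8, 5, 0.05))      # 8 failed, but only 5 repair people
```

The `min(k, N)` captures both branches of the repair rate: once all N repair people are busy, additional failed machines only wait.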
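Building a Model (2)
1-dim. Generalized Birth-Death (GBD) solution, where state $k$ means $M-k$ machines in operation:
$$p_k = p_0 \prod_{i=0}^{k-1} \frac{\lambda_i}{\mu_{i+1}}, \quad k = 1, 2, \dots, M, \qquad p_0 = \left[1 + \sum_{k=1}^{M} \prod_{i=0}^{k-1} \frac{\lambda_i}{\mu_{i+1}}\right]^{-1}$$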
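The GBD product form can be evaluated with a short loop. This is a sketch (the name `gbd_steady_state` is hypothetical) that takes the birth rates λ_k and death rates μ_k as lists and normalises so the probabilities sum to 1; N = 5 is only an illustrative choice:

```python
from math import fsum


def gbd_steady_state(birth, death):
    """Steady-state probabilities p_0..p_K of a finite birth-death chain.

    birth[k]: rate of k -> k+1 for k = 0..K-1
    death[k]: rate of k -> k-1 for k = 1..K (death[0] is unused)
    Uses p_k = p_0 * prod_{i=0}^{k-1} birth[i] / death[i+1] and sum(p) = 1.
    """
    weights = [1.0]  # running product p_k / p_0
    for k in range(1, len(birth) + 1):
        weights.append(weights[-1] * birth[k - 1] / death[k])
    total = fsum(weights)
    return [w / total for w in weights]


# Machine-repair chain for the example values (N = 5 is just one choice)
M, N, lam, mu = 120, 5, 0.002, 0.05
birth = [(M - k) * lam for k in range(M)]                   # lambda_k, k = 0..M-1
death = [0.0] + [min(k, N) * mu for k in range(1, M + 1)]  # mu_k,     k = 1..M
p = gbd_steady_state(birth, death)
print(f"p_0 = {p[0]:.4f}")
print("most likely number of failed machines:", max(range(M + 1), key=p.__getitem__))
```

Keeping the running product p_k/p_0 avoids evaluating large factorials separately; for M = 120 the intermediate values stay well inside double-precision range.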
Building a Model (3)
Average aggregate rate at which machines fail (which equals the average aggregate rate at which machines are repaired):
$$X_f = \sum_{k=0}^{M-1} \lambda_k p_k = \sum_{k=0}^{M-1} (M-k)\lambda\, p_k$$
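To see that the failure rate and the repair rate balance, X_f can be computed both ways from the steady-state distribution. A sketch assuming the example values and N = 5; `steady_state`, `failure_throughput` and `repair_throughput` are hypothetical helper names:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]  # unnormalised p_k / p_0
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


def failure_throughput(p, M, lam):
    """X_f = sum_{k=0}^{M-1} (M - k) * lambda * p_k."""
    return fsum((M - k) * lam * p[k] for k in range(M))


def repair_throughput(p, M, N, mu):
    """The same rate seen from the repair side: sum_{k=1}^{M} mu_k * p_k."""
    return fsum(min(k, N) * mu * p[k] for k in range(1, M + 1))


M, N, lam, mu = 120, 5, 0.002, 0.05
p = steady_state(M, N, lam, mu)
print(f"X_f = {failure_throughput(p, M, lam):.4f} failures/min")
print(f"    = {repair_throughput(p, M, N, mu):.4f} repairs/min")
```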
Building a Model (4)
Interactive Response Time Law: $R = \frac{M}{X_0} - Z$
• Client workstation ↔ machine in operation
• Average think time $Z$ ↔ MTTF
• Average response time $R$ ↔ MTTR
• System throughput $X_0$ ↔ $X_f$
$$\text{MTTR} = \frac{M}{X_f} - \text{MTTF} = \frac{M}{X_f} - \frac{1}{\lambda}$$
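Once X_f is known, the MTTR follows from the interactive response time law with Z = MTTF = 1/λ. A sketch with hypothetical helper names that reuses the birth-death solution, assuming the example values:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


def mttr(M, N, lam, mu):
    """MTTR = M / X_f - MTTF (interactive response time law, Z = MTTF = 1/lambda)."""
    p = steady_state(M, N, lam, mu)
    x_f = fsum((M - k) * lam * p[k] for k in range(M))
    return M / x_f - 1 / lam


for N in (2, 5, 10, 120):
    print(f"N = {N:3d}: MTTR = {mttr(120, N, 0.002, 0.05):8.2f} min")
```

With N = M = 120 every failed machine is served immediately, so the computed MTTR reduces to the pure repair time 1/μ = 20 min, which makes a convenient sanity check.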
Building a Model (5)
Little's Law (repair "box"): $N = X \cdot R$, with $R$ ↔ MTTR and $X = X_f$
$N_f$ = average number of failed machines:
$$N_f = X_f \cdot \text{MTTR} = M - X_f \cdot \text{MTTF}$$
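The second form follows by substituting the MTTR from the interactive response time law into Little's Law:

```latex
N_f = X_f \cdot \text{MTTR}
    = X_f \left( \frac{M}{X_f} - \text{MTTF} \right)
    = M - X_f \cdot \text{MTTF}
```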
Building a Model (6)
Little's Law (operational machines): $N = X \cdot R$, with $R$ ↔ MTTF and $X = X_f$
$N_o$ = average number of operational machines:
$$N_o = X_f \cdot \text{MTTF} \qquad (M = N_o + N_f)$$
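A small numerical check of both Little's Law results, assuming the example values and N = 5 (illustrative): N_o + N_f should equal M, and N_f should match the mean number of failed machines Σ k·p_k taken directly from the distribution. Helper names are hypothetical:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


M, N, lam, mu = 120, 5, 0.002, 0.05
MTTF = 1 / lam
p = steady_state(M, N, lam, mu)
x_f = fsum((M - k) * lam * p[k] for k in range(M))
mttr = M / x_f - MTTF

n_f = x_f * mttr                                    # Little's Law, repair "box"
n_o = x_f * MTTF                                    # Little's Law, operational machines
mean_failed = fsum(k * p[k] for k in range(M + 1))  # E[k] straight from the distribution

print(f"N_o = {n_o:.2f}, N_f = {n_f:.2f}, N_o + N_f = {n_o + n_f:.2f}")
print(f"E[k] = {mean_failed:.2f}")
```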
Values for the Example
• $M$ = 120 machines
• MTTF = 500 min, i.e. $\lambda$ = 0.002 per min
• Time to repair a machine = 20 min, i.e. $\mu$ = 0.05 per min
Task 1
Given are
• failure rate of machines $\lambda$ = 0.002 per min
• number of machines $M$ = 120
• repair rate of machines $\mu$ = 0.05 per min
What is the probability that exactly j machines are operational?
Task 1
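Use:
$$P[\text{exactly } j \text{ machines in operation}] = p_{M-j}$$
with
$$p_k = \begin{cases} p_0 \binom{M}{k} \left(\frac{\lambda}{\mu}\right)^k, & k = 1, \dots, N \\ p_0 \binom{M}{k} \frac{k!}{N!\,N^{k-N}} \left(\frac{\lambda}{\mu}\right)^k, & k = N+1, \dots, M \end{cases}$$
$$p_0 = \left[\sum_{k=0}^{N} \binom{M}{k} \left(\frac{\lambda}{\mu}\right)^k + \sum_{k=N+1}^{M} \binom{M}{k} \frac{k!}{N!\,N^{k-N}} \left(\frac{\lambda}{\mu}\right)^k\right]^{-1}$$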
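The closed form can be evaluated directly with `math.comb` and `math.factorial`. A sketch with hypothetical function names; it returns the full distribution of the number of failed machines, so P[exactly j in operation] is read off at index M − j:

```python
from math import comb, factorial, fsum


def p_failed_closed_form(M, N, lam, mu):
    """p_k (k failed machines, k = 0..M) from the closed-form expressions for p_0 and p_k."""
    r = lam / mu

    def weight(k):  # p_k / p_0
        if k <= N:
            return comb(M, k) * r**k
        # exact integer arithmetic before the single division, then the float factor r^k
        return comb(M, k) * factorial(k) / (factorial(N) * N ** (k - N)) * r**k

    weights = [weight(k) for k in range(M + 1)]
    p0 = 1 / fsum(weights)
    return [p0 * w for w in weights]


def p_exactly_operational(j, M, N, lam, mu):
    """P(exactly j machines in operation) = p_{M-j}."""
    return p_failed_closed_form(M, N, lam, mu)[M - j]


for N in (2, 5, 10):
    probs = [round(p_exactly_operational(j, 120, N, 0.002, 0.05), 4) for j in (100, 110, 115)]
    print(f"N = {N:2d}: P(exactly 100/110/115 up) = {probs}")
```

Python's arbitrary-precision integers keep M!/(M−k)! exact before the division, so the expressions can be used as written for M = 120.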
Task 1: N = 2, 5, 10
Task 2
Given are
• failure rate of machines $\lambda$ = 0.002 per min
• number of machines $M$ = 120
• number of repair people $N$
• repair rate of machines $\mu$ = 0.05 per min
What is the probability $P_j$ that at least $j$ machines are operational?
Task 2
Use Task 1 and:
$$P_j = \sum_{i=j}^{M} p_{M-i}$$
• Once the repair personnel become overloaded, the system tends toward failure.
• If $M \gg N$, having extra machines is pointless.
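P_j is a tail sum of the Task 1 distribution. A sketch assuming the example values; `p_at_least_operational` is a hypothetical name, and j = 80 (two thirds of 120) anticipates Task 3:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


def p_at_least_operational(j, p):
    """P_j = sum_{i=j}^{M} p_{M-i}: probability that at least j of the M machines are up."""
    M = len(p) - 1
    return fsum(p[M - i] for i in range(j, M + 1))


M, lam, mu = 120, 0.002, 0.05
for N in (2, 3, 4, 5, 10):
    p = steady_state(M, N, lam, mu)
    print(f"N = {N:2d}: P_80 = {p_at_least_operational(80, p):.4f}")
```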
Task 3
Given are
• failure rate of machines $\lambda$ = 0.002 per min
• number of machines $M$ = 120
• target probability: $P_j$ = 0.9
• time to repair a machine = 20 min
How many repair people are necessary to guarantee that at least two thirds of the machines ($j$ = 80) are operational with probability $P_j \geq 0.9$?
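Task 3 can be answered by increasing N until P_j reaches the target. A sketch using a simple linear search over N (an assumption; any search would do because P_j grows with N), with hypothetical helper names:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


def min_repair_people(M, lam, mu, j, target):
    """Smallest N such that P_j = P(at least j machines operational) >= target."""
    for N in range(1, M + 1):
        p = steady_state(M, N, lam, mu)
        P_j = fsum(p[: M - j + 1])  # at most M - j machines failed
        if P_j >= target:
            return N
    return None  # not reachable even with one repair person per machine


M = 120
j = 2 * M // 3  # at least two thirds of the machines, i.e. 80
N_star = min_repair_people(M, lam=0.002, mu=1 / 20, j=j, target=0.9)
print(f"at least {j} of {M} machines up with probability >= 0.9 needs N = {N_star}")
```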
Tasks 2 and 3: N = 2, 3, 4, 5, 10
Task 4
Given are the values
• $M$ = 120 machines
• MTTF = 500 min, i.e. $\lambda$ = 0.002 per min
• time to repair a machine = 20 min, i.e. $\mu$ = 0.05 per min
What is the effect of the size of the repair team, $N$, on the MTTR of a machine?
Task 4
Computation:
1. $p_0$
2. $p_k$
using $P[\text{exactly } j \text{ machines in operation}] = p_{M-j}$ and the closed-form expressions for $p_0$ and $p_k$ from Task 1
Task 4
Computation:
1. $p_0$
2. $p_k$
3. $X_f$, the average aggregate rate at which machines fail, which equals the average aggregate rate at which machines are repaired (Building a Model):
$$X_f = \sum_{k=0}^{M-1} \lambda_k p_k = \sum_{k=0}^{M-1} (M-k)\lambda\, p_k$$
Task 4
Computation:
1. $p_0$
2. $p_k$
3. $X_f$
4. MTTR, from the Interactive Response Time Law (Building a Model):
$$\text{MTTR} = \frac{M}{X_f} - \text{MTTF} = \frac{M}{X_f} - \frac{1}{\lambda}$$
Task 4
Computation:
1. $p_0$
2. $p_k$
3. $X_f$
4. MTTR
5. $N_o$, from Little's Law for the operational machines (Building a Model):
$$N_o = X_f \cdot \text{MTTF}$$
Task 4
Computation:
1. $p_0$
2. $p_k$
3. $X_f$
4. MTTR
5. $N_o$
6. $N_f$, from Little's Law for the repair "box" (Building a Model):
$$N_f = X_f \cdot \text{MTTR} = M - X_f \cdot \text{MTTF}$$
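The six computation steps can be sketched as one function and evaluated for N = 1, ..., 10. This is a hedged illustration with hypothetical names, not the tool chain the authors used:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


def repair_team_metrics(M, N, lam, mu):
    """Steps 1-6 for one team size N: p_0/p_k, X_f, MTTR, N_o, N_f."""
    p = steady_state(M, N, lam, mu)                     # steps 1-2
    x_f = fsum((M - k) * lam * p[k] for k in range(M))  # step 3
    mttf = 1 / lam
    mttr = M / x_f - mttf                               # step 4
    n_o = x_f * mttf                                    # step 5
    n_f = x_f * mttr                                    # step 6
    return {"X_f": x_f, "MTTR": mttr, "N_o": n_o, "N_f": n_f}


rows = {N: repair_team_metrics(120, N, 0.002, 0.05) for N in range(1, 11)}
print(f"{'N':>3} {'N_o':>7} {'N_f':>7} {'MTTR':>8}")
for N, m in rows.items():
    print(f"{N:>3} {m['N_o']:7.2f} {m['N_f']:7.2f} {m['MTTR']:8.2f}")
```

For N = 5 the sketch gives values close to those stated on the following slides (about 111 operational machines and an MTTR of roughly 38 min).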
Task 4: Effect of the Number of Repair People
• $N$: number of repair people
• $N_o$: average number of operational machines
• $N_f$: average number of failed machines
• MTTR: Mean Time to Repair
Task 4
• If the number of repair people is increased beyond 5, further decreases in the MTTR are minimal.
• With 5 repair people:
  - 111 machines are operational on average
  - the down time per failure is 38 minutes (MTTR = 38 min: 20 min repair + 18 min waiting)
Task 4
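Case $N = M = 120$ (every failed machine is repaired immediately, so MTTR $= 1/\mu$):
$$M = X_f\,(\text{MTTF} + \text{MTTR}) = X_f\left(\frac{1}{\lambda} + \frac{1}{\mu}\right) \;\Rightarrow\; X_f = \frac{M}{1/\lambda + 1/\mu}$$
$$N_o = X_f \cdot \text{MTTF} = \frac{M}{1 + \lambda/\mu}$$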
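The N = M closed form can be checked against the full birth-death model. A sketch assuming the example values; names are hypothetical:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


M, lam, mu = 120, 0.002, 0.05
MTTF, MTTR = 1 / lam, 1 / mu       # with N = M nobody waits, so MTTR is just the repair time
x_f_closed = M / (MTTF + MTTR)     # from M = X_f * (MTTF + MTTR)
n_o_closed = x_f_closed * MTTF     # = M / (1 + lambda/mu)

p = steady_state(M, M, lam, mu)    # full birth-death model with N = M repair people
x_f_model = fsum((M - k) * lam * p[k] for k in range(M))
print(f"X_f: closed form {x_f_closed:.5f}, model {x_f_model:.5f} per min")
print(f"N_o: closed form {n_o_closed:.2f}, model {x_f_model * MTTF:.2f}")
```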
Task 5
Given are the values
• $M$ = 120 machines
• MTTF = 500 min, i.e. $\lambda$ = 0.002 per min
• $N$ = 5
What is the effect of a repair person's skill level (i.e. the repair rate $\mu$) on the overall down time?
Task 5
Given are the values
• $M$ = 120 machines
• MTTF = 500 min, i.e. $\lambda$ = 0.002 per min
• $N$ = 5
How does the skill level affect the percentage of operational machines?
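Task 5 varies the repair rate μ (skill level) with N = 5. A sketch sweeping a hypothetical range of repair times from 10 to 40 min (these values are assumptions, not taken from the case study); helper names are hypothetical:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


M, N, lam = 120, 5, 0.002
results = {}
for repair_time in (10, 15, 20, 25, 30, 40):  # hypothetical skill levels, minutes per repair
    mu = 1 / repair_time
    p = steady_state(M, N, lam, mu)
    x_f = fsum((M - k) * lam * p[k] for k in range(M))
    mttr = M / x_f - 1 / lam
    pct_up = 100 * (x_f / lam) / M  # N_o = X_f * MTTF, as a percentage of M
    results[repair_time] = (mttr, pct_up)
    print(f"repair time {repair_time:2d} min: MTTR = {mttr:7.2f} min, {pct_up:5.1f}% operational")
```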
Task 5: Effect of the Repair Rate
• $N_o$: average number of operational machines
• $N_f$: average number of failed machines
• MTTR: Mean Time to Repair
Second Modeling Attempt (1)
The failure-recovery model can also be modeled as a two-device queueing network (QN):
• 1st device: delay server (machines in operation)
• 2nd device: load-dependent server (repair people)
Second Modeling Attempt (2)
Delay server:
• A repaired machine goes back into operation without queuing.
• The time a machine remains operational depends only on its MTTF.
Second Modeling Attempt (3)
Load-dependent server:
The total rate at which machines are repaired (TRMR) depends on:
- the number of failed machines $k$
- the number of repair people $N$
Service rate:
$$\mu(k) = \begin{cases} k\mu, & k = 1, \dots, N \\ N\mu, & k = N+1, \dots, M \end{cases}$$
Second Modeling Attempt (4)
Use the MVA method with load-dependent devices to solve this model.
Required: service rate multipliers (see Chap. 14)
$$\alpha(k) = \frac{\mu(k)}{\mu(1)} = \begin{cases} k, & k = 1, \dots, N \\ N, & k = N+1, \dots, M \end{cases}$$
Second Modeling Attempt (5)
The solution of this MVA model gives us:
• average throughput: $X$
• average residence time at the LD device: $R'_{LD}$ = MTTR
Little's Law applied to the LD device:
• average number of failed machines: $N_f = X \cdot R'_{LD}$
• average number of machines in operation: $N_o = M - N_f$
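A sketch of the exact MVA recursion for one delay server plus one load-dependent device, following the standard load-dependent MVA as we understand it (P(j | n) is the probability of j machines at the repair device when n machines are in the network). Function names are hypothetical; the result is cross-checked against the birth-death model of the first attempt, which should agree because both solutions are exact:

```python
from math import fsum


def mva_delay_plus_ld(M, N, lam, mu):
    """Exact MVA for a closed QN: delay server (Z = MTTF) + one load-dependent repair device.

    Service demand at the LD device: D = 1/mu; rate multipliers alpha(j) = min(j, N).
    Returns (X, R_LD, N_f, N_o) for a population of M machines.
    """
    Z, D = 1 / lam, 1 / mu
    P = [1.0]  # P[j] = P(j machines at LD device | population n - 1); starts at population 0
    for n in range(1, M + 1):
        # residence time at the LD device: sum_j (j / alpha(j)) * D * P(j-1 | n-1)
        R = fsum(j / min(j, N) * D * P[j - 1] for j in range(1, n + 1))
        X = n / (Z + R)  # throughput with n machines in the network
        new_P = [0.0] * (n + 1)
        for j in range(1, n + 1):
            new_P[j] = X * D / min(j, N) * P[j - 1]
        new_P[0] = 1.0 - fsum(new_P[1:])
        P = new_P
    N_f = X * R  # Little's Law at the LD device
    return X, R, N_f, M - N_f


def mean_failed_gbd(M, N, lam, mu):
    """Cross-check: E[k] from the birth-death model of the first attempt."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    return fsum(k * w for k, w in enumerate(weights)) / fsum(weights)


X, R_ld, N_f, N_o = mva_delay_plus_ld(120, 5, 0.002, 0.05)
print(f"X = {X:.4f}/min, MTTR = R'_LD = {R_ld:.2f} min, N_f = {N_f:.2f}, N_o = {N_o:.2f}")
print(f"birth-death model: N_f = {mean_failed_gbd(120, 5, 0.002, 0.05):.2f}")
```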
A Cost Analysis
• $C_p$: annual personnel cost (per repair person)
• $C_m$: annual cost per machine
• $\alpha$: constant revenue multiplier
• $N_o$: average number of machines in operation
• $M_{min}$: minimum number of machines that need to be in operation for the data center not to have to pay a penalty
• $C$: cost
• $R$: revenue
A Cost Analysis
cost: $C = N \cdot C_p + M \cdot C_m$
revenue: $R = \alpha\,(N_o - M_{min})$
profit: $P = R - C = \alpha\,(N_o - M_{min}) - (N \cdot C_p + M \cdot C_m)$
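A sketch of the profit function with purely hypothetical cost figures: C_p, C_m, α (the constant revenue multiplier) and M_min below are invented for illustration, so the resulting optimum need not match the case study. N_o comes from the birth-death model with the example values; helper names are hypothetical:

```python
from math import fsum


def steady_state(M, N, lam, mu):
    """p[k] = P(k machines failed), k = 0..M, via p_k = p_{k-1} * lambda_{k-1} / mu_k."""
    weights = [1.0]
    for k in range(1, M + 1):
        weights.append(weights[-1] * (M - k + 1) * lam / (min(k, N) * mu))
    total = fsum(weights)
    return [w / total for w in weights]


def profit(n_o, N, M, c_p, c_m, alpha, m_min):
    """P = R - C with R = alpha * (N_o - M_min) and C = N * C_p + M * C_m."""
    revenue = alpha * (n_o - m_min)
    cost = N * c_p + M * c_m
    return revenue - cost


# Hypothetical figures for illustration only (not taken from the case study)
M, lam, mu = 120, 0.002, 0.05
C_P, C_M, ALPHA, M_MIN = 60_000, 2_000, 150_000, 100
for N in range(1, 11):
    p = steady_state(M, N, lam, mu)
    n_o = M - fsum(k * p[k] for k in range(M + 1))  # N_o = M - N_f
    print(f"N = {N:2d}: N_o = {n_o:6.2f}, profit = {profit(n_o, N, M, C_P, C_M, ALPHA, M_MIN):14,.0f}")
```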
A Cost Analysis
A Cost Analysis
• Profit is negative for small numbers of repair people because of low machine availability.
• With more than 6 repair people, costs increase faster than revenue; thus 6 repair people are optimal.
References
• Lecture notes and talks by Menasce, CS672_Performance:
  - cs672-07CaseStudy-III-DataCenter.pdf
  - cs672-03QuantifyingPerformanceModels.pdf
• Lecture notes SN1
• Haverkort: Computer Communication Systems Performance Analysis