TechEd 2002 © 2002 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
Measuring/Estimating System Reliability and Performance
Box Leangsuksun
Computer Science
Center for Entrepreneurship and Information Technology
Louisiana Tech University
Introduction
Non-functional requirements are equally important, if not more so. Why?
- The world is impatient
- Addressing them upfront is more cost-effective than retrofitting
- Consequences range from inefficiency and inconvenience to life-threatening failures
- Loss of money and/or opportunities
- Etc.
Why? Goals
- Compare alternatives
- Determine impacts (per feature)
- System tuning
- Quantify relative reliability/availability/performance
- Debugging
- Set expectations
How to Measure or Estimate
- Measurements
- Simulations
- Analytical modeling
Measurements
- Construct the actual system
- Create a workload per requirements
- Provides the best results
- Inherently difficult and inflexible
- Almost impossible for what-if analysis
Measurements (continued)
Measure system or subsystem performance with tools:
- gprof
- top, ps, etc.
- Benchmark programs (e.g. Linpack, SPECmark, Winmark)
What about reliability measurement? Logs, traces, outage records.
Simulation
- A program that simulates the important characteristics of the targeted system
- Flexible and easy to modify
- Good for what-if analysis
- Difficult to model every small detail
- Popular: cost-effective and flexible
- Can suffer from missing details
Analytical Modeling
- A mathematical description of the system
- Provides quick insight
- Helps guide detailed simulation or measurement-based studies
- Results are much less believable or accurate
Example:
H = cache hit probability, Tm = memory access time, Tc = cache access time
T_avg = H * Tc + (1 - H) * Tm
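The cache formula above is easy to sketch in code. This is a minimal illustration; the 95% hit rate and the 1 ns / 100 ns access times are assumed example values, not from the slides.

```python
# Average memory access time from the slide's formula:
#   T_avg = H * Tc + (1 - H) * Tm
# The numeric inputs below are illustrative assumptions.

def avg_access_time(hit_prob, t_cache, t_mem):
    """Expected access time given cache hit probability."""
    return hit_prob * t_cache + (1 - hit_prob) * t_mem

# Example: 95% hit rate, 1 ns cache, 100 ns main memory
t = avg_access_time(0.95, 1.0, 100.0)
print(f"T_avg = {t:.2f} ns")  # 0.95*1 + 0.05*100 = 5.95 ns
```

Even a rough model like this shows why the hit rate dominates: dropping H from 95% to 90% nearly doubles the average access time.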
Comparison (Lilja's book)

Factor         Measurement  Simulation  Analytical Modeling
Accuracy       High         Medium      Low
Believability  High         Medium      Low
Cost           High         Medium      Low
Flexibility    Low          High        High
Dependability Estimation/Measurement
- Uses the same three techniques described above
- Two measures:
  - Availability (ratio of uptime to total time)
  - Reliability (MTTF)
- Analytical modeling:
  - Non-state-space
  - State-space
Why Dependability Measures?
- Comparisons with cost and performance
- A proper focus for product-improvement efforts
- Consideration of safety and risk issues
Dependability Modeling
- Includes reliability modeling and availability modeling
- A designed system can be shown to meet performance and dependability requirements
- Provides a good mechanism for examining the behavior of a system, from the design stage through implementation and final deployment
Dependability
Two measures:
- Reliability (MTTF)
- Availability (ratio of uptime to total time)
Reliability
Definition: The reliability R(t) of a system at time t is the probability that no system failure has occurred in the interval [0, t). If X is a random variable representing the time to occurrence of system failure, then R(t) = P(X > t).
Unreliability = 1 - R(t)
Reliability
Definition: The MTTF of a system is the expected time until the occurrence of the (first) system failure. If X is a random variable representing the time to occurrence of system failure, then MTTF = E[X]. Given the system reliability R(t), the MTTF can be computed as:
MTTF = ∫₀^∞ R(t) dt
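The integral above can be checked numerically. A sketch under an assumed exponential lifetime model, where R(t) = e^(-λt) and the analytic MTTF is 1/λ (the failure rate λ = 0.5 is an illustrative value, not from the slides):

```python
# Verify MTTF = ∫0^∞ R(t) dt numerically for an exponential lifetime:
# R(t) = exp(-lam * t), so the MTTF should come out to 1/lam.
import math

lam = 0.5  # assumed failure rate (failures per hour)

def R(t):
    return math.exp(-lam * t)

# Trapezoidal integration over a horizon long enough that R(t) ≈ 0
dt, horizon = 0.001, 50.0
n = int(horizon / dt)
mttf = sum((R(i * dt) + R((i + 1) * dt)) / 2 * dt for i in range(n))

print(f"numeric MTTF = {mttf:.4f}, analytic 1/lam = {1 / lam:.4f}")
```

For non-exponential reliability functions (e.g. Weibull) the same numerical approach works where no closed form is handy.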
Availability
- A measurement representing the ratio of uptime to total time
- High availability: the ability of a system to perform its function continuously (without interruption) for a significantly longer period than the reliabilities of its individual components would suggest
- High availability is most often achieved through fault tolerance
Degree of Availability

System Type             Unavailability (min/year)  Availability (%)  Availability Class
Unmanaged               50,000                     90                1
Managed                 5,000                      99                2
Well-managed            500                        99.9              3
Fault-tolerant          50                         99.99             4
High Availability       5                          99.999            5
Very High Availability  0.5                        99.9999           6
Ultra Availability      0.05                       99.99999          7
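The unavailability column follows directly from the availability percentage: downtime per year is (1 − A) times the 525,600 minutes in a year. A small sketch that reproduces the table's orders of magnitude (the table itself uses rounded values such as 50,000 rather than the exact 52,560):

```python
# Unavailability (minutes/year) from an availability percentage:
#   downtime = (1 - A/100) * minutes_per_year

MIN_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a (non-leap) year

def downtime_min_per_year(avail_percent):
    return (1 - avail_percent / 100) * MIN_PER_YEAR

for a in (90, 99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    print(f"{a:>9}% -> {downtime_min_per_year(a):,.2f} min/year")
```

Each extra "nine" of availability cuts the annual downtime by a factor of ten.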
Availability
Definition: The availability A(t) of a system at time t is the probability that the system is functioning correctly at time t. As with the reliability measure, in some applications it is better to compute the system unavailability U(t) = 1 - A(t).
A_steady = lim (t → ∞) A(t)
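For a simple repairable system, this limit reduces to the standard textbook ratio A_steady = MTTF / (MTTF + MTTR). That formula is not stated on the slide but is the usual way the steady-state value is computed; the numbers below are illustrative assumptions.

```python
# Steady-state availability as the long-run uptime fraction:
#   A_steady = MTTF / (MTTF + MTTR)
# (standard formula for a simple repairable system; values are assumed)

def steady_availability(mttf, mttr):
    return mttf / (mttf + mttr)

# Example: fails every 1000 h on average, repairs take 1 h on average
a = steady_availability(1000.0, 1.0)
print(f"A_steady = {a:.5f}")
```

This makes the two dependability measures' relationship concrete: improving availability means either raising MTTF (reliability) or cutting MTTR (faster repair/failover).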
Modeling Techniques
Non-state-space:
- Fault tree
- Reliability block diagram
State-space:
- Continuous-time Markov chain
- Stochastic Petri net
Example of system
Fault Tree
Availability Model
[Figure: HA-OSCAR dual-head availability model: a timeline for servers S1 and S2 showing "server up" and "server down & repair" intervals and the combined S1&S2 state]
HA-OSCAR SRN Model
- Server sub-model
- Switches
- Compute nodes
Server Sub-Model
[Figure: Petri-net places and transitions. Primary: P server up, P server down, failover, P server repair, failback. Standby: S is up and ready, S takes control, S server down, S repair]
Compute node sub-model
Switch sub-model
Instantaneous Availability
Steady-state A = 99.993% (about 36 min downtime/year) vs. Beowulf A = 99.65% (about 30 hr downtime/year)
Stochastic Petri Net Package (SPNP)
- Developed at Duke University
- Very popular
- Petri-net-based dependability analysis
Performance
- Computation: CPU, memory, I/O, etc.
- Communication: latency, bandwidth
- Transactions: possibly involving more than just the database
Some Criteria
- Throughput: number of completed requests per unit time
- Response time: the time from when a request is submitted until the first response is produced (not the full output)
- CPU utilization: keep the CPU as busy as possible
- Turnaround time: the time to execute a particular request (finishing time - arrival time)
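The criteria above can be computed from a simple log of request arrival and finish times. A minimal sketch with made-up timestamps (seconds):

```python
# Throughput and turnaround time from (arrival, finish) pairs.
# The timestamps are illustrative, not measured data.

requests = [  # (arrival_s, finish_s)
    (0.0, 2.0),
    (1.0, 4.0),
    (2.0, 5.0),
]

span = max(f for _, f in requests) - min(a for a, _ in requests)
throughput = len(requests) / span            # completed requests per second
turnarounds = [f - a for a, f in requests]   # finish - arrival, per request
avg_turnaround = sum(turnarounds) / len(turnarounds)

print(f"throughput     = {throughput:.2f} req/s")
print(f"avg turnaround = {avg_turnaround:.2f} s")
```

Response time would additionally need the timestamp of the first byte of output, which this toy log does not record.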
Performance Issue Discovery Phase
[Timeline: Requirement → Architecture/design → Development/code → Test, spanning 1/19/2004 to 3/19/2004; a performance issue discovered late forces re-design, re-code, and re-test across the whole span]
Telecom industry architecture reviews: roughly 1/3 of issues relate to performance.
Performance Measures
- Modeling
- Simulation
- Measurement
Analytical Modeling
Example for memory:
H = cache hit probability, Tm = memory access time, Tc = cache access time
T_avg = H * Tc + (1 - H) * Tm
Example of operation/transaction modeling:
- Browsing an order (Tb) vs. submitting an order (Ts)
- Volume: 90% vs. 10%
- Resource weight: 20% vs. 80% per order
- An order = 50 instructions + 10 memory accesses
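One reading of the transaction mix above: 90% of requests browse (cost Tb) and 10% submit orders (cost Ts), so the expected time per transaction is a volume-weighted average. The Tb and Ts values below are assumed for illustration; the slide does not give them.

```python
# Volume-weighted expected time per transaction:
#   E[T] = f_browse * Tb + (1 - f_browse) * Ts
# Tb and Ts are assumed costs (seconds), not values from the slides.

def expected_txn_time(tb, ts, browse_frac=0.9):
    return browse_frac * tb + (1 - browse_frac) * ts

t = expected_txn_time(tb=0.05, ts=0.40)
print(f"expected time per transaction = {t:.3f} s")
```

The same weighted-sum pattern extends to the 20%/80% resource weights: each operation class contributes its cost scaled by how often it occurs and how much resource it consumes.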
Performance Engineering
- Understand requirements and growth
- Should begin at the planning and architecture stage
- Resource needs and budget
- Use quantitative methods to gauge the goals (and eliminate root causes)
- Estimate
- Tracking and management
- Measurement
- Tuning
PE (continued)
- Poor performance reflects negatively on the product
- Retrofitting is costly:
  - Re-architecting
  - Adding more hardware
- Highly tuned code costs more to maintain
Key PE Activities
- Predict: requirements, architecture/analysis, budget
- Track
- Measure
- Correct
Key Approach*
Bound performance to an acceptable level (based on requirements):
- Targets are quantitative requirements that define the acceptance criteria
- Budgets are the performance goals allocated across all of the architecture components; all must be met in order to meet the overall targets
- Estimates are design-component goals derived from experience or previous measurement of existing components
* These definitions are excerpted from an AT&T performance engineering course and are used for educational purposes only.
Estimate -> How well can the system perform?
Budget -> How well must the system perform?
Performance Engineering Life Cycle
[Figure: iterative cycle. Architecture and Design feed a spreadsheet Budget, which drives an Initial Performance Model, refined by Measurement/Test; milestones m1-m7 are plotted on a calendar]
How to Start (Target)
- Seek the boundary (requirement)
- Start with a back-of-the-envelope calculation
- Ballpark figures (e.g. number of transactions, normal and at peak)
- Don't get hung up on precision early (e.g. how much water flows out?)
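A back-of-the-envelope calculation in this spirit might look like the sketch below. Every number is an assumption made up for illustration; the point is the order of magnitude, not precision.

```python
# Ballpark peak transaction rate (all inputs are assumptions)

users = 10_000          # assumed active users
txn_per_user_hour = 6   # assumed transactions per user per hour
peak_factor = 3         # assumed peak-to-average load ratio

avg_tps = users * txn_per_user_hour / 3600
peak_tps = avg_tps * peak_factor
print(f"~{avg_tps:.0f} TPS average, ~{peak_tps:.0f} TPS at peak")
```

A result like "tens of TPS, not thousands" is already enough to bound the architecture discussion before any detailed modeling.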
Budget
- A target or educated guess
- An iterative process: start from subsystems, then drill down to modules
- Budget resource items for each process/module/subsystem:
  - CPU, memory, disk I/O, network bandwidth
Budget Types
- Concurrency: percentage of resource allocation
- Sequential: wall-clock time
Example budget for a transaction's response time:
T_trans = T_cpu + (1 - C_mem) * T_disk + T_network
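The budget formula above is straightforward to evaluate. A sketch where C_mem is read as the probability that the data is already cached in memory (so disk is hit with probability 1 − C_mem); all numeric values are assumed for illustration:

```python
# Budget response time for a transaction, per the slide's formula:
#   T_trans = T_cpu + (1 - C_mem) * T_disk + T_network
# All component times (seconds) and C_mem are illustrative assumptions.

def budget_response(t_cpu, c_mem, t_disk, t_network):
    return t_cpu + (1 - c_mem) * t_disk + t_network

t = budget_response(t_cpu=0.010, c_mem=0.8, t_disk=0.050, t_network=0.020)
print(f"T_trans = {t * 1000:.1f} ms")
```

Laying the terms out this way makes budget allocation concrete: if the overall target is, say, 50 ms, each component budget can be negotiated against the others.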
ExercisesSee the handouts in the class