Predictive Performance Testing: Integrating Statistical Tests into Agile Development Lifecycles
©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Predictive Performance Testing: Integrating Statistical Tests into Agile Development Lifecycles
Tom Kleingarn, Lead, Performance Engineering
Digital River
http://www.linkedin.com/in/tomkleingarn
http://www.perftom.com
Agenda
> Introduction
> Performance engineering
> Agile
> Outputs from LoadRunner
> Basic statistics
> Advanced statistics
> Summary
> Practical application
About Me
> Tom Kleingarn
> Lead, Performance Engineering - Digital River
> 4 years in performance engineering
> Tested over 100 systems/applications
> Hundreds of performance tests executed
> Tools
> LoadRunner
> JMeter
> Webmetrics, Keynote, Gomez
> ‘R’ and Excel
> Quality Center
> QuickTest Professional
> Leading provider of global e-commerce solutions
> Builds and manages online businesses for software and game publishers, consumer electronics manufacturers, distributors, online retailers and affiliates.
> Comprehensive platform offers:
> Site development and hosting
> Order management
> Fraud management
> Export control
> Tax management
> Physical and digital product fulfillment
> Multi-lingual customer service
> Advanced reporting and strategic marketing
Performance Engineering
> The process of experimental design, test execution, and results analysis, utilized to validate system performance as part of the Software Development Lifecycle (SDLC).
> Performance requirements – measurable targets of speed, reliability, and/or capacity used in performance validation.
> Latency < 10ms, measured at the 99th percentile
> 99.95% uptime
> Throughput of 1,000 requests per second
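Requirements like these can be checked directly against raw timing data. A minimal Python sketch (the latency values are hypothetical, and the nearest-rank method shown is one common percentile definition):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Hypothetical web-service latencies in milliseconds.
latencies_ms = [6, 7, 8, 8, 9, 9, 10, 11, 12, 25]

# Against a "latency < 10ms at the 99th percentile" requirement,
# this sample fails: one slow request dominates the 99th percentile.
p99 = percentile(latencies_ms, 99)
```

A single outlier can push the 99th percentile far above the median, which is exactly why percentile requirements are stricter than average-based ones.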
Performance Testing Cycle
1. Requirements Analysis
2. Create test plan
3. Create automated scripts
4. Define workload model
5. Execute scenarios
6. Analyze results
> Rinse and repeat if…
> Defects identified
> Change in requirements
> Setup or environment issues
> Performance requirement not met
Digital River Test Automation
Agile
> A software development paradigm that emphasizes rapid process cycles, cross-functional teams, frequent examination of progress, and adaptability.
Scrum
[Diagram: Scrum cycle, from initial plan through iterative sprints to deploy]
Agile Performance Engineering
> Clear and constant communication
> Involvement in initial requirements and design phase
> Identify key business processes before they are built
> Coordinate with analysts and development to build key business processes first
> Integrate load generation requirements into project schedule
> Test immediately with v1.0
> Schedule tests to auto-start, run independently
> Identify invalid test results before deep analysis
LoadRunner Results
> Measures of central tendency
> Average = ∑(all samples) / (sample size)
> Median = 50th percentile
> Mode – highest frequency, the value that occurs most often
> Measures of variability
> Min, max
> Standard Deviation = √( ∑(xᵢ − x̄)² / (n − 1) )
> 90th percentile
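The same summary statistics LoadRunner reports can be reproduced from raw data. A sketch using Python's standard library (the sample values are illustrative, not from the talk):

```python
import statistics

# Hypothetical transaction latencies in seconds (illustrative only).
samples = [2.1, 2.4, 2.4, 2.9, 3.0, 3.2, 3.4, 3.8, 4.1, 9.5]

mean   = statistics.mean(samples)    # sum(all samples) / sample size
median = statistics.median(samples)  # 50th percentile
mode   = statistics.mode(samples)    # value that occurs most often
stdev  = statistics.stdev(samples)   # sample standard deviation (n - 1 denominator)

# A single outlier (9.5 s) pulls the average well above the median,
# which is why LoadRunner also reports percentiles and variability.
```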
LoadRunner Results
[Chart: latency distribution. The median splits the samples 50% / 50%; the 90th percentile splits them 90% / 10%.]
Basic Statistics – Sample vs. Population
> Performance requirement: average latency < 3 seconds
> What if you ran 50 rounds? 100 rounds?
Basic Statistics – Sample vs. Population
> Sample – set of values, subset of population
> Population – all potentially observable values
> Measurements
> Statistic – the estimated value from a collection of samples
> Parameter – the “true” value you are attempting to estimate
Not a representative sample!
Basic Statistics – Sample vs. Population
> Sampling distribution – the probability distribution of a given statistic based on a random sample of size n
> Dependent on the underlying population
> How do you know the system under test met the performance requirement?
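The point is easy to demonstrate by simulation. In the sketch below (a hypothetical population, not real test data), each "test round" draws 500 latencies and reports one sample mean; the means scatter around the true parameter without ever pinning it down exactly:

```python
import random
import statistics

random.seed(42)

TRUE_MEAN = 3.0  # the population parameter, unknown in a real test

def run_test_round(n=500):
    """One load-test round: n latencies drawn from a hypothetical population."""
    return [random.expovariate(1 / TRUE_MEAN) for _ in range(n)]

# Each round yields a different sample mean (the statistic).
round_means = [statistics.mean(run_test_round()) for _ in range(100)]

# The statistics cluster around the parameter, but no single round equals it,
# which is why one passing test round does not prove the requirement is met.
```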
Basic Statistics – Normal Distribution
> With larger samples, data tend to cluster around the mean
Basic Statistics – Normal Distribution
Sir Francis Galton’s “Bean Machine”
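Galton's bean machine is simple to simulate: each bead takes a series of left/right bounces, and the bin counts trace out the bell curve. A sketch (parameters are illustrative):

```python
import random
from collections import Counter

random.seed(1)

def drop_bead(rows=12):
    """A bead bounces left (0) or right (1) at each of 12 pegs; its final
    bin is the count of right bounces, i.e. a Binomial(12, 0.5) draw."""
    return sum(random.randint(0, 1) for _ in range(rows))

bins = Counter(drop_bead() for _ in range(10_000))

# The center bin (6) collects the most beads, illustrating how many small
# independent effects sum to an approximately normal distribution.
```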
Confidence Intervals
> The probability that an interval made up of two endpoints will contain the true mean parameter μ
> 95% confidence interval: x̄ ± 1.96 · (s / √n)
> … where 1.96 is the score from the normal distribution that bounds the central 95% of probability: P(−1.96 ≤ Z ≤ 1.96) = 0.95
Confidence Intervals
> In repeated rounds of testing, a confidence interval will contain the true mean parameter with a certain probability:
[Chart: confidence intervals from repeated test rounds plotted against the true average; most, but not all, contain it.]
Confidence Intervals in Excel
> 95% confidence - true average latency 3.273 to 3.527 seconds
> 99% confidence - true average latency 3.233 to 3.567 seconds
> Our range is wider at 99% compared to 95%, 0.334 sec vs. 0.254 sec
Statistic          | 95% Value | 99% Value | Formula
Average            | 3.40      | 3.40      |
Standard Deviation | 1.45      | 1.45      |
Sample size        | 500       | 500       |
Confidence Level   | 0.95      | 0.99      |
Significance Level | 0.05      | 0.01      | =1-(Confidence Level)
Margin of Error    | 0.127     | 0.167     | =CONFIDENCE(Sig. Level, Std Dev, Sample Size)
Lower Bound        | 3.273     | 3.233     | =Average - Margin of Error
Upper Bound        | 3.527     | 3.567     | =Average + Margin of Error
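The Excel calculation can be reproduced in a few lines of Python; the sketch below uses the same normal-approximation interval as Excel's CONFIDENCE function, with the figures from the table above:

```python
import math
from statistics import NormalDist

# Figures from the Excel example above (hypothetical test data).
mean, stdev, n = 3.40, 1.45, 500

def confidence_interval(mean, stdev, n, level):
    """Normal-approximation interval, matching Excel's CONFIDENCE function."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.96 for 95%, 2.576 for 99%
    margin = z * stdev / math.sqrt(n)
    return mean - margin, mean + margin

lo95, hi95 = confidence_interval(mean, stdev, n, 0.95)
lo99, hi99 = confidence_interval(mean, stdev, n, 0.99)
```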
The T-test
> Tests whether the true mean is greater than or less than a certain value
> Performance requirement:
Mean latency < 3 seconds
> Null hypothesis:
Mean latency >= 3 seconds
> Alternative hypothesis:
Mean latency is < 3 seconds
T-test – Raw Data from LoadRunner
n = 500
T-test in ‘R’
> ‘R’ for statistical analysis
> http://www.r-project.org/
> Load test data from a file:
datafile <- read.table("C:\\Data\\test.data", header = FALSE, col.names = c("latency"))
> Attach the dataframe:
attach(datafile)
> Create a “vector” from the dataframe:
latency <- datafile$latency
T-test in ‘R’
> t.test(latency, alternative="less", mu=3)
One Sample t-test
data: latency
t = -2.9968, df = 499, p-value = 0.001432
alternative hypothesis: true mean is less than 3
> The p-value of 0.0014 means that, if the true average latency were actually 3 seconds, a sample mean this low would occur only 0.14% of the time. We reject the null hypothesis.
> We can conclude, at the 95% confidence level, that the true average latency is less than 3 seconds
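For teams scripting their analysis in Python rather than R, SciPy provides an equivalent one-sided, one-sample t-test (the `alternative` argument requires SciPy 1.6 or later; the latency sample below is synthetic, standing in for the raw LoadRunner data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic stand-in for the LoadRunner latency file (true mean 2.7 s here).
latency = rng.normal(loc=2.7, scale=1.5, size=500)

# One-sided, one-sample t-test: H0 mean >= 3 vs. H1 mean < 3,
# equivalent to R's t.test(latency, alternative="less", mu=3).
t_stat, p_value = stats.ttest_1samp(latency, popmean=3, alternative="less")

# p_value < 0.05: a sample mean this far below 3 would be unlikely
# if the true mean were 3 seconds, so we reject the null hypothesis.
```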
T-test – Number of Samples Required
> power.t.test(sd=sd(latency), sig.level=0.05, power=0.90, delta=mean(latency)*0.01, type="one.sample")
One-sample t test power calculation
n = 215.5319
delta = 0.03241267
sd = 0.1461401
sig.level = 0.05
power = 0.9
alternative = two.sided
> We need at least 216 samples
> Our sample size is 500, we have enough samples to proceed
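The same sample-size calculation can be sketched outside R. The Python function below searches for the smallest n whose two-sided one-sample t-test reaches the requested power, using SciPy's noncentral t distribution (the same model behind power.t.test); sd and delta are the values from the output above:

```python
from scipy import stats

def samples_needed(delta, sd, alpha=0.05, power=0.90):
    """Smallest n for a two-sided one-sample t-test to detect a mean shift
    of `delta` with the given power (same model as R's power.t.test)."""
    d = delta / sd                     # standardized effect size
    n = 2
    while True:
        df = n - 1
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        # Achieved power under the noncentral t distribution (ncp = d * sqrt(n)).
        achieved = 1 - stats.nct.cdf(t_crit, df, d * n ** 0.5)
        if achieved >= power:
            return n
        n += 1

# sd and delta from the power.t.test output above.
n_required = samples_needed(delta=0.03241267, sd=0.1461401)
```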
Test for Normality
> Test that the data is “normal”
> Clustered around a central value, no outliers
> Roughly fits the normal distribution
> shapiro.test(latency)
Shapiro-Wilk normality test
data: latency
p-value = 0.8943
> A p-value < 0.05 would indicate the distribution is not normal
> With p = 0.8943, our sample distribution is approximately normal
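SciPy exposes the same Shapiro-Wilk test. The sketch below (synthetic samples, for illustration) runs it on a normal-looking sample and on a heavily skewed one to show both outcomes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Normal-looking sample: Shapiro-Wilk should not reject normality.
normal_sample = rng.normal(loc=3.0, scale=0.5, size=500)
w_norm, p_norm = stats.shapiro(normal_sample)

# Heavily skewed sample: the test should flag it as non-normal.
skewed_sample = rng.exponential(scale=3.0, size=500)
w_skew, p_skew = stats.shapiro(skewed_sample)

# p < 0.05 -> reject normality; p >= 0.05 -> approximately normal.
```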
Review
> Sample vs. Population
> Normal distribution
> Confidence intervals
> T-test
> Sample size
> Test for normality
> Practical application
> Performance requirements
> Compare two code builds
> Compare system infrastructure changes
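Comparing two code builds or infrastructure variants is a two-sample problem. A sketch using Welch's two-sample t-test (the build latencies here are synthetic; in practice, use the raw results from each test round):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical latencies (seconds) from two builds under identical load.
build_a = rng.normal(loc=3.4, scale=1.4, size=500)   # current build
build_b = rng.normal(loc=3.0, scale=1.4, size=500)   # candidate build

# Welch's two-sample t-test: is the difference in mean latency real,
# or just round-to-round sampling noise?
t_stat, p_value = stats.ttest_ind(build_a, build_b, equal_var=False)

# p_value < 0.05 -> the builds genuinely differ in mean latency.
```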
Case Study
> Engaged in a new web service project
> Average latency < 25ms
> Applied statistical analysis
> System did not meet requirement
> Identified problem transaction
> Development fix applied
> Additional test, requirement met
> Prevented a failure in production
Implementation in Agile Projects
> Involvement in early design stages
> Identify performance requirements
> Build key business processes first
> Calculate required sample size
> Apply statistical analysis
> Run fewer tests with greater confidence in your results
> Prevent performance defects from entering production
> Prevent SLA violations in production