Introduction / Statistics - I / Statistics - II
The Statistics of Web Performance
Philip Tellis / [email protected]
ConFoo / 2010-03-12
$ finger philip
Philip [email protected]
@bluesmoon
yahoo
geek
The goal / Performance Measurement
Introduction
Accurately measure page performance
At least, as accurately as possible
Be unintrusive
If you try to measure something accurately, you will change something related
– Heisenberg’s uncertainty principle
And one number to rule them all
Bandwidth
Real bandwidth v/s advertised bandwidth
Bandwidth to your server, not to the ISP
Bandwidth during normal internet usage
If the user’s always watching movies, you’re not winning
Latency
How long does it take a byte to get to the user?
Wired, wireless, mobile, satellite?
How many hops in between?
Speed of light is constant
This is not a battle we will soon win.
When was the last time you heard latency mentioned in a TV ad?
http://www.stuartcheshire.org/rants/Latency.html
User perceived page load time
Time from “click on a link” to “spinner stops spinning”
This is what users notice
Depends on how long your page takes to build
Depends on what’s in your page
Depends on how long components take to load
Depends on how long the browser takes to execute and render
We need to measure real user data
The statistics apply to any kind of performance data though
Random Sampling / Margin of Error / Central Tendency
Statistics - I
Disclaimer
I am not a statistician
Population
All possible users of your system
Sample
Representative subset of the population
Bad sample
Sometimes it’s not
How to randomize?
Pick 10% of users at random and always test them
OR
For each user, decide at random if they should be tested
http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
Select 10% of users - I
// $sessionid is assumed to be an integer; about one session in ten matches
if ($sessionid % 10 === 0) {
    // instrument code for measurement
}
Once a user enters the measurement bucket, they stay there until they log out
Fixed set of users, so tests may be more consistent
Error in the sample results in positive feedback
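A minimal sketch of the sticky bucket, for the case where session IDs are opaque strings rather than integers. The helper name `inMeasurementBucket` and the use of `crc32()` as the hash are assumptions made for illustration, not something the talk prescribes.

```php
<?php
// Hypothetical helper: map a session ID to one of $buckets buckets and
// report whether it lands in bucket 0 (the measured bucket).
function inMeasurementBucket(string $sessionId, int $buckets = 10): bool
{
    // crc32() returns an integer checksum of the string. The same
    // session ID always yields the same bucket, so a user stays in
    // (or out of) the sample for the whole session.
    return crc32($sessionId) % $buckets === 0;
}

if (inMeasurementBucket('3f2a9c0d7e')) {
    // instrument code for measurement
}
```

Because the decision depends only on the session ID, it needs no extra storage, which is also why any bias in the chosen users persists for as long as they stay logged in.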
Select 10% of users - II
// rand() is uniform between 0 and getrandmax(), so this passes for about 10% of requests
if (rand() < 0.1 * getrandmax()) {
    // instrument code for measurement
}
For every request, a user has a 10% chance of being tested
Gets rid of positive feedback errors, but sample size != 10% of population
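One way to see why the sample is not 10% of the population: if each request is sampled independently with probability p, a user who makes k requests is measured at least once with probability 1 - (1 - p)^k. The function name below is hypothetical and the independence of requests is an assumption.

```php
<?php
// Probability that a user making $requests requests is measured at
// least once, when every request is sampled independently with
// probability $p.
function fractionOfUsersSampled(float $p, int $requests): float
{
    return 1.0 - pow(1.0 - $p, $requests);
}

printf("%.3f\n", fractionOfUsersSampled(0.1, 1));   // prints 0.100
printf("%.3f\n", fractionOfUsersSampled(0.1, 10));  // prints 0.651
```

About 10% of requests are measured, but users who make many requests are far more likely to appear in the sample than users who make one.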
How big a sample is representative?
Select n such that $\left| 1.96 \, \frac{\sigma}{\sqrt{n}} \right| \le 5\% \, \mu$
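Solving the inequality for n gives n ≥ (1.96 σ / (0.05 µ))². A minimal sketch, assuming σ and µ are estimated from an earlier pilot sample; the function and parameter names are hypothetical.

```php
<?php
// Smallest n for which 1.96 * sigma / sqrt(n) <= relativeError * |mu|.
// $sigma and $mu are assumed to come from a pilot sample.
function requiredSampleSize(float $sigma, float $mu, float $relativeError = 0.05): int
{
    if ($mu == 0.0 || $relativeError <= 0.0) {
        throw new InvalidArgumentException('mu must be non-zero and relativeError positive');
    }
    $root = 1.96 * $sigma / ($relativeError * abs($mu));
    return (int) ceil($root * $root);
}

// Example: mean load time 1000 ms, standard deviation 500 ms.
echo requiredSampleSize(500.0, 1000.0), "\n";  // prints 385
```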
Standard Deviation
Standard deviation tells you the spread of the curve
The narrower the curve, the more confident you can be
MoE at 95% confidence
$\pm 1.96 \, \frac{\sigma}{\sqrt{n}}$
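A minimal sketch of the calculation. It uses the sample standard deviation (n - 1 in the denominator) as the estimate of σ, which is an assumption; the slide only gives σ. The function names are illustrative.

```php
<?php
// Arithmetic mean of a non-empty list of numbers.
function mean(array $xs): float
{
    return array_sum($xs) / count($xs);
}

// Sample standard deviation (n - 1 in the denominator); needs n >= 2.
function sampleStdDev(array $xs): float
{
    $m = mean($xs);
    $sumSq = 0.0;
    foreach ($xs as $x) {
        $sumSq += ($x - $m) ** 2;
    }
    return sqrt($sumSq / (count($xs) - 1));
}

// Margin of error at 95% confidence: 1.96 * sigma / sqrt(n).
function marginOfError95(array $xs): float
{
    return 1.96 * sampleStdDev($xs) / sqrt(count($xs));
}

$loadTimes = [2, 4, 4, 4, 5, 5, 7, 9];
printf("%.2f +/- %.2f\n", mean($loadTimes), marginOfError95($loadTimes));
```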
MoE & Sample size
The margin of error is inversely proportional to the square root of the sample size: quadrupling the sample only halves the margin of error
But wait... it’s not complicated enough.
We have different types of margins of error
...more about that later
Ding dong
One number
Mean (Arithmetic)
Good for symmetric curves
Affected by outliers
Mean(10, 11, 12, 11, 109) = 30.6
One number
Median
Middle value measures central tendency well
Not trivial to pull out of a DB
Median(10, 11, 12, 11, 109) = 11
One number
Mode
Not often used
Multi-modal distributions suggest problems
Mode(10, 11, 12, 11, 109) = 11
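The three measures can be sketched in a few lines once the values are in memory. The function names are illustrative, and `modes()` assumes integer values (for example whole milliseconds), because `array_count_values()` only counts integers and strings.

```php
<?php
// Arithmetic mean of a non-empty list.
function mean(array $xs): float
{
    return array_sum($xs) / count($xs);
}

// Median: middle value of the sorted list, or the average of the two
// middle values when the count is even.
function median(array $xs): float
{
    sort($xs);
    $n = count($xs);
    $mid = intdiv($n, 2);
    return $n % 2 === 1 ? $xs[$mid] : ($xs[$mid - 1] + $xs[$mid]) / 2;
}

// All values that occur most often; more than one entry means the
// sample is multi-modal. Assumes integer values.
function modes(array $xs): array
{
    $counts = array_count_values($xs);
    return array_keys($counts, max($counts));
}

$sample = [10, 11, 12, 11, 109];
printf("mean=%.1f median=%.1f modes=%s\n",
    mean($sample), median($sample), implode(',', modes($sample)));
```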
Other numbers
A percentile point in the distribution: 95th, 98.5th or 99th
Used to find out the worst user experience
Makes more sense if you filter data first
P95th(10, 11, 12, 11, 109) = 12
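There are several definitions of a percentile. The sketch below assumes the nearest-rank definition, which is not necessarily the one used in the talk. Under that definition the raw five-value sample returns the outlier itself, while the same sample with 109 filtered out returns 12, which illustrates why filtering first makes more sense.

```php
<?php
// Nearest-rank percentile: the smallest value such that at least $p
// percent of the data is less than or equal to it. $xs must be non-empty.
function percentile(array $xs, float $p)
{
    sort($xs);
    $rank = (int) ceil($p / 100 * count($xs));
    return $xs[max($rank, 1) - 1];
}

echo percentile([10, 11, 12, 11, 109], 95), "\n";  // prints 109
echo percentile([10, 11, 12, 11], 95), "\n";       // prints 12
```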
Other means
Geometric mean
Good if your data is exponential in nature (with the tail on the right)
GMean(10, 11, 12, 11, 109) = 17.37
Wait... how did I get that?
$\sqrt[N]{\prod_{i=1}^{N} x_i}$ (could lead to overflow)
$e^{\left( \frac{\sum_{i=1}^{N} \log_e(x_i)}{N} \right)}$ (computationally simpler)
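The log form translates directly into code. This is a minimal sketch with a hypothetical function name; it rejects non-positive values, for which the logarithm is undefined.

```php
<?php
// Geometric mean computed in log space: exp(mean(log(x))).
// Summing logs avoids the overflow that multiplying many large
// values together could cause.
function geometricMean(array $xs): float
{
    if (count($xs) === 0) {
        throw new InvalidArgumentException('need at least one value');
    }
    $sumLogs = 0.0;
    foreach ($xs as $x) {
        if ($x <= 0) {
            throw new InvalidArgumentException('values must be positive');
        }
        $sumLogs += log($x);
    }
    return exp($sumLogs / count($xs));
}

printf("%.2f\n", geometricMean([2, 8]));  // prints 4.00
```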
Other means
And there is also the Harmonic mean, but forget about that
...though consequently
We have other margins of error
Geometric margin of error
    Uses geometric standard deviation
Median margin of error
    Uses ranges of actual values from data set
Stick to the arithmetic MoE: simpler to calculate, simpler to read and not incorrect
Filtering / The Log-Normal distribution
Statistics - II
Outliers
Out of range data points
Nothing you can fix here
There’s even a book about them
DNS problems can cause outliers
2 or 3 DNS servers for an ISP
30 second timeout if first fails
... 30 second increase in page load time
Maybe measure both and fix what you can
http://nms.lcs.mit.edu/papers/dns-ton2002.pdf
Band-pass filtering
Strip everything outside a reasonable range
Bandwidth range: 4kbps - 4Gbps
Page load time: 50ms - 120s
You may need to revisit these ranges regularly
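A minimal sketch of a band-pass filter using the page load time range from the slide (50 ms to 120 s, expressed in milliseconds). The function name and the sample values are made up for illustration.

```php
<?php
// Keep only the values inside [$min, $max], bounds included.
function bandPass(array $values, float $min, float $max): array
{
    return array_values(array_filter(
        $values,
        function ($v) use ($min, $max) {
            return $v >= $min && $v <= $max;
        }
    ));
}

// Hypothetical page load times in milliseconds.
$loadTimesMs = [12, 850, 2300, 45000, 600000];
print_r(bandPass($loadTimesMs, 50, 120000));  // keeps 850, 2300 and 45000
```

The bounds are passed in as parameters rather than hard-coded, since the reasonable range changes over time.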
IQR filtering
Here, we derive the range from the data
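A minimal sketch of IQR filtering. The slide does not give the fences, so this assumes the conventional rule of keeping values within 1.5 × IQR of the first and third quartiles. The quartiles use linear interpolation, which is one of several common definitions. Function names are illustrative.

```php
<?php
// Quantile by linear interpolation between the two nearest ranks of an
// ascending-sorted, non-empty array.
function quantile(array $sorted, float $q): float
{
    $pos = $q * (count($sorted) - 1);
    $lo = (int) floor($pos);
    $hi = (int) ceil($pos);
    return $sorted[$lo] + ($pos - $lo) * ($sorted[$hi] - $sorted[$lo]);
}

// Keep values within [Q1 - k * IQR, Q3 + k * IQR]. k = 1.5 is the
// conventional choice and an assumption here.
function iqrFilter(array $values, float $k = 1.5): array
{
    $sorted = $values;
    sort($sorted);
    $q1 = quantile($sorted, 0.25);
    $q3 = quantile($sorted, 0.75);
    $iqr = $q3 - $q1;
    $lower = $q1 - $k * $iqr;
    $upper = $q3 + $k * $iqr;
    return array_values(array_filter(
        $values,
        function ($v) use ($lower, $upper) {
            return $v >= $lower && $v <= $upper;
        }
    ));
}

print_r(iqrFilter([10, 11, 12, 11, 109]));  // drops 109, keeps the rest
```

Unlike the band-pass filter, no range has to be chosen in advance; the fences move with the data.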
Let’s look at some real charts
Bandwidth distribution for web devs
x-axis is linear
Now let’s use log(kbps) instead of kbps
x-axis is exponential
Exponential == Geometric
Categories/Buckets grow exponentially
Data is related geometrically
Use the geometric mean and geometric margin of error
$\mathrm{Error\_range} = \left[ \frac{\mathrm{gmean}}{\mathrm{gmoe}},\ \mathrm{gmean} \times \mathrm{gmoe} \right]$
Non-linear ranges are hard for humans to grok
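A sketch of the error range in code. The slides do not define gmoe, so this assumes it is the arithmetic margin of error computed on the natural logs and then exponentiated: exp(1.96 × s / √n), where s is the sample standard deviation of the logs (the log of the geometric standard deviation). The function name and array keys are hypothetical.

```php
<?php
// Geometric mean and the error range [gmean / gmoe, gmean * gmoe].
// Assumption: gmoe = exp(1.96 * s / sqrt(n)), with s the sample
// standard deviation of the natural logs. Values must be positive.
function geometricErrorRange(array $xs): array
{
    $n = count($xs);
    if ($n < 2) {
        throw new InvalidArgumentException('need at least two values');
    }
    $logs = array_map(function ($x) { return log($x); }, $xs);
    $meanLog = array_sum($logs) / $n;
    $sumSq = 0.0;
    foreach ($logs as $l) {
        $sumSq += ($l - $meanLog) ** 2;
    }
    $sdLog = sqrt($sumSq / ($n - 1));
    $gmean = exp($meanLog);
    $gmoe = exp(1.96 * $sdLog / sqrt($n));
    return [
        'gmean' => $gmean,
        'gmoe'  => $gmoe,
        'lower' => $gmean / $gmoe,
        'upper' => $gmean * $gmoe,
    ];
}

// Hypothetical bandwidth samples in kbps.
print_r(geometricErrorRange([100, 1000, 10000]));
```

The range is symmetric around the geometric mean as a ratio, not as a difference, which is what makes it awkward to read.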
So...
Further reading
Web Performance - Not a Simple Number
http://www.netforecast.com/Articles/BCR+C25+Web+Performance+-+Not+A+Simple+Number.pdf
Revisiting statistics for web performance (introduction to Log-Normal)
http://home.pacbell.net/ciemo/statistics/WhatDoYouMean.pdf
Random Sampling
http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
Khan Academy’s tutorials on statistics
http://khanacademy.com/
Learning about Statistical Learning
http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html
Wikipedia articles on Random Sampling, Central Tendency, Standard Error, Confounding, Means and IQR
Summary
Choose a reasonable sample size and sampling factor
Tune sample size for minimal margin of error
Decide based on your data whether to use mode, median or one of the means
Figure out whether your data is Normal, Log-Normal or something else
Filter out anomalous outliers
contact me
Philip [email protected]
bluesmoon.info
@bluesmoon
Photo credits
http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas
http://www.flickr.com/photos/cobalt/56500295/ by cobalt123
http://www.flickr.com/photos/sophistechate/4264466015/ by LisaBrewster
http://www.flickr.com/photos/nchoz/243216008/ by nchoz
List of figures
http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
http://en.wikipedia.org/wiki/File:KilroySchematic.svg
http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png