The Statistics of Web Performance

DESCRIPTION

Analysis of user experience is typically done by taking a random sample of users, measuring their experiences and extracting a single number from that sample. In terms of web performance, the experience we need to measure is user perceived page load time, and the single number we need to extract depends on the distribution of measurements across the sample. There are a few contenders for what the magic number should be. Do you use the mean, median, mode, or something else? How do you determine the correctness of this number or whether your sample size is large enough? Is one number sufficient? This talk covers some of the statistics behind figuring out which numbers to look at and how to extract them from the sample.

TRANSCRIPT

Page 1: The Statistics of Web Performance

Introduction · Statistics - I · Statistics - II

The Statistics of Web Performance

Philip Tellis

ConFoo / 2010-03-12

ConFoo / 2010-03-12 The Statistics of Web Performance

Page 2

$ finger philip

Philip Tellis

@bluesmoon · yahoo · geek

Page 3

The goal · Performance Measurement

Introduction

Page 4

Accurately measure page performance

At least, as accurately as possible

Page 6

Be unintrusive

If you try to measure something accurately, you will change something related

– Heisenberg’s uncertainty principle (strictly speaking, this is the observer effect)

Page 7

And one number to rule them all

Page 8

Bandwidth

- Real bandwidth vs. advertised bandwidth
- Bandwidth to your server, not to the ISP
- Bandwidth during normal internet usage

If the user’s always watching movies, you’re not winning

Page 10

Latency

- How long does it take a byte to get to the user?
- Wired, wireless, mobile, satellite?
- How many hops in between?
- Speed of light is constant

This is not a battle we will soon win. When was the last time you heard latency mentioned in a TV ad?

http://www.stuartcheshire.org/rants/Latency.html
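
To see why a constant speed of light puts a floor under latency, here is a back-of-the-envelope sketch. It assumes a signal speed in fibre of roughly 200,000 km/s and a perfectly straight path, so the result is a lower bound rather than a prediction; the function name is hypothetical.

```python
# Assumption: light in optical fibre covers roughly 200,000 km/s
# (about two thirds of its speed in a vacuum), i.e. ~200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound on round-trip time over a straight fibre path.

    Ignores routing detours, hops, queuing and processing, so real
    round trips are always slower than this.
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 2 * distance_km / FIBRE_KM_PER_MS


# Hypothetical example: a user about 5,000 km away from the server.
print(min_round_trip_ms(5000))  # 50.0
```

No amount of bandwidth removes those 50 ms; only moving content closer to the user does.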

Page 13

User perceived page load time

- Time from “click on a link” to “spinner stops spinning”
- This is what users notice

- Depends on how long your page takes to build
- Depends on what’s in your page
- Depends on how long components take to load
- Depends on how long the browser takes to execute and render

Page 14

We need to measure real user data

Page 15

The statistics apply to any kind of performance data though

Page 16

Random Sampling · Margin of Error · Central Tendency

Statistics - I

Page 17

Disclaimer

I am not a statistician

Page 18

Population

All possible users of your system

Page 19

Sample

Representative subset of the population

Page 20

Bad sample

Sometimes it’s not

Page 21

How to randomize?

Pick 10% of users at random and always test them

OR

For each user, decide at random if they should be tested

http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html

Page 22

Select 10% of users - I

if($sessionid % 10 === 0) {
    // instrument code for measurement
}

- Once a user enters the measurement bucket, they stay there until they log out
- Fixed set of users, so tests may be more consistent
- Error in the sample results in positive feedback
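
The same sticky bucketing, sketched in Python. The PHP above assumes a numeric `$sessionid`; this version assumes opaque string session IDs and hashes them with SHA-256 so they spread evenly over the buckets. The function and constant names are illustrative.

```python
import hashlib

SAMPLE_BUCKETS = 10  # measure 1 bucket out of 10, i.e. about 10% of sessions


def in_measurement_bucket(session_id: str) -> bool:
    """Sticky sampling: a given session ID always lands in the same bucket."""
    # Hashing spreads arbitrary string IDs evenly over the buckets
    # (assumption: session IDs are opaque strings, not integers).
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_BUCKETS == 0


# The decision never changes for the same session, unlike per-request sampling.
assert in_measurement_bucket("user-42") == in_measurement_bucket("user-42")
```

Because the answer is fixed per session, any bias in who lands in the bucket persists for the whole session, which is the positive-feedback risk noted above.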

Page 23

Select 10% of users - II

if(rand() < 0.1 * getrandmax()) {
    // instrument code for measurement
}

- For every request, a user has a 10% chance of being tested
- Gets rid of positive feedback errors, but sample size != 10% of population
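
A small simulation of per-request sampling shows why the sample is not 10% of the population. It assumes, purely for illustration, that every user makes 10 requests; the function names and traffic numbers are hypothetical.

```python
import random

SAMPLE_RATE = 0.1  # every request has an independent 10% chance of being measured


def should_measure(rng: random.Random) -> bool:
    """Per-request sampling, a Python analogue of rand() < 0.1 * getrandmax()."""
    return rng.random() < SAMPLE_RATE


def simulate(num_users: int, requests_per_user: int, seed: int = 42):
    """Return (share of requests measured, share of users measured at least once)."""
    rng = random.Random(seed)
    measured_requests = 0
    measured_users = 0
    for _ in range(num_users):
        hits = sum(should_measure(rng) for _ in range(requests_per_user))
        measured_requests += hits
        if hits > 0:
            measured_users += 1
    return measured_requests / (num_users * requests_per_user), measured_users / num_users


# Hypothetical traffic: 2,000 users making 10 requests each.
request_share, user_share = simulate(2000, 10)
print(f"requests measured: {request_share:.1%}, users measured: {user_share:.1%}")
```

About 10% of requests get measured, but roughly 1 - 0.9^10, or about 65%, of users are measured at least once.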

Page 24

How big a sample is representative?

Select n such that

$$\left| 1.96\,\frac{\sigma}{\sqrt{n}} \right| \le 5\%\,\mu$$
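
Solving that condition for n gives n ≥ (1.96σ / (0.05µ))². Here is a minimal sketch that assumes σ and µ are estimated from a small pilot sample. The function name and the figures below are hypothetical.

```python
import math


def required_sample_size(mean: float, std_dev: float,
                         rel_error: float = 0.05, z: float = 1.96) -> int:
    """Smallest n with z * std_dev / sqrt(n) <= rel_error * mean.

    Rearranging gives n >= (z * std_dev / (rel_error * mean)) ** 2.
    """
    if mean <= 0 or std_dev < 0 or rel_error <= 0:
        raise ValueError("need mean > 0, std_dev >= 0 and rel_error > 0")
    return max(1, math.ceil((z * std_dev / (rel_error * mean)) ** 2))


# Hypothetical pilot data: mean load time 1000 ms, standard deviation 500 ms.
print(required_sample_size(1000, 500))  # 385
```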

Page 25

Standard Deviation

- Standard deviation tells you the spread of the curve
- The narrower the curve, the more confident you can be

Page 26

MoE at 95% confidence

$$\pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
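
Here is a sketch that computes this margin of error from a sample. It uses the sample standard deviation (n - 1 denominator) as the estimate of σ, together with the normal-approximation 1.96. That is reasonable for the large samples typical of real user measurement, but it is optimistic for only a handful of points. The data and function name are made up.

```python
import math
import statistics


def margin_of_error(samples: list, z: float = 1.96) -> float:
    """Half-width of the ~95% confidence interval for the arithmetic mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate the standard deviation")
    sigma = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return z * sigma / math.sqrt(len(samples))


load_times_ms = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
mean = statistics.mean(load_times_ms)
moe = margin_of_error(load_times_ms)
print(f"{mean:.2f} ± {moe:.2f}")  # 5.00 ± 1.48
```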

Page 27

MoE & Sample size

There is an inverse square root relationship between sample size and margin of error
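
Plugging 4n into the 95% margin-of-error formula shows that relationship at work. Assuming σ stays the same as the sample grows, quadrupling the sample size only halves the margin of error:

```latex
\text{MoE}(4n) = 1.96\,\frac{\sigma}{\sqrt{4n}}
             = \frac{1}{2}\left(1.96\,\frac{\sigma}{\sqrt{n}}\right)
             = \frac{1}{2}\,\text{MoE}(n)
```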

Page 28

But wait... it’s not complicated enough. We have different types of margins of error... more about that later

Page 31

Ding dong

Page 32

One number

Mean (Arithmetic)
- Good for symmetric curves
- Affected by outliers

Mean(10, 11, 12, 11, 109) = 30.6
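
A quick check of the outlier effect using Python's standard `statistics` module. A single slow data point pulls the mean from 11 up to 30.6. The variable names are illustrative.

```python
import statistics

samples = [10, 11, 12, 11, 109]
mean_all = statistics.mean(samples)                   # pulled up by the single 109
mean_without_outlier = statistics.mean(samples[:-1])  # same data minus the 109
print(mean_all, mean_without_outlier)  # 30.6 11
```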

Page 33

One number

Median
- Middle value measures central tendency well
- Not trivial to pull out of a DB

Median(10, 11, 12, 11, 109) = 11
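
Part of the reason the median is awkward in a database is this: the mean can be built from running SUM and COUNT aggregates, but an exact median needs the values in sorted order. Here is a minimal sketch with a hypothetical function name. Python's `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values for even counts."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)  # needs every value, unlike a running SUM/COUNT
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([10, 11, 12, 11, 109]))  # 11
```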

Page 34

One number

Mode
- Not often used
- Multi-modal distributions suggest problems

Mode(10, 11, 12, 11, 109) = 11
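
Exact repeats are rare in continuous timing data, so in practice a mode is usually taken over buckets. This sketch groups samples into fixed-width buckets (100 ms is an arbitrary, assumed width) and then calls `statistics.multimode`, which returns every most-common value. A tie between distant buckets is one crude hint of the multi-modal case mentioned above. The function name and data are hypothetical.

```python
import statistics


def modes_of_buckets(samples_ms, bucket_ms=100):
    """Most common bucket start(s) after grouping samples into bucket_ms-wide bins."""
    buckets = [int(s // bucket_ms) * bucket_ms for s in samples_ms]
    return statistics.multimode(buckets)


print(statistics.multimode([10, 11, 12, 11, 109]))     # [11]
print(modes_of_buckets([120, 150, 180, 900, 950, 910]))  # [100, 900]
```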

Page 35

Other numbers

A percentile point in the distribution: 95th, 98.5th or 99th

- Used to find out the worst user experience
- Makes more sense if you filter data first

P95th(10, 11, 12, 11, 109) = 12
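
Percentile definitions vary. One definition that reproduces the 12 above takes the sorted value at index floor(p/100 × (n - 1)). On the same five points, linear interpolation gives 89.6 and the nearest-rank method gives 109, so it is worth stating which definition a report uses. Here is a sketch of the rank-rounded-down variant, with a hypothetical function name:

```python
import math


def percentile_lower(values, p):
    """p-th percentile (0-100): the sorted value at index floor(p/100 * (n - 1)).

    One of several percentile definitions; interpolating or nearest-rank
    variants can give very different answers on small samples.
    """
    if not values:
        raise ValueError("percentile of empty data")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    return ordered[math.floor(p / 100 * (len(ordered) - 1))]


print(percentile_lower([10, 11, 12, 11, 109], 95))  # 12
```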

Page 36

Other means

Geometric mean
- Good if your data is exponential in nature (with the tail on the right)

GMean(10, 11, 12, 11, 109) = 17.37

Page 37

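Wait... how did I get that?

Product form, which could lead to overflow:

$$\text{GMean} = \sqrt[N]{\prod_{i=1}^{N} x_i}$$

Log form, which is computationally simpler:

$$\text{GMean} = e^{\left(\sum_{i=1}^{N} \log_e x_i\right) / N}$$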
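
Here are both forms in Python. `math.fsum` keeps the sum of logs accurate. The last lines show the product form overflowing to infinity on values where the log form is fine. The function name is illustrative; the standard library also offers `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """Geometric mean via exp(mean(log x)), which avoids overflowing the product."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


print(round(geometric_mean([10, 11, 12, 11, 109]), 2))  # 17.37

# The naive product form overflows a float where the log form does not:
big = [1e200] * 3
print(geometric_mean(big))               # finite, approximately 1e200
print(math.prod(big) ** (1 / len(big)))  # inf: the product overflowed
```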

Page 41

Other means

And there is also the Harmonic mean, but forget about that

Page 42

...though consequently

We have other margins of error:
- Geometric margin of error: uses the geometric standard deviation
- Median margin of error: uses ranges of actual values from the data set

Stick to the arithmetic MoE: simpler to calculate, simpler to read and not incorrect
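
The slides don't spell out how the median margin of error is computed. One common distribution-free approach, which is not necessarily the one the talk had in mind, reads the interval straight off the sorted data using order statistics. The ranks come from the normal approximation n/2 ± 1.96√n/2. A sketch, with a hypothetical function name:

```python
import math


def median_confidence_interval(values, z=1.96):
    """Approximate 95% CI for the median, read off the sorted data itself.

    Uses the normal approximation for ranks: from the value at rank
    n/2 - z*sqrt(n)/2 to the value at rank 1 + n/2 + z*sqrt(n)/2
    (1-based ranks, rounded to the nearest integer and clamped to the data).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    half_width = z * math.sqrt(n) / 2
    lower_rank = max(1, round(n / 2 - half_width))
    upper_rank = min(n, round(1 + n / 2 + half_width))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


print(median_confidence_interval(list(range(1, 101))))  # (40, 61)
```

The interval endpoints are always actual observed values, which is the sense in which this margin of error "uses ranges of actual values from the data set".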

Page 44

Filtering · The Log-Normal distribution

Statistics - II

Page 45

Outliers

- Out of range data points
- Nothing you can fix here
- There’s even a book about them

Page 49

DNS problems can cause outliers

- 2 or 3 DNS servers for an ISP
- 30 second timeout if first fails... 30 second increase in page load time
- Maybe measure both and fix what you can

http://nms.lcs.mit.edu/papers/dns-ton2002.pdf

Page 50

Band-pass filtering

Page 51

Band-pass filtering

- Strip everything outside a reasonable range
- Bandwidth range: 4kbps - 4Gbps
- Page load time: 50ms - 120s

You may need to revisit the ranges regularly
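
Here is a minimal band-pass filter using the example ranges from the slide. This sketch assumes inclusive bounds and expresses 4 Gbps as 4,000,000 kbps; the sample load times are made up.

```python
# Example ranges from the slide (assumed inclusive):
# page load time 50 ms to 120 s, bandwidth 4 kbps to 4 Gbps.
LOAD_TIME_RANGE_MS = (50, 120_000)
BANDWIDTH_RANGE_KBPS = (4, 4_000_000)


def band_pass(values, low, high):
    """Keep only the values inside the fixed range [low, high]."""
    return [v for v in values if low <= v <= high]


load_times_ms = [12, 340, 1_250, 2_900, 45_000, 310_000]  # hypothetical beacons
print(band_pass(load_times_ms, *LOAD_TIME_RANGE_MS))  # [340, 1250, 2900, 45000]
```

Because the bounds are plain constants, revisiting them is a one-line change.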

Page 52

IQR filtering

Page 53

IQR filtering

Here, we derive the range from the data
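
Here is a sketch of IQR filtering. The slide only says that the range comes from the data. The 1.5 × IQR fences used here are Tukey's conventional choice and are an assumption of this sketch. Quartiles come from `statistics.quantiles` with its "inclusive" method. The function name is hypothetical.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR], with the fences derived from the data."""
    if len(values) < 2:
        return list(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


print(iqr_filter([10, 11, 12, 11, 109]))  # [10, 11, 12, 11]
```

On the talk's five-point example, the fences come out at 9.5 and 13.5, so only the 109 is dropped.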

Page 54

Let’s look at some real charts

Page 55

Bandwidth distribution for web devs

x-axis is linear

Page 56

Now let’s use log(kbps) instead of kbps

x-axis is exponential

Page 57

Exponential == Geometric

- Categories/Buckets grow exponentially
- Data is related geometrically
- Use the geometric mean and geometric margin of error

$$\text{Error\_range} = \left[\frac{\text{gmean}}{\text{gmoe}},\ \text{gmean} \times \text{gmoe}\right]$$

Non-linear ranges are hard for humans to grok
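
One way to compute gmoe is to apply the arithmetic margin-of-error formula to the logarithms of the data and then exponentiate: gmoe = exp(1.96 · s / √n), where s is the standard deviation of ln(x). This formula is an assumption of the sketch, not something taken from the talk. The resulting interval is symmetric in log space but lopsided in raw units, which is the part that is hard to grok. The function name and data are hypothetical.

```python
import math
import statistics


def geometric_error_range(values, z=1.96):
    """Return (gmean / gmoe, gmean, gmean * gmoe) for positive data.

    Assumption: gmoe is the geometric analogue of z * sigma / sqrt(n),
    computed on log values, i.e. exp(z * stdev(ln x) / sqrt(n)).
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two positive values")
    logs = [math.log(v) for v in values]
    gmean = math.exp(statistics.mean(logs))
    gmoe = math.exp(z * statistics.stdev(logs) / math.sqrt(len(logs)))
    return gmean / gmoe, gmean, gmean * gmoe


low, gmean, high = geometric_error_range([220, 480, 1100, 3500, 9000])  # hypothetical kbps
print(f"{gmean:.0f} kbps, range {low:.0f} to {high:.0f}")
```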

Page 60

So...

Page 61

Further reading

- Web Performance - Not a Simple Number
  http://www.netforecast.com/Articles/BCR+C25+Web+Performance+-+Not+A+Simple+Number.pdf
- Revisiting statistics for web performance (introduction to Log-Normal)
  http://home.pacbell.net/ciemo/statistics/WhatDoYouMean.pdf
- Random Sampling
  http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
- Khan Academy’s tutorials on statistics
  http://khanacademy.com/
- Learning about Statistical Learning
  http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html
- Wikipedia articles on Random Sampling, Central Tendency, Standard Error, Confounding, Means and IQR

Page 62

Summary

- Choose a reasonable sample size and sampling factor
- Tune sample size for minimal margin of error
- Decide based on your data whether to use mode, median or one of the means
- Figure out whether your data is Normal, Log-Normal or something else
- Filter out anomalous outliers

Page 63

contact me

Philip Tellis

bluesmoon.info · @bluesmoon

Page 64

Photo credits

http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas

http://www.flickr.com/photos/cobalt/56500295/ by cobalt123

http://www.flickr.com/photos/sophistechate/4264466015/ by LisaBrewster

http://www.flickr.com/photos/nchoz/243216008/ by nchoz

Page 65

List of figures

http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg

http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg

http://en.wikipedia.org/wiki/File:KilroySchematic.svg

http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png
