The Statistics of Web Performance Analysis

Philip Tellis (@bluesmoon, http://bluesmoon.info/), Boston #WebPerf Meetup / 2012-08-14

DESCRIPTION

If you're interested in measuring real user web performance, you'll find tools like boomerang or episodes quite handy. Some popular web frameworks even have modules that make it easy to add them to your site. However, what does one do once one has collected the data? How do you filter out the noise and get meaningful insights from the data? In this talk, I'll go over the techniques we've picked up by analyzing millions of data points daily. I'll cover some simple rules to filter out invalid data, and the statistics to analyze and make sense of what's left. Do you use the mean, median or mode? What about the geometric mean and standard deviation? How confident are we in the results? And finally, why should we care? This talk should help you gain useful insights from a histogram, or at the very least point you in the right direction for further analysis.

TRANSCRIPT

Page 1: The Statistics of Web Performance Analysis

• Philip Tellis
• [email protected]
• @bluesmoon
• geek paranoid speedfreak
• http://bluesmoon.info/

Boston #WebPerf Meetup / 2012-08-14 The Statistics of Web Performance Analysis 1

I’m a Web Speedfreak

We measure real user website performance

This talk is about the Statistics we learned while building it

The Statistics of Web Performance Analysis

Philip Tellis / [email protected]

Boston #WebPerf Meetup / 2012-08-14

0 Numbers

Accurately measure page performance∗

Be unintrusive

If you try to measure something accurately, you will change something related

– Heisenberg’s uncertainty principle

And one number to rule them all

What do we measure?

• Network Throughput
• Network Latency
• User-perceived page load time

We measure real user data

Which is noisy

1 Statistics - 1

Disclaimer

I am not a statistician

1-1 Random Sampling

Population

All possible users of your system

Sample

Representative subset of the population

Bad sample

Sometimes it’s not representative

How to randomize?

http://xkcd.com/221/

How to randomize?

• Pick 10% of users at random and always test them

OR

• For each user, decide at random if they should be tested

http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html

Select 10% of users - I

if ($sessionid % 10 === 0) {   // assumes $sessionid is numeric
    // instrument code for measurement
}

• Once a user enters the measurement bucket, they stay there until they log out
• Fixed set of users, so tests may be more consistent
• Error in the sample results in positive feedback
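
The bucketing above only works if `$sessionid` is numeric. If your session IDs are strings, one option (an assumption, not something the talk prescribes) is to hash them first. A minimal PHP sketch using the built-in `crc32()`; the function name `inMeasurementBucket` and the sample ID are hypothetical:

```php
<?php
// Hypothetical helper: deterministic 1-in-N bucketing by session ID.
// crc32() returns a non-negative int on 64-bit PHP, so the same
// session ID always maps to the same bucket for the whole session.
function inMeasurementBucket(string $sessionId, int $oneIn = 10): bool
{
    return crc32($sessionId) % $oneIn === 0;
}

// Usage: pass whatever session identifier your framework provides.
if (inMeasurementBucket('3f2a9c1e')) {
    // instrument code for measurement
}
```

Because the bucket depends only on the ID, this keeps the "fixed set of users" property of approach I, including its positive-feedback drawback.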

Select 10% of users - II

if (rand() < 0.1 * getrandmax()) {
    // instrument code for measurement
}

• For every request, a user has a 10% chance of being tested
• Gets rid of positive feedback errors, but the sample is no longer exactly 10% of the population
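
A small simulation sketch of the per-request approach, showing that the instrumented fraction only approximates 10%. It uses `mt_rand()`/`mt_getrandmax()` (since PHP 7.1, `rand()` uses the same generator); the function names, request count and seed are illustrative assumptions:

```php
<?php
// Per-request sampling: each request independently has a $rate chance
// of being instrumented.
function shouldInstrument(float $rate = 0.1): bool
{
    return mt_rand() < $rate * mt_getrandmax();
}

// Simulate $requests requests and return the fraction instrumented.
function simulateSampling(int $requests, float $rate = 0.1, int $seed = 42): float
{
    mt_srand($seed); // fixed seed so the simulation is repeatable
    $tested = 0;
    for ($i = 0; $i < $requests; $i++) {
        if (shouldInstrument($rate)) {
            $tested++;
        }
    }
    return $tested / $requests;
}

printf("Instrumented %.4f of requests\n", simulateSampling(100000));
```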

How big a sample is representative?

Select $n$ such that $\left|1.96\,\frac{\sigma}{\sqrt{n}}\right| \le 5\%\,\mu$
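
Rearranging the inequality gives $n \ge \left(\frac{1.96\,\sigma}{0.05\,\mu}\right)^2$. A minimal PHP sketch, assuming you already have rough estimates of µ and σ (for example from a pilot sample); `requiredSampleSize` is a hypothetical helper:

```php
<?php
// Smallest n such that $z * $stddev / sqrt(n) <= $relativeError * $mean.
// $mean and $stddev are estimates, e.g. from a pilot sample.
function requiredSampleSize(float $mean, float $stddev, float $relativeError = 0.05, float $z = 1.96): int
{
    $n = ($z * $stddev / ($relativeError * $mean)) ** 2;
    return (int) ceil($n);
}

// Example: mean load time 1000 ms, standard deviation 500 ms
echo requiredSampleSize(1000, 500), "\n"; // prints 385
```

A noisier metric (larger σ relative to µ) needs quadratically more samples.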

1-2 Margin of Error

Standard Deviation

• Standard deviation tells you the spread of the curve
• The narrower the curve, the more confident you can be

MoE at 95% confidence

$\pm 1.96\,\frac{\sigma}{\sqrt{n}}$
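
A minimal sketch of computing this margin of error from raw measurements, assuming the sample standard deviation (dividing by n - 1) stands in for σ; the helper names and the data are illustrative:

```php
<?php
// Sample standard deviation (divides by n - 1; needs at least 2 values).
function stddev(array $xs): float
{
    $n = count($xs);
    $mean = array_sum($xs) / $n;
    $sumSq = 0.0;
    foreach ($xs as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    return sqrt($sumSq / ($n - 1));
}

// Margin of error at 95% confidence: 1.96 * sigma / sqrt(n).
function marginOfError(array $xs, float $z = 1.96): float
{
    return $z * stddev($xs) / sqrt(count($xs));
}

$loadTimes = [2, 4, 4, 4, 5, 5, 7, 9]; // illustrative data
printf("mean = %.2f +/- %.2f\n", array_sum($loadTimes) / count($loadTimes), marginOfError($loadTimes));
// prints: mean = 5.00 +/- 1.48
```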

MoE & Sample size

There is an inverse square root relationship between sample size and margin of error: $\text{MoE} \propto 1/\sqrt{n}$
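
For example, assuming σ stays the same, quadrupling the sample size halves the margin of error:

$$
\frac{\mathrm{MoE}(4n)}{\mathrm{MoE}(n)}
  = \frac{1.96\,\sigma/\sqrt{4n}}{1.96\,\sigma/\sqrt{n}}
  = \frac{\sqrt{n}}{\sqrt{4n}}
  = \frac{1}{2}
$$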

1-3 Central Tendency

One number

• Mean (Arithmetic)
• Good for symmetric curves
• Affected by outliers

Mean(10, 11, 12, 11, 109) = 30.6

One number

• Median
• Middle value measures central tendency well
• Not trivial to pull out of a DB

Median(10, 11, 12, 11, 109) = 11
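
A minimal PHP sketch of both measures computed in application code rather than in the database, using the example values above; `mean` and `median` are hypothetical helpers:

```php
<?php
// Arithmetic mean: easy to compute, but pulled around by outliers.
function mean(array $xs): float
{
    return array_sum($xs) / count($xs);
}

// Median: sort a copy and take the middle value, or the average of
// the two middle values when the count is even.
function median(array $xs): float
{
    sort($xs); // arrays are passed by value, so the caller's data is untouched
    $n = count($xs);
    $mid = intdiv($n, 2);
    return $n % 2 === 1 ? $xs[$mid] : ($xs[$mid - 1] + $xs[$mid]) / 2;
}

$data = [10, 11, 12, 11, 109];
echo mean($data), ' ', median($data), "\n"; // prints: 30.6 11
```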

One number

• Mode
• Not often used
• Multi-modal distributions suggest problems

Mode(10, 11, 12, 11, 109) = 11
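
A sketch of finding the mode(s). Real timings rarely repeat exactly, so this version (an assumption on our part) first groups values into fixed-width buckets, and it returns every tied mode so that a multi-modal distribution stays visible; `modes` is a hypothetical helper:

```php
<?php
// Returns the most frequent bucket(s). $bucket is the bucket width in the
// data's own units (1 = exact values). Assumes a non-empty array.
function modes(array $xs, int $bucket = 1): array
{
    $counts = [];
    foreach ($xs as $x) {
        $key = (int) (floor($x / $bucket) * $bucket); // start of the bucket
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    $max = max($counts);
    return array_keys($counts, $max); // all buckets tied for the top count
}

print_r(modes([10, 11, 12, 11, 109])); // one mode: 11
```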

Other numbers

• A percentile point in the distribution: 95th, 98.5th or 99th
• Used to find out the worst user experience
• Makes more sense if you filter data first

P95th(10, 11, 12, 11, 109) = 109 (nearest rank); after filtering out 109, P95th(10, 11, 12, 11) = 12
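
Percentiles have several competing definitions. This sketch uses the nearest-rank method (an assumption; interpolating methods give different values on small samples like this one); `percentile` is a hypothetical helper:

```php
<?php
// Nearest-rank percentile: the smallest value such that at least
// $p percent of the data is less than or equal to it.
function percentile(array $xs, float $p): float
{
    sort($xs);
    $n = count($xs);
    $rank = (int) ceil($p / 100 * $n); // 1-based rank
    $rank = max(1, min($n, $rank));    // clamp for p = 0 or p = 100
    return $xs[$rank - 1];
}

echo percentile([10, 11, 12, 11, 109], 95), "\n"; // prints 109
echo percentile([10, 11, 12, 11], 95), "\n";      // prints 12 (outlier filtered out)
```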

Other means

• Geometric mean
• Good if your data is exponential in nature (with the tail on the right)

GMean(10, 11, 12, 11, 109) ≈ 17.37

Wait... how did I get that?

$\sqrt[N]{\prod_{i=1}^{N} x_i}$ (could lead to overflow)

$e^{\left(\frac{\sum_{i=1}^{N} \log_e(x_i)}{N}\right)}$ (computationally simpler)
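
The log-based form translates directly into code. A minimal PHP sketch, assuming every value is strictly positive (true for load times and bandwidth); `geometricMean` is a hypothetical helper:

```php
<?php
// Geometric mean via logarithms: exp(mean of ln(x_i)).
// Summing logs avoids the overflow risk of multiplying all values.
// Assumes every value is strictly positive.
function geometricMean(array $xs): float
{
    $sumLogs = 0.0;
    foreach ($xs as $x) {
        $sumLogs += log($x); // natural log
    }
    return exp($sumLogs / count($xs));
}

printf("%.2f\n", geometricMean([10, 11, 12, 11, 109])); // prints 17.37
```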

Other means

And there is also the Harmonic mean, but forget about that

...though consequently

We have other margins of error
• Geometric margin of error
  – Uses geometric standard deviation
• Median margin of error
  – Uses ranges of actual values from data set
• Stick to the arithmetic MoE
  – Simpler to calculate, simpler to read and not incorrect
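
For reference, one common way to get a geometric margin of error (an assumption; the talk does not spell out its formula) is to compute the margin on the log scale and transform back, which yields a multiplicative factor rather than a ± range. A sketch with hypothetical helper names:

```php
<?php
// Geometric standard deviation: exp of the sample standard deviation
// of ln(x_i). Assumes at least two strictly positive values.
function geometricStdDev(array $xs): float
{
    $logs = array_map('log', $xs);
    $n = count($logs);
    $meanLog = array_sum($logs) / $n;
    $sumSq = 0.0;
    foreach ($logs as $l) {
        $sumSq += ($l - $meanLog) ** 2;
    }
    return exp(sqrt($sumSq / ($n - 1)));
}

// 95% interval for the geometric mean, as a multiplicative factor:
// factor = GSD^(1.96 / sqrt(n)), interval = [GM / factor, GM * factor].
function geometricMoEFactor(array $xs, float $z = 1.96): float
{
    return geometricStdDev($xs) ** ($z / sqrt(count($xs)));
}

printf("factor = %.3f\n", geometricMoEFactor([120, 340, 410, 450, 520, 800, 1900, 2300]));
```

Reading "GM ×/÷ factor" is less intuitive than "mean ± MoE", which supports the advice above.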

2 Statistics - 2

2-1 Distributions

Let’s look at some real charts

Sparse Distribution

Log-normal distribution

Bimodal distribution

What does all of this mean?

Distributions

• Sparse distribution suggests that you don’t have enough data points
• Log-normal distribution is typical
• Bi-modal distribution suggests two (or more) distributions combined
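
One quick way to tell these shapes apart is a histogram with logarithmic buckets: log-normal data shows up as a single, roughly symmetric bump, while bi-modal data shows two. A sketch, assuming strictly positive values; `logHistogram` and the ten-buckets-per-decade resolution are arbitrary choices:

```php
<?php
// Counts values per logarithmic bucket. Bucket k covers
// [10^(k / $bucketsPerDecade), 10^((k + 1) / $bucketsPerDecade)).
// Assumes strictly positive values (filter the data first).
function logHistogram(array $xs, int $bucketsPerDecade = 10): array
{
    $hist = [];
    foreach ($xs as $x) {
        $bucket = (int) floor(log10($x) * $bucketsPerDecade);
        $hist[$bucket] = ($hist[$bucket] ?? 0) + 1;
    }
    ksort($hist); // order buckets from fastest to slowest
    return $hist;
}

// Print each bucket's lower bound (10 buckets per decade) and a bar.
foreach (logHistogram([120, 340, 410, 450, 520, 800, 1900, 2300]) as $bucket => $count) {
    printf("%8.0f %s\n", 10 ** ($bucket / 10), str_repeat('#', $count));
}
```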

In practice, a bi-modal distribution is not uncommon

Hint: Does your site do a lot of back-end caching?

2-2 Filtering

Outliers

• Out of range data points
• Nothing you can fix here
• There’s even a book about them

DNS problems can cause outliers

• 2 or 3 DNS servers for an ISP
• 30 second timeout if first fails
• ... 30 second increase in page load time
• Maybe measure both and fix what you can
• http://nms.lcs.mit.edu/papers/dns-ton2002.pdf

Band-pass filtering

• Strip everything outside a reasonable range
  – Bandwidth range: 4 kbps - 4 Gbps
  – Page load time: 50 ms - 120 s
• You may need to revisit the ranges regularly
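
A minimal band-pass filter sketch using the page load time range above (50 ms to 120 s, inclusive bounds assumed); the constant and function names are hypothetical, and arrow functions need PHP 7.4+:

```php
<?php
// Page load time bounds from the slide, in milliseconds.
const MIN_LOAD_MS = 50;
const MAX_LOAD_MS = 120000;

// Keep only values inside [$min, $max]; re-index the result.
function bandPass(array $xs, float $min = MIN_LOAD_MS, float $max = MAX_LOAD_MS): array
{
    return array_values(array_filter($xs, fn($x) => $x >= $min && $x <= $max));
}

print_r(bandPass([12, 850, 2300, 95000, 600000])); // keeps 850, 2300 and 95000
```

The same function works for bandwidth by passing the 4 kbps and 4 Gbps bounds (in consistent units) instead of the defaults.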

IQR filtering

Here, we derive the range from the data
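
A sketch of IQR filtering. The talk does not give the exact bounds; this assumes the common Tukey fences, keeping values within [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR], and computes quartiles by linear interpolation (one of several definitions). Function names are hypothetical:

```php
<?php
// Quantile of an already-sorted array, interpolating between
// the two closest ranks.
function quantile(array $sorted, float $q): float
{
    $pos = $q * (count($sorted) - 1);
    $lo = (int) floor($pos);
    $hi = (int) ceil($pos);
    return $sorted[$lo] + ($pos - $lo) * ($sorted[$hi] - $sorted[$lo]);
}

// Keep values within [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is assumed.
function iqrFilter(array $xs, float $k = 1.5): array
{
    $sorted = $xs;
    sort($sorted);
    $q1 = quantile($sorted, 0.25);
    $q3 = quantile($sorted, 0.75);
    $iqr = $q3 - $q1;
    $low = $q1 - $k * $iqr;
    $high = $q3 + $k * $iqr;
    return array_values(array_filter($xs, fn($x) => $x >= $low && $x <= $high));
}

print_r(iqrFilter([10, 11, 12, 11, 109])); // keeps 10, 11, 12, 11 (drops 109)
```

On the example data from the central tendency slides, Q1 = 11 and Q3 = 12, so the fences are 9.5 and 13.5 and only 109 is dropped.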

Further Reading

lognormal.com/blog/2012/08/13/analysing-performance-data/

Summary

• Choose a reasonable sample size and sampling factor
• Tune sample size for minimal margin of error
• Decide based on your data whether to use mode, median or one of the means
• Figure out whether your data is Normal, Log-Normal or something else
• Filter out anomalous outliers

Thank you

Photo credits

• http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas

• http://www.flickr.com/photos/cobalt/56500295/ by cobalt123

• http://www.flickr.com/photos/sophistechate/4264466015/ by LisaBrewster

List of figures

• http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg

• http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg

• http://en.wikipedia.org/wiki/File:KilroySchematic.svg

• http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png
