The Statistics of Web Performance

Philip Tellis / philip@bluesmoon.info

ConFoo / 2010-03-12

$ finger philip

Philip Tellis
philip@bluesmoon.info
@bluesmoon
yahoo
geek

The goal
Performance Measurement

Introduction

Accurately measure page performance
At least, as accurately as possible

Be unintrusive

If you try to measure something accurately, you will change something related

– Heisenberg’s uncertainty principle

And one number to rule them all

Bandwidth

Real bandwidth vs. advertised bandwidth
Bandwidth to your server, not to the ISP
Bandwidth during normal internet usage

If the user’s always watching movies, you’re not winning

Latency

How long does it take a byte to get to the user?
Wired, wireless, mobile, satellite?
How many hops in between?
Speed of light is constant

This is not a battle we will soon win.
When was the last time you heard latency mentioned in a TV ad?

http://www.stuartcheshire.org/rants/Latency.html

User perceived page load time

Time from “click on a link” to “spinner stops spinning”
This is what users notice

Depends on how long your page takes to build
Depends on what’s in your page
Depends on how long components take to load
Depends on how long the browser takes to execute and render

We need to measure real user data

The statistics apply to any kind of performance data though

Random Sampling
Margin of Error
Central Tendency

Statistics - I

Disclaimer

I am not a statistician

Population

All possible users of your system

Sample

Representative subset of the population

Bad sample

Sometimes it’s not

How to randomize?

Pick 10% of users at random and always test them

For each user, decide at random if they should be tested

http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html

Select 10% of users - I

// assumes $sessionid is numeric
if ($sessionid % 10 === 0) {
    // instrument code for measurement
}

Once a user enters the measurement bucket, they stay there until they log out
Fixed set of users, so tests may be more consistent
Error in the sample results in positive feedback

Select 10% of users - II

if (rand() < 0.1 * getrandmax()) {
    // instrument code for measurement
}

For every request, a user has a 10% chance of being tested
Gets rid of positive feedback errors, but sample size != 10% of population
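The two selection strategies above can be sketched side by side. This is a minimal illustration in Python rather than the PHP of the slides; the function names are hypothetical, and hashing the session id is an added assumption so that non-numeric ids still fall evenly into ten buckets.

```python
import hashlib
import random

SAMPLE_RATE = 0.10  # the 10% rate used on the slides


def in_fixed_bucket(session_id: str) -> bool:
    """Strategy I: a given session is either always measured or never measured."""
    # Assumption: hash the id first so string session ids spread over 10 buckets.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10 == 0


def sample_this_request(rng: random.Random) -> bool:
    """Strategy II: every request independently has a 10% chance of being measured."""
    return rng.random() < SAMPLE_RATE


rng = random.Random()
if in_fixed_bucket("example-session-id") or sample_this_request(rng):
    pass  # instrument code for measurement would go here
```

Strategy I gives the same answer for a session on every request, which is what makes the measured set stable; strategy II re-rolls each time, so the number of measured requests only averages out to about 10%.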

How big a sample is representative?

Select n such that

$\left| 1.96 \, \frac{\sigma}{\sqrt{n}} \right| \le 5\% \, \mu$
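Solving that inequality for n gives n ≥ (1.96 σ / (0.05 µ))². A rough sketch of the calculation follows; the function name and the example numbers are hypothetical, and in practice σ and µ would have to be estimated from an initial batch of measurements.

```python
import math

Z_95 = 1.96  # z-score for 95% confidence


def min_sample_size(mean: float, stddev: float, rel_error: float = 0.05) -> int:
    """Smallest n with Z_95 * stddev / sqrt(n) <= rel_error * mean."""
    if mean <= 0 or stddev < 0:
        raise ValueError("mean must be positive and stddev non-negative")
    return max(1, math.ceil((Z_95 * stddev / (rel_error * mean)) ** 2))


# Hypothetical numbers: mean load time 2.0 s, standard deviation 1.5 s.
print(min_sample_size(mean=2.0, stddev=1.5))  # 865
```

The wider the spread relative to the mean, the more samples are needed, and the requirement grows with the square of that ratio.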

Standard Deviation

Standard deviation tells you the spread of the curve
The narrower the curve, the more confident you can be

MoE at 95% confidence

$\pm 1.96 \, \frac{\sigma}{\sqrt{n}}$

MoE & Sample size

The margin of error is inversely proportional to the square root of the sample size
Quadrupling the sample size only halves the margin of error
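A small sketch of that relationship, using the 95% margin of error formula from the previous slide. The function name and the numbers (standard deviation 1.5, samples of 100 and 400) are made up for illustration.

```python
import math


def margin_of_error(stddev: float, n: int, z: float = 1.96) -> float:
    """Arithmetic margin of error at the confidence level implied by z."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return z * stddev / math.sqrt(n)


print(f"{margin_of_error(1.5, 100):.3f}")  # 0.294
print(f"{margin_of_error(1.5, 400):.3f}")  # 0.147
```

Four times the data buys only a factor of two in precision, which is why very tight margins get expensive quickly.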

But wait... it’s not complicated enough.
We have different types of margins of error... more about that later

Ding dong

One number

Mean (Arithmetic)
Good for symmetric curves
Affected by outliers

Mean(10, 11, 12, 11, 109) = 30.6

One number

Median
Middle value measures central tendency well
Not trivial to pull out of a DB

Median(10, 11, 12, 11, 109) = 11

One number

Mode
Not often used
Multi-modal distributions suggest problems

Mode(10, 11, 12, 11, 109) = 11
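The three "one number" candidates can be checked on the slide's sample with Python's standard `statistics` module. The variable name is illustrative only.

```python
import statistics

load_times = [10, 11, 12, 11, 109]  # the sample used on the slides

print(statistics.mean(load_times))    # 30.6, pulled up by the single outlier
print(statistics.median(load_times))  # 11
print(statistics.mode(load_times))    # 11
```

One slow request is enough to move the mean far away from what a typical user saw, while the median and mode stay put.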

Other numbers

A percentile point in the distribution: 95th, 98.5th or 99th

Used to find out the worst user experience
Makes more sense if you filter data first

P95th(10, 11, 12, 11, 109) = 12
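Percentiles have several competing definitions, and the slides do not say which one is used. The sketch below assumes the nearest-rank definition; the function name and the cut-off of 100 used for filtering are hypothetical. It shows why filtering first matters on such a small sample.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the value at 1-based rank ceil(pct / 100 * n)."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need a non-empty list and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


samples = [10, 11, 12, 11, 109]
print(percentile_nearest_rank(samples, 95))                          # 109
print(percentile_nearest_rank([v for v in samples if v < 100], 95))  # 12
```

Under this definition the unfiltered 95th percentile is simply the outlier itself; once the outlier is removed, the result drops to 12.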

Other means

Geometric mean
Good if your data is exponential in nature (with the tail on the right)

GMean(10, 11, 12, 11, 109) = 17.37

Wait... how did I get that?

$\sqrt[N]{\prod_{i=1}^{N} x_i}$ (could lead to overflow)

$\exp\left( \frac{\sum_{i=1}^{N} \log_e(x_i)}{N} \right)$ (computationally simpler)
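The log-based form of the geometric mean can be sketched in a few lines of Python. The function name is illustrative; the approach is simply the exponential of the mean of the natural logs, which never forms the full product.

```python
import math


def geometric_mean(values):
    """exp of the mean of the natural logs; avoids building a huge product."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(round(geometric_mean([10, 11, 12, 11, 109]), 2))  # 17.37
```

For this sample the product is 1,582,680 and its fifth root is about 17.37, so both forms agree; the log form just stays well-behaved when N is large.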

Other means

And there is also the Harmonic mean, but forget about that

...though consequently

We have other margins of error
Geometric margin of error
    Uses geometric standard deviation
Median margin of error
    Uses ranges of actual values from data set

Stick to the arithmetic MoE: simpler to calculate, simpler to read and not incorrect

Filtering
The Log-Normal distribution

Statistics - II

Outliers

Out of range data points
Nothing you can fix here
There’s even a book about them

Outliers

DNS problems can cause outliers

2 or 3 DNS servers for an ISP
30 second timeout if first fails
... 30 second increase in page load time
Maybe measure both and fix what you can
http://nms.lcs.mit.edu/papers/dns-ton2002.pdf

Band-pass filtering

Strip everything outside a reasonable range
Bandwidth range: 4kbps - 4Gbps
Page load time: 50ms - 120s
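A band-pass filter of this kind is a one-line range check. In the sketch below the bounds come from the slide, while the function name and the sample values are made up.

```python
# Bounds taken from the slide; tune them for your own traffic.
MIN_LOAD_MS, MAX_LOAD_MS = 50, 120_000    # 50 ms to 120 s
MIN_BW_KBPS, MAX_BW_KBPS = 4, 4_000_000   # 4 kbps to 4 Gbps


def band_pass(values, low, high):
    """Keep only the values inside the closed range [low, high]."""
    return [v for v in values if low <= v <= high]


load_times_ms = [12, 480, 1_350, 2_900, 31_000, 450_000]  # hypothetical beacons
print(band_pass(load_times_ms, MIN_LOAD_MS, MAX_LOAD_MS))
# [480, 1350, 2900, 31000]
```

Keeping the bounds as named constants makes it easy to adjust them as connection speeds and page weights change.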

You may need to relook at the ranges all the time

IQR filtering

Here, we derive the range from the data
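The slide does not spell out the rule, so the sketch below assumes the conventional one: keep points within 1.5 × IQR of the first and third quartiles (Tukey's fences). The function name, the factor `k`, and the choice of the `inclusive` quartile method are all assumptions made for illustration.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles
    # Assumption: "inclusive" treats the data as the whole population of interest.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


print(iqr_filter([10, 11, 12, 11, 109]))  # [10, 11, 12, 11]
```

Here Q1 is 11 and Q3 is 12, so the accepted range is 9.5 to 13.5 and only the 109 is dropped. Unlike band-pass filtering, the range moves with the data, so nothing has to be retuned by hand.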

Let’s look at some real charts

Bandwidth distribution for web devs

x-axis is linear

Now let’s use log(kbps) instead of kbps

x-axis is exponential

Exponential == Geometric

Categories/Buckets grow exponentially
Data is related geometrically
Use the geometric mean and geometric margin of error

$\text{Error\_range} = \left[ \frac{\text{gmean}}{\text{gmoe}},\ \text{gmean} \times \text{gmoe} \right]$
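The slides do not define gmoe. The sketch below assumes one plausible reading: compute the ordinary 95% margin of error on ln(x) and map it back with exp, which makes it a multiplicative factor. The function name and the bandwidth values are hypothetical.

```python
import math
import statistics


def geometric_error_range(values, z=1.96):
    """Return (gmean / gmoe, gmean * gmoe) for a list of positive values."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two positive values")
    logs = [math.log(v) for v in values]
    gmean = math.exp(statistics.fmean(logs))
    # Assumption: gmoe = exp(z * stdev(ln x) / sqrt(n)), i.e. the geometric
    # standard deviation raised to the power z / sqrt(n).
    gmoe = math.exp(z * statistics.stdev(logs) / math.sqrt(len(logs)))
    return gmean / gmoe, gmean * gmoe


low, high = geometric_error_range([256, 512, 1024, 2048, 8192])  # kbps, made up
print(f"{low:.0f} kbps to {high:.0f} kbps")
```

Because the range is built by dividing and multiplying, it is symmetric around the geometric mean only on a log scale, not on a linear one.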

Non-linear ranges are hard for humans to grok
