summarizing performance data confidence intervalsperfeval.epfl.ch/printme/conf.pdf · summarizing...

59
Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content JeanYves Le Boudec, February 2019

Upload: others

Post on 14-Jun-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Summarizing Performance DataConfidence Intervals

ImportantEasy to Difficult

Warning: some mathematical content

Jean‐Yves Le Boudec, February 2019

Page 2: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Contents

1. Summarized data2. Confidence Intervals

3. Independence Assumption4. Prediction Intervals

5. Which Summarization to Use ? 

2

Page 3: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

1  Summarizing Performance Data

How do you quantify: Central valueDispersion (Variability)

3

old new

Page 4: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Histogram is one answer

4

old new

Page 5: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

ECDF allow easy comparison

5

oldnew

execution time (ms)

cum proportion

Page 6: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Summarized Measures

Median, QuantilesMedian  : the data point in the middle:

if is odd, median =  else 

Quartiles‐quantiles

Mean and standard deviation

Mean 

Standard deviation 

Coefficient of variation =  (scale‐free)

6

Page 7: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Which are correct interpretations of standard deviation ? ( mean,  standard deviation)

A. With 95% probability, a new data sample lies in the interval  ±1.96

B. With 99.9% probability, a new data sample lies in the interval 

C. A and BD. NoneE. I don’t know

7

Page 8: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Example

9

mean and standard deviationquantiles

Boxplots

Page 9: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Other summarizations commonly used:Fairness indices

Jain’s fairness indexLorenz curve gap and Gini index

see lecture notes for more details

10

Page 10: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

2. Confidence Interval

Do not confuse with prediction intervalConfidence interval quantifies uncertainty about an estimation

11

Page 11: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

12

mean and standard deviationquantiles

Page 12: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Confidence Intervals for Mean of Difference

Mean reduction = 

0 is outside the confidence intervals for mean and for median

Confidence interval for median

13

Page 13: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

How are Confidence Intervals computed ?

This is simple if we can assume that the data comes from an iidmodel

iid = Independent Identically Distributed

14

Page 14: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

What is a confidence interval, really ?

1. Assume the data we have obtained is generated by a simulator, which has drawn the data from a distribution with CDF 

2. The distribution  has a true median  . i.e.  . , which we want to estimate.

3. We have obtained  ;  a point estimate  of . is derived using the formula given earlier; depends on the data and is also random (hopefully not too much)

4. A confidence interval for  . at confidence level 95% is an interval  such that

.

15

Page 15: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

The formula for CI for median

16

Page 16: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Using the formula ( level 95% )

We have 10 values 

Simple !

17

“ ”

Page 17: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Example n  , confidence interval for median

The median estimate is  

At confidence level 95%50 9.8 4051 9.8 61

a confidence interval for the median is  ;At confidence level 99%

50 12.8 3751 12.8 64

a confidence interval for the median is  ;

18

Page 18: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Idea of the proof of Theorem 2.1 

Say  , we want to compute  .Consider the game, done by an oracle who knows  . : 

for each data, if  . declare success, else declare failureLet  be the (random) number of successes

.

19

Page 19: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Idea of the proof of Theorem 2.1 

Say  , we want to compute  .Consider the game, done by an oracle who knows  . : 

for each data, if  . declare success, else declare failureLet  be the (random) number of successes

.

20

Numberofsuccesses Is . true?

Page 20: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Idea of the proof of Theorem 2.1 

Say  , we want to compute  .Consider the game, done by an oracle who knows  . : 

for each data, if  . declare success, else declare failureLet  be the (random) number of successes

.

22

Numberofsuccesses Is . true?

Page 21: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Idea of the proof of Theorem 2.1 

Say  , we want to compute  .Consider the game, done by an oracle who knows  . : 

for each data, if  . declare success, else declare failureLet  be the (random) number of successes

.

The event  . means : 

The event  . means : 

The event  . means : 

The distribution of  is Binomial.

, . , .

. . if the distribution of  has a density24

Page 22: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Confidence Interval for Mean 

This is the most commonly used confidence intervalBut requires some assumptions to hold, may be misleading if they do not holdThere is no exact theorem as for median and quantiles, but there are asymptotic results and a heuristic.

25

Page 23: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Computing a confidence interval for the mean

1. Assume the data we have obtained is generated by a simulator, which has drawn the data from a distribution with CDF 

2. The distribution  has a true mean  : 

3. We have obtained  ; we estimate using some formula  ; our point estimate depends on the data and is also random (hopefully not too much)

4. A confidence interval for  at confidence level 95% is an interval  such that

26

Page 24: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

CI for mean, asymptotic caseIf  central limit theorem holds, i.e the sample mean is approx. normal (in practice: n is large and distribution is not “wild”‐ not heavy tailed ‐‐see chapter 3)

27

Page 25: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Example

95% confidence level

CI for mean:   where 

sample meanestimate of standard deviation

amplitude of CI decreases in 

compare to prediction interval 28

Page 26: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Idea of Proof

,  

central limit theorem : approximately, for large 

with proba 95%, a standard normal variable is in  , thus

29

Page 27: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

CI for Mean, Normal CaseAssume data comes from an iid + normal distributionNot so frequent in reality;  Useful for very small data samples ( 30)

30

Page 28: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Example with Theorem 2.2.2(Large CI for mean: 

Theorem 2.2.3 (Normal case)CI for mean: 

31

InpracticebotharethesameifButTheorem2.2.2doesnotrequiredatatobenormal

Sample variance,= maximum likelihood estimator of variance

Unbiased estimator of variance

Page 29: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Tables in  [Weber‐Tables]

32

Page 30: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

We test a system 10’000 time for failures and find 200 failures: give a 95% confidence interval for the failure probability  .

Let  or  (failure / success); So we are estimating the mean. The asymptotic theory applies (no heavy tail)

… …

Confidence Interval:  at level 0.95

33

Page 31: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

We test a system 10 time for failures and find0 failure: give a 95% confidence interval for the failure probability .

A. [0 ; 0]B. [0 ; 0.1]C. [0 ; 0.11]D. [0 ; 0.21]E. [0; 0.31]F. [0; 1]G. I do not know

34

Page 32: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

36

Page 33: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

When we observe no event in  experiments, a confidence interval for the probability of occurrence is  with

For  and  .

37

Page 34: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

We test a system 10 time for failures and find 0 failure: give a 95% confidence interval for the failure probability  .

A. [0 ; 0]B. [0 ; 0.1]C. [0 ; 0.11]D. [0 ; 0.21]E. [0; 0.31]F. [0; 1]G. I do not know

38

Page 35: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

We test a system 10’000 time for failures and find 200 failures: give a 95% confidence interval for the failure probability .

Apply formula 2.29 ( and 

i.e. in this case it is the same as the classical formula for a confidence interval for the mean (Theorem 2.2.2). 

40

Page 36: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Take Home Message

Confidence interval for median (or other quantiles) is easy to get from the Binomial distributionRequires iid‐ness, No other assumptionConfidence interval for the mean requires iid‐ness and

Either if data sample is normal Or data sample is not wild (heavy‐tailed) and n is large enough

Confidence interval for success probability requires special attention when success or failure is rare

41

Page 37: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

3. The Independence Assumption

Confidence Intervals require that we can assume that the data comes from an iid model

Independent Identically Distributed

How do I know if this is true ?Controlled experiments: draw factors randomly with 

replacementSimulation: independent replications (with random seeds)

Else: we do not know – in some cases we can test it using auto‐correlation plots (Chapter 5)

42

Page 38: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

What happens if data is not iid ?

If data is positively correlatedNeighboring values look similarFrequent in measurementsConfidence Interval is underestimated: there is less information in 

the data than one thinks

Doing 10 positively correlated measurements gives less information than doing 10 independent measurements

43

Page 39: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

What is true when  are independent ?

A. Observing   gives no information about  B. Observing  gives no information about 

when we know the distribution of is a function of 

onlyD. A and BE. A and CF. B and CG. AllH. NoneI. I don’t know

44

Page 40: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

4. Prediction Interval

CI for mean or median summarizeCentral value + uncertainty about it

Prediction interval is often used to summarize variability of data 

46

Page 41: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Prediction Interval based on Order Statistic

Assume data comes from an iid modelSimplest  and most robust result (not well known, though):

For example:  ;  a prediction interval at level 0.95 is

47

Page 42: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Prediction Interval for small n

For  min max is a prediction interval at level 95%For  there is no prediction interval at level 95% with this methodBut there is one at level 90% for For  we have a prediction interval  min max at level 

81% 

48

Page 43: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Prediction Interval based on Mean

Very often used – but see later

Prediction interval at level 95% = 

49

Page 44: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Prediction Intervals for File Transfer Times

50

mean andstandard deviationorder statistic

thm 2.4.1

Page 45: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Which one is a “valid” prediction interval ?

A. The left oneB. The middle oneC. BothD. I don’t know

51

AB

Page 46: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Re‐ScalingMany results are simple if the data is normal, or close to it (i.e. not wild). An important question to ask is: can I change the scale of my data to have it look more normal.Ex: log of the data instead of the data

A generic transformation used in statistics is the Box‐Coxtransformation: 

Continuous in 

53

Page 47: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Prediction Intervals for File Transfer Times

54

order statisticthm 2.4.1

mean andstandard deviationon rescaled data

mean andstandard deviation

Page 48: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

How is the prediction interval computed when re‐scaling ?

is the mean,  is the std‐deviation of the log of the data

with 

is the mean,  is the std‐deviation of the data

is the std‐deviation of the log of the data

D. None of theseE. I don’t know

55

Page 49: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

QQplot is common tool for verifying normal assumption

Normal Qqplot

X‐axis: standard normal quantiles 

Y‐axis: Ordered statistic of sample: 

If data comes from a normal distribution, qqplot is close to a straight line (except for end points) Visual inspection is often enoughIf not possible or doubtful, we will use tests later (Chapter 4)

57

Page 50: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

QQPlots of File Transfer Times

58

Page 51: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Prediction Interval versusConfidence Interval

If data is assumed normal

estimated meanestimated variance

Confidence interval  for mean at level 95 %   .

Prediction interval at level 95% = 

59

Page 52: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

5. Which Summarization to Use ?

IssuesRobustness to outliersDistribution assumptions

60

Page 53: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

A Distribution with Infinite Variance

61

True mean

True median

True mean

True median

CI based on std dv CI based on bootsrp

CI for median

Page 54: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Outlier in File Transfer Time

62

Page 55: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Outlier in File Transfer Time

63

Original data, showing one outlier Outlier removed

Page 56: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Robustness of Conf/Prediction Intervals

64

mean + std dev

CI for median geom mean

Outlier removedOutlier present

Order stat

Based onmean + std dev

Based onmean + std dev

+ re-scaling

Page 57: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Robustness

Methods based on quantiles and order statistics are robust to outliers and do not make any distributional assumption other than iidMethods based on mean and standard deviation are sensitive to outliers and make assumptions about distributions ‐‐‐ use with care

65

Page 58: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Since methods based on mean and standard deviation are less robust, why are they used at all ?

A. By intellectual lazinessB. Because they are easier to computeC. They are more compactD. I don’t know

66

Page 59: Summarizing Performance Data Confidence Intervalsperfeval.epfl.ch/printMe/conf.pdf · Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical

Take‐Home Message

Use methods that you understandMean and standard deviation make sense when data sets are not wild Close to normal, or not heavy tailed and large data sample

Use quantiles and order statistics if you have the choiceRescale if needed

68