summarizing performance data confidence intervals.pdf

Upload: rabbit

Post on 14-Apr-2018

TRANSCRIPT

  • 7/27/2019 SummarizingPerformanceData Confidence Intervals.pdf

    1/66

    Summarizing Performance Data

    Confidence Intervals

    Important
    Easy to Difficult
    Warning: some mathematical content

  • 2/66

    Contents

    1. Summarized data
    2. Confidence Intervals
    3. Independence Assumption
    4. Prediction Intervals
    5. Which Summarization to Use?

  • 3/66

    1. Summarizing Performance Data

    How do you quantify:
    Central value
    Dispersion (variability)

    [Figure: panels "old" and "new"]

  • 4/66

    Histogram is one answer

    [Figure: histograms, panels "old" and "new"]

  • 5/66

    ECDFs allow easy comparison

    [Figure: empirical CDFs of "old" and "new"]

  • 6/66

    Summarized Measures

    Median, quantiles
    Median
    Quartiles
    p-quantiles

    Mean and standard deviation
    Mean
    Standard deviation

    What is the interpretation of standard deviation?

    A: if data is normally distributed, with 95% probability, a new data sample lies in the interval $\hat{\mu} \pm 1.96\,\hat{\sigma}$
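The normal-data reading of the standard deviation (about 95% of new samples within 1.96 standard deviations of the mean) can be checked numerically. The following is a minimal sketch under stated assumptions: the data are synthetic, and the mean 100, standard deviation 15, sample sizes and seed are arbitrary illustration values that do not come from the slides.

```python
import random
import statistics


def coverage_of_normal_interval(n_fit=1000, n_new=10000, seed=1):
    """Fraction of fresh normal samples inside mean +/- 1.96 * stdev."""
    rng = random.Random(seed)
    # Hypothetical data: normal with mean 100 and standard deviation 15.
    data = [rng.gauss(100.0, 15.0) for _ in range(n_fit)]
    m = statistics.mean(data)
    s = statistics.stdev(data)
    lo, hi = m - 1.96 * s, m + 1.96 * s
    # Draw new samples from the same distribution and count the hits.
    fresh = [rng.gauss(100.0, 15.0) for _ in range(n_new)]
    inside = sum(lo <= x <= hi for x in fresh)
    return inside / n_new


if __name__ == "__main__":
    print(f"empirical coverage: {coverage_of_normal_interval():.3f}")
```

The empirical coverage should land near 0.95 for normal data; for skewed or heavy-tailed data the same interval can cover a very different fraction, which is why the interpretation is tied to normality.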

  • 7/66

    Example

    [Figure: panels "mean and standard deviation" and "quantiles"]

  • 8/66

    Coefficient of Variation Summarizes Variability

    Scale free
    Second order
    For a data set with n samples: $\mathrm{CoV} = s / \hat{\mu}$ (standard deviation divided by mean)
    Exponential distribution: CoV = 1
    What does CoV = 0 mean?

  • 9/66

    Lorenz Curve Gap is an Alternative to CoV

    Alternative to CoV
    For a data set with n samples: $\mathrm{gap} = \frac{1}{2 n \hat{\mu}} \sum_{i=1}^{n} |x_i - \hat{\mu}|$ (mean absolute deviation divided by twice the mean)
    Scale free, index of unfairness

  • 10/66

    Jain's Fairness Index is an Alternative to CoV

    Quantifies fairness of x: $\mathrm{JFI} = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2}$
    Ranges from
    1: all $x_i$ equal
    to 1/n: maximum unfairness

    Fairness and variability are two sides of the same coin
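The three scale-free indices (CoV, Lorenz curve gap, Jain's fairness index) are short enough to compute side by side. The sketch below assumes the usual definitions: CoV is standard deviation over mean, the gap is the mean absolute deviation divided by twice the mean, and JFI is the squared sum divided by n times the sum of squares. The function names and the choice of the 1/n variance are assumptions made for this illustration.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs):
    """Coefficient of variation, using the 1/n variance (an assumption)."""
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return sqrt(var) / m


def lorenz_gap(xs):
    """Lorenz curve gap: mean absolute deviation / (2 * mean)."""
    m = mean(xs)
    mad = sum(abs(x - m) for x in xs) / len(xs)
    return mad / (2 * m)


def jfi(xs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))


if __name__ == "__main__":
    equal = [5.0] * 4                 # everybody gets the same share
    unfair = [20.0, 0.0, 0.0, 0.0]    # one user gets everything
    for name, xs in (("equal", equal), ("unfair", unfair)):
        print(name, cov(xs), lorenz_gap(xs), jfi(xs))
```

With the 1/n variance, JFI equals 1 / (1 + CoV^2), which is one way to read the claim that fairness and variability are two sides of the same coin.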

  • 11/66

    Lorenz Curve

    Old code, new code: is JFI larger? Gap?
    Gini's index is also used; Def: 2 × area between diagonal and Lorenz curve
    More or less equivalent to Lorenz curve gap

    [Figure: Lorenz curve, with labels "Lorenz Curve gap" and "Perfect equality (fairness)"]


  • 13/66

    Which Summarization Should One Use?

    There are (too) many synthetic indices to choose from
    Traditional measures in engineering are standard deviation, mean and CoV
    Traditional measures in computer science are mean and JFI
    JFI is equivalent to CoV
    In economics, gap and Gini's index (a variant of Lorenz curve gap)
    Statisticians like medians and quantiles (robust to statistical assumptions)

    We will come back to the issue after discussing confidence intervals

  • 14/66

    2. Confidence Interval

    Do not confuse with prediction interval
    Quantifies uncertainty about an estimation

  • 15/66

    [Figure: panels "mean and standard deviation" and "quantiles"]

  • 16/66

    Confidence Intervals for Mean of Difference

    Mean reduction =
    0 is outside the confidence intervals for mean and for median

    [Figure label: Confidence interval for median]

  • 17/66

    Computing Confidence Intervals

    This is simple if we can assume that the data comes from an iid model
    Independent Identically Distributed

  • 18/66

    CI for median

    Is the simplest of all
    Robust: always true provided iid assumption holds


  • 20/66

    Confidence Interval for Median, level 95%

    [Figure: cases n = 31 and n = 32]

  • 21/66

    Example: n = 100, confidence interval for median

    The median estimate is $(x_{(50)} + x_{(51)})/2$
    Confidence level 95%: $j = \lfloor 50 - 9.8 \rfloor = 40$, $k = \lceil 51 + 9.8 \rceil = 61$
    a confidence interval for the median is $[x_{(40)}; x_{(61)}]$
    Confidence level 99%: $j = \lfloor 50 - 12.8 \rfloor = 37$, $k = \lceil 51 + 12.8 \rceil = 64$
    a confidence interval for the median is $[x_{(37)}; x_{(64)}]$
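The ranks of the order statistics that bound the median can be computed directly from the Binomial(n, 1/2) distribution. The sketch below is one possible way to do it, under these assumptions: the interval is $[x_{(j)}, x_{(k)}]$ with 1-based ranks, it is chosen symmetric (k = n + 1 - j), and its level is bounded below by P(j <= B <= k - 1) with B binomial. The large-n shortcut uses the normal approximation with $\eta$ = 1.96 (level 95%) or 2.576 (level 99%). The function names are hypothetical.

```python
from math import ceil, comb, floor, sqrt


def median_ci_ranks_exact(n, level=0.95):
    """Tightest symmetric ranks (j, k) with P(j <= B <= k - 1) >= level,
    where B ~ Binomial(n, 1/2). Returns None if n is too small."""
    best = None
    for j in range(1, n // 2 + 1):
        k = n + 1 - j
        prob = sum(comb(n, i) for i in range(j, k)) / 2 ** n
        if prob < level:
            break  # narrowing further only lowers the probability
        best = (j, k, prob)
    return best


def median_ci_ranks_large_n(n, eta=1.96):
    """Normal approximation of the same ranks, intended for large n."""
    half = 0.5 * eta * sqrt(n)
    return floor(0.5 * n - half), ceil(0.5 * n + 1 + half)


if __name__ == "__main__":
    print(median_ci_ranks_exact(100))
    print(median_ci_ranks_large_n(100))
    print(median_ci_ranks_large_n(100, eta=2.576))
```

Given sorted data `xs`, the interval would then be `(xs[j - 1], xs[k - 1])`. For very small n no interval reaches the requested level, in which case the exact function returns `None`.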

  • 22/66

    CI for mean and Standard Deviation

    This is another method, and the most commonly used one
    But it requires some assumptions to hold, and may be misleading if they do not hold
    There is no exact theorem as for median and quantiles, but there are asymptotic results and a heuristic.

  • 23/66

    CI for mean, asymptotic case

    If central limit theorem holds (in practice: n is large and distribution is not wild), an approximate confidence interval for the mean is $\hat{\mu}_n \pm \eta\, s_n / \sqrt{n}$, with $\eta = 1.96$ at level 95%

  • 24/66

    Example

    n = 100; 95% confidence level
    CI for mean: $\hat{\mu}_n \pm 1.96\, s_n / \sqrt{n} = \hat{\mu}_n \pm 0.196\, s_n$
    amplitude of CI decreases in $1/\sqrt{n}$
    compare to prediction interval
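The asymptotic confidence interval for the mean is the sample mean plus or minus a normal quantile times the standard deviation over the square root of n. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and it uses the n - 1 standard deviation from Python's `statistics` module, which for large n differs negligibly from the 1/n version.

```python
import statistics
from math import sqrt


def mean_ci_asymptotic(data, level=0.95):
    """Approximate CI for the mean, valid when n is large and data is not wild."""
    n = len(data)
    m = statistics.fmean(data)
    s = statistics.stdev(data)  # n - 1 denominator
    # Two-sided normal quantile: about 1.96 for level 0.95.
    eta = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half = eta * s / sqrt(n)
    return m - half, m + half


if __name__ == "__main__":
    sample = list(range(1, 101))  # hypothetical data, n = 100
    print(mean_ci_asymptotic(sample))
```

Quadrupling n halves the half-width, whereas a prediction interval for a new sample does not shrink as n grows.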

  • 25/66

    Normal Case

    Assume data comes from an iid + normal distribution
    Useful for very small data samples (n

  • 26/66

    Example

    n = 100; 95% confidence level
    CI for mean:
    CI for standard deviation:
    same as before except $s$ instead of $\hat{\sigma}$ and 1.96 for all n instead of 1.98 for n = 100

    In practice both (normal case and large n asymptotic) are the same if n > 30
    But large n asymptotic does not require normal assumption

  • 27/66

    Tables in [WeberTables]

  • 28/66

    Standard Deviation: n or n − 1?

  • 29/66

    Bootstrap Percentile Method

    A heuristic that is robust (requires only iid assumption)
    But be careful with heavy tail, see next
    but tends to underestimate CI
    Simple to implement with a computer
    Idea: use the empirical distribution in place of the theoretical (unknown) distribution

    For example, with confidence level = 95%:
    the data set is $S = \{x_1, \ldots, x_n\}$
    Do r = 1 to r = 999 (replay experiment)
    Draw n bootstrap replicates with replacement from S
    Compute sample mean $T_r$
    Bootstrap percentile estimate is $(T_{(25)}, T_{(975)})$
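The bootstrap percentile procedure (resample the data set with replacement 999 times, compute the sample mean each time, then read off the 25th and 975th smallest means) translates almost line for line into code. The sketch below assumes the statistic of interest is the mean; the function name, the seed parameter and the example data are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, replicates=999, level=0.95, seed=0):
    """Bootstrap percentile confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Replay the experiment: n draws with replacement, then the sample mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(replicates)
    )
    # With 999 replicates at level 95% these are ranks 25 and 975 (1-based).
    r0 = round((replicates + 1) * (1 - level) / 2)
    lo_rank, hi_rank = r0, replicates + 1 - r0
    return means[lo_rank - 1], means[hi_rank - 1]


if __name__ == "__main__":
    sample = [float(i % 17) for i in range(200)]  # hypothetical data
    print(bootstrap_percentile_ci(sample))
```

Passing another statistic in place of `statistics.fmean` (for example a fairness index) would give a bootstrap interval for that statistic, still under the iid assumption.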

  • 30/66

    Example: Compiler Options

    Does data look normal? No
    Methods 2.3.1 and 2.3.2 give same result (n > 30)
    Method 2.3.3 (Bootstrap) gives same result
    => Asymptotic assumption valid

  • 31/66

    Confidence Interval for Fairness Index

    Use bootstrap if data is iid


  • 33/66

    We test a system 10 000 times for failures and find 200 failures: give a 95% confidence interval for the failure probability.

  • 34/66

    Let $X_i$ = 1 or 0 (failure / success); so we are estimating the mean. The asymptotic theory applies (no heavy tail)

    $\hat{\mu}_n = 0.02$
    $s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\mu}_n)^2 = \hat{\mu}_n (1 - \hat{\mu}_n) = 0.02 \times 0.98 \approx 0.02$, so $s_n \approx 0.14$
    Confidence Interval: $0.02 \pm 1.96 \times 0.14 / \sqrt{10000} \approx 0.02 \pm 0.003$ at level 0.95
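The arithmetic for the failure-probability interval (200 failures in 10 000 tests) can be reproduced in a few lines. This is a small sketch of the asymptotic formula applied to 0/1 data, where the variance of the indicator is p(1 - p); the function name is an assumption made for this illustration.

```python
from math import sqrt


def proportion_ci_asymptotic(failures, trials, eta=1.96):
    """Point estimate and half-width of the asymptotic CI for a proportion."""
    p_hat = failures / trials
    # For 0/1 data the (1/n) sample variance is p_hat * (1 - p_hat).
    s = sqrt(p_hat * (1 - p_hat))
    half = eta * s / sqrt(trials)
    return p_hat, half


if __name__ == "__main__":
    p_hat, half = proportion_ci_asymptotic(200, 10_000)
    print(f"{p_hat:.2f} +/- {half:.3f}")
```

This approach is only reasonable when both failures and successes are observed often enough; with zero observed failures it collapses to a zero-width interval.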

  • 35/66

    We test a system 10 times for failures and find 0 failures: give a 95% confidence interval for the failure probability.

    1. [0; 0]
    2. [0; 0.1]
    3. [0; 0.11]
    4. [0; 0.21]
    5. [0; 0.31]

  • 36/66

    Confidence Interval for Success Probability

    Problem statement: want to estimate the probability of failure; observe n outcomes; no failure; confidence interval?
    Example: we test a system 10 times for failures and find 0 failures: give a 95% confidence interval for the failure probability.
    Is this a confidence interval for the mean? (explain why)
    The general theory does not give good results when the mean is very small
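When no failure is observed, the asymptotic formula degenerates to [0; 0], so a different argument is needed. One common exact argument, sketched below, keeps every failure probability p for which "no failure in n trials" is not too unlikely, that is (1 - p)^n >= alpha / 2. This is an assumption about the convention: reserving alpha / 2 for this tail matches a two-sided interval, while a one-sided bound would use alpha instead. The function name is hypothetical.

```python
def zero_failure_upper_bound(n, level=0.95):
    """Upper end of a CI [0, p0] for the failure probability when
    n iid trials show no failure, using (1 - p0)^n = alpha / 2."""
    alpha = 1 - level
    return 1 - (alpha / 2) ** (1 / n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(zero_failure_upper_bound(n), 4))
```

For n = 10 at level 95% the bound is about 0.31, far from the zero-width interval the asymptotic formula would produce.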



  • 39/66

    We test a system 10 000 times for failures and find 200 failures: give a 95% confidence interval for the failure probability.

    Apply formula 2.29 ($z = 200 \geq 6$ and $n - z \geq 6$):
    $0.02 \pm \frac{1.96}{10000} \sqrt{200\,(1 - 0.02)} \approx 0.02 \pm \frac{1.96}{10000} \times 10\sqrt{2} \approx 0.02 \pm 0.003$

  • 40/66

    Take Home Message

    Confidence interval for median (or other quantiles) is easy to get from the Binomial distribution
    Requires iid
    No other assumption

    Confidence interval for the mean
    Requires iid
    And
    Either if data sample is normal and n is small
    Or data sample is not wild and n is large enough

    The bootstrap is more robust and more general but is more than a simple formula to apply

    Confidence interval for success probability requires special attention when success or failure is rare

    We need to verify the assumptions

  • 41/66

    3. The Independence Assumption

    Confidence Intervals require that we can assume that the data comes from an iid model
    Independent Identically Distributed

    How do I know if this is true?
    Controlled experiments: draw factors randomly with replacement
    Simulation: independent replications (with random seeds)
    Else: we do not know; in some cases we will have methods for time series

  • 42/66

    What does independence mean?

  • 43/66

    Example

    Pretend data is iid: CI for mean is [69; 69.8]
    Is this biased?

    [Figure: panels "data" and "ACF"]

  • 44/66

    What happens if data is not iid?

    If data is positively correlated
    Neighbouring values look similar
    Frequent in measurements
    CI is underestimated: there is less information in the data than one thinks
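A quick way to look for positive correlation between neighbouring measurements is the sample autocorrelation function (ACF). The sketch below is one possible check, not a method prescribed by the slides: it computes the autocorrelation at a few lags and compares it with the band of plus or minus 1.96 / sqrt(n) that uncorrelated data would rarely leave. The function names and the choice of 10 lags are assumptions.

```python
from math import sqrt


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag (lag >= 1)."""
    n = len(xs)
    m = sum(xs) / n
    c0 = sum((x - m) ** 2 for x in xs) / n
    ck = sum((xs[t] - m) * (xs[t + lag] - m) for t in range(n - lag)) / n
    return ck / c0


def looks_uncorrelated(xs, max_lag=10, eta=1.96):
    """True if all autocorrelations up to max_lag stay within the
    approximate 95% band expected for uncorrelated data."""
    bound = eta / sqrt(len(xs))
    return all(
        abs(autocorrelation(xs, k)) <= bound for k in range(1, max_lag + 1)
    )


if __name__ == "__main__":
    trend = [float(t) for t in range(100)]  # neighbours look similar
    print(autocorrelation(trend, 1), looks_uncorrelated(trend))
```

A series that fails this check should not be fed to the iid confidence interval formulas as is, since the resulting interval would be too narrow.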

  • 45/66

    4. Prediction Interval

    CI for mean or median summarize
    Central value + uncertainty about it

    Prediction interval summarizes variability of data

  • 46/66

    Prediction Interval based on Order Statistic

    Assume data comes from an iid model
    Simplest and most robust result (not well known, though):

  • 47/66

    Prediction Interval for small n

    For n = 39, [xmin, xmax] is a prediction interval at level 95%
    For n < 39 there is no prediction interval at level 95% with this method, but there is one at level 90% for n > 18
    For n = 10 we have a prediction interval [xmin, xmax] at level 81%
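The levels quoted for [xmin, xmax] (95% for n = 39, 81% for n = 10) are consistent with the order-statistic result that, for iid data, $[x_{(j)}, x_{(k)}]$ is a prediction interval at level (k - j) / (n + 1). The sketch below assumes that formula; the function names are hypothetical.

```python
def order_statistic_pi_level(n, j=1, k=None):
    """Level of the prediction interval [x_(j), x_(k)] for iid data,
    assuming the formula (k - j) / (n + 1). Ranks are 1-based."""
    if k is None:
        k = n  # default: the interval [x_min, x_max]
    return (k - j) / (n + 1)


def min_n_for_level(level):
    """Smallest n for which [x_min, x_max] reaches the requested level."""
    n = 2
    while order_statistic_pi_level(n) < level:
        n += 1
    return n


if __name__ == "__main__":
    print(order_statistic_pi_level(39))  # level of [x_min, x_max], n = 39
    print(order_statistic_pi_level(10))  # level of [x_min, x_max], n = 10
    print(min_n_for_level(0.90))
```

No distributional assumption beyond iid enters the formula, which is what makes this interval robust.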

  • 48/66

    Prediction Interval based on Mean

  • 49/66

    Prediction Interval based on Mean

    If data is not normal, there is no general result; bootstrap can be used
    If data is assumed normal, how do CI for mean and Prediction Interval based on mean compare?

  • 50/66

    Prediction Interval based on Mean

    $\hat{\mu}$ = estimated mean, $\hat{\sigma}^2$ = estimated variance
    CI for mean at level 95% = $\hat{\mu} \pm 1.96\, \hat{\sigma} / \sqrt{n}$
    Prediction interval at level 95% = $\hat{\mu} \pm 1.96\, \hat{\sigma}$

  • 51/66

    Re-Scaling

    Many results are simple if the data is normal, or close to it (i.e. not wild). An important question to ask is: can I change the scale of my data to have it look more normal?
    Ex: log of the data instead of the data
    A generic transformation used in statistics is the Box-Cox transformation:
    $b_s(x) = \frac{x^s - 1}{s}$ for $s \neq 0$, and $b_0(x) = \ln x$
    Continuous in s
    s = 0: log
    s = −1: 1/x
    s = 1: identity
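The Box-Cox family is easy to experiment with. The sketch below assumes the usual definition, (x^s - 1) / s for s different from 0 and the natural logarithm for s = 0; with that definition s = 1 is the identity up to a shift by 1, and s = -1 is 1/x up to an affine change. The function name is hypothetical.

```python
from math import log


def box_cox(x, s):
    """Box-Cox transform of a positive value x with shape parameter s."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if s == 0:
        return log(x)
    return (x ** s - 1) / s


if __name__ == "__main__":
    # Continuity in s: a tiny s gives nearly the same value as the log.
    print(box_cox(3.0, 1e-6), box_cox(3.0, 0))
```

In practice one would apply `box_cox` to every data point, compute the interval on the rescaled data, and map the end points back through the inverse transform.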

  • 52/66

    Prediction Intervals for File Transfer Times

    [Figure: prediction intervals based on order statistic, on mean and standard deviation, and on mean and standard deviation on rescaled data]

  • 53/66

    Which Summarization Should I Use?

    Two issues
    Robustness to outliers
    Compactness

  • 54/66

    QQ plot is a common tool for verifying assumptions

    Normal QQ plot
    X axis: standard normal quantiles
    Y axis: ordered statistic of sample
    If data comes from a normal distribution, the QQ plot is close to a straight line (except for end points)

    Visual inspection is often enough
    If not possible or doubtful, we will use tests later
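The coordinates of a normal QQ plot are simple to generate even without a plotting library. The sketch below pairs the sorted sample with standard normal quantiles; the plotting positions i / (n + 1) are an assumption (other conventions exist), and the `straightness` helper, a plain Pearson correlation of the points, is only a rough numerical stand-in for visual inspection.

```python
from math import sqrt
from statistics import NormalDist


def normal_qq_points(data):
    """(standard normal quantile, ordered sample value) pairs."""
    n = len(data)
    std_normal = NormalDist()
    xs = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    ys = sorted(data)
    return list(zip(xs, ys))


def straightness(points):
    """Pearson correlation of the QQ points; near 1 for near-normal data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    for x, y in normal_qq_points([3.0, 1.0, 2.0]):
        print(f"{x:+.3f}  {y}")
```

The pairs can be passed to any plotting tool; a strongly curved pattern suggests trying a rescaling before using mean and standard deviation.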

  • 55/66

    QQ Plots of File Transfer Times

  • 56/66

    Take Home Message

    The interpretation of the standard deviation as a measure of variability is meaningful if the data is normal (or close to normal). Else, it is misleading. It is then best to rescale the data.

  • 57/66

    5. Which Summarization to Use?

    Issues
    Robustness to outliers
    Distribution assumptions

  • 58/66

    A Distribution with Infinite Variance

    [Figure: two panels marking the true mean and the true median, with CI based on std dev, CI based on bootstrap, and CI for median]

  • 59/66

    Outlier in File Transfer Time

  • 60/66

    Robustness of Conf/Prediction Intervals

    [Figure: intervals with "Outlier removed" and "Outlier present"; labels: mean + std dev, CI for median, geom mean, order stat, based on mean + std dev, based on mean + std dev + re-scaling]

  • 61/66

    Fairness Indices

    Confidence Intervals obtained by Bootstrap
    How?
    JFI is very dependent on one outlier
    As expected, since JFI is essentially CoV, i.e. standard deviation
    Gap is sensitive, but less
    Does not use squaring; why?

  • 62/66

    Compactness

    If normal assumption (or, for CI, asymptotic regime) holds, $\hat{\mu}$ and $\hat{\sigma}$ are more compact
    two values give both: CIs at all levels, prediction intervals
    Derived indices: CoV, JFI

    In contrast, CIs for median do not give information on variability

    Prediction interval based on order statistic is robust (and, IMHO, best)

  • 63/66

    Take Home Message

    Use methods that you understand
    Mean and standard deviation make sense when data sets are not wild
    Close to normal, or not heavy tailed and large data sample
    Use quantiles and order statistics if you have the choice
    Rescale

  • 64/66

    Questions

