
Post on 27-Jun-2020


How do we test whether probability distributions of the two datasets are statistically different?

•  Reading assignment:
   •  Hartman’s note-1: p 15-23
   •  Khan Academy: https://www.khanacademy.org/math/probability/statistics-inferential/chi-square/v/pearson-s-chi-square-test-goodness-of-fit
   •  https://www.khanacademy.org/math/probability/statistics-inferential/anova/v/anova-3-hypothesis-test-with-f-statistic


Test probability distribution: goodness-of-fit tests

•  To determine whether two datasets are drawn from the same statistical distribution, or whether one dataset is drawn from a specific statistical distribution (e.g., normal, gamma), the most commonly used tests include:
   –  Parametric goodness-of-fit test: χ² test
   –  Non-parametric test: Kolmogorov-Smirnov test (can also be used to test for a parametric distribution function, e.g., Q-Q test for normal distribution)
   –  F-test: determines whether the difference between two data distributions is due to variation within each dataset or due to a difference between the two datasets.

 


Test the difference in probability distribution of the data samples:

•  Chi-square (χ²) test: A simple and common goodness-of-fit test. It compares the histogram (empirical PDF) of the binned (discrete) data sample with a parametric PDF, such as a Gaussian or Gamma distribution. Thus, it can also be used to test whether the data PDF is Gaussian, Gamma, etc.

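$$\chi^2 = \sum_{\text{bins}} \frac{(\#\,\text{observed} - \#\,\text{expected})^2}{\#\,\text{expected}} = \sum_{\text{bins}} \frac{(\#\,\text{observed} - N \cdot \Pr\{\text{data in bin}\})^2}{N \cdot \Pr\{\text{data in bin}\}}$$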

•  If the numbers of observed data in the various bins are very different from those expected from a specified PDF (e.g., Gaussian), χ² will be larger than the critical value at a specified confidence level (say 95%). The null hypothesis that the PDF of the observed data is the same as the specified PDF can then be rejected at that confidence level.

•  You can also use the χ² test to determine whether two datasets are significantly different from each other.


Example-5: χ² test

•  Test whether the January rainfall in Ithaca, NY over the past 50 years (N = 50) has a Gamma or a Gaussian distribution:

Bin/class                   <1"     1-1.5"   1.5-2"   2-2.5"   2.5-3"   ≥3"
Obs #                       5       16       10       7        7        5
Gamma probability, Pr       0.161   0.215    0.210    0.161    0.108    0.145
Gamma expected #, N·Pr      8.05    10.75    10.5     8.05     5.40     7.25
Gaussian probability, Pr    0.195   0.146    0.173    0.178    0.132    0.176
Gaussian expected #, N·Pr   9.75    7.30     8.65     8.90     6.60     8.80

[Figure: # of samples (0 to 18) in each rainfall bin (inches), for the observed data and the gamma and gaussian distributions.]


χ² test for Gamma distribution:

$$\chi^2 = \frac{(5-8.05)^2}{8.05} + \frac{(16-10.75)^2}{10.75} + \frac{(10-10.5)^2}{10.5} + \frac{(7-8.05)^2}{8.05} + \frac{(7-5.4)^2}{5.4} + \frac{(5-7.25)^2}{7.25} = 5.05$$

χ² test for Gaussian distribution:

$$\chi^2 = \frac{(5-9.75)^2}{9.75} + \frac{(16-7.3)^2}{7.3} + \frac{(10-8.65)^2}{8.65} + \frac{(7-8.9)^2}{8.9} + \frac{(7-6.6)^2}{6.6} + \frac{(5-8.8)^2}{8.8} = 14.96$$

The number of degrees of freedom is ν = (# of bins − # of parameters − 1) = 6 − 2 − 1 = 3. At 95% confidence, the critical value is $\chi^2_c = 7.815$, based on the χ² distribution table. Thus, the null hypothesis that the Jan. rainfall in Ithaca has a Gamma distribution cannot be rejected at 95% confidence ($\chi^2 = 5.05 < \chi^2_c = 7.815$). However, the null hypothesis that the Jan. rainfall in Ithaca has a Gaussian distribution can be rejected at 95% confidence ($\chi^2 = 14.96 > \chi^2_c = 7.815$).
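
The arithmetic of this example can be reproduced with a short script. This is a minimal sketch in plain Python: the function name `chi_square_stat` is illustrative, the counts and probabilities are copied from the Example-5 table, and the critical value 7.815 is taken from the χ² table rather than computed.

```python
def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Ithaca January rainfall, N = 50 (values from the Example-5 table)
N = 50
observed = [5, 16, 10, 7, 7, 5]
pr_gamma = [0.161, 0.215, 0.210, 0.161, 0.108, 0.145]
pr_gauss = [0.195, 0.146, 0.173, 0.178, 0.132, 0.176]

# Expected counts are N * Pr{data in bin}
chi2_gamma = chi_square_stat(observed, [N * p for p in pr_gamma])
chi2_gauss = chi_square_stat(observed, [N * p for p in pr_gauss])

# Degrees of freedom: bins - fitted parameters - 1 = 6 - 2 - 1 = 3
CHI2_CRIT_95_DF3 = 7.815  # read from a chi-square table, not computed here

for name, value in [("Gamma", chi2_gamma), ("Gaussian", chi2_gauss)]:
    verdict = "rejected" if value > CHI2_CRIT_95_DF3 else "not rejected"
    print(f"{name}: chi2 = {value:.2f}, null hypothesis {verdict} at 95%")
```

Each bin's squared difference is divided by that bin's own expected count before summing; dividing the total of the squared differences by the total expected count would give a different number.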


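χ²-test of uncertainty of the variance

To estimate the uncertainty of the variance derived from a limited number of samples, we can use the χ² test:

$$\chi^2 = (N-1)\frac{s^2}{\sigma^2}$$

where s² is the variance derived from the data, σ² is the true variance, and N is the number of samples.

Assuming the samples are drawn from a normal distribution, χ follows the distribution

$$f(\chi) = f_0\,\chi^{\nu-1} e^{-\frac{1}{2}\chi^2}$$

where ν is the number of degrees of freedom, ν = N − number of parameters being estimated.

To test at the 95% level whether s² is consistent with σ², we need to show that

$$\chi^2_{0.025} < (N-1)\frac{s^2}{\sigma^2} < \chi^2_{0.975}$$

where $\chi^2_p$ is the value below which a fraction p of the χ² distribution lies. Two different critical values are needed because the χ² distribution is not symmetric. Rearranging gives the interval for the true variance:

$$\frac{(N-1)s^2}{\chi^2_{0.975}} < \sigma^2 < \frac{(N-1)s^2}{\chi^2_{0.025}}$$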
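
As a numerical illustration of the variance interval, the sketch below uses a hypothetical sample of N = 11 values with sample variance s² = 4.0; both numbers are invented for illustration. The lower and upper quantiles for ν = 10 (3.247 and 20.483) are read from a standard χ² table, and the function name `variance_interval` is an assumption of this sketch.

```python
def variance_interval(n, s2, chi2_lower, chi2_upper):
    """Confidence interval for the true variance sigma^2.

    chi2_lower and chi2_upper are the lower and upper chi-square
    quantiles (e.g. 2.5% and 97.5%) for nu = n - 1 degrees of freedom.
    """
    scaled = (n - 1) * s2
    # Dividing by the larger quantile gives the lower bound, and vice versa.
    return scaled / chi2_upper, scaled / chi2_lower

# Hypothetical sample: N = 11, s^2 = 4.0 (illustrative values only)
# Table values for nu = 10: 2.5% quantile 3.247, 97.5% quantile 20.483
low, high = variance_interval(11, 4.0, 3.247, 20.483)
print(f"95% interval for sigma^2: {low:.2f} to {high:.2f}")
```

The interval is not centered on s² because the χ² distribution is skewed: the upper bound lies much farther from s² than the lower bound does.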


The Kolmogorov-Smirnov (K-S) Test:

•  K-S test: Another very frequently used goodness-of-fit test. It compares the cumulative distribution functions (CDFs) of two datasets, or the empirical CDF of a dataset with the theoretical CDF of a continuous distribution. It does not require knowing the distribution function of the data.

•  Caution: when the data are compared with a distribution obtained by fitting the same data, the chance of rejecting the null hypothesis is likely to be reduced.


A variety of applications of the K-S test:

•  Test whether two datasets are drawn from the same statistical distribution: Smirnov two-sample test

•  Test whether a dataset is drawn from a specific probability distribution, e.g., Gaussian, Gamma, etc.
   –  Lilliefors test
   –  Q-Q test


K-S test compares CDFs of the two datasets

One can compare the CDFs of two datasets, or the CDF of a dataset with that of a specified distribution, such as a Gaussian distribution.


The Kolmogorov-Smirnov (K-S) Test:

•  A commonly used K-S test is the Lilliefors (1967) test:

$$D_n = \max_x |F_1(x) - F_2(x)|$$

$D_n$ is the maximum difference between the CDFs of the two datasets, where F(x) is the CDF evaluated at x.

The critical value for the K-S test at a specified significance level α is

$$C_\alpha = \frac{K_\alpha}{\sqrt{N} + 0.12 + 0.11/\sqrt{N}},$$

where $K_\alpha$ = 1.224, 1.358 and 1.628 for α = 0.10, 0.05 and 0.01, respectively. If $D_n > C_\alpha$, the null hypothesis that the data have the same CDF as the specified or theoretical CDF can be rejected at the 1−α confidence level.

[Figure: CDF curves F1(x) and F2(x).]


Confidence interval:

•  One can state with 1−α confidence that the true cumulative probability F(x) lies within $F_N(x) \pm C_\alpha$.


Example-6:

•  K-S test on whether the Jan rainfall in Ithaca during 1933-82 (N = 50) has a Gaussian or Gamma distribution:

Rainfall bin   Obs CDF   Gamma CDF   Gaussian CDF   |Obs − Gamma|   |Obs − Gaussian|
<1             0.1       0.161       0.10           0.061           0
1.5            0.42      0.376       0.30           0.03            0.12
2              0.62      0.586       0.42           0.034           0.20
2.5            0.76      0.61        0.62           0.015           0.14
3              0.9       0.850       0.76           0.05            0.14
>3             1         1           1              0               0
D(N)                                                0.061           0.2

$$C_{0.05} = \frac{K_{0.05}}{\sqrt{50} + 0.12 + 0.11/\sqrt{50}} = \frac{1.358}{7.07 + 0.12 + 0.016} = 0.188$$

For Gamma distribution: $D_{50} = 0.061 < C_{0.05} = 0.188$. Thus, the null hypothesis that the Jan rainfall distribution is drawn from a Gamma distribution cannot be rejected at 95% confidence. We can also state, at 95% confidence, that the true distribution of Jan rainfall at Ithaca lies within $F_{50}(x) \pm C_{0.05}$ of the empirical distribution $F_{50}(x)$.

For Gaussian distribution: $D_{50} = 0.2 > C_{0.05} = 0.188$. The null hypothesis that the Jan rainfall distribution is drawn from a Gaussian distribution can be rejected at 95% confidence.
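
The critical value in this example follows directly from the formula for $C_\alpha$. The sketch below is a minimal illustration in plain Python: the names `K_ALPHA` and `ks_critical_value` are assumptions of this sketch, and the two D values are the ones reported in the example rather than recomputed from the CDF table.

```python
import math

# K_alpha values quoted in the slides for alpha = 0.10, 0.05 and 0.01
K_ALPHA = {0.10: 1.224, 0.05: 1.358, 0.01: 1.628}

def ks_critical_value(n, alpha=0.05):
    """C_alpha = K_alpha / (sqrt(N) + 0.12 + 0.11 / sqrt(N))."""
    root_n = math.sqrt(n)
    return K_ALPHA[alpha] / (root_n + 0.12 + 0.11 / root_n)

c_05 = ks_critical_value(50, alpha=0.05)
print(f"C_0.05 for N = 50: {c_05:.3f}")

# D_50 values as reported in Example-6 (not recomputed here)
for name, d_n in [("Gamma", 0.061), ("Gaussian", 0.2)]:
    verdict = "rejected" if d_n > c_05 else "not rejected"
    print(f"{name}: D = {d_n}, null hypothesis {verdict} at 95%")
```

The same critical value applies to both candidate distributions, because it depends only on N and α.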


Test if two datasets come from the same distribution (are statistically the same or different)

Smirnov two-sample test: The null hypothesis is that two batches of data, X1 with n samples and X2 with m samples, come from the same distribution. The Smirnov test looks for the maximum absolute difference between the CDFs of these two batches of data:

$$D_s = \max_x |F_n(x_1) - F_m(x_2)|$$

and compares it to the critical value at the α level of probability, $D_{s,c}$:

$$D_{s,c} = \sqrt{-\frac{1}{2}\left(\frac{1}{n} + \frac{1}{m}\right)\ln\frac{\alpha}{2}}$$

If $D_s > D_{s,c}$, the null hypothesis can be rejected at the 1−α confidence level.


Example-7

•  Test whether the Austin climatological monthly rainfall is statistically different from that of Dallas.

[Figure: climatological monthly rainfall (in) for Austin and Dallas, months 1 to 12.]

Month   Austin rainfall (in)   Dallas rainfall (in)
1       2.23                   2.06
2       2.37                   2.7
3       2.51                   3.49
4       2.28                   3.07
5       2.66                   4.92
6       4.38                   4.11
7       2.45                   2.21
8       1.63                   1.87
9       2.49                   2.84
10      3.95                   4.79
11      2.95                   2.88
12      2.25                   2.74


[Figure: Smirnov two-sample test. CDF (0 to 1.2) against rainfall bins (1 to 5) for the Austin CDF and the Dallas CDF.]

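Use Smirnov two-sample test:

The null hypothesis: the two datasets are drawn from the same statistical distribution.

$$D_s = \max_x |F_n(x_1) - F_m(x_2)| = 0.15$$

$$D_{s,95\%} = \sqrt{-\frac{1}{2}\left(\frac{1}{n} + \frac{1}{m}\right)\ln\frac{\alpha}{2}} = \sqrt{-\frac{1}{2}\left(\frac{1}{12} + \frac{1}{12}\right)\ln\frac{0.05}{2}} = 0.55$$

$$D_{s,90\%} = \sqrt{-\frac{1}{2}\left(\frac{1}{n} + \frac{1}{m}\right)\ln\frac{\alpha}{2}} = \sqrt{-\frac{1}{2}\left(\frac{1}{12} + \frac{1}{12}\right)\ln\frac{0.1}{2}} = 0.50$$

Since $D_s = 0.15 < D_{s,90\%} = 0.50 < D_{s,95\%} = 0.55$, the null hypothesis cannot be rejected at the 90% or 95% confidence level.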
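
The Smirnov critical values for n = m = 12 can be checked numerically. This is a minimal sketch that assumes the critical value is the square root of −½(1/n + 1/m)·ln(α/2); the function name `smirnov_critical_value` is illustrative, and the statistic 0.15 is the value reported in the example rather than recomputed from the rainfall data.

```python
import math

def smirnov_critical_value(n, m, alpha):
    """D_s,c = sqrt(-(1/2) * (1/n + 1/m) * ln(alpha / 2))."""
    return math.sqrt(-0.5 * (1.0 / n + 1.0 / m) * math.log(alpha / 2.0))

d_s = 0.15  # maximum CDF difference as reported in Example-7
d_95 = smirnov_critical_value(12, 12, 0.05)
d_90 = smirnov_critical_value(12, 12, 0.10)

for label, crit in [("90%", d_90), ("95%", d_95)]:
    verdict = "rejected" if d_s > crit else "not rejected"
    print(f"{label}: critical value = {crit:.2f}, null hypothesis {verdict}")
```

A smaller α (higher confidence) makes ln(α/2) more negative, so the critical value grows and rejection becomes harder.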


Quantile-quantile test:

•  A common approach to compare whether two datasets have the same distribution.

•  A simple and effective way of testing whether the data have a Gaussian distribution.

•  Idea behind the Q-Q test: If the data fit a fitted Gaussian distribution perfectly, the empirical CDF of the data would agree perfectly with the CDF of the fitted Gaussian distribution. Consequently, the probability of a given value in the data would agree perfectly with that expected from the fitted Gaussian distribution; that is, they would have perfect correlation, r = 1.

•  In the Q-Q plot, one can plot either the ranked empirical values $x_i$ (the ith smallest value) against those expected from the fitted Gaussian distribution, or the empirical $x_i$ or $\ln(x_i)$ against the z values. $\ln(x_i)$ is often used for non-linear data (exponentially increasing or decreasing) to improve the linearity.


How does it work?

•  If we have a dataset with n samples and rank them from the smallest to the largest value, each sample value has a probability of 1/n of occurring, and its value $x_i$ is linearly related to $z_i = (x_i - \bar{X})/s$, where $\bar{X}$ is the mean and s is the standard deviation estimated from the data. Thus, if the data were normally distributed, the distribution of its $z_i$ (~$x_i$) would be similar to that of the $z_i$ of a normal distribution.


[Figure: Jan rainfall in Ithaca.]

For a dataset $X_i$, i = 1, ..., n, we can rank the values from the smallest to the largest. For the ith smallest value, the cumulative probability can be determined by either

   cp(x_i) = (i − 0.5)/n

or a variation of a similar formula, e.g.,

   cp(x_i) = (i − 1/3)/(n + 1/3)   (a = 1/3 for the Tukey plotting position),

or more generally

   cp(x_i) = (i − a)/(n + 1 − 2a), where 0 ≤ a ≤ 1.

Then find the z-value of each cp(x_i), i.e., the inverse of the standard normal CDF at cp(x_i), using the Excel function NORM.S.INV, and plot the ranked data against the z-value of each sample. If the dataset has a normal distribution, the data points will follow a straight line. Otherwise, they will deviate from the straight line.

For more information and examples, see
http://www.youtube.com/watch?v=X9_ISJ0YpGw
https://www.youtube.com/watch?v=eYp9QvlDzJA
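
The Q-Q procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spreadsheet workflow itself: `NormalDist().inv_cdf` from the standard library stands in for Excel's NORM.S.INV, the function names are invented for this sketch, and the sample values are hypothetical.

```python
from statistics import NormalDist, mean

def plotting_positions(n, a=1/3):
    """cp_i = (i - a) / (n + 1 - 2a) for i = 1..n (a = 1/3: Tukey)."""
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

def qq_points(data, a=1/3):
    """Return (z_values, ranked_data) for a normal Q-Q plot.

    Note: a = 1 puts cp at exactly 0 and 1 at the ends, where z is undefined.
    """
    ranked = sorted(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf(p) for p in plotting_positions(len(ranked), a)]
    return z, ranked

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical sample, for illustration only
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4, 5.0, 4.9]
z, ranked = qq_points(sample)
r = pearson_r(z, ranked)
print(f"r(z, ranked data) = {r:.3f}")
```

Plotting `ranked` against `z` gives the Q-Q plot; a value of r close to 1 indicates that the points lie near a straight line.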


Test whether the wet season onset over S. Amazon follows a Gaussian distribution

The null hypothesis: the wet season onset data samples follow a Gaussian distribution.

Test:
•  Compute the z-value of each value for the data sorted from the smallest to the largest value.
•  Plot in the Q-Q plot. For a Gaussian distribution, the data samples should follow a straight line, as seen in the figure below.
•  Compute the correlation between the sorted data samples and the expected data samples based on the Gaussian CDF: r(obs, exp) = 0.99~1.0.
•  Thus, the Q-Q test supports the null hypothesis with high confidence.

[Figure: Q-Q plot for normal distribution. Vertical axis 55 to 66, horizontal axis -2 to 2.]

Year   Rank   SA24 onset   Sorted   Tukey position   z-value   Sign
1979   1      58           56       0.07             -1.48
1980   1      58           56       0.07             -1.48     1
1981   1      57           56       0.07             -1.48     -1
1982   2      61           57       0.17             -0.94     1
1983   2      57           57       0.17             -0.94     -1
1984   2      58           57       0.17             -0.94     1
1985   2      57           57       0.17             -0.94     -1
1986   3      56           58       0.28             -0.60     -1
1987   3      61           58       0.28             -0.60     1
1988   3      56           58       0.28             -0.60     -1
1989   3      60           58       0.28             -0.60     1
1990   3      58           58       0.28             -0.60     -1
1991   3      59           58       0.28             -0.60     1
1992   3      61           58       0.28             -0.60     1
1993   4      59           59       0.38             -0.31     -1
1994   4      65           59       0.38             -0.31     1
1995   4      63           59       0.38             -0.31     -1
1996   4      59           59       0.38             -0.31     -1
1997   4      64           59       0.38             -0.31     1
1998   5      58           60       0.48             -0.04     -1
1999   5      60           60       0.48             -0.04     1
2000   5      60           60       0.48             -0.04     0
2001   5      62           60       0.48             -0.04     1
2002   5      58           60       0.48             -0.04     -1
2003   6      60           61       0.59             0.22      1
2004   6      59           61       0.59             0.22      -1
2005   6      59           61       0.59             0.22      0
2006   6      57           61       0.59             0.22      -1
2007   7      58           62       0.69             0.49      1
2008   8      64           63       0.79             0.82      1
2009   9      61           64       0.90             1.26      -1
2010   9      60           64       0.90             1.26      -1
2011   10     56           65       1.00             1.48      -1

S = 24.92, Var(S) = 4065.67, z = 0.41, CDF = 0.49
z-values: Mean = 57.67, std dev = 2.31, R(obs, exp) = 1.00


Q-Q test on Gaussian distribution of the Bull Creek mean stream flow in April 2007-2014.

•  The left- and right-tail extreme values are much larger than those of a Gaussian distribution.

[Figure: PDF of the Bull Creek flow (flow 0 to 120, PDF 0 to 0.05) and Q-Q plot of the Bull Creek flow (absolute stream flow, 0 to 120, against z from -4 to 6).]

Non-Gaussian distribution

Q-Q test on Gaussian distribution of the Bull Creek mean stream flow in April 2007-2014.

•  Test the distribution of ln(stream flow)

[Figure: PDF of the Bull Creek flow, plotted against ln(flow), and Q-Q plot of the log of Bull Creek flow (ln(flow) from -3 to 3).]

Still non-Gaussian distribution


Summary for the goodness-of-fit tests:

•  The most commonly used tests include:
   –  Parametric goodness-of-fit test: χ² test, based on comparison to the fitted PDF of a chosen parametric distribution function, e.g., Gaussian, etc.
   –  Non-parametric test: Kolmogorov-Smirnov test (can also be used to test for a parametric distribution function, e.g., Q-Q test for normal distribution)