An Introduction to Extreme Value Analysis

Whitney Huang

Clemson EVA Group, January 29, 2020


Outline

Motivation

Extreme Value Theorem & Block Maxima Method

Peaks–Over–Threshold (POT) Method


Extreme Rainfall During Hurricane Harvey

- “A storm forces Houston, the limitless city, to consider its limits” – The New York Times (8.31.17)


Environmental Extremes: Heatwaves, Storm Surges, etc.

- Heat wave: The 2003 European heat wave produced the hottest summer on record in Europe since 1540 and resulted in at least 30,000 deaths

- Storm surge: Hurricane Katrina produced the highest storm surge ever recorded (27.8 feet) on the U.S. coast


Scientific Questions

- How can we estimate the magnitude of extreme events (e.g., 100-year rainfall)?

- How do extremes vary in space?

- How might extremes change under future climate conditions?


Outline

Motivation

Extreme Value Theorem & Block Maxima Method

Peaks–Over–Threshold (POT) Method


Usual vs Extremes


Probability Framework

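Let $X_1, \dots, X_n \overset{\text{iid}}{\sim} F$ and define $M_n = \max\{X_1, \dots, X_n\}$.

Then the distribution function of $M_n$ is

\[ P(M_n \le x) = P(X_1 \le x, \dots, X_n \le x) = P(X_1 \le x) \times \cdots \times P(X_n \le x) = F^n(x) \]

Remark

\[ F^n(x) \xrightarrow{n \to \infty} \begin{cases} 0 & \text{if } F(x) < 1 \\ 1 & \text{if } F(x) = 1 \end{cases} \]

⇒ the limiting distribution is degenerate.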
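The degeneracy can be seen numerically. The following is a minimal sketch, assuming only that $F(x)$ is known at the point of interest; the function name `max_cdf` and the probe value 0.99 are illustrative choices, not part of the talk.

```python
def max_cdf(F_x: float, n: int) -> float:
    """Return P(M_n <= x) = F(x)^n for n i.i.d. draws, given F_x = F(x)."""
    if not 0.0 <= F_x <= 1.0:
        raise ValueError("F_x must be a probability in [0, 1]")
    return F_x ** n


# Even a point deep in the upper tail (F(x) = 0.99) is eventually exceeded
# by the maximum, while F(x) = 1 stays at 1 for every n.
for n in (1, 10, 100, 10_000):
    print(n, max_cdf(0.99, n), max_cdf(1.0, n))
```

Without rescaling, the only possible limits are 0 and 1, which is why the normalizing sequences on the next slide are needed.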


Asymptotic: Classical Limit Laws

Recall the Central Limit Theorem:

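\[ \frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{d} N(0, 1) \]

where $S_n = X_1 + \cdots + X_n$ and $\mu$, $\sigma$ are the mean and standard deviation of the $X_i$.

⇒ rescaling is the key to obtaining a non-degenerate limiting distribution

Question: Can we get the limiting distribution of

\[ \frac{M_n - b_n}{a_n} \]

for suitable sequences $\{a_n\}$ with $a_n > 0$ and $\{b_n\}$?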
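One case where the answer is explicit is the standard exponential distribution, for which $a_n = 1$ and $b_n = \log n$ give a Gumbel limit. The sketch below checks this numerically; the choice of Exp(1) data and the function names are assumptions made for illustration only.

```python
import math


def normalized_max_cdf(x: float, n: int) -> float:
    """P((M_n - b_n) / a_n <= x) for Exp(1) data with a_n = 1, b_n = log(n)."""
    # F(t) = 1 - exp(-t) for t >= 0, so F(x + log n) = 1 - exp(-x) / n.
    t = x + math.log(n)
    if t <= 0:
        return 0.0
    return (1.0 - math.exp(-t)) ** n


def gumbel_cdf(x: float) -> float:
    """Standard Gumbel distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


# The gap between the finite-n distribution and the Gumbel limit shrinks with n.
for n in (10, 1_000, 1_000_000):
    print(n, abs(normalized_max_cdf(1.0, n) - gumbel_cdf(1.0)))
```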


Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)

Define $M_n = \max\{X_1, \dots, X_n\}$ where $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F$. If there exist $a_n > 0$ and $b_n \in \mathbb{R}$ such that, as $n \to \infty$,

\[ P\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x) \]

for a non-degenerate $G$, then $G$ must be of the same type as

\[ G(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]_+^{-1/\xi}\right\} \]

where $x_+ = \max(x, 0)$ and $G(x)$ is the distribution function of the generalized extreme value distribution, GEV($\mu, \sigma, \xi$).

- $\mu$ and $\sigma$ are location and scale parameters
- $\xi$ is a shape parameter determining the rate of tail decay, with
  - $\xi > 0$ giving the heavy-tailed case (Fréchet)
  - $\xi = 0$ giving the light-tailed case (Gumbel), read as the limit $\xi \to 0$
  - $\xi < 0$ giving the bounded-tailed case (reversed Weibull)
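The GEV distribution function above can be evaluated directly. This is a minimal sketch in plain Python, assuming the $\xi = 0$ case is handled as the Gumbel limit; the name `gev_cdf` and the cutoff used to detect $\xi \approx 0$ are illustrative assumptions.

```python
import math


def gev_cdf(x: float, mu: float, sigma: float, xi: float) -> float:
    """GEV(mu, sigma, xi) distribution function, with xi = 0 as the Gumbel limit."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0:
        # Outside the support: below the lower endpoint when xi > 0 (CDF is 0),
        # above the upper endpoint when xi < 0 (CDF is 1).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Upper-tail probability P(X > 4) for the three tail types, same mu and sigma.
for xi in (0.3, 0.0, -0.3):
    print(xi, 1.0 - gev_cdf(4.0, mu=0.0, sigma=1.0, xi=xi))
```

The printed tail probabilities decrease as $\xi$ moves from positive to negative, and the last one is exactly zero because 4 lies beyond the upper endpoint $\mu - \sigma/\xi$ of the bounded case.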


Max-Stability and GEV

Definition: A distribution $G$ is said to be max-stable if

\[ G^k(a_k x + b_k) = G(x), \quad k \in \mathbb{N} \]

for some constants $a_k > 0$ and $b_k$.

- Taking powers of a distribution function results only in a change of location and scale

- A distribution is max-stable ⇐⇒ it is a GEV distribution
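As a worked check of the definition, take the standard Gumbel distribution (the $\xi = 0$ case with $\mu = 0$, $\sigma = 1$, chosen here purely for illustration). With $a_k = 1$ and $b_k = \log k$:

\[ G(x) = \exp\{-e^{-x}\} \;\Rightarrow\; G^k(x + \log k) = \exp\{-k\, e^{-(x + \log k)}\} = \exp\{-e^{-x}\} = G(x) \]

so the maximum of $k$ independent Gumbel variables is again Gumbel, shifted by $\log k$.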


Quantiles and Return Levels

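- Quantiles of GEV

\[ G(x_p) = \exp\left\{-\left[1 + \xi\left(\frac{x_p - \mu}{\sigma}\right)\right]_+^{-1/\xi}\right\} = 1 - p \]

\[ \Rightarrow x_p = \mu - \frac{\sigma}{\xi}\left[1 - \{-\log(1 - p)\}^{-\xi}\right], \quad 0 < p < 1 \]

- In extreme value terminology, $x_p$ is the return level associated with the return period $1/p$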
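The return-level formula translates directly into code. The sketch below is an illustration under stated assumptions: the function name is made up, the parameter values are hypothetical rather than estimates from the Clemson data, and the $\xi = 0$ case uses the Gumbel limit $x_p = \mu - \sigma \log\{-\log(1 - p)\}$.

```python
import math


def gev_return_level(p: float, mu: float, sigma: float, xi: float) -> float:
    """Level x_p exceeded with probability p in one block (return period 1/p blocks)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu - sigma / xi * (1.0 - y ** (-xi))


# Hypothetical GEV parameters for annual maxima (illustrative values only).
mu, sigma, xi = 3.0, 1.0, 0.1
for period in (10, 50, 100):
    print(period, gev_return_level(1.0 / period, mu, sigma, xi))
```

With annual blocks, `p = 1/50` gives the 50-year return level.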


Clemson Daily Precipitation [Data Source: USHCN]

[Figure: "Daily Precip in Clemson" time series; x-axis: Year (1950–2010), y-axis: Precipitation (in), 0–7]


Block Maxima Method (Gumbel 1958)

1. Determine the block size and extract the block maxima

[Figure: "Daily Precip in Clemson" time series; x-axis: Year (1950–2010), y-axis: Precipitation (in), 0–7; points mark the extracted block maxima]
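Step 1 can be sketched as follows, assuming the block is a calendar year and the data arrive as (year, value) pairs. The function name and the tiny data set are made up for illustration and are not the Clemson record.

```python
def block_maxima(records):
    """Map each block label to the largest value observed in that block.

    `records` is an iterable of (block_label, value) pairs, for example
    (year, daily precipitation in inches).
    """
    maxima = {}
    for block, value in records:
        if block not in maxima or value > maxima[block]:
            maxima[block] = value
    return dict(sorted(maxima.items()))


# Made-up daily values, a few per year, just to show the mechanics.
daily = [(1950, 0.2), (1950, 3.1), (1950, 0.0),
         (1951, 1.4), (1951, 2.2),
         (1952, 4.5)]
annual_max = block_maxima(daily)
print(annual_max)
```

Each block contributes exactly one value to the fit, so a longer block means fewer but more nearly GEV-distributed maxima.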


2. Fit the GEV to the maxima and assess the fit

[Figure: "Daily Precip in Clemson" time series with the block maxima marked, next to a Density panel for the maxima]

[Figure: "Quantile Plot"; x-axis: Model, y-axis: Empirical]
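The slides do not say how the GEV is fitted; maximum likelihood is a common choice, and the sketch below assumes it. It gives the negative log-likelihood that an optimizer of your choice would minimize, plus the model and empirical quantile pairs behind a quantile plot. The function names, the plotting positions $i/(n+1)$, and the sample values are illustrative assumptions.

```python
import math


def gev_nll(params, maxima):
    """Negative GEV log-likelihood; math.inf outside the parameter space."""
    mu, sigma, xi = params
    if sigma <= 0:
        return math.inf
    total = 0.0
    for x in maxima:
        z = (x - mu) / sigma
        if abs(xi) < 1e-8:  # Gumbel limit
            total += math.log(sigma) + z + math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0:  # observation outside the support
                return math.inf
            total += math.log(sigma) + (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return total


def qq_pairs(params, maxima):
    """(model quantile, empirical quantile) pairs for a quantile plot."""
    mu, sigma, xi = params
    ordered = sorted(maxima)
    n = len(ordered)
    pairs = []
    for i, x in enumerate(ordered, start=1):
        y = -math.log(i / (n + 1))  # plotting position i / (n + 1)
        if abs(xi) < 1e-8:
            model = mu - sigma * math.log(y)
        else:
            model = mu + sigma / xi * (y ** (-xi) - 1.0)
        pairs.append((model, x))
    return pairs


sample = [2.1, 3.4, 2.8, 4.0, 3.1]  # made-up block maxima
print(gev_nll((3.0, 1.0, 0.1), sample))
print(qq_pairs((3.0, 1.0, 0.1), sample))
```

If the fitted model is adequate, the pairs fall close to the 45-degree line.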


3. Perform inference for return levels, probabilities, etc.

[Figure: "95% CI for 50-yr RL"; x-axis: Annmx Precip (in), y-axis: Density; legend: Delta CI, Prof CI]
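One probability that is often derived from a return level is the chance of seeing it exceeded at least once over a planning horizon. The sketch below assumes independent, identically distributed years, which is an added assumption and not something the slides state; the function name is illustrative.

```python
def prob_at_least_one_exceedance(return_period_years: float, horizon_years: int) -> float:
    """Chance that the T-year level is exceeded at least once in m years.

    Assumes independent years, each with exceedance probability 1 / T.
    """
    p = 1.0 / return_period_years
    return 1.0 - (1.0 - p) ** horizon_years


# A 50-year event is more likely than not to occur within a 50-year horizon.
print(prob_at_least_one_exceedance(50, 50))
```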


Outline

Motivation

Extreme Value Theorem & Block Maxima Method

Peaks–Over–Threshold (POT) Method


Recall the Block Maxima Method

[Figure: "Daily Precip in Clemson" time series with the block maxima marked, next to a Density panel for the maxima]

Question: Can we use data more efficiently?


Peaks–over–threshold (POT) method [Davison & Smith 1990]

1. Select a “sufficiently large” threshold u, extract the exceedances

[Figure: "Daily Precip in Clemson" time series; x-axis: Year (1950–2010), y-axis: Precipitation (in), 0–7; next to a Density panel]
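Step 1 can be sketched as below, assuming a threshold has already been chosen. The function name, the threshold of 2.0 inches, and the data are made up for illustration.

```python
def threshold_excesses(values, u):
    """Return the excesses x - u for every observation strictly above u."""
    return [x - u for x in values if x > u]


daily = [0.0, 0.4, 2.6, 1.1, 3.9, 0.2, 2.0]  # made-up daily precipitation (in)
u = 2.0                                       # hypothetical threshold
excesses = threshold_excesses(daily, u)
exceedance_rate = len(excesses) / len(daily)  # empirical estimate of P(X > u)
print(excesses, exceedance_rate)
```

Unlike block maxima, every observation above `u` is kept, which is how the method uses the data more efficiently.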


2. Fit an appropriate model to exceedances

[Figure: "Daily Precip in Clemson" time series; x-axis: Year (1950–2010), y-axis: Precipitation (in), 0–7; next to a Density panel]


GPD for Exceedances

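If $M_n = \max_{i=1,\dots,n} X_i$ (for a large $n$) can be approximated by a GEV($\mu, \sigma, \xi$), with $b_n$ and $a_n$ below playing the roles of the location $\mu$ and scale $\sigma$, then for sufficiently large $u$,

\[ P(X_i > x + u \mid X_i > u) = \frac{nP(X_i > x + u)}{nP(X_i > u)} \approx \left(\frac{1 + \xi\,\frac{x + u - b_n}{a_n}}{1 + \xi\,\frac{u - b_n}{a_n}}\right)^{-1/\xi} = \left(1 + \frac{\xi x}{a_n + \xi(u - b_n)}\right)^{-1/\xi} \]

⇒ Survival function of the generalized Pareto distribution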


Pickands–Balkema–de Haan Theorem (1974, 1975)

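If $M_n = \max_{1 \le i \le n}\{X_i\} \approx$ GEV($\mu, \sigma, \xi$), then, for a “large” $u$ (i.e., $u \to x_F = \sup\{x : F(x) < 1\}$),

\[ P(X > u) \approx \frac{1}{n}\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi} \]

$F_u(y) = P(X - u < y \mid X > u)$ is well approximated by the generalized Pareto distribution (GPD). That is:

\[ F_u(y) \to H_{\tilde\sigma,\xi}(y), \quad u \to x_F \]

where

\[ H_{\tilde\sigma,\xi}(y) = \begin{cases} 1 - (1 + \xi y/\tilde\sigma)^{-1/\xi} & \xi \ne 0; \\ 1 - \exp(-y/\tilde\sigma) & \xi = 0. \end{cases} \]

and $\tilde\sigma = \sigma + \xi(u - \mu)$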
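The GPD distribution function and its scale can be written out as a short sketch. The scale is taken to be the GEV scale adjusted for the threshold, $\sigma + \xi(u - \mu)$, as on the slide; the function names and the numerical values are illustrative assumptions.

```python
import math


def gpd_cdf(y: float, sigma_u: float, xi: float) -> float:
    """GPD distribution function for an excess y over the threshold."""
    if sigma_u <= 0:
        raise ValueError("sigma_u must be positive")
    if y <= 0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma_u)  # exponential limit
    t = 1.0 + xi * y / sigma_u
    if t <= 0:
        # Only possible when xi < 0: y is beyond the upper endpoint -sigma_u / xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gpd_scale(sigma: float, xi: float, u: float, mu: float) -> float:
    """Threshold-dependent GPD scale implied by GEV(mu, sigma, xi)."""
    return sigma + xi * (u - mu)


# Hypothetical GEV parameters and threshold (illustrative values only).
mu, sigma, xi, u = 3.0, 1.0, 0.1, 5.0
sigma_u = gpd_scale(sigma, xi, u, mu)
print(sigma_u, 1.0 - gpd_cdf(1.0, sigma_u, xi))
```

The shape parameter $\xi$ is shared with the GEV; only the scale depends on the threshold.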


How to Choose the Threshold?

Bias–variance tradeoff:

- Threshold too low ⇒ bias, because the asymptotic model is not yet valid

- Threshold too high ⇒ large variance, because few data points remain

[Figure: "Mean Residual Life"; x-axis: Threshold (in), y-axis: Mean Excess]

Task: Choose a $u_0$ such that the Mean Residual Life curve is approximately linear for all $u > u_0$
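The mean residual life curve is the average excess over each candidate threshold. Linearity is the target because, when excesses over $u_0$ follow a GPD with $\xi < 1$, the mean excess over any $u > u_0$ is $(\tilde\sigma_{u_0} + \xi(u - u_0))/(1 - \xi)$, which is linear in $u$. A minimal sketch, with made-up data and illustrative function names:

```python
def mean_excess(values, u):
    """Average of x - u over observations above u, or None if there are none."""
    excesses = [x - u for x in values if x > u]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


def mean_residual_life(values, thresholds):
    """(threshold, mean excess) pairs, i.e. the points of the MRL curve."""
    return [(u, mean_excess(values, u)) for u in thresholds]


data = [1, 2, 3, 4, 5]  # made-up observations
print(mean_residual_life(data, [0, 2, 4, 5]))
```

In practice the curve becomes noisy at high thresholds because few exceedances remain, which is the variance side of the tradeoff.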


2. Fit an appropriate model to exceedances and assess the fit

[Figure: "Quantile Plot"; x-axis: Model, y-axis: Empirical]
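As with the GEV, the slides do not specify the fitting method; the sketch below assumes maximum likelihood and gives the GPD negative log-likelihood to be minimized over the scale and shape. The function name and the excess values are made up. In the exponential case ($\xi = 0$) the minimizing scale is simply the mean excess, which gives a quick sanity check.

```python
import math


def gpd_nll(sigma_u: float, xi: float, excesses) -> float:
    """Negative GPD log-likelihood for threshold excesses; math.inf if invalid."""
    if sigma_u <= 0:
        return math.inf
    total = 0.0
    for y in excesses:
        if abs(xi) < 1e-8:  # exponential limit
            total += math.log(sigma_u) + y / sigma_u
        else:
            t = 1.0 + xi * y / sigma_u
            if t <= 0:  # excess beyond the upper endpoint
                return math.inf
            total += math.log(sigma_u) + (1.0 + 1.0 / xi) * math.log(t)
    return total


excess_sample = [0.3, 1.2, 0.7, 2.5, 0.1]  # made-up excesses over a threshold
mean_ex = sum(excess_sample) / len(excess_sample)
print(gpd_nll(mean_ex, 0.0, excess_sample), gpd_nll(0.5, 0.0, excess_sample))
```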


3. Perform inference for return levels, probabilities, etc.

[Figure: "95% CI for 50-yr RL"; x-axis: Threshold excess (in), y-axis: Density; legend: Delta CI, Prof CI]
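For threshold models, the N-year return level combines the GPD fit with the exceedance rate: $x_N = u + \frac{\tilde\sigma}{\xi}\left[(N n_y \zeta_u)^{\xi} - 1\right]$, where $\zeta_u = P(X > u)$ and $n_y$ is the number of observations per year. This is the standard form given in Coles (2001), not a formula shown on the slides, and every numerical value below is hypothetical.

```python
import math


def pot_return_level(n_years, u, sigma_u, xi, zeta_u, n_per_year=365.25):
    """N-year return level from a GPD fit above threshold u.

    zeta_u is the probability that a single observation exceeds u.
    """
    m = n_years * n_per_year  # number of observations in N years
    if abs(xi) < 1e-12:
        return u + sigma_u * math.log(m * zeta_u)  # exponential limit
    return u + sigma_u / xi * ((m * zeta_u) ** xi - 1.0)


# Hypothetical threshold, GPD parameters, and exceedance rate.
level_50 = pot_return_level(50, u=2.0, sigma_u=0.6, xi=0.1, zeta_u=0.02)
print(level_50)
```

The result is meaningful only when `m * zeta_u` exceeds 1, that is, when the threshold is expected to be exceeded at least once in N years.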


Summary & Discussion

- Extreme value theory provides a framework to model extreme values

- GEV for fitting block maxima

- GPD for fitting threshold exceedances

- Return level for communicating risk

- Practical issues: seasonality, temporal dependence, non-stationarity, ...


For Further Reading

S. Coles. An Introduction to Statistical Modeling of Extreme Values. Springer, 2001.

J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels. Statistics of Extremes: Theory and Applications. Wiley, 2004.

L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer, 2006.

S. I. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, 2007.