An Introduction to Extreme Value Analysis

Whitney Huang

Clemson EVA Group, January 29, 2020

mailto:wkhuang@clemson

Outline

Motivation

Extreme Value Theorem & Block Maxima Method

Peaks–Over–Threshold (POT) Method

Extreme Rainfall During Hurricane Harvey

- “A storm forces Houston, the limitless city, to consider its limits” – The New York Times (8.31.17)

Environmental Extremes: Heatwaves, Storm Surges, etc.

- Heat wave: The 2003 European heat wave produced the hottest summer on record in Europe since 1540 and resulted in at least 30,000 deaths

- Storm surge: Hurricane Katrina produced the highest storm surge ever recorded (27.8 feet) on the U.S. coast

Scientific Questions

- How can we estimate the magnitude of extreme events (e.g., the 100-year rainfall)?

- How do extremes vary in space?

- How might extremes change under future climate conditions?

Usual vs Extremes

Probability Framework

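Let $X_1, \dots, X_n \overset{\text{iid}}{\sim} F$ and define $M_n = \max\{X_1, \dots, X_n\}$.

Then the distribution function of $M_n$ is

$$P(M_n \le x) = P(X_1 \le x, \dots, X_n \le x) = P(X_1 \le x) \times \cdots \times P(X_n \le x) = F^n(x)$$

Remark

$$F^n(x) \xrightarrow{n \to \infty} \begin{cases} 0 & \text{if } F(x) < 1 \\ 1 & \text{if } F(x) = 1 \end{cases}$$

⇒ the limiting distribution is degenerate.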
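A quick numerical check of the remark, as a minimal sketch. It assumes F is the standard exponential distribution, an illustrative choice that is not taken from the slides.

```python
import math

def exp_cdf(x):
    """CDF of the standard exponential distribution (the assumed F)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def max_cdf(x, n):
    """P(M_n <= x) = F(x)^n for n iid draws from F."""
    return exp_cdf(x) ** n

# For a fixed x with F(x) < 1, F(x)^n shrinks toward 0 as n grows.
for n in (1, 10, 100, 1000, 10000):
    print(n, max_cdf(5.0, n))
```

Since F(x) < 1 for every finite x here, the same collapse happens at any fixed x, only later for larger x.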

Asymptotic: Classical Limit Laws

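Recall the Central Limit Theorem: for $S_n = X_1 + \cdots + X_n$, where the $X_i$ have mean $\mu$ and variance $\sigma^2$,

$$\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{d} N(0, 1)$$

⇒ rescaling is the key to obtaining a non-degenerate distribution

Question: Can we get the limiting distribution of

$$\frac{M_n - b_n}{a_n}$$

for suitable sequences $\{a_n\} > 0$ and $\{b_n\}$?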
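One case where the answer can be computed exactly, as a hedged sketch: for standard exponential data, the choice a_n = 1 and b_n = log n (an assumption made for this example, not stated on the slides) gives P(M_n - log n ≤ x) = (1 - e^{-x}/n)^n, which approaches the Gumbel distribution function exp(-e^{-x}).

```python
import math

def rescaled_max_cdf(x, n):
    """Exact P((M_n - b_n)/a_n <= x) for standard exponential data,
    using the assumed normalizing constants a_n = 1 and b_n = log(n)."""
    y = x + math.log(n)  # undo the centering
    if y <= 0:
        return 0.0
    return (1.0 - math.exp(-y)) ** n

def gumbel_cdf(x):
    """Standard Gumbel distribution function."""
    return math.exp(-math.exp(-x))

def max_gap(n):
    """Largest gap between the exact CDF and the Gumbel limit on a grid over [-3, 6]."""
    grid = [i / 10 for i in range(-30, 61)]
    return max(abs(rescaled_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)

for n in (10, 100, 10000):
    print(n, max_gap(n))
```

The gap shrinks as n grows, in contrast to the degenerate limit obtained without rescaling.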

Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)

Define $M_n = \max\{X_1, \dots, X_n\}$ where $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F$. If there exist $a_n > 0$ and $b_n \in \mathbb{R}$ such that, as $n \to \infty$,

$$P\left(\frac{M_n - b_n}{a_n} \le x\right) \xrightarrow{d} G(x)$$

for a non-degenerate distribution function $G$, then $G$ must be of the same type as

$$G(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]_+^{-1/\xi}\right\}$$

where $x_+ = \max(x, 0)$ and $G(x)$ is the distribution function of the generalized extreme value distribution, GEV($\mu$, $\sigma$, $\xi$).

- $\mu$ and $\sigma$ are location and scale parameters
- $\xi$ is a shape parameter determining the rate of tail decay, with
  - $\xi > 0$ giving the heavy-tailed case (Fréchet)
  - $\xi = 0$ giving the light-tailed case (Gumbel), read as the limit $\xi \to 0$
  - $\xi < 0$ giving the bounded-tailed case (reversed Weibull)
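A minimal sketch of the GEV distribution function, written directly from the formula above. The function name `gev_cdf` and the example parameter values are illustrative assumptions. The ξ = 0 branch uses the Gumbel limit exp(-exp(-(x - μ)/σ)).

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function G(x; mu, sigma, xi), with sigma > 0."""
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel case: the xi -> 0 limit of the general formula
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Same location and scale, three tail behaviours
for xi in (0.2, 0.0, -0.2):
    print(xi, 1.0 - gev_cdf(3.0, mu=0.0, sigma=1.0, xi=xi))
```

The printed values are exceedance probabilities P(X > 3), which makes the effect of the shape parameter on the upper tail directly comparable.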

Max-Stability and GEV

Definition: A distribution $G$ is said to be max-stable if

$$G^k(a_k x + b_k) = G(x), \quad k \in \mathbb{N}$$

for some constants $a_k > 0$ and $b_k$

- Taking powers of a distribution function results only in a change of location and scale

- A distribution is max-stable ⟺ it is a GEV distribution
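Max-stability can be checked numerically for a GEV with ξ ≠ 0. In this sketch the constants a_k = k^ξ and b_k = (k^ξ - 1)(σ/ξ - μ) are worked out from the GEV formula rather than quoted from the slides, and the parameter values are arbitrary.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function for xi != 0."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def max_stable_constants(k, mu, sigma, xi):
    """Constants (a_k, b_k) such that G^k(a_k x + b_k) = G(x) for xi != 0."""
    a_k = k ** xi
    b_k = (k ** xi - 1.0) * (sigma / xi - mu)
    return a_k, b_k

mu, sigma, xi = 1.0, 2.0, 0.3  # illustrative parameter values
for k in (2, 5, 50):
    a_k, b_k = max_stable_constants(k, mu, sigma, xi)
    for x in (-1.0, 0.5, 4.0):
        lhs = gev_cdf(a_k * x + b_k, mu, sigma, xi) ** k
        rhs = gev_cdf(x, mu, sigma, xi)
        print(k, x, lhs, rhs)
```

The two printed columns agree up to floating-point error: raising G to the power k only relocates and rescales it.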

Quantiles and Return Levels

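- Quantiles of GEV

$$G(x_p) = \exp\left\{-\left[1 + \xi\left(\frac{x_p - \mu}{\sigma}\right)\right]_+^{-1/\xi}\right\} = 1 - p$$

$$\Rightarrow\quad x_p = \mu - \frac{\sigma}{\xi}\left[1 - \{-\log(1 - p)\}^{-\xi}\right], \qquad 0 < p < 1$$

- In extreme value terminology, $x_p$ is the return level associated with the return period $1/p$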
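The return-level formula translates directly into code. This is a small sketch with made-up parameter values, not estimates from the Clemson data. The ξ = 0 branch uses the Gumbel limit x_p = μ - σ log{-log(1 - p)}.

```python
import math

def gev_return_level(p, mu, sigma, xi):
    """Return level x_p with G(x_p) = 1 - p, i.e. a return period of 1/p blocks."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical GEV parameters for annual maxima, in inches
mu, sigma, xi = 3.0, 1.0, 0.1
for period in (10, 50, 100):
    print(period, gev_return_level(1.0 / period, mu, sigma, xi))
```

With annual blocks, p = 1/50 gives the 50-year return level: the value exceeded by the annual maximum with probability 0.02 in any given year.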

Clemson Daily Precipitation [Data Source: USHCN]

[Figure: Daily Precip in Clemson. x-axis: Year (1950 to 2010); y-axis: Precipitation (in), 0 to 7]

Block Maxima Method (Gumbel 1958)

1. Determine the block size and extract the block maxima

[Figure: daily precipitation series with the extracted block maxima marked as points. x-axis: Year (1950 to 2010); y-axis: Precipitation (in), 0 to 7]


2. Fit the GEV to the maxima and assess the fit

[Figure: block maxima (points) by Year, 1950 to 2010; y-axis: Precipitation (in), 0 to 7; side panel: Density]

[Figure: Quantile Plot. x-axis: Model; y-axis: Empirical]


3. Perform inference for return levels, probabilities, etc.

[Figure: 95% CI for 50-yr RL. x-axis: Annmx Precip (in); y-axis: Density; legend: Delta CI, Prof CI]
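The three steps can be sketched end to end on synthetic data. Two simplifications are assumptions of this example, not the method used in the talk: the data are simulated rather than the Clemson record, and the fit is a method-of-moments Gumbel fit (ξ fixed at 0) standing in for a full GEV maximum likelihood fit. All function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def block_maxima(values, block_size):
    """Split the series into consecutive complete blocks; return each block's maximum."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]

def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit (xi = 0); a stand-in for a GEV likelihood fit."""
    sigma = statistics.stdev(maxima) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * sigma
    return mu, sigma

def gumbel_return_level(period, mu, sigma):
    """Level exceeded by a block maximum with probability 1/period."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / period))

# Synthetic stand-in for 61 years of daily values (NOT the Clemson record)
rng = random.Random(0)
daily = [rng.expovariate(2.0) for _ in range(61 * 365)]

annual_max = block_maxima(daily, 365)                 # step 1: one maximum per block
mu_hat, sigma_hat = fit_gumbel_moments(annual_max)    # step 2: fit (simplified)
rl_50 = gumbel_return_level(50, mu_hat, sigma_hat)    # step 3: 50-year return level
print(len(annual_max), mu_hat, sigma_hat, rl_50)
```

The sketch does not produce the confidence intervals shown in the figure; in practice a likelihood-based GEV fit from a statistics package would replace `fit_gumbel_moments`.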

Recall the Block Maxima Method

[Figure: block maxima (points) by Year, 1950 to 2010; y-axis: Precipitation (in), 0 to 7; side panel: Density]

Question: Can we use data more efficiently?

Peaks–over–threshold (POT) method [Davison & Smith 1990]

1. Select a “sufficiently large” threshold u, extract the exceedances

[Figure: daily precipitation by Year, 1950 to 2010; y-axis: Precipitation (in), 0 to 7; side panel: Density]


2. Fit an appropriate model to exceedances

[Figure: daily precipitation by Year, 1950 to 2010; y-axis: Precipitation (in), 0 to 7; side panel: Density]

GPD for Exceedances

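If $M_n = \max_{i=1,\dots,n} X_i$ (for a large $n$) can be approximated by a GEV($\mu$, $\sigma$, $\xi$), then for sufficiently large $u$,

$$P(X_i > x + u \mid X_i > u) = \frac{n P(X_i > x + u)}{n P(X_i > u)} \to \left(\frac{1 + \xi\,\frac{x + u - b_n}{a_n}}{1 + \xi\,\frac{u - b_n}{a_n}}\right)^{-1/\xi} = \left(1 + \frac{\xi x}{a_n + \xi(u - b_n)}\right)^{-1/\xi}$$

⇒ Survival function of the generalized Pareto distribution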
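The limit relies on a tail approximation for $n P(X_i > u)$. A sketch of that step, with the same normalizing constants $a_n$ and $b_n$, and assuming the GEV approximation for $M_n$ holds at level $u$:

```latex
\begin{align*}
F^n(u) &\approx \exp\left\{-\left[1 + \xi\,\frac{u - b_n}{a_n}\right]^{-1/\xi}\right\} \\
\Rightarrow\quad n \log F(u) &\approx -\left[1 + \xi\,\frac{u - b_n}{a_n}\right]^{-1/\xi} \\
\Rightarrow\quad n\,P(X_i > u) &\approx \left[1 + \xi\,\frac{u - b_n}{a_n}\right]^{-1/\xi},
\qquad \text{since } \log F(u) \approx -\{1 - F(u)\} \text{ for large } u.
\end{align*}
```

Applying the same approximation at $x + u$ and taking the ratio gives the conditional survival probability.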

Pickands–Balkema–de Haan Theorem (1974, 1975)

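If $M_n = \max_{1 \le i \le n}\{X_i\} \approx$ GEV($\mu$, $\sigma$, $\xi$), then, for a “large” $u$ (i.e., $u \to x_F = \sup\{x : F(x) < 1\}$),

$$P(X > u) \approx \frac{1}{n}\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi}$$

$F_u(y) = P(X - u < y \mid X > u)$ is well approximated by the generalized Pareto distribution (GPD). That is:

$$F_u(y) \xrightarrow{d} H_{\tilde{\sigma},\xi}(y), \qquad u \to x_F$$

where

$$H_{\tilde{\sigma},\xi}(y) = \begin{cases} 1 - (1 + \xi y/\tilde{\sigma})^{-1/\xi} & \xi \ne 0; \\ 1 - \exp(-y/\tilde{\sigma}) & \xi = 0. \end{cases}$$

and $\tilde{\sigma} = \sigma + \xi(u - \mu)$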
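The link between the GEV parameters and the threshold-dependent GPD scale σ + ξ(u - μ) can be verified numerically. In this sketch the parameter values, the threshold, and the function names are illustrative assumptions. It checks that the ratio of the approximate tail probabilities equals the GPD survival function.

```python
import math

def gpd_cdf(y, sigma_u, xi):
    """GPD distribution function H(y) for an excess y over the threshold."""
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma_u)
    t = 1.0 + xi * y / sigma_u
    if t <= 0.0:  # beyond the upper endpoint, only possible when xi < 0
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def tail_prob(x, mu, sigma, xi, n):
    """Approximation P(X > x) ~ (1/n) [1 + xi (x - mu) / sigma]^(-1/xi), for xi != 0."""
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi) / n

mu, sigma, xi, n = 3.0, 1.0, 0.2, 365  # illustrative values
u = 4.0
sigma_u = sigma + xi * (u - mu)        # threshold-dependent GPD scale
for y in (0.5, 1.0, 2.0):
    conditional = tail_prob(u + y, mu, sigma, xi, n) / tail_prob(u, mu, sigma, xi, n)
    print(y, conditional, 1.0 - gpd_cdf(y, sigma_u, xi))
```

The block size n cancels in the ratio, which is why the GPD has only two parameters, and the shape ξ is shared with the GEV.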

How to Choose the Threshold?

Bias–variance tradeoff:

- Threshold too low ⇒ bias, because the model asymptotics are invalid

- Threshold too high ⇒ large variance, due to few data points

[Figure: Mean Residual Life plot. x-axis: Threshold (in), 0 to 5; y-axis: Mean Excess]

Task: Choose a threshold $u_0$ such that the Mean Residual Life curve behaves linearly for all $u > u_0$
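The points of a mean residual life plot are sample mean excesses over a grid of thresholds. The sketch below is a minimal version with synthetic exponential data (not the Clemson data) and hypothetical function names. The linearity criterion comes from the GPD: if excesses over u_0 are GPD with ξ < 1, the mean excess over any u > u_0 is linear in u, and for the exponential case (ξ = 0) it is constant.

```python
import random

def mean_excess(data, u):
    """Average of (x - u) over observations x > u; None if nothing exceeds u."""
    excesses = [x - u for x in data if x > u]
    return sum(excesses) / len(excesses) if excesses else None

def mean_residual_life(data, thresholds):
    """Pairs (u, mean excess above u): the points of a mean residual life plot."""
    return [(u, mean_excess(data, u)) for u in thresholds]

# Synthetic standard exponential sample: the mean excess should stay near 1
rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(20000)]
for u, e in mean_residual_life(sample, [0.0, 0.5, 1.0, 1.5, 2.0]):
    print(u, e)
```

With real data the estimates become noisy at high thresholds because few observations remain, which is the variance side of the tradeoff.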


2. Fit an appropriate model to exceedances and assess the fit

[Figure: Quantile Plot. x-axis: Model; y-axis: Empirical]


3. Perform inference for return levels, probabilities, etc.

[Figure: 95% CI for 50-yr RL. x-axis: Threshold excess (in); y-axis: Density; legend: Delta CI, Prof CI]
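A compact sketch of the POT steps on synthetic data. Several pieces are assumptions of this example rather than content from the slides: the data are simulated, the threshold u = 2.0 is picked by hand, the shape is fixed at ξ = 0 so the GPD reduces to an exponential whose scale MLE is the mean excess, and the return level uses the standard ξ = 0 expression u + σ log(m ζ_u), where ζ_u = P(X > u) and m is the number of observations in the return period.

```python
import math
import random

def exceedances(data, u):
    """Excesses x - u for every observation above the threshold u."""
    return [x - u for x in data if x > u]

def fit_exponential_excess(excesses):
    """MLE of the GPD scale when xi is fixed at 0: the mean excess."""
    return sum(excesses) / len(excesses)

def pot_return_level(years, u, sigma_u, rate_u, obs_per_year):
    """Level exceeded on average once every `years` years under the xi = 0 model."""
    m = years * obs_per_year
    return u + sigma_u * math.log(m * rate_u)

# Synthetic stand-in for 61 years of daily values (NOT the Clemson record)
rng = random.Random(2)
daily = [rng.expovariate(2.0) for _ in range(61 * 365)]

u = 2.0                                   # step 1: threshold and exceedances
exc = exceedances(daily, u)
sigma_u = fit_exponential_excess(exc)     # step 2: fit (simplified, xi = 0)
rate_u = len(exc) / len(daily)            # empirical estimate of P(X > u)
rl_50 = pot_return_level(50, u, sigma_u, rate_u, 365)  # step 3
print(len(exc), sigma_u, rl_50)
```

Compared with one maximum per year, the fit here uses every exceedance of u, which is the efficiency gain that motivates the POT method.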

Summary & Discussion

- Extreme value theory provides a framework to model extreme values

- GEV for fitting block maxima

- GPD for fitting threshold exceedances

- Return level for communicating risk

- Practical issues: seasonality, temporal dependence, non-stationarity, ...

For Further Reading

S. Coles. An Introduction to Statistical Modeling of Extreme Values. Springer, 2001.

J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels. Statistics of Extremes: Theory and Applications. Wiley, 2004.

L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer, 2006.

S. I. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, 2007.
