an introduction to extreme value analysis
TRANSCRIPT
An Introduction to Extreme Value Analysis
Whitney Huang
Clemson EVA Group, January 29, 2020
Outline
Motivation
Extreme Value Theorem & Block Maxima Method
Peaks–Over–Threshold (POT) Method
Extreme Rainfall During Hurricane Harvey
I “A storm forces Houston, the limitless city, to consider itslimits” – The New York Times (8.31.17)
Environmental Extremes: Heatwaves, Storm Surges, etc.
I Heat wave: The 2003 European heat wave led to the hottestsummer on record in Europe since 1540 that resulted in atleast 30,000 deaths
I Storm Surge: Hurricane Katrina produced the highest stormsurge ever recorded (27.8 feet) on the U.S. coast
Scientific Questions
I How to estimate the magnitude of extreme events (e.g.100-year rainfall)?
I How extremes vary in space?
I How extremes may change in future climate conditions?
Outline
Motivation
Extreme Value Theorem & Block Maxima Method
Peaks–Over–Threshold (POT) Method
Usual vs Extremes
Probability Framework
Let X1, · · · , Xniid∼ F and define Mn = max{X1, · · · , Xn}
Then the distribution function of Mn is
P(Mn ≤ x) = P(X1 ≤ x, · · · , Xn ≤ x)= P(X1 ≤ x)× · · · × P(Xn ≤ x) = Fn(x)
Remark
Fn(x)n→∞===
{0 if F (x) < 11 if F (x) = 1
⇒ the limiting distribution is degenerate.
Asymptotic: Classical Limit Laws
Recall the Central Limit Theorem:
Sn − nµ√nσ
d→ N(0, 1)
⇒ rescaling is the key to obtain a non-degenerate distribution
Question: Can we get the limiting distribution of
Mn − bnan
for suitable sequence {an} > 0 and {bn}?
Asymptotic: Classical Limit Laws
Recall the Central Limit Theorem:
Sn − nµ√nσ
d→ N(0, 1)
⇒ rescaling is the key to obtain a non-degenerate distribution
Question: Can we get the limiting distribution of
Mn − bnan
for suitable sequence {an} > 0 and {bn}?
Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)
Define Mn = max{X1, · · · , Xn} where X1, · · · , Xni.i.d.∼ F . If
∃ an > 0 and bn ∈ R such that, as n→∞, if
P(Mn − bn
an≤ x
)d→ G(x)
then G must be the same type of the following form:
G(x;µ, σ, ξ) = exp
{−[1 + ξ(
x− µσ
)]−1ξ
+
}where x+ = max(x, 0) and G(x) is the distribution function of thegeneralized extreme value distribution (GEV(µ, σ, ξ))
I µ and σ are location and scale parametersI ξ is a shape parameter determining the rate of tail decay, with
I ξ > 0 giving the heavy-tailed case (Frechet)I ξ = 0 giving the light-tailed case (Gumbel)I ξ < 0 giving the bounded-tailed case (reversed Weibull)
Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)
Define Mn = max{X1, · · · , Xn} where X1, · · · , Xni.i.d.∼ F . If
∃ an > 0 and bn ∈ R such that, as n→∞, if
P(Mn − bn
an≤ x
)d→ G(x)
then G must be the same type of the following form:
G(x;µ, σ, ξ) = exp
{−[1 + ξ(
x− µσ
)]−1ξ
+
}where x+ = max(x, 0) and G(x) is the distribution function of thegeneralized extreme value distribution (GEV(µ, σ, ξ))
I µ and σ are location and scale parametersI ξ is a shape parameter determining the rate of tail decay, with
I ξ > 0 giving the heavy-tailed case (Frechet)I ξ = 0 giving the light-tailed case (Gumbel)I ξ < 0 giving the bounded-tailed case (reversed Weibull)
Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)
Define Mn = max{X1, · · · , Xn} where X1, · · · , Xni.i.d.∼ F . If
∃ an > 0 and bn ∈ R such that, as n→∞, if
P(Mn − bn
an≤ x
)d→ G(x)
then G must be the same type of the following form:
G(x;µ, σ, ξ) = exp
{−[1 + ξ(
x− µσ
)]−1ξ
+
}where x+ = max(x, 0) and G(x) is the distribution function of thegeneralized extreme value distribution (GEV(µ, σ, ξ))
I µ and σ are location and scale parametersI ξ is a shape parameter determining the rate of tail decay, with
I ξ > 0 giving the heavy-tailed case (Frechet)I ξ = 0 giving the light-tailed case (Gumbel)I ξ < 0 giving the bounded-tailed case (reversed Weibull)
Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)
Define Mn = max{X1, · · · , Xn} where X1, · · · , Xni.i.d.∼ F . If
∃ an > 0 and bn ∈ R such that, as n→∞, if
P(Mn − bn
an≤ x
)d→ G(x)
then G must be the same type of the following form:
G(x;µ, σ, ξ) = exp
{−[1 + ξ(
x− µσ
)]−1ξ
+
}where x+ = max(x, 0) and G(x) is the distribution function of thegeneralized extreme value distribution (GEV(µ, σ, ξ))
I µ and σ are location and scale parametersI ξ is a shape parameter determining the rate of tail decay, with
I ξ > 0 giving the heavy-tailed case (Frechet)I ξ = 0 giving the light-tailed case (Gumbel)I ξ < 0 giving the bounded-tailed case (reversed Weibull)
Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)
Define Mn = max{X1, · · · , Xn} where X1, · · · , Xni.i.d.∼ F . If
∃ an > 0 and bn ∈ R such that, as n→∞, if
P(Mn − bn
an≤ x
)d→ G(x)
then G must be the same type of the following form:
G(x;µ, σ, ξ) = exp
{−[1 + ξ(
x− µσ
)]−1ξ
+
}where x+ = max(x, 0) and G(x) is the distribution function of thegeneralized extreme value distribution (GEV(µ, σ, ξ))
I µ and σ are location and scale parametersI ξ is a shape parameter determining the rate of tail decay, with
I ξ > 0 giving the heavy-tailed case (Frechet)I ξ = 0 giving the light-tailed case (Gumbel)I ξ < 0 giving the bounded-tailed case (reversed Weibull)
Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)
Define Mn = max{X1, · · · , Xn} where X1, · · · , Xni.i.d.∼ F . If
∃ an > 0 and bn ∈ R such that, as n→∞, if
P(Mn − bn
an≤ x
)d→ G(x)
then G must be the same type of the following form:
G(x;µ, σ, ξ) = exp
{−[1 + ξ(
x− µσ
)]−1ξ
+
}where x+ = max(x, 0) and G(x) is the distribution function of thegeneralized extreme value distribution (GEV(µ, σ, ξ))
I µ and σ are location and scale parametersI ξ is a shape parameter determining the rate of tail decay, with
I ξ > 0 giving the heavy-tailed case (Frechet)I ξ = 0 giving the light-tailed case (Gumbel)I ξ < 0 giving the bounded-tailed case (reversed Weibull)
Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)
Define Mn = max{X1, · · · , Xn} where X1, · · · , Xni.i.d.∼ F . If
∃ an > 0 and bn ∈ R such that, as n→∞, if
P(Mn − bn
an≤ x
)d→ G(x)
then G must be the same type of the following form:
G(x;µ, σ, ξ) = exp
{−[1 + ξ(
x− µσ
)]−1ξ
+
}where x+ = max(x, 0) and G(x) is the distribution function of thegeneralized extreme value distribution (GEV(µ, σ, ξ))
I µ and σ are location and scale parametersI ξ is a shape parameter determining the rate of tail decay, with
I ξ > 0 giving the heavy-tailed case (Frechet)I ξ = 0 giving the light-tailed case (Gumbel)I ξ < 0 giving the bounded-tailed case (reversed Weibull)
Max-Stability and GEV
DefinitionA distribution G is said to be max-stable if
Gk(akx+ bk) = G(x), k ∈ N
for some constants ak > 0 and bk
I Taking powers of a distribution function results only in achange of location and scale
I A distribution is max-stable ⇐⇒ it is a GEV distribution
Quantiles and Return Levels
I Quantiles of GEV
G(xp) = exp
{−[1 + ξ(
xp − µσ
)
]−1ξ
+
}= 1− p
⇒ xp = µ− σ
ξ
[1− {− log(1− p)−ξ}] 0 < p < 1
I In the extreme value terminology, xp is the return levelassociated with the return period 1
p
Clemson Daily Precipitation [Data Source: USHCN]
1950 1960 1970 1980 1990 2000 2010
0
1
2
3
4
5
6
7
Daily Precip in Clemson
Year
Pre
cipi
tatio
n (in
)
Block Maxima Method (Gumbel 1958)
1. Determine the block size and extract the block maxima
1950 1960 1970 1980 1990 2000 2010
0
1
2
3
4
5
6
7
Daily Precip in Clemson
Year
Pre
cipi
tatio
n (in
)
Block Maxima Method (Gumbel 1958)
1. Determine the block size and extract the block maxima
1950 1960 1970 1980 1990 2000 2010
0
1
2
3
4
5
6
7
Daily Precip in Clemson
Year
Pre
cipi
tatio
n (in
)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
Block Maxima Method (Gumbel 1958)
2. Fit the GEV to the maximal and assess the fit
1950 1960 1970 1980 1990 2000 2010
Daily Precip in Clemson
Year
0
1
2
3
4
5
6
7
Pre
cipi
tatio
n (in
)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
Density
0.5
0.25 0
Block Maxima Method (Gumbel 1958)
2. Fit the GEV to the maximal and assess the fit
2 3 4 5 6
2
3
4
5
6
Quantile Plot
Model
Em
piric
al
Block Maxima Method (Gumbel 1958)
3. Perform inference for return levels, probabilities, etc
95% CI for 50−yr RL
Annmx Precip (in)
Den
sity
0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
0.6Delta CIProf CI
Outline
Motivation
Extreme Value Theorem & Block Maxima Method
Peaks–Over–Threshold (POT) Method
Recall the Block Maxima Method
1950 1960 1970 1980 1990 2000 2010
Daily Precip in Clemson
Year
0
1
2
3
4
5
6
7
Pre
cipi
tatio
n (in
)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
Density
0.5
0.25 0
Question: Can we use data more efficiently?
Peaks–over–threshold (POT) method [Davison & Smith 1990]
1. Select a “sufficiently large” threshold u, extract the exceedances
1950 1960 1970 1980 1990 2000 2010
Daily Precip in Clemson
Year
0
1
2
3
4
5
6
7
Pre
cipi
tatio
n (in
)
0
1
2
3
4
5
6
7
Density
1.2
0.8
0.4 0
Peaks–over–threshold (POT) method [Davison & Smith 1990]
2. Fit an appropriate model to exceedances
1950 1960 1970 1980 1990 2000 2010
Daily Precip in Clemson
Year
0
1
2
3
4
5
6
7
Pre
cipi
tatio
n (in
)
0
1
2
3
4
5
6
7
Density
1.2
0.8
0.4 0
GPD for Exceedances
If Mn = maxi=1,··· ,nXi (for a large n) can be apprximated by aGEV(µ, σ, ξ), then for sufficently large u,
P(Xi > x+ u|Xi > u) =nP(Xi > x+ u)
nP(Xi > u)
→
(1 + ξ x+u−bnan
1 + ξ u−bnan
)−1ξ
=
(1 +
ξx
an + ξ(u− bn)
)−1ξ
⇒ Survival function of generalized Pareto distribution
Pickands–Balkema–de Haan Theorem (1974, 1975)
If Mn = max1≤i≤n{Xi} ≈ GEV(µ, σ, ξ), then, for a “large” u (i.e.,u→ xF = sup{x : F (x) < 1}),
P(X > u) ≈ 1
n
[1 + ξ
(u− µσ
)]−1ξ
Fu = P(X − u < y|X > u) is well approximated by the generalizedPareto distribution (GPD). That is:
Fu(y)d→ Hσ,ξ(y) u→ xF
where
Hσ,ξ(y) =
1− (1 + ξy/σ)−1/ξ ξ 6= 0;
1− exp(−y/σ) ξ = 0.
and σ = σ + ξ(u− µ)
How to Choose the Threshold?
Bias–variance tradeoff:
I Threshold too low ⇒ bias because of the model asymptoticsbeing invalid
I Threshold too high ⇒ variance is large due to few data points
0 1 2 3 4 5
0.2
0.4
0.6
0.8
1.0
1.2
Mean Residual Life
Threshold (in)
Mea
n E
xces
s
Task: To choose a u0 s.t. the Mean Residual Life curve behaveslinearly ∀u > u0
Peaks–over–threshold (POT) method [Davison & Smith 1990]
2. Fit an appropriate model to exceedances and assess the fit
1 2 3 4 5 6
1
2
3
4
5
6
Quantile Plot
Model
Em
piric
al
Peaks–over–threshold (POT) method [Davison & Smith 1990]
3. Perform inference for return levels, probabilities, etc
95% CI for 50−yr RL
Threshold excess (in)
Den
sity
1 2 3 4 5 6 7
0.0
0.5
1.0
1.5
2.0
Delta CIProf CI
Summary & Discussion
I Extreme value theory provides a framework to model extremevalues
I GEV for fitting block maxima
I GPD for fitting threshold exceedances
I Return level for communicating risk
I Practical Issues: seasonality, temporal dependence,non-stationarity, ...
Summary & Discussion
I Extreme value theory provides a framework to model extremevalues
I GEV for fitting block maxima
I GPD for fitting threshold exceedances
I Return level for communicating risk
I Practical Issues: seasonality, temporal dependence,non-stationarity, ...
For Further Reading
S. ColesAn Introduction to Statistical Modeling of Extreme Values.Springer, 2001.
J. Beirlant, Y Goegebeur, J. Segers, and J TeugelsStatistics of Extremes: Theory and Applications.Wiley, 2004.
L. de Haan, and A. FerreiraExtreme Value Theory: An Introduction.Springer, 2006.
S. I. ResnickHeavy-Tail Phenomena: Probabilistic and Statistical Modeling.Springer, 2007.