An Introduction to Extreme Value Analysis

Whitney Huang

wkhuang@clemson.edu

Clemson EVA Group, January 29, 2020

Outline

Motivation

Extreme Value Theorem & Block Maxima Method

Peaks–Over–Threshold (POT) Method

Extreme Rainfall During Hurricane Harvey

- “A storm forces Houston, the limitless city, to consider its limits” (The New York Times, 8.31.17)

Environmental Extremes: Heatwaves, Storm Surges, etc.

- Heat wave: The 2003 European heat wave led to the hottest summer on record in Europe since 1540 and resulted in at least 30,000 deaths

- Storm surge: Hurricane Katrina produced the highest storm surge ever recorded (27.8 feet) on the U.S. coast

Scientific Questions

- How can we estimate the magnitude of extreme events (e.g., 100-year rainfall)?

- How do extremes vary in space?

- How might extremes change under future climate conditions?

Usual vs Extremes

Probability Framework

Let $X_1, \dots, X_n \overset{\text{iid}}{\sim} F$ and define $M_n = \max\{X_1, \dots, X_n\}$.

Then the distribution function of $M_n$ is

$$P(M_n \le x) = P(X_1 \le x, \dots, X_n \le x) = P(X_1 \le x) \times \cdots \times P(X_n \le x) = F^n(x)$$

Remark

$$F^n(x) \xrightarrow{n \to \infty} \begin{cases} 0 & \text{if } F(x) < 1 \\ 1 & \text{if } F(x) = 1 \end{cases}$$

⇒ the limiting distribution is degenerate.

Asymptotic: Classical Limit Laws

Recall the Central Limit Theorem:

$$\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{d} N(0, 1)$$

⇒ rescaling is the key to obtaining a non-degenerate distribution

Question: Can we get the limiting distribution of

$$\frac{M_n - b_n}{a_n}$$

for suitable sequences $\{a_n\} > 0$ and $\{b_n\}$?
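As a concrete illustration of why rescaling matters, the sketch below uses the standard exponential distribution, $F(x) = 1 - e^{-x}$. This choice of $F$ is an assumption made for the example and does not come from the slides. For this $F$, taking $a_n = 1$ and $b_n = \log n$ gives $P(M_n - b_n \le x) = (1 - e^{-x}/n)^n$, which tends to the Gumbel distribution function $\exp(-e^{-x})$, whereas $F^n(x)$ at a fixed $x$ collapses to 0.

```python
import math

def exp_cdf(x):
    """CDF of the standard exponential distribution (illustrative choice of F)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def max_cdf(x, n):
    """P(M_n <= x) = F(x)^n for n iid draws from F."""
    return exp_cdf(x) ** n

def gumbel_cdf(x):
    """Standard Gumbel distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

x = 1.0
for n in (10, 100, 10_000):
    unscaled = max_cdf(x, n)                # tends to 0: degenerate limit
    rescaled = max_cdf(x + math.log(n), n)  # P(M_n - log n <= x)
    print(n, unscaled, rescaled, gumbel_cdf(x))
```

The unscaled column shrinks toward 0 as n grows, while the rescaled column settles near the Gumbel value in the last column.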

Extremal Types Theorem (Fisher–Tippett 1928, Gnedenko 1943)

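Define $M_n = \max\{X_1, \dots, X_n\}$ where $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F$. If there exist $a_n > 0$ and $b_n \in \mathbb{R}$ such that, as $n \to \infty$,

$$P\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x)$$

for a non-degenerate distribution function $G$, then $G$ must be of the same type as

$$G(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]_+^{-1/\xi}\right\}$$

where $x_+ = \max(x, 0)$ and $G$ is the distribution function of the generalized extreme value distribution, GEV($\mu, \sigma, \xi$).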

- $\mu$ and $\sigma$ are location and scale parameters
- $\xi$ is a shape parameter determining the rate of tail decay, with
  - $\xi > 0$ giving the heavy-tailed case (Fréchet)
  - $\xi = 0$ giving the light-tailed case (Gumbel)
  - $\xi < 0$ giving the bounded-tailed case (reversed Weibull)
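To make the role of the shape parameter concrete, here is a minimal sketch of the GEV distribution function in plain Python. The function name `gev_cdf` and the parameter values in the loop are illustrative assumptions, not part of the slides. The $\xi = 0$ branch uses the Gumbel form $\exp(-e^{-z})$, which is the limit of the general formula as $\xi \to 0$.

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function G(x; mu, sigma, xi), with sigma > 0."""
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel case, the limit of the general formula as xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # at or above the upper endpoint when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Upper-tail probability P(M > 3) for three illustrative shape values
for xi in (0.2, 0.0, -0.2):
    print(xi, 1.0 - gev_cdf(3.0, mu=0.0, sigma=1.0, xi=xi))
```

With location and scale held fixed, the tail probability at the same level is largest for positive $\xi$ and smallest for negative $\xi$. Library implementations may use a different sign convention for the shape parameter, so it is worth checking their documentation before comparing values.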

Max-Stability and GEV

Definition: A distribution $G$ is said to be max-stable if

$$G^k(a_k x + b_k) = G(x), \quad k \in \mathbb{N}$$

for some constants $a_k > 0$ and $b_k$

- Taking powers of a distribution function results only in a change of location and scale

- A distribution is max-stable ⇐⇒ it is a GEV distribution
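The max-stability identity can be checked numerically for two standard members of the GEV family. In the sketch below, the constants are worked out by hand and are assumptions of this example: for the standard Gumbel, $a_k = 1$ and $b_k = \log k$; for the standard Fréchet $G(x) = \exp(-x^{-\alpha})$, $a_k = k^{1/\alpha}$ and $b_k = 0$.

```python
import math

def gumbel_cdf(x):
    """Standard Gumbel distribution function."""
    return math.exp(-math.exp(-x))

def frechet_cdf(x, alpha):
    """Standard Frechet distribution function with tail index alpha > 0."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

k, alpha = 7, 2.0
for x in (0.5, 1.0, 3.0):
    # G^k(a_k x + b_k) with a_k = 1, b_k = log k
    lhs_gumbel = gumbel_cdf(x + math.log(k)) ** k
    # G^k(a_k x + b_k) with a_k = k^(1/alpha), b_k = 0
    lhs_frechet = frechet_cdf(k ** (1.0 / alpha) * x, alpha) ** k
    print(x, lhs_gumbel - gumbel_cdf(x), lhs_frechet - frechet_cdf(x, alpha))
```

Both differences should be zero up to floating-point rounding, which is the statement that the maximum of k iid draws has the same distribution up to location and scale.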

Quantiles and Return Levels

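- Quantiles of the GEV:

$$G(x_p) = \exp\left\{-\left[1 + \xi\left(\frac{x_p - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} = 1 - p$$

$$\Rightarrow x_p = \mu - \frac{\sigma}{\xi}\left[1 - \{-\log(1 - p)\}^{-\xi}\right], \quad 0 < p < 1, \; \xi \neq 0$$

(for $\xi = 0$ the limit is $x_p = \mu - \sigma \log\{-\log(1 - p)\}$)

- In extreme value terminology, $x_p$ is the return level associated with the return period $1/p$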
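The quantile formula translates directly into code. The sketch below is a minimal version, assuming the standard result $x_p = \mu - (\sigma/\xi)[1 - \{-\log(1-p)\}^{-\xi}]$ for $\xi \neq 0$ and its Gumbel limit for $\xi = 0$. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def gev_return_level(p, mu, sigma, xi):
    """Level x_p with G(x_p) = 1 - p, i.e. exceeded with probability p per block."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)          # Gumbel limit
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical GEV parameters for annual maxima (inches)
mu, sigma, xi = 3.0, 0.8, 0.1
for period in (10, 50, 100):
    # A return period of `period` blocks corresponds to p = 1 / period
    print(period, gev_return_level(1.0 / period, mu, sigma, xi))
```

With annual blocks, the 50-year return level is the level exceeded in any given year with probability 1/50.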

Clemson Daily Precipitation [Data Source: USHCN]

[Figure: daily precipitation in Clemson, 1950 to 2010]

Block Maxima Method (Gumbel 1958)

1. Determine the block size and extract the block maxima

[Figure: daily precipitation series, 1950 to 2010, with the block maxima marked]

2. Fit the GEV to the maxima and assess the fit

[Figure panels: Density; Quantile Plot]

3. Perform inference for return levels, probabilities, etc

[Figure: 95% CI for the 50-yr return level; x-axis: annual maximum precipitation (in); legend: Delta CI, Prof CI]
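The three steps above can be sketched end to end in plain Python. Several simplifications are assumptions of this example and not the method used in the slides: the data are synthetic (exponential draws standing in for daily precipitation), blocks are calendar years, and the fit is a method-of-moments fit of the Gumbel distribution (the $\xi = 0$ member of the GEV family) instead of a full maximum likelihood GEV fit. No confidence intervals are computed.

```python
import math
import random
import statistics
from collections import defaultdict

def block_maxima(records):
    """Maximum value in each block, given (block_label, value) pairs."""
    maxima = defaultdict(lambda: float("-inf"))
    for block, value in records:
        maxima[block] = max(maxima[block], value)
    return [maxima[b] for b in sorted(maxima)]

def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: mean = mu + gamma*sigma, var = pi^2 sigma^2 / 6."""
    euler_gamma = 0.5772156649015329
    sigma = statistics.stdev(maxima) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(maxima) - euler_gamma * sigma
    return mu, sigma

def gumbel_return_level(period, mu, sigma):
    """Level exceeded on average once every `period` blocks (p = 1 / period)."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / period))

# Step 1: synthetic stand-in for a daily series, 60 "years" of 365 values each
rng = random.Random(0)
records = [(year, rng.expovariate(1.0))
           for year in range(1950, 2010) for _ in range(365)]
annual_max = block_maxima(records)

# Step 2: fit the simplified model to the block maxima
mu_hat, sigma_hat = fit_gumbel_moments(annual_max)

# Step 3: point estimate of the 50-year return level
print(len(annual_max), mu_hat, sigma_hat, gumbel_return_level(50, mu_hat, sigma_hat))
```

In practice the shape parameter would be estimated as well, and the fit would be checked with density and quantile plots before any return level is reported.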

Recall the Block Maxima Method

[Figure: daily precipitation series, 1950 to 2010, with the block maxima marked; panel: Density]

Question: Can we use data more efficiently?

Peaks–over–threshold (POT) method [Davison & Smith 1990]

1. Select a “sufficiently large” threshold u, extract the exceedances

[Figure: daily precipitation series, 1950 to 2010; panel: Density]

2. Fit an appropriate model to exceedances

[Figure: daily precipitation series, 1950 to 2010; panel: Density]

GPD for Exceedances

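If $M_n = \max_{i=1,\dots,n} X_i$ (for a large $n$) can be approximated by a GEV($\mu, \sigma, \xi$), then for sufficiently large $u$,

$$P(X_i > x + u \mid X_i > u) = \frac{n P(X_i > x + u)}{n P(X_i > u)} \approx \left(\frac{1 + \xi \frac{x + u - b_n}{a_n}}{1 + \xi \frac{u - b_n}{a_n}}\right)^{-1/\xi} = \left(1 + \frac{\xi x}{a_n + \xi(u - b_n)}\right)^{-1/\xi}$$

where $a_n$ and $b_n$ play the roles of the scale $\sigma$ and location $\mu$ of the approximating GEV.

⇒ Survival function of the generalized Pareto distribution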
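The last algebraic step in this derivation is easy to verify numerically. The sketch below writes $\sigma$ and $\mu$ for the scale and location (the $a_n$ and $b_n$ of the display above) and checks that the ratio of the two GEV-based tail terms equals the GPD survival function with scale $\sigma + \xi(u - \mu)$. All parameter values are hypothetical.

```python
def gev_tail(x, mu, sigma, xi):
    """[1 + xi (x - mu) / sigma]^(-1/xi), the approximation to n * P(X > x)."""
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)

def gpd_survival(y, scale, xi):
    """GPD survival function (1 + xi * y / scale)^(-1/xi), for xi != 0."""
    return (1.0 + xi * y / scale) ** (-1.0 / xi)

mu, sigma, xi = 3.0, 0.8, 0.1    # hypothetical GEV parameters
u = 4.0                          # hypothetical threshold
scale_u = sigma + xi * (u - mu)  # GPD scale implied by the derivation

for y in (0.1, 0.5, 1.0, 2.0):
    ratio = gev_tail(u + y, mu, sigma, xi) / gev_tail(u, mu, sigma, xi)
    print(y, ratio, gpd_survival(y, scale_u, xi))
```

The two printed columns agree up to rounding, which shows that the GPD shape is the same $\xi$ as in the GEV while the GPD scale depends on the threshold.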

Pickands–Balkema–de Haan Theorem (1974, 1975)

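If $M_n = \max_{1 \le i \le n}\{X_i\} \approx \text{GEV}(\mu, \sigma, \xi)$, then, for a “large” $u$ (i.e., $u \to x_F = \sup\{x : F(x) < 1\}$),

$$P(X > u) \approx \frac{1}{n}\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi}$$

and $F_u(y) = P(X - u < y \mid X > u)$ is well approximated by the generalized Pareto distribution (GPD). That is:

$$F_u(y) \to H_{\tilde{\sigma}, \xi}(y) \quad \text{as } u \to x_F$$

where

$$H_{\tilde{\sigma}, \xi}(y) = \begin{cases} 1 - (1 + \xi y / \tilde{\sigma})^{-1/\xi} & \xi \neq 0; \\ 1 - \exp(-y / \tilde{\sigma}) & \xi = 0. \end{cases}$$

and $\tilde{\sigma} = \sigma + \xi(u - \mu)$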
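A small sketch of the GPD distribution function, together with a case where the excess distribution can be computed exactly, may help fix ideas. The Pareto example is an assumption chosen for illustration: if $P(X > x) = x^{-\alpha}$ for $x \ge 1$, then $P(X - u > y \mid X > u) = (1 + y/u)^{-\alpha}$, which is a GPD with shape $1/\alpha$ and scale $u/\alpha$ at every threshold $u \ge 1$.

```python
import math

def gpd_cdf(y, scale, xi):
    """GPD distribution function H(y) for scale > 0."""
    if y < 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / scale)
    t = 1.0 + xi * y / scale
    if t <= 0.0:
        # At or beyond the upper endpoint (only possible when xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def pareto_excess_cdf(y, u, alpha):
    """Exact P(X - u <= y | X > u) when P(X > x) = x^(-alpha) for x >= 1."""
    return 1.0 - ((u + y) / u) ** (-alpha)

alpha, u = 3.0, 5.0
for y in (0.5, 2.0, 10.0):
    exact = pareto_excess_cdf(y, u, alpha)
    approx = gpd_cdf(y, scale=u / alpha, xi=1.0 / alpha)
    print(y, exact, approx)
```

For most distributions the agreement is only approximate and improves as the threshold increases; the Pareto case is special in that the GPD form holds exactly.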

How to Choose the Threshold?

Bias–variance tradeoff:

- Threshold too low ⇒ bias because of the model asymptotics being invalid

- Threshold too high ⇒ variance is large due to few data points

[Figure: mean residual life plot; x-axis: Threshold (in)]

Task: Choose a $u_0$ such that the mean residual life curve behaves linearly for all $u > u_0$
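The mean residual life curve is the empirical mean excess, that is, the average of $x - u$ over observations with $x > u$, plotted against $u$. If the excesses over some $u_0$ follow a GPD with $\xi < 1$, the mean excess is linear in $u$ above $u_0$. The sketch below computes the curve for synthetic exponential data, an assumption of this example, for which the theoretical mean excess is constant and equal to 1.

```python
import random

def mean_excess(data, u):
    """Empirical mean of (x - u) over observations with x > u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)

def mean_residual_life(data, thresholds):
    """Tuples (u, mean excess, number of exceedances) for each candidate threshold."""
    rows = []
    for u in thresholds:
        n_exc = sum(1 for x in data if x > u)
        rows.append((u, mean_excess(data, u), n_exc))
    return rows

rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(5000)]  # synthetic stand-in data

for u, e_u, n_exc in mean_residual_life(data, [0.0, 0.5, 1.0, 2.0, 3.0]):
    print(u, round(e_u, 3), n_exc)
```

The exceedance counts in the last column show the variance side of the tradeoff: the estimate at high thresholds rests on far fewer points.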

2. Fit an appropriate model to exceedances and assess the fit

[Figure: Quantile Plot]

3. Perform inference for return levels, probabilities, etc

[Figure: 95% CI for the 50-yr return level; x-axis: Threshold excess (in); legend: Delta CI, Prof CI]
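The POT steps can be sketched in the same spirit. The assumptions here are loud ones: the data are synthetic, the threshold is fixed by hand, and the GPD is fitted by the method of moments (valid only when $\xi < 1/2$) as a simple stand-in for maximum likelihood. The return level uses the standard POT expression $x = u + (\tilde{\sigma}/\xi)[(m \zeta_u)^{\xi} - 1]$, where $\zeta_u = P(X > u)$ and $m$ is the number of observations in the return period.

```python
import math
import random
import statistics

def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit from mean = s/(1-xi), var = s^2/((1-xi)^2 (1-2 xi))."""
    mean = statistics.mean(excesses)
    ratio = mean * mean / statistics.variance(excesses)  # equals 1 - 2*xi in theory
    xi = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return scale, xi

def pot_return_level(years, u, scale, xi, rate_exceed, obs_per_year):
    """Level exceeded on average once every `years` years."""
    m = years * obs_per_year * rate_exceed  # expected exceedances of u in the period
    if abs(xi) < 1e-12:
        return u + scale * math.log(m)
    return u + (scale / xi) * (m ** xi - 1.0)

# Step 1: synthetic daily series (60 years) and a hand-picked threshold
rng = random.Random(2)
daily = [rng.expovariate(1.0) for _ in range(60 * 365)]
u = 3.0
excesses = [x - u for x in daily if x > u]

# Step 2: fit the GPD to the threshold excesses
scale_hat, xi_hat = fit_gpd_moments(excesses)

# Step 3: point estimate of the 50-year return level
rate_hat = len(excesses) / len(daily)
print(len(excesses), scale_hat, xi_hat,
      pot_return_level(50, u, scale_hat, xi_hat, rate_hat, 365))
```

Compared with the block maxima approach, every observation above the threshold contributes to the fit, not just one value per year.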

Summary & Discussion

- Extreme value theory provides a framework to model extreme values

- GEV for fitting block maxima

- GPD for fitting threshold exceedances

- Return level for communicating risk

- Practical issues: seasonality, temporal dependence, non-stationarity, ...

For Further Reading

S. Coles. An Introduction to Statistical Modeling of Extreme Values. Springer, 2001.

J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels. Statistics of Extremes: Theory and Applications. Wiley, 2004.

L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer, 2006.

S. I. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, 2007.
