
Lecture 7: Non Normal Distributions and their Uses in GARCH Modelling

Prof. Massimo Guidolin

20192 – Financial Econometrics

Spring 2017

Overview

Non-normalities in (standardized) residuals from asset return models

Tools to detect non-normalities: Jarque-Bera tests, kernel density estimators, Q-Q plots

Conditional and unconditional t-Student densities; MLE vs. method-of-moment estimation

Cornish-Fisher density approximations and their applications in risk management

A brief introduction to Extreme Value Theory (EVT)

Lecture 7: Non-normal distributions – Prof. Guidolin

Overview and General Ideas

Let’s recap where we are in the course. As announced, we proceed in three steps following a stepwise distribution modeling (SDM) approach:

(1) Establish a variance forecasting model for each of the assets individually and introduce methods for evaluating the performance of these forecasts: DONE!

(2) Consider ways to model conditionally non-normal aspects of the returns on the assets in our portfolio, i.e., aspects that are not captured by conditional means, variances, and covariances: NEXT
• We still study $R_{PF,t}$ and possibly assume that a GARCH model has been fitted

(3) Link individual variance forecasts with correlations

Recall baseline model:

In this lecture we learn how to model departures of the marginal conditional densities from normality

Why an Interest in the Conditional Density?

In Lecture 6, we studied dynamic univariate models of conditional heteroskedasticity
• It was stressed that these induce unconditional return distributions that are non-normal

However, ARCH models do not seem to induce sufficient non-normality
• This can be seen in the fact that the standardized residuals from most GARCH models fail to be normally distributed

(G)ARCH models fail to produce sufficient non-normalities

[Figure: kernel density estimate plotted against a matching Gaussian density]

Tools to Test for Normality: Jarque-Bera

For instance, in a Gaussian GARCH(1,1) model,

$$R_{t+1} = \sigma_{t+1} z_{t+1}, \qquad z_{t+1} \sim N(0,1), \qquad \sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma_t^2,$$

and $z_{t+1} = R_{t+1}/\sigma_{t+1} \sim N(0,1)$ is a testable implication
• This GARCH is called “Gaussian” because $z_{t+1} \sim N(0,1)$, where $z_t$ is the standardized residual series

Therefore non-normalities keep plaguing standardized residuals from many types of Gaussian GARCH models

Two issues:
(A) How can we detect non-normalities in an empirical density (for either returns or standardized residuals)?
(B) What can we do about it?

The Jarque-Bera test is based on sample skewness and kurtosis

If X is a random variable with mean $\mu$ and standard deviation $\sigma$ (in our case X is the standardized residual, but the definition applies generally), the skewness measures the asymmetry of the density function:

$$\mathrm{Skew}(X) = \frac{E[(X-\mu)^3]}{\sigma^3}$$

Skewness is computed as a scaled central moment of odd order. Its sign depends on the relative weight of the observations below the mean with respect to those above the mean:
• Skew = 0: symmetric distribution (e.g., Normal)
• Skew > 0: asymmetric to the right (e.g., Log-normal)
• Skew < 0: asymmetric to the left (e.g., many empirical densities for realized asset returns)

Kurtosis is instead defined as:

$$\mathrm{Kurt}(X) = \frac{E[(X-\mu)^4]}{\sigma^4}$$

• This measure gives large weights to the observations far from the mean, i.e., the observations that fall in the tails of the distribution
• The normal distribution has a kurtosis of 3, so that its excess kurtosis (Kurt − 3) is 0; a kurtosis larger than 3 means tails fatter than in the normal case

Skewness is the scaled third central moment and reveals whether the empirical distribution of standardized residuals is asymmetric around the mean

Jarque and Bera (1980) proposed a test that measures departures from normality in terms of skewness and kurtosis
• Under the null of normally distributed errors, the asymptotic distributions of the sample estimators of skewness and kurtosis are:

$$\sqrt{T}\,\widehat{\mathrm{Skew}} \xrightarrow{d} N(0, 6), \qquad \sqrt{T}\,(\widehat{\mathrm{Kurt}} - 3) \xrightarrow{d} N(0, 24)$$

• Asymptotic means that the normal approximation becomes increasingly good as the sample size T grows
• Because they are asymptotically independent, the squares of their standardized forms can be added to obtain the Jarque-Bera statistic:

$$JB = T\left[\frac{\widehat{\mathrm{Skew}}^2}{6} + \frac{(\widehat{\mathrm{Kurt}} - 3)^2}{24}\right] \xrightarrow{d} \chi^2_2$$

Kurtosis is the scaled fourth central moment and reveals whether the empirical distribution of standardized residuals has tails thicker than a Gaussian distribution

The Jarque-Bera test summarizes any non-zero skewness and any non-zero excess kurtosis in a formal test of hypothesis
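As a minimal sketch of how the Jarque-Bera statistic can be computed from a series of standardized residuals, the Python below uses only the standard library. The function names are illustrative, the moments use the 1/T normalization (an assumption; other conventions differ slightly in small samples), and the p-value relies on the fact that a chi-square with 2 degrees of freedom has upper-tail probability exp(−JB/2).

```python
import math


def jarque_bera(x):
    """Return (sample skewness, sample kurtosis, JB statistic) for a sample x."""
    T = len(x)
    mean = sum(x) / T
    # Central moments with the 1/T normalization (assumed convention).
    m2 = sum((v - mean) ** 2 for v in x) / T
    m3 = sum((v - mean) ** 3 for v in x) / T
    m4 = sum((v - mean) ** 4 for v in x) / T
    skew = m3 / m2 ** 1.5   # scaled third central moment
    kurt = m4 / m2 ** 2     # scaled fourth central moment
    jb = T * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)
    return skew, kurt, jb


def jb_pvalue(jb):
    """Asymptotic p-value: upper tail of a chi-square with 2 d.o.f."""
    return math.exp(-jb / 2.0)
```

Calling `jarque_bera(z)` on the standardized residuals of a fitted GARCH model and then `jb_pvalue(jb)` gives an asymptotic p-value; small p-values point to a rejection of normality.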

• Large values of this statistic indicate departures from normality
• Example on S&P 500 daily returns, 1926-2010:

Tools to Test for Normality: Kernel Estimators

A kernel density estimator is an empirical density “smoother” based on the choice of two objects, the kernel function K(x) and the bandwidth parameter h:

$$\hat{f}(x) = \frac{1}{Th}\sum_{t=1}^{T} K\left(\frac{x - x_t}{h}\right)$$

• It generalizes the “histogram estimator”:
• δ(x) is the delta (Dirac) function, with δ(x) always zero except at x = 0, where δ(0) = 1

A kernel density estimator is a “smoother” of a standard empirical histogram

• Let’s give a few examples. The most common type of kernel function used in applied finance is the Gaussian kernel:

$$K(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$

• A K(x) with optimal (in a mean-squared error sense) properties is Epanechnikov’s:

$$K(x) = \frac{3}{4}\,(1 - x^2)\, I_{\{|x| \le 1\}}$$

• Other popular kernels are the triangular and box kernels:

$$K(x) = (1 - |x|)\, I_{\{|x| \le 1\}}, \qquad K(x) = \frac{1}{2}\, I_{\{|x| \le 1\}}$$

• The bandwidth parameter h is usually chosen according to the rule (T here is the sample size):

• The choice of the bandwidth in this way depends on the fact that it minimizes the integrated MSE:

• Do different choices of K(x) make a big difference?
• It seems not: financial returns are typically leptokurtic, i.e., they have fat tails and highly peaked densities around the mean
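The kernel estimator described above can be sketched in a few lines of standard-library Python. The function names are illustrative. The bandwidth constant 1.06 (Silverman's rule of thumb for a Gaussian kernel, h = 1.06 σ T^(−1/5)) is an assumption and may differ from the rule used in the lecture.

```python
import math


def gaussian_kernel(x):
    """Standard normal density used as the kernel K(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(x):
    """Epanechnikov kernel: 0.75 * (1 - x^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def rule_of_thumb_bandwidth(data, constant=1.06):
    """h = constant * sample std * T^(-1/5); the constant is an assumption."""
    T = len(data)
    mean = sum(data) / T
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / (T - 1))
    return constant * std * T ** (-0.2)


def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate at x: (1 / (T h)) * sum of K((x - x_t) / h)."""
    T = len(data)
    return sum(kernel((x - v) / h) for v in data) / (T * h)
```

Evaluating `kde` on a grid of points with both kernels is a quick way to check how little the choice of K(x) matters relative to the choice of h.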

[Figure: kernel density estimates plotted against a moment-matched Gaussian density]

Tools to Test for Normality: Q-Q Plots

• A less formal and yet powerful method to visualize non-normalities consists of quantile-quantile (Q-Q) plots

The idea is to plot the quantiles of the returns against the quantiles of the normal (or otherwise selected) theoretical distribution
• If the returns are truly normal, then the graph should look like a straight line on a 45-degree angle
• Systematic deviations from the 45-degree line signal that the returns are not well described by the normal distribution
• The recipe is: sort all standardized returns $z_t = R_{PF,t}/\sigma_{PF,t}$ in ascending order, and call the ith sorted value $z_i$
• Then calculate the empirical probability of getting a value below the actual one as (i − 0.5)/T, where T is the number of observations
• The subtraction of 0.5 is an adjustment allowing for a continuous distribution
• Calculate the standard normal quantiles as $\Phi^{-1}((i-0.5)/T)$, where $\Phi^{-1}$ denotes the inverse of the standard normal CDF
• We can scatter plot the standardized and sorted returns on the Y-axis against the standard normal quantiles on the X-axis

A Q-Q plot represents the quantiles of an empirical density vs. the quantiles of some theoretical distribution

Why do risk managers care? Because, differently from the JB test and kernel density estimators, Q-Q plots provide information on where (in the support of the empirical return distribution) non-normalities occur
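The Q-Q recipe above can be sketched as follows, using `statistics.NormalDist` from the Python standard library for the inverse normal CDF. The function name `qq_points` is illustrative, and the plotting step is left out on purpose: the pairs can be passed to any plotting tool.

```python
from statistics import NormalDist


def qq_points(z):
    """Return (standard normal quantile, sorted standardized return) pairs."""
    T = len(z)
    ordered = sorted(z)                      # ascending order
    std_normal = NormalDist()                # N(0, 1)
    points = []
    for i, z_i in enumerate(ordered, start=1):
        p_i = (i - 0.5) / T                  # empirical probability
        points.append((std_normal.inv_cdf(p_i), z_i))
    return points
```

The first element of each pair goes on the X-axis and the second on the Y-axis; points far from the 45-degree line in the lower-left corner indicate a left tail fatter than the normal one.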

[Figure: Q-Q plots of S&P 500 returns; panels: raw S&P 500 returns, after GARCH(1,1)]

Non-Normality: What Can We Do?

An obvious question is then: if all (most) financial returns have non-normal distributions, what can we do about it?

Probably, stop pretending that asset returns are “more or less” Gaussian in many applications and conceptualizations

Given that, there are two possibilities. First, keep assuming that asset returns are IID, but with marginal, unconditional distributions different from the Normal
• Such marginal distributions will have to capture the fat tails and possibly also the presence of asymmetries

Second, stop assuming that asset returns are IID and model instead the presence of dynamics/time-variation in conditional densities
• You have done this already: GARCH models!

It turns out that both approaches are needed for high-frequency (e.g., daily) return data

Two key approaches to deal with non-normalities: model the conditional moments of a Gaussian density (GARCH); change the marginal density

Non-Normality: t Student Returns

Perhaps the most important deviations from normality are the fatter tails and the more pronounced peak in the distribution of standardized returns as compared with the normal

The standardized Student t, t(d), parameterized by d, is a relatively simple distribution that is well suited to deal with these features:

$$f_{t(d)}(z; d) = \frac{\Gamma((d+1)/2)}{\Gamma(d/2)\sqrt{\pi(d-2)}}\left(1 + \frac{z^2}{d-2}\right)^{-(1+d)/2}$$

where d > 2 and Γ(·) is the standard gamma function
• d should in principle be an integer, but a real-valued d is usually accepted in estimation
• It can be shown that only the moments of order lower than d exist, so that d > 2 is a way to guarantee that at least the variance exists
• See the Wikipedia entry on the gamma function

A Student-t distribution captures thickness in the tails in excess of the Gaussian through a power-type pdf

The key feature of the t(d) distribution is that the random variable, z, is taken to a power, rather than to an exponential as in the normal case

This allows t(d) to have fatter tails than the normal, that is, higher values of the density f(z) when z is far from zero
• Example: $\exp(-5^2) = 1.39 \times 10^{-11} \ll (1 + 5^2/8)^{-11/2} = 0.00041$ (for d = 10); the ratio $0.00041/(1.39 \times 10^{-11})$ is almost 30 million

For high-frequency standardized returns, the non-standard t(d) distribution is symmetric around zero, and the mean, variance, skewness ($\zeta_1$), and excess kurtosis ($\zeta_2$) are:

$$E[z_t] = 0, \qquad Var[z_t] = \frac{d}{d-2}, \qquad \zeta_1 = 0, \qquad \zeta_2 = \frac{6}{d-4} \;\; (d > 4)$$

d is the only but key parameter of a t-Student; as d → ∞, a t-Student effectively becomes Gaussian
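To make the power-versus-exponential point concrete, here is a small sketch that evaluates the unit-variance (standardized) Student-t density with `math.gamma` and compares it with the standard normal density. The function names are illustrative. The ratio it reports compares the two full densities, normalizing constants included, so it is not meant to reproduce the kernel-only illustration above.

```python
import math
from statistics import NormalDist


def standardized_t_pdf(z, d):
    """Density of the unit-variance Student t with d > 2 degrees of freedom."""
    if d <= 2:
        raise ValueError("d must exceed 2 for the variance to exist")
    const = math.gamma((d + 1) / 2) / (
        math.gamma(d / 2) * math.sqrt(math.pi * (d - 2))
    )
    # Power-type kernel: z enters through a power, not an exponential.
    return const * (1 + z * z / (d - 2)) ** (-(1 + d) / 2)


def tail_ratio(z, d):
    """How many times larger the t(d) density is than the N(0,1) density at z."""
    return standardized_t_pdf(z, d) / NormalDist().pdf(z)
```

For example, `tail_ratio(5.0, 10)` is far above 1, while for large d the two densities become nearly indistinguishable around zero.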

• Notice that a t-Student is symmetric around the mean, i.e., it has zero odd moments (e.g., skewness)

We can estimate the parameters using MLE or the method of moments
• In the MLE case (see earlier lectures), one exploits the knowledge of the density function of the (standardized) residuals, z
• The method of moments (MM) relies on the idea of estimating any unknown parameters by simply matching the sample moments in the data with the theoretical moments implied by a t-Student density
• Because MM does not exploit the entire empirical density of the data but only a few sample moments, it is clearly not as efficient as MLE
• This means that the Cramér-Rao lower bound will not be attained
• Also recall that while the density f(z) (or the CDF F(z)) has implications for all the moments (an infinity of them), the moments fail to pin down the density function

• Equivalently, while f(z) ⇒ MGF(z), the opposite does not hold, so that it is NOT true that f(z) ⇔ MGF(z)

Let us define the sample moments, non-central and central, of order i as

Equating sample and theoretical moments, we get the following system to be solved with respect to the unknown parameters:

Not surprisingly, because the t-Student is used to capture fat tails, $\hat{\zeta}_2$ is simply the sample excess kurtosis coefficient

This means that the higher the sample kurtosis of returns, the lower the d.f. parameter d
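A minimal sketch of the method-of-moments logic, under the assumption that returns are written as r = μ + σ z with z a conventional t(d) (variance d/(d−2), excess kurtosis 6/(d−4)). Matching moments then gives d = 4 + 6/(sample excess kurtosis) and σ² = (sample variance)(d−2)/d. The function name is illustrative; the inputs used to check the sketch are the monthly stock figures quoted elsewhere in this lecture (sample standard deviation 4.657, sample excess kurtosis 2.226).

```python
import math


def student_t_method_of_moments(sample_mean, sample_std, sample_excess_kurtosis):
    """Match mean, variance and excess kurtosis to a scaled t(d).

    Assumes r = mu + sigma * z with z ~ conventional t(d), so that
    Var[r] = sigma^2 * d / (d - 2) and excess kurtosis = 6 / (d - 4).
    """
    if sample_excess_kurtosis <= 0.0:
        raise ValueError("a positive excess kurtosis is needed to solve for d")
    d = 4.0 + 6.0 / sample_excess_kurtosis
    sigma = sample_std * math.sqrt((d - 2.0) / d)
    return sample_mean, sigma, d
```

With these inputs the sketch returns d close to 6.70 and a volatility coefficient close to 3.90, below the sample standard deviation.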


• Notice also that a low d > 2 has a dampening effect on the volatility coefficient, given the sample standard deviation, in the sense that as d → ∞ volatility converges to the sample standard deviation, but it is otherwise lower

Let’s examine the case of our 4 asset classes, monthly:
• VW CRSP Stock Returns: $r_{t+1} = 0.890 + 3.900 z_{t+1}$ with d = 6.70
• VW REIT Returns: $r_{t+1} = 1.052 + 3.780 z_{t+1}$ with d = 4.69
• 10Y Treasury Note Returns: $r_{t+1} = 0.670 + 2.034 z_{t+1}$ with d = 8.57
• 1M Treasury Bill Returns: $r_{t+1} = 0.465 + 0.225 z_{t+1}$ with d = 8.50
• d < 9 in all cases is a rather powerful indication of non-normalities

Under a t-Student distribution, d declines with the sample excess kurtosis of the data and volatility is lower than the sample standard deviation

Applications of t Student Returns: Density Modelling

We can generalize Q-Q plots to assess the appropriateness of non-normal distributions
• E.g., assess whether returns standardized by GARCH conform to the t(d) distribution

The quantiles of the standardized t(d) are usually not easily found. One then uses the relationship: the p-quantile of the standardized t(d) equals $\sqrt{(d-2)/d}\; t^{-1}_p(d)$, where $t^{-1}_p(d)$ is the p-quantile of the conventional Student t

t-Student conditional distributions may often improve GARCH fit

[Figure: Q-Q plots; panels: after GARCH(1,1), after t-GARCH(1,1)]

Applications of t Student Returns: Value-at-Risk

Remember (see Appendix A) that $VaR_{t,K} > 0$ is such that $\Pr(R^P_{t,K} < -VaR_{t,K}) = p$
• The calculation of $VaR_{t,1}$ is trivial in the univariate case, when n = 1 and $R_{t,K}$ has a Gaussian density:

$$p = \Pr(R_{t+1} < -VaR_{t+1}) = \Pr\left(\frac{R_{t+1} - \mu_{t+1}}{\sigma_{t+1}} < -\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right) = \Pr\left(z_{t+1} < -\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right) = \Phi\left(-\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right)$$

• Φ(·) is the standard normal CDF; $\mu_{t+1} = E_t[R_{t+1}]$ and $\sigma^2_{t+1} = Var_t[R_{t+1}]$
• Call $\Phi^{-1}(p)$ the inverse Gaussian CDF, i.e., the value of z such that Φ(z) = p; clearly, $\Phi^{-1}(\Phi(z)) = z$

Then it is easy to see that $\Phi^{-1}(p) = \Phi^{-1}(\Phi(-(VaR_{t+1} + \mu_{t+1})/\sigma_{t+1})) = -(VaR_{t+1} + \mu_{t+1})/\sigma_{t+1}$, or

$$VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1} > 0$$

(the inequality holds because $\Phi^{-1}(p) < 0$ if p < 0.5, provided $\mu_{t+1}$ is relatively small)
• E.g., if $\mu_{t+1} = 0\%$ and $\sigma_{t+1} = 2.5\%$, $VaR_{t+1}(1\%) = -0.025 \times (-2.33) - 0 \approx 5.83\%$
• This means that on any single day, there is a probability of 1% of recording a percentage loss larger than 5.83%
• Yes, it is not that high, and yet the data used are rather plausible: start having some doubts about the Gaussian density as a density for returns…
• The corresponding absolute VaR on an investment of \$10M is then: $\$VaR_{t+1}(1\%) = (1 - \exp(-0.0583)) \times \$10\text{M} \approx \$566{,}300$ a day

Under Gaussian returns, $VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1} > 0$

What happens when portfolio returns follow a t-Student distribution? In this case, the expression $VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1}$ is easily extended to:

$$VaR^{tS}_{t+1}(p) = -\sigma_{t+1}\sqrt{\frac{d-2}{d}}\; t^{-1}_p(d) - \mu_{t+1}$$

This derives from the fact that the p-quantile of a unit-variance Student t is $\sqrt{(d-2)/d}\; t^{-1}_p(d)$, where $t^{-1}_p(d)$ is the p-quantile of the conventional t(d)

Under a t-Student, $VaR_{t+1}(p) = -\sigma_{t+1}[(d-2)/d]^{1/2}\, t^{-1}_p(d) - \mu_{t+1} > 0$
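The Gaussian and Student-t VaR expressions, together with the conversion to a dollar VaR, can be sketched as follows. The function names are illustrative. Because the Python standard library has no Student-t quantile function, the t quantile is passed in by the caller (an assumption of this sketch); `sigma` is meant as the standard deviation of returns, to which the factor [(d−2)/d]^(1/2) is then applied.

```python
import math
from statistics import NormalDist


def gaussian_var(mu, sigma, p):
    """VaR(p) = -sigma * Phi^{-1}(p) - mu under conditionally Gaussian returns."""
    return -sigma * NormalDist().inv_cdf(p) - mu


def student_t_var(mu, sigma, d, t_quantile):
    """VaR(p) = -sigma * sqrt((d-2)/d) * t_p^{-1}(d) - mu.

    t_quantile is the p-quantile of the conventional t(d), supplied by the
    caller (e.g., from statistical tables or a statistics package).
    """
    return -sigma * math.sqrt((d - 2.0) / d) * t_quantile - mu


def dollar_var(var_log_return, portfolio_value):
    """Absolute VaR implied by a VaR on continuously compounded returns."""
    return (1.0 - math.exp(-var_log_return)) * portfolio_value
```

With the monthly stock portfolio estimates quoted in this lecture (mean 0.89, sample standard deviation 4.657, d = 6.70, 1% t quantile −3.036), `gaussian_var(0.89, 4.657, 0.01)` is about 9.94 and `student_t_var(0.89, 4.657, 6.70, -3.036)` is about 10.95, both in percent per month.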

• For instance, for our monthly data set on stock portfolio returns, $\mu_{t+1} = 0.89\%$, $\sigma_{t+1} = 3.90\%$ (the method-of-moments volatility, which already embeds the factor $[(d-2)/d]^{1/2}$), estimated d = 6.70, and $t^{-1}_{1\%}(6.70) = -3.036$
• $VaR^{tS}_{t+1}(1\%) = -3.900 \times (-3.036) - 0.890 = 10.95\%$ per month
• A Gaussian IID VaR would have been: $VaR_{t+1}(1\%) = -4.657 \times (-2.326) - 0.890 = 9.94\%$ per month, remarkably lower

Cornish-Fisher Approximations

The t(d) distribution is the most used tool that allows for conditional non-normality in portfolio returns

However, it builds on only one parameter and it does not allow for conditional skewness

Approximations represent a simple alternative in risk management that allows for skewness and excess kurtosis

Here we use the Cornish-Fisher approximation (other approximations exist):

$$VaR^{CF}_{t+1}(p) = -\sigma_{t+1}\, CF^{-1}_p - \mu_{t+1}$$

$$CF^{-1}_p = \Phi^{-1}_p + \frac{\zeta_1}{6}\left[(\Phi^{-1}_p)^2 - 1\right] + \frac{\zeta_2}{24}\left[(\Phi^{-1}_p)^3 - 3\Phi^{-1}_p\right] - \frac{\zeta_1^2}{36}\left[2(\Phi^{-1}_p)^3 - 5\Phi^{-1}_p\right]$$

t-Student models only accommodate fat tails and fail to capture asymmetries in the empirical distribution of returns

The Cornish-Fisher quantile, $CF^{-1}_p$, can be viewed as a Taylor expansion around the normal distribution

If we have neither skewness nor excess kurtosis, so that $\zeta_1 = \zeta_2 = 0$, then we simply get the quantile of the normal distribution back, $CF^{-1}_p = \Phi^{-1}_p$, where $\Phi^{-1}_p \equiv \Phi^{-1}(p)$
• For instance, for our monthly data set on stock portfolio returns, $\mu_{t+1} = 0.89\%$, $\sigma_{t+1} = 4.657\%$ (the sample standard deviation), sample skewness $\zeta_1 = -0.584$, and sample excess kurtosis $\zeta_2 = 5.226 - 3 = 2.226$. Because $\Phi^{-1}_{1\%} = -2.326$, the skewness, excess kurtosis, and squared-skewness terms of the expansion equal −0.429, −0.520, and 0.128, respectively
• Therefore $CF^{-1}_{1\%} = -3.148$ and $VaR^{CF}_{t+1}(1\%) = -4.657 \times (-3.148) - 0.890 = 13.77\%$ per month

A Cornish-Fisher quantile is an expansion around the Normal that depends on sample skewness and excess kurtosis
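The Cornish-Fisher calculation can be checked with a short sketch. It assumes the standard form of the expansion, with a skewness term, an excess kurtosis term, and a squared-skewness correction; the function names are illustrative, and `statistics.NormalDist` supplies the normal quantile.

```python
from statistics import NormalDist


def cornish_fisher_quantile(p, skew, excess_kurt):
    """Cornish-Fisher approximation to the p-quantile of a standardized return."""
    z = NormalDist().inv_cdf(p)  # Gaussian quantile, the expansion point
    return (
        z
        + skew / 6.0 * (z ** 2 - 1.0)
        + excess_kurt / 24.0 * (z ** 3 - 3.0 * z)
        - skew ** 2 / 36.0 * (2.0 * z ** 3 - 5.0 * z)
    )


def cornish_fisher_var(mu, sigma, p, skew, excess_kurt):
    """VaR(p) = -sigma * CF_p^{-1} - mu."""
    return -sigma * cornish_fisher_quantile(p, skew, excess_kurt) - mu
```

With the monthly stock inputs (skewness −0.584, excess kurtosis 2.226) the quantile at p = 1% is about −3.148, and with mean 0.89 and standard deviation 4.657 the VaR is about 13.77% per month; setting both skewness and excess kurtosis to zero returns the Gaussian quantile.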

You can use the difference between $VaR^{CF}_{t+1}(1\%) = 13.77\%$ and $VaR^{tS}_{t+1}(1\%) = 10.95\%$ to quantify the importance of negative skewness for monthly risk management (2.82% per month)

The Gaussian $VaR_{t+1}(1\%) = 9.94\%$ looks increasingly dangerous!

The following plot concerns the 1% VaR for monthly US stock returns data (i.e., $\mu_{t+1} = 0.89\%$, $\sigma_{t+1} = 3.27\%$)

The approach to risk management followed so far is a bit odd: we care extremely about the left tail of the density of portfolio returns, but we model the entire density

Can we do any differently?

Extreme Value Theory

Typically, the biggest risk to a portfolio is the sudden occurrence of a single large negative return

Having as precise a knowledge as possible of the probabilities of such extremes is therefore essential

Pre-requisite condition: an appropriately scaled version of asset returns must be IID according to some distribution
• Appropriate scaling will often involve specifying and estimating a volatility (GARCH) model

Consider the probability that the standardized return z less a threshold u is below a value x, given that the standardized return itself is beyond the threshold u:

$$F_u(x) \equiv \Pr\{z - u \le x \mid z > u\}$$

Extreme value theory estimates (conditional) tail probabilities of IID returns standardized according to an appropriate volatility model

Hold on: what does $F_u(x)$ really mean? With u > 0 it describes the right tail, i.e., large gains. Not really useful to risk managers, is it?

The solution is simple: instead of considering z (standardized returns), consider −z, the negative of standardized returns

Notice that, given u, x > 0,

$$F_u(x) = \Pr\{-z - u \le x \mid -z > u\} = \Pr\{z \ge -(x+u) \mid z < -u\} = \Pr\{-(x+u) \le z < -u \mid z < -u\}$$

Ah! That’s what you want!

Using the general definition of a conditional probability, Pr(A|B) = Pr(A∩B)/Pr(B),

$$F_u(x) = \frac{\Pr\{u < -z \le x + u\}}{\Pr\{-z > u\}} = \frac{F(x+u) - F(u)}{1 - F(u)}$$

where F(·) is the CDF of −z

So it seems that all one needs is a model of the conditional CDF, as we have been developing so far

However, EVT has one key result: as you let the threshold, u, get large, almost any distribution, $F_u(x)$, converges to the generalized Pareto (GP) distribution, $G(x; \xi, \beta)$, where $\beta > 0$:

$$G(x; \xi, \beta) = 1 - \left(1 + \xi x/\beta\right)^{-1/\xi} \;\text{ if } \xi \ne 0, \qquad G(x; \xi, \beta) = 1 - \exp(-x/\beta) \;\text{ if } \xi = 0$$

$\xi$ is the key parameter of the GPD:
• $\xi > 0$ implies a thick-tailed distribution such as the t-Student
• $\xi = 0$ leads to a Gaussian density
• $\xi < 0$ implies a thin-tailed distribution
• Note that $\xi = 0$ corresponds to the Gaussian, which is not a surprise: its tails decay exponentially

Extreme value theory (EVT) exploits the fact that the tails of the density of any IID series can be approximated by a generalized Pareto as we move towards the extreme tails

At this point, $F_u(x) = [F(x+u) - F(u)]/[1 - F(u)]$ does not have a “congenial” expression for applied purposes

Re-write instead (for $y \equiv x + u$):

$$F(y) = 1 - [1 - F(u)]\,[1 - F_u(y - u)]$$

• Now let T denote the total sample size and let $T_u$ denote the number of observations beyond the threshold, u
• The term 1 − F(u) can then be estimated simply by the proportion of data points beyond the threshold u, call it $T_u/T$

$F_u(y-u)$ can be estimated by MLE on the standardized observations in excess of the chosen threshold

This means: assuming $\xi \ne 0$, suppose we have obtained ML estimates of $\xi$ and $\beta$ in $G(x; \xi, \beta)$

Then the resulting CDF is:

$$F(y) = 1 - \frac{T_u}{T}\left[1 + \hat{\xi}\,\frac{y - u}{\hat{\beta}}\right]^{-1/\hat{\xi}}$$

Maximum likelihood estimates of the parameters of the GPD can be obtained using standard methods
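Once estimates of ξ and β are available, the tail CDF is a one-line calculation. The sketch below takes the ML estimates as given inputs (it does not perform the estimation), assumes ξ ≠ 0 and y ≥ u, and uses illustrative names.

```python
def gpd_tail_cdf(y, u, xi, beta, T, T_u):
    """F(y) = 1 - (T_u / T) * (1 + xi * (y - u) / beta) ** (-1 / xi), for y >= u.

    xi and beta are ML estimates of the GPD parameters (taken as given here);
    T is the sample size and T_u the number of observations beyond u.
    """
    if y < u:
        raise ValueError("the approximation only applies beyond the threshold u")
    if xi == 0.0 or beta <= 0.0:
        raise ValueError("this sketch assumes xi != 0 and beta > 0")
    return 1.0 - (T_u / T) * (1.0 + xi * (y - u) / beta) ** (-1.0 / xi)
```

At y = u the function returns 1 − T_u/T, the empirical fraction of observations at or below the threshold, as it should.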

Extreme Value Theory: Hill’s Estimator

This way of proceeding represents the “high” way because it is based on MLE plus an application of the GPD approximation result
• However, this is not the most common approach: when $\xi > 0$ (the case of fat tails most common in finance), a very easy estimator exists, namely the so-called Hill’s estimator

The idea is that a rather complex ML estimation under the GPD may be approximated in the following way (for y > u):

$$1 - F(y) = L(y)\, y^{-1/\xi} \approx c\, y^{-1/\xi}$$

which exploits the fact that L(y) is a slowly varying function of y for most distributions and is thus set to a constant, c

See Appendix B for a sketch of the proof of the following result:

$$\hat{\xi} = \frac{1}{T_u}\sum_{i=1}^{T_u} \ln(y_i/u), \qquad F(y) = 1 - c\, y^{-1/\hat{\xi}} = 1 - \frac{T_u}{T}\left[\frac{y}{u}\right]^{-1/\hat{\xi}}$$

Using an approximation based on the fact that the tails have a smooth shape, Hill’s estimator is obtained in closed form
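Because Hill's estimator is available in closed form, it takes only a few lines to compute. The sketch below assumes that `losses` holds the negatives of the standardized returns and that u > 0; the function names are illustrative. The quantile function simply inverts the tail CDF and is valid for tail probabilities p below T_u/T.

```python
import math


def hill_estimator(losses, u):
    """Hill estimate of xi from the losses that exceed the threshold u > 0."""
    exceedances = [y for y in losses if y > u]
    T_u = len(exceedances)
    if T_u == 0:
        raise ValueError("no observations beyond the threshold")
    xi = sum(math.log(y / u) for y in exceedances) / T_u
    return xi, T_u


def hill_tail_cdf(y, u, xi, T, T_u):
    """F(y) = 1 - (T_u / T) * (y / u) ** (-1 / xi), for y >= u."""
    return 1.0 - (T_u / T) * (y / u) ** (-1.0 / xi)


def hill_quantile(p, u, xi, T, T_u):
    """Loss level exceeded with probability p (requires p < T_u / T)."""
    return u * (p / (T_u / T)) ** (-xi)
```

In practice the threshold u is often chosen so that a small fixed share of the sample lies beyond it; that choice is a judgment call and is not dictated by the formulas above.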

• What is the payoff of all our approximation efforts? Our estimates are available in closed form: they do not require numerical optimization!
• They are therefore extremely easy to calculate
• A first application of Hill’s EVT estimator consists of the computation of (partial) Q-Q plots for returns below some threshold loss −u < 0
• It can be shown that the Q-Q plot from EVT can be built using the relationship

$$\{X_i, Y_i\} = \left\{u\left[\frac{i - 0.5}{T_u}\right]^{-\hat{\xi}},\; y_i\right\}$$

where $y_i$ is the ith standardized loss sorted in descending order (i.e., for negative standardized returns)
• Being based on a partial CDF estimator, EVT-based Q-Q plots are frequently excellent
• They obviously suffer from consistency issues, as the same quantile varies with the threshold u

Reading List/How to Prepare for the Exam

Carefully read these lecture slides and your class notes

Possibly read CHRISTOFFERSEN, chapter 6

Lecture notes are available on Prof. Guidolin’s personal web page

Jaschke, S. (2002) “The Cornish-Fisher-Expansion in the Context of Delta-Gamma-Normal Approximations”, Journal of Risk, Number 4, Summer 2002.

Teräsvirta, T. (2009) “An Introduction to Univariate GARCH Models”, in Andersen, T., R. Davis, J.-P. Kreiß, and T. Mikosch, Handbook of Financial Time Series, Springer.

Appendix A: Value-at-Risk

Let’s review the definition of (relative) VaR: VaR simply answers the question “What percentage loss is such that it will only be exceeded p × 100% of the time in the next K trading periods (days)?”

Formally, $VaR_{t,K} > 0$ is such that $\Pr(R^P_{t,K} < -VaR_{t,K}) = p$, where $R^P_{t,K}$ is a continuously compounded portfolio return

The absolute dollar VaR has a similar definition with “dollar/euro” replacing “percentage” in the definition above

Continuously compounded means that $R^P_{t,K} \equiv \ln(V^P_{t+K}) - \ln(V^P_t)$, where $V^P_t$ is the portfolio value

The absolute dollar VaR follows from $\Pr(\exp(R^P_{t,K}) < \exp(-VaR_{t,K})) = p$, or

$$\Pr\left(\frac{V^P_{t+K}}{V^P_t} - 1 < \exp(-VaR_{t,K}) - 1\right) \quad \text{[subtract 1]}$$

$$= \Pr\left(V^P_{t+K} - V^P_t < (\exp(-VaR_{t,K}) - 1)\,V^P_t\right) \quad \text{[multiply by } V^P_t\text{]}$$

$$= \Pr\left(\$Loss_{t,K} > (1 - \exp(-VaR_{t,K}))\,V^P_t\right) = \Pr\left(\$Loss_{t,K} > \$VaR_{t,K}\right) = p$$

Appendix B: Deriving Hill’s Estimator

Proceed to develop the tail approximation into $B(x)\,x^{-1/\xi}$ and absorb the parameter into the constant c

Writing the log-likelihood function for the approximate conditional density, taking first-order conditions, and solving delivers a simple estimator for $\xi$:

$$\hat{\xi}_{Hill} = \frac{1}{T_u}\sum_{i=1}^{T_u} \ln(y_i/u)$$

which is easy to implement and remember

We can estimate the c parameter by ensuring that the fraction of observations beyond the threshold is accurately captured by the density, as in:

$$1 - c\,u^{-1/\hat{\xi}} = 1 - \frac{T_u}{T}$$

because we have approximated F(u) as $F(u) = 1 - c\,u^{-1/\xi}$

Solving this equation for c yields:

$$\hat{c} = \frac{T_u}{T}\, u^{1/\hat{\xi}_{Hill}}$$