Lecture 6: Non Normal Distributions and their Uses in GARCH Modelling Prof. Massimo Guidolin 20192– Financial Econometrics Spring 2015

Upload: vuduong

Post on 24-Mar-2018


Page 1: Lecture 6: Non Normal Distributions and their Uses in ...didattica.unibocconi.it/mypage/dwload.php?nomefile=Lec_6_Non...Non-Normality: t Student Returns 14 Perhaps the most important

Lecture 6: Non Normal Distributions and their Uses in GARCH Modelling

Prof. Massimo Guidolin

20192– Financial Econometrics Spring 2015

Overview

Non-normalities in (standardized) residuals from asset return models

Tools to detect non-normalities: Jarque-Bera tests, kernel density estimators, Q-Q plots

Conditional and unconditional t-Student densities; MLE vs. method-of-moment estimation

Cornish-Fisher density approximations and their applications in risk management

Elements of Extreme Value Theory (EVT)

Lecture 6: Non-normal distributions – Prof. Guidolin

Non-Normality: t Student Returns

Perhaps the most important deviations from normality are the fatter tails and the more pronounced peak in the standardized returns distribution as compared with the normal

The standardized Student's t distribution, t(d), parameterized by the degrees of freedom d, is a relatively simple distribution that is well suited to deal with these features:

$$f_{t(d)}(z; d) = \frac{\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)\sqrt{\pi(d-2)}}\left(1 + \frac{z^2}{d-2}\right)^{-\frac{d+1}{2}}$$

where d > 2 and $\Gamma(\cdot)$ is the standard gamma function
• In principle d should be an integer, but a real-valued d is usually accepted in estimation
• It can be shown that only the moments of order lower than d exist, so that d > 2 is a way to guarantee that at least the variance exists
• Check out the "gamma function" entry in Wikipedia

A Student-t distribution captures thickness in the tails in excess of the Gaussian through a power-type pdf
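As a rough numerical illustration of the power-type tails, the following Python sketch evaluates the standardized t(d) density next to the standard normal density. It assumes the textbook standardized Student's t density with d > 2; the function names are illustrative and only the standard library is used.

```python
import math


def std_t_pdf(z, d):
    """Density of the standardized t(d) (zero mean, unit variance), d > 2."""
    if d <= 2:
        raise ValueError("d must exceed 2 for the variance to exist")
    # Work in logs so that large d does not overflow the gamma function.
    log_const = (math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
                 - 0.5 * math.log(math.pi * (d - 2)))
    return math.exp(log_const - (d + 1) / 2 * math.log1p(z * z / (d - 2)))


def std_normal_pdf(z):
    """Standard normal density, used as the benchmark."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    for z in (0.0, 2.0, 5.0):
        ratio = std_t_pdf(z, 10) / std_normal_pdf(z)
        print(f"z = {z:3.1f}  t(10): {std_t_pdf(z, 10):.3e}  "
              f"normal: {std_normal_pdf(z):.3e}  ratio: {ratio:.1f}")
```

With the normalizing constants included, the density ratio at z = 5 is far smaller than the kernel-only comparison would suggest, but the t(10) density is still larger than the normal one by roughly two orders of magnitude.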

Non-Normality: t Student Returns

The key feature of the t(d) distribution is that the random variable, z, is raised to a power, rather than entering through an exponential, as in the normal case

This allows t(d) to have fatter tails than the normal, that is, higher values of the density f(z) when z is far from zero
• Example: $\exp(-(-5)^2) = 1.39 \times 10^{-11} \ll (1 + (-5)^2/8)^{-11/2} = 0.00041$ (for d = 10); the ratio $0.00041 / (1.39 \times 10^{-11})$ is almost 30 million

For high frequency standardized returns, the non-standardized t(d) distribution is symmetric around zero, and the mean, variance, skewness ($\zeta_1$), and excess kurtosis ($\zeta_2$) are:

$$E[z] = 0, \quad Var[z] = \frac{d}{d-2}, \quad \zeta_1 = 0, \quad \zeta_2 = \frac{6}{d-4} \;\; (d > 4)$$

d is the only but key parameter of a t-Student; as $d \to \infty$, a t-Student effectively becomes Gaussian

Non-Normality: t Student Returns

• Notice that a t-Student is symmetric around the mean, i.e., it has zero odd moments (e.g., skewness)

We can estimate the parameters using MLE or the method of moments
• For the MLE case, see earlier lectures: one exploits the knowledge of the density function of the (standardized) residuals, z
• The method of moments (MM) relies on the idea of estimating any unknown parameters by simply matching the sample moments in the data with the theoretical moments implied by a t-Student density
• Because MM does not exploit the entire empirical density of the data but only a few sample moments, it is clearly not as efficient as MLE
• This means that the Cramér-Rao lower bound won't be attained
• Also recall that while the density f(z) (or the CDF F(z)) has implications for all the moments (an infinity of them), the moments fail to pin down the density function

Non-Normality: t Student Returns

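• Equivalently, while $f(z) \Rightarrow MGF(z)$, the opposite does not hold, so that it is NOT true that $MGF(z) \Rightarrow f(z)$

Let us define the sample moments, non-central and central, of order i as

$$m_i \equiv \frac{1}{T}\sum_{t=1}^{T} R_t^i, \qquad \bar m_i \equiv \frac{1}{T}\sum_{t=1}^{T} (R_t - m_1)^i$$

Equating sample and theoretical moments, we get the following system to be solved with respect to the unknown parameters:

$$\hat\mu = m_1, \qquad \hat\sigma^2 \frac{\hat d}{\hat d - 2} = \bar m_2, \qquad \frac{6}{\hat d - 4} = \frac{\bar m_4}{\bar m_2^2} - 3$$

so that $\hat d = 4 + 6/(\bar m_4/\bar m_2^2 - 3)$ and $\hat\sigma = \sqrt{\bar m_2\,(\hat d - 2)/\hat d}$

Not surprisingly, because the t-Student is used to capture fat tails, $\bar m_4/\bar m_2^2 - 3$ is simply the excess kurtosis coefficient

This means that the higher the sample kurtosis of returns, the lower the d.f. parameter d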

Non-Normality: t Student Returns

• Notice also that a low d > 2 has a dampening effect on the volatility coefficient, given the sample standard deviation: as $d \to \infty$, volatility converges to the sample standard deviation, but it is otherwise lower

Let's examine the case of our 4 asset classes, monthly:
• VW CRSP Stock Returns: $r_{t+1} = 0.890 + 3.900 z_{t+1}$ with d = 6.70
• VW REIT Returns: $r_{t+1} = 1.052 + 3.780 z_{t+1}$ with d = 4.69

Under a t-Student distribution, d declines with the sample excess kurtosis of the data and volatility is lower than sample standard deviation
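The moment-matching logic can be sketched in a few lines of Python. This is a minimal illustration, assuming the model r = mu + sigma·z with z distributed as a t(d) whose variance is d/(d − 2); the function names are hypothetical. The inputs in the usage lines are the monthly stock figures quoted in these slides (sample mean 0.890, sample standard deviation 4.657, sample excess kurtosis 2.226).

```python
import math


def mm_student_t(returns):
    """Method-of-moments estimates (mu, sigma, d) for r = mu + sigma * z, z ~ t(d)."""
    T = len(returns)
    mean = sum(returns) / T
    m2 = sum((r - mean) ** 2 for r in returns) / T   # central moment of order 2
    m4 = sum((r - mean) ** 4 for r in returns) / T   # central moment of order 4
    return mm_from_moments(mean, math.sqrt(m2), m4 / m2 ** 2 - 3.0)


def mm_from_moments(mean, std_dev, excess_kurtosis):
    """Solve the moment conditions given mean, std. deviation and excess kurtosis."""
    if excess_kurtosis <= 0:
        raise ValueError("a t(d) fit by moments needs positive excess kurtosis")
    d = 4.0 + 6.0 / excess_kurtosis              # from 6 / (d - 4) = excess kurtosis
    sigma = std_dev * math.sqrt((d - 2.0) / d)   # scale is below the sample std. dev.
    return mean, sigma, d


if __name__ == "__main__":
    mu, sigma, d = mm_from_moments(0.890, 4.657, 2.226)
    print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, d = {d:.2f}")
```

Run on those inputs, the sketch gives values close to the 3.900 and 6.70 reported for the stock series, which is a useful sanity check on the moment conditions.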

Applications of t Student Returns: Density Modelling

• 10Y Treasury Note Returns: $r_{t+1} = 0.670 + 2.034 z_{t+1}$ with d = 8.57
• 1M Treasury Bill Returns: $r_{t+1} = 0.465 + 0.225 z_{t+1}$ with d = 8.50
• d < 9 in all cases is a rather powerful indication of non-normalities

We can generalize Q-Q plots to assess the appropriateness of non-normal distributions
• E.g., assess whether returns standardized by GARCH conform to the t(d) distribution

The quantiles of the standardized t(d) are usually not easily found. One then uses the relationship:

$$\tilde t^{-1}_p(d) = \sqrt{\frac{d-2}{d}}\; t^{-1}_p(d)$$

where $t^{-1}_p(d)$ is the p-quantile of the conventional t distribution with d degrees of freedom

t-Student conditional distributions may often improve GARCH fit

[Figure: two panels, "After GARCH(1,1)" and "After t-GARCH(1,1)"]

Applications of t Student Returns: Value-at-Risk

Remember (see Appendix A) that $VaR_{t,K} > 0$ is such that $\Pr(R^P_{t,K} < -VaR_{t,K}) = p$
• The calculation of $VaR_{t,1}$ is trivial in the univariate case, when n = 1, and $R_{t,K}$ has a Gaussian density:

$$p = \Pr(R_{t+1} < -VaR_{t+1}) = \Pr\left(\frac{R_{t+1} - \mu_{t+1}}{\sigma_{t+1}} < -\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right) = \Pr\left(z_{t+1} < -\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right) = \Phi\left(-\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right)$$

• $\Phi(\cdot)$ is the standard normal CDF; $\mu_{t+1} \equiv E_t[R_{t+1}]$ and $\sigma^2_{t+1} \equiv Var_t[R_{t+1}]$
• Call $\Phi^{-1}(p)$ the inverse Gaussian CDF, i.e., the value of z such that $\Phi(z) = p$; clearly, $\Phi^{-1}(\Phi(z)) = z$

Then it is easy to see that $\Phi^{-1}(p) = \Phi^{-1}\left(\Phi\left(-(VaR_{t+1} + \mu_{t+1})/\sigma_{t+1}\right)\right) = -(VaR_{t+1} + \mu_{t+1})/\sigma_{t+1}$, or

$$VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1} > 0$$

(as $\Phi^{-1}(p) < 0$ if p < 0.5, and provided $\mu_{t+1}$ is relatively small)
• E.g., if $\mu_{t+1}$ = 0% and $\sigma_{t+1}$ = 2.5%, $VaR_{t+1}(1\%) = -0.025(-2.33) - 0 \approx 5.83\%$

Under Gaussian returns, $VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1} > 0$

Applications of t Student Returns: Value-at-Risk

• This means that on any single day, there is a probability of 1% to record a percentage loss of more than 5.83%
• Yes, it is not that high and yet the data used are rather plausible: start having some doubts on the Gaussian density as a density for returns…
• The absolute VaR on an investment of $10M follows from $VaR = (1 - exp(-VaR)) x (portfolio value) (see Appendix A); for instance, a relative VaR of 2.8% gives $VaR(1%) = (1 - exp(-0.028))($10M) = $276,116 a day
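The Gaussian VaR formula is easy to check numerically. The sketch below is a minimal illustration using only Python's standard library; `gaussian_var` is a hypothetical helper name, and inputs are expressed in percent.

```python
from statistics import NormalDist


def gaussian_var(mu, sigma, p):
    """Relative VaR under conditionally Gaussian returns: -sigma * Phi^{-1}(p) - mu."""
    return -sigma * NormalDist().inv_cdf(p) - mu


if __name__ == "__main__":
    # Zero mean, 2.5% volatility, 1% coverage (the example in the text).
    print(f"{gaussian_var(0.0, 2.5, 0.01):.2f}%")
    # Monthly stock example: sample mean 0.890%, sample std. deviation 4.657%.
    print(f"{gaussian_var(0.890, 4.657, 0.01):.2f}%")
```

Because the code uses the unrounded quantile (about -2.326) instead of -2.33, the first figure comes out at roughly 5.82% rather than 5.83%; the second reproduces the 9.94% per month quoted for the Gaussian IID case.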

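What happens when portfolio returns follow a t-Student distribution? In this case, the expression $VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1}$ is easily extended to:

$$VaR^{tS}_{t+1}(p) = -\sigma_{t+1}\sqrt{\frac{d-2}{d}}\; t^{-1}_p(d) - \mu_{t+1}$$

This derives from the fact that the quantiles of the standardized t(d) equal $\sqrt{(d-2)/d}$ times the quantiles $t^{-1}_p(d)$ of the conventional t(d)

Under a t-Student, $VaR_{t+1}(p) = -\sigma_{t+1}[(d-2)/d]^{1/2}\, t^{-1}_p(d) - \mu_{t+1} > 0$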
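To make the t-Student VaR concrete without external libraries, the sketch below computes the t quantile by numerical integration of the density (Simpson's rule) and bisection. This is only one simple way to obtain the quantile, not necessarily how the figures in these slides were produced; all function names are illustrative, and `sigma` is taken to be the standard deviation of returns, as in the formula above.

```python
import math


def t_pdf(x, d):
    """Density of the conventional (non-standardized) Student's t with d d.o.f."""
    log_const = (math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
                 - 0.5 * math.log(d * math.pi))
    return math.exp(log_const - (d + 1) / 2 * math.log1p(x * x / d))


def t_cdf(x, d, n=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry (n must be even)."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n
    total = t_pdf(0.0, d) + t_pdf(a, d)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, d)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_ppf(p, d):
    """Quantile t_p^{-1}(d) by bracketing and bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p == 0.5:
        return 0.0
    if p > 0.5:
        return -t_ppf(1.0 - p, d)
    lo, hi = -1.0, 0.0
    while t_cdf(lo, d) > p:          # widen the bracket until it contains the quantile
        hi, lo = lo, 2.0 * lo
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def student_t_var(mu, sigma, d, p):
    """VaR = -sigma * sqrt((d - 2) / d) * t_p^{-1}(d) - mu, sigma = return std. dev."""
    return -sigma * math.sqrt((d - 2) / d) * t_ppf(p, d) - mu


if __name__ == "__main__":
    print(f"t quantile: {t_ppf(0.01, 6.70):.3f}")
    print(f"VaR: {student_t_var(0.890, 4.657, 6.70, 0.01):.2f}% per month")
```

With the monthly stock inputs (mean 0.890, standard deviation 4.657, d = 6.70) the result should be close to the 10.95% per month reported in these slides.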

Cornish-Fisher Approximations

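• For instance, for our monthly data set on stock portfolio returns, $\mu_{t+1}$ = 0.89%, $\sigma_{t+1}[(d-2)/d]^{1/2}$ = 3.90% (the sample standard deviation of 4.657% rescaled by $[(d-2)/d]^{1/2}$), estimated d = 6.70, and $t^{-1}_{1\%}(6.70) = -3.036$
• $VaR^{tS}_{t+1}(1\%) = -3.900(-3.036) - 0.890 = 10.95\%$ per month
• A Gaussian IID VaR would have been: $VaR_{t+1}(1\%) = -4.657(-2.326) - 0.890 = 9.94\%$ per month, remarkably lower

The t(d) distribution is the most used tool that allows for conditional non-normality in portfolio returns

However, it builds on only one parameter and it does not allow for conditional skewness

Approximations represent a simple alternative in risk management that allows for skewness and excess kurtosis

Here is the Cornish-Fisher approximation (other approximations exist):

$$VaR^{CF}_{t+1}(p) = -\sigma_{t+1}\, CF^{-1}_p - \mu_{t+1}$$

$$CF^{-1}_p = \Phi^{-1}_p + \frac{\zeta_1}{6}\left[(\Phi^{-1}_p)^2 - 1\right] + \frac{\zeta_2}{24}\left[(\Phi^{-1}_p)^3 - 3\Phi^{-1}_p\right] - \frac{\zeta_1^2}{36}\left[2(\Phi^{-1}_p)^3 - 5\Phi^{-1}_p\right]$$

t-Student models only accommodate fat tails and fail to capture asymmetries in the empirical distribution of returns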

Cornish-Fisher Approximations

The Cornish-Fisher quantile, $CF^{-1}_p$, can be viewed as a Taylor expansion around the normal distribution

If we have neither skewness nor excess kurtosis, so that $\zeta_1 = \zeta_2 = 0$, then we simply get the quantile of the normal distribution back, $CF^{-1}_p = \Phi^{-1}_p$, where $\Phi^{-1}_p \equiv \Phi^{-1}(p)$

• For instance, for our monthly data set on stock portfolio returns, $\mu_{t+1}$ = 0.89%, the sample standard deviation is $\sigma_{t+1}$ = 4.657%, the sample skewness is $\zeta_1 = -0.584$, and the sample excess kurtosis is $\zeta_2 = 5.226 - 3 = 2.226$. Because $\Phi^{-1}_{1\%} = -2.326$, we have:

$$CF^{-1}_{1\%} = -2.326 + \frac{-0.584}{6}\left[(-2.326)^2 - 1\right] + \frac{2.226}{24}\left[(-2.326)^3 - 3(-2.326)\right] - \frac{(-0.584)^2}{36}\left[2(-2.326)^3 - 5(-2.326)\right]$$

where the three correction terms equal -0.429, -0.520, and 0.128, respectively

• Therefore $CF^{-1}_{1\%} = -3.148$ and $VaR^{CF}_{t+1}(1\%) = -4.657(-3.148) - 0.890 = 13.77\%$ per month

A Cornish-Fisher quantile is an expansion around the Normal that depends on sample skewness and excess kurtosis
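The Cornish-Fisher calculation can be reproduced with a short Python sketch. It assumes the standard four-term Cornish-Fisher expansion in skewness and excess kurtosis and uses the sample standard deviation as the volatility input; function names are illustrative.

```python
from statistics import NormalDist


def cornish_fisher_quantile(p, skew, excess_kurt):
    """Cornish-Fisher quantile: normal quantile plus skewness and kurtosis terms."""
    z = NormalDist().inv_cdf(p)
    return (z
            + skew / 6 * (z ** 2 - 1)
            + excess_kurt / 24 * (z ** 3 - 3 * z)
            - skew ** 2 / 36 * (2 * z ** 3 - 5 * z))


def cornish_fisher_var(mu, sigma, skew, excess_kurt, p):
    """VaR = -sigma * CF_p^{-1} - mu (inputs in percent give a VaR in percent)."""
    return -sigma * cornish_fisher_quantile(p, skew, excess_kurt) - mu


if __name__ == "__main__":
    q = cornish_fisher_quantile(0.01, -0.584, 2.226)
    var = cornish_fisher_var(0.890, 4.657, -0.584, 2.226, 0.01)
    print(f"CF quantile: {q:.3f}, VaR: {var:.2f}% per month")
```

Setting both skewness and excess kurtosis to zero returns the plain normal quantile, which is a quick way to check an implementation.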

Cornish-Fisher Approximations

You can use the difference between $VaR^{CF}_{t+1}(1\%) = 13.77\%$ and $VaR^{tS}_{t+1}(1\%) = 10.95\%$ to quantify the importance of negative skewness for monthly risk management (2.82% per month)

Gaussian $VaR_{t+1}(1\%) = 9.94\%$ looks increasingly dangerous!

The following plot concerns 1% VaR for monthly US stock returns data (i.e., $\mu_{t+1}$ = 0.89%, $\sigma_{t+1}$ = 3.27%)

The approach to risk management followed so far is a bit odd: we care extremely for the left tail of the density of portfolio returns, but we model the entire density

Can we do any differently?

Extreme Value Theory

Typically, the biggest risk to a portfolio is the sudden occurrence of a single large negative return

Having an as-precise-as-possible knowledge of the probabilities of such extremes is therefore essential

Pre-requisite condition: an appropriately scaled version of asset returns must be IID according to some distribution
• Appropriate scaling will often involve specifying and estimating a volatility (GARCH) model

Consider the probability of standardized returns z less a threshold u being below a value x, given that the standardized return itself is beyond the threshold, u:

$$F_u(x) \equiv \Pr\{z - u \le x \mid z > u\}, \quad x > 0$$

Extreme value theory estimates (conditional) tail probabilities of IID returns standardized according to an appropriate volatility model

Extreme Value Theory

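Hold on: what does $F_u(x) = \Pr\{z - u \le x \mid z > u\}$ really mean?

Not really useful to risk managers, is it?

The solution is simple: instead of considering z (standardized returns), consider -z, the negative of standardized returns

Notice that, given u, x > 0,

$$1 - F_u(x) = 1 - \Pr\{-z - u \le x \mid -z > u\} = 1 - \Pr\{z \ge -(x+u) \mid z < -u\} = \Pr\{z < -(x+u) \mid z < -u\}$$

• Ah! That's what you want!

Using the general definition of a conditional probability, $\Pr(A \mid B) = \Pr(A \cap B)/\Pr(B)$,

$$F_u(x) = \frac{\Pr\{u < -z \le x+u\}}{\Pr\{-z > u\}} = \frac{F(x+u) - F(u)}{1 - F(u)}$$

where $F(\cdot)$ is the CDF of the negative of standardized returns

[Figure: real line marking 0, u and x + u]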

Extreme Value Theory

So it seems that all one needs is a model of the conditional CDF, as we have been developing so far

However, EVT has one key result: as you let the threshold, u, get large, almost any distribution, $F_u(x)$, converges to the generalized Pareto (GP) distribution, $G(x; \xi, \beta)$, where $\beta > 0$:

$$G(x; \xi, \beta) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi} & \text{if } \xi \ne 0 \\ 1 - \exp(-x/\beta) & \text{if } \xi = 0 \end{cases}$$

$\xi$ is the key parameter of the GPD:
• $\xi > 0$ implies a thick-tailed distribution such as the t-Student
• $\xi = 0$ leads to a Gaussian density
• $\xi < 0$ implies a thin-tailed distribution
• Note that $\xi = 0 \Leftrightarrow$ Gaussian is not a surprise: its tails decay exponentially

Extreme value theory (EVT) exploits the fact that the tails of the density of any IID series can be approximated by a generalized Pareto as we move towards the extreme tails
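The role of the tail parameter can be seen by evaluating the GPD directly. The sketch below assumes the standard two-parameter generalized Pareto CDF with shape xi and scale beta > 0; `gpd_cdf` is a hypothetical helper name.

```python
import math


def gpd_cdf(x, xi, beta):
    """Generalized Pareto CDF G(x; xi, beta) for an exceedance x >= 0, beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-x / beta)        # exponential tail
    base = 1.0 + xi * x / beta
    if base <= 0:                               # only when xi < 0: beyond upper endpoint
        return 1.0
    return 1.0 - base ** (-1.0 / xi)            # power-type tail


if __name__ == "__main__":
    for xi in (0.5, 0.0, -0.5):
        tail = 1.0 - gpd_cdf(10.0, xi, 1.0)
        print(f"xi = {xi:+.1f}: Pr(exceedance > 10) = {tail:.3e}")
```

For the same scale, a positive xi leaves far more probability mass beyond a large exceedance than xi = 0, while a negative xi implies a finite upper endpoint.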

Extreme Value Theory

At this point,

$$F_u(x) = \frac{F(x+u) - F(u)}{1 - F(u)}$$

does not have a "congenial" expression for applied purposes

Re-write instead (for $y \equiv x + u$):

$$F(y) = 1 - [1 - F(u)]\,[1 - F_u(y - u)]$$

• Now let T denote the total sample size and let $T_u$ denote the number of observations beyond the threshold, u
• The term $1 - F(u)$ can then be estimated simply by the proportion of data points beyond the threshold, u, call it $T_u/T$

$F_u(y-u)$ can be estimated by MLE on the standardized observations in excess of the chosen threshold

This means: assuming $\xi \ne 0$, suppose we have obtained ML estimates of $\xi$ and $\beta$ in $G(x; \xi, \beta)$

Then the resulting CDF is:

$$\hat F(y) = 1 - \frac{T_u}{T}\left[1 + \hat\xi\,\frac{y - u}{\hat\beta}\right]^{-1/\hat\xi}$$

Maximum likelihood estimates of the parameters of the GPD can be obtained using standard methods

Extreme Value Theory: Hill’s Estimator

This way of proceeding represents the "high" way because it is based on MLE plus an application of the GPD approximation result
• However, this is not the most common approach: when $\xi > 0$ (the case of fat tails most common in finance), a very easy estimator exists, namely the so-called Hill's estimator

The idea is that a rather complex ML estimation under the GPD may be approximated in the following way (for y > u):

$$F(y) = 1 - B(y)\, y^{-1/\xi} \approx 1 - c\, y^{-1/\xi}$$

which exploits the fact that the tails are a slowly varying function of y for most distributions, so that $B(y)$ is set to a constant, c

See Appendix B for a sketch of the proof of the following result:

$$\hat F(y) = 1 - \hat c\, y^{-1/\hat\xi} = 1 - \frac{T_u}{T}\left(\frac{y}{u}\right)^{-1/\hat\xi}, \qquad \hat\xi = \frac{1}{T_u}\sum_{i=1}^{T_u}\ln\left(\frac{y_i}{u}\right)$$

Using an approximation based on the fact that the tails have a smooth shape, Hill's estimator is obtained in closed form
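Because Hill's estimator is available in closed form, it takes only a few lines to compute. The sketch below assumes losses are the negatives of standardized returns, a threshold u > 0, and the closed-form expressions for the tail index and the constant c; function names and the synthetic data are illustrative only.

```python
import math


def hill_estimator(losses, u):
    """Hill estimate of xi and the constant c from losses exceeding threshold u > 0."""
    if u <= 0:
        raise ValueError("the threshold u must be positive")
    tail = [y for y in losses if y > u]
    if not tail:
        raise ValueError("no observations beyond the threshold")
    T, T_u = len(losses), len(tail)
    xi = sum(math.log(y / u) for y in tail) / T_u   # average log-exceedance ratio
    c = (T_u / T) * u ** (1.0 / xi)                 # matches the fraction T_u / T at u
    return xi, c


def hill_tail_cdf(y, u, xi, frac_beyond):
    """Approximate F(y) = 1 - (T_u / T) * (y / u)^(-1/xi) for y > u."""
    return 1.0 - frac_beyond * (y / u) ** (-1.0 / xi)


if __name__ == "__main__":
    # Synthetic losses: 5% of the sample follows an exact Pareto tail with xi = 0.25.
    u, n_tail = 2.0, 1000
    tail = [u * (1.0 - (i + 0.5) / n_tail) ** -0.25 for i in range(n_tail)]
    losses = tail + [1.0] * 19000
    xi_hat, c_hat = hill_estimator(losses, u)
    print(f"xi = {xi_hat:.4f}, c = {c_hat:.4f}")
    print(f"Pr(loss > 3) = {1.0 - hill_tail_cdf(3.0, u, xi_hat, 0.05):.5f}")
```

In practice the estimate depends on the choice of u, so it is common to recompute it over a range of thresholds before settling on one.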

Extreme Value Theory: Hill’s Estimator

• What is the payoff of all our approximation efforts? Our estimates are available in closed form: they do not require numerical optimization!
• They are therefore extremely easy to calculate
• A first application of Hill's EVT estimator consists of the computation of (partial) Q-Q plots for returns below some threshold loss -u < 0
• It can be shown that the Q-Q plot from EVT can be built using the relationship

$$\{X_i, Y_i\} = \left\{ u\left[\frac{(i - 0.5)/T}{T_u/T}\right]^{-\hat\xi},\; y_i \right\}$$

where $y_i$ is the i-th standardized loss sorted in descending order (i.e., for negative standardized returns below -u)
• Being a partial CDF estimator, EVT-based Q-Q plots are frequently excellent
• They obviously suffer from consistency issues, as the same quantile varies with the threshold u

Reading List/How to prepare for the exam

Carefully read these Lecture Slides + class notes

Possibly read CHRISTOFFERSEN, chapter 6

Lecture Notes are available on Prof. Guidolin's personal web page

Jaschke, S. (2002) "The Cornish-Fisher-Expansion in the Context of Delta-Gamma-Normal Approximations", Journal of Risk, Number 4, Summer 2002.

Teräsvirta, T. (2009) "An Introduction to Univariate GARCH Models", in Andersen, T., R. Davis, J.-P. Kreiß, and T. Mikosch, Handbook of Financial Time Series, Springer.

Appendix A: Value-at-Risk

Let's review the definition of (relative) VaR: VaR simply answers the question "What percentage loss is such that it will only be exceeded p x 100% of the time in the next K trading periods (days)?"

Formally: $VaR_{t,K} > 0$ is such that $\Pr(R^P_{t,K} < -VaR_{t,K}) = p$, where $R^P_{t,K}$ is a continuously compounded portfolio return

The absolute $VaR has a similar definition with "dollar/euro" replacing "percentage" in the definition above

Continuously compounded means that $R^P_{t,K} \equiv \ln(V^P_{t+K}) - \ln(V^P_t)$, where $V^P_t$ is the portfolio value

Absolute $VaR is defined as $\Pr(\exp(R^P_{t,K}) < \exp(-VaR_{t,K})) = p$, or

$$\Pr\left(\frac{V^P_{t+K}}{V^P_t} - 1 < \exp(-VaR_{t,K}) - 1\right) \quad \text{[subtract 1]}$$

$$= \Pr\left(V^P_{t+K} - V^P_t < (\exp(-VaR_{t,K}) - 1)\,V^P_t\right) \quad \text{[multiply by } V^P_t\text{]}$$

$$= \Pr\left(\$Loss_{t,K} > (1 - \exp(-VaR_{t,K}))\,V^P_t\right) = \Pr\left(\$Loss_{t,K} > \$VaR_{t,K}\right) = p$$
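The last line of the derivation gives a direct conversion from relative to absolute VaR. A minimal Python sketch follows, with a hypothetical function name and the relative VaR expressed as a decimal (0.028 for 2.8%).

```python
import math


def dollar_var(relative_var, portfolio_value):
    """Absolute VaR implied by a relative VaR on continuously compounded returns."""
    return (1.0 - math.exp(-relative_var)) * portfolio_value


if __name__ == "__main__":
    print(f"${dollar_var(0.028, 10_000_000):,.0f}")
```

For small relative VaR values the result is close to, but always slightly below, the naive product of the relative VaR and the portfolio value.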

Appendix B: Deriving Hill’s Estimator

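Proceed to develop the tail of the distribution into $B(x)\,x^{-1/\xi}$ and absorb the parameter into the constant c

Writing the log-likelihood function for the approximate conditional density, $f(y \mid y > u) = \frac{1}{\xi}\,u^{1/\xi}\,y^{-1/\xi - 1}$, taking first-order conditions and solving delivers a simple estimator for $\xi$:

$$\hat\xi^{Hill} = \frac{1}{T_u}\sum_{i=1}^{T_u}\ln\left(\frac{y_i}{u}\right)$$

which is easy to implement and remember

We can estimate the c parameter by ensuring that the fraction of observations beyond the threshold is accurately captured by the density, as in $\hat F(u) = 1 - T_u/T$:

$$1 - \hat c\, u^{-1/\hat\xi} = 1 - \frac{T_u}{T}$$

because we have approximated F(u) as $F(u) = 1 - c\,u^{-1/\xi}$

Solving this equation for c yields:

$$\hat c = \frac{T_u}{T}\, u^{1/\hat\xi}$$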