Nonparametric density estimation in finance and insurance
Galyna Ignatenko
August 24, 2010
Master thesis
Supervisors: Bert van Es and Peter Spreij
KdV Instituut voor wiskunde
Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Universiteit van Amsterdam
Abstract

Volatility density estimation and estimation of loss distributions are discussed. The bivariate deconvolution kernel volatility density estimator is used for the volatility process. An FFT-based computational algorithm is developed and simulations of bivariate volatility densities, reflecting clustering phenomena, are performed. Also kernel density estimation with a Champernowne transformation for the estimation of loss distributions is discussed.
KEYWORDS: Volatility, Clustering, Loss distribution, Kernel estimation, Deconvolution, Transformation.
Details

Title: Nonparametric density estimation in finance and insurance
Author: Galyna Ignatenko, [email protected], 5901065
Supervisors: Bert van Es and Peter Spreij
Finished: August 24, 2010
Korteweg-de Vries Instituut voor Wiskunde
Universiteit van Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math
Contents

1 Introduction

2 Kernel methods in density estimation
  2.1 Density estimation and its usage in the exploration of data
  2.2 Kernel density estimation
  2.3 Transformed kernel density estimation
  2.4 Deconvolution kernel density estimation
  2.5 Multivariate kernel density estimation

3 Deconvolution Kernel Volatility Density Estimation
  3.1 A stochastic model for the price of an asset
  3.2 Deconvolution Kernel Volatility Density Estimation - Univariate case
  3.3 Deconvolution Kernel Volatility Density Estimation - Bivariate case
  3.4 Computational considerations
    3.4.1 Computational algorithm for the deconvolution kernel volatility density estimation - univariate case
    3.4.2 Computational algorithm for the deconvolution kernel volatility density estimation - bivariate case
  3.5 Simulation results
    3.5.1 The bivariate deconvolution kernel density estimation of the volatility simulated as the GARCH(1,1) process
    3.5.2 The bivariate deconvolution kernel density estimation of the volatility simulated as the mixture of two GARCH(1,1) processes
    3.5.3 The bivariate deconvolution kernel density estimation of the volatility simulated according to the De Vries model

4 Nonparametric estimation of loss distributions
  4.1 Champernowne transformation of the insurance data
  4.2 Estimation of loss distributions

5 Conclusions

A Notation

B The Fourier transform

C R-code
  C.1 The bivariate deconvolution kernel density estimation of the volatility simulated as the GARCH(1,1) process
  C.2 The bivariate deconvolution kernel density estimation of the volatility simulated as the mixture of two GARCH(1,1) processes
  C.3 The bivariate deconvolution kernel density estimation of the volatility simulated according to the De Vries model
Chapter 1
Introduction
The thesis "Nonparametric density estimation in finance and insurance" is focused on two problems: estimation of a bivariate density of a financial volatility process and estimation of loss distributions in insurance. Our main interest will be in the financial application. Estimation of loss distributions shows another interesting application but is not investigated as extensively as the first one.

In Chapter 3 we consider estimation of the volatility density. Volatility is the measure of uncertainty in the return of an asset [8]. Knowledge about the volatility process is very important in areas such as portfolio management and option pricing. The main goal of this thesis is to perform simulations detecting volatility clustering, known as one of the stylized properties of the volatility process. Its essence can be described as follows: small changes in the price of an asset tend to be followed by small changes, and large changes tend to be followed by large changes [5]. Estimation of a bivariate density of the volatility process will be used as an illustration of this phenomenon.

To motivate the use of nonparametric estimation, consider the following stochastic volatility model. Assume that the log price S_t satisfies the equality
    dS_t = b_t dt + σ_t dW_t,    S_0 = 0,
where the drift b_t and the volatility σ_t satisfy regularity conditions and W_t is a standard Brownian motion. It is assumed that the evolution of the volatility is driven by a Brownian motion B_t and that it is modeled by
    dX_t = d(X_t) dt + a(X_t) dB_t.
We can for instance model the process σ_t² or log σ_t² by a process like X_t. Then, under regularity conditions, the invariant density of the volatility is equal to

    π(x) = (1/(M a²(x))) exp( 2 ∫_{x_0}^{x} d(u)/a²(u) du ).   (1.1)
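As a quick numerical illustration of formula (1.1) (our own example, not taken from the thesis), choose Ornstein-Uhlenbeck-type coefficients d(x) = −θx and constant a(x) = a; the formula then normalizes to the N(0, a²/(2θ)) density:

```python
import math

# Hypothetical coefficients d(x) = -theta*x, a(x) = a (constant),
# chosen only so that formula (1.1) is checkable in closed form.
theta, a = 1.5, 0.8

def pi_unnorm(x, x0=0.0):
    # exp(2 * int_{x0}^{x} d(u)/a^2 du) / a^2  with d(u) = -theta*u
    return math.exp(-theta * (x * x - x0 * x0) / a ** 2) / a ** 2

# normalize numerically: M plays the role of the constant in (1.1)
h = 0.001
grid = [-4.0 + h * i for i in range(8001)]
M = sum(pi_unnorm(x) for x in grid) * h
pi0 = pi_unnorm(0.0) / M

# for this choice the invariant density is N(0, a^2/(2*theta))
var = a ** 2 / (2 * theta)
gauss0 = 1.0 / math.sqrt(2 * math.pi * var)
```

Here pi0 and gauss0 agree to three decimals, confirming that (1.1) reproduces the Gaussian invariant density for a linear drift; other choices of d and a produce the variety of shapes mentioned below.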
The invariant density describes a variety of shapes depending on the parameters. This motivates us to use nonparametric methods for volatility density estimation in this thesis.

An illustration of volatility clustering will be made using the bivariate densities of (log σ_t², log σ_{t+1}²). One of the difficulties of volatility density estimation is that the volatility process is not observable, which does not allow us to use standard estimation techniques. The deconvolution kernel volatility density estimator for discrete time models introduced in [6] will be used. The estimation will be performed with a specially developed computational algorithm based on the two-dimensional fast Fourier transform described in Appendix B.

The second application, in Chapter 4, discusses the use of nonparametric estimation in insurance. The loss distribution is the probability distribution of the amount to be paid to the insured for the damage, see [2]. Estimation of loss distributions is a fundamental part of the business because large losses can have a significant influence on the profit of the company. There is no single parametric model for the estimation of loss distributions. In practice, a threshold between large and small losses is determined first and then two different parametric models are used for the estimation. This motivates us to consider nonparametric estimation in this case. The kernel density estimation approach with the modified Champernowne transformation, see [4], will be discussed.
Chapter 2
Kernel methods in density estimation
In this chapter the general theory of the nonparametric approach to density estimation is discussed. The kernel estimator, its transformed extension, and the deconvolution kernel estimator are described, along with their multivariate extensions. Asymptotic properties of the kernel estimator are derived.
2.1 Density estimation and its usage in the exploration of data
Given a random variable X defined on a probability space (ℝ, B, P), we define the cumulative distribution function (cdf) as P(X ≤ x) = ∫_{−∞}^{x} dP. This function is remarkable because it fully describes the distribution of X. If the measure P is absolutely continuous w.r.t. the Lebesgue measure, it is possible to express the cdf as ∫_{−∞}^{x} f(u) du, where f is called a probability density function. This function also fully determines the distribution of X and will later be referred to as the density.

Suppose now that we have a set of observations X_1, X_2, ..., X_n sampled from an unknown distribution with density f(x). Practitioners are interested in two questions: how to estimate the density from the sample and how this estimate can be used in data analysis.

The first question can be answered by parametric or nonparametric estimation. The parametric procedure includes an assumption about how the data is distributed, in terms of a parametric model, while the nonparametric approach does not require any distribution-related assumptions and lets the data "speak for itself", see [10]. In this thesis we study a financial and an insurance application, concerning the volatility process of a financial time series and losses in insurance. These processes have such a complex structure that existing parametric procedures cannot reflect all their features. Nonparametric estimation, the subject of our research, is more likely to do that.
2.2 Kernel density estimation
Omitting the histogram, the oldest nonparametric estimator, and its slight improvement, the naive estimator, we discuss the first smooth estimator - the kernel density estimator, described, for example, in [13]. It is assumed that we have a sample of i.i.d. observations X_1, X_2, ..., X_n, taken from a continuous univariate distribution with a density f_X which we are trying to estimate. The kernel density estimator of f_X is given by
    f̂_X(x; h) = (1/(nh)) Σ_{i=1}^{n} K((x − X_i)/h) = n^{−1} Σ_{i=1}^{n} K_h(x − X_i).   (2.1)
The function K is called the kernel and satisfies ∫ K(x) dx = 1, the function K_h(u) = (1/h) K(u/h) is the scaled kernel, and the positive number h is called the bandwidth or the smoothing parameter. From Equation (2.1) it is seen that the kernel density estimate at a given point is constructed by placing a scaled kernel at each observation point and then computing the average of the n scaled kernel ordinates at the point we are interested in. From the notation it is clear that the spread of the scaled kernel depends on the bandwidth, hence this parameter controls the smoothness of the estimate.

It should be noted that the kernel estimate is the convolution of the scaled kernel K_h with the empirical distribution of the data, whose Fourier transform is the empirical characteristic function ϕ_emp(t) = (1/n) Σ_{i=1}^{n} e^{itX_i}. This observation is vital for efficient computation of the density. Taking the Fourier transform (FT) of both sides of the equation leads to the equality

    ϕ_{f̂_X}(t) = ϕ_K(ht) ϕ_emp(t).
Here we have used properties of the FT, see Appendix B.

After the kernel estimator is constructed we are interested in the quality of the estimation at a fixed point. A natural measure for this purpose is the mean square error (MSE), given by

    MSE(f̂_X(x; h)) = E[(f̂_X(x; h) − f_X(x))²].
By standard properties of the mean and the variance we have

    MSE(f̂_X(x; h)) = Var(f̂_X(x; h)) + (E[f̂_X(x; h) − f_X(x)])².   (2.2)
The expression E[f̂_X(x; h) − f_X(x)] is called the bias of the estimator. The notation bias_h(x) will be used further. The expressions for the bias and the variance are:

    bias_h(x) = E f̂_X(x; h) − f_X(x) = ∫ K_h(x − y) f_X(y) dy − f_X(x),   (2.3)

    Var(f̂_X(x; h)) = (1/n) ∫ K_h²(x − y) f_X(y) dy − (1/n) (f_X(x) + bias_h(x))².   (2.4)
Since the bias and the variance are important for understanding the performance of the estimator, one would like to express them in a more intuitive way. Fortunately, it is possible to obtain approximations of (2.3) and (2.4) under the conditions presented below. The kernel function is assumed to satisfy

    ∫ K(t) dt = 1,   (2.5)
    ∫ t K(t) dt = 0,   (2.6)
    ∫ t² K(t) dt = C > 0,   (2.7)
and the unknown density f_X is assumed to have continuous derivatives of all orders required. Also we assume that lim_{n→∞} h_n = 0 and lim_{n→∞} n h_n = ∞. All asymptotic approximations of properties of univariate kernel-based estimators considered in this thesis will be performed under these assumptions. After using assumption (2.5) and after the change of variable y = x − ht we have
    bias_h(x) = ∫ K(t) f_X(x − ht) dt − f_X(x) = ∫ K(t) (f_X(x − ht) − f_X(x)) dt.
Using a Taylor series expansion of f_X around the point x we obtain

    bias_h(x) = −h f_X′(x) ∫ t K(t) dt + (1/2) h² f_X″(x) ∫ t² K(t) dt + ... = (1/2) h² C f_X″(x) + higher-order terms in h.
Hence we have

    bias_h(x) = O(h²).   (2.8)
Now we turn to the variance approximation, based on (2.4) and (2.8). After the change of variable y = x − ht we get

    Var(f̂_X(x; h)) ≈ (1/(nh)) ∫ K²(t) f_X(x − ht) dt − (1/n) (f_X(x) + O(h²))².
The Taylor expansion of f_X(x − ht) and the fact that (1/n)(f_X(x) + O(h²))² = O(1/n) yield

    Var(f̂_X(x; h)) ≈ (1/(nh)) ∫ (f_X(x) − ht f_X′(x) + ...) K²(t) dt + O(1/n)   (2.9)
                   = (1/(nh)) f_X(x) ∫ K²(t) dt + O(1/n) ≈ (1/(nh)) f_X(x) ∫ K²(t) dt.   (2.10)
It should be noted from (2.8) and (2.10) that larger values of h reduce the variance but magnify the bias, while smaller values reduce the systematic error at the expense of the random error. This shows that a trade-off between bias and variance takes place, and the smoothing parameter is essentially the only parameter affecting this trade-off. The problem of choosing an optimal smoothing parameter occurs whatever method of density estimation is being used. Even though this problem is very interesting for investigation, in this thesis the choice of bandwidth is performed manually on the basis of practical experience.
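The estimator (2.1) and the role of h can be sketched in a few lines of Python (our own illustration, not the thesis' R code of Appendix C; the Gaussian kernel here is an arbitrary choice satisfying (2.5)-(2.7)):

```python
import math
import random

def kde(x, data, h):
    """Kernel density estimate (2.1) at the point x, Gaussian kernel."""
    n = len(data)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((x - xi) / h) for xi in data) / (n * h)

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]
fhat = kde(0.0, data, h=0.3)   # true N(0,1) density at 0 is about 0.3989
```

Increasing h lowers the variance of the estimate but inflates the O(h²) bias of (2.8); decreasing it does the opposite, exactly the trade-off described above.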
2.3 Transformed kernel density estimation
The kernel estimator presented in the previous section gives a clear estimation procedure, but it has drawbacks. The problem is that it is not applicable to all types of data, especially when the data is bounded from one or from both sides. This is the so-called boundary problem.

Such problems can be solved by a transformation approach. If the kernel estimation of the density f_X of the random sample X_1, X_2, ..., X_n is not satisfactory, this sample can be transformed into Y_1, Y_2, ..., Y_n with a density f_Y which is more convenient for kernel estimation. Then one performs the back-transformation of the estimate f̂_Y to obtain the estimate f̂_X.

Assume that the one-to-one transformation of the data is given by Y_i = T(X_i), where T is an increasing differentiable function defined on the support of f_X.
By a standard result from probability theory, see for instance [1], the density f_Y is equal to

    f_Y(y) = f_X(T⁻¹(y)) |d/dy T⁻¹(y)| = f_X(T⁻¹(y)) / T′(T⁻¹(y)).

Hence f_X(x) can be obtained as f_X(x) = f_Y(T(x)) T′(x). A transformation density estimator of f_X is obtained by replacing f_Y by its kernel estimator. We then get

    f̂_X(x; h, T) = (1/n) Σ_{i=1}^{n} K_h(T(x) − T(X_i)) T′(x).   (2.11)
The asymptotic bias and variance of the transformation kernel density estimator are stated in the following theorem [4].

Theorem 1. Let X_1, X_2, ..., X_n be i.i.d. random variables having a continuous univariate distribution with density f_X. Assume that T is an increasing differentiable function defined on the support of f_X and let f̂_X(x; h, T) be a transformation kernel density estimator defined as in (2.11). Then the bias and the variance of f̂_X(x; h, T) are given by

    E[f̂_X(x; h, T)] = f_X(x) + (1/2) μ₂(K) h² ((f_X(x)/T′(x))′ · (1/T′(x)))′ + o(h²),

    Var[f̂_X(x; h, T)] = (1/(nh)) R(K) T′(x) f_X(x) + o(1/(nh)),

as n → ∞, where μ₂(K) = ∫ u² K(u) du and R(K) = ∫ K²(u) du.
The choice of a transformation mostly depends on the problem one considers.For example, the estimation of loss distribution discussed later in this thesisincludes the modified Champernowne transformation introduced in [4].
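A sketch of the transformation estimator (2.11); as a stand-in for the modified Champernowne transformation we use the simpler T(x) = log x on right-skewed log-normal data (our own illustrative choice, with a Gaussian kernel):

```python
import math
import random

def transformed_kde(x, data, h, T, Tprime):
    """Transformation kernel estimator (2.11): kernel-estimate f_Y of the
    transformed sample Y_i = T(X_i), then back-transform via
    f_X(x) = f_Y(T(x)) * T'(x). Gaussian kernel."""
    n = len(data)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    y = T(x)
    return Tprime(x) * sum(k((y - T(xi)) / h) for xi in data) / (n * h)

# positive data with a heavy right tail: standard log-normal;
# T = log maps the sample to N(0, 1), where plain kernel estimation works well
random.seed(2)
data = [math.exp(random.gauss(0.0, 1.0)) for _ in range(4000)]
fhat = transformed_kde(1.0, data, h=0.25, T=math.log, Tprime=lambda x: 1.0 / x)
# the true log-normal density at x = 1 is 1/sqrt(2*pi), about 0.3989
```

The transformation also removes the boundary problem at 0, since the transformed sample is unbounded.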
2.4 Deconvolution kernel density estimation
As previously seen, the kernel density estimator and its transformed improvement require direct data; however, this requirement is not always fulfilled in practice. If, for any reason, we cannot observe the data of interest directly, other methods of estimation are needed. An example is the additive measurement error model given by

    Y_i = X_i + Z_i,   i = 1, ..., n.   (2.12)
Here Y_1, ..., Y_n are the observed data, while we want to estimate the density f_X of the non-observed sample X_1, ..., X_n. The i.i.d. random variables Z_i, independent of the X_i, represent a noise process with a known density f_Z, which will further be referred to as the error density.

From probability theory it is known that the density of the sum of two independent random variables is the convolution of the corresponding densities, see [1],
    f_Y = f_X ∗ f_Z.

Due to the time-convolution theorem, given in Appendix B, a simple expression for the characteristic function of f_Y is obtained:

    ϕ_{f_Y} = ϕ_{f_X} ϕ_{f_Z}.

Let f̂_X and f̂_Y be estimates of f_X and f_Y, respectively. An expression for the characteristic function of f̂_X is

    ϕ_{f̂_X} = ϕ_{f̂_Y} / ϕ_{f_Z}.
Now, the estimation of f_X consists of a few steps. First, kernel density estimation based on the observed Y_i, yielding the density estimate f̂_Y, is used to compute ϕ_{f̂_Y}. Note that ϕ_{f̂_Y}(t) = ϕ_K(ht) ϕ_emp(t). After that, the estimate f̂_X can be found by the inverse Fourier transform (IFT):

    f̂_X(x) = (1/2π) ∫ e^{−itx} ϕ_{f̂_X}(t) dt = (1/2π) ∫ e^{−itx} ϕ_{f̂_Y}(t)/ϕ_{f_Z}(t) dt.   (2.13)
The final expression for the estimate f̂_X(x; h) is

    f̂_X(x; h) = (1/2π) ∫ e^{−itx} ϕ_K(ht) ϕ_emp(t) / ϕ_{f_Z}(t) dt,   (2.14)

or

    f̂_X(x; h) = (1/(nh)) Σ_{j=1}^{n} v_h((x − Y_j)/h),   (2.15)

where

    v_h(x) = (1/2π) ∫_{−∞}^{∞} ϕ_K(s)/ϕ_{f_Z}(s/h) e^{−isx} ds.   (2.16)
Asymptotic properties of the variance and the bias of the deconvolution kernel density estimator will be described in Chapter 3 for the estimation of the volatility density.
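The steps (2.14)-(2.16) can be sketched directly (our own Python; a Gaussian error density is assumed here because its characteristic function is explicit, and the inversion integral is evaluated by a brute-force trapezoid rule rather than the FFT of Chapter 3; ϕ_K(t) = (1 − t²)³ on [−1, 1], the characteristic function of the kernel (3.5) used later):

```python
import cmath
import math
import random

def deconv_kde(x, Y, h, s_noise, m=200):
    """Deconvolution estimator (2.14) by numerical integration.
    phi_K(t) = (1 - t^2)^3 on [-1, 1]; error Z ~ N(0, s_noise^2), so
    phi_Z(t) = exp(-s_noise^2 * t^2 / 2)."""
    n = len(Y)
    T = 1.0 / h                       # phi_K(h*t) vanishes for |t| > 1/h
    dt = 2 * T / m
    total = 0.0
    for k in range(m + 1):            # trapezoid rule over [-1/h, 1/h]
        t = -T + k * dt
        w = 0.5 if k in (0, m) else 1.0
        phi_K = (1.0 - (h * t) ** 2) ** 3
        phi_emp = sum(cmath.exp(1j * t * y) for y in Y) / n
        phi_Z = math.exp(-0.5 * (s_noise * t) ** 2)
        total += w * (cmath.exp(-1j * t * x) * phi_K * phi_emp / phi_Z).real
    return total * dt / (2 * math.pi)

random.seed(7)
X = [random.gauss(0.0, 1.0) for _ in range(1000)]     # unobserved sample
Y = [xi + random.gauss(0.0, 0.3) for xi in X]         # observed, Y = X + Z
fhat = deconv_kde(0.0, Y, h=0.25, s_noise=0.3)
```

With this modest sample and bandwidth the estimate lands in the neighbourhood of the true f_X(0) = 1/√(2π) ≈ 0.399; dividing by ϕ_Z amplifies high-frequency noise, which is why h cannot be taken too small.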
2.5 Multivariate kernel density estimation
The aim of this thesis is to perform simulations of a bivariate volatility density detecting volatility clustering. Hence a multivariate extension of kernel estimation is needed. Fortunately, kernel density estimation has a straightforward extension to the multivariate case, see [13]. Let X_1, X_2, ..., X_n denote a d-variate random sample with density f_X. The d-dimensional kernel density estimator is defined as

    f̂_X(x; H) = n^{−1} Σ_{i=1}^{n} K_H(x − X_i),   (2.17)
where H is a symmetric positive definite d × d matrix called the bandwidth matrix and x = (x_1, x_2, ..., x_d) is a real vector with d components. The scaled kernel is defined as

    K_H(x) = |H|^{−1/2} K(H^{−1/2} x),

and K is a d-variate kernel function satisfying ∫_{ℝ^d} K(x) dx = 1, which can be chosen in different ways. In this thesis the product rule is used, which means that the multivariate kernel K is generated from a symmetric univariate kernel K by the following rule:

    K(x) = Π_{i=1}^{d} K(x_i).

The kernel K is then called a product kernel. Before discussing asymptotic approximations of the bias and variance of the estimator described by (2.17) we need the following condition, see [13].
Condition 2. The density f_X, the bandwidth matrix H and the kernel K are assumed to satisfy the following conditions:

• Each entry of the Hessian matrix H_f(·) of f is piecewise continuous and square integrable;

• H = H_n is a sequence of bandwidth matrices such that n^{−1}|H|^{−1/2} and all entries of H approach zero as n → ∞; it is also assumed that the ratio of the largest and smallest eigenvalues of H is bounded for all n;

• K is a bounded, compactly supported d-variate kernel satisfying ∫_{ℝ^d} K(x) dx = 1, ∫_{ℝ^d} x K(x) dx = 0 and ∫_{ℝ^d} x xᵀ K(x) dx = μ₂(K) I, where μ₂(K) = ∫_{ℝ^d} x_i² K(x) dx is independent of i.
Assume that Condition 2 is satisfied; then the asymptotic bias and variance of the estimator f̂_X(x; H) defined as in (2.17) satisfy

    E f̂_X(x; H) = f_X(x) + (1/2) μ₂(K) tr(H H_f(x)) + o(tr(H)),

    Var f̂_X(x; H) = n^{−1} |H|^{−1/2} R(K) f_X(x) + o(n^{−1} |H|^{−1/2}),

where R(K) = ∫_{ℝ^d} K(x)² dx.
These approximations seem quite complicated, but they take a simpler form if a certain form of the smoothing matrix is chosen. If, for example, the matrix is chosen to be diagonal, H = diag(h_1², ..., h_d²), then |H|^{−1/2} = (Π_{j=1}^{d} h_j)^{−1}, which already makes the expressions above less complicated. The formula of the kernel estimator in this case is the following:

    f̂_X(x; H) = n^{−1} (Π_{l=1}^{d} h_l)^{−1} Σ_{i=1}^{n} K((x_1 − X_{i1})/h_1, ..., (x_d − X_{id})/h_d).   (2.18)
This form of the smoothing matrix allows one to perform estimation along the coordinate directions. If estimation in other directions is needed, the smoothing matrix has to be chosen differently. This thesis deals with the density estimation of volatility pairs (σ_t, σ_{t+1}). In this case the following smoothing matrix is used:

    H = diag(h², h²).

This choice was made because the range of the first coordinate in the volatility pair is approximately equal to the range of the second coordinate.
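A sketch of the diagonal-bandwidth estimator (2.18) for d = 2 with H = diag(h², h²) and a Gaussian product kernel (our own illustrative choice of kernel):

```python
import math
import random

def kde2d(x1, x2, data, h):
    """Bivariate estimate (2.18) with H = diag(h^2, h^2) and the
    product kernel K(u1, u2) = K(u1) * K(u2), K Gaussian."""
    n = len(data)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((x1 - a) / h) * k((x2 - b) / h) for a, b in data) / (n * h * h)

random.seed(3)
data = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(5000)]
fhat = kde2d(0.0, 0.0, data, h=0.35)
# true independent standard normal density at (0, 0) is 1/(2*pi), about 0.1592
```

For the volatility pairs below the same single bandwidth h is used in both directions, matching H = diag(h², h²).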
Chapter 3
Deconvolution Kernel Volatility Density Estimation
In this chapter, the theoretical background of uni- and bivariate deconvolution kernel volatility density estimation for discrete time models [7] is given. A computational algorithm for the bivariate volatility density estimation, based on the two-dimensional fast Fourier transform, is developed and simulation results are discussed.
3.1 A stochastic model for the price of an asset
Assume that the asset price P_t follows the geometric Brownian motion model [8]

    dP_t = μ P_t dt + σ_t P_t dW_t,    P_0 = 1.   (3.1)

Let S_t = log P_t; then, after applying Itô's lemma and defining b_t as μ − (1/2)σ_t², the previous equation can be rewritten in terms of S_t:

    dS_t = b_t dt + σ_t dW_t,    S_0 = 0,   (3.2)
where σ_t and W_t are independent for every t, and b_t and σ_t satisfy regularity conditions. In this chapter a bivariate volatility density estimator will be constructed without any parametric assumptions about the volatility behavior.

Consider the discrete time model for the log price without the drift term

    S_t = σ_t Z_t,    S_0 = 0,   (3.3)
where σ_t is independent of Z_t for each t and Z_t is a standard Gaussian noise sequence. After squaring and taking logarithms of both sides of the equation we obtain an additive measurement error model as in (2.12). We have

    log S_t² = log σ_t² + log Z_t²,   (3.4)

where the log price S_t is an observable component while the volatility σ_t is non-observable. This model allows the application of deconvolution kernel density estimation, which is the topic of the next section.
3.2 Deconvolution Kernel Volatility Density Estimation - Univariate case
Assume that the log price evolves according to the model (3.3) and assume that σ is a positive, strictly stationary process satisfying a certain mixing condition and that the univariate marginal distributions of σ have a density w.r.t. the Lebesgue measure on (0, ∞). From the previous section we have

    log S_t² = log σ_t² + log Z_t².
Here the Z_t are i.i.d. standard normal random variables. This implies that the density and the characteristic function of log Z_t² are

    f_Z(x) = (1/√(2π)) e^{x/2} e^{−eˣ/2},

    ϕ_Z(t) = (1/√π) 2^{it} Γ(1/2 + it),

where Γ denotes the complex-valued Gamma function.

The kernel function used in the straight kernel estimation is chosen to be the same as in [7]:
    K(x) = (48x(x² − 15) cos x − 144(2x² − 5) sin x) / (πx⁷),   (3.5)
which has the characteristic function

    ϕ_K(t) = (1 − t²)³,   |t| ≤ 1.
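Since the pair (3.5) and ϕ_K(t) = (1 − t²)³ is used throughout the simulations, here is a quick numerical cross-check (our own Python, not part of the thesis): inverting the characteristic function numerically reproduces the closed form.

```python
import math

def K(x):
    """The kernel (3.5)."""
    return (48 * x * (x ** 2 - 15) * math.cos(x)
            - 144 * (2 * x ** 2 - 5) * math.sin(x)) / (math.pi * x ** 7)

def K_from_cf(x, m=20000):
    """K(x) = (1/2pi) * int_{-1}^{1} (1 - t^2)^3 * cos(t*x) dt,
    evaluated by the trapezoid rule."""
    h = 2.0 / m
    s = 0.0
    for k in range(m + 1):
        t = -1.0 + k * h
        w = 0.5 if k in (0, m) else 1.0
        s += w * (1.0 - t * t) ** 3 * math.cos(t * x)
    return s * h / (2 * math.pi)

# spot-check the transform pair away from the removable singularity at x = 0
vals = [(K(x), K_from_cf(x)) for x in (0.5, 2.0, 7.0)]
```

The compact support of ϕ_K is what keeps the deconvolution integrals below finite.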
Using the results of Chapter 2 we can construct the deconvolution kernel volatility density estimator at the point x = log σ_t², given the log prices S_1, ..., S_n. We define

    f̂(x; h) = (1/(nh)) Σ_{j=1}^{n} v_h((x − log S_j²)/h),   (3.6)
where
    v_h(x) = (1/2π) ∫_{−∞}^{∞} ϕ_K(s)/ϕ_Z(s/h) e^{−isx} ds.   (3.7)
Asymptotic properties of this estimator, derived in [7], will be presented inthe next section for the bivariate case.
3.3 Deconvolution Kernel Volatility Density Estimation - Bivariate case
Assume that the log price evolves according to the model (3.3) and assume that σ is a positive, strictly stationary process satisfying a certain mixing condition and that the 2-dimensional marginal distributions of (σ_t, σ_{t+1}) have densities w.r.t. the Lebesgue measure on (0, ∞)². Using the multivariate extension of the kernel density estimator we obtain the estimate of the bivariate volatility density of (log σ_t², log σ_{t+1}²) at the point x = (x_1, x_2):

    f̂_BIV(x; h) = (1/n) (1/(h_1h_2)) Σ_{i=1}^{n} v_{h_1}((x_1 − log S_i²)/h_1) v_{h_2}((x_2 − log S_{i+1}²)/h_2),   (3.8)

where

    v_{h_1}(x) = (1/2π) ∫_{−∞}^{∞} ϕ_K(s)/ϕ_Z(s/h_1) e^{−isx} ds,
    v_{h_2}(x) = (1/2π) ∫_{−∞}^{∞} ϕ_K(s)/ϕ_Z(s/h_2) e^{−isx} ds.
The kernel function K is chosen as in (3.5); this choice guarantees the fulfilment of Condition 3 formulated below. This condition is used in the theorems for the bias and variance of the deconvolution kernel volatility density estimator, see [7].
Condition 3. Let K be a real-valued symmetric function with characteristic function ϕ_K having support [−1, 1]. We assume that

    ∫ |K(u)| du < ∞,
    ∫ K(u) du = 1,
    ∫ u² |K(u)| du < ∞,
    lim_{|u|→∞} K(u) = 0,
    ϕ_K(1 − t) = A t^α + o(t^α) as t ↓ 0, for some α > 0.
The next two theorems, see [7], give bounds for the bias and the variance of the bivariate deconvolution kernel density estimator. In the first theorem the process σ is assumed to be predictable w.r.t. the filtration generated by Z, and in the second theorem the volatility process is assumed to be independent of Z.
Theorem 4. Assume that the process S is strongly mixing with coefficients α_k satisfying

    Σ_{j=1}^{∞} α_j^β < ∞,

for some β ∈ (0, 1). Let the kernel function K satisfy Condition 3 and let the bivariate density of the vector (log σ_t², log σ_{t+1}²) be bounded and twice continuously differentiable with bounded second-order partial derivatives. Assume that σ is a predictable process with respect to the filtration generated by the process Z. Then we have, for the estimator of the bivariate density defined as in (3.8) and h → 0,

    E f̂_BIV(x) = f(x) + (1/2) h² ∫∫ uᵀ ∇²f(x) u w(u) du + o(h²),

and

    Var f̂_BIV(x) = O( (1/n) (h^{2ρ−β} e^{π/h})² ).
Theorem 5. Assume that the process S is strongly mixing with coefficients α_k satisfying

    Σ_{j=1}^{∞} α_j^β < ∞,

for some β ∈ (0, 1). Let the kernel function K satisfy Condition 3 and let the bivariate density of the vector (log σ_t², log σ_{t+1}²) be bounded and twice continuously differentiable with bounded second-order partial derivatives. Assume that σ and Z are independent processes. Then the bivariate density estimator f̂_BIV has the same bias expansion as in Theorem 4. For the variance the sharper bound

    Var f̂_BIV(x) = O( (1/n) (h^{2ρ} e^{π/h})² )

is obtained.
3.4 Computational considerations
In this section the computational algorithm for the bivariate deconvolution kernel volatility density estimation is derived. The algorithm for the univariate deconvolution kernel volatility density estimation, based on ideas from [10], is also presented. Both algorithms are based on fast Fourier transform computational procedures.
3.4.1 Computational algorithm for the deconvolution kernel volatility density estimation - univariate case
Despite the simplicity of the ideas behind kernel-based estimation procedures, they turn out to be quite difficult in practical implementation. Direct computation according to the formula for the kernel estimator (2.1) or the deconvolution kernel estimator (2.15) is highly inefficient. An effective method of estimation is based on the previously noted observation that a kernel estimate is a convolution of the data with the scaled kernel [9]; the FFT, see Appendix B, is then used to perform the most time-consuming computations. Let FT(f) and IFT(f) denote the Fourier transform of f and the inverse Fourier transform of f, respectively. The deconvolution density estimator can be rewritten as

    f̂(x; h) = IFT( FT(v_h)(th) u(t) )(x),
where u(t) = (1/n) Σ_{i=1}^{n} e^{itS_i} is the empirical characteristic function. Since the fast Fourier transform is an algorithm for computing the discrete Fourier transform (DFT) on a grid of points, it will be used to find FT(v_h)(th) u(t) and to perform the IFT.

The algorithm starts with the data discretization. Consider an interval [a, b] on which all data points lie. Choose M = 2^r for some integer r; this choice is explained in [9]. The density estimate will be found at M points in the interval [a, b]. Define

    δ = (b − a)/M,
    t_k = a + kδ,
for k = 0, 1, ..., M − 1. The discretization rule is the following: if a data point X falls in the interval [t_k, t_{k+1}], it is split into a weight n^{−1}δ^{−2}(t_{k+1} − X) at t_k and a weight n^{−1}δ^{−2}(X − t_k) at t_{k+1}.
The weight sequence ξ_k obtained by this rule sums to δ^{−1}. In the second step the FFT is used to compute the sum

    Y_l = (1/M) Σ_{k=0}^{M−1} ξ_k e^{i2πkl/M},
where −M/2 ≤ l ≤ M/2. This sum helps to find the value of the empirical characteristic function u(s_l), where s_l = 2πl(b − a)^{−1}:
    Y_l = M^{−1} e^{−ias_l} Σ_{k=0}^{M−1} ξ_k e^{it_k s_l} ≈ M^{−1} δ^{−1} e^{−ias_l} (1/n) Σ_{j=1}^{n} e^{is_l S_j} = e^{−ias_l} (b − a)^{−1} u(s_l).
Define ζ*_l = FT(v_h)(hs_l) Y_l and let ζ_k = IFT(ζ*_l); then

    ζ_k = Σ_{l=−M/2}^{M/2} e^{−i2πkl/M} ζ*_l = Σ_l e^{−is_lt_k} e^{ias_l} FT(v_h)(hs_l) Y_l ≈ Σ_l e^{−is_lt_k} FT(v_h)(hs_l) (b − a)^{−1} u(s_l) ≈ (1/2π) ∫ e^{−ist_k} FT(v_h)(hs) u(s) ds = f̂(t_k; h).
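The discretization step and the identity linking Y_l to u(s_l) can be checked numerically. The sketch below (our own Python, not the thesis code; for convenience it uses δ = (b − a)/(M − 1), as in the bivariate section, so that the last grid point t_{M−1} = b covers all data) bins a sample linearly, takes one discrete Fourier coefficient of the weights, and compares it with the directly computed empirical characteristic function:

```python
import cmath
import math
import random

random.seed(5)
n, M = 500, 64
data = [random.gauss(0.0, 1.0) for _ in range(n)]
a, b = min(data) - 0.1, max(data) + 0.1
delta = (b - a) / (M - 1)                 # grid t_0 = a, ..., t_{M-1} = b
t = [a + k * delta for k in range(M)]

# linear binning: each point is split between its two neighbouring nodes;
# the weights sum to 1/delta, as stated in the text
xi = [0.0] * M
for X in data:
    k = min(int((X - a) / delta), M - 2)
    xi[k] += (t[k + 1] - X) / (n * delta ** 2)
    xi[k + 1] += (X - t[k]) / (n * delta ** 2)

# one Fourier frequency as a spot check: s_l = 2*pi*l/(b - a)
l = 3
s = 2 * math.pi * l / (b - a)
D = sum(xi[k] * cmath.exp(2j * math.pi * k * l / (M - 1)) for k in range(M))
u_binned = D * delta * cmath.exp(1j * a * s)
u_direct = sum(cmath.exp(1j * s * X) for X in data) / n
```

In the full algorithm all M coefficients are obtained at once with the FFT and multiplied by FT(v_h)(hs_l) before the inverse transform; the binning error is of order (s_lδ)² per observation, which is why M is taken reasonably large.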
3.4.2 Computational algorithm for the deconvolution kernel volatility density estimation - bivariate case
In this section we aim to find the estimate of the bivariate density of (log σ_t², log σ_{t+1}²) at the point x = (x_1, x_2). The estimation will be performed according to the following formula:

    f̂_BIV(x; h) = (1/n) (1/(h_1h_2)) Σ_{i=1}^{n} v_{h_1}((x_1 − log S_i²)/h_1) v_{h_2}((x_2 − log S_{i+1}²)/h_2).
This bivariate estimate can be seen as the convolution of the product kernel and the data. Then, as with the FFT in the univariate case, the two-dimensional FFT (2DFFT) is used to compute the two-dimensional DFT on a grid of points. Let 2DFT(f) and 2DIFT(f) denote the two-dimensional Fourier transform of f and the two-dimensional inverse Fourier transform of f, respectively. Then the formula for the bivariate estimate can be rewritten as

    f̂_BIV(x; h) = 2DIFT( 2DFT(v_{h_1}(x_1) v_{h_2}(x_2))(th) u(t) )(x),

where u(t) = (1/n) Σ_{k=1}^{n} e^{i(t_1S_k + t_2S_{k+1})} is the bivariate empirical characteristic function.

As in the univariate case, the first step of the algorithm is the data discretization. Consider two intervals [a_1, b_1] and [a_2, b_2] on which all bivariate data points lie. Choose M = r², where r denotes the number of discretization points in each direction. The density estimate will be found on a grid of √M × √M points. Define

    δ_1 = (b_1 − a_1)/(√M − 1),    t_{k_1} = a_1 + k_1δ_1,
    δ_2 = (b_2 − a_2)/(√M − 1),    t_{k_2} = a_2 + k_2δ_2.
We will use the linear binning rule, introduced in [12] and presented in the following picture, to find the matrix of weights ξ. Note that the sum of all components of the weight matrix ξ defined by this rule is equal to n.
Now, for −√M/2 ≤ l_1 ≤ √M/2 and −√M/2 ≤ l_2 ≤ √M/2, define Y_{l_1l_2} as

    Y_{l_1l_2} = M^{−1} Σ_{k_1=0}^{√M−1} Σ_{k_2=0}^{√M−1} ξ_{k_1k_2} e^{i2πk_1l_1/(√M−1)} e^{i2πk_2l_2/(√M−1)}.
This expression can be computed by the 2DFFT. Define s_{l_1} = 2πl_1(b_1 − a_1)^{−1} and s_{l_2} = 2πl_2(b_2 − a_2)^{−1}. Now we can find an approximation of the bivariate empirical characteristic function:

    u(s_{l_1}, s_{l_2}) = n^{−1} Σ_{k=1}^{n} e^{i(s_{l_1}S_k + s_{l_2}S_{k+1})}
                        ≈ n^{−1} Σ_{k_2=0}^{√M−1} Σ_{k_1=0}^{√M−1} ξ_{k_1k_2} e^{i((a_1+k_1δ_1)s_{l_1} + (a_2+k_2δ_2)s_{l_2})}
                        = n^{−1} e^{i(a_1s_{l_1}+a_2s_{l_2})} Σ_{k_2=0}^{√M−1} Σ_{k_1=0}^{√M−1} ξ_{k_1k_2} e^{i2πl_1k_1/(√M−1)} e^{i2πl_2k_2/(√M−1)}
                        = M n^{−1} e^{i(a_1s_{l_1}+a_2s_{l_2})} Y_{l_1l_2}.
Define ζ*_{l_1l_2} = 2DDFT(v_{h_1}(t_{k_1}) v_{h_2}(t_{k_2}))(hs_l) Y_{l_1l_2} and let ζ_{k_1k_2} be the 2DIDFT of ζ*_{l_1l_2}, where 2DDFT and 2DIDFT denote the two-dimensional discrete Fourier transform and its inverse. Using similar reasoning as in the univariate case,

    ζ_{k_1k_2} = Σ_{l_1=−√M/2}^{√M/2} Σ_{l_2=−√M/2}^{√M/2} e^{−i2πl_1k_1/(√M−1)} e^{−i2πl_2k_2/(√M−1)} ζ*_{l_1l_2}
               = Σ_{l_1} Σ_{l_2} e^{−i(t_{k_1}s_{l_1} + t_{k_2}s_{l_2})} 2DFT(v_{h_1}(t_{k_1}) v_{h_2}(t_{k_2}))(hs_l) e^{i(s_{l_1}a_1 + s_{l_2}a_2)} Y_{l_1l_2}
               = (n/M) f̂_BIV(t_{k_1}, t_{k_2}).
3.5 Simulation results
This section contains simulations of the bivariate volatility density estimates.We are going to check whether these simulations are able to detect volatilityclustering. Three different situations will be considered and two differenttypes of models will be used.
3.5.1 The bivariate deconvolution kernel density estimation of the volatility simulated as the GARCH(1,1) process
The Generalized Autoregressive Conditionally Heteroscedastic (GARCH) model was first introduced by Bollerslev in 1986. It was specially invented to capture the clustering feature of the volatility process [5]. In this thesis the GARCH(1,1) model is used for the volatility modeling. Along with the model for the log return process r_t, the model is given by
    r_t = σ_t Z_t,
    σ_t² = α_0 + α_1 r_{t−1}² + β_1 σ_{t−1}²,

where σ_t is independent of Z_t for each t and Z_t ~ N(0, 1).

Simulations were made for the following parameters of the GARCH(1,1) model: α_0 = 1, α_1 = 0.7, β_1 = 0.2. The number of observations N is 1000, the bandwidth matrix H equals diag(0.4, 0.4), and the kernel function (3.5) was used.

The bivariate density estimate of log σ² based on direct observations of the volatility process is in the left part of Figure 1, and the bivariate deconvolution kernel volatility density estimate is presented in the right part.
Figure 1
Figure 2 shows the contour plot of the deconvolution kernel volatility density estimate, and it indeed illustrates volatility clustering in the sense that small changes in the price of an asset tend to be followed by small changes and large changes tend to be followed by large changes. This means that the concentration of the bivariate volatility density should be around the diagonal, and this is what is observed in Figure 2.
Figure 2
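The simulation itself can be sketched as follows (our own Python; the thesis' actual R code is in Appendix C.1). The recursion uses the parameters α_0 = 1, α_1 = 0.7, β_1 = 0.2 quoted above:

```python
import math
import random

def simulate_garch(n, a0=1.0, a1=0.7, b1=0.2, seed=0):
    """Simulate sigma_t^2 from r_t = sigma_t * Z_t,
    sigma_t^2 = a0 + a1 * r_{t-1}^2 + b1 * sigma_{t-1}^2, Z_t ~ N(0, 1)."""
    rng = random.Random(seed)
    sig2 = a0 / (1.0 - a1 - b1)     # start at the stationary mean of sigma^2
    r = 0.0
    out = []
    for _ in range(n):
        sig2 = a0 + a1 * r * r + b1 * sig2
        r = math.sqrt(sig2) * rng.gauss(0.0, 1.0)
        out.append(sig2)
    return out

sig2s = simulate_garch(1000)        # N = 1000 as in the text
```

Consecutive values of σ_t² are strongly positively correlated (note σ_t² ≥ β_1 σ_{t−1}²), which is exactly the clustering that the density mass along the diagonal of Figure 2 reflects.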
3.5.2 The bivariate deconvolution kernel density estimation of the volatility simulated as the mixture of two GARCH(1,1) processes
In this section simulations were made for a mixture of two GARCH(1,1) processes. In economic terms this means that the market switches between two clearly distinguished regimes. The volatility of the first state is modelled by a GARCH(1,1) process multiplied by 0.1, with parameters $\alpha_0 = 1$, $\alpha_1 = 0.2$, $\beta_1 = 0.7$. For the volatility of the second state a GARCH(1,1) process with parameters $\alpha_0 = 2$, $\alpha_1 = 0.2$, $\beta_1 = 0.7$ was used. At each point in time a switch occurs with probability $p = 0.5$. The number of observations $N$ is 3000, the bandwidth matrix is $H = \begin{pmatrix} 0.4 & 0 \\ 0 & 0.4 \end{pmatrix}$, and the kernel function (3.5) was used. The bivariate density estimate of $\log\sigma^2$ based on direct observations of the volatility process is shown in the left part of Figure 3, and the bivariate deconvolution kernel volatility density estimate is shown in the right part.
Figure 3
The presence of four bumps in the picture reveals another type of clustering, different from that illustrated in the previous example. The reason is that in these simulations we forced the market to switch between periods of high and low volatility, which is clearly seen in Figure 3. And of course, because the GARCH model was used, the clustering described in the previous section can be seen here as well.
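The regime-switching mechanism described above can be sketched in a few lines of Python (the seed and the simple 0/1 switching chain are illustrative assumptions consistent with the switch probability p = 0.5):

```python
import numpy as np

def garch11_vol(n, a0, a1, b1, rng):
    """One GARCH(1,1) volatility path sigma_t."""
    z = rng.standard_normal(n)
    s2 = np.empty(n)
    r = np.empty(n)
    s2[0] = a0 / (1.0 - a1 - b1)
    r[0] = np.sqrt(s2[0]) * z[0]
    for t in range(1, n):
        s2[t] = a0 + a1 * r[t - 1]**2 + b1 * s2[t - 1]
        r[t] = np.sqrt(s2[t]) * z[t]
    return np.sqrt(s2)

rng = np.random.default_rng(1)
n = 3000
low = 0.1 * garch11_vol(n, 1.0, 0.2, 0.7, rng)   # low-volatility regime
high = garch11_vol(n, 2.0, 0.2, 0.7, rng)        # high-volatility regime
state = np.zeros(n, dtype=int)
for t in range(1, n):
    # at each point in time, switch regime with probability 0.5
    state[t] = state[t - 1] ^ int(rng.random() < 0.5)
sigma = np.where(state == 0, low, high)          # observed volatility path
```

The four bumps in Figure 3 arise because consecutive pairs $(\sigma_t, \sigma_{t+1})$ can fall into any of the four regime combinations (low-low, low-high, high-low, high-high).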
3.5.3 The bivariate deconvolution kernel density estimation of the volatility simulated according to the De Vries model
This section contains simulations in which the volatility evolves according to the model described in [11]:
\[
r_t = \sigma_t Z_t, \qquad
\sigma_t^2 = \frac{1}{Z_{t-1}^2} + \alpha\,\sigma_{t-1}^2,
\]
where $r_t$ is the log return process, $Z_t \sim N(0,1)$ and $0 \le \alpha < 1$. Note that, given the initial value, the volatility is modeled independently of the price of the asset. Simulations were made for the following parameters: $\alpha = 0.5$, the number of observations $N$ is 3000, the bandwidth matrix is $H = \begin{pmatrix} 0.4 & 0 \\ 0 & 0.4 \end{pmatrix}$, and the kernel function (3.5) was used. The bivariate density estimate of $\log\sigma^2$ based on direct observations of the volatility process is shown in the left part of Figure 4, and the bivariate deconvolution kernel volatility density estimate is presented in the right part.
Figure 4
Figure 5 shows the contour plot of the deconvolution kernel volatility density estimate.
Figure 5
Like the GARCH simulations, this simulation also illustrates volatility clustering. Even though the shape of the contour plot is slightly different, the concentration of the density values around the diagonal is still observed.
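The De Vries recursion is equally simple to simulate. A minimal Python sketch follows (the initial value sigma0 = 0.2 mirrors the R code in Appendix C; the seed is an assumption):

```python
import numpy as np

def de_vries_sim(n, alpha=0.5, sigma0=0.2, seed=2):
    """Simulate sigma_t^2 = 1/Z_{t-1}^2 + alpha * sigma_{t-1}^2 with Z ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal()           # separate initial noise draw
    z = rng.standard_normal(n)
    s2 = np.empty(n)
    s2[0] = 1.0 / z0**2 + alpha * sigma0**2
    for t in range(1, n):
        s2[t] = 1.0 / z[t - 1]**2 + alpha * s2[t - 1]
    return np.sqrt(s2)

sigma = de_vries_sim(3000)
```

Because $1/Z_{t-1}^2$ has very heavy tails, occasional huge volatility spikes appear, which is the characteristic feature of this model.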
Chapter 4
Nonparametric estimation of loss distributions
The loss distribution is the probability distribution of the amount to be paid to the insured for damage [2]. In practice, extreme losses are rarely observed, but when present they can have a significant influence on the profit of the company, so particular attention must be paid to the tails of estimated loss distributions. It must be noted that there is no single parametric model for the estimation of such distributions. In practice a threshold between large and small losses is determined first, and then two different parametric models are used for estimation. It is observed that loss distributions for small and medium claims behave similarly to the lognormal distribution, while in the tail they converge to a Pareto distribution. The kernel density estimator based on the modified Champernowne cdf transformation introduced in [4] is discussed in the next section. It gives a unified approach to the estimation of small and large losses, with particular attention paid to the tail.
4.1 Champernowne transformation of the insurance data
Since ordinary kernel density estimation of loss distributions does not give satisfactory results, transformed kernel methods need to be used [2]. This problem was studied in [4], where the modified Champernowne cdf was introduced and used as a transformation in the estimation procedure. The original Champernowne distribution with parameters $\alpha$, $M$ and $\lambda$ has density
\[
d_{\alpha,M,c,\lambda}(x) = \frac{c}{x\left(\tfrac{1}{2}(x/M)^{-\alpha} + \lambda + \tfrac{1}{2}(x/M)^{\alpha}\right)}, \qquad x \ge 0.
\]
The density with $\lambda = 1$ and $c = \alpha/2$ is simply called the Champernowne distribution:
\[
d_{\alpha,M,c}(x) = \frac{\alpha M^{\alpha} x^{\alpha-1}}{(x^{\alpha} + M^{\alpha})^2}, \qquad x \ge 0.
\]
This distribution converges to a Pareto distribution in the tail, while for $\alpha > 1$ it behaves like a lognormal distribution near 0. The Champernowne distribution with $\alpha = 2$ and $M = 3$ is displayed in Figure 6.
Figure 6
The only reason why the Champernowne distribution cannot be chosen as a transformation is that it is not defined at 0 unless $\alpha \ge 1$. This problem is solved with the help of the modified Champernowne distribution. The modified Champernowne cdf is defined for $x \ge 0$ and has the form
\[
T_{\alpha,M,c}(x) = \frac{(x+c)^{\alpha} - c^{\alpha}}{(x+c)^{\alpha} + (M+c)^{\alpha} - 2c^{\alpha}}, \tag{4.1}
\]
where $\alpha > 0$, $M > 0$ and $c \ge 0$. Its density is given by
\[
t_{\alpha,M,c}(x) = \frac{\alpha (x+c)^{\alpha-1}\left((M+c)^{\alpha} - c^{\alpha}\right)}{\left((x+c)^{\alpha} + (M+c)^{\alpha} - 2c^{\alpha}\right)^2}, \qquad x \ge 0.
\]
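Both formulas are straightforward to implement; a minimal Python sketch:

```python
import numpy as np

def champ_cdf(x, alpha, M, c):
    """Modified Champernowne cdf T_{alpha,M,c}, equation (4.1)."""
    x = np.asarray(x, dtype=float)
    return ((x + c)**alpha - c**alpha) / ((x + c)**alpha + (M + c)**alpha - 2 * c**alpha)

def champ_pdf(x, alpha, M, c):
    """Its density t_{alpha,M,c}."""
    x = np.asarray(x, dtype=float)
    num = alpha * (x + c)**(alpha - 1) * ((M + c)**alpha - c**alpha)
    den = ((x + c)**alpha + (M + c)**alpha - 2 * c**alpha)**2
    return num / den

# T(M) = 1/2 for every admissible parameter choice
print(champ_cdf(10.0, alpha=2.0, M=10.0, c=2.0))  # -> 0.5
```

Since $T_{\alpha,M,c}(M) = 1/2$ for every admissible parameter choice, $M$ is the median of the distribution; this is why it is later estimated by the empirical median.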
This distribution also converges to a Pareto distribution in the tail,
\[
t_{\alpha,M,c}(x) \approx \frac{\alpha \left(\left((M+c)^{\alpha} - c^{\alpha}\right)^{1/\alpha}\right)^{\alpha}}{x^{\alpha+1}}
\]
as $x \to \infty$. Moreover, for positive $c$ it has a positive finite density at 0. Figure 7 contains plots of the cdf and the density of the modified Champernowne distribution for three sets of parameters $\alpha$, $M$ and $c$.
Figure 7
The cdf is shown in the left part of the figure and the density in the right. The parameter $M$ equals 10 in all cases, while $\alpha$ takes the values 0.5, 0 and 5, changing from the top to the bottom of the graph. Solid and dashed lines correspond to calculations with $c$ equal to 2 and 0, respectively. Note that when $c$ equals 0, $\alpha$ does not affect the result. The parameters of the transformation are estimated from the data. The first parameter to be estimated is $M$, which is estimated as the empirical median of the data set. The parameters $\alpha$ and $c$ maximize the log likelihood function for fixed $M$:
\[
l(\alpha, c) = n\log(\alpha) + n\log\!\left((M+c)^{\alpha} - c^{\alpha}\right) + (\alpha - 1)\sum_{i=1}^{n}\log(X_i + c)
- 2\sum_{i=1}^{n}\log\!\left((X_i+c)^{\alpha} + (M+c)^{\alpha} - 2c^{\alpha}\right).
\]
After the parameters of the transformation are computed, the loss distributions are estimated.
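The estimation recipe (empirical median for M, then maximization of l over (α, c)) can be sketched as follows. The toy Pareto data and the crude grid search are illustrative assumptions standing in for real claims data and a proper numerical optimizer:

```python
import numpy as np

def loglik(alpha, c, X, M):
    """Log likelihood l(alpha, c) of the modified Champernowne density."""
    n = len(X)
    return (n * np.log(alpha)
            + n * np.log((M + c)**alpha - c**alpha)
            + (alpha - 1) * np.sum(np.log(X + c))
            - 2 * np.sum(np.log((X + c)**alpha + (M + c)**alpha - 2 * c**alpha)))

rng = np.random.default_rng(3)
X = rng.pareto(2.0, size=500) + 0.001    # toy heavy-tailed loss sample (illustrative)
M = np.median(X)                         # step 1: estimate M by the empirical median
# step 2: maximize l(alpha, c) for fixed M; here a crude grid search
grid = [(a, c) for a in np.linspace(0.5, 3.0, 26) for c in np.linspace(0.0, 2.0, 21)]
alpha_hat, c_hat = max(grid, key=lambda p: loglik(p[0], p[1], X, M))
```

In practice any standard optimizer can replace the grid search; the grid bounds here are arbitrary.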
4.2 Estimation of loss distributions
Let $X_1, X_2, \ldots, X_n$ be the observed loss data with density $f_X$. We apply the transformation defined by (4.1) with estimated parameters $(\hat\alpha, \hat M, \hat c)$ to the data set:
\[
Y_i = T_{\hat\alpha, \hat M, \hat c}(X_i),
\]
for $i = 1, \ldots, n$. The transformed data are now close to the uniform distribution on $(0,1)$, which is easy to estimate with the kernel density estimator
\[
\hat f_{\mathrm{trans}}(y) = n^{-1} k_y^{-1} \sum_{i=1}^{n} K_h(y - Y_i),
\]
where $K_h(x) = \frac{1}{h}K\!\left(\frac{x}{h}\right)$ is the scaled kernel, $h > 0$ is the smoothing parameter and $k_y = \int_{\max(-1,\,-y/h)}^{\min(1,\,(1-y)/h)} K(u)\,du$ is the boundary correction. The density of the original data set $X_1, \ldots, X_n$ can then be recovered as
\[
\hat f_X(x) = \frac{\hat f_{\mathrm{trans}}\!\left(T_{\hat\alpha,\hat M,\hat c}(x)\right)}{\left|\left(T_{\hat\alpha,\hat M,\hat c}^{-1}\right)'\!\left(T_{\hat\alpha,\hat M,\hat c}(x)\right)\right|},
\]
with the explicit expression of the estimator
\[
\hat f_X(x) = n^{-1} k_{T_{\hat\alpha,\hat M,\hat c}(x)}^{-1} \sum_{i=1}^{n} K_h\!\left(T_{\hat\alpha,\hat M,\hat c}(x) - T_{\hat\alpha,\hat M,\hat c}(X_i)\right) T_{\hat\alpha,\hat M,\hat c}'(x).
\]
Note that Theorem 1 cannot be applied directly to this type of estimator, since here the transformation $T$ is random, i.e. it is adjusted to the data. This is an interesting problem for further research.
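Putting the pieces together, the transformed estimator can be sketched as follows. The Epanechnikov kernel, the parameter values and the toy data are illustrative assumptions, not the thesis' choices:

```python
import numpy as np

def T(x, a, M, c):
    """Modified Champernowne cdf, equation (4.1)."""
    return ((x + c)**a - c**a) / ((x + c)**a + (M + c)**a - 2 * c**a)

def T_prime(x, a, M, c):
    """Derivative of T, i.e. the density t_{a,M,c}(x)."""
    return (a * (x + c)**(a - 1) * ((M + c)**a - c**a)
            / ((x + c)**a + (M + c)**a - 2 * c**a)**2)

def epan(u):
    """Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def epan_int(a, b):
    """Integral of the Epanechnikov kernel from a to b, with -1 <= a <= b <= 1."""
    F = lambda u: 0.75 * (u - u**3 / 3)
    return F(b) - F(a)

def f_X_hat(x, X, a, M, c, h):
    """Transformed kernel estimate of the loss density at the point x."""
    y, Y = T(x, a, M, c), T(X, a, M, c)
    k_y = epan_int(max(-1.0, -y / h), min(1.0, (1.0 - y) / h))  # boundary correction
    f_trans = np.mean(epan((y - Y) / h)) / (h * k_y)            # estimate on (0, 1)
    return f_trans * T_prime(x, a, M, c)                        # back-transform

rng = np.random.default_rng(4)
X = rng.pareto(2.0, size=300) + 0.001    # toy heavy-tailed losses (illustrative)
est = f_X_hat(1.0, X, a=1.5, M=np.median(X), c=0.5, h=0.2)
```

The back-transformation step uses the identity $1/\left|\left(T^{-1}\right)'(T(x))\right| = T'(x)$, so the estimate on $(0,1)$ is simply multiplied by the transformation density.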
Chapter 5
Conclusions
In this thesis, two applications of nonparametric estimation were discussed. Kernel based density estimation techniques were applied to the volatility process in finance and the loss process in insurance.

The bivariate density estimation of the volatility process was performed with the deconvolution kernel volatility density estimator, and a computational algorithm based on the FFT was developed. Simulation studies showed that it is possible to illustrate volatility clustering with the bivariate density of $(\sigma_t, \sigma_{t+1})$, where $\sigma$ is the volatility process. Three simulation studies were performed. First, the GARCH(1,1) model was used to model volatility, and the shape of the contour plot of the bivariate density showed that the density values are concentrated around the diagonal; this is indeed an illustration of the volatility clustering phenomena that the GARCH model was invented to capture. In the second study, a mixture of two GARCH(1,1) models was used, so that the process switched between regimes of high and low volatility. The bivariate density in this case was concentrated around four regions, indicating a type of clustering different from the previous example, which can be called regime clustering. The third simulation study, based on the De Vries model, gave results similar to the first case, which is understandable.

For the estimation of loss distributions, transformed kernel density estimation based on the modified Champernowne cdf was discussed. This approach gives clear instructions on how to estimate loss distributions based on data and without any parametric restrictions. It is also clear that further research is needed for this model.
Appendix A
Notation
Notation concerning sequences
• $a_n = O(b_n)$ as $n \to \infty$ iff $\limsup_{n\to\infty} |a_n/b_n| < \infty$,
• $a_n = o(b_n)$ as $n \to \infty$ iff $\lim_{n\to\infty} |a_n/b_n| = 0$.
Notation concerning the volatility density estimator
• $f_Z(x) = \frac{1}{\sqrt{2\pi}}\, e^{x/2}\, e^{-e^{x}/2}$, the error density,
• $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt$, the gamma function.
Notation concerning matrices
• $H_f(x)$ is the $d \times d$ matrix with $(i,j)$ entry equal to $\frac{\partial^2}{\partial x_i\,\partial x_j} f(x)$, the Hessian matrix,
• tr(A) is the trace of matrix A,
• I is the identity matrix,
• |H|1/2 is a square root of a matrix.
Appendix B
The Fourier transform
The Fourier transform (FT) of the function $f(t)$ is defined as in [3]:
\[
FT(f)(s) = \int_{-\infty}^{+\infty} f(t)\, e^{-i2\pi s t}\,dt.
\]
The inverse Fourier transform (IFT) of the function $g$ is defined as:
\[
IFT(g)(t) = \int_{-\infty}^{+\infty} g(h)\, e^{i2\pi h t}\,dh.
\]
It should be noted that there exist several equivalent definitions of the FT; the definitions used in this thesis are those presented above. The properties of the Fourier transform used in this thesis are:
• The inverse Fourier transform and the Fourier transform form a Fourier transform pair, which means that $f(t) = IFT(FT(f)(s))(t)$.
• Time and frequency scaling. Assume that $FT(f)(s) = F(s)$. Then the Fourier transform of the scaled function $f(ks)$ equals $\frac{1}{k}F\!\left(\frac{h}{k}\right)$; hence $\left(f(ks), \frac{1}{k}F\!\left(\frac{h}{k}\right)\right)$ is a Fourier transform pair.
• Time convolution theorem. Assume that the Fourier transforms of the functions $f(s)$ and $x(s)$ are $FT(f)(h) = F(h)$ and $FT(x)(h) = X(h)$. Then the convolution $(f * x)(s)$ has the Fourier transform $F(h)X(h)$; hence $(f * x)(s)$ and $F(h)X(h)$ form a Fourier transform pair.
Computations of the Fourier transform are usually based on the concept of the Discrete Fourier Transform (DFT). The DFT of the function $f(t)$ defined on the sequence of points $(s_k)$ is
\[
DFT(f)(s_k) = \sum_{i=1}^{N-1} f(t_i)\, e^{-i2\pi s_k t_i}\,(t_{i+1} - t_i), \tag{B.1}
\]
where $k = 0, 1, \ldots, N - 1$. This transform must be used when there is no closed-form Fourier transform solution. However, a closer look at the expression shows that if there are $N$ data points of the function $f$ and we want to evaluate the expression at $N$ separate frequencies $s_k$, the computation time is of order $N^2$. This means that even with modern high-speed computers, the time needed to compute such sums would be very high for large $N$. A fast algorithm to compute such sums, known as the "fast Fourier transform" (FFT), was developed in 1965 by Cooley and Tukey. The FFT reduces the computing time of sums of type (B.1) to a time proportional to $N\log(N)$. In this thesis the two-dimensional Fourier transform is mostly used, because we are dealing with a bivariate density estimate. The two-dimensional Fourier transform (2DFT) is defined as
\[
2DFT(f)(u, v) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\, e^{-i2\pi(ux+vy)}\,dx\,dy,
\]
with the two-dimensional inverse Fourier transform (2DIFT) defined analogously to the one-dimensional case:
\[
2DIFT(g)(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(u, v)\, e^{i2\pi(ux+vy)}\,du\,dv.
\]
Note that the 2DFT can be seen as two consecutive one-dimensional FTs, which we illustrate in the definition of the two-dimensional discrete Fourier transform (2DDFT). Assume that the two-dimensional function $h(x, y)$ has been sampled with sample intervals $T_x$ and $T_y$ for the dimensions $x$ and $y$, respectively.
\[
2DDFT(h)\!\left(\frac{n}{NT_x}, \frac{m}{MT_y}\right)
= \sum_{q=0}^{M-1}\left[\sum_{p=0}^{N-1} h(pT_x, qT_y)\, e^{-i2\pi np/N}\right] e^{-i2\pi mq/M},
\]
where $p, n \in \{0, 1, \ldots, N - 1\}$ and $q, m \in \{0, 1, \ldots, M - 1\}$. Two-dimensional fast Fourier transform (2DFFT) procedures were developed for the computation of the 2DDFT, reducing the computational time of such sums from $O(N^2M^2)$ to $O(NM \log NM)$ [3].
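The 2DDFT above is exactly what a standard FFT routine computes, up to the sampling constants. A small check with numpy, assuming unit sample intervals $T_x = T_y = 1$ (the array sizes and frequency pair are arbitrary):

```python
import numpy as np

# a small sampled function h(q, p)
rng = np.random.default_rng(5)
h = rng.standard_normal((8, 16))   # M = 8 rows (index q), N = 16 columns (index p)
M, N = h.shape

# direct evaluation of the double sum for one frequency pair (n, m)
n, m = 3, 2
direct = sum(h[q, p] * np.exp(-2j * np.pi * n * p / N) * np.exp(-2j * np.pi * m * q / M)
             for q in range(M) for p in range(N))

# np.fft.fft2 evaluates the same sums for all (n, m) at once in O(NM log NM)
via_fft = np.fft.fft2(h)[m, n]
print(np.allclose(direct, via_fft))  # -> True
```

This agreement is what the computational algorithm of Chapter 3 relies on: the double sums in the estimator are evaluated with one call to a 2D FFT routine instead of by direct summation.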
Appendix C
R-code
C.1 The bivariate deconvolution kernel density estimation of the volatility simulated as the GARCH(1, 1) process
# Characteristic function of the error log(Z^2), Z ~ N(0,1);
# cgamma is the complex gamma function (e.g. from the fAsianOptions package).
fi <- function(t){
fi <- (1/sqrt(pi))*2^(1i*t)*cgamma(0.5 + t*1i)
}

# Fourier transform of the kernel function (3.5)
fiker <- function(t, h){
if ((h*t>=-1)&(h*t<1))
{fiker<-(1-(h*t)^2)^3} else {fiker<-0}
}
# Bivariate deconvolution kernel volatility density estimation for a
# GARCH(1,1) path; garch.sim (e.g. from the TSA package) and rgl
# (open3d, persp3d) are assumed to be available.
Garch_bivariate <- function(N1, h){
N<-N1-1
z1<-array(0, dim=c(N1,1))
z<-array(0, dim=c(N,2))
z1<-garch.sim(alpha = c(1,0.7), beta=0.2, n=N1)
# consecutive pairs of log squared volatilities
for (i in 1:N)
z[i,1]<-log((z1[i])^2)
for (j in 2:N1)
z[j-1,2]<-log((z1[j])^2)
# range over the whole path (not a single point)
a<-min(log(z1^2))-1
b<-max(log(z1^2))+1
a1<-min(z[,1])-1
b1<-max(z[,1])+1
a2<-min(z[,2])-1
b2<-max(z[,2])+1
# number of grid points
M<-32^2
#size of the grid
M1<-sqrt(M)
delta<-(b-a)/(M1-1)
delta1<-(b1-a1)/(M1-1)
delta2<-(b2-a2)/(M1-1)
#array which contains numbers of indexes
n <- array(0, dim=c(N,2))
for ( i in 1:N)
{
n[i,1]<-floor((z[i,1]-a1)/delta1)
n[i,2]<-floor((z[i,2]-a2)/delta2)
}
# finding the sequence of weights
ksi<- array(0, dim=c(M,1))
S<- delta1*delta2
S1<-array(0, dim = c(N,1))
S2<-array(0, dim = c(N,1))
S3<-array(0, dim = c(N,1))
S4<-array(0, dim=c(N,1))
for ( i in 1:N)
{
n[i,1]<-floor((z[i,1]-a1)/delta1)
n[i,2]<-floor((z[i,2]-a2)/delta2)
x<- a1 + n[i,1]*delta1
y<- a2 + n[i,2]*delta2
x1<- x + delta1
y1<-y+delta2
S1[i]<- abs(((z[i,1]-x)*(y1-z[i,2])))
S2[i]<-abs(((y1-z[i,2])*(x1-z[i,1])))
S3[i]<- abs(((z[i,1]-x)*(z[i,2]-y)))
S4[i]<- S-(S1[i]+S2[i]+S3[i])
j<-n[i,1]+(n[i,2]-1)*M1
ksi[j]<-ksi[j]+S2[i]/S
ksi[j+1]<-ksi[j+1]+ S1[i]/S
k<-n[i,1]+n[i,2]*M1
ksi[k]<-ksi[k]+S4[i]/S
ksi[k+1]<-ksi[k+1]+S3[i]/S
}
ksi_a<-array(0, dim = c( sqrt(M), sqrt(M)))
for ( j in 1:sqrt(M))
{
k<-1
for ( i in (sqrt(M)*(j-1)+1):(sqrt(M)*j))
{
ksi_a[k,j]<-ksi[i]
k<-k+1
}
}
# computation of Y_l
y_l <- array(0, dim = c(sqrt(M)+1,sqrt(M)+1))
y_y<- array(0, dim = c(sqrt(M),sqrt(M)))
psi<-array(0, dim = c( sqrt(M), sqrt(M)))
for ( j in 1:sqrt(M))
for ( i in 1:sqrt(M))
psi[i,j]<-(1/N)*ksi_a[i,j]*exp(-pi*1i*(j+i-2))
y_l<-fft(psi, inverse=TRUE)/M
# check: empirical characteristic function
for ( j in 1:(sqrt(M)))
for ( i in 1:(sqrt(M)))
y_y[i,j]<-y_l[i,j]*exp(1i*a1*(-pi*M1*(1/(b1-a1))+
(i-1)*2*pi*(1/(b1-a1)))+
a2*1i*(-pi*M1*(1/(b2-a2))+ (j-1)*2*pi*(1/(b2-a2))))
xi<-array(0, dim = c(sqrt(M),sqrt(M)))
zi<-array(0, dim = c(sqrt(M),sqrt(M)))
for ( j in 1:(sqrt(M)))
for ( i in 1:(sqrt(M)))
xi[i,j]<-((y_l[i,j]*fiker(-pi*M1*(1/(b1-a1))+
(i-1)*2*pi*(1/(b1-a1)),h[1])*
fiker(-pi*M1*(1/(b2-a2))+
(j-1)*2*pi*(1/(b2-a2)),h[2]))/(fi(-pi*M1*(1/(b1-a1))
+ (i-1)*2*pi*(1/(b1-a1)))*
fi(-pi*M1*(1/(b2-a2))+ (j-1)*2*pi*(1/(b2-a2)))))
zi<-fft(xi, inverse=FALSE)
#########################################################
#estimation of the density
################################
mul<-array(0, dim= c(sqrt(M),sqrt(M)))
for ( j in 1: sqrt(M))
for ( i in 1:sqrt(M))
mul[i,j]<- exp(pi*1i*(i-1+j-1))*zi[i,j]
x<-seq(a1,b1,delta1)
y<-seq(a2,b2,delta2)
open3d()
persp3d(x,y, Re(mul),col = "lightblue")
contour(Re(mul))
#estimation based on direct observations
straight(N1,h)
}
C.2 The bivariate deconvolution kernel density estimation of the volatility simulated as the mixture of two GARCH(1, 1) processes
# Mixture of two GARCH(1,1) regimes; garch.sim, simmulti.msm (from the
# msm package) and rgl are assumed to be available.
Garch_switch <- function(N1, h){
N<-N1-1
z1<-array(0, dim=c(N1,1))
z0<-array(0, dim=c(N1,1))
z<-array(0, dim=c(N,2))
z0<-garch.sim(alpha = c(1,0.2), beta=0.7, n=N1)
z1<-0.1*garch.sim(alpha = c(1,0.2), beta=0.7, n=N1)
z2<-array(0, dim=c(N1,1))
z2<-garch.sim(alpha = c(2,0.2), beta=0.7, n=N1)
qmatr<-matrix(c(0.5,0.5,0.5,0.5),2,2)
sim.df <- data.frame(state1 = z1, state2 = z2, time = seq(1, N1, 1))
d1<-simmulti.msm(sim.df, qmatrix=qmatr)
z8<-c(1:N1)
for ( i in 1:N1)
{
if (d1[i,3]==1)
(z8[i]<- d1[i,4])
else (z8[i]<- d1[i,5])
}
z<-array(0, dim=c(N,2))
for (i in 1:N)
z[i,1]<-log((z8[i])^2)
for (j in 2:N1)
z[j-1,2]<-log((z8[j])^2)
a1<-min(z[,1])-1
b1<-max(z[,1])+1
a2<-min(z[,2])-1
b2<-max(z[,2])+1
# number of grid points
M<-32^2
#size of the grid
M1<-sqrt(M)
delta1<-(b1-a1)/(M1-1)
delta2<-(b2-a2)/(M1-1)
#array which contains numbers of indexes where z is
n <- array(0, dim=c(N,2))
for ( i in 1:N)
{
n[i,1]<-floor((z[i,1]-a1)/delta1)
n[i,2]<-floor((z[i,2]-a2)/delta2)
}
# finding the sequence of weights
ksi<- array(0, dim=c(M,1))
S<- delta1*delta2
S1<-array(0, dim = c(N,1))
S2<-array(0, dim = c(N,1))
S3<-array(0, dim = c(N,1))
S4<-array(0, dim=c(N,1))
for ( i in 1:N)
{
n[i,1]<-floor((z[i,1]-a1)/delta1)
n[i,2]<-floor((z[i,2]-a2)/delta2)
x<- a1 + n[i,1]*delta1
y<- a2 + n[i,2]*delta2
x1<- x + delta1
y1<-y+delta2
S1[i]<- abs(((z[i,1]-x)*(y1-z[i,2])))
S2[i]<-abs(((y1-z[i,2])*(x1-z[i,1])))
S3[i]<- abs(((z[i,1]-x)*(z[i,2]-y)))
S4[i]<- S-(S1[i]+S2[i]+S3[i])
j<-n[i,1]+(n[i,2]-1)*M1
ksi[j]<-ksi[j]+S2[i]/S
ksi[j+1]<-ksi[j+1]+ S1[i]/S
k<-n[i,1]+n[i,2]*M1
ksi[k]<-ksi[k]+S4[i]/S
ksi[k+1]<-ksi[k+1]+S3[i]/S
}
ksi_a<-array(0, dim = c( sqrt(M), sqrt(M)))
for ( j in 1:sqrt(M))
{
k<-1
for ( i in (sqrt(M)*(j-1)+1):(sqrt(M)*j))
{
ksi_a[k,j]<-(1/N)*ksi[i]
k<-k+1
}
}
# computation of Y_l
y_l <- array(0, dim = c(sqrt(M)+1,sqrt(M)+1))
y_y<- array(0, dim = c(sqrt(M),sqrt(M)))
psi<-array(0, dim = c( sqrt(M), sqrt(M)))
for ( j in 1:sqrt(M))
for ( i in 1:sqrt(M))
psi[i,j]<-ksi_a[i,j]*exp(-pi*1i*(j+i-2))
y_l<-fft(psi, inverse=TRUE)/M
# check: empirical characteristic function
for ( j in 1:(sqrt(M)))
for ( i in 1:(sqrt(M)))
y_y[i,j]<-y_l[i,j]*exp(1i*a1*(-pi*M1*(1/(b1-a1))
+ (i-1)*2*pi*(1/(b1-a1)))+
a2*1i*(-pi*M1*(1/(b2-a2))+ (j-1)*2*pi*(1/(b2-a2))))
xi<-array(0, dim = c(sqrt(M),sqrt(M)))
zi<-array(0, dim = c(sqrt(M),sqrt(M)))
for ( j in 1:(sqrt(M)))
for ( i in 1:(sqrt(M)))
xi[i,j]<-((y_l[i,j]*fiker(-pi*M1*(1/(b1-a1))
+ (i-1)*2*pi*(1/(b1-a1)),h[1])*
fiker(-pi*M1*(1/(b2-a2))+
(j-1)*2*pi*(1/(b2-a2)),h[2]))/(fi(-pi*M1*(1/(b1-a1))
+ (i-1)*2*pi*(1/(b1-a1)))*
fi(-pi*M1*(1/(b2-a2))+ (j-1)*2*pi*(1/(b2-a2)))))
zi<-fft(xi, inverse=FALSE)
mul<-array(0, dim= c(sqrt(M),sqrt(M)))
for ( j in 1: sqrt(M))
for ( i in 1:sqrt(M))
mul[i,j]<- exp(pi*1i*(i-1+j-1))*zi[i,j]
x<-seq(a1,b1,delta1)
y<-seq(a2,b2,delta2)
open3d()
persp3d(x,y, Re(mul),col = "lightblue")
contour(Re(mul))
#estimation based on direct observations
straight(N1,h)
}
C.3 The bivariate deconvolution kernel density estimation of the volatility simulated according to the De Vries model
# De Vries model: sigma_t^2 = 1/Z_{t-1}^2 + alpha*sigma_{t-1}^2;
# rgl is assumed to be available for the 3d plots.
deVries_bivariate <- function(N1, h){
N<-N1-1
sigma<-c(1:N1)
sigma0<-0.2
alpha<-0.5
noise0<-1/(rnorm(1)^2)
sigma[1]<-noise0 + alpha*(sigma0)^2
noise<-c(1:N1)
for ( i in 1:N1)
noise[i]<-1/(rnorm(1))^2
for ( i in 2:N1)
{
sigma[i]<- sqrt(noise[i-1] + alpha*(sigma[i-1])^2)
}
#simulation of the error
met<-c(1:N1)
met<-rnorm(N1)
err<-c(1:N1)
err<-log(met^2)
z<-array(0,dim = c(N,2))
for (i in 1:N)
{
z[i,1]<-log(sigma[i])+err[i]
z[i,2]<-log(sigma[i+1]) + err[i+1]
}
# range over the whole path (not a single point)
a<-min(log(sigma))-1
b<-max(log(sigma))+1
a1<-min(z[,1])-1
b1<-max(z[,1])+1
a2<-min(z[,2])-1
b2<-max(z[,2])+1
# number of grid points
M<-32^2
#size of the grid
M1<-sqrt(M)
delta<-(b-a)/(M1-1)
delta1<-(b1-a1)/(M1-1)
delta2<-(b2-a2)/(M1-1)
#array which contains numbers of indexes where z is
n <- array(0, dim=c(N,2))
for ( i in 1:N)
{
n[i,1]<-floor((z[i,1]-a1)/delta1)
n[i,2]<-floor((z[i,2]-a2)/delta2)
}
# finding the sequence of weights
ksi<- array(0, dim=c(M,1))
S<- delta1*delta2
S1<-array(0, dim = c(N,1))
S2<-array(0, dim = c(N,1))
S3<-array(0, dim = c(N,1))
S4<-array(0, dim=c(N,1))
for ( i in 1:N)
{
n[i,1]<-floor((z[i,1]-a1)/delta1)
n[i,2]<-floor((z[i,2]-a2)/delta2)
x<- a1 + n[i,1]*delta1
y<- a2 + n[i,2]*delta2
x1<- x + delta1
y1<-y+delta2
S1[i]<- abs(((z[i,1]-x)*(y1-z[i,2])))
S2[i]<-abs(((y1-z[i,2])*(x1-z[i,1])))
S3[i]<- abs(((z[i,1]-x)*(z[i,2]-y)))
S4[i]<- S-(S1[i]+S2[i]+S3[i])
j<-n[i,1]+(n[i,2]-1)*M1
ksi[j]<-ksi[j]+S2[i]/S
ksi[j+1]<-ksi[j+1]+ S1[i]/S
k<-n[i,1]+n[i,2]*M1
ksi[k]<-ksi[k]+S4[i]/S
ksi[k+1]<-ksi[k+1]+S3[i]/S
}
ksi_a<-array(0, dim = c( sqrt(M), sqrt(M)))
for ( j in 1:sqrt(M))
{
k<-1
for ( i in (sqrt(M)*(j-1)+1):(sqrt(M)*j))
{
ksi_a[k,j]<-(1/N)*ksi[i]
k<-k+1
}
}
# computation of Y_l
y_l <- array(0, dim = c(sqrt(M)+1,sqrt(M)+1))
y_y<- array(0, dim = c(sqrt(M),sqrt(M)))
psi<-array(0, dim = c( sqrt(M), sqrt(M)))
for ( j in 1:sqrt(M))
for ( i in 1:sqrt(M))
psi[i,j]<-ksi_a[i,j]*exp(-pi*1i*(j+i-2))
y_l<-fft(psi, inverse=TRUE)/M
# check: empirical characteristic function
for ( j in 1:(sqrt(M)))
for ( i in 1:(sqrt(M)))
y_y[i,j]<-y_l[i,j]*exp(1i*a1*(-pi*M1*(1/(b1-a1))
+ (i-1)*2*pi*(1/(b1-a1)))+
a2*1i*(-pi*M1*(1/(b2-a2))+ (j-1)*2*pi*(1/(b2-a2))))
xi<-array(0, dim = c(sqrt(M),sqrt(M)))
zi<-array(0, dim = c(sqrt(M),sqrt(M)))
for ( j in 1:(sqrt(M)))
for ( i in 1:(sqrt(M)))
xi[i,j]<-((y_l[i,j]*fiker(-pi*M1*(1/(b1-a1))
+ (i-1)*2*pi*(1/(b1-a1)),h[1])*
fiker(-pi*M1*(1/(b2-a2))
+ (j-1)*2*pi*(1/(b2-a2)),h[2]))/(fi(-pi*M1*(1/(b1-a1))
+ (i-1)*2*pi*(1/(b1-a1)))*
fi(-pi*M1*(1/(b2-a2))+ (j-1)*2*pi*(1/(b2-a2)))))
zi<-fft(xi, inverse=FALSE)
#estimation of the density
################################
mul<-array(0, dim= c(sqrt(M),sqrt(M)))
for ( j in 1: sqrt(M))
for ( i in 1:sqrt(M))
mul[i,j]<- exp(pi*1i*(i-1+j-1))*zi[i,j]
x<-seq(a1,b1,delta1)
y<-seq(a2,b2,delta2)
open3d()
persp3d(x,y,ksi_a, col="lightblue")
x<-seq(a1,b1,delta1)
y<-seq(a2,b2,delta2)
open3d()
persp3d(x,y, Re(mul),col = "lightblue")
contour(Re(mul))
#estimation based on direct observations
straight(N1,h)
}
Bibliography
[1] Bain, L.J., Engelhardt, M., Introduction to probability and mathematical statistics, second edition, Duxbury, 1992.
[2] Bolance, C., Guillen, M. and Nielsen, J.P., Kernel density estimation of actuarial loss functions, Insurance: Mathematics and Economics, 32, 19-36, 2003.
[3] Brigham, E.O., The fast Fourier transform and its applications, Prentice-Hall signal processing series, 1988.
[4] Buch-Larsen, T., Nielsen, J.P., Guillen, M. and Bolance, C., Kernel density estimation for heavy-tailed distributions using the Champernowne transformation, Statistics, Vol. 39, No. 6, 502-518, 2005.
[5] Cont, R., Volatility clustering in financial markets: empirical facts and agent-based models, in Teyssiere, G., Kirman, A.P. (eds.): Long memory in economics, Springer, 2005.
[6] Van Es, B., Spreij, P., Van Zanten, H., Nonparametric volatility density estimation, Bernoulli, 9(3), 451-465, 2003.
[7] Van Es, B., Spreij, P., Van Zanten, H., Nonparametric volatility density estimation for discrete time models, Journal of Nonparametric Statistics, 17:2, 237-249, 2005.
[8] Hull, J.C., Options, futures, and other derivatives, sixth edition, Pearson International Edition, 2006.
[9] Silverman, B.W., Kernel density estimation using the fast Fourier transformation, Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 31, No. 1, pp. 93-99, 1982.
[10] Silverman, B.W., Density estimation for statistics and data analysis, London: Chapman and Hall, 1986.
[11] De Vries, C.G., On the relation between GARCH and stable processes, Journal of Econometrics, 48, 313-324, 1991.
[12] Wand, M.P., Fast computation of multivariate kernel estimators, Journal of Computational and Graphical Statistics, Vol. 3, No. 4, pp. 433-445, 1994.
[13] Wand, M.P. and Jones, M.C., Kernel Smoothing, London: Chapman and Hall, 1995.