Nonlinear Time Series (mason.gmu.edu/~jgentle/csi779/14s/L08_Chapter4_14s.pdf)
TRANSCRIPT
Nonlinear Time Series
Recall that a linear time series {X_t} is one that follows the relation,
X_t = µ + ∑_{i=0}^∞ ψ_i A_{t−i},
where {At} is iid with mean 0 and finite variance.
A linear time series is stationary if ∑_{i=0}^∞ ψ_i² < ∞.
A time series that cannot be put in this form is nonlinear.
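As an illustrative sketch (not from the slides; Python, standard library only), a linear process can be simulated by truncating the infinite sum; the ψ-weights below are a hypothetical choice, ψ_i = φ^i, which writes an AR(1) as a linear process.

```python
import random

def simulate_linear(n, mu, psi, seed=0):
    """Simulate X_t = mu + sum_i psi[i] * A_{t-i} with iid N(0,1) innovations.

    The infinite sum is truncated at len(psi) terms."""
    rng = random.Random(seed)
    k = len(psi)
    # innovations A_{t-k+1}, ..., A_t for every t, so each X_t has a full window
    a = [rng.gauss(0.0, 1.0) for _ in range(n + k)]
    x = []
    for t in range(k, n + k):
        x.append(mu + sum(psi[i] * a[t - i] for i in range(k)))
    return x

# AR(1) with phi = 0.5 as a (truncated) linear process: psi_i = 0.5**i
psi = [0.5 ** i for i in range(50)]
x = simulate_linear(10_000, mu=3.0, psi=psi)
print(sum(x) / len(x))              # should be near mu = 3.0
print(sum(p * p for p in psi))      # sum psi_i^2 < infinity (approx 4/3): stationary
```

The second printed value is the stationarity check from the slide: the ψ-weights are square-summable, so the truncated process is (approximately) a stationary linear series.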
Tests for Nonlinearity
What kinds of statistics would be useful in testing for nonlinearity?
Null hypothesis: the data follow a linear time series model.
One approach would be to fit some kind of general linear model
to the data, and then use some statistic computed from the
residuals.
Another approach would be based on comparisons of transforms with known properties of transforms of data following the hypothesized model.
For tests regarding time series specifically (and maybe a few
other types of data) the transform could be into the frequency
domain.
A different approach would be to specify an alternative hypothesis and test against it specifically.
Tests for Autocorrelations of Squared Residuals
Residuals of what?
An ARMA(p, q) model is a pretty good approximation for linear
time series.
Before attempting to fit an ARMA model, we should do some
exploratory analyses to make sure we’re even in the right ballpark.
Are there any obvious departures from stationarity?
Trends? Would differencing help?
When it appears that we may have a stationary process, we go
through the usual motions to fit an ARMA(p, q) model.
Is the model a good fit?
What could go wrong?
There may be an ARCH effect.
Tests for Autocorrelations of Squared Residuals
The “ARCH effect” arises from autocorrelations of squared residuals from an ARMA model.
The simplest test for autocorrelations is based on the asymptotic
normality of ρ̂(h) under the null hypothesis of 0 autocorrelation
at lag h.
(Recall that the test would be a t test, where the denominator is

√[ (1 + 2 ∑_{i=1}^{h−1} ρ̂²(i)) / n ].

The denominator is not obvious.)
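A minimal sketch of this statistic (Python, standard library only): the sample ACF and the Bartlett-type standard error are computed directly from the definitions, and applied here to hypothetical iid noise.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag) about the sample mean."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    rho = []
    for h in range(1, max_lag + 1):
        ch = sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n
        rho.append(ch / c0)
    return rho

def acf_t_stat(x, h):
    """t statistic for H0: rho(h) = 0, with standard error
    sqrt((1 + 2 * sum_{i=1}^{h-1} rho_hat(i)^2) / n)."""
    n = len(x)
    rho = acf(x, h)
    se = ((1 + 2 * sum(r * r for r in rho[:h - 1])) / n) ** 0.5
    return rho[h - 1] / se

rng = random.Random(1)
white = [rng.gauss(0, 1) for _ in range(2000)]
print(acf_t_stat(white, 3))  # approximately N(0, 1) under H0
```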
Tests for Autocorrelations of Squared Residuals
Of course, if ρ̂(h) is asymptotically normal, then ρ̂²(h), properly normalized, is asymptotically chi-squared, and if the ρ̂²(i) for i = 1, . . . , m are independent, then their sum, each properly normalized, is asymptotically chi-squared with m degrees of freedom.
These facts led to the Q∗(m) portmanteau test of Box and Pierce, and then to the modified portmanteau test of Ljung and Box, using the statistic

Q(m) = n(n+2) ∑_{i=1}^m ρ̂²(i)/(n−i).

This is asymptotically chi-squared with m degrees of freedom.
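The statistic is a one-liner once the sample autocorrelations are in hand; a small sketch with a hand-picked ρ̂ vector:

```python
def ljung_box(rho, n):
    """Ljung-Box statistic Q(m) = n(n+2) * sum_{i=1}^m rho_hat(i)^2 / (n - i);
    rho is the list [rho_hat(1), ..., rho_hat(m)]."""
    return n * (n + 2) * sum(r * r / (n - i) for i, r in enumerate(rho, start=1))

# small worked check: n = 100, rho_hat = (0.1, -0.2)
q = ljung_box([0.1, -0.2], n=100)
print(q)  # 100*102*(0.01/99 + 0.04/98), approximately 5.19
```

For an ARCH-effect test, `rho` would be the sample ACF of the squared residuals rather than of the series itself.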
Tests for Autocorrelations of Squared Residuals
As we have seen, the Q test applied to squared residuals can be
used to detect an ARCH effect, as suggested by McLeod and Li.
We choose a value of m.
So this is one test for nonlinearity.
A related test is the F test suggested by Engle. This is the usual F test of

H₀: β₁ = · · · = β_m = 0

in the linear regression model

a²_t = β₀ + β₁ a²_{t−1} + · · · + β_m a²_{t−m} + e_t,

where the a_t are the residuals from the fitted ARMA model.
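A sketch of this test for the special case m = 1, where the F statistic reduces to the ratio of regression to residual mean squares in a simple regression of a²_t on a²_{t−1}; the "residual" series here is hypothetical iid noise, under which H₀ holds.

```python
import random

def arch_lm_f(a):
    """Engle's test with m = 1: regress a_t^2 on a_{t-1}^2 and return the
    F statistic for H0: beta_1 = 0."""
    y = [v * v for v in a[1:]]
    x = [v * v for v in a[:-1]]
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = ybar - b1 * xbar               # intercept
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssr = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)
    return (ssr / 1) / (sse / (n - 2))

rng = random.Random(2)
iid = [rng.gauss(0, 1) for _ in range(500)]
print(arch_lm_f(iid))  # approximately F(1, n-2) under H0 for iid residuals
```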
Stationary Processes in the Frequency Domain
Time series models of the form
X_t = f(X_{t−1}, X_{t−2}, . . . , A_t, A_{t−1}, . . . ; β),
are said to be represented in the “time domain”.
Representations of a time series as a composition of periodic
behaviors are said to be in the “frequency domain”.
Processes with strong periodic behavior and periodic processes
with a small number of periodicities (audio signals, for example)
are usually modeled better in the frequency domain than in the
time domain.
Financial time series are best analyzed in the time domain.
Stationary processes (and of course, there aren’t many of those
in financial time series!) have an important relationship between
a time-domain measure and a frequency-domain function.
Spectral Representation of the ACVF
If we have a stationary process with autocovariance γ(h), then there exists a unique monotonically increasing function F(ω) on the closed interval [−1/2, 1/2] such that F(−1/2) = 0, F(1/2) = γ(0), and

γ(h) = ∫_{−1/2}^{1/2} e^{2πiωh} dF(ω).
The function F (ω) is called the spectral distribution function.
The proof of this theorem, “the spectral representation theorem”, is available in many books, but we will not prove it in this class.
Note that my notation for Fourier transforms may differ slightly from that in Tsay; the difference is whether frequencies are in radians or in π radians.
Spectral Density
The derivative of the spectral distribution function F(ω), which we write as f(ω), is a measure of the intensity of any periodic component at the frequency ω.
We call f(ω) the spectral density.
The ACVF is essentially the Fourier transform of the spectral
density.
By the Inversion Theorem for the Fourier transform, we have,
for −1/2 ≤ ω ≤ 1/2,
f(ω) = ∑_{h=−∞}^{∞} γ(h) e^{−2πiωh},

or

f(ω) = ∑_{h=−∞}^{∞} E(X_t X_{t+h}) e^{−2πiωh}.
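A quick numerical sketch of this inversion formula, truncating the sum at a finite number of lags; for white noise the spectrum comes out flat, as it should.

```python
import cmath

def spectral_density(gamma, omega):
    """Truncated f(omega) = sum_h gamma(|h|) * exp(-2*pi*i*omega*h),
    with gamma given for h = 0, 1, ..., H and extended by symmetry."""
    H = len(gamma) - 1
    s = sum(gamma[abs(h)] * cmath.exp(-2j * cmath.pi * omega * h)
            for h in range(-H, H + 1))
    return s.real  # f is real because gamma is symmetric

# white noise: gamma(0) = sigma^2 = 2, gamma(h) = 0 otherwise -> flat spectrum
print(spectral_density([2.0, 0.0, 0.0], 0.1))   # 2.0
print(spectral_density([2.0, 0.0, 0.0], 0.37))  # 2.0
```

For an MA(1) with θ = 0.5 and unit innovation variance, γ = (1.25, 0.5), the same function gives f(0) = 1.25 + 2(0.5) = 2.25.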
Bispectral Density
Now, if the third moment E(X_t X_{t+u} X_{t+v}) exists and is finite, in the linear time series

X_t = µ + ∑_{i=0}^∞ ψ_i A_{t−i},

we have

E(X_t X_{t+u} X_{t+v}) = E(A_t³) ∑_{i=0}^∞ ψ_i ψ_{i+u} ψ_{i+v}.
Now, by analogy, we call the double Fourier transform the bispectral density:

b(ω₁, ω₂) = ∑_{u=−∞}^{∞} ∑_{v=−∞}^{∞} E(X_t X_{t+u} X_{t+v}) e^{−2πi(ω₁u + ω₂v)}.
Spectral Densities
Letting Ψ represent the polynomial formed in the usual way from the ψ_i in the linear time series model, we have for the spectral density,

f(ω) = E(A_t²) Ψ(e^{−2πiω}) Ψ(e^{2πiω});

and for the bispectral density,

b(ω₁, ω₂) = E(A_t³) Ψ(e^{−2πiω₁}) Ψ(e^{−2πiω₂}) Ψ(e^{2πi(ω₁+ω₂)}).

Now, note in this case that

|b(ω₁, ω₂)|² / ( f(ω₁) f(ω₂) f(ω₁+ω₂) ) = (E(A_t³))² / (E(A_t²))³,

which is constant.
Bispectral Test
The constancy of the ratio on the previous slide provides the
basis for a test of nonlinearity.
How would you do that?
Compute it for several subsequences.
There are various nonparametric tests for constancy, and consequently there are various bispectral tests.
Notice also that the numerator in the test statistic is 0, if the
time series is linear and the errors have a normal distribution.
BDS Test
The BDS test is named after Brock, Dechert, and Scheinkman, who proposed it.
The test is for strict stationarity of the error process.
For the data x₁, . . . , x_n, it is based on normalized counts of closeness of subsequences X_i^m and X_j^m, where X_i^m = (x_i, . . . , x_{i+m−1}).

For fixed δ > 0, closeness is measured by how many subsequences are within δ of each other in the sup norm.

We define

I_δ(X_i^m, X_j^m) = 1 if ‖X_i^m − X_j^m‖_∞ ≤ δ, and 0 otherwise.
BDS Test
We compare the counts for subsequences of length 1 and k:

C₁(δ, n) = (2 / (n(n−1))) ∑_{i<j} I_δ(X_i^1, X_j^1)

and

C_k(δ, n) = (2 / ((n−k+1)(n−k))) ∑_{i<j} I_δ(X_i^k, X_j^k).

In the iid case,

C_k(δ, n) → (C₁(δ, n))^k,

and asymptotically √n (C_k(δ, n) − (C₁(δ, n))^k) is normal with mean 0 and known variance (see Tsay, page 208).
The null hypothesis that the errors are iid, which is one of the properties of a linear time series, is tested using extreme quantiles of the normal distribution.
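A sketch of the correlation-integral counts C₁ and C_k (Python; the data and the choice of δ are hypothetical). For iid data, C_k should be close to C₁^k.

```python
import random

def corr_integral(x, m, delta):
    """C_m(delta, n): normalized count of pairs of length-m subsequences
    within delta of each other in the sup norm (the BDS counts)."""
    subs = [x[i:i + m] for i in range(len(x) - m + 1)]
    nm = len(subs)
    close = sum(
        1
        for i in range(nm)
        for j in range(i + 1, nm)
        if max(abs(u - v) for u, v in zip(subs[i], subs[j])) <= delta
    )
    return 2.0 * close / (nm * (nm - 1))

rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(300)]
c1 = corr_integral(x, 1, 0.5)
c2 = corr_integral(x, 2, 0.5)
print(c2, c1 ** 2)  # roughly equal for iid data (the BDS null)
```

The BDS statistic then standardizes C_k − C₁^k by its (known) asymptotic standard deviation; that variance formula is omitted here.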
BDS Test
Notice that the BDS test depends on two quantities, δ and k.
Obviously, k should be small relative to n.
There is an R function, bds.test, in the tseries package that
performs the computations for the BDS test.
RESET Test
The Regression Equation Specification Error (RESET) test of Ramsey is a general test for misspecification of a linear model (not just a linear time series).
It may detect omitted variables, incorrect functional form, and
heteroscedasticity.
For applications of the RESET test to linear time series, we
assume that the linear model is an AR model.
The test statistic is an F statistic computed from the residuals
of a fitted AR model (see equation (4.44) in Tsay).
Because of the omnibus nature of the alternative hypothesis, the performance of the test is highly variable, and the test often has very low power.
There is an R function, resettest, in the lmtest package that
performs the computations for the RESET test.
F Tests
There are several variations on the F statistic used in the RESET
test.
Tsay mentions some of these, and you can probably think of
other modifications of the basic ideas.
Threshold Test
There are various types of tests that could be constructed based
on dividing the time series into different regimes.
Simple approaches would be based on regimes that are separated
by fixed time points.
Other approaches could be based on regimes in which either
observed values or fitted residuals appear to be different.
Obviously, if the data are used to identify possible thresholds the
significance level of a test must take that fact into consideration.
In general, however they are constructed, the test statistics are F statistics.
Time Series Models
I don’t know of any other area of statistics that has so many
different models as in the time domain of time series analysis.
Each model has its own name – and sometimes more than one name.
The common linear models are of the AR and MA types.
We combine AR and MA to get ARMA.
Then we difference a time series to get an ARMA, and call the
complete model ARIMA.
Next, we form AR and MA relations at multiples of longer lags.
This yields seasonal ARMA and ARIMA models.
Most of the linear models in the time domain are of these types.
Then we have the nonlinear time series models.
Nonlinear Time Series
Four general types that are useful:
• models of squared quantities such as variances; these are often coupled with other models to allow stochastic volatility; ARIMA+GARCH, for example
• bilinear models

X_t = c + ∑_{i=1}^p φ_i X_{t−i} − ∑_{j=1}^q θ_j A_{t−j} + ∑_{i=1}^m ∑_{j=1}^s β_{ij} X_{t−i} A_{t−j} + A_t
• random coefficients models

X_t = ∑_{i=1}^p (φ_i + U_t^{(i)}) X_{t−i} + A_t
• threshold models – Tsay describes a number of these in Chapters 3 and 4.
Another general source of nonlinearity is local fitting of a general
model.
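As a sketch, the simplest bilinear model (p = q = m = s = 1, with hypothetical coefficients) can be simulated with a one-step recursion:

```python
import random

def simulate_bilinear(n, c, phi, theta, beta, seed=4):
    """Minimal bilinear model with p = q = m = s = 1:
    X_t = c + phi*X_{t-1} - theta*A_{t-1} + beta*X_{t-1}*A_{t-1} + A_t."""
    rng = random.Random(seed)
    x_prev, a_prev = 0.0, 0.0
    out = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        x = c + phi * x_prev - theta * a_prev + beta * x_prev * a_prev + a
        out.append(x)
        x_prev, a_prev = x, a
    return out

x = simulate_bilinear(1000, c=0.0, phi=0.4, theta=0.2, beta=0.1)
print(len(x))
```

The cross term β X_{t−1} A_{t−1} is what makes this nonlinear; with β = 0 it collapses to an ARMA(1,1).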
Nonlinear Time Series
The area of nonlinear time series models is where the small modifications with their specific names really proliferate.
First, we have the basic ones that account for conditional heteroscedasticity: ARCH and GARCH.
Then the modifications (Chapter 3): IGARCH, GARCH-M, EGARCH, TGARCH (also GJR), CHARMA (or RCA), LMSV.
Then further modifications (Chapter 4): TAR (similar to TGARCH, but for linear terms), SETAR (“self-exciting” TAR; the regime depends on a lagged value), STAR, MSA (or MSAR), NAAR.
Other models are local regression models.
Finally, we have algorithmic models, such as neural nets.
I am not going to consider all of these little variations.
The most common method of fitting these models is by maximum likelihood.
There are R functions for many of them.
Time Series Models
The names of the wide variety of time series models that evolved from the basic ARCH and GARCH models can be rather confusing.
Some models go by different names. Tsay sometimes refers to the TGARCH(m, s) model as the GJR model (see p. 149). (“GJR” is not in the index for his book.)
Most of the models that Tsay uses are special cases of the APARCH model of Ding, Granger, and Engle (1993).
This model is
A_t = σ_t ε_t, (1)

as in the basic ARCH model, and

σ_t^δ = α₀ + ∑_{i=1}^m α_i (|A_{t−i}| − γ_i A_{t−i})^δ + ∑_{j=1}^s β_j σ_{t−j}^δ. (2)
This model includes several of the other variations on the GARCH model.
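A sketch of the APARCH recursion for one lag (m = s = 1), with hypothetical parameter values; note that δ = 2 and γ₁ = 0 reduce it to a plain GARCH(1,1).

```python
import random

def simulate_aparch(n, alpha0, alpha1, gamma1, beta1, delta, seed=5):
    """One-lag APARCH sketch: A_t = sigma_t * eps_t with
    sigma_t^delta = alpha0 + alpha1*(|A_{t-1}| - gamma1*A_{t-1})**delta
                    + beta1*sigma_{t-1}^delta."""
    rng = random.Random(seed)
    sig_d = alpha0 / (1 - beta1)  # crude starting value for sigma^delta
    a_prev = 0.0
    a = []
    for _ in range(n):
        sig_d = (alpha0 + alpha1 * (abs(a_prev) - gamma1 * a_prev) ** delta
                 + beta1 * sig_d)
        sigma = sig_d ** (1.0 / delta)
        a_prev = sigma * rng.gauss(0.0, 1.0)
        a.append(a_prev)
    return a

a = simulate_aparch(2000, alpha0=0.1, alpha1=0.1, gamma1=0.3, beta1=0.8, delta=1.5)
print(len(a))
```

With |γ₁| < 1 the term |A_{t−1}| − γ₁A_{t−1} is always nonnegative, so the fractional power δ is well defined; γ₁ > 0 makes negative shocks raise volatility more than positive ones (the leverage effect).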
Transition or Threshold Models
A two-regime transition model is of the general form

X_t = g₁(x_{t−p}, . . . , x_{t−1}, a_{t−q}, . . . , a_{t−1}; β₁) + A_t  if condition 1
X_t = g₂(x_{t−p}, . . . , x_{t−1}, a_{t−q}, . . . , a_{t−1}; β₂) + A_t  otherwise.

A threshold model usually depends on the past state and so is of the general form

X_t = g₁(x_{t−p}, . . . , x_{t−1}, a_{t−q}, . . . , a_{t−1}; β₁) + A_t  if (x_{t−p}, . . . , x_{t−1}) ∈ R₁
X_t = g₂(x_{t−p}, . . . , x_{t−1}, a_{t−q}, . . . , a_{t−1}; β₂) + A_t  otherwise.

For the specific case of an AR model, the g_i functions above are linear functions of (x_{t−p}, . . . , x_{t−1}).

Also, the condition (x_{t−p}, . . . , x_{t−1}) ∈ R₁ is usually simplified to a simple form x_{t−d} ∈ R₁.
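A sketch of the simplest threshold case, a two-regime SETAR(1) with hypothetical coefficients and the simplified condition x_{t−d} < threshold:

```python
import random

def simulate_setar(n, d=1, thresh=0.0, seed=6):
    """Two-regime SETAR(1) sketch (hypothetical coefficients):
    X_t = 0.6*X_{t-1} + A_t   if X_{t-d} <  thresh
    X_t = -0.4*X_{t-1} + A_t  otherwise."""
    rng = random.Random(seed)
    x = [0.0] * d  # starting values so the lag-d condition is defined
    for _ in range(n):
        phi = 0.6 if x[-d] < thresh else -0.4
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x[d:]

x = simulate_setar(1000)
print(len(x))
```

The regime is chosen by the series' own lagged value, which is what makes the model "self-exciting".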
Smooth Transition Autoregressive (STAR) Model
An obvious modification is to make the transition smooth by
using a smooth weighting function.
If the linear functions are AR relationships, we have a simple
instance, namely, the STAR(p) model:
X_t = φ_{1,0} + ∑_{i=1}^p φ_{1,i} x_{t−i} + F((x_{t−d} − ∆)/s) (φ_{2,0} + ∑_{i=1}^p φ_{2,i} x_{t−i}) + A_t,

where F(·) is a smooth function going from 0 to 1.
Tsay gives an R function to fit a STAR model on page 186.
I could not find this code anywhere, but if someone will key it in
and send it to me, I’ll post it.
Markov Switching Model
Another simple transition model in which the underlying components are AR is the Markov switching model (MSA).
Here, the regime is chosen as a Markov process.
The model for a two-state MSA, as before, is

X_t = φ_{1,0} + ∑_{i=1}^p φ_{1,i} x_{t−i} + A_{1t}  if state 1
X_t = φ_{2,0} + ∑_{i=1}^p φ_{2,i} x_{t−i} + A_{2t}  otherwise.

All we need to specify are the transition probabilities

Pr(s_t | s_{t−1}).
Fitting this is a little harder, but again, can be done by maximum likelihood. The transition probabilities can be estimated by MCMC or by the EM algorithm.
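A sketch of a two-state MSA with hypothetical AR(1) coefficients in each state and hand-picked stay probabilities:

```python
import random

def simulate_msa(n, p11=0.95, p22=0.90, seed=7):
    """Two-state Markov switching AR(1) sketch (hypothetical coefficients).
    The state follows a Markov chain with stay probabilities p11, p22;
    X_t = phi_{s,0} + phi_{s,1}*X_{t-1} + A_t in state s."""
    rng = random.Random(seed)
    coefs = {1: (0.0, 0.7), 2: (1.0, -0.3)}  # (phi_{s,0}, phi_{s,1})
    state, x_prev = 1, 0.0
    xs, states = [], []
    for _ in range(n):
        stay = p11 if state == 1 else p22
        if rng.random() > stay:
            state = 2 if state == 1 else 1
        c0, c1 = coefs[state]
        x_prev = c0 + c1 * x_prev + rng.gauss(0.0, 1.0)
        xs.append(x_prev)
        states.append(state)
    return xs, states

xs, states = simulate_msa(2000)
print(len(xs), sorted(set(states)))
```

Unlike a SETAR model, the regime here is driven by an exogenous (latent) Markov chain rather than by the lagged values of the series itself.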
The following slides are preliminary versions of the material we
will discuss on April 10.
Kernel Regression
Local regression is another type of nonlinear modeling.
A simple form of local regression is to use a filter or kernel
function to provide local weighting of the observed data.
This approach ensures that at a given point the observations
close to that point influence the estimate at the point more
strongly than more distant observations.
A standard method in this approach is to convolve the observations with a unimodal function that decreases rapidly away from a central point.
This function is the filter or the kernel.
A kernel function has two arguments representing the two points
in the convolution, but we typically use a single argument that
represents the distance between the two points.
Choice of Kernels
Standard normal densities have the properties described above, so the kernel is often chosen to be the standard normal density.
As it turns out, the kernel density estimator is not very sensitive
to the form of the kernel.
Although the kernel may be from a parametric family of distributions, in kernel density estimation, we do not estimate those parameters; hence, the kernel method is a nonparametric method.
Choice of Kernels
Sometimes, a kernel with finite support is easier to work with.
In the univariate case, a useful general form of a compact kernel is

K(t) = κ_{rs} (1 − |t|^r)^s I_{[−1,1]}(t),

where

κ_{rs} = r / (2 B(1/r, s+1)),  for r > 0, s ≥ 0,

and B(a, b) is the complete beta function.
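The constant κ_{rs} can be checked numerically: B(a, b) is computed from gamma functions, and a midpoint-rule integral confirms the kernel integrates to 1.

```python
from math import gamma

def kappa(r, s):
    """Normalizing constant kappa_rs = r / (2*B(1/r, s+1)), so that
    K(t) = kappa_rs * (1 - |t|**r)**s on [-1, 1] integrates to 1."""
    beta = gamma(1.0 / r) * gamma(s + 1.0) / gamma(1.0 / r + s + 1.0)
    return r / (2.0 * beta)

print(kappa(2, 1))  # Epanechnikov: 3/4
print(kappa(2, 2))  # biweight: 15/16

# numerical check that the biweight kernel integrates to 1 (midpoint rule)
r, s, m = 2, 2, 100_000
h = 2.0 / m
k = kappa(r, s)
integral = sum(k * (1 - abs(-1 + (i + 0.5) * h) ** r) ** s for i in range(m)) * h
print(round(integral, 6))  # 1.0
```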
Choice of Kernels
This general form leads to several simple specific cases:
• for r = 1 and s = 0, it is the rectangular or uniform kernel;
• for r = 1 and s = 1, it is the triangular kernel;
• for r = 2 and s = 1 (κrs = 3/4), it is the “Epanechnikov”
kernel, which yields the optimal rate of convergence of the
MISE;
• for r = 2 and s = 2 (κrs = 15/16), it is the “biweight” kernel.
If r = 2 and s → ∞, we have the Gaussian kernel (with some
rescaling).
Kernel Methods
In kernel methods, the locality of influence is controlled by a
window around the point of interest.
The choice of the size of the window, or the “bandwidth”, is the
most important issue in the use of kernel methods.
In univariate applications, the window size is just a length, usually
denoted by “h” (except maybe in time series applications).
In practice, for a given choice of the size of the window, the
argument of the kernel function is transformed to reflect the
size.
The transformation is accomplished using a positive definite matrix, V, whose determinant measures the volume (size) of the window.
Local Linear Regression
Use of the kernel function is simple.
When least squares is the basic criterion, the kernel just becomes
the weight.
Choice of Bandwidth
There are two ways to choose a bandwidth.
One is based on the mean-integrated squared error (MISE).
In this method, the MISE for an assumed model is determined, and then the bandwidth that minimizes it is determined.
The other method is a data-based method.
We use cross-validation to determine the optimal bandwidth.
In cross-validation, for a given bandwidth, we fit a model using all of the data except for a few points (“leave-out-d”), then determine the SSE using all of the data.
We do this over a grid of bandwidths.
Then we do this multiple times (“k-fold cross-validation”).
The best bandwidth is the one that minimizes the SSE (from all data).
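A sketch of leave-one-out cross-validation (d = 1) over a grid of bandwidths, using a kernel-weighted running mean (Nadaraya-Watson) estimator with a Gaussian kernel; the data and the grid are hypothetical.

```python
import math, random

def nw(x0, xs, ys, h, skip=None):
    """Kernel-weighted mean at x0 with a Gaussian kernel and bandwidth h,
    optionally leaving out observation `skip` (for cross-validation)."""
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x - x0) / h) ** 2)
        num += w * y
        den += w
    return num / den

def loo_cv_bandwidth(xs, ys, grid):
    """Leave-one-out CV: for each bandwidth in `grid`, predict each y_i
    from the other points and return the bandwidth minimizing the SSE."""
    best_h, best_sse = None, float("inf")
    for h in grid:
        sse = sum((ys[i] - nw(xs[i], xs, ys, h, skip=i)) ** 2
                  for i in range(len(xs)))
        if sse < best_sse:
            best_h, best_sse = h, sse
    return best_h

rng = random.Random(8)
xs = [i / 100 for i in range(100)]
ys = [math.sin(2 * math.pi * x) + 0.1 * rng.gauss(0, 1) for x in xs]
print(loo_cv_bandwidth(xs, ys, [0.01, 0.03, 0.1, 0.3]))
```

A bandwidth that is too small nearly interpolates the noise; one that is too large smooths the sine wave away; the CV criterion penalizes both.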
Nonparametric Smoothing
Kernel methods may be parametric or nonparametric.
In nonparametric methods, the kernels are generally simple.
There are various methods, such as running medians or running
(weighted) means.
Running means are moving averages.
The R function lowess does locally weighted smoothing using
weighted running means.
These methods are widely used for smoothing time series.
The emphasis is on prediction, rather than model building.
General Additive Time Series Models
A model of the form
y_i = β₀ + β₁ x_{1i} + · · · + β_m x_{mi} + ε_i

can be generalized by replacing the constant (but unknown) coefficients by unknown functions (with specified forms):

y_i = f₁(x) x_{1i} + · · · + f_m(x) x_{mi} + ε_i.
Hastie and Tibshirani have written extensively on such models.
Neural Networks
When the emphasis is on prediction, we can form a “black box”
algorithm that accepts a set of input values x, combines them
into intermediate values (in a “hidden layer”) and then combines
the values of the hidden layer into a single output y.
In a time series application, we have data r₁, . . . , r_n, and for i = k+1, . . . , n, we choose a subsequence x_i = (r_{i−1}, . . . , r_{i−k}) as an input to produce an output o_i as a predictor of r_i.

We train the neural net so as to minimize ∑ (o_i − r_i)².
The R function nnet, in the nnet package, can be used to do this.
See Appendix B in Chapter 4 of Tsay.
Watch out for the assignment statements!
Never write R or S-Plus code like that!
Monte Carlo Forecasting
Monte Carlo can be used for forecasting in any time series model
(“parametric bootstrap”).
At forecast origin t, we forecast at the horizon t + h by use of the fitted (or assumed) model and simulated errors (or “innovations”).
Doing this many times, we get a sample of values r̂_{t+h}^{(j)}.

The mean of this sample is the estimator r̂_{t+h}, and the sample quantiles provide confidence limits.
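A sketch of this for a fitted AR(1) (hypothetical parameter values; normal innovations assumed):

```python
import random

def mc_forecast(x_t, phi, sigma, h, reps=5000, seed=9):
    """Parametric-bootstrap h-step forecast for a fitted AR(1),
    X_{t+1} = phi*X_t + A, A ~ N(0, sigma^2):
    simulate many paths and return the sample mean and 2.5%/97.5% quantiles."""
    rng = random.Random(seed)
    sims = []
    for _ in range(reps):
        x = x_t
        for _ in range(h):
            x = phi * x + rng.gauss(0.0, sigma)
        sims.append(x)
    sims.sort()
    mean = sum(sims) / reps
    lo, hi = sims[int(0.025 * reps)], sims[int(0.975 * reps)]
    return mean, (lo, hi)

mean, (lo, hi) = mc_forecast(x_t=2.0, phi=0.6, sigma=1.0, h=3)
print(round(mean, 2))  # exact 3-step conditional mean is 0.6**3 * 2 = 0.432
print(lo < mean < hi)
```

The same scheme works unchanged for nonlinear models (SETAR, GARCH, etc.), where closed-form multi-step forecasts are usually unavailable; only the inner recursion changes.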
Fitting Time Series Models in R
There are a number of R functions that perform the computations to fit various time series models.
ARMA / ARIMA arima(stats)
ARMA order determination autofit(itsmr)
ARMA + GARCH garchFit(fGarch)
APARCH garchFit(fGarch)
The APARCH model includes the TGARCH and GJR models,
among others; see equation (2).
Also see the help page for fGarch-package in fGarch.
Other R Functions for Time Series Models
There are R functions for forecasting using different time series
models that have been fitted.
ARMA/ARIMA predict.Arima(stats)
APARCH (including ARMA + GARCH) predict(fGarch)
There are also R functions for simulating data from different
time series models.
ARMA/ARIMA arima.sim(stats)
APARCH (including ARMA + GARCH) garchSim(fGarch)