Nonlinear analysis of time series
ARMA(p,q) model: $x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + z_t + \theta_1 z_{t-1} + \dots + \theta_q z_{t-q}$
Linear analysis / linear models
Advantages: 1. Simple
2. Gaussian process, established
theory for stochastic processes
and statistical inference
3. Useful in applications
Shortcomings: 1. Cannot explain irregular patterns
in the time series
- data (distribution) asymmetry
- time irreversibility
- «bursts»
2. Deterministic part:
- stable fixed point system
- unstable system
- periodic system
Linear analysis: autocorrelation, AR model
Nonlinear analysis: description of irregular patterns;
explanation / detection of complex deterministic patterns
Time series, Part 3
Nonlinear analysis of time series
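A general nonlinear model: $X_t = f(X_{t-1}, X_{t-2}, \dots, X_{t-p}, \epsilon_t)$
Additive noise: $X_t = f(X_{t-1}, X_{t-2}, \dots, X_{t-p}) + \epsilon_t$
$\mathbf{X}_{t-1} = [X_{t-1}, X_{t-2}, \dots, X_{t-p}]'$, $f: \mathbb{R}^p \to \mathbb{R}$
f ?
Linear AR model: $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \epsilon_t$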
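Generalizations / extensions of the AR model
Coefficients $\phi_1, \phi_2, \dots, \phi_p$:
- constant (linear AR)
- random coefficients: RCA, BL
Noise variance:
- constant (linear AR, ARMA)
- function of X_t: ARCH, GARCH
Piecewise models, with l sets of coefficients $\{\phi_1^{(1)}, \phi_2^{(1)}, \dots, \phi_p^{(1)}\}$, $\{\phi_1^{(2)}, \phi_2^{(2)}, \dots, \phi_p^{(2)}\}$, ..., $\{\phi_1^{(l)}, \phi_2^{(l)}, \dots, \phi_p^{(l)}\}$:
- SETAR
- Markovian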
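Self-excited threshold autoregressive models (SETAR)
Partition of $\mathbb{R}$: thresholds $r_0, r_1, \dots, r_{l-1}, r_l$ with $r_0 < r_1 < \dots < r_l$,
$\mathbb{R} = R_1 \cup R_2 \cup \dots \cup R_l$, $R_i = (r_{i-1}, r_i]$, $i = 1, \dots, l$
Partition of $\mathbb{R}^p$: selection of a lag d, partition of $\mathbb{R}$ for $X_{t-d}$
SETAR: $X_t = \phi_1^{(j)} X_{t-1} + \phi_2^{(j)} X_{t-2} + \dots + \phi_p^{(j)} X_{t-p} + \epsilon_t^{(j)}$ when $X_{t-d} \in R_j$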
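Example for SETAR
Two AR(1) regimes with a constant term, selected by the sign of $X_{t-1}$ (lag d = 1, threshold 0), and $\epsilon_t \sim N(0, 1)$:
- one regime with constant 2.0 and coefficient 0.6 (in magnitude) for $X_{t-1}$
- the other regime with constant 1.0 and coefficient 0.4 (in magnitude) for $X_{t-1}$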
[Figure: scatter plot of (x_{t-1}, x_t) for a SETAR model; horizontal axis x(t-1), vertical axis x(t)]
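The regime-switching rule of a SETAR model can be sketched in a few lines. This is a minimal illustration, assuming AR(1) regimes with a constant term; the function name `simulate_setar` and the regime values used in the call are hypothetical and are not the values of the slide example.

```python
import random

def simulate_setar(n, regimes, thresholds, d=1, sigma=1.0, seed=0, burn_in=100):
    """Simulate a SETAR series with AR(1) regimes.

    regimes    : list of (constant, phi) pairs, one per regime (illustrative values)
    thresholds : increasing interior thresholds r_1 < ... < r_{l-1}
    d          : lag of the variable X_{t-d} that selects the regime
    """
    rng = random.Random(seed)
    x = [0.0] * d  # arbitrary start values (an assumption of this sketch)
    for _ in range(n + burn_in):
        lagged = x[-d]
        # 0-based regime index: X_{t-d} lies in (r_{j-1}, r_j]
        j = sum(1 for r in thresholds if lagged > r)
        c, phi = regimes[j]
        x.append(c + phi * x[-1] + rng.gauss(0.0, sigma))
    return x[-n:]

# Two regimes split at the threshold 0 (hypothetical coefficients)
series = simulate_setar(500, regimes=[(1.0, 0.5), (-1.0, 0.5)], thresholds=[0.0])
```

Plotting `series[1:]` against `series[:-1]` gives a scatter diagram of the kind shown in the figure, with a different line for each regime.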
AR models with probabilistic selection of threshold
Exponential autoregressive models (EAR)
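$X_t = \phi_1^{(j)} X_{t-1} + \phi_2^{(j)} X_{t-2} + \epsilon_t$, where j = 1 with a given probability and j = 2 with the complementary probability
AR models with periodic coefficients
$X_t = \phi_1^{(j)} X_{t-1} + \phi_2^{(j)} X_{t-2} + \epsilon_t$, where j = 1 when t = 2k and j = 2 when t = 2k + 1
Example: $\phi_1^{(1)} = \phi_1$, $\phi_2^{(1)} = 0$, $\phi_1^{(2)} = 0$, $\phi_2^{(2)} = \phi_2$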
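Markov chain driven AR models
The selection of the threshold (regime) $J_t \in \{1, 2, \dots, l\}$ is determined by a Markov chain
Transition matrix: $P(J_t = j \mid J_{t-1} = i)$
Example: $X_t = \phi^{(J_t)} X_{t-1} + \epsilon_t$, with $|\phi^{(1)}| = |\phi^{(2)}| = 0.9$ and a 2 x 2 transition matrix with rows (0.1, 0.9) and (0.2, 0.8)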
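Piecewise polynomial models
$X_t = f(X_{t-1}, X_{t-2}, \dots, X_{t-p}) + \epsilon_t$
$X_t = p_m(X_{t-1}, X_{t-2}, \dots, X_{t-p}) + \epsilon_t$, where $p_m$ is a polynomial of order p and degree m
Example: logistic map, $X_t = a X_{t-1}(1 - X_{t-1}) = a X_{t-1} - a X_{t-1}^2$, $a > 1$
Two fixed points: 0 and $(a - 1)/a$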
Fractional autoregressive models
Fraction of two polynomials:
$X_t = \dfrac{a_0 + \sum_{j=1}^{p} a_j X_{t-1}^j}{b_0 + \sum_{j=1}^{q} b_j X_{t-1}^j} + \epsilon_t$, $a_p \neq 0$, $b_q \neq 0$
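Random coefficients autoregressive models (RCA)
AR(1) with multiplicative errors: $X_t = \phi_t X_{t-1}$, with a random coefficient $\phi_t$
RCA: $X_t = \sum_{i=1}^{p} (b_i + B_i(t)) X_{t-i} + \epsilon_t$
Coefficients $\phi_i = b_i + B_i(t)$, with $b_i$ constant and $B_1(t), B_2(t), \dots, B_p(t)$ random with mean 0, independent of $\epsilon_t$ and $X_t$
Example: $X_t = (0.1 + B_i(t)) X_{t-i} + \epsilon_t$, $B_i(t) \sim N(0, 0.9^2)$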
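Bilinear models (BL)
BL of order 1: $X_t = a X_{t-1} + b X_{t-1} \epsilon_{t-1} + \epsilon_t$
General form: $X_t = \sum_{i=1}^{p} (a_i + A_i(t)) X_{t-i} + \epsilon_t$, with $A_i(t) = \sum_{k=1}^{s} b_{ik} \epsilon_{t-k}$
Coefficients $\phi_i = a_i + A_i(t)$
“Bilinear” because:
- if $X_s$ = const, s < t, the model is linear w.r.t. $\epsilon_s$, $s \leq t$
- if $\epsilon_s$ = const, s < t, the model is linear w.r.t. $X_s$, s < t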
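AR models with conditional heteroscedasticity
If $X_t \sim$ ARCH, then $X_t^2 \sim$ BL
ARCH (model of multiplicative noise):
$X_t = \epsilon_t \sqrt{V_t}$, $\epsilon_t \sim N(0, \sigma^2)$
$V_t = \gamma + \phi_1 X_{t-1}^2 + \dots + \phi_p X_{t-p}^2$, $\gamma > 0$, $\phi_i \geq 0$
GARCH:
$X_t = \epsilon_t \sqrt{V_t}$
$V_t = \gamma + \sum_{i=1}^{p} \phi_i X_{t-i}^2 + \sum_{i=1}^{q} \psi_i V_{t-i}$, $\gamma > 0$, $\phi_i \geq 0$, $\psi_i \geq 0$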
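How the conditional variance feeds back on the series can be seen in a short simulation. This is a minimal sketch of an ARCH(1) process, assuming the form $X_t = \epsilon_t \sqrt{V_t}$ with $V_t = \gamma + \phi X_{t-1}^2$ and unit-variance Gaussian $\epsilon_t$; the function name `simulate_arch1` and the parameter values are illustrative assumptions.

```python
import math
import random

def simulate_arch1(n, gamma=0.5, phi=0.5, seed=0, burn_in=200):
    """Simulate X_t = eps_t * sqrt(V_t), V_t = gamma + phi * X_{t-1}^2.

    eps_t ~ N(0, 1) is an assumption of this sketch; gamma > 0 and
    0 <= phi < 1 keep the unconditional variance gamma / (1 - phi) finite.
    """
    rng = random.Random(seed)
    x_prev = 0.0
    xs, vs = [], []
    for t in range(n + burn_in):
        v = gamma + phi * x_prev ** 2      # conditional variance V_t
        x_prev = rng.gauss(0.0, 1.0) * math.sqrt(v)
        if t >= burn_in:                   # drop the initial transient
            xs.append(x_prev)
            vs.append(v)
    return xs, vs

xs, vs = simulate_arch1(2000)
```

Large values of `xs` raise the next conditional variance in `vs`, which produces the «bursts» that a linear model with constant noise variance cannot reproduce.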
Analysis with nonlinear models
1. Model selection
M candidate models, m = 1, ..., M
$\mathrm{AIC}(m) = -2 \ln g_m(\mathbf{x} \mid \hat{\theta}(\mathbf{x})) + 2r$, with $g_m$ the likelihood of model m and r the number of its parameters
2. Parameter estimation
- maximum likelihood method
- method of ordinary least squares
3. Diagnostic checking
errors (residuals):
- uncorrelated
- following normal distribution
Real world time series
mechanics
physiology
geophysics
economics
Nonlinear time series analysis and dynamical systems
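Time series $x_1, x_2, \dots, x_n$
Assumption:
$\mathbf{s}_t \in \mathbb{R}^d$: trajectory of the dynamical system
$\mathbf{s}_0$: state vector at time 0
$\mathbf{f}^t: \mathbb{R}^d \to \mathbb{R}^d$: system function
t: continuous or discrete time
Nonlinear dynamical system: $\mathbf{s}_t = \mathbf{f}^t(\mathbf{s}_0)$
$h: \mathbb{R}^d \to \mathbb{R}$: observation function
Observation: $x_t = h(\mathbf{s}_t)$
For time series we assume the underlying systems to be dissipative
Trajectory in $\mathbb{R}^d$ -> attractor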
Attractor:
● stable fixed (equilibrium) point
● finite set of equilibrium points
● limit cycle
● torus
● strange attractor
self-similarity: fractals
chaos: sensitivity to initial conditions
can be derived by
a linear system
cannot be derived by
a linear system
Nonlinear dynamical systems, maps (discrete time)
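Hénon map (chaotic): $s_i = 1 - 1.4 s_{i-1}^2 + 0.3 s_{i-2}$
Ikeda map (chaotic): $s_{k+1} = 1 + 0.9 s_k \exp\left(0.4 i - \dfrac{6 i}{1 + |s_k|^2}\right)$, with $s_k$ complex and i the imaginary unit
Logistic map: $s_i = a s_{i-1}(1 - s_{i-1})$; periodic for a = 3.52, chaotic for a = 4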
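The maps above are easy to iterate directly. The sketch below generates series from the Hénon and logistic maps; the function names, the start values and the number of discarded transient iterations are assumptions made for the illustration.

```python
def henon_series(n, a=1.4, b=0.3, discard=100):
    """Iterate s_i = 1 - a * s_{i-1}^2 + b * s_{i-2}, starting from (0, 0)."""
    prev2, prev1 = 0.0, 0.0
    out = []
    for i in range(n + discard):
        new = 1.0 - a * prev1 ** 2 + b * prev2
        prev2, prev1 = prev1, new
        if i >= discard:          # keep only points after the transient
            out.append(new)
    return out

def logistic_series(n, a=4.0, s0=0.3, discard=100):
    """Iterate s_i = a * s_{i-1} * (1 - s_{i-1})."""
    s = s0
    out = []
    for i in range(n + discard):
        s = a * s * (1.0 - s)
        if i >= discard:
            out.append(s)
    return out

chaotic = logistic_series(200, a=4.0)
periodic = logistic_series(200, a=3.52, discard=2000)
```

With a = 3.52 the iterates settle on a short cycle, while with a = 4 they wander irregularly over [0, 1].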
Nonlinear dynamical systems, flows (continuous time)
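Lorenz system, state variables $s_1, s_2, s_3$:
$\dot{s}_1 = a(s_2 - s_1)$
$\dot{s}_2 = b s_1 - s_2 - s_1 s_3$
$\dot{s}_3 = s_1 s_2 - c s_3$
a = 10, b = 28, c = 8/3
sampling time $\tau_s$
[Figure: Lorenz attractor in the (s1, s2, s3) space]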
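A time series from a flow is obtained by integrating the equations and sampling one variable every $\tau_s$. The sketch below does this for the Lorenz system with a fixed-step fourth-order Runge-Kutta scheme; the integrator, the step size `dt` and the function names are choices made for this illustration, not part of the slides.

```python
def lorenz_rhs(s, a=10.0, b=28.0, c=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    s1, s2, s3 = s
    return (a * (s2 - s1), b * s1 - s2 - s1 * s3, s1 * s2 - c * s3)

def rk4_step(s, dt):
    """One classical Runge-Kutta step of length dt."""
    def shifted(k, h):
        return tuple(x + h * kx for x, kx in zip(s, k))
    k1 = lorenz_rhs(s)
    k2 = lorenz_rhs(shifted(k1, 0.5 * dt))
    k3 = lorenz_rhs(shifted(k2, 0.5 * dt))
    k4 = lorenz_rhs(shifted(k3, dt))
    return tuple(x + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
                 for x, p, q, r, w in zip(s, k1, k2, k3, k4))

def sample_lorenz(n, dt=0.01, steps_per_sample=1, s0=(1.0, 1.0, 1.0), discard=1000):
    """Return n samples of s1 with sampling time tau_s = steps_per_sample * dt."""
    s = s0
    for _ in range(discard):              # let the trajectory reach the attractor
        s = rk4_step(s, dt)
    xs = []
    for _ in range(n):
        for _ in range(steps_per_sample):
            s = rk4_step(s, dt)
        xs.append(s[0])                   # observation x_i = s1
    return xs

x_lorenz = sample_lorenz(3000)
```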
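Noise in the time series
Dynamical system: $\mathbf{s}_t = \mathbf{f}^t(\mathbf{s}_0)$; with dynamic (system) noise: $\mathbf{s}_t = \mathbf{f}^t(\mathbf{s}_0) + \boldsymbol{\epsilon}_t$
Observation: $x_t = h(\mathbf{s}_t)$; with observational noise: $x_t = h(\mathbf{s}_t) + w_t$
$w_t$: white noise, uncorrelated to $x_t$ and $\mathbf{s}_t$
$\boldsymbol{\epsilon}_t$: white noise, uncorrelated to $\mathbf{s}_u$, u < t
Noise: dynamic (system) $\epsilon$, observational (measurement) w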
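Logistic map with noise (periodic and chaotic regime)
Observational noise: $s_i = a s_{i-1}(1 - s_{i-1})$, $x_i = s_i + w_i$, $w_i \sim N(0, \sigma^2)$
Dynamic noise: $s_i = a s_{i-1}(1 - s_{i-1}) + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$, $x_i = s_i$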
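The difference between the two noise types is where the random term enters the recursion. A minimal sketch, assuming Gaussian noise; the function name `noisy_logistic` is hypothetical, and the clipping of the state to [0, 1] is a pragmatic assumption of this sketch (without it, dynamic noise can push the state out of the interval, where the map diverges).

```python
import random

def noisy_logistic(n, a, sigma_obs=0.0, sigma_dyn=0.0, s0=0.3, seed=0):
    """Logistic map with observational and/or dynamic Gaussian noise."""
    rng = random.Random(seed)
    s = s0
    xs = []
    for _ in range(n):
        # dynamic noise enters the state and is propagated by the map
        s = a * s * (1.0 - s) + rng.gauss(0.0, sigma_dyn)
        s = min(max(s, 0.0), 1.0)          # assumption: keep the state in [0, 1]
        # observational noise only perturbs the measurement x_i
        xs.append(s + rng.gauss(0.0, sigma_obs))
    return xs

clean = noisy_logistic(200, a=3.52)
observed = noisy_logistic(200, a=3.52, sigma_obs=0.01)
driven = noisy_logistic(200, a=3.52, sigma_dyn=0.01)
```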
Scatter diagrams in 2 and 3 dimensions
[Figure: for each of three time series, the time history x(i) vs time index i (d=1), the scatter diagram of (x(i-1), x(i)) (d=2) and the scatter diagram of (x(i-2), x(i-1), x(i)) (d=3). Series: annual sunspots 1700-1996; square of AR(9); square of z-lorenz]
Topics in the analysis of time series and dynamical systems
- State space reconstruction
in order to observe the complexity / stochasticity / structure of the system
- Estimation of characteristics of the system / attractor
measuring the complexity / dimension of the system
- Modeling / Prediction
use nonlinear models to improve predictions
- Other topics:
- Hypothesis testing for linearity / nonlinearity
- Control of the system evolution
- Synchronization
- …
Method of delays
We assume that the studied system is deterministic
$\mathbf{x}_i = [x_i, x_{i-\tau}, \dots, x_{i-(m-1)\tau}]$
Parameters:
- embedding dimension m
- delay time $\tau$
- time window length $\tau_w = (m-1)\tau$
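The method of delays amounts to a simple indexing operation on the scalar series. A minimal sketch (the function name `delay_embed` is an assumption; indices are 0-based, so the first reconstructed point is the one at index (m-1)τ):

```python
def delay_embed(x, m, tau):
    """Return the points [x_i, x_{i-tau}, ..., x_{i-(m-1)tau}] for all valid i."""
    start = (m - 1) * tau                  # first index with a full history
    return [[x[i - k * tau] for k in range(m)] for i in range(start, len(x))]

points = delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2)
# points == [[4, 2, 0], [5, 3, 1]]
```

A series of length n gives n - (m-1)τ reconstructed points, so a long time window $\tau_w$ shortens the usable data.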
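State space reconstruction
Initial state space M: $\mathbf{s}_i \to \mathbf{s}_{i+1} = \mathbf{f}(\mathbf{s}_i)$
Observed quantity: $x_i = h(\mathbf{s}_i) \in \mathbb{R}$
Reconstructed state space $\mathbb{R}^m$: $\mathbf{x}_i \to \mathbf{x}_{i+1} = \mathbf{F}(\mathbf{x}_i)$
Embedding $\Phi$ ?: $\mathbf{x}_i = \Phi(\mathbf{s}_i)$
Condition: $m \geq 2D + 1$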
Method of delays
Example: Hénon map
$s(i) = 1 - 1.4 s(i-1)^2 + 0.3 s(i-2)$
or
$s_1(i) = 1 - 1.4 s_1(i-1)^2 + s_2(i-1)$
$s_2(i) = 0.3 s_1(i-1)$
Projection: $x_i = s_1(i)$
[Figure: reconstructed attractor for m=2 τ=1, m=3 τ=1, m=2 τ=2 and m=3 τ=2, with self-intersections marked]
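Method of delays, m=3
Example: Lorenz system
$\dot{s}_1 = a(s_2 - s_1)$
$\dot{s}_2 = b s_1 - s_2 - s_1 s_3$
$\dot{s}_3 = s_1 s_2 - c s_3$
a=10, b=28, c=8/3
Projection: $x_i = s_1(i)$
Optimal τ ?
[Figure: reconstructed attractor for τ=1, τ=5, τ=10 and τ=20]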
Estimation of τ
• From the autocorrelation r(τ)
(measures linear correlation)
τ such that r(τ) = 1/e or r(τ) = 0
• From the mutual information I(τ)
(measures linear and nonlinear correlation)
$I(X, Y) = \sum_{x, y} p_{XY}(x, y) \log \dfrac{p_{XY}(x, y)}{p_X(x)\, p_Y(y)}$
$X = x_i$, $Y = x_{i-\tau}$, $I(\tau) = I(X, Y)$
τ at the first local minimum of I(τ)
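Both criteria for τ can be sketched with plain loops. The autocorrelation estimator and the equal-width histogram estimate of the mutual information below are simple choices made for this illustration (the number of bins is an assumption, and other binning schemes are common); the function names are hypothetical.

```python
import math

def autocorr(x, lag):
    """Sample autocorrelation r(lag)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def tau_from_autocorr(x, max_lag, level=1.0 / math.e):
    """Smallest lag with r(tau) <= level (level = 1/e or 0); None if not reached."""
    for lag in range(1, max_lag + 1):
        if autocorr(x, lag) <= level:
            return lag
    return None

def mutual_information(x, lag, bins=16):
    """Histogram estimate of I(X, Y) for X = x_i and Y = x_{i-lag}."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    def cell(v):
        return min(int((v - lo) / width), bins - 1)
    pairs = [(cell(x[i]), cell(x[i - lag])) for i in range(lag, len(x))]
    n = len(pairs)
    pxy, px, py = {}, {}, {}
    for a, b in pairs:
        pxy[(a, b)] = pxy.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    # sum of p(x,y) * log( p(x,y) / (p(x) p(y)) ) over occupied cells
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def tau_from_mutual_information(x, max_lag, bins=16):
    """Lag of the first local minimum of I(tau); None if none up to max_lag."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 2)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1                   # mi[k] belongs to lag k + 1
    return None
```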
Estimation of m
Optimal m ?
• Too small m
self-intersections in the attractor
• Too large m
“curse of dimensionality”
• Takens theorem: $m \geq 2D + 1$ … but D is unknown
Method of false nearest neighbors (FNN)
• Close points on the attractor are:
- either real neighboring points due to system dynamics
- or false neighboring points due to self-intersections when m is too small
• At a larger m, where there are no self-intersections, all false neighboring points will be resolved, as they will no longer be close
• The optimal m’ is the one for which there are no longer any false nearest neighbors as the dimension increases by one from m’ to m’+1.
[Figure: illustration of false neighbors when passing from R to R2]
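The idea of the FNN method can be sketched with a brute-force neighbor search. The criterion below (a neighbor is false when the extra coordinate separates the pair by more than `ratio` times their distance in dimension m) is a simplified form of the usual test; the threshold value and the function name `fnn_fraction` are assumptions of this sketch.

```python
import math

def fnn_fraction(x, m, tau, ratio=10.0):
    """Fraction of nearest neighbors in dimension m that are false in m + 1."""
    idx = range(m * tau, len(x))           # indices valid in both m and m + 1
    pts = {i: [x[i - k * tau] for k in range(m)] for i in idx}
    false = total = 0
    for i in idx:
        best, best_d = None, float("inf")
        for j in idx:                      # brute-force nearest neighbor search
            if j == i:
                continue
            d = math.dist(pts[i], pts[j])
            if d < best_d:
                best, best_d = j, d
        if best is None or best_d == 0.0:
            continue                       # skip isolated or duplicated points
        total += 1
        # distance added by the extra coordinate x_{i - m*tau}
        if abs(x[i - m * tau] - x[best - m * tau]) / best_d > ratio:
            false += 1
    return false / total if total else 0.0
```

Calling `fnn_fraction` for m = 1, 2, 3, ... and taking the first m where the fraction drops to (nearly) zero gives the estimate m’; the search is quadratic in the series length, so it is only practical for short series.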
An example of estimating m by the method FNN
The estimation of m with the method FNN depends on:
- the delay τ
- noise
[Figure: % FNN vs m (m = 1, ..., 10) for τ = 2, 5, 10, 20; left panel: x-Lorenz without noise, right panel: x-Lorenz + 10% noise]
Estimation of nonlinear characteristics
Nonlinear characteristics (invariant measures)
• Dimension
1. Euclidean
2. Topological
3. Fractal
(correlation, information, box counting, …)
• Lyapunov exponents
(largest, the whole spectrum)
• Entropy
Correlation dimension ν
The correlation dimension ν characterizes the fractal structure of the attractor (self-similarity at different scales) using the density of the points of the attractor in the reconstructed state space
The basic idea is that the probability of two points being closer than a distance r, $P(\|\mathbf{x}_i - \mathbf{x}_j\| < r)$, changes w.r.t. r as a power of r
$\xi_i(r)$: number of points lying in a sphere with radius r and center $\mathbf{x}_i$, i.e. the number of points $\mathbf{x}_j$ with $\|\mathbf{x}_i - \mathbf{x}_j\| < r$
Scaling law: $\xi_i(r) \sim r^{\nu}$, holds for $r \to 0$, $N \to \infty$
ν integer: the attractor is a regular geometric object
ν non-integer: the attractor is a fractal
[Figure: spheres of radius r centered at points x_i of the attractor]
Estimation of the correlation dimension ν
Scaling law for small r: $C(r) \sim r^{\nu}$
Estimation: $\nu = \dfrac{d \log C(r)}{d \log r}$ for a range of r
Convergence of ν(m) for m sufficiently large
If ν is small and non-integer and the system is deterministic: small dimension and fractal (chaotic) structure
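Correlation sum
Time series $x_i$, $i = 1, \dots, N + (m-1)\tau$ -> reconstruction $\mathbf{x}_i$, $i = 1, \dots, N$
Estimation of $\xi_i$:
$C(r) = \dfrac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \Theta(r - \|\mathbf{x}_i - \mathbf{x}_j\|)$
Heaviside function: $\Theta(x) = 0$ when $x \leq 0$, $\Theta(x) = 1$ when $x > 0$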
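The correlation sum and the slope of log C(r) vs log r can be computed directly from the definition. A minimal brute-force sketch (function names are assumptions; the two-radius slope is a crude stand-in for a fit over a scaling range, and both radii must be large enough for C(r) > 0):

```python
import math

def correlation_sum(points, r):
    """C(r) = 2 / (N (N - 1)) * number of pairs i < j with ||x_i - x_j|| < r."""
    n = len(points)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) < r)
    return 2.0 * close / (n * (n - 1))

def local_slope(points, r1, r2):
    """Slope of log C(r) vs log r between the radii r1 < r2 (estimate of nu)."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))

# Points on a straight segment in the plane: a one-dimensional object
line = [(i / 500.0, i / 500.0) for i in range(500)]
nu_line = local_slope(line, 0.05, 0.2)
```

For the straight segment the slope comes out close to 1, the dimension of a line, even though the points live in a two-dimensional space.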
[Figure: log C(r) vs log r, local slope vs log r, and ν vs m, for x-Lorenz without noise (τ=2), x-Lorenz + 10% observational noise (τ=2) and x-Lorenz + 10% observational noise (τ=10)]
The estimation of ν is affected by the following factors:
- correlation time: use only pairs of points with $|i - j| > w$
- selection of τ and m
- noise
- time series length
[Figure: log C(r) vs log r and local slope vs log r (curves for m=1 to m=10), and ν vs m, for four series: Hénon (n=924); Hénon + 10% white noise; returns of ASE index, 1/1/2005 – 20/9/2005; white noise]
Lyapunov exponents
The Lyapunov exponents measure the average rate of divergence and convergence of the trajectories on the attractor along the directions of the local state space
Lyapunov spectrum: $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_m$
λi > 0: divergence
λi < 0: convergence
λi = 0: direction of flow
If λ1 > 0 and the system is deterministic: chaos
Dissipative system: $\sum_{i=1}^{m} \lambda_i < 0$
[Figure: two nearby trajectories, from x_i and x_i’ at distance d0 to x_{i+t} and x_{i’+t} at distance dt]
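Largest Lyapunov exponent λ1
The initial distance $d_0 = \|\mathbf{x}_i - \mathbf{x}_{i'}\|$ of two nearby trajectories is expected to increase exponentially with time
After time t: $d_t = \|\mathbf{x}_{i+t} - \mathbf{x}_{i'+t}\|$
If $d_t \approx d_0 e^{\lambda_1 t}$, then λ1 is the largest Lyapunov exponent
Computation: $\lambda_1 = \dfrac{1}{N t} \sum_{j=1}^{N} \ln \dfrac{d_{t,j}}{d_{0,j}}$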
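The averaging of ln(d_t / d_0) over many reference points can be illustrated on a system whose equations are known. The sketch below does this for the logistic map with a = 4, where the exact value is ln 2; it perturbs the state directly instead of searching for neighbors in a reconstructed state space, which is a simplification of this sketch, and the function names and parameter values are assumptions.

```python
import math

def logistic_step(s, a=4.0):
    return a * s * (1.0 - s)

def largest_lyapunov(step, s0, n_ref=2000, t=10, d0=1e-9, discard=100):
    """Estimate lambda_1 as 1 / (N t) * sum_j ln(d_{t,j} / d_{0,j})."""
    s = s0
    for _ in range(discard):               # move onto the attractor
        s = step(s)
    total, count = 0.0, 0
    for _ in range(n_ref):
        # nearby start point, displaced towards the interior of [0, 1]
        p = s + d0 if s < 0.5 else s - d0
        q = s
        for _ in range(t):                 # evolve both trajectories t steps
            q, p = step(q), step(p)
        dt = abs(p - q)
        if dt > 0.0:
            total += math.log(dt / d0)
            count += 1
        s = step(s)                        # next reference point
    return total / (count * t)

lam = largest_lyapunov(logistic_step, 0.3)
```

The horizon t has to stay short enough that d_t remains small compared to the size of the attractor, otherwise the exponential growth saturates and λ1 is underestimated.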
[Figure: example of the estimation of λ1 for x-Lorenz, without noise and with 10% noise]
The estimation of λ1 depends on : τ, m, noise
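Prediction models
The true system generating the time series: $\mathbf{s}_{i+1} = \mathbf{f}(\mathbf{s}_i)$
Hénon map:
$s_{1,i+1} = 1 - 1.4 s_{1,i}^2 + s_{2,i}$
$s_{2,i+1} = 0.3 s_{1,i}$
that is, $s_{1,i+1} = f_1(s_{1,i}, s_{2,i})$ and $s_{2,i+1} = f_2(s_{1,i}, s_{2,i})$, or $\mathbf{s}_{i+1} = \mathbf{f}(\mathbf{s}_i)$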
The true system generating the time series is unknown: $\mathbf{s}_{i+1} = \mathbf{f}(\mathbf{s}_i)$
The problem of modeling and prediction of time series:
given x1, x2, …, xi, estimate / predict xi+1
State space reconstruction with the method of delays: $\mathbf{x}_i = [x_i, x_{i-\tau}, \dots, x_{i-(m-1)\tau}]$
The reconstructed system from the time series: $\mathbf{x}_{i+1} = \mathbf{F}(\mathbf{x}_i)$, $\mathbf{F}: \mathbb{R}^m \to \mathbb{R}^m$; estimation?
The function that is relevant to time series prediction: $x_{i+1} = F(\mathbf{x}_i)$, $F: \mathbb{R}^m \to \mathbb{R}$
e.g. for m = 2, τ = 1: $x_{i+1} = F(x_i, x_{i-1})$
Nonlinear prediction models
• Global models, e.g. polynomials
function F has the same analytic expression over the whole domain
• Semi-local models, e.g. neural networks
the form of function F is derived as a weighted sum of local basis functions
• Local models, e.g. the local linear model
function F is defined differently at each point of the reconstructed state space
Local prediction models
Prediction using similar segments of the time series
Prediction at time i+T from the mappings T steps ahead of “similar” segments from the past of the time series
Implementation of the idea of “similar” segments: time series segments -> reconstructed points
The nearest neighboring points to $\mathbf{x}_i$: $\{\mathbf{x}_{i(1)}, \mathbf{x}_{i(2)}, \dots, \mathbf{x}_{i(K)}\}$
Prediction of $x_{i+T}$ from the mappings of the neighbors: $\{x_{i(1)+T}, x_{i(2)+T}, \dots, x_{i(K)+T}\}$
Zeroth order prediction: $\hat{x}_{i+T} = x_i(T) = x_{i(1)+T}$
Average prediction: $x_i(T) = \dfrac{1}{K} \sum_{j=1}^{K} x_{i(j)+T}$
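The zeroth order and average predictions can be sketched with a brute-force neighbor search over the past of the series. The function name `local_average_prediction` and the 0-based indexing are assumptions of this sketch; with K = 1 it reduces to the zeroth order prediction.

```python
import math

def local_average_prediction(x, m, tau, K, T=1):
    """Predict the value T steps after the last sample of x."""
    n = len(x) - 1                                    # index of the last sample
    target = [x[n - k * tau] for k in range(m)]       # current reconstructed point
    cands = []
    # candidate neighbors need a full history and a known image x_{i+T}
    for i in range((m - 1) * tau, n - T + 1):
        point = [x[i - k * tau] for k in range(m)]
        cands.append((math.dist(point, target), i))
    cands.sort()
    nearest = [i for _, i in cands[:K]]               # the K nearest neighbors
    return sum(x[i + T] for i in nearest) / len(nearest)

# A periodic series: after the pattern (..., 2, 3) the next value is 0
series = [0, 1, 2, 3] * 10
prediction = local_average_prediction(series, m=2, tau=1, K=3, T=1)
```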
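Local linear prediction
We assume that in the neighborhood of $\mathbf{x}_i$ the local linear model is valid:
$x_{i+T} = F(\mathbf{x}_i) = F(x_i, x_{i-\tau}, \dots, x_{i-(m-1)\tau}) = a_0 + a_1 x_i + a_2 x_{i-\tau} + \dots + a_m x_{i-(m-1)\tau} = a_0 + \mathbf{a}' \mathbf{x}_i$
The model holds for the neighbors $\mathbf{x}_{i(1)}, \mathbf{x}_{i(2)}, \dots, \mathbf{x}_{i(K)}$:
$x_{i(1)+T} = a_0 + \mathbf{a}' \mathbf{x}_{i(1)}$
$x_{i(2)+T} = a_0 + \mathbf{a}' \mathbf{x}_{i(2)}$
...
$x_{i(K)+T} = a_0 + \mathbf{a}' \mathbf{x}_{i(K)}$
Estimation of parameters $a_0, a_1, \dots, a_m$ (method of ordinary least squares):
$\min_{a_0, a_1, \dots, a_m} \sum_{j=1}^{K} \left( x_{i(j)+T} - a_0 - a_1 x_{i(j)} - \dots - a_m x_{i(j)-(m-1)\tau} \right)^2$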
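Estimation of prediction error
We split the time series in two parts:
learning set $x_1, x_2, \dots, x_{N_1}$ and test set $x_{N_1+1}, \dots, x_N$
Predictions on the test set: $\hat{x}_{N_1+1}, \dots, \hat{x}_N$
$x_i(T)$: prediction T steps ahead made at time i
Prediction error: $e_{i+T} = x_{i+T} - \hat{x}_{i+T}$
Statistic for prediction error:
$\mathrm{NRMSE}(T) = \sqrt{ \dfrac{ \frac{1}{N - T - N_1 + 1} \sum_{t=N_1}^{N-T} (x_{t+T} - \hat{x}_{t+T})^2 }{ \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 } }$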
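The NRMSE is the root mean square prediction error divided by the standard deviation of the series. A minimal sketch (the function name `nrmse` and its argument layout are assumptions; `reference` is the series whose variance is used for the normalization):

```python
import math

def nrmse(actual, predicted, reference):
    """Root mean square error of the predictions, normalized by the
    standard deviation of the reference series."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    mean = sum(reference) / len(reference)
    var = sum((v - mean) ** 2 for v in reference) / len(reference)
    return math.sqrt(mse / var)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_prediction = nrmse(x, [3.0] * 5, x)   # predicting the mean gives 1.0
perfect = nrmse(x, x, x)                   # perfect prediction gives 0.0
```

A value near 0 indicates accurate prediction, while a value near 1 means the model does no better than predicting the mean of the series.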
Example: x-Lorenz
Prediction with:
• local average prediction model (LAP)
• local linear prediction model (LLP)
τ = 1, m = 5, K = 11
[Figure: predictions without noise and with 10% noise]
Annual-quarter growth rate of GNP of USA in the period 1947 – 1991
Prediction with
- linear model, AR
- local average model, LAM
- local linear model, LLM
[Figure: prediction error (nrmse) vs m for the last 30 quarters; curves: AR, LAM (K=15), LLM (K=15)]
[Figure: predictions starting at the first quarter of 1989, with prediction horizon being the last 6 years; curves: real, AR(3), LAM (m=5, K=15), LLM (m=5, K=15)]
ASE index in the period 1/1/2002 – 20/9/2005
Predict index with
- linear model, AR
- local average model, LAM
Returns: $y_t = \dfrac{x_t - x_{t-1}}{x_{t-1}}$
Prediction starting at 20/9/2005, with prediction horizon up to 16 days ahead
[Figure: returns of the general index vs day, with predictions y_n(T) from AR(7) and LAM (m=7, K=20)]
[Figure: close of the general index vs day, with predictions x_n(T) from AR(7) and LAM (m=7, K=20)]
One step ahead prediction in the period 21/9/2005 – 12/10/2005
ASE index in the period 1/1/2002 – 20/9/2005
Predict index with
- linear model, AR
- local average model, LAM
Returns: $y_t = \dfrac{x_t - x_{t-1}}{x_{t-1}}$
[Figure: returns of the general index vs day, with one step ahead predictions y_n(1) from AR(7) and LAM (m=7, K=20)]
[Figure: close of the general index vs day, with one step ahead predictions x_n(1) from AR(7) and LAM (m=7, K=20)]