
Page 1: Lecture 1

Statistical Inference for Ergodic Diffusion Process

Yu.A. Kutoyants

Laboratoire de Statistique & Processus,

Université du Maine,

72085 Le Mans, Cedex 9,

FRANCE

Johannes Gutenberg University, Mainz, June-July 2008

1

Page 2: Lecture 1

Classical Statistics

We observe $n$ independent copies $X^n = (X_1, \ldots, X_n)$ of the same r.v. with density function $f(x)$.

Parameter Estimation

We suppose that $f(x) = f(\vartheta, x)$, $\vartheta \in \Theta = (\alpha, \beta)$. Then the likelihood function is

$$
L(\vartheta, X^n) = \prod_{j=1}^{n} f(\vartheta, X_j)
$$

and the estimators (MLE $\hat\vartheta_n$, BE $\tilde\vartheta_n$) are defined by the equations

$$
L(\hat\vartheta_n, X^n) = \sup_{\vartheta \in \Theta} L(\vartheta, X^n), \qquad \tilde\vartheta_n = \int_\alpha^\beta \vartheta\, p(\vartheta \mid X^n)\, d\vartheta.
$$
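Both estimators are easy to compute numerically. A minimal Python sketch, for an assumed example family not taken from the slides (exponential density $f(\vartheta, x) = \vartheta e^{-\vartheta x}$ with $\Theta = (0.5, 5)$ and a flat prior):

```python
import numpy as np

# Assumed example family: f(theta, x) = theta * exp(-theta * x), Theta = (0.5, 5).
rng = np.random.default_rng(0)
theta0, n = 2.0, 2000
x = rng.exponential(scale=1.0 / theta0, size=n)   # n i.i.d. copies X_1, ..., X_n

# MLE: maximizer of L(theta, X^n) = prod_j f(theta, X_j); here in closed form.
mle = n / x.sum()

# Bayes estimator: posterior mean under a uniform prior on (alpha, beta),
# computed by simple quadrature on a grid of theta values.
alpha, beta = 0.5, 5.0
grid = np.linspace(alpha, beta, 2001)
loglik = n * np.log(grid) - grid * x.sum()
post = np.exp(loglik - loglik.max())              # unnormalized posterior
be = (grid * post).sum() / post.sum()             # posterior mean

print(mle, be)                                    # both near theta0 = 2
```

For large $n$ the two values nearly coincide, consistent with the common asymptotics stated on the next slide.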

2

Page 3: Lecture 1

Under regularity conditions (smooth $f(\vartheta, x)$) these estimators are consistent, asymptotically normal,

$$
\sqrt{n}\,(\hat\vartheta_n - \vartheta) \Longrightarrow N\bigl(0, I(\vartheta)^{-1}\bigr), \qquad \sqrt{n}\,(\tilde\vartheta_n - \vartheta) \Longrightarrow N\bigl(0, I(\vartheta)^{-1}\bigr),
$$

and asymptotically efficient. Here

$$
I(\vartheta) = \int \frac{\dot f(\vartheta, x)^2}{f(\vartheta, x)}\, \mu(dx)
$$

is the Fisher information.

In the case of non-smooth (w.r.t. $\vartheta$) $f(\vartheta, x)$ these estimators have different rates. Say, if $f(\vartheta, x) = g(x - \vartheta)$, where $g(x)$ has a jump at some point $x^*$, then

$$
n\,(\hat\vartheta_n - \vartheta) \Longrightarrow \hat u, \qquad n\,(\tilde\vartheta_n - \vartheta) \Longrightarrow \tilde u.
$$

If the singularity is of cusp type, say, $g(x) = |x - x^*|^\kappa$ near $x^*$, then the rate depends on $\kappa \in (0, 1/2)$: with $\gamma = 1/(2\kappa + 1)$,

$$
n^\gamma\,(\hat\vartheta_n - \vartheta) \Longrightarrow \hat u_\gamma, \qquad n^\gamma\,(\tilde\vartheta_n - \vartheta) \Longrightarrow \tilde u_\gamma.
$$

3

Page 4: Lecture 1

Nonparametric Estimation

If $f(x)$ is an unknown function, then we can consider the problems of estimating the distribution function $F(x)$ and the density $f(x)$. In the first case the empirical distribution function

$$
\hat F_n(x) = \frac{1}{n} \sum_{j=1}^{n} 1\!\!1_{\{X_j < x\}}
$$

is consistent, $\sqrt{n}$-asymptotically normal and asymptotically efficient. In the second case the kernel-type estimators

$$
\hat f_n(x) = \frac{1}{n h_n} \sum_{j=1}^{n} K\!\left( \frac{X_j - x}{h_n} \right)
$$

have good properties (consistent, $n^{k/(2k+1)}$-asymptotically normal and asymptotically efficient in some sense).
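Both estimators can be checked numerically. A sketch, under assumptions not in the slides (standard normal data, Gaussian kernel $K$, bandwidth $h_n = n^{-1/5}$):

```python
import numpy as np

# Assumed example: i.i.d. N(0,1) sample, Gaussian kernel, bandwidth n^(-1/5).
rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)
h = n ** (-1 / 5)

def F_hat(t):
    # Empirical distribution function: average of indicators 1{X_j < t}.
    return np.mean(x < t)

def f_hat(t):
    # Kernel estimator (1/(n h)) sum_j K((X_j - t)/h) with Gaussian K.
    return np.mean(np.exp(-0.5 * ((x - t) / h) ** 2)) / (h * np.sqrt(2 * np.pi))

# True values at t = 0: F(0) = 1/2 and f(0) = 1/sqrt(2 pi) ≈ 0.399.
print(F_hat(0.0), f_hat(0.0))
```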

4

Page 5: Lecture 1

It is possible to construct a similar theory for the following continuous-time models of observations $X^T = \{X_t,\ 0 \le t \le T\}$.

• Gaussian processes

$$
X_t = S(\vartheta, t) + N(t), \qquad 0 \le t \le T,
$$

where $S(\vartheta, t)$ is a signal and $N(t)$ is Gaussian noise.

• Diffusion-type processes

$$
dX_t = S_t(\vartheta, X)\, dt + \varepsilon\, \sigma_t(X)\, dW_t, \qquad 0 \le t \le T,
$$

with small noise ($\varepsilon \to 0$), and the ergodic diffusion process

$$
dX_t = S(\vartheta, X_t)\, dt + \sigma(X_t)\, dW_t, \qquad 0 \le t \le T,
$$

with asymptotics $T \to \infty$.

• Point processes (mainly inhomogeneous Poisson processes of intensity $S(\vartheta, t)$).

5

Page 6: Lecture 1

Model

Diffusion process

$$
dX_t = S(X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0, \quad t \ge 0,
$$

with $S(\cdot)$ and $\sigma(\cdot)$ such that the process is ergodic with invariant density

$$
f(x) = G(S)^{-1}\, \sigma(x)^{-2} \exp\left\{ 2 \int_0^x \frac{S(y)}{\sigma(y)^2}\, dy \right\}.
$$

$S(\cdot)$ is unknown and $\sigma(\cdot) > 0$ is known to the observer.

We consider two types of problems: parametric and nonparametric estimation by observations $\{X_t,\ 0 \le t \le T\}$ as $T \to \infty$.
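The invariant-density formula can be sanity-checked by simulation. A sketch for an assumed example, the Ornstein–Uhlenbeck drift $S(x) = -\vartheta x$ with constant $\sigma$, for which the formula gives the $N(0, \sigma^2/(2\vartheta))$ density, using a basic Euler–Maruyama scheme:

```python
import numpy as np

# Assumed example: dX = -theta*X dt + sigma dW.  The invariant-density formula
# f(x) = G^(-1) sigma^(-2) exp{2 int_0^x S(y)/sigma(y)^2 dy} gives N(0, sigma^2/(2 theta)).
rng = np.random.default_rng(2)
theta, sigma, dt, nsteps = 1.0, 1.0, 4e-3, 500_000   # horizon T = 2000
dw = rng.standard_normal(nsteps - 1) * np.sqrt(dt)
x = np.empty(nsteps)
x[0] = 0.0
for k in range(nsteps - 1):                          # Euler-Maruyama step
    x[k + 1] = x[k] - theta * x[k] * dt + sigma * dw[k]

var_emp = x.var()                                    # long-run sample variance
var_inv = sigma ** 2 / (2 * theta)                   # variance of invariant law
print(var_emp, var_inv)                              # should be close
```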

6

Page 7: Lecture 1

Correspondence

In classical statistics the properties of the estimators depend directly on the regularity of the density function $f(\vartheta, x)$. In the ergodic diffusion model the properties of the estimators depend on the regularity of the trend coefficient $S(\vartheta, x)$. Say, the Fisher informations in these two cases are

$$
I(\vartheta) = \int \frac{\dot f(\vartheta, x)^2}{f(\vartheta, x)^2}\, f(\vartheta, x)\, dx, \qquad I(\vartheta) = \int \frac{\dot S(\vartheta, x)^2}{\sigma(x)^2}\, f(\vartheta, x)\, dx.
$$

In nonparametric estimation the problem of density estimation for the i.i.d. model is quite close to the problem of trend coefficient estimation. Note that the problem of distribution function estimation (i.i.d. case) is similar to the problem of invariant density estimation for the ergodic diffusion model.

7

Page 8: Lecture 1

Stochastic Integral

We are given a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and $\{\mathcal{F}_t,\ 0 \le t \le T\}$, an increasing family of $\sigma$-algebras (filtration), i.e., for any $0 \le s < t \le T$ the inclusions $\mathcal{F}_s \subset \mathcal{F}_t \subset \mathcal{F}$ hold.

Let $\mathcal{M}_T$ be the class of progressively measurable random functions $h(\cdot)$ such that

$$
\mathbf{P}\left( \int_0^T h(t, \omega)^2\, dt < \infty \right) = 1.
$$

We say that $h(\cdot) \in \mathcal{M}_T^2$ if $h(\cdot) \in \mathcal{M}_T$ and

$$
\mathbf{E} \int_0^T h(t, \omega)^2\, dt < \infty.
$$

A standard Wiener process is a continuous (with probability 1) Gaussian process with independent increments and with the first two moments $\mathbf{E} W_t = 0$, $\mathbf{E} W_t W_s = t \wedge s$.

8

Page 9: Lecture 1

The stochastic Itô integral

$$
I_T(h) = \int_0^T h(t, \omega)\, dW_t
$$

is defined for the functions $h(\cdot) \in \mathcal{M}_T$.

• If $h(\cdot) \in \mathcal{M}_T^2$, then

$$
\mathbf{E}\, I_T(h) = 0, \qquad \mathbf{E}\bigl( I_T(h) \mid \mathcal{F}_t \bigr) = I_t(h).
$$

• For any two functions $h(\cdot), g(\cdot) \in \mathcal{M}_T^2$,

$$
\mathbf{E}\, I_T(h)\, I_T(g) = \mathbf{E} \int_0^T h(t, \omega)\, g(t, \omega)\, dt.
$$

9

Page 10: Lecture 1

In particular,

$$
\mathbf{E}\, I_T(h)^2 = \mathbf{E} \int_0^T h(t, \omega)^2\, dt.
$$
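A quick Monte Carlo check of this isometry, with the assumed integrand $h(t, \omega) = W_t$ and $T = 1$, for which $\mathbf{E} \int_0^1 W_t^2\, dt = \int_0^1 t\, dt = 1/2$:

```python
import numpy as np

# Monte Carlo check of E I_T(h)^2 = E int_0^T h^2 dt for h = W, T = 1.
rng = np.random.default_rng(3)
npaths, nsteps, T = 10000, 400, 1.0
dt = T / nsteps
dw = rng.standard_normal((npaths, nsteps)) * np.sqrt(dt)
w = np.cumsum(dw, axis=1) - dw            # W at the left endpoint of each step
ito = (w * dw).sum(axis=1)                # I_T(W) as a forward (Ito) sum
print(ito.mean(), (ito ** 2).mean())      # ≈ 0 and ≈ 1/2
```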

• If $h(\cdot) \in \mathcal{M}_T$, then for any $\delta > 0$ and $\gamma > 0$

$$
\mathbf{P}\left( \sup_{0 \le t \le T} \left| \int_0^t h(s, \omega)\, dW_s \right| > \delta \right) \le \frac{\gamma}{\delta^2} + \mathbf{P}\left( \int_0^T h(t, \omega)^2\, dt > \gamma \right).
$$

• Let $h(\cdot) \in \mathcal{M}_T^2$ and for some $m \ge 1$

$$
\mathbf{E} \int_0^T |h(t, \omega)|^{2m}\, dt < \infty.
$$

Then

$$
\mathbf{E}\, |I_T(h)|^{2m} \le [m(2m-1)]^m\, T^{m-1}\, \mathbf{E} \int_0^T |h(t, \omega)|^{2m}\, dt.
$$

10

Page 11: Lecture 1

Let $g(t, \omega)$ be $\mathcal{F}_t$-adapted for almost all $t \in [0, T]$,

$$
\mathbf{P}\left( \int_0^T |g(t, \omega)|\, dt < \infty \right) = 1,
$$

and $h(\cdot) \in \mathcal{M}_T$. Then the stochastic process

$$
X_t = X_0 + \int_0^t g(s, \omega)\, ds + \int_0^t h(s, \omega)\, dW_s, \qquad 0 \le t \le T,
$$

is called an Itô process. Here $X_0$ is an $\mathcal{F}_0$-measurable random variable. In shortened form it is usually written as

$$
dX_t = g(t, \omega)\, dt + h(t, \omega)\, dW_t, \qquad X_0, \quad 0 \le t \le T.
$$

11

Page 12: Lecture 1

The class of Itô processes is closed with respect to smooth transformations in the following sense. Let $\{X_t,\ 0 \le t \le T\}$ be an Itô process with stochastic differential and let $G(x, t)$ be a differentiable function with the following continuous derivatives: $G'_t(x, t)$, $G'_x(x, t)$, $G''_{xx}(x, t)$ (with obvious notation). Then the stochastic process $Y_t = G(X_t, t)$, $0 \le t \le T$, is an Itô process too, with the stochastic differential

$$
dY_t = \left[ G'_t(X_t, t) + G'_x(X_t, t)\, g(t, \omega) + \frac{1}{2}\, G''_{xx}(X_t, t)\, h(t, \omega)^2 \right] dt + G'_x(X_t, t)\, h(t, \omega)\, dW_t,
$$

with $Y_0 = G(X_0, 0)$, $0 \le t \le T$. This equality is called the Itô formula, and it can be written as

$$
dY_t = \left[ G'_t(X_t, t) + \frac{1}{2}\, G''_{xx}(X_t, t)\, h(t, \omega)^2 \right] dt + G'_x(X_t, t)\, dX_t,
$$

with the same initial value.
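The formula can be checked pathwise on a discretized trajectory. A sketch for the assumed example $G(x) = x^2$ with $X = W$, for which the Itô formula gives $W_T^2 = T + 2 \int_0^T W\, dW$:

```python
import numpy as np

# Pathwise check of d(W^2) = dt + 2 W dW on a single discretized path.
rng = np.random.default_rng(4)
nsteps, T = 200_000, 1.0
dt = T / nsteps
dw = rng.standard_normal(nsteps) * np.sqrt(dt)
w_right = np.cumsum(dw)                   # W at right endpoints of the steps
w_left = w_right - dw                     # W at left endpoints (Ito sums)
lhs = w_right[-1] ** 2                    # W_T^2
rhs = T + 2 * (w_left * dw).sum()         # T + 2 int_0^T W dW
print(lhs, rhs)                           # agree up to discretization error
```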

12

Page 13: Lecture 1

Let $h(\cdot) \in \mathcal{M}_T$ and, for some $H > 0$, with probability 1

$$
\int_0^T h(t, \omega)^2\, dt \ge H.
$$

Then the stopping time

$$
\tau_H = \inf\left\{ t : \int_0^t h(s, \omega)^2\, ds \ge H \right\}
$$

is well defined and

$$
\mathcal{L}\bigl\{ I_{\tau_H}(h) \bigr\} = N(0, H),
$$

i.e., $I_{\tau_H}(h)$ is a Gaussian random variable with mean zero and variance $H$.
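A Monte Carlo sketch of this property for an assumed integrand $h(s, \omega) = W_s$: each path is stopped once $\int_0^t W_s^2\, ds$ reaches $H$, and the stopped integrals should then have mean 0 and variance $H$:

```python
import numpy as np

# Stopped stochastic integral: I_{tau_H}(W) should be N(0, H).
rng = np.random.default_rng(5)
H, dt, npaths, maxsteps = 0.09, 1e-3, 1000, 4000
dw = rng.standard_normal((npaths, maxsteps)) * np.sqrt(dt)
w = np.cumsum(dw, axis=1) - dw                 # left-endpoint values W_{t_k}
q = np.cumsum(w * w * dt, axis=1)              # int_0^t W^2 ds
s = np.cumsum(w * dw, axis=1)                  # int_0^t W dW
idx = (q >= H).argmax(axis=1)                  # first index where q >= H
vals = s[np.arange(npaths), idx]               # integral stopped at tau_H
print(vals.mean(), vals.var())                 # ≈ 0 and ≈ H
```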

13

Page 14: Lecture 1

Homogeneous diffusion processes

We are given two functions $S(x)$, $\sigma(x)$ and the stochastic differential

$$
dX_t = S(X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0.
$$

GL. (Globally Lipschitz condition) There exists a constant $L$ such that

$$
|S(x) - S(y)| + |\sigma(x) - \sigma(y)| \le L\, |x - y|
$$

for all $x, y \in \mathbb{R}$.

Note that by this condition the functions $S(\cdot)$ and $\sigma(\cdot)$ satisfy the linear growth condition

$$
|S(x)| + |\sigma(x)| \le |S(0)| + |\sigma(0)| + L\, |x| \le L'\, (1 + |x|), \qquad L' = \max\bigl( L,\ |S(0)| + |\sigma(0)| \bigr),
$$

too.

14

Page 15: Lecture 1

Theorem 1 Let the condition GL be fulfilled and $\mathbf{P}(|X_0| < \infty) = 1$. Then this equation has a unique (strong) solution $\{X_t,\ 0 \le t \le T\}$, continuous with probability 1. If moreover $\mathbf{E} X_0^{2m} < \infty$, then

$$
\mathbf{E} X_t^{2m} \le \bigl( 1 + \mathbf{E} X_0^{2m} \bigr)\, e^{c_m t} - 1,
$$

where $c_m$ is some positive constant.

15

Page 16: Lecture 1

Local time

Let us consider a homogeneous diffusion process

$$
dX_t = S(X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0, \quad 0 \le t \le T.
$$

The local time of this diffusion process, denoted by $\Lambda_T(x)$, is defined as the following limit (with probability 1):

$$
\Lambda_T(x) = \lim_{\varepsilon \downarrow 0} \frac{\operatorname{meas}\{ t : |X_t - x| \le \varepsilon,\ 0 \le t \le T \}}{4\, \varepsilon}, \qquad T \ge 0, \quad x \in \mathbb{R},
$$

and by the Tanaka–Meyer formula it admits the representation

$$
|X_T - x| = |X_0 - x| + \int_0^T \operatorname{sgn}(X_t - x)\, dX_t + 2\, \Lambda_T(x).
$$

16

Page 17: Lecture 1

Let $h(\cdot)$ be a measurable function. Then with probability 1

$$
\int_0^T h(X_t)\, \sigma(X_t)^2\, dt = 2 \int_{-\infty}^{\infty} h(x)\, \Lambda_T(x)\, dx.
$$

We will use this equality in a different form. Let us denote

$$
f_T^\circ(x) = \frac{2\, \Lambda_T(x)}{T\, \sigma(x)^2}
$$

and remember that the function $\sigma(x)^2$ is supposed to be positive. The statistic $f_T^\circ(x)$ we call the local time estimator of the invariant density. Then

$$
\frac{1}{T} \int_0^T h(X_t)\, dt = \int_{-\infty}^{\infty} h(x)\, f_T^\circ(x)\, dx \to \int_{-\infty}^{\infty} h(x)\, f(x)\, dx.
$$
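The local time estimator can be approximated through occupation times: for small $\varepsilon$ and $\sigma = 1$, the definition of $\Lambda_T$ gives $f_T^\circ(x) \approx \operatorname{meas}\{t \le T : |X_t - x| \le \varepsilon\} / (2 \varepsilon T)$. A sketch for an assumed OU example $dX = -X\, dt + dW$, whose invariant density is $f(x) = \pi^{-1/2} e^{-x^2}$:

```python
import numpy as np

# Assumed example: OU process dX = -X dt + dW (sigma = 1), simulated with its
# exact AR(1) transition; occupation-time approximation of f_T at x = 0.
rng = np.random.default_rng(6)
dt, nsteps, eps = 0.05, 400_000, 0.05               # horizon T = 20000
a = np.exp(-dt)                                     # exact OU transition
s = np.sqrt((1 - a * a) / 2)
z = rng.standard_normal(nsteps) * s
x = np.empty(nsteps)
x[0] = 0.0
for k in range(nsteps - 1):
    x[k + 1] = a * x[k] + z[k]

f_hat0 = np.mean(np.abs(x) <= eps) / (2 * eps)      # ≈ 2 Lambda_T(0) / T
f_true0 = 1 / np.sqrt(np.pi)                        # invariant N(0, 1/2) density at 0
print(f_hat0, f_true0)
```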

17

Page 18: Lecture 1

Likelihood ratio

Let us consider two stochastic differential equations

$$
dX_t = S_1(X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0^{(1)}, \quad 0 \le t \le T,
$$

$$
dX_t = S_2(X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0^{(2)}, \quad 0 \le t \le T,
$$

and denote by $\mathbf{P}_1^{(T)}$ and $\mathbf{P}_2^{(T)}$ the probability measures induced in $(\mathcal{C}_T, \mathcal{B}_T)$ by the solutions of these equations, respectively. The likelihood ratio is

$$
\frac{d\mathbf{P}_2^{(T)}}{d\mathbf{P}_1^{(T)}}\bigl( X^T \bigr) = \frac{f_2(X_0)}{f_1(X_0)} \exp\left\{ \int_0^T \frac{S_2(X_t) - S_1(X_t)}{\sigma(X_t)^2}\, dX_t - \frac{1}{2} \int_0^T \frac{S_2(X_t)^2 - S_1(X_t)^2}{\sigma(X_t)^2}\, dt \right\}.
$$
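This formula is the basis for likelihood inference. A sketch of its use for an assumed parametric example $S(\vartheta, x) = -\vartheta x$, $\sigma = 1$, against the reference drift $S_1 = 0$: the log-likelihood $\ell(\vartheta) = -\vartheta \int_0^T X\, dX - (\vartheta^2/2) \int_0^T X^2\, dt$ is maximized by $\hat\vartheta_T = -\int X\, dX / \int X^2\, dt$.

```python
import numpy as np

# Assumed example: dX = -theta0*X dt + dW; MLE from the likelihood ratio,
# theta_hat = -int_0^T X dX / int_0^T X^2 dt  (sigma = 1, W as reference).
rng = np.random.default_rng(7)
theta0, dt, nsteps = 2.0, 2e-3, 500_000            # horizon T = 1000
dw = rng.standard_normal(nsteps - 1) * np.sqrt(dt)
x = np.empty(nsteps)
x[0] = 0.0
for k in range(nsteps - 1):                        # Euler-Maruyama path
    x[k + 1] = x[k] - theta0 * x[k] * dt + dw[k]

dx = np.diff(x)
theta_hat = -np.sum(x[:-1] * dx) / (np.sum(x[:-1] ** 2) * dt)
print(theta_hat)                                   # close to theta0 = 2
```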

18

Page 19: Lecture 1

Limit theorems

We are given a homogeneous diffusion process

$$
dX_t = S(X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0 = x_0, \quad t \ge 0.
$$

The statistical inference for such models is essentially based on two limit theorems: the law of large numbers (LLN) for ordinary integrals and the central limit theorem (CLT) for stochastic and ordinary integrals. Note that the CLT for the ordinary integral is a consequence of the CLT for stochastic integrals.

Law of Large Numbers

Let $\tau_a = \inf\{ t \ge 0 : X_t = a \}$ and $\tau_{ab} = \inf\{ t \ge \tau_a : X_t = b \}$. We say that the stochastic process $X = \{X_t,\ t \ge 0\}$ is recurrent if $\mathbf{P}(\tau_{ab} < \infty) = 1$ for all $a, b \in \mathbb{R}$. The recurrent process $X$ is called positive recurrent if $\mathbf{E}\, \tau_{ab} < \infty$ for all $a, b \in \mathbb{R}$, and null recurrent if $\mathbf{E}\, \tau_{ab} = \infty$ for all $a, b \in \mathbb{R}$.

19

Page 20: Lecture 1

Theorem 2 The process $X$ is recurrent if and only if

$$
V(x) = \int_0^x \exp\left\{ -2 \int_0^y \frac{S(u)}{\sigma(u)^2}\, du \right\} dy \longrightarrow \pm\infty \quad \text{as } x \to \pm\infty.
$$

The recurrent process $X$ is positive if and only if

$$
G = \int_{-\infty}^{\infty} \sigma(y)^{-2} \exp\left\{ 2 \int_0^y \frac{S(z)}{\sigma(z)^2}\, dz \right\} dy < \infty.
$$

The process $X$ is null recurrent if it is recurrent and

$$
G = \int_{-\infty}^{\infty} \sigma(y)^{-2} \exp\left\{ 2 \int_0^y \frac{S(z)}{\sigma(z)^2}\, dz \right\} dy = \infty.
$$

20

Page 21: Lecture 1

Examples.

$$
dX_t = -\vartheta_1 (X_t - \vartheta_2)\, dt + \sigma\, dW_t,
$$

$$
dX_t = -\frac{\vartheta_1 X_t^3}{1 + \vartheta_2 X_t^2}\, dt + \sigma\, dW_t,
$$

$$
dX_t = -\vartheta_1 X_t \bigl[ 1 + \gamma \sin(\vartheta_2 X_t) \bigr]\, dt + \sigma\, dW_t,
$$

$$
dX_t = \bigl[ -\vartheta_1 X_t^3 + \vartheta_2 X_t \bigr]\, dt + \sqrt{1 + X_t^2}\, dW_t,
$$

$$
dX_t = -\operatorname{sgn}(X_t - \vartheta)\, dt + dW_t, \qquad 0 \le t \le T,
$$

$$
dX_t = -X_t \bigl( a + b\, \chi_{\{\vartheta < X_t < c + \vartheta\}} \bigr)\, dt + dW_t,
$$

$$
dX_t = -X_t \bigl( a + b\, \chi_{\{\vartheta_1 < X_t < \vartheta_2\}} \bigr)\, dt + dW_t.
$$

21

Page 22: Lecture 1

For the positive recurrent diffusion process we have the Law of Large Numbers: for any function $h(\cdot)$ such that $\mathbf{E}\, |h(\xi)| < \infty$, we have (with probability 1)

$$
\frac{1}{T} \int_0^T h(X_t)\, dt \longrightarrow \mathbf{E}\, h(\xi) = \int h(x)\, f(x)\, dx.
$$

The random variable $\xi$ has the invariant density $f(x)$.
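A numerical sketch of this LLN for an assumed example, the OU process $dX = -X\, dt + dW$ with invariant law $N(0, 1/2)$ and $h(x) = \cos x$, so that $\mathbf{E}\, h(\xi) = e^{-1/4}$:

```python
import numpy as np

# LLN check: (1/T) int_0^T cos(X_t) dt -> E cos(xi) = exp(-1/4) for the
# assumed OU example dX = -X dt + dW, xi ~ N(0, 1/2).
rng = np.random.default_rng(8)
dt, nsteps = 0.05, 400_000                 # T = 20000, exact AR(1) transition
a = np.exp(-dt)
s = np.sqrt((1 - a * a) / 2)
z = rng.standard_normal(nsteps) * s
x = np.empty(nsteps)
x[0] = 0.0
for k in range(nsteps - 1):
    x[k + 1] = a * x[k] + z[k]

avg = np.cos(x).mean()                     # time average of h(X_t)
print(avg, np.exp(-0.25))                  # should be close
```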

22

Page 23: Lecture 1

Central Limit Theorem

Let $h(\cdot, \omega) \in \mathcal{M}_T$. Then the stochastic integral

$$
\int_0^T h(t, \omega)\, dW_t
$$

is well defined and we have the following

Theorem 3 (Central Limit Theorem) Suppose that there exist a (nonrandom) function $\varphi_T$ and a positive constant $\varrho$ such that

$$
\mathbf{P}\text{-}\lim_{T \to \infty} \varphi_T^2 \int_0^T h(t, \omega)^2\, dt = \varrho^2 < \infty.
$$

Then

$$
\mathcal{L}\left\{ \varphi_T \int_0^T h(t, \omega)\, dW_t \right\} \Longrightarrow N(0, \varrho^2).
$$

23

Page 24: Lecture 1

Theorem 4 (CLT for ordinary integral) Let $h(\cdot)$ be a measurable function such that $\mathbf{E}\, |h(\xi)| < \infty$ and $\mathbf{E}\, h(\xi) = 0$. If

$$
\delta^2 = 4\, \mathbf{E}\left( \frac{\int_{-\infty}^{\xi} h(v)\, f(v)\, dv}{\sigma(\xi)\, f(\xi)} \right)^2 < \infty,
$$

then

$$
\mathcal{L}\left\{ \frac{1}{\sqrt{T}} \int_0^T h(X_t)\, dt \right\} \Longrightarrow N(0, \delta^2).
$$
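A Monte Carlo sketch of Theorem 4 for an assumed example: the OU process $dX = -X\, dt + dW$ with $h(x) = x$. Since $f$ is the $N(0, s^2)$ density with $s^2 = 1/2$, one has $\int_{-\infty}^{\xi} v f(v)\, dv = -s^2 f(\xi)$, so $\delta^2 = 4 s^4 = 1$.

```python
import numpy as np

# CLT for the ordinary integral: eta = T^(-1/2) int_0^T X_t dt should be
# approximately N(0, delta^2) with delta^2 = 1 for this assumed OU example.
rng = np.random.default_rng(9)
npaths, nsteps, dt = 2000, 4000, 0.05       # T = 200 per path
a = np.exp(-dt)                             # exact AR(1) transition
s = np.sqrt((1 - a * a) / 2)
x = rng.standard_normal(npaths) * np.sqrt(0.5)   # stationary start
acc = np.zeros(npaths)
for _ in range(nsteps):                     # advance all paths in parallel
    acc += x * dt
    x = a * x + s * rng.standard_normal(npaths)
eta = acc / np.sqrt(nsteps * dt)            # T^(-1/2) int_0^T X_t dt
print(eta.mean(), eta.var())                # ≈ 0 and ≈ delta^2 = 1
```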

24