
A PENALTY METHOD FOR NONPARAMETRIC ESTIMATION OF THE LOGARITHMIC DERIVATIVE OF A DENSITY FUNCTION

BY

DENNIS D. COX

TECHNICAL REPORT NO. 38

JULY 1983

DEPARTMENT OF STATISTICS
UNIVERSITY OF WASHINGTON

SEATTLE, WASHINGTON

A PENALTY METHOD FOR NONPARAMETRIC ESTIMATION OF
THE LOGARITHMIC DERIVATIVE OF A DENSITY FUNCTION

BY

DENNIS D. COX

DEPARTMENT OF STATISTICS
UNIVERSITY OF WISCONSIN

MADISON, WISCONSIN 53706


ABSTRACT

Given a random sample of size $n$ from a density $f_0$ on the real line satisfying certain regularity conditions, we propose a nonparametric estimator for $\psi_0 = -f_0'/f_0$. The estimate is the minimizer of a functional of the form

$$\lambda J(\psi) + \int \bigl[\psi^2 - 2\psi'\bigr] \, dF_n ,$$

where $\lambda > 0$ is a smoothing parameter, $J(\cdot)$ is a roughness penalty, and $F_n$ is the empirical c.d.f. of the sample.

on of es

A more

of case is ven~ since it i rabl e

i

'Ie

i te

1 s

es e

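Let $f_0$ be a probability density function on an interval $[a, b]$ in the real line, and define

$$\psi_0(x) = -\frac{f_0'(x)}{f_0(x)} .$$

Letting $F_0$ denote the cumulative distribution function, we have the following identity:

$$\int_a^b \psi_0 \zeta \, dF_0 = -f_0(x)\zeta(x)\Big|_a^b + \int_a^b \zeta' \, dF_0 = \int_a^b \zeta' \, dF_0 , \tag{1}$$

valid for all functions $\zeta$ provided certain regularity conditions hold, and the boundary terms in the integration by parts either vanish or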

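cancel. If $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with distribution $F_0$, then we can estimate any $L_2(F_0)$ inner product of the form $\int \psi_0 \zeta \, dF_0$ by

$$\int \zeta' \, dF_n = n^{-1} \sum_{i=1}^n \zeta'(X_i) .$$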
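Identity (1) and its empirical counterpart can be checked numerically. The following is a minimal sketch, assuming a standard normal $f_0$ (so that $\psi_0(x) = x$) and the test function $\zeta(x) = \sin x$. Both choices, and the function name `empirical_inner_product`, are illustrative assumptions and do not come from the report. For this pair the exact value of both sides is $E[\cos X] = e^{-1/2}$.

```python
import numpy as np

def empirical_inner_product(sample, zeta_prime):
    """Estimate the integral of psi_0 * zeta dF_0 by the sample mean of zeta'(X_i)."""
    return float(np.mean(zeta_prime(sample)))

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)          # assumed f_0: standard normal

# zeta = sin, so zeta' = cos; this side never uses psi_0.
estimate = empirical_inner_product(x, np.cos)
# Direct Monte Carlo version of the left-hand side, using the true psi_0(x) = x.
direct = float(np.mean(x * np.sin(x)))
exact = float(np.exp(-0.5))

print(estimate, direct, exact)
```

The point of the comparison is that `estimate` is computed without any knowledge of $\psi_0$, which is what makes the identity usable for estimation.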

is on s

te rhmonc-

on5.

recover an es a e

a on;

can reveal much about [...], such as [...] (linearity of $\psi_0$) and heaviness of tails ($\psi_0(x)$ remains [bounded] or tends to 0 as $x \to \pm\infty$); and

(iii) $\psi_0$ can be used in [...] likelihood estimation of location, so an estimate of $\psi_0$ can be used for adaptive maximum likelihood estimation of location (see Beran's paper or Stone, 1975).

We will make use of equation (1) to develop an estimator of $\psi_0$, but with a very different form from Beran's estimator. First, we need to spell out the regularity conditions.

ASSUMPTION A1. Let $f_0$ be a probability density function with distribution function $F_0$. Let $[a,b]$, $-\infty \le a < b \le +\infty$, be an interval. Assume:

(i) $f_0(x) > 0$ if and only if $x \in (a,b)$;

(ii) $f_0(a) = f_0(b) = 0$;

(i i i) = -1 is absol

(iv) = is y

(v ) d ;

(vi) <

y inuous on (a,b);

nuous on .b

s CC,:IITU.,'?iOfl i. S as i

{ii are

< a < b < and f > 0 ono

(vii') ( = b) •

holds," we shall mean that either A1 or A2 holds.

PROPOSITION 1.

(a) Under Assumption A1, $\psi_0$ is the unique minimizer of

$$\int \bigl[\psi^2 - 2\psi'\bigr] \, dF_0 \tag{2}$$

where $\psi$ varies over functions satisfying

$$\int \psi^2 \, dF_0 < \infty$$

a< co

, is zer of (2)

(3)

:: b) .

(1),

(b)

1

{3}

J + d

nce

t

s term on r.h.s. ;s a constant, ;naepl:nclenlt of

Q. E.D.
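The key step of the proof can be written out as a worked equation. This is a sketch, assuming that the functional in (2) is the population version of the one in the abstract, $\int[\psi^2 - 2\psi']\,dF_0$, and that identity (1) applies with $\zeta = \psi$:

```latex
\begin{align*}
\int (\psi - \psi_0)^2 \, dF_0
  &= \int \psi^2 \, dF_0 - 2\int \psi_0 \psi \, dF_0 + \int \psi_0^2 \, dF_0 \\
  &= \int \bigl[\psi^2 - 2\psi'\bigr] \, dF_0 + \int \psi_0^2 \, dF_0 .
\end{align*}
```

The last term does not depend on $\psi$, so under these assumptions minimizing (2) is the same as minimizing the $L_2(F_0)$ distance to $\psi_0$, which is zero only at $\psi = \psi_0$.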

Now we use Proposition 1 to construct a nonparametric estimate of $\psi_0$. If we replace $F_0$ in (2) by $F_n$, the natural estimate of $F_0$, and attempt to minimize the expression, we will run into problems. Essentially, there is no minimizer, and we can make the objective function arbitrarily small (close to $-\infty$). However, if we penalize the estimate for "roughness", it will turn out that we can in general obtain a unique minimizer which is a meaningful estimate of $\psi_0$. To be more precise, define the function spaces

I (m-l): ~,~ , ••• ,~ are absolutely continuous,

band J

a

=

associ

,b]: (j)(a) = (j)(b), O:sj <

norms

=a

+ 1

e L an

1 f

L +

the

on A2 is in

++t:IY'e>n't1 ab1e ony

et

ifj

mes

> 0

o s i

aj are j

(i) (b),

i

x a

(i)th a. (a) =J

) .[a ,b]

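minimizer over $H^m$ (or $H_0^m$) of

$$\lambda \int_{a_0}^{b_0} \bigl[(L\psi)(x)\bigr]^2 \, dx + \int \bigl[\psi^2(x) - 2\psi'(x)\bigr] \, dF_n(x) . \tag{5}$$

We then propose to use $\psi_{n\lambda}$ as an estimate of $\psi_0$.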
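The role of the penalty term in (5) can be seen numerically. The sketch below is not from the report: it assumes the family $\psi_c(x) = n^{-1}\sum_j \tanh(c(x - X_j))$, a hypothetical choice made only because $|\psi_c| \le 1$ while $\psi_c'(X_k) \ge c/n$. For this family the unpenalized objective $n^{-1}\sum_i[\psi^2(X_i) - 2\psi'(X_i)]$ is at most $1 - 2c/n$, so it decreases without bound as $c$ grows.

```python
import numpy as np

def unpenalized_objective(sample, c):
    """Empirical objective n^{-1} sum[psi(X_i)^2 - 2 psi'(X_i)] for the
    hypothetical family psi_c(x) = mean_j tanh(c * (x - X_j))."""
    t = np.tanh(c * (sample[:, None] - sample[None, :]))   # tanh(c * (X_i - X_j))
    psi = t.mean(axis=1)                                    # psi_c(X_i)
    dpsi = (c * (1.0 - t ** 2)).mean(axis=1)                # psi_c'(X_i), since tanh' = 1 - tanh^2
    return float(np.mean(psi ** 2 - 2.0 * dpsi))

rng = np.random.default_rng(1)
x = rng.standard_normal(50)

# Steeper and steeper steps at the data points drive the objective down.
values = [unpenalized_objective(x, c) for c in (1.0, 10.0, 100.0, 1000.0)]
print(values)
```

Functions of this kind become arbitrarily rough as $c$ increases, which is exactly what a roughness penalty $\lambda \int (L\psi)^2\,dx$ charges for.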

In Section 2, we show that there is a unique minimizer of (5), with

probability one, and give a characterization. To do this, we need to

make

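ASSUMPTION B. Let $[a_0, b_0]$ be an interval whose interior contains the data points $X_1, X_2, \ldots, X_n$, and let $\zeta_1, \ldots, \zeta_m$ be linearly independent solutions of $L\zeta = 0$ on $[a_0, b_0]$. We assume that the set of [...] for which

$$\det\bigl[\zeta_i(x_j) : 1 \le i \le m,\ 1 \le j \le m\bigr] = 0$$

has Lebesgue measure zero. ([...])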

case is

= _....-

i-1 ssumnt ton

e solution will yield In

on it is "len.

is i 1 i 1 3

on 3, we cons a s ia1

a c

of linear

{5}zeruni

(n+m) x (n+m)

case, y m=2 and

It is shown there that as $\lambda \to \infty$, $\psi_{n\lambda}$ approaches

$$\psi_{n\infty}(x) = \frac{x - \bar{X}}{S^2} , \tag{6}$$

where $\bar{X}$ and $S$ are the sample mean and standard deviation. This is of course the maximum likelihood estimate of $\psi_0$ under the $N(\mu, \sigma^2)$ model.
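The limit (6) can be checked directly in the simplest setting. The sketch below assumes that as $\lambda \to \infty$ the minimizer is forced into the null space of $L = d^2/dx^2$, i.e. the linear functions $\alpha + \beta x$, and it minimizes the empirical objective $n^{-1}\sum_i[(\alpha + \beta X_i)^2 - 2\beta]$ over $(\alpha, \beta)$ by solving its normal equations. The function name and the simulated data are illustrative, and $S^2$ is taken with divisor $n$.

```python
import numpy as np

def linear_limit_coefficients(sample):
    """Minimize n^{-1} sum[(alpha + beta*X_i)^2 - 2*beta] over (alpha, beta).

    Setting the gradient to zero gives the 2x2 system
        [[1, mean(X)], [mean(X), mean(X^2)]] @ [alpha, beta] = [0, 1].
    """
    m1, m2 = sample.mean(), np.mean(sample ** 2)
    gram = np.array([[1.0, m1], [m1, m2]])
    alpha, beta = np.linalg.solve(gram, np.array([0.0, 1.0]))
    return alpha, beta

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=500)
alpha, beta = linear_limit_coefficients(x)

xbar = x.mean()
s2 = np.mean((x - xbar) ** 2)        # S^2 with divisor n
print(alpha, -xbar / s2)             # intercept of (x - xbar) / S^2
print(beta, 1.0 / s2)                # slope of (x - xbar) / S^2
```

Solving the two equations by hand gives $\beta = 1/S^2$ and $\alpha = -\bar{X}/S^2$, which is the linear function $(x - \bar{X})/S^2$.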

At this point, we should compare and contrast with penalized maximum likelihood estimates of $f_0$ (or $\psi_0$), as introduced by Good and Gaskins (1971). A variant which is closer to our approach is the

one 0 who on of,

r (x) + C ) + [ (x),a

In s

re-

1 i za on em

is ica in t it a set of non-

1i near ons. We feel that es

on for reasons of nl!rn~'~~cal S t i se.

A

as n + 00 t

mal

at

irement for an

To s

is

we

it consis

ASSUMPTION C.

$(0, \infty)$ such that

$$\lim_{n \to \infty} \lambda_n = \lim_{n \to \infty} n^{-1/2} \lambda_n^{-5/4m} = 0 .$$
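As a worked example, assume that Assumption C requires $\lambda_n \to 0$ together with $n^{-1/2}\lambda_n^{-5/(4m)} \to 0$, which is the reading suggested by the displayed limits. A power-law sequence $\lambda_n = n^{-\alpha}$, a hypothetical choice used only for illustration, then qualifies exactly when $0 < \alpha < 2m/5$:

```latex
\lambda_n = n^{-\alpha} \to 0 \iff \alpha > 0,
\qquad
n^{-1/2}\lambda_n^{-5/(4m)} = n^{-1/2 + 5\alpha/(4m)} \to 0 \iff \alpha < \frac{2m}{5} .
```

For $m = 2$, for instance, this allows any $\alpha$ in $(0, 4/5)$.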

THEOREM 5.

Suppose

Let A2

o

o s r < 2m - 5/2

~ 1 and

all > 0

r < q <

an

+ r .

all n 1

pr

> 2

The proof is given in Section 4.

The main reason for restricting ourselves to the periodic case (Assumption A2) is that we can assert the existence of constants $c_1$ and $c_2$ for which

[...] (7)

We conjecture that a modified version of Theorem 5 holds under A1, but it is then necessary to deal with eigenvalues of singular differential operators (see 1). An interesting asymptotic representation of $\psi_{n\lambda}$ comes out of the proof of Theorem 5, namely

(which

G{X.YiA) is a Green's -t-r ..ret ton for a boundary value

, we see

) =-- G(x

is

) .

est t

1 ly

are as i

5, '';:1 !

(

$$[\,\ldots\,]_r = O_p\bigl(n^{-(q-r)/(2q+3)}\bigr) .$$

Recalling that $\psi_0$ already involves one derivative, we see that this rate of convergence in probability is probably the best possible.
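The exponent in this rate can be recovered by a short calculation. As a sketch, assume the error splits into a bias term of order $\lambda^{(q-r)/2m}$, as in (17), and a random term of order $n^{-1/2}\lambda^{-(2r+3)/4m}$, as in (25). Balancing the two gives:

```latex
\lambda^{(q-r)/2m} = n^{-1/2}\lambda^{-(2r+3)/4m}
\iff \lambda^{(2q+3)/4m} = n^{-1/2}
\iff \lambda = n^{-2m/(2q+3)},
\qquad\text{so}\qquad
\lambda^{(q-r)/2m} = n^{-(q-r)/(2q+3)} .
```

The middle step uses $(q-r)/2m + (2r+3)/4m = (2q+3)/4m$.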

This is certainly the best rate of convergence in probability implied by the results in Wahba (1975), Bretagnolle and Huber (1979), and

11 e i es -

s 1) i-

(x y) - - 1 x y

s

a

1.

course,

vatives in y as

y

tsi y

must involve

y is a

roucnness

a IInOlrlPclrametrl II

to the method to cases where

satisfies certain constraints, e.g. $o(t») = o.

As in as a i

o values of

from a in M~:)Ulli~

Xl ' , ••• , Xn• An .......o

,b ], under A2 is a function ~ E Hm ,bJ.oon A isto taonly me 'tIe

Al is a function ~

a = _00, and similarly and b.

THEOREM 2. If $n \ge m$, there is a unique minimizer of (5) w.p.1 (with probability 1), and it is the unique function $\psi$ satisfying

$$\lambda \int_{a_0}^{b_0} (L\psi)(x)(L\zeta)(x) \, dx + \frac{1}{n} \sum_{k=1}^n \psi(X_k)\zeta(X_k) = \frac{1}{n} \sum_{k=1}^n \zeta'(X_k)$$

for all admissible variations $\zeta$.
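The characterization in Theorem 2 suggests a simple numerical approximation. The following is a minimal Galerkin sketch, not the reproducing-kernel solution developed in the report: it assumes $L = d^2/dx^2$, restricts $\psi$ and the variations $\zeta$ to the span of a few Legendre polynomials on a hypothetical interval $[a_0, b_0]$, and solves the resulting linear system $(\lambda P + G)c = d$. The names `fit_psi`, `P`, `G`, and `d`, the degree, and the simulated data are all illustrative.

```python
import numpy as np
from numpy.polynomial import Legendre

def fit_psi(sample, lam, a0, b0, degree=5):
    """Galerkin approximation to the minimizer of
    lam * int (psi'')^2 dx + n^{-1} sum[psi(X_i)^2 - 2 psi'(X_i)]."""
    basis = [Legendre.basis(j, domain=[a0, b0]) for j in range(degree + 1)]
    k = len(basis)
    # P[i, j] = integral over [a0, b0] of phi_i'' * phi_j'' (the penalty form).
    P = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            integrand = basis[i].deriv(2) * basis[j].deriv(2)
            P[i, j] = integrand.integ(lbnd=a0)(b0)
    Phi = np.column_stack([b(sample) for b in basis])             # phi_j(X_i)
    dPhi = np.column_stack([b.deriv(1)(sample) for b in basis])   # phi_j'(X_i)
    G = Phi.T @ Phi / len(sample)        # n^{-1} sum phi_i(X_k) phi_j(X_k)
    d = dPhi.mean(axis=0)                # n^{-1} sum phi_j'(X_k)
    coef = np.linalg.solve(lam * P + G, d)
    return Legendre(coef, domain=[a0, b0])

rng = np.random.default_rng(3)
x = rng.standard_normal(400)
psi_hat = fit_psi(x, lam=1e-2, a0=-5.0, b0=5.0)
print(psi_hat(np.array([-1.0, 0.0, 1.0])))   # true psi_0(x) = x for N(0, 1)
```

Each row of the linear system is the variational equation of Theorem 2 with $\zeta$ set to one basis function, so the fitted polynomial satisfies that equation for every $\zeta$ in the chosen span. A polynomial basis is used here only for brevity.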

PROOF. The first and second variation of the objective function

are

=0d

zer

1'0, .1'.1,

)

if

1s e

dJF - 2 J ~ I dJFn n

)

at

s

o on

N(L),

+ 2 J

s

)(x)( )(x)dx + 2 f

i es

) 2Abo (= Ja

0

b::: 2A f 0

ao

) = 0

s

on B.

- 0 on

0 ve is s

es

so

e

is a

on.

In to a ion we ally low a rpnro-

ing 1 ) given in and

Lis a

,bO

) ' 1 (x,y) sa

x}, and

Put

O~j<m-l,

j = m - 1

G(x,y) =[0Ct(x,y)

if x < y

if x:?: y

,y} = ,Z}G(y,z}dz •

=

inner

= (2)

(L)

=i' ij

N(L).

ij

Let

is ker's ta , and 1 ' ••• , is given basis of

Now put

'i"

= Lj

(x) C;; .(y)J

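then $H^m$ is an rkhs with kernel

$$k = k_1 + k_2 ,$$

i.e. for each fixed $y \in [a_0, b_0]$, $k(\cdot, y) \in H^m$, and for all $\zeta \in H^m$, $\langle \zeta, k(\cdot, y) \rangle = \zeta(y)$, where $\langle \cdot, \cdot \rangle$ is the inner product [...]. Furthermore, since $m \ge 2$, $\zeta \mapsto \zeta'(y)$ is a [continuous] linear functional on $H^m$, and so for all $\zeta$

$$\zeta'(y) = \langle \zeta, k^{01}(\cdot, y) \rangle$$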

1 on

i == 1,2). 1

:=

1 ssa e vari ~, so t solon satisfies

+ f )k(-,x) d (x)::: J k01(_,x) d (x

M , is

unique a

strictly positive definite quadratic form plus a linear form), which is the unique $\psi_M \in M$ satisfying

where ~::: ke M:::

$M E M impl i es the l.h.s. of (8) is also in M and so equal 0. This

s a uti ch is fact in

Q D.

heson onmore inTl'\l"'mAd

on is +

l'lporpm 2, it i s

tnear eouattons +

scan in reduCE~d to an ( ) x

· . i

n

zer of (5).rem 2, 1

m n= ~ +L

, ... )1 and § = , ..., ) 1 can be vi

following linear system:

(AI -1+ Tg = A y

where I is an nxn identity, and

-1 n 01y~ = -n L k2 (X., X.)

1 j=l 1 J

-1 n= n 1 (X. )

i 1

1 ::; i s n, 1 ::; j s n

1 s n, 1 ::;<R, s m

1 s i ::; n

1 s s m

. D.

, \'1e 1i es are easily

< .,••

OY'clPY' S cs e , " ..,

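$$(L\psi_{n\lambda})(x) = 0, \quad x > Y_n \ \text{or}\ x < Y_1 . \tag{9}$$

This follows since if (9) did not hold for $x > Y_n$, we can only make the objective smaller by replacing $\psi_{n\lambda}$ on $x > Y_n$ by the solution of the initial value problem

[...] $j < m$,

and similarly for $x < Y_1$.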

(l0)

(b) Let $L^*$ denote the formal adjoint of $L$; then if $1 \le j < n$,

This follows since M defi

Theorem 2, and the t1pnpY'::! M satisfies the equation.

(c) E C2m- 3 and e

of - at each ) .•. ,Yn· s

3 ) .,

s is se a

In s ne s est case of of es -SCIlSS Y i~

(11 ) L ::

we ta

:: X

~ (x-y P if X;?; Y ;?; 0

-t (X_y)3 if x ~ Y < 0

otherwise

This st

in

on is a si y version of

:: 0) ::

ned

11

is

f is a is

If it is a

i

ex

y =

version normal

Sil verman

(1 ) ,

s i i es

-fl/f

t at least

some i

1 ues of :\ ,

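THEOREM 4. Let $n \ge 2$. Under assumptions A1 and B, if $L = d^2/dx^2$, then for all $x$

$$\lim_{\lambda \to \infty} \psi_{n\lambda}(x) = (x - \bar{X})/S^2 ,$$

where

$$\bar{X} = n^{-1} \sum_{i=1}^n X_i , \qquad S^2 = n^{-1} \sum_{i=1}^n (X_i - \bar{X})^2 .$$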

It ea veri inear on

is

x) ch zes

a 1

(12) I + ( ) 10 _n-l )y - T(T'T)-'z- -

is the proJection onto the of T (in Rn). As l + 00,

the r.h.s. of (12) remains bounded, which implies S + O. Hence

as l + 00. Since

-1 -1 Xn n

(TIT) =-1 X -2

X~n n1

z ' = (0, n-1 )

so

in

+ X, ) ,

s.

Q.L

I

use the K,

same in a )

L 1. ne

=J u2 d

Let

be the ei

1 ~ CA) ~ CA) ~ •••

ues of the Rayleigh quotient SIAl' repl ac-

cording to its multiplicity, and let

associated eigenfunctions which sfy

B =

(13 )

.v = 1,2, .•. } denote the

) )

Ul::IJl:::fIU on

ues are

+

2

( v >

for constants < 00

t (7) i ies norm ( )1/ 2u~u is equivalent

)dx +a

The injection map $H^m[a,b] \to L_2[a,b]$ is a compact operator by Sobolev's embedding theorem (see Adams, 1975), whence (7) implies the injection $H^m[a,b] \to L_2(F_0)$ is compact. Hence, each positive eigenvalue $\alpha_\nu(\lambda)$ has finite multiplicity, and the only point of accumulation of [...] (page 52 of Weinberger, 1974).

Since the injection is one to one and $B(u,u) \le A_\lambda(u,u)$, $\forall u$, it is clear

~ t, > O~ y normal ing

assure (13) is sa If ),

) ::: ) + (1 B )

::: + 1 -

it is

+ 1

If we it 1 ) = (l +

all > O.

veri t y 's sa s t s, consi the forms

C i J u (x) dxa

for $i = 1, 2$, where $c_1$ and $c_2$ are given in (7). Then we have for all $u \in H^m$ that

so by the minimax principle (page 57 of Weinberger, 1974),

=1,2, ... ,

all I, > 0,

(' (1)ent B· 1 1 y

luenvdlues

1) ( :: 2, ••• ,

all > if construct i ) t as

2 = 1 ...

ues of

f"11:::> ... "1('ttl C bounoa r

are ea 1y seen to

value

odic

1ves ) , It

regularity of

ic

(1967) and

cj

64-65 of

sinceexi

are

11 v

62 of Na(b), pa)

(

posi ve rnnC:T::t

Hence, the y also satisfies such bounds.v

Q.E.D.

LEMMA 2. Define

G(X,Y;A) = l (1v

are as in 1. For a co nuous tion u on .b],

) = J G

so

)

1 x ,

o

-(~ K D

nr u 1 •

F. We show G(x,· ) E

o

, for each fixed x E [a,b].

ti

NGN(X,Y;A) = I

v=l(x (y)

we have for each fixed x

(x)Z ,

and it is only necessary to show the series remains bounded as $N \to \infty$, since $A_\lambda(u,u)^{1/2}$ is equivalent to the $H^m$ norm. By Sobolev's inequality (page 32 of Agmon, 1967), we have for any $r \ge 1$ and all $x$

K is a constant ng on [a ,b] and m. ng r = ( )2

1i ue bounds of 1, the fact t s

1, and (7) , and norm valence, we in

(x) ~ K' V 1,2, .•.

nce ~ 2,

) ) <

x

(15) (G(x,* ) ) = ~ }.

now veri fy (14). on of

in

A (1JJ ) = f ~ I d IF + fA n d{ ~},

Substituting $\zeta = G(x, \cdot\,; \lambda)$ into this equation and using (15) yields (14).

Step 3. We claim that if $p + q < 2m - 1/2$ then for all $\lambda > 0$,

$$\int_a^b \int_a^b \bigl[G_{pq}(x, y; \lambda)\bigr]^2 \, dx \, dy \le K \lambda^{-(2p+2q+1)/2m} . \tag{16}$$
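The exponent in (16) can be made plausible with a short numerical experiment. The sketch below rests on assumptions that are not stated in the surviving text: it takes the periodic case with eigenvalues growing like $\nu^{2m}$, so that the squared norm behaves like the series $\sum_{\nu \ge 1} \nu^{2s}/(1 + \lambda\nu^{2m})^2$ with $s = p + q$, and it estimates the growth rate of that series as $\lambda \to 0$ on a log-log scale. Comparing the sum with the integral $\int_0^\infty x^{2s}/(1+x^{2m})^2\,dx$ after the substitution $x = \lambda^{1/2m}\nu$ suggests a slope of $-(2s+1)/(2m)$, and the integral converges only when $s < 2m - 1/2$.

```python
import numpy as np

def spectral_sum(lam, s, m, n_terms=200_000):
    """Partial sum of nu^(2s) / (1 + lam * nu^(2m))^2 over nu = 1..n_terms."""
    nu = np.arange(1, n_terms + 1, dtype=float)
    return float(np.sum(nu ** (2 * s) / (1.0 + lam * nu ** (2 * m)) ** 2))

m, s = 2, 1                       # hypothetical values with s < 2m - 1/2
lam_hi, lam_lo = 1e-4, 1e-6

# Log-log slope of the sum between the two smoothing parameters.
slope = (np.log(spectral_sum(lam_lo, s, m)) - np.log(spectral_sum(lam_hi, s, m))) \
        / (np.log(lam_lo) - np.log(lam_hi))
print(slope, -(2 * s + 1) / (2 * m))
```

This only illustrates the scaling in $\lambda$ for one assumed spectrum; it is not a substitute for the interpolation argument in the proof.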

note

,

1 h

in0

) is HO = L2(

). rst of all

p = 0,1,2, ... an a ication of,

levIs i i

K p + <: K'o -

and by an induction argument (see Proposition 3.1 of Cox, 1982)

for $k = 0, 1, 2, \ldots$ Using standard interpolation theorems (see Triebel, 1978, Theorem 1.15.2(f)) it follows that for any real $r > 0$ and any $p \in \{0, 1, 2, \ldots\}$,

is a norm iva to hi

on it = • we

= I\l

2

_ -(2p+2q+l) \'- 11 L

v

00

~ K11-(2p+2q+l) fo

s K It-(2p+2q+l

(e x)2(p+q)2 dx

The r ( ) < 2m - 1/2 is so i in seeo

la line is fini s 1

on s if q < - 1/2

f x)

, • ) q )

s y if < - 1

nee r - 1

K G )

(see Theorem 4.2.5 of Triebel, 1978), we have that for all $r < 2m - 1/2$

II( ')11 0 -(2r+1)/2mnG ',;:\ Hm

( [a, b)x [a, b]) s K x

This proves (16).

u) ::

3/2

)d 2

:: (y) +

+

+

•+ j

Q.E.

3. ne

=

)

(. ,y )

) -

Under the hypotheses of Theorem 5, we have

(17)

and

liB II :s;K1!lPll A(q-r)/2mA riO q

Us i ng (1) obtain

=r

s Kn-1

(. .s

2r+3)/2m

(y) dFO(Y) - r •

1i norm ences in 3 of of

1

K1

:s; K r1

s (17) • (

E[{

1== - Jn

)

id for all

According to

line is bounded

[a , bJ and all

ion (16), the in

above by

ve integers r ~ 2m - 3/2.

1 with respect to dx of this last

which proves t.

Q.

of > 0

+ +r +

r

rK

-(

(19 )

so by Ass

(20)

on C is an of

) ,

< 1

lity ~ 1 - 5 on ch

for all A E In. Furthermore, by (18) of

is an event of probability ~ 1 - E/5 on

lsi 1i ty,

(21) V II < K - 112 "I - 5I4mnA Iq - n 1\

Invoking (20), (21), and (17), we have on an event of probability $\ge 1 - 2\varepsilon/5$ that

Since the last quantity in braces tends to 0 as $n \to \infty$ by Assumption C, and since the second in braces remains bounded (as $q \ge 1$), we have that for all $n$ sufficiently large there is an event of probability

z 1 - 5 and a cons

+

each on

+ ) +r r

r.h.s. of ( in 3,

s K( )/2m

r

event

(25)

lity ~ 1 - which

l'l v II :s; K' -1/2 A-(2r+3)/4m, nA r n •

A simple calculation shows that

1 (x) )

n 1 is

~ 1 -

2, 19), ( ), ( a so 1 n

ci 1

+ -(1

)

(q-r)1 I -1/2 -(+ K 0

on an event with pro bility ~ 1 - E/5.

ciently

na 11s, Lemma 2, (19), a

D :\-{2r+3)/4m IIR ,. IIn 0), \IJnIt I! 1

on

(

I -1/2 ,-5/4m) -1/2 .-(2r+3)/4m Ks K D

nn 1\ n A 1

Combining (24) through [...] and putting them in (23), we have that there is a $K'$ such that for all $n$ sufficiently large there is an event of probability $\ge 1 - \varepsilon$ on which

)r

$ K { r

+ K' n -{

+ n K + K' n{

+ K' -1 /2 -{n n

:s; K (q-r)/2m +r

-1/2 -{n

Q.E.D.

, R. A. (1

, S. (1

) , York.

nd, Yo

Beran, R. (1976), "Adaptive estimates for autoregressive processes,"

minimax," 1

Cox, D. (1982), "Convergence rates for multivariate smoothing spline

functions," submitted to SIAM J. Num. Anal.

Dacunha-Castelle, D. (1978), "Vitesse de convergence pour certains problèmes statistiques," in École d'Été de Prob. de St.-Flour VII-1977, Lecture Notes in Math. 678,

Good, I. J. and Gaskins, R. A. (1971), "Nonparametric roughness penalties for probability densities," Biometrika 58, 255-277.

, G. (1970), "Spline runct i ons and es,1I

on of a 1 i

" O.,

i a 1

(l

5),

",

1 ) ,

es on,"

1

.,• no. 5,es is

nh~,rne~r, H. (1974),

ona1 co • s
