A PENALTY METHOD FOR NONPARAMETRIC ESTIMATION OF THE LOGARITHMIC DERIVATIVE OF A DENSITY FUNCTION
BY
DENNIS D. COX
TECHNICAL REPORT NO. 38
JULY 1983
DEPARTMENT OF STATISTICS, UNIVERSITY OF WASHINGTON
SEATTLE, WASHINGTON
DENNIS D. COX
DEPARTMENT OF STATISTICS, UNIVERSITY OF WISCONSIN
MADISON, WISCONSIN 53706
ABSTRACT
Given a random sample of size $n$ from a density $f_0$ on the real line satisfying certain regularity conditions, we propose a nonparametric estimator for $\psi_0 = -f_0'/f_0$. The estimate is the minimizer of a functional of the form $\lambda J(\psi) + \int [\psi^2 - 2\psi']\,dF_n$, where $\lambda > 0$ is a smoothing parameter, $J(\cdot)$ is a roughness penalty, and $F_n$ is the empirical c.d.f. of the sample.
[illegible] A more [illegible] case is given, since it is [illegible].

Let $f_0$ be a probability density function on an interval $[a,b]$ in the real line, and define $\psi_0(x) = -f_0'(x)/f_0(x)$.
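Integrating by parts, with $F_0$ the cumulative distribution function of $f_0$, we obtain the following identity:
$$\int_a^b \phi\,\psi_0\,dF_0 = -f_0(x)\phi(x)\Big|_a^b + \int_a^b \phi'\,dF_0 = \int_a^b \phi'\,dF_0, \tag{1}$$
valid for all functions $\phi$ provided certain regularity conditions hold and the boundary terms in the integration by parts either vanish or cancel. If $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with distribution $F_0$, then we can estimate any $L^2(F_0)$ inner product of the form $\int \phi\,\psi_0\,dF_0$ by
$$\int \phi'\,dF_n = n^{-1}\sum_{i=1}^n \phi'(X_i).$$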
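As a quick sanity check of identity (1), the following Python sketch (our own illustration, not part of the report) takes $f_0$ to be the standard normal density, for which $\psi_0(x) = x$, and uses the test function $\phi(x) = x^3$. Then $\int \phi\,\psi_0\,dF_0 = E X^4 = 3$, and the average of $\phi'(X_i) = 3X_i^2$ should be close to 3 without any knowledge of $\psi_0$. The sample size and seed are arbitrary.

```python
import numpy as np

def phi(x):
    return x ** 3

def dphi(x):
    return 3 * x ** 2

def psi0(x):
    # For the standard normal density, psi_0 = -f_0'/f_0 = x.
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Estimate of the L^2(F_0) inner product <phi, psi_0> via identity (1):
# only phi' at the data is needed, never psi_0 itself.
est_by_parts = dphi(x).mean()

# Direct Monte Carlo estimate, which does require psi_0.
est_direct = (phi(x) * psi0(x)).mean()

print(est_by_parts, est_direct)  # both should be near E[X^4] = 3
```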
[illegible] (i) [illegible] recover an estimate [illegible]; (ii) $\psi_0$ can reveal much about $f_0$, such as [illegible] the heaviness of the tails ($\psi_0(x)$ remains bounded [illegible] linearity of $\psi_0$ [illegible] tends to 0 as $x \to \pm\infty$); and (iii) $\psi_0$ can be used in maximum likelihood estimation of location, so an estimate of $\psi_0$ can be used for adaptive maximum likelihood estimation of location (see Beran's paper or Stone, 1975).
We will make use of equation (1) to develop an estimator of $\psi_0$, but with a very different form from Beran's estimator. First, we need to spell out the regularity conditions.
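ASSUMPTION A1. Let $f_0$ be a probability density function with distribution function $F_0$. Let $[a,b]$, $-\infty \le a < b \le +\infty$, be an interval. Assume:
(i) $f_0(x) > 0$ if and only if $x \in (a,b)$;
(ii) $f_0(a) = f_0(b) = 0$;
(iii) $\psi_0 = -f_0'/f_0$ is absolutely [illegible];
(iv) [illegible];
(v) [illegible];
(vi) [illegible] $< \infty$;
[illegible] continuous on $(a,b)$; [illegible] continuous on $[a,b]$.

ASSUMPTION A2. [illegible] (i) and (ii) are [illegible] $-\infty < a < b < \infty$ and $f_0 > 0$ on [illegible]; [illegible] (vii') [illegible].

[illegible] we shall mean that either A1 or A2 holds.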
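PROPOSITION 1.
(a) Under Assumption A1, $\psi_0$ is the unique minimizer of
$$\int_a^b [\psi^2 - 2\psi']\,dF_0, \tag{2}$$
where $\psi$ varies over functions satisfying
$$\int_a^b \psi^2\,dF_0 < \infty, \quad [\text{illegible}] < \infty. \tag{3}$$
(b) [illegible] minimizer of (2) [illegible] (3) [illegible].
PROOF. By (1) and (3),
$$\int_a^b [\psi^2 - 2\psi']\,dF_0 = \int_a^b (\psi - \psi_0)^2\,dF_0 - \int_a^b \psi_0^2\,dF_0,$$
and the last term on the r.h.s. is a constant, independent of $\psi$.
Q.E.D.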
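The key step in the proof, that criterion (2) and the distance $\int (\psi - \psi_0)^2\,dF_0$ differ by a constant, can be checked numerically. The Python sketch below is our own illustration: it again assumes $f_0$ standard normal, so that $\psi_0(x) = x$ and the constant is $-\int \psi_0^2\,dF_0 = -1$, and it evaluates the integrals with NumPy's Gauss-HermiteE quadrature rule. The candidate functions are arbitrary choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-HermiteE rule: sum(w * g(t)) approximates the integral of g(t) exp(-t^2/2).
nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2 * np.pi)  # now sum(w * g) approximates E[g(X)], X ~ N(0, 1)

def expect(values):
    return np.sum(weights * values)

psi0 = nodes  # psi_0(x) = x for the standard normal

def criterion(psi, dpsi):
    # Objective (2): integral of psi^2 - 2 psi' with respect to F_0.
    return expect(psi(nodes) ** 2 - 2 * dpsi(nodes))

def distance(psi):
    # L^2(F_0) distance between psi and psi_0.
    return expect((psi(nodes) - psi0) ** 2)

candidates = {
    "psi_0": (lambda t: t, lambda t: np.ones_like(t)),
    "2x + 1": (lambda t: 2 * t + 1, lambda t: 2 * np.ones_like(t)),
    "x^3": (lambda t: t ** 3, lambda t: 3 * t ** 2),
    "sin": (np.sin, np.cos),
}

gaps = {name: criterion(psi, dpsi) - distance(psi)
        for name, (psi, dpsi) in candidates.items()}
for name, gap in gaps.items():
    print(f"{name:8s} criterion - distance = {gap:.6f}")  # the same constant each time
```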
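Now we use Proposition 1 to derive a nonparametric estimate of $\psi_0$. If we replace $F_0$ in (2) by $F_n$, the natural estimate of $F_0$, and attempt to minimize the expression, we will run into problems. Essentially, there is no minimizer, and we can make the objective function arbitrarily small (close to $-\infty$). However, if we penalize the estimate for "roughness", it will turn out that we can in general obtain a unique minimizer which is a meaningful estimate of $\psi_0$. To be more precise, define the function space
$$H^m[a,b] = \Big\{\psi : \psi, \psi', \ldots, \psi^{(m-1)} \text{ are absolutely continuous and } \int_a^b [\psi^{(m)}(x)]^2\,dx < \infty\Big\}$$
and, for the periodic case, its subspace of functions with $\psi^{(j)}(a) = \psi^{(j)}(b)$, $0 \le j < m$, together with the associated norms [illegible]. Let $L$ be a linear differential operator of order $m$ [illegible], with coefficients $a_j$ [illegible]. Let $\psi_{n\lambda}$ denote the minimizer over $H^m[a,b]$ (or over its periodic subspace under A2) of
$$\lambda \int_a^b [(L\psi)(x)]^2\,dx + \int [\psi(x)^2 - 2\psi'(x)]\,dF_n(x). \tag{5}$$
We then propose to use $\psi_{n\lambda}$ as an estimate of $\psi_0$.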
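Both claims above, that the unpenalized criterion has no minimizer and that a roughness penalty restores one, can be seen in a small Python sketch. It is our own illustration under stated assumptions, not the report's algorithm (Sections 2 and 3 characterize the exact minimizer instead): the rough candidate `spiky_psi` is a hypothetical construction, and `fit_fd` replaces (5), for $L = d^2/dx^2$, by a finite-difference discretization on a uniform grid, so that minimizing it reduces to one symmetric linear system.

```python
import numpy as np

def empirical_objective(psi_vals, dpsi_vals):
    # Unpenalized criterion (2) with F_n: mean of psi(X_i)^2 - 2 psi'(X_i).
    return np.mean(psi_vals ** 2) - 2.0 * np.mean(dpsi_vals)

def spiky_psi(x, data, h):
    # Hypothetical rough candidate: steep tanh ramps centred at each data point.
    # At the data |psi| <= (n - 1) h, while psi'(X_i) >= 1 / h.
    t = np.tanh((x[:, None] - data[None, :]) / h ** 2)
    return h * t.sum(axis=1), ((1.0 - t ** 2) / h).sum(axis=1)

def fit_fd(data, lam, grid):
    # Finite-difference stand-in for (5) with L = d^2/dx^2 (m = 2).
    # psi is stored as grid values; psi(X_i) uses linear interpolation and
    # psi'(X_i) the slope of the grid cell containing X_i.
    n, N = data.size, grid.size
    h = grid[1] - grid[0]
    rows = np.arange(n)
    left = np.clip(np.searchsorted(grid, data) - 1, 0, N - 2)
    frac = (data - grid[left]) / h
    W = np.zeros((n, N))
    W[rows, left], W[rows, left + 1] = 1.0 - frac, frac
    S = np.zeros((n, N))
    S[rows, left], S[rows, left + 1] = -1.0 / h, 1.0 / h
    D2 = (np.eye(N)[:-2] - 2.0 * np.eye(N, k=1)[:-2] + np.eye(N, k=2)[:-2]) / h ** 2
    P = h * D2.T @ D2          # approximates the integral of (psi'')^2
    A = W.T @ W / n            # quadratic form psi -> mean psi(X_i)^2
    b = S.sum(axis=0) / n      # linear form psi -> mean psi'(X_i)
    # Stationarity of lam psi'P psi + psi'A psi - 2 b'psi gives (lam P + A) psi = b.
    return np.linalg.solve(lam * P + A, b), P, A, b

def penalized_objective(psi, lam, P, A, b):
    return lam * psi @ P @ psi + psi @ A @ psi - 2.0 * b @ psi

rng = np.random.default_rng(1)
data = rng.standard_normal(50)

# Without a penalty the criterion has no lower bound as h shrinks.
hs = [1e-1, 1e-2, 1e-3]
unpen_vals = [empirical_objective(*spiky_psi(data, data, h)) for h in hs]
unpen_bounds = [(data.size - 1) ** 2 * h ** 2 - 2.0 / h for h in hs]
print(unpen_vals)

# With the penalty, the discretized problem has a unique minimizer.
lam, grid = 0.1, np.linspace(-5.0, 5.0, 201)
psi, P, A, b = fit_fd(data, lam, grid)
lin = (grid - data.mean()) / data.var()   # a linear function, which has zero penalty
print(penalized_objective(psi, lam, P, A, b), penalized_objective(lin, lam, P, A, b))
```

For any fixed $\lambda > 0$ the discretized criterion is strictly convex once the data contain two distinct points: the penalty vanishes only on grid-linear functions, and the data term controls those. The fitted values also come out linear beyond the extreme observations, a discrete counterpart of the fact that the penalty can be made to vanish there.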
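In Section 2, we show that there is a unique minimizer of (5), with probability one, and give a characterization. To do this, we need to make the following assumption.

ASSUMPTION B. Let $[a_0,b_0]$ be [illegible] interval whose interior contains the data points [illegible] $X_1, X_2, \ldots, X_n$, and let $\zeta_1, \ldots, \zeta_m$ be linearly independent solutions of $L\zeta = 0$ on $[a_0,b_0]$. We assume that the set
$$\{(x_1, \ldots, x_m) : \det[\zeta_i(x_j) : 1 \le i \le m,\ 1 \le j \le m] = 0\}$$
has Lebesgue measure zero. [illegible]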
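As an illustration (our choice, not the report's general setting), take $L = d^m/dx^m$, whose null space is spanned by $\zeta_j(x) = x^{j-1}$, $j = 1, \ldots, m$. The determinant in Assumption B is then a Vandermonde determinant, $\prod_{i<j}(x_j - x_i)$, which vanishes only on the finite union of hyperplanes $\{x_i = x_j\}$, a set of Lebesgue measure zero. The Python sketch below checks the product formula.

```python
import itertools
import numpy as np

def zeta_matrix(points, m):
    # Entry (i, j) is zeta_{i+1}(x_j) = x_j**i for the basis 1, x, ..., x^(m-1).
    return np.vander(points, N=m, increasing=True).T

def vandermonde_product(points):
    # Product over pairs i < j of (x_j - x_i).
    return np.prod([xj - xi for xi, xj in itertools.combinations(points, 2)])

rng = np.random.default_rng(2)
pts = rng.uniform(-1.0, 1.0, size=4)
det = np.linalg.det(zeta_matrix(pts, m=4))
print(det, vandermonde_product(pts))  # equal up to rounding

# A repeated point lies on a hyperplane x_i = x_j, where the determinant vanishes.
det_repeated = np.linalg.det(zeta_matrix(np.array([0.3, 0.3, -0.5, 0.9]), m=4))
print(det_repeated)
```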
[illegible] In Section 3, we consider a special [illegible] the unique minimizer of (5) [illegible] a system of linear equations [illegible] $(n+m) \times (n+m)$ [illegible] case, $m = 2$ and [illegible].
It is shown there that if $\lambda \to \infty$, $\psi_{n\lambda}$ approaches
$$\psi_{n\infty}(x) = \frac{x - \bar X}{S^2}, \tag{6}$$
where $\bar X$ and $S$ are the sample mean and standard deviation. This is of course the maximum likelihood estimate of $\psi_0$ under the $N(\mu, \sigma^2)$ model.
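To see where (6) comes from, note that for $L = d^2/dx^2$ the penalty vanishes exactly on the linear functions $\alpha + \beta x$, so as $\lambda \to \infty$ only these remain competitive. Minimizing $n^{-1}\sum_i (\alpha + \beta X_i)^2 - 2\beta$ gives $\alpha = -\beta\bar X$ and $\beta = 1/S^2$. The Python sketch below (our illustration; it is not the report's proof, which goes through the linear system of Section 3) solves this two-parameter problem numerically and compares it with the closed form.

```python
import numpy as np

def nullspace_fit(x):
    # Minimize mean((a + b X_i)^2) - 2 b over (a, b): criterion (5) restricted
    # to N(L) for L = d^2/dx^2, where the roughness penalty is zero.
    T = np.column_stack([np.ones_like(x), x])   # columns: zeta_1 = 1, zeta_2 = x
    zeta_prime_mean = np.array([0.0, 1.0])      # mean of (zeta_1', zeta_2') over the data
    # Setting the gradient to zero gives (T'T / n) (a, b)' = zeta_prime_mean.
    return np.linalg.solve(T.T @ T / x.size, zeta_prime_mean)

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=200)
a, b = nullspace_fit(x)
xbar, s2 = x.mean(), x.var()   # x.var() uses divisor n, matching S^2
print(a, b)
print(-xbar / s2, 1 / s2)      # closed form (6): psi(x) = (x - xbar) / S^2
```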
At this point, we should compare and contrast our estimate with penalized maximum likelihood estimates of $f_0$ (or $\psi_0$), as introduced by Good and Gaskins (1971). A variant which is closer to our approach is the one [illegible] who [illegible]
$$[\text{illegible}]$$
In [illegible] regularization problem [illegible] in that it [illegible] a set of nonlinear equations. We feel that [illegible] for reasons of numerical [illegible].
[illegible] as $n \to \infty$ [illegible] requirement for an [illegible] consistent [illegible].
ASSUMPTION C. The smoothing parameters $\lambda_n$ lie in $(0,\infty)$ and satisfy
$$\lim_{n\to\infty} \lambda_n = \lim_{n\to\infty} n^{-1/2}\lambda_n^{-5/4m} = 0.$$
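THEOREM 5. Suppose [illegible] A2 [illegible] $0 \le r < 2m - 5/2$ [illegible] $\ge 1$ and [illegible] all [illegible] $> 0$ [illegible] $r < q <$ [illegible] $+ r$. [illegible] all $n \ge 1$ [illegible] $> 2$ [illegible].

The proof is given in Section 4.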
The main reason for restricting ourselves to the periodic case (Assumption A2) is that we can assert the existence of constants $c_1$ and $c_2$ for which
$$0 < c_1 \le f_0(x) \le c_2 < \infty, \quad x \in [a,b]. \tag{7}$$
We conjecture that a modified version of Theorem 5 holds under A1, but it is then necessary to deal with eigenvalues of singular differential operators (see [illegible]). An interesting asymptotic representation of $\psi_{n\lambda}$ comes out of the proof of Theorem 5, namely [illegible], where $G(x,y;\lambda)$ is a Green's function for a boundary value problem [illegible].
[illegible]
$$\|\psi_{n\lambda_n} - \psi_0\|_r = O_p\left(n^{-(q-r)/(2q+3)}\right).$$
Recalling that $\psi_0$ already involves one derivative, we see that this rate of convergence in probability is probably the best possible.
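The exponent in this rate can be traced to a bias-variance balance. Assuming, as our reading of the partly illegible bounds in Section 4, a deterministic term of order $\lambda^{(q-r)/2m}$ and a stochastic term of order $n^{-1/2}\lambda^{-(2r+3)/4m}$, equating the two determines $\lambda_n$ and the rate; the last line checks Assumption C for this choice:

```latex
\begin{aligned}
\lambda^{(q-r)/2m} = n^{-1/2}\lambda^{-(2r+3)/4m}
  &\iff \lambda^{(2q+3)/4m} = n^{-1/2}
  \iff \lambda_n = n^{-2m/(2q+3)}, \\
\lambda_n^{(q-r)/2m} &= n^{-1/2}\lambda_n^{-(2r+3)/4m} = n^{-(q-r)/(2q+3)}, \\
n^{-1/2}\lambda_n^{-5/4m} &= n^{-1/2 + 5/(4q+6)} = n^{-(q-1)/(2q+3)}.
\end{aligned}
```

So this choice of $\lambda_n$ tends to 0 and satisfies Assumption C precisely when $q > 1$.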
This is certainly the best rate of convergence in probability implied by the results in Wahba (1975), Bretagnolle and Huber (1979), and [illegible]. Of course, [illegible] roughness [illegible] a nonparametric [illegible] the method to cases where $\psi_0$ satisfies certain constraints, e.g. $\psi_0(t_0) = 0$.
[illegible] values of [illegible] from a [illegible] sample $X_1, \ldots, X_n$. An admissible variation [illegible] $[a_0,b_0]$, under A2 is a function $\phi \in H^m[a,b]$ [illegible]; under A1 it is a function $\phi$ [illegible] if $a = -\infty$, and similarly for $b$.

THEOREM 2. If $n \ge m$, there is a unique minimizer of (5) w.p.1 (with probability 1), and it is the unique function $\psi$ satisfying
$$\lambda \int_{a_0}^{b_0} (L\psi)(x)(L\phi)(x)\,dx + \frac{1}{n}\sum_{k=1}^n \psi(X_k)\phi(X_k) = \frac{1}{n}\sum_{k=1}^n \phi'(X_k)$$
for all admissible variations $\phi$.
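PROOF. The first and second variations of the objective function are
$$2\lambda\int_{a_0}^{b_0}(L\psi)(x)(L\phi)(x)\,dx + 2\int \psi\phi\,dF_n - 2\int \phi'\,dF_n$$
and
$$2\lambda\int_{a_0}^{b_0}[(L\phi)(x)]^2\,dx + 2\int \phi^2\,dF_n.$$
The second variation vanishes only if $L\phi = 0$ on $[a_0,b_0]$, i.e. $\phi \in N(L)$, and $\phi(X_k) = 0$ for all $k$; since $n \ge m$, Assumption B implies that w.p.1 this forces $\phi = 0$. [illegible] Hence the objective is strictly convex, and its unique minimizer is characterized by the vanishing of the first variation. [illegible]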
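In order to [illegible], we [illegible] the [illegible] given in [illegible]. $L$ is a [illegible] on $[a_0,b_0]$ [illegible], $G_1(x,y)$ satisfies [illegible], and [illegible]. Put
[illegible], $0 \le j < m - 1$, [illegible], $j = m - 1$,
$$G(x,y) = \begin{cases} 0 & \text{if } x < y, \\ G_1(x,y) & \text{if } x \ge y, \end{cases}$$
$$k_2(x,y) = \int G(x,z)\,G(y,z)\,dz.$$
[illegible] inner product [illegible] $N(L)$ [illegible], where $\delta_{ij}$ is Kronecker's delta and $\zeta_1, \ldots, \zeta_m$ is the given basis of $N(L)$. Now put [illegible]; then $H^m$ is an rkhs with kernel
$$k = k_1 + k_2,$$
i.e., for each fixed $y \in [a_0,b_0]$, $k(\cdot,y) \in H^m$, and for all $\phi \in H^m$, $\langle \phi, k(\cdot,y)\rangle = \phi(y)$, where $\langle\cdot,\cdot\rangle$ is the inner product [illegible]. Furthermore, since $m \ge 2$, $\phi \mapsto \phi'(y)$ is a continuous linear functional on $H^m$, and so for all $\phi$
$$\phi'(y) = \langle \phi, k^{01}(\cdot,y)\rangle, \qquad k^{01}(x,y) = \partial k(x,y)/\partial y.$$
[illegible] admissible variation $\phi$, so the solution satisfies
$$[\text{illegible}] + \int \psi(x)\,k(\cdot,x)\,dF_n(x) = \int k^{01}(\cdot,x)\,dF_n(x). \tag{8}$$
[illegible] $M$ [illegible] unique [illegible] (a strictly positive definite quadratic form plus a linear form), which is the unique $\psi_M \in M$ satisfying [illegible], where [illegible]. $\psi_M \in M$ implies the l.h.s. of (8) is also in $M$ and so equals 0. This shows [illegible] the solution [illegible] is in fact in $M$.
Q.E.D.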
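[illegible] Theorem 2, it is [illegible] linear equations [illegible] can in fact be reduced to an $(n+m) \times (n+m)$ [illegible].

THEOREM 3. [illegible] minimizer of (5). [illegible] $c = (c_1, \ldots, c_n)'$ and $d = (d_1, \ldots, d_m)'$ can be [illegible] the following linear system:
$$[\text{illegible}]$$
where $I$ is an $n \times n$ identity, and
$$y_i \ [\text{illegible}]\ n^{-1}\sum_{j=1}^n k_2^{01}(X_i, X_j), \qquad [\text{illegible}] = n^{-1}\sum_{i=1}^n [\text{illegible}](X_i),$$
[illegible] ($1 \le i \le n$, $1 \le j \le n$), [illegible] ($1 \le i \le n$, $1 \le \ell \le m$), [illegible] ($1 \le i \le n$), [illegible] ($1 \le \ell \le m$).
[illegible] Q.E.D.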
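[illegible] properties are easily [illegible]. Let $Y_1 < \cdots < Y_n$ be the order statistics [illegible].
(a) $$(L\psi_{n\lambda})(x) = 0, \quad x > Y_n \text{ or } x < Y_1. \tag{9}$$
This follows since if (9) did not hold for $x > Y_n$, we could strictly make the objective smaller by replacing $\psi_{n\lambda}$ on $x > Y_n$ by the solution $\phi$ of the initial value problem
$$L\phi = 0, \quad \phi^{(j)}(Y_n) = \psi_{n\lambda}^{(j)}(Y_n), \ 0 \le j < m,$$
and similarly for $x < Y_1$.
(b) Let $L^*$ denote the formal adjoint of $L$; then if $1 \le j < n$,
$$(L^*L\psi_{n\lambda})(x) = 0, \quad Y_j < x < Y_{j+1}. \tag{10}$$
This follows since [illegible] $M$ [illegible] Theorem 2, and [illegible] $M$ satisfies the equation.
(c) $\psi_{n\lambda} \in C^{2m-3}$ and [illegible] at each of $Y_1, \ldots, Y_n$. [illegible]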
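In the simplest case of [illegible],
$$L = \frac{d^2}{dx^2}, \tag{11}$$
we take [illegible] $= x$, and
$$[\text{illegible}](x,y) = \begin{cases} \tfrac{1}{6}(x-y)^3 & \text{if } x \ge y \ge 0, \\ -\tfrac{1}{6}(x-y)^3 & \text{if } x \le y < 0, \\ 0 & \text{otherwise.} \end{cases}$$
This [illegible] is a [illegible] version of [illegible] $= 0$ [illegible]. If it is a [illegible] version [illegible] normal [illegible] Silverman [illegible] $-f'/f$ [illegible] at least [illegible] some [illegible] values of $\lambda$.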
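THEOREM 4. Let $n \ge 2$. Under Assumptions A1 and B, if $L = d^2/dx^2$, then for all $x$
$$\lim_{\lambda\to\infty} \psi_{n\lambda}(x) = (x - \bar X)/S^2,$$
where
$$\bar X = n^{-1}\sum_{i=1}^n X_i, \qquad S^2 = n^{-1}\sum_{i=1}^n (X_i - \bar X)^2.$$
PROOF. It is easy to verify [illegible] linear [illegible] which minimizes [illegible]
$$[\text{illegible}]\; y - T(T'T)^{-1}z \ [\text{illegible}] \tag{12}$$
[illegible] is the projection onto the range of $T$ (in $\mathbb{R}^n$). As $\lambda \to \infty$, the r.h.s. of (12) remains bounded, which implies [illegible] $\to 0$. Hence [illegible] as $\lambda \to \infty$. Since [illegible] $(T'T)^{-1}$ [illegible] and $z' = (0, n^{-1})$, [illegible].
Q.E.D.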
[illegible] use the [illegible] same [illegible].

LEMMA 1. [illegible] $= \int u^2\,d[\text{illegible}]$. Let
$$[\text{illegible}] \le \gamma_1(\lambda) \le \gamma_2(\lambda) \le \cdots$$
be the eigenvalues of the Rayleigh quotient [illegible], each repeated according to its multiplicity, and let $\{\phi_\nu : \nu = 1, 2, \ldots\}$ denote the associated eigenfunctions, which satisfy
$$B(\phi_\mu, \phi_\nu) = [\text{illegible}]. \tag{13}$$
[illegible] for constants [illegible] $< \infty$.

PROOF. [illegible] (7) implies that the norm $(\cdot)^{1/2}$ [illegible] is equivalent [illegible]. The injection map $H^m[a,b] \to L^2[a,b]$ is a compact operator by Sobolev's embedding theorem (see Adams, 1975), whence (7) implies that the injection $H^m[a,b] \to L^2(F_0)$ is compact. Hence, each positive eigenvalue $\gamma_\nu(\lambda)$ has finite multiplicity, and the only point of accumulation of [illegible] (p. 52 of Weinberger, 1974). Since the injection is one to one and $B(u,u)$ [illegible] $\lambda A(u,u)$ for all $u$, it is clear [illegible] $> 0$ [illegible] normalizing [illegible] to assure that (13) is satisfied. [illegible] To verify [illegible], consider the forms
$$c_i \int_a^b u(x)^2\,dx$$
for $i = 1, 2$, where $c_1$ and $c_2$ are given in (7). Then we have for all $u \in H^m$ that [illegible], so by the [illegible] principle (p. 57 of Weinberger, 1974), [illegible], $\nu = 1, 2, \ldots$, for all $\lambda > 0$. [illegible] eigenvalues [illegible] for all $\lambda > 0$ [illegible]. The eigenvalues of [illegible] boundary [illegible] are easily seen to [illegible] periodic [illegible] derivatives [illegible] regularity of [illegible] (1967) and [illegible] pp. 64-65 of [illegible] since [illegible] exist [illegible] p. 62 of Na[illegible] [illegible] positive constants. Hence, the $\gamma_\nu$ also satisfy such bounds.
Q.E.D.
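For intuition about the eigenvalue growth in Lemma 1, consider the simplest periodic setting (our illustrative assumption, not the report's general case): $[a,b] = [0,1]$, $F_0$ uniform and $L = d^m/dx^m$. The Rayleigh quotient $\int (u^{(m)})^2\,dx / \int u^2\,dx$ then has eigenfunctions $1$, $\cos 2\pi k x$, $\sin 2\pi k x$ with eigenvalues $0$ and $(2\pi k)^{2m}$, each of the latter with multiplicity two, so the $\nu$th eigenvalue grows like $\nu^{2m}$ and accumulates only at $\infty$. The Python sketch compares a periodic finite-difference approximation for $m = 2$ with these values.

```python
import numpy as np

N, m = 400, 2
h = 1.0 / N
I = np.eye(N)
# Periodic (circulant) second-difference matrix approximating d^2/dx^2 on [0, 1).
D2 = (np.roll(I, 1, axis=1) - 2 * I + np.roll(I, -1, axis=1)) / h ** 2
# Discrete version of the quotient for L = d^m/dx^m: the symmetric matrix (D2)^m.
K = np.linalg.matrix_power(D2, m)
eig = np.sort(np.linalg.eigvalsh(K))
# Continuous eigenvalues: 0 once, then (2 pi k)^(2m) twice for k = 1, 2, ...
k = np.arange(1, 6)
print(eig[1:11:2])              # one representative of each double eigenvalue
print((2 * np.pi * k) ** (2 * m))
```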
LEMMA 2. Define
$$G(x,y;\lambda) = \sum_\nu [\text{illegible}],$$
where [illegible] are as in Lemma 1. For a continuous function $u$ on $[a,b]$, [illegible] $= \int G$ [illegible].
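Lemma 2 builds $G(x,y;\lambda)$ from the eigenpairs of Lemma 1 as a limit of truncated sums $G_N$. A finite-dimensional analogue is sketched below in Python, under two assumptions of ours: that the weights in the expansion have the form $(1 + \lambda\rho_\nu)^{-1}$, where $\rho_\nu$ are generalized eigenvalues of a roughness form $R$ relative to a Gram form $W$ standing in for $L^2(F_0)$ (the coefficients in the source are illegible), and that random matrices may stand in for the operators. With eigenvectors normalized so that $\Phi' W \Phi = I$, the expansion reproduces $(W + \lambda R)^{-1}$ exactly.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
N = 6
# Hypothetical stand-ins: W is symmetric positive definite (a Gram form),
# R is symmetric positive semidefinite with a null space (a roughness form).
M = rng.standard_normal((N, N))
W = M @ M.T + N * np.eye(N)
C = rng.standard_normal((N, N - 2))
R = C @ C.T                       # rank N - 2

lam = 0.3
rho, phi = eigh(R, W)             # R phi = rho W phi, normalized so phi.T @ W @ phi = I

# Eigen-expansion sum_nu (1 + lam rho_nu)^(-1) phi_nu phi_nu' versus a direct inverse.
G_series = (phi / (1.0 + lam * rho)) @ phi.T
G_direct = np.linalg.inv(W + lam * R)
print(np.max(np.abs(G_series - G_direct)))  # agreement up to rounding
```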
PROOF. Step 1. We show $G(x,\cdot\,;\lambda) \in H^m$ for each fixed $x \in [a,b]$. Putting
$$G_N(x,y;\lambda) = \sum_{\nu=1}^N [\text{illegible}]\,\phi_\nu(x)\phi_\nu(y),$$
we have for each fixed $x$ [illegible] $\phi_\nu(x)^2$, and it is only necessary to show the series remains bounded as $N \to \infty$, since [illegible] is equivalent to the $H^m$ norm. By Sobolev's inequality (page 32 of Agmon, 1967), we have for any $r \ge 1$ and all $x$ [illegible], where $K$ is a constant depending only on $[a,b]$ and $m$. Taking $r$ [illegible], and using the eigenvalue bounds of Lemma 1, the fact that [illegible], (7), and the norm equivalence, we obtain [illegible] $\le K'$ [illegible], $\nu = 1, 2, \ldots$. Since $m \ge 2$, [illegible] $< \infty$, and
$$\langle G(x,\cdot\,;\lambda), [\text{illegible}]\rangle = [\text{illegible}]. \tag{15}$$
Step 2. We now verify (14). [illegible]
$$[\text{illegible}] = \int [\text{illegible}]\,dF_n + [\text{illegible}].$$
Substituting $\phi = G(x,\cdot\,;\lambda)$ into this equation and using (15) yields (14).
Step 3. We claim that if $p + q < 2m - 1/2$, then for all $\lambda > 0$,
$$\int_a^b \int_a^b [G_{pq}(x,y;\lambda)]^2\,dx\,dy \le K\lambda^{-(2p+2q+1)/2m}. \tag{16}$$
[illegible] $H^0 = L^2$ [illegible]. First of all, for $p = 0, 1, 2, \ldots$, an application of [illegible] Sobolev's [illegible], and by an interpolation argument (see Proposition 3.1 of Cox, 1982) [illegible] for $k = 0, 1, 2, \ldots$. Using standard interpolation theorems (see Triebel, Theorem 1.15.2(f)), it follows that for any real $r > 0$ and any $p \in \{0, 1, 2, \ldots\}$, [illegible] is a norm equivalent to [illegible]. [illegible]
$$[\text{illegible}] \le K\lambda^{-(2p+2q+1)/2m}\int_0^\infty [\text{illegible}]\,dx.$$
The restriction $p + q < 2m - 1/2$ is so that the integral in the last line is finite. [illegible] (see Theorem 4.2.5 of Triebel, 1978), we have that for all $r < 2m - 1/2$
$$\|G(\cdot,\cdot\,;\lambda)\|^2_{H^r([a,b]\times[a,b])} \le K\lambda^{-(2r+1)/2m}.$$
This proves (16). [illegible]
Q.E.D.
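The exponent in (16) and the restriction $p + q < 2m - 1/2$ match what the substitution $t = \lambda^{1/2m}\nu$ gives for a model series $\sum_\nu \nu^{2s}(1 + \lambda\nu^{2m})^{-2}$ with $s = p + q$, which is our assumed stand-in (eigenvalues of order $\nu^{2m}$) for the partly illegible computation in Step 3. Scaled by $\lambda^{(2s+1)/2m}$, the series tends to $\int_0^\infty t^{2s}(1 + t^{2m})^{-2}\,dt = \Gamma(a)\Gamma(2-a)/(2m)$ with $a = (2s+1)/2m$, and this integral is finite exactly when $s < 2m - 1/2$. The Python sketch checks the limit numerically; the truncation length is arbitrary.

```python
import math
import numpy as np

def scaled_series(lam, s, m, n_terms=200_000):
    # lam^((2s+1)/2m) * sum_{nu >= 1} nu^(2s) / (1 + lam nu^(2m))^2
    nu = np.arange(1, n_terms + 1, dtype=float)
    terms = nu ** (2 * s) / (1.0 + lam * nu ** (2 * m)) ** 2
    return lam ** ((2 * s + 1) / (2 * m)) * terms.sum()

def limit_integral(s, m):
    # int_0^inf t^(2s) / (1 + t^(2m))^2 dt = Gamma(a) Gamma(2 - a) / (2m), a = (2s+1)/(2m);
    # the integral diverges when a >= 2, i.e. when s >= 2m - 1/2.
    a = (2 * s + 1) / (2 * m)
    if a >= 2:
        return math.inf
    return math.gamma(a) * math.gamma(2 - a) / (2 * m)

for s, lam in [(1, 1e-8), (0, 1e-12)]:
    print(s, scaled_series(lam, s, m=2), limit_integral(s, m=2))
```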
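LEMMA 3. [illegible]
Under the hypotheses of Theorem 5, we have
$$[\text{illegible}] \tag{17}$$
and
$$\|B_\lambda\|_r \le K\|\psi_0\|_q\,\lambda^{(q-r)/2m}.$$
PROOF. Using (1), we obtain [illegible] $\le K n^{-1}$ [illegible] $\lambda^{-(2r+3)/2m}$ [illegible] $\int [\text{illegible}](y)\,dF_0(y)$ [illegible]. By the norm equivalences in [illegible] 3 of [illegible], [illegible] $\le K$ [illegible]. This proves (17). [illegible] $E[\ldots]$ [illegible], valid for all [illegible]. According to (16), the integral [illegible] line is bounded above by [illegible] for all [illegible] $[a,b]$ and all nonnegative integers $r < 2m - 3/2$. [illegible] with respect to $dx$ of this last [illegible], which proves [illegible].
Q.E.D.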
[illegible] $> 0$ [illegible]
$$[\text{illegible}] \tag{19}$$
so by Assumption C
$$[\text{illegible}] \tag{20}$$
[illegible] Assumption C is an [illegible] probability $\ge 1 - \epsilon/5$ on which [illegible] for all $\lambda \in I_n$. Furthermore, by (18) [illegible] there is an event of probability $\ge 1 - \epsilon/5$ on which, [illegible],
$$\|V_{n\lambda}\|_1 \le K n^{-1/2}\lambda^{-5/4m} \ [\text{illegible}]. \tag{21}$$
Invoking (20), (21), and (17), we have on an event of probability $\ge 1 - 2\epsilon/5$ that [illegible]. Since the last quantity in braces tends to 0 as $n \to \infty$ by Assumption C, and since the second quantity in braces remains bounded (as $q \ge 1$), we have that for all $n$ sufficiently large there is an event of probability $\ge 1 - $ [illegible] and a constant [illegible] each [illegible] r.h.s. of [illegible] in [illegible] 3, [illegible] $\le K\lambda^{(q-r)/2m}$ [illegible] event of probability $\ge 1 - $ [illegible] on which
$$\|V_{n\lambda}\|_r \le K' n^{-1/2}\lambda_n^{-(2r+3)/4m}. \tag{25}$$
A simple calculation shows that [illegible] on an event with probability $\ge 1 - \epsilon/5$. [illegible] Lemma 2, (19), [illegible]. Combining (24) through [illegible] and putting them in (23), we have that there is a $K'$ such that for all $n$ sufficiently large there is an event of probability $\ge 1 - \epsilon$ on which
$$[\text{illegible}] \le K\lambda_n^{(q-r)/2m} + [\text{illegible}]\, n^{-1/2}\lambda_n^{-(2r+3)/4m}.$$
Q.E.D.
REFERENCES
Adams, R. A. (1975), [illegible], New York.
Agmon, S. (1967), [illegible], New York.
Beran, R. (1976), "Adaptive estimates for autoregressive processes," [illegible].
[illegible] minimax," [illegible].
Cox, D. (1982), "Convergence rates for multivariate smoothing spline functions," submitted to SIAM J. Num. Anal.
Dacunha-Castelle, D. (1978), "Vitesse de convergence pour certains problèmes statistiques," in École d'Été de Prob. de St.-Flour VII-1977, Lecture Notes in Math. 678, [illegible].
Good, I. J. and Gaskins, R. A. (1971), "Nonparametric roughness penalties for probability densities," Biometrika 58, 255-277.
[illegible], G. (1970), "Spline functions and [illegible]," [illegible].
[illegible]
Weinberger, H. (1974), [illegible].