ON UNBIASED ESTIMATION OF DENSITY FUNCTIONS
by
Allan Henry Seheult
Institute of Statistics, Mimeograph Series No. 649
Ph.D. Thesis - January 1970
TABLE OF CONTENTS

LIST OF FIGURES

1. INTRODUCTION

2. REVIEW OF THE LITERATURE

3. GENERAL THEORY
   Introduction
   The Model
   The General Problem and Definitions
   Existence Theorems

4. APPLICATIONS
   Introduction
   Methods
   Discrete Distributions
   Absolutely Continuous Distributions

5. SUMMARY

LIST OF REFERENCES
LIST OF FIGURES

4.1 The U.M.V.U.E. of (x/θ)I(0,θ)
1. INTRODUCTION
The problem of estimating probability density functions is of
fundamental importance to statistical theory and its applications.
Parzen (1962) states: "The problem of estimation of a probability
density function f(x) is interesting for many reasons. As one possible
application, we mention the problem of estimating the hazard, or
conditional rate of failure, function f(x)/(1-F(x)), where F(x) is the
distribution function corresponding to f(x)." Hájek and Šidák (1967), in
the introduction to chapter one of their book, make the statement: "We
have tried to organize the multitude of rank tests into a compact system.
However, we need to have some knowledge of the form of the unknown density
in order to make a rational selection from this system. The fact that no
workable procedure for the estimation of the form of the density is yet
available is a serious gap in the theory of rank tests." It is implicitly
understood in both of the statements cited above that the unknown
density function corresponds to a member of the class of all probability
distributions on the real line that are absolutely continuous with
respect to Lebesgue measure. Moreover, the obvious further implication
is that the problem falls within the domain of non-parametric statistics.
It is not surprising, therefore, that most of the research effort on this
problem in the last fifteen years or so has been directed towards
developing estimation procedures that are to some extent non-parametric in
character. However, like many non-parametric procedures, those that have
been developed for estimating density functions are somewhat heuristic.
Any reasonable heuristic procedure would be expected to have good
large-sample properties, and this is about all that could be hoped for in this
particular non-parametric setting. The problem is that any attempt to
search for estimators which in some sense optimize a predetermined
criterion invariably leads one to the untenable situation of considering
estimators that are functions of the density function to be estimated.
Hence, within this non-parametric framework, the effort, for example, of
Hájek and Šidák "... to organize the multitude of rank tests into a
compact system ..." is, indeed, a difficult task. The work in this
thesis is partly an attempt to resolve some of the difficulties alluded
to above. The approach adopted is classical in that attention is
restricted to unbiased estimators of density functions. However, the
question of the existence of unbiased estimators is of fundamental
importance and is a primary consideration of the research herein.
The general setting is that of a family of probability measures P,
dominated by a σ-finite measure μ on a measurable space (X, A), with D
the family of densities (with respect to μ) that correspond to P. It
is assumed that there are available n independent observations
X^(n) = (X_1, X_2, ..., X_n) on a random variable X with unknown distribution
P, a member of P, and density p ∈ D. The problem is, therefore, to find an
estimator p̂(x) = p̂(x; X^(n)) such that for each x ∈ X,

E_P[p̂(x)] = p(x), for all P ∈ P.  (1.1)
The symbol E_P denotes mathematical expectation under the assumption that
P is the true distribution of X. Thus, with the exception of one simple
but important difference, the problem resembles the usual parametric
estimation problem. In the usual context an estimator is required of a
function φ(P) of the distribution P. Here, the quantity to be estimated
depends on x as well as P. Thus a certain global notion has been introduced
which is not usually present within the classical framework of point
estimation; one might use the phrase 'uniformly unbiased' for estimators p̂
satisfying (1.1).
At this point it is pertinent to make a few remarks about the classical
estimation problem. Very often the statistician claims he wants to
estimate a function φ(P) using the sample x^(n). However, it is not
always clear for what purpose the estimate will be used. If all that is
required is to have an estimate of φ then the classical procedures,
within their own limitations, are adequate for this purpose. It is often
the case, however, that P (and hence p) is an explicit function of φ
alone; that is, for each A ∈ A,

P(A) = P(A; φ), for all P ∈ P.  (1.2)
For example, if P is the family of normal distributions on the real line
with unit variance and unknown mean μ, it is common to identify an
unknown member of this family by the symbol P_μ, and a typical function
φ is

φ(P_μ) = μ, for all μ ∈ (-∞, ∞).  (1.3)

Once an estimate φ̂ of φ has been obtained, an estimate P̂ of P is arrived
at by the 'method of substitution'; that is, P̂ is given by

P̂(A) = P(A; φ̂), for all A ∈ A.  (1.4)
This procedure implies that the ultimate objective is to estimate P (or p)
rather than φ, and, therefore, desirable criteria should refer to the
estimation of P (or p) rather than φ. This substitution method is
perfectly valid if maximum likelihood estimators are required, provided that
very simple regularity conditions are satisfied. In the above example
the mean X̄ is the unique minimum variance unbiased estimator of μ, but it
is not the case that P_X̄ and p_X̄ are the unique minimum variance unbiased
estimators of P_μ and p_μ. It is shown in chapter four that the unique
minimum variance unbiased estimator P̂ of P_μ is a normal distribution with
mean X̄ but with variance (n-1)/n. This should be compared with the maximum
likelihood estimator P_X̄.

It transpires that in the search for estimators satisfying (1.1) it
is necessary to consider unbiased estimators P̂(A) = P̂(A; X^(n)) of the
corresponding probability measure P; that is, P̂ is an unbiased estimator
of P if for each A ∈ A,

E_P[P̂(A)] = P(A), for all P ∈ P.  (1.5)
It is easy to see that if p̂ is an unbiased estimator of p then ∫_A p̂ dμ is
an unbiased estimator of P(A), but in general the converse is not true. In
chapter three of this thesis conditions for the converse to hold are
established. It turns out that if an unbiased estimator p̂ of p exists,
then it can be found immediately. The existence or non-existence of the
unbiased estimator p̂ depends on whether or not there exists an unbiased
estimator P̂ of the corresponding probability measure that is absolutely
continuous with respect to the original dominating measure μ.
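The chapter-four claim quoted above, that the unique minimum variance unbiased estimator of the N(μ, 1) measure is normal with mean X̄ and variance (n-1)/n, lends itself to a numerical check of unbiasedness. The following is a modern Python sketch, not part of the original thesis; the helper names and the Monte Carlo settings are illustrative choices. Averaging the estimated density at a fixed point over many samples should reproduce the true N(μ, 1) density there.

```python
import math, random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def umvue_density(x, sample):
    # The chapter-four form: a normal density with mean x-bar and
    # variance (n-1)/n.
    n = len(sample)
    xbar = sum(sample) / n
    return normal_pdf(x, xbar, (n - 1) / n)

random.seed(0)
mu, n, reps, x0 = 0.3, 5, 200000, 1.0
est = sum(umvue_density(x0, [random.gauss(mu, 1.0) for _ in range(n)])
          for _ in range(reps)) / reps
true = normal_pdf(x0, mu, 1.0)
# Unbiasedness: est should be close to true; the maximum likelihood form
# normal_pdf(x0, xbar, 1.0) would instead average to a N(mu, 1 + 1/n) density.
```

The shrunken variance (n-1)/n exactly compensates for the sampling variance 1/n of X̄, since the convolution N(μ, 1/n) * N(0, (n-1)/n) is N(μ, 1).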
If a μ-continuous P̂ exists, an unbiased estimator of p is given by the
Radon-Nikodym derivative, dP̂/dμ. Thus if P is a sub-family of Lebesgue-
continuous distributions on the real line and an unbiased density
estimator exists, it is obtained by differentiation of the distribution
function estimator corresponding to P̂.
Chapter three also includes a more precise description of the
general theoretical framework and analogous theorems on the existence
of unique minimum variance unbiased estimators of density functions.
The interesting result here is that if P admits a complete and suffi-
cient statistic then a unique minimum variance unbiased estimator P̂ of
P exists. Moreover, if this P̂ is absolutely continuous with respect to
μ then dP̂/dμ is the unique minimum variance unbiased estimator of p.
The application of the general theory developed in chapter three
to particular families of distributions on the real line forms the main
content of chapter four. The dominating measure μ is taken to be either
ordinary Lebesgue measure or counting measure on some sequence of real
numbers. In particular, the result of Rosenblatt (1956), that there
exists no unbiased estimator of p if P is the family of all Lebesgue-
continuous distributions on the real line, is re-established as a simple
corollary to the theorems of chapter three. On the other hand, it is
also true that the theorems of chapter three are generalizations of
Rosenblatt's result.
Finally, the thesis is concluded with a summary and suggestions
for further research.
2. REVIEW OF THE LITERATURE
Most of the research on density estimation that appears in the
literature is concerned with the following problem. Let
X^(n) = (X_1, X_2, ..., X_n) be n independent observations on a real-valued
random variable X. The only knowledge of the distribution of X is that
its distribution function F is absolutely continuous. F and, hence, the
corresponding probability density function f are otherwise assumed to be
completely unknown. By making use of the observation x^(n) the objective
is to develop a procedure for estimating f(x) at each point x. For any
such estimation procedure let f̂_n(x) = f̂_n(x; X^(n)) denote the estimator of
f at the point x. Some researchers have also considered the case where
X takes on values in higher dimensional Euclidean spaces. However, for
the purpose of this review, no separate notation will be developed for
these multivariate extensions.
At the outset it should be remarked that, within the context outlined
above, a result of Rosenblatt (1956) has a direct bearing on the
work of this thesis and the relevance of certain sections of the existing
literature on the subject of density estimation. He showed that there
exists no unbiased estimator of f. As a result much of the emphasis has
been on finding estimators with good large-sample properties, such as
consistency and asymptotic normality. Since these methods and procedures
do not have a direct bearing on the present work, only a brief review of
the literature on them will be presented here.

Very basic approaches have been adopted by Fix and Hodges (1951),
Loftsgaarden and Quesenberry (1965), and Elkins (1968). In connection
with non-parametric discrimination, Fix and Hodges estimate a k-dimen-
sional density f by counting the number N of sample points in k-dimen-
sional Borel sets Δ_n. They showed that if f is continuous at x,

lim_{n→∞} sup_{d ∈ Δ_n} |x-d| = 0, and lim_{n→∞} n λ(Δ_n) = ∞,

then N/(n λ(Δ_n)) is a consistent estimator of f(x). Here, λ is
k-dimensional Lebesgue measure and ρ(x,y) = |x-y| is the usual Euclidean
metric. Loftsgaarden and Quesenberry obtain the same consistency result
with the Δ_n hyperspheres about x, by fixing the number of points and then
finding the radius of the Δ_n which contains this number of points. In the
two-dimensional case Elkins compares the effects of choosing the Δ_n to be
spheres or squares (centered about x) in the Fix and Hodges type of
estimator; the criterion of comparison being mean integrated square error.
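A counting estimator of the Fix and Hodges type is easy to sketch in one dimension. The following is a modern Python illustration, not taken from the thesis; the function name, the fixed radius, and the standard-normal test case are all illustrative choices.

```python
import math, random

def ball_density_estimate(x, sample, radius):
    # Fix-and-Hodges-type count estimator in one dimension: N is the number
    # of sample points in the interval Delta = [x - radius, x + radius], and
    # the estimate is N / (n * lambda(Delta)), with lambda Lebesgue measure.
    n = len(sample)
    count = sum(1 for xi in sample if abs(xi - x) <= radius)
    return count / (n * 2 * radius)

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(100000)]
est = ball_density_estimate(0.0, sample, 0.05)
# For a standard normal sample, est should be near 1/sqrt(2*pi) ≈ 0.399.
```

Consistency requires the interval to shrink (radius → 0) while the expected count grows (n · radius → ∞), matching the two limiting conditions above.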
Apart from the simple estimators discussed above, two other essen-
tially different approaches have been adopted in this completely non-para-
metric setting; namely, weighting-function type estimators, and series-
expansion type estimators.
Kronmal and Tarter (1968), referring to papers on density esti-
mation by Rosenblatt (1956), Whittle (1958) and Parzen (1962), state:
"The density estimation problem is considered in these papers to be that
of finding the focusing function δ_n(·) such that, using the criterion of
Mean Integrated Square Error, M.I.S.E., f̂_n(x) = (1/n) Σ_{i=1}^{n} δ_n(x - X_i)
would be the best estimator of the density f." It was shown by Watson
and Leadbetter (1963) that the solution to this problem could be
obtained by inverting an expression involving the characteristic function
of the density f. Papers by Bartlett (1963), Murthy (1965, 1966),
Nadaraya (1965), Cacoullos (1966), and Woodroofe (1967) are all concerned
with the properties and extensions to higher dimensions of estimators of
the type described above. Anderson (1969) compares some of the above
estimators in terms of the M.I.S.E. criterion.
A different approach has been exploited by Čencov (1962), Schwartz
(1962) and Kronmal and Tarter (1968). The basic idea in the one-dimen-
sional case is to use a density estimator

f̂_n(x) = Σ_{j=1}^{q} â_{jn} φ_j(x),  (2.1)

where

â_{jn} = (1/n) Σ_{i=1}^{n} r(X_i) φ_j(X_i).  (2.2)

Here, q is an integer depending on n, and the φ_j form an orthonormal
system of functions with respect to a weight function r; that is,

∫_{-∞}^{∞} r(x) φ_i(x) φ_j(x) dx = δ_{ij},  (2.3)

where δ_{ij} is the Kronecker delta function. Čencov studied the general
case and its k-dimensional extension, Schwartz specialized to Hermite
functions with r(x) = 1, and Kronmal and Tarter specialized to trigono-
metric functions with r(x) = 1.
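A minimal sketch of the series estimator (2.1)-(2.2), in the Kronmal and Tarter spirit, can use the cosine system on [0,1] with r(x) = 1. This is a modern Python illustration; the particular basis, sample size, and Beta(2,2) test density are assumptions for the demonstration, not taken from the thesis.

```python
import math, random

def series_density_estimate(x, sample, q):
    # Orthonormal trigonometric (cosine) system on [0,1] with weight r(x) = 1:
    # phi_0(x) = 1 and phi_j(x) = sqrt(2)*cos(j*pi*x), j >= 1, so (2.3) holds.
    n = len(sample)
    estimate = 1.0  # the j = 0 coefficient of any density on [0,1] is 1
    for j in range(1, q + 1):
        a_j = sum(math.sqrt(2) * math.cos(j * math.pi * xi)
                  for xi in sample) / n          # coefficient estimate (2.2)
        estimate += a_j * math.sqrt(2) * math.cos(j * math.pi * x)  # sum (2.1)
    return estimate

random.seed(4)
sample = [random.betavariate(2, 2) for _ in range(50000)]
est = series_density_estimate(0.5, sample, 8)
# The Beta(2,2) density at 0.5 is 6 * 0.5 * 0.5 = 1.5; est should be close,
# up to truncation (q finite) and Monte Carlo error.
```

The truncation point q trades bias (too few terms) against variance (too many noisy coefficients), which is the role the M.I.S.E. criterion plays in the papers cited above.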
Brunk (1965), Robertson (1966), and Wegman (1968) restrict their
attention to estimating unimodal densities. The maximum likelihood
method is used and the solution can be represented as a conditional
expectation given a σ-lattice. Brunk (1965) discusses such conditional
expectations and their application to various optimization problems.
The research which turns out to be most relevant to the present
work is to be found in papers concerned with the estimation of a
distribution function F_θ at a point x on the real line. Here, θ is a
parameter which varies over some index set Θ, which is usually a Borel
subset of a Euclidean space. With one exception, all of the writers
of these papers consider particular families of distributions on the
real line that admit a complete and sufficient statistic T,
say. Their objective is then to find unique minimum variance unbiased
estimators of F_θ(x) for these particular families. Their approach is
the usual Rao-Blackwell-Lehmann-Scheffé method of conditioning a
simple estimator of F_θ(x) on the complete and sufficient statistic T.
In fact, the usual estimator is given by

F̂(x) = P[X_1 ≤ x | T].  (2.4)
Barton (1961) obtained the estimators for the binomial, the Poisson
and the normal distribution functions. Pugh (1963) obtained the esti-
mator for the one-parameter exponential distribution function. Laurent
(1963) and Tate (1959) have considered the gamma and two-parameter
exponential distribution functions. Folks et al. (1965) find the esti-
mator for the normal distribution function with known mean but unknown
variance. Basu (1964) has given a summary of all these methods. Sathe
and Varde (1969) consider the estimation of the so-called reliability
function, 1 - F_θ(x). Their method is to find a statistic which is
stochastically independent of the complete and sufficient statistic
and whose distribution can be easily obtained. The unique minimum
variance unbiased estimator is based on this distribution. For
example, if F_θ(x) is the normal distribution function with mean θ
and variance one, X_1 - X̄, say, is independent of the complete and
sufficient statistic X̄, and has a normal distribution with mean zero
and variance (n-1)/n. They give as their estimator of 1 - F_θ(t),

G((t - X̄)√(n/(n-1))),  (2.5)

where

G(w) = (2π)^{-1/2} ∫_w^∞ exp[-x²/2] dx.  (2.6)

They also apply their method to most of the distributions considered
by the previous authors mentioned above.
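The Sathe and Varde normal example lends itself to a quick Monte Carlo check of unbiasedness. The sketch below is a modern Python illustration, not from the thesis; it uses the equivalent Rao-Blackwell form P[X_1 ≤ x | X̄] = Φ((x - X̄)√(n/(n-1))) for the N(θ, 1) family, with helper names chosen here for clarity.

```python
import math, random

def phi(z):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def umvue_cdf(x, sample):
    # Rao-Blackwell form P[X_1 <= x | X-bar] for the N(theta, 1) family,
    # using the fact that X_1 - X-bar is N(0, (n-1)/n) independent of X-bar.
    n = len(sample)
    xbar = sum(sample) / n
    return phi((x - xbar) / math.sqrt((n - 1) / n))

random.seed(2)
theta, n, reps, x0 = 0.7, 4, 100000, 1.2
est = sum(umvue_cdf(x0, [random.gauss(theta, 1.0) for _ in range(n)])
          for _ in range(reps)) / reps
# Unbiasedness: est should be close to Phi(x0 - theta) = Phi(0.5).
```

Averaging over repeated samples recovers Φ(x0 - θ) because X̄ ~ N(θ, 1/n) and the scaled argument adds back exactly the variance removed by conditioning.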
As an example of more general functions of single location and
scale parameters, Tate considered the problem of estimating P_θ[X ∈ A];
that is, θ is the parameter in distributions given by densities

θ p(θx)   (scale parameter)  (2.7)

p(x - θ)   (location parameter),  (2.8)

where p(x) is a density function on the real line, and θ > 0 in (2.7)
and -∞ < θ < ∞ in (2.8). Tate relies heavily on integral transform
theory. For example, he obtains the following result for the family
(2.7), under the assumption that p(x) vanishes for negative x. Let
H(X^(n)) be a non-negative homogeneous statistic of degree a ≠ 0 with
density θ^a g(θ^a x), and suppose g(x) and θp(θz) (for some fixed positive
number z) both have Mellin transforms. Then, if there exists an
unbiased estimator p̂ of θp(θz) with a Mellin transform, it will be
given by

(2.9)

where M is the Mellin transform defined for a function ζ(x), when it
exists, by

M[ζ(x); s] = ∫_0^∞ x^{s-1} ζ(x) dx,  (2.10)

and M^{-1} denotes the inverse transform. Tate also obtained many of the
previously mentioned results. Washio et al. (1956), again using
integral transform methods, study the problem of estimating functions
of the parameter in the Koopman-Pitman family of densities (cf. Koopman
(1936) and Pitman (1936)). To illustrate their method they consider,
as one example, the problem of estimating P_θ[X ∈ A] for the normal
distribution with unknown mean and variance. In an earlier paper
Kolmogorov (1950) derived an estimator of P_θ[X ∈ A] for the normal
distribution with unknown mean and known variance. His approach was
to first obtain a unique minimum variance unbiased estimator of the
density function (which he assumed existed) and then to integrate
the solution on the set A. In his derivation he used the 'source
solution' of the heat equation,

φ(z, t) = (4πt)^{-1/2} exp[-z²/4t],  0 < t < ∞,  -∞ < z < ∞,  (2.11)

to solve the integral equation that resulted directly from the
estimation problem.
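That φ(z, t) in (2.11) solves the heat equation u_t = u_zz can be verified numerically. The sketch below is a modern Python illustration; the normalization (4πt)^{-1/2} is the conventional one assumed here, and the finite-difference check does not depend on that constant factor.

```python
import math

def phi(z, t):
    # Source solution of the heat equation u_t = u_zz (normalization assumed
    # conventional; any constant multiple solves the same equation).
    return math.exp(-z * z / (4 * t)) / math.sqrt(4 * math.pi * t)

def d_dt(z, t, h=1e-5):
    # central finite difference approximation of the time derivative
    return (phi(z, t + h) - phi(z, t - h)) / (2 * h)

def d2_dz2(z, t, h=1e-4):
    # central second difference approximation of the space derivative
    return (phi(z + h, t) - 2 * phi(z, t) + phi(z - h, t)) / (h * h)

# At an interior point the two sides of u_t = u_zz agree to numerical precision.
```

Analytically, both sides equal φ(z, t)·(z²/(4t²) - 1/(2t)), which is why the constant in front of the exponential drops out of the check.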
3. GENERAL THEORY
3.1 Introduction
In this chapter the general set-up of the basic statistical model
is described, definitions of unbiasedness for probability measures and
densities are given, and two basic theorems are established. The first
of these two theorems gives necessary and sufficient conditions for the
existence of an unbiased estimator of a density function, and under the
additional assumptions of sufficiency and completeness the second
theorem gives similar necessary and sufficient conditions for the
existence of a unique minimum variance unbiased estimator of a density
function. Moreover, these theorems also show that if, in any parti-
cular example, an unbiased estimator of the density function exists,
then the estimator can be computed immediately.
3.2 The Model
In this section the set-theoretical details of the general
statistical model are described. Let (l,a,g) be a ~-finite Euclidean
measure space; that is, I, with generic element x, is a Borel set in a
Euclidean space, a is the class of Borel subsets of I, and g is a
~-finite measure on a. Furthermore, suppose that there is given a
family P of probability measures P on a which is dominated by g; that
is, each P € P is absolutely continuous with respect to g. Then by
the Radon-Nikodym theorem there exists, for each P, a g-unique, finite
valued, non-negative, a-measurable function p, such that
P(A) • Sp(X)~(x),A
for all A € G.
The fam:f,.ly ~, of functions p, will be referred to as the family of
density tunctions(or densitie~) cOfresponding to p.
Let X be a random variable over the space (X, A). It will be
assumed that the probability distribution of X identifies with an
unknown member P of P. Let X^(n) = (X_1, X_2, ..., X_n) be n independent
random variables that are identically distributed as X, and denote by
x^(n) = (x_1, x_2, ..., x_n) an observation on X^(n). Denote by X^(n) the
sample space of observations x^(n), and A^(n) the product σ-algebra of
subsets of X^(n) that is determined by A in the usual way. For any
measure Q on A the corresponding product measure on A^(n) will be
denoted by Q^(n).
All statistics to be considered are A^(n)-measurable functions on
X^(n) to a Euclidean measurable space (T, B). If T = T(X^(n)) is such a
statistic, then denote by P_T the family of probability measures P_T
on B induced by T; that is, for each B ∈ B,

P_T(B) = P^(n)[T^{-1}(B)], for all P ∈ P.  (3.2)

Since both of the spaces (X, A) and (T, B) are Euclidean it will be
assumed that all conditional probability distributions are regular;
that is, if T is a statistic and Q an arbitrary probability measure
on A^(n), then the conditional probability function Q^T defined, for each
A ∈ A^(n) and T = t, by

Q[A ∩ T^{-1}(B)] = ∫_B Q^T(A; t) dQ_T(t), for all B ∈ B,  (3.3)

is, for each fixed t, a probability measure on A^(n). Theorems 4 and 7
in chapter two of Lehmann (1959), for example, validate this assumption.
Actually, the assumption of Euclidean spaces is unnecessarily restric-
tive. As long as there exist regular conditional probability measures,
all the results of this chapter will still be valid. Of course, from
a practical viewpoint, spaces other than Euclidean spaces are of little
interest. This will be true in the next chapter, where X will be taken
to be a Borel subset of the real line.

Finally, it will be assumed that the definitions of completeness
and sufficiency for a family of probability measures are known, and
hence no separate definitions of these notions will be given.
3.3 The General Problem and Definitions
In this section the general problem of density estimation is dis-
cussed within the framework of decision theory, and definitions of
unbiased estimators for probability measures and probability density
functions are given to prepare the way for the theorems of the next
section.
The problem to be considered here is typical of a large class of
statistical problems. A set of observations x^(n) is available, assumed
to have been generated from a distribution P with corresponding
density p. All that is assumed about P (p) is that it is a
member of some particular class of probability measures (probability
density functions) P (D). Hence, given the observations x^(n), the
problem here is to estimate p in some optimal way. Viewed from the
standpoint of decision theory, an estimator p̂ = p̂(x; x^(n)) of p at a
point x ∈ X is required that minimizes the risk

R(x; P) = E_P[L(p̂(x), p(x))], for all P ∈ P.  (3.4)

The symbol E_P represents mathematical expectation under the assumption
that P^(n) is the true distribution of X^(n), and L(·,·) is a loss function
which expresses the loss incurred (monetary or otherwise) when esti-
mating the density as p̂(x) when, in fact, p(x) is the true density at x.
Rather than think in terms of estimating p at some fixed point x, it is
often desirable to think in terms of estimating the entire density
function. Hence the problem becomes one of selecting a p̂ which not
only minimizes the risk in (3.4) for all P ∈ P, but also, simultaneously,
minimizes it for all x ∈ X. Minimum risk estimators are rarely obtain-
able, so that the additional complication introduced by this global
notion increases the difficulty in general. This problem can be cir-
cumvented in the following way. Let w be a measure on A which in some
way expresses the intensity of interest of the statistician in the
points of X with reference to the estimation of p. For example, if X
is the real line and w is Lebesgue measure, the intensity of interest
is uniform over the points of X. The problem then becomes one of finding
a p̂ which minimizes the integrated risk

R_w(P) = ∫ R(x; P) dw(x), for all P ∈ P.  (3.5)
To overcome the difficulty of finding estimators which minimize,
for all P ∈ P, the risk function in (3.4) and the integrated risk
function in (3.5), the notions of minimax estimators and Bayes esti-
mators have been introduced. The Bayes procedure in the context of
integrated risk, for example, introduces yet another averaging process;
a prior distribution π on P. In this case the Bayes estimator would be
the one that minimizes the Bayes integrated risk

R_π = ∫_P R_w(P) dπ(P).  (3.6)
The approach to this problem of density estimation adopted in this
thesis is classical. The loss function is squared error loss; that is,

L(p̂(x), p(x)) = (p̂(x) - p(x))².  (3.7)

The object will be to find unbiased estimators p̂ = p̂(x; X^(n)), if they
exist, that minimize the risk function in (3.4) for all (x, P) ∈ X × P.
Note that, with the loss function in (3.7) and the restriction to
unbiased estimators, the risk function in (3.4) becomes the variance

E_P[(p̂(x) - p(x))²]  (3.8)

of p̂ at the point x. It is convenient at this point to give a
definition of an unbiased estimator of a density function. In general
the estimator will depend on a statistic T = T(X^(n)). The notation
developed in the previous section concerning such statistics will be
adhered to in the definition below and the subsequent ones in the
remainder of this section.

Definition 3.1 A real-valued function p̂ = p̂(x; t) on X × T is an
unbiased estimator of a density p ∈ D if it is A × B measurable and
satisfies

E_T[p̂(x; T)] = p(x), a.e. μ, for all p ∈ D.  (3.9)

Note, E_T is an abbreviation for E_{P_T}.
Notice that this definition does not require that an unbiased
estimator have the properties of a density, such as being non-
negative and integrating to unity. Notice also that the definition
allows the estimator to be biased on a subset of X with μ-measure
zero. This is a technical necessity, the reason for which will be made
clear in the proofs of the theorems in the next section. Apart from
this technical detail the above definition has, therefore, a certain
global content; an unbiased estimator of the entire function is
required.
Hence, with minor modifications, the problem is not new. The
search is for minimum variance unbiased estimators of density functions.
The solution to the problem has also not changed; if P admits a complete
and sufficient statistic then unbiased estimators based on this statistic
will be unique minimum variance unbiased. However, the question of
existence is the fundamental problem that arises. In this connection it
is convenient to consider unbiased estimators of the corresponding
probability measure P. Therefore, the following definition will formal-
ize this notion.
Definition 3.2 A real-valued function P̂ = P̂(A; t) on A × T is an un-
biased estimator of P ∈ P if for each A ∈ A it is a B-measurable
function of t and satisfies

E_T[P̂(A; T)] = P(A), for all P ∈ P.  (3.10)
Notice that this definition does not require P̂ to be a probability
measure or, in fact, a measure on A for each fixed t. However, in what
follows, it will be assumed that, for each t ∈ T, such estimators are,
in fact, measures on A; and only such estimators will be considered in
the sequel. In the following development it will become clear that it
is desirable to restrict attention to estimators P̂ which are measures.
Definition 3.3 An estimator P̂ = P̂(A; t) of P ∈ P, which is a measure on
A for each t ∈ T, is said to be absolutely continuous with respect to a
σ-finite measure μ on A if, for every A ∈ A for which μ(A) = 0,
P̂(A; t) = 0, a.e. P_T; that is, except for a set of t-values with
P_T-measure zero, for all P_T ∈ P_T.
A consequence of this definition is that if P̂ is such an estimator,
then, by the Radon-Nikodym theorem, there exists, almost everywhere P_T,
an A-measurable, μ-unique, non-negative function f(x; t), such that

P̂(A; t) = ∫_A f(x; t) dμ(x), for every A ∈ A.  (3.11)

Although it is A-measurable, f(x; t), in general, will not be
A × B-measurable. The A × B-measurability of such functions f(x; t)
is necessary for the application of Fubini's theorem in the proofs of
the theorems in the next section. Hence, it will be assumed, in the
sequel, that all such μ-continuous estimators P̂ of P have A × B-
measurable Radon-Nikodym derivatives.
3.4 Existence Theorems
In this section the problems concerning the existence of an
unbiased estimator and a unique minimum variance unbiased estimator
of a density function are solved. A desirable property of these
solutions is that they also provide a method for computing such an
estimator, should it exist.
The terminology, notations, and conventions, introduced in the
previous sections of this chapter, will be adhered to throughout the
remainder of the present section.
Theorem 3.1, below, is concerned with the existence of an unbiased
estimator of a density p which is assumed to be a member of a family of
densities D corresponding (with respect to a σ-finite dominating
measure μ) to a family P of probability measures P.
Theorem 3.1 An unbiased estimator of p exists if and only if there
exists an unbiased estimator P̂ of P that is absolutely continuous with
respect to the original dominating measure μ. Moreover, when such a P̂
exists, an unbiased estimator of p is given by the Radon-Nikodym deriva-
tive, dP̂/dμ.

Proof: Let P̂ = P̂(·; T) be an unbiased estimator of P that is absolutely
continuous with respect to μ. Then, by the unbiasedness property, for
each A ∈ A,

E_T[P̂(A; T)] = P(A) = ∫_A p(x) dμ(x), for all P ∈ P.  (3.12)

Also, by the Radon-Nikodym theorem, there exists a non-negative function
p̂ = p̂(x; t) such that, for each A ∈ A,

P̂(A; t) = ∫_A p̂(x; t) dμ(x).  (3.13)

It follows, on taking expectations (with respect to P_T) in (3.13), and
applying Fubini's theorem to the third member of (3.14) below, that,
for each A ∈ A,

P(A) = E_T[P̂(A; T)] = E_T[∫_A p̂(x; T) dμ(x)] = ∫_A E_T[p̂(x; T)] dμ(x),
for all P ∈ P.  (3.14)

Then, by the uniqueness part of the Radon-Nikodym theorem, the desired
result follows from (3.12) and (3.14); that is,

E_T[p̂(x; T)] = p(x), a.e. μ, for all p ∈ D.  (3.15)
Now suppose that p̂ = p̂(x; T) is an unbiased estimator of p; that
is, (3.15) is true. It then follows, on applying Fubini's theorem to
the second member of (3.16) below, that, for each A ∈ A,

P(A) = ∫_A E_T[p̂(x; T)] dμ(x) = E_T[∫_A p̂(x; T) dμ(x)],
for all P ∈ P.  (3.16)

Hence, from (3.16), ∫_A p̂ dμ is an unbiased estimator of P(A) for every
A ∈ A, and is clearly absolutely continuous with respect to μ. This
completes the proof.
Thus, if an unbiased estimator of P is available that happens to
be absolutely continuous with respect to μ, it can be immediately
concluded that an unbiased estimator of p exists, and such an esti-
mator is given by the μ-derivative of the estimator of P. Further-
more, if there exists a complete and sufficient statistic for P, the
usual methods of the Rao-Blackwell-Lehmann-Scheffé theory can be used
to obtain a unique minimum variance unbiased estimator of p at every
point x ∈ X. However, the following lemma and theorem give another
method of achieving the same end.
Denote by C the class of subsets E of the form A × X^(n-1), with
A an element of A. Thus, C is a class of cylinder sets with bases
in the first coordinate space of X^(n). Clearly, C is a sub-σ-algebra
of A^(n). If T is a complete and sufficient statistic for P, the
conditional probability measure P^T on A^(n) is, therefore, constant in
P ∈ P. Denote by P^T_C the restriction of P^T to the class of sets C,
and define an estimator P̂ of P by

P̂(A; T) = P^T_C(E), for all E ∈ C,  (3.17)

where E = A × X^(n-1). For each t ∈ T, P^T_C is a probability measure
on C and, therefore, P̂, defined in (3.17), is a probability measure on
A for each such t. The following lemma is now easy to prove.
Lemma 3.1 If T is a complete and sufficient statistic for a family P of
probability measures P, then P̂ = P^T_C is the unique minimum variance
unbiased estimator of P.

Proof: Put B = T and identify P^(n) with Q in (3.3). Then, for every
E (= A × X^(n-1)) ∈ C it follows that

E_T[P̂(A; T)] = P^(n)(E) = P(A), for all P ∈ P.  (3.18)

Hence P̂ = P^T_C is an unbiased estimator of P and, being a function of T
alone, is, moreover, the unique minimum variance unbiased estimator of
P; and this completes the proof of the lemma.
It should be noted that the P̂ defined above is, in fact, a
probability measure on A for each t ∈ T. This property has an important
implication in the next theorem, which is concerned with both the
existence and computation of unique minimum variance unbiased esti-
mators of density functions. Note also that C could have been defined
with the bases of its sets in any one of the coordinate spaces of X^(n).
The notation of the previous lemma will be used without further comment.
Theorem 3.2 Let T be a complete and su~ficient statistic for p. A,..
unique minimum variance unbiased estimator p of p E 1:> exists if and
only if ; = P~ is absolutely continuous with respect to the original
dominating measure~. Moreover, if such a p exists it is given by the
dPT
Radon-Nikodym derivative, ~e .
Proof: If P̂ = P^T_C is absolutely continuous with respect to μ, then, by
the above lemma and Theorem 3.1, an unbiased estimator of p is given by
dP̂/dμ. But dP̂/dμ is a function of T alone and hence is the unique minimum
variance unbiased estimator of p.

On the other hand, if p̂ is the unique minimum variance unbiased
estimator of p, it must be a function, p̂(x;T), of T alone. Hence, for
each A ∈ a,

    P(A) = \int_A E_T[\hat{p}(x;T)]\,d\mu(x)
         = E_T\!\left[\int_A \hat{p}(x;T)\,d\mu(x)\right],
      \quad \text{for all } P \in P.          (3.19)

But, by the lemma, P̂ = P^T_C is an unbiased estimator of P and, therefore,
for each A ∈ a,

    E_T\!\left[\hat{P}(A;T) - \int_A \hat{p}(x;T)\,d\mu(x)\right] = 0,
      \quad \text{for all } P \in P.          (3.20)

But T is complete, and hence

    \hat{P}(A;t) = \int_A \hat{p}(x;t)\,d\mu(x), \quad \text{for almost all } t.

Hence, P̂ = P^T_C is absolutely continuous with respect to μ. Finally, it
should be noted that if P^T_C is absolutely continuous with respect to μ,
then dP^T_C/dμ is, in fact, a proper density function. This completes the
proof.
Hence, once P^T_C has been computed, it is only necessary to check
whether or not it is absolutely continuous with respect to μ; and if
it is, the unique minimum variance unbiased estimator of p is obtained
by differentiating P^T_C with respect to μ. For example, if P is a
family of probability measures on the real line with 𝓕 the corres-
ponding family of distribution functions, and μ is ordinary Lebesgue
measure, then one need only check whether F̂, the distribution function
corresponding to P̂, is an absolutely continuous function of x. If it
is, the unbiased estimator, at x, is given by the ordinary derivative
dF̂(x)/dx. Such families of distributions on the real line will be the
object of study in the next chapter.
4. APPLICATIONS
4.1 Introduction
In this chapter applications of the general theory presented in
the previous chapter are discussed for families of distributions on
the real line; that is, the basic space is a Borel set of the real
line together with its Borel subsets, and the families of densities
and corresponding families of distribution functions are taken to be
Borel-measurable functions. Without exception, the dominating measure
will be either ordinary Lebesgue measure or counting measure.
In the search for an unbiased estimator of a density function the
general method will be to first find an unbiased estimator of the
corresponding distribution function and then test it for absolute
continuity. If, then, the distribution function estimator is abso-
lutely continuous, an unbiased estimator of the density function is
obtained by differentiating this distribution function estimator.
Most of the families of distributions that are considered have the
feature that they admit a complete and sufficient statistic. In these
cases the search for estimators will be restricted to those which are
functions of the complete and sufficient statistic, and then the
unbiased estimators, if they exist, will also be unique and have uni-
form minimum variance. Before proceeding to particular examples,
three different methods of finding unbiased estimators of distribution
functions will be discussed. The choice of method is sometimes
dictated by the structure of the particular family of distributions
under consideration, whereas in some situations computational ease is
the only criterion.
Essentially, the remainder of the chapter will be divided into
two main parts; one part dealing with discrete distributions and the
other with families of absolutely continuous distributions. In the
discrete case a solution is given for a one-parameter, power series
type family which includes several of the well-known discrete dis-
tributions with possible values on the set {0, 1, 2, ...}. In the
absolutely continuous case many of the well-known families of dis-
tributions are discussed, including a general truncation-parameter
family. Rosenblatt's result, on the non-existence of an unbiased
estimator of a density function that corresponds to an unknown
absolutely continuous distribution function on the real line, is
also re-established as a simple corollary to Theorem 3.2 of the pre-
vious chapter.
4.2 Methods
The literature, to date, contains essentially three different
methods for finding unbiased estimators of distribution functions.
Actually, all three methods can be used to find unbiased estimators
of more general parametric functions. These methods are given below.
However, before giving a formal statement of each method, a short
discussion will first be given comparing them on the basis of their
scope of applicability and computational ease.

The first method, due to Tate (1959) and Washio, et al. (1956),
makes use of integral transform theory. In general, the application
of this theory does not depend on the existence of sufficient
statistics, although Washio, et al. do restrict their study to the
Koopman-Pitman family of densities (cf. Koopman (1936) and Pitman
(1936)). Recall that this family, by definition, admits a sufficient
statistic and has range independent of the parameter. On the other
hand, Tate restricts his attention to finding estimators which are
functions of homogeneous statistics and statistics with the trans-
lation property for the scale and location parameter families,
respectively.

In what follows interest will be confined to finding unbiased
estimators of distribution functions, and in most of the examples
treated these integral transform methods would be somewhat heavy-
handed; the other two methods, described below, are usually easier
to apply.
The second and third methods depend, for their application, on
the existence of sufficient or complete and sufficient statistics.
The second method is a straightforward application of the well-known
Rao-Blackwell and Lehmann-Scheffé theory, and the theorem due to Basu
(1955, 1958) is the connecting link between these two methods. Where-
as the third method is generally the easiest to apply, its application
depends, somewhat artificially, on one being able to find a statistic
with distribution independent of the index parameter of the family of
distributions under consideration. On the other hand, the distribution
problems encountered in applying the Rao-Blackwell-Lehmann-Scheffé
theory are often difficult and laborious.
Throughout the remainder of this chapter the following conventions,
notation, and terminology will be adopted. I will denote a Borel set
of the real line, R; a, the Borel subsets of I; λ, ordinary Lebesgue
measure; ν, counting measure when I is a finite or infinite sequence
of points, {a_n : n ≥ 1}; P_θ, F_θ, and p_θ, the unknown probability
measure, distribution function, and density function, respectively,
with θ an element of a parameter space Θ. In each example it will be
assumed that the estimation procedure will be based on a statistic T
which is a function of n independent observations,
X^(n) = (X_1, X_2, ..., X_n). In all the examples considered, T will
be a complete and sufficient statistic. Each example will be
characterized by a vector {p_θ, I, Θ; T}. Finally, F̂(x) = F̂(x;T) and
p̂(x) = p̂(x;T) will denote, respectively, the estimators of F_θ and p_θ;
and P̂(A) = P̂(A;T), the estimator of P_θ, will be given by

    \hat{P}(A) = \int_A d\hat{F}(x), \quad \text{for all } A \in a.          (4.1)

The integral in (4.1) is, for each t ∈ 𝒯, a Lebesgue-Stieltjes
integral but in all examples can be evaluated as a Riemann-Stieltjes
integral.
Method 1. The transform method considered here will not be discussed in
any generality but will be illustrated by the work of Tate in section
four of his paper, where he applies the Mellin transform to find unbiased
estimators of arbitrary functions of scale parameters. The Mellin trans-
form of a function q(x), when it exists, is defined by

    M[q(x);s] = \int_0^\infty x^{s-1} q(x)\,dx.          (4.2)

If q̃(s) is such a transform, the inverse transform will be denoted by

    M^{-1}[\tilde{q}(s);x].          (4.3)

Let p(x) be a known, piecewise continuous density that vanishes
for negative x. Consider the following scale parameter family, defined
by

    p_\theta(x) = \theta p(\theta x), \quad 0 < \theta < \infty.          (4.4)

Tate considers the problem of estimating θp(θz) for a fixed value z.
He considers only those estimators that are functions of non-negative
homogeneous statistics; that is, H is such a homogeneous statistic of
degree α ≠ 0 if

    (i)  H : [0,\infty)^{(n)} \to [0,\infty),
    (ii) H(\theta x^{(n)}) = \theta^\alpha H(x^{(n)}), \quad 0 < \theta < \infty.          (4.5)

When X^(n) is a simple random sample from a member of the family
defined in (4.4), H(X^(n)) has a density of the form θ^α g(θ^α x). Thus an
unbiased estimator p̂(z;H), if it exists, is a solution of the integral
equation

    \int_0^\infty \hat{p}(z;h)\,\theta^\alpha g(\theta^\alpha h)\,dh = \theta p(\theta z).          (4.6)

Assume that both g(x) and θp(θz) have Mellin transforms. Then, if
there exists an unbiased estimator of θp(θz) with a Mellin transform, it
will be given by

    \hat{p}(z;H) = \frac{1}{z}\, M^{-1}\!\left[\frac{M[p(x);\alpha(s-1)+1]}{M[g(x);s]};\; \frac{H}{z^\alpha}\right].          (4.7)

Tate uses the Laplace and bilateral Laplace transforms to find
unbiased estimators of functions of location parameter families.
Method 2. This method, as noted in the discussion above, is a straight-
forward application of the Rao-Blackwell-Lehmann-Scheffé theory. To
find the unique minimum variance unbiased estimator of F_θ at a fixed
point x, a simple unbiased estimator is first found which, when condi-
tioned on the complete and sufficient statistic, yields the required
result. Usually, the simple estimator is

    I_A(X_1),          (4.8)

where I_A denotes the indicator function of the set A = (-∞,x] and X_1 is
the first observation from a sample of n. Thus, if T is a complete and
sufficient statistic for θ, the minimum variance unbiased estimator of
F_θ(x) is given uniquely by

    \hat{F}(x;T) = P[X_1 \le x \mid T].          (4.9)

Clearly, in the discrete case the minimum variance unbiased esti-
mator of the probability mass function can be obtained directly; abso-
lute continuity of the distribution function estimator with respect to
counting measure is guaranteed. Hence, the required density estimator
in the discrete case is given by

    \hat{p}(x;T) = P[X_1 = x \mid T],          (4.10)

at the point x.
Method 3. This method, due to Sathe and Varde (1969), depends, for its
application, on a theorem of Basu (1955, 1958). The original result
will be specialized as follows. Denote by U the simple estimator given
in (4.8) and suppose T is a complete and sufficient statistic for θ.
If there exists a function V = V(X_1,T) such that

    (i)  it is stochastically independent of T,
    (ii) it is a strictly increasing function of X_1 for fixed T,

then the minimum variance unbiased estimator of F_θ at x is given
uniquely by

    \hat{F}(x;T) =
    \begin{cases}
      0, & V(x,T) \le a, \\
      \int_a^{V(x,T)} dH(u), & a < V(x,T) \le b, \\
      1, & V(x,T) > b,
    \end{cases}          (4.11)

where H is the distribution function of V and satisfies

    H(x) = 0 \text{ for } x \le a; \quad H(x) = 1 \text{ for } x > b.

The proof is immediate by noting

    \hat{F}(x;T) = P[X_1 \le x \mid T]
                 = P[V \le V(x,T) \mid T] \quad \text{by (ii)}
                 = P[V \le V(x,T)] \quad \text{by (i)}.
The important step in the application of this theorem is the
selection of a suitable statistic V. The following theorem, due to
Basu (1955, 1958), gives sufficient conditions for V to satisfy (i).

Theorem 4.1 If T is a boundedly complete and sufficient statistic for
θ and V is a statistic (which is not a function of T alone) that has
distribution independent of θ, then V is stochastically independent of T.

Therefore, once V is found and its distribution has been derived,
the required estimator is given by (4.11).
4.3 Discrete Distributions

In this section method 2 will be used to find unique minimum variance
unbiased estimators of discrete-type density functions. Included is a
general expression for an estimator for a wide class of discrete distri-
butions. These distributions, first introduced by Noack (1950) and later
studied in detail by Roy and Mitra (1957), include many of the well-known
discrete distributions.
Example 4.1: The Binomial Distribution. This class of distributions,
in the notation introduced in section 4.2, is characterized by the vector

    \left\{\binom{k}{x}\theta^x(1-\theta)^{k-x},\; \{0, 1, \ldots, k\},\; (0,1);\; \sum_{i=1}^{n} X_i\right\},          (4.13)

where k is a known positive integer.

The distribution of T is given by

    \binom{nk}{t}\theta^t(1-\theta)^{nk-t}, \quad t \in \{0, 1, \ldots, nk\};          (4.14)

and the conditional distribution of X^(n) given T = t is

    \prod_{i=1}^{n}\binom{k}{x_i} \Big/ \binom{nk}{t}, \quad x^{(n)} \in S_n(t),          (4.15)

where S_n(t) = \{x^{(n)} \in I^{(n)} : \sum_{i=1}^{n} x_i = t\}.

Hence,

    \hat{p}(x;t) = \sum \left[\prod_{i=1}^{n}\binom{k}{x_i} \Big/ \binom{nk}{t}\right], \quad x \in \{0, 1, \ldots, \min(t,k)\},          (4.16)

where the summation in (4.16) is over the set
\{x^{(n)} : x^{(n-1)} \in S_{n-1}(t-x),\; x_n = x\}. This expression
simplifies to give the required estimator

    \hat{p}(x;T) = \binom{k}{x}\binom{(n-1)k}{T-x} \Big/ \binom{nk}{T}, \quad x \in \{0, 1, \ldots, \min(T,k)\}.          (4.17)

It is interesting to note that the binomial distribution is esti-
mated by a hypergeometric distribution with a range that has random
end-point, min(T,k).
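The exact unbiasedness of (4.17) can be checked numerically by averaging the estimator over the binomial distribution of T. The following sketch is an illustrative check, not part of the thesis; all function names are ad hoc.

```python
from math import comb

def binom_pmf(x, m, theta):
    # Binomial(m, theta) probability mass at x
    return comb(m, x) * theta**x * (1 - theta)**(m - x)

def p_hat(x, t, n, k):
    # The estimator (4.17): a hypergeometric mass with random
    # end-point min(T, k); zero outside {0, 1, ..., min(t, k)}
    if x < 0 or x > min(t, k):
        return 0.0
    return comb(k, x) * comb((n - 1) * k, t - x) / comb(n * k, t)

n, k, theta = 4, 6, 0.3
for x in range(k + 1):
    # E_theta[p_hat(x; T)], averaging over T ~ Binomial(nk, theta)
    expect = sum(p_hat(x, t, n, k) * binom_pmf(t, n * k, theta)
                 for t in range(n * k + 1))
    assert abs(expect - binom_pmf(x, k, theta)) < 1e-12
```

The identity holds exactly (it is Vandermonde's convolution in disguise), so the tolerance only absorbs floating-point round-off.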
Example 4.2: The Noack Distributions. These distributions are character-
ized by the vector

    \left\{a(x)\theta^x / f(\theta),\; \{0, 1, \ldots\},\; (0, c);\; \sum_{i=1}^{n} X_i\right\},

where c is finite or infinite, a(x) > 0, a(0) = 1 and

    f(\theta) = \sum_{x=0}^{\infty} a(x)\theta^x.          (4.18)

The Poisson, negative binomial and logarithmic distributions occur
as special cases of the above. They will be considered separately to
illustrate Theorem 4.2 below.

Roy and Mitra proved that T = \sum_{i=1}^{n} X_i is a complete and
sufficient statistic for θ and has distribution

    C(t,n)\theta^t / f^n(\theta), \quad t \in \{0, 1, \ldots\},          (4.20)

where

    C(t,n) = \sum \prod_{i=1}^{n} a(x_i).          (4.21)

The summation in (4.21) is over the set
S_n(t) = \{x^{(n)} \in I^{(n)} : \sum_{i=1}^{n} x_i = t\}.
The above results and notation will be used in the statement and proof of
the following theorem.
Theorem 4.2 The minimum variance unbiased estimator of a Noack density
function is given by

    \hat{p}(x;T) = a(x)\,\frac{C(T-x,\,n-1)}{C(T,n)}, \quad x \in \{0, 1, \ldots, T\}.          (4.22)

Proof: The conditional distribution of X^(n) given T = t is given by

    \prod_{i=1}^{n} a(x_i) \Big/ C(t,n), \quad x^{(n)} \in S_n(t).          (4.23)

Hence,

    \hat{p}(x;t) = \sum \prod_{i=1}^{n} a(x_i) \Big/ C(t,n), \quad x \in \{0, 1, \ldots, t\},          (4.24)

where the summation is over the set
\{x^{(n)} : x^{(n-1)} \in S_{n-1}(t-x),\; x_n = x\}.
The expression in (4.24) simplifies to give the required estimator.
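Because C(t,n) in (4.21) is just the n-fold convolution of the coefficient sequence a(0), a(1), ..., the estimator of Theorem 4.2 can be computed mechanically for any Noack family. The sketch below is illustrative (not from the thesis); the function names are ad hoc.

```python
def coeff_C(a, t_max, m):
    # C(t, m): m-fold convolution of the coefficient sequence a(0), a(1), ...
    # i.e. C(t, m) = sum over (x_1, ..., x_m) with sum x_i = t of prod a(x_i)
    C = [1.0] + [0.0] * t_max            # m = 0: empty product
    for _ in range(m):
        C = [sum(a(x) * C[t - x] for x in range(t + 1))
             for t in range(t_max + 1)]
    return C

def p_hat(x, t, a, n):
    # Theorem 4.2: a(x) * C(t - x, n - 1) / C(t, n)
    if not 0 <= x <= t:
        return 0.0
    return a(x) * coeff_C(a, t, n - 1)[t - x] / coeff_C(a, t, n)[t]
```

For a(x) = 1/x! this reproduces the binomial form of the Poisson estimator derived in the text below; for the logarithmic coefficients a(x) = 1/(x+1), where no closed form for C(t,n) is known, it still yields numerical values.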
The above result will now be illustrated for the Poisson, negative
binomial and logarithmic distributions.

The Poisson Distribution. For this distribution,

    a(x) = 1/x!; \quad f(\theta) = e^{\theta}; \quad c = \infty; \quad C(t,n) = n^t/t!.          (4.25)

Therefore,

    \hat{p}(x;T) = \binom{T}{x}\left(\frac{1}{n}\right)^{x}\left(1 - \frac{1}{n}\right)^{T-x}, \quad x \in \{0, 1, \ldots, T\}.          (4.26)

Hence, the Poisson distribution is estimated by a binomial distri-
bution with parameters 1/n and T.
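A numerical check (illustrative, not part of the thesis) confirms that averaging this binomial estimator over the Poisson distribution of T recovers the Poisson pmf; since T = ΣX_i is Poisson with mean nθ, the sum over t can be truncated once the tail is negligible.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

def p_hat(x, t, n):
    # Binomial(T, 1/n) mass at x: the estimator derived above
    return comb(t, x) * (1 / n)**x * (1 - 1 / n)**(t - x) if 0 <= x <= t else 0.0

n, lam = 5, 1.7
for x in range(8):
    # E[p_hat(x; T)] with T ~ Poisson(n * lam); tail beyond t = 120 is negligible
    expect = sum(p_hat(x, t, n) * poisson_pmf(t, n * lam) for t in range(121))
    assert abs(expect - poisson_pmf(x, lam)) < 1e-10
```

The underlying identity is Poisson thinning: a Binomial(T, 1/n) draw from T ~ Poisson(nθ) is itself Poisson(θ).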
The Negative Binomial Distribution. For this distribution,

    a(x) = \binom{k+x-1}{x}; \quad f(\theta) = (1-\theta)^{-k}; \quad c = 1; \quad C(t,n) = \binom{nk+t-1}{t},          (4.27)

where k is a positive integer. Therefore,

    \hat{p}(x;T) = \binom{k+x-1}{x}\binom{(n-1)k+T-x-1}{T-x} \Big/ \binom{nk+T-1}{T}, \quad x \in \{0, 1, \ldots, T\}.          (4.28)

Note that the minimum variance unbiased estimator for the geometric
distribution is obtained by putting k = 1 in (4.28). For fixed T = t,
p̂(x;t) has the following interpretation. Suppose there are nk cells into
which t indistinguishable balls are placed at random. If it is assumed
that all of the \binom{nk+t-1}{t} distinguishable arrangements have equal proba-
bilities, then p̂(x;t) is the probability that a group of k prescribed
cells contains a total of exactly x balls.
The Logarithmic Distribution. For this distribution,

    a(x) = 1/(x+1); \quad f(\theta) = -\log(1-\theta)/\theta; \quad c = 1; \quad C(t,n) = \sum \prod_{i=1}^{n} \frac{1}{x_i+1},          (4.29)

where the summation is over the set S_n(t). No closed expression for
C(t,n) is known by the author and, hence, p̂(x;t) does not have an obvious
probabilistic interpretation.
4.4 Absolutely Continuous Distributions

In this section some of the well-known families of Lebesgue-continuous
distributions will be discussed. All of these families admit a complete
and sufficient statistic, so that only unique minimum variance unbiased
estimators of the distribution functions will be considered. The esti-
mators of these distribution functions have all previously been derived
elsewhere and, therefore, only a few of these will be re-established to
illustrate the methods of section two. The remainder of the results will
be stated and checked for absolute continuity to decide whether or not
unbiased estimators of the corresponding density functions exist. Also,
Rosenblatt's result is re-established as a simple corollary to Theorem
3.2.
Example 4.3: The Class of all Absolutely Continuous Distributions. In
this example F(x) is an arbitrary absolutely continuous distribution
function with corresponding density p(x). It is well known that the
order statistic T = (X_(1), X_(2), ..., X_(n)) is complete and sufficient
for this family and that the minimum variance unbiased estimator of F
is given uniquely by the empirical distribution function

    \hat{F}(x;T) = \sum_{i=1}^{n} I_{A_i}(X_i)/n, \quad x \in R,          (4.30)

where I_{A_i} is the indicator function of the set A_i = [X_i ≤ x]. Clearly,
for each fixed T = t, F̂(x;t) is not absolutely continuous and hence, by
Theorem 3.2, there is no unbiased estimator of p(x) for any sample size n.
Example 4.4: The Normal Distribution. This distribution is character-
ized by the vector

    \left\{(2\pi\theta_2)^{-1/2}\exp[-(x-\theta_1)^2/2\theta_2],\; R,\; R \times (0,\infty);\; \left(\sum_{i=1}^{n} X_i,\; \sum_{i=1}^{n}(X_i-\bar{X})^2\right)\right\}.          (4.31)

As well as (c), the complete family, two sub-families will be discussed;
namely, (a) θ_2 = 1, and (b) θ_1 = 0. For simplicity, write T/n = (U,W);
that is,

    U = \sum_{i=1}^{n} X_i / n \quad \text{and} \quad W = \sum_{i=1}^{n} (X_i - U)^2 / n.          (4.32)
(a) θ_1 unknown, θ_2 = 1. Here, U is the complete and sufficient statistic.
Moreover, V = X_1 - U has a normal distribution with (θ_1,θ_2) = (0, (n-1)/n),
and the function x - u is monotonic increasing in x for fixed u. Hence,
method 3 applies and the required minimum variance unbiased estimator of
F is given uniquely by

    \hat{F}(x;U) = \int_{-\infty}^{x-U} (2\pi(n-1)/n)^{-1/2}\exp[-ns^2/2(n-1)]\,ds.          (4.34)

It should be noted that, for n ≥ 2, F̂ is absolutely continuous and,
therefore, the minimum variance unbiased estimator of the density function
exists for n ≥ 2 and is given by

    \hat{p}(x;U) = [2\pi(n-1)/n]^{-1/2}\exp[-n(x-U)^2/2(n-1)], \quad x \in R.          (4.35)

When n = 1, F̂ is not absolutely continuous and is, in fact, the
empirical distribution function. It will become apparent in several of
the examples that the existence or non-existence of unbiased estimators
depends on sample size. The smallest sample size m for which an unbiased
estimator exists will be called the degree of estimability. In example
4.3, m = ∞ and in the present example, m = 2. The value m = 2 in this
example could be given the following interpretation: one 'degree of free-
dom' for estimating the empirical distribution function and one 'degree
of freedom' for estimating θ_1. Actually, θ_1 does have one degree of
estimability. Finally, it should be observed that the unbiasedness of
p̂(x;U) in (4.35) can easily be verified directly by integration.
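That integration is the convolution of two normal densities: p̂(x;U) is a N(U, (n-1)/n) density in x, and U itself is N(θ_1, 1/n), so the average is the N(θ_1, 1) density. A numerical version of the check (illustrative, not from the thesis):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    return exp(-(x - mean)**2 / (2 * var)) / sqrt(2 * pi * var)

n, theta1, x = 4, 0.8, 1.5

# E[p_hat(x; U)] = integral of N(u; theta1, 1/n) * N(x; u, (n-1)/n) du,
# approximated by the midpoint rule over theta1 +/- 8
h = 1e-3
grid = [theta1 - 8.0 + h * (i + 0.5) for i in range(int(16.0 / h))]
expect = h * sum(normal_pdf(u, theta1, 1 / n) * normal_pdf(x, u, (n - 1) / n)
                 for u in grid)
assert abs(expect - normal_pdf(x, theta1, 1.0)) < 1e-6
```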
(b) θ_1 = 0, θ_2 unknown. Here, the complete and sufficient statistic is

    S = \sum_{i=1}^{n} X_i^2.          (4.36)

Folks, et al., using method two, obtained the following estimator:

    \hat{F}(x;S) =
    \begin{cases}
      0, & x \le -S^{1/2}, \\
      G_{n-1}\!\left[(n-1)^{1/2} x / (S - x^2)^{1/2}\right], & -S^{1/2} < x \le S^{1/2}, \\
      1, & x > S^{1/2},
    \end{cases}          (4.37)

where G_{n-1}(u) is Student's t-distribution function with (n-1) degrees of
freedom. Clearly, F̂ is absolutely continuous for n ≥ 2 and, hence, the
required minimum variance unbiased estimator of the density function is
given uniquely by

    \hat{p}(x;S) =
    \begin{cases}
      \dfrac{(1 - x^2/S)^{(n-3)/2}}{S^{1/2}\,B(1/2,\,(n-1)/2)}, & x^2 < S, \\
      0, & \text{elsewhere},
    \end{cases}          (4.38)

where B(a,b) is the complete Beta function.
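The displayed form of (4.38) is the derivative of (4.37), and it can be checked by simulation: with θ_2 = 1, averaging p̂(x;S) over S = ΣX_i² should reproduce the N(0,1) density at x. The sketch is illustrative, not part of the thesis.

```python
import random
from math import exp, gamma, pi, sqrt

def p_hat(x, s, n):
    # the density estimator (4.38); zero outside x^2 < S
    if x * x >= s:
        return 0.0
    beta = gamma(0.5) * gamma((n - 1) / 2) / gamma(n / 2)   # B(1/2, (n-1)/2)
    return (1 - x * x / s)**((n - 3) / 2) / (sqrt(s) * beta)

random.seed(1)
n, x, reps = 6, 0.5, 200_000
total = 0.0
for _ in range(reps):
    s = sum(random.gauss(0.0, 1.0)**2 for _ in range(n))    # theta_2 = 1
    total += p_hat(x, s, n)
target = exp(-x * x / 2) / sqrt(2 * pi)                     # N(0, 1) density at x
assert abs(total / reps - target) < 0.005
```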
It is interesting to compare the estimators in cases (a) and (b).
The estimator in (4.35) is strictly positive for all values of x and any
fixed T = t, whereas the estimator in (4.38) is zero outside the inter-
val [-t^{1/2}, t^{1/2}]. In both cases, of course, the original density functions
are positive everywhere for all values of the unknown parameters. In
case (b), m = 2, as in case (a). The value m = 2 in case (b)
can be decomposed into one 'degree of freedom' for θ_2 and one degree for
the empirical distribution function.
(c) θ_1 unknown, θ_2 unknown. In this case the complete and sufficient
statistic is T = (nU, nW). Using method 3 of section two and the fact
that V = (X_1 - U)/W^{1/2} has distribution

    f(v) =
    \begin{cases}
      \dfrac{(1 - v^2/(n-1))^{(n-4)/2}}{(n-1)^{1/2}\,B(1/2,\,(n-2)/2)}, & v^2 < n-1, \\
      0, & \text{elsewhere},
    \end{cases}          (4.39)

Sathe and Varde obtained, in a slightly different form, the estimator

    \hat{F}(x;T) =
    \begin{cases}
      0, & x < U - [(n-1)W]^{1/2}, \\
      G_{n-2}\!\left[(n-2)^{1/2} v / (n-1-v^2)^{1/2}\right], & U - [(n-1)W]^{1/2} \le x \le U + [(n-1)W]^{1/2}, \\
      1, & x > U + [(n-1)W]^{1/2},
    \end{cases}          (4.40)

where v = (x - U)/W^{1/2} and G_{n-2} is Student's t-distribution function
with (n-2) degrees of freedom.

Clearly, F̂ is absolutely continuous for n ≥ 3 and the unique minimum
variance unbiased estimator p̂ of the density function is obtained by
replacing v with (x-U)/W^{1/2} in (4.39) and dividing by W^{1/2}. It should
be noted that, once again, the estimator p̂ is zero outside a finite
interval for any fixed value of T. The degree of estimability m = 3; two
'degrees of freedom' for (θ_1,θ_2) and one for the empirical distribution
function. It appears that the degree of estimability equals the degree of
estimability of the usual indexing parameter plus one degree for the
empirical distribution function.
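A simulation check of case (c), with the density (4.39) and the 1/W^{1/2} Jacobian included (illustrative, not from the thesis): with (θ_1,θ_2) = (0,1) and n = 6, the average of p̂(x;T) should return the N(0,1) density at x.

```python
import random
from math import exp, gamma, pi, sqrt

def f_V(v, n):
    # the density (4.39) of V = (X_1 - U)/W^(1/2); zero for v^2 >= n - 1
    if v * v >= n - 1:
        return 0.0
    beta = gamma(0.5) * gamma((n - 2) / 2) / gamma((n - 1) / 2)  # B(1/2, (n-2)/2)
    return (1 - v * v / (n - 1))**((n - 4) / 2) / (sqrt(n - 1) * beta)

random.seed(2)
n, x, reps = 6, 0.7, 200_000
total = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]       # theta = (0, 1)
    u = sum(xs) / n
    w = sum((xi - u)**2 for xi in xs) / n
    total += f_V((x - u) / sqrt(w), n) / sqrt(w)          # note the 1/sqrt(W) Jacobian
target = exp(-x * x / 2) / sqrt(2 * pi)
assert abs(total / reps - target) < 0.01
```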
Example 4.5: The Truncation Distributions. This family of distri-
butions includes two types, with characterizing vectors

    Type I:  \{k_1(\theta)h_1(x)\; (\theta < x < b),\; (a,b),\; (a,b);\; X_{(1)}\},          (4.41)
    Type II: \{k_2(\theta)h_2(x)\; (a < x < \theta),\; (a,b),\; (a,b);\; X_{(n)}\},          (4.42)

where the interval (a,b) is either finite, semi-infinite or infinite and
k_1 and h_1, and k_2 and h_2, have the obvious relations

    1/k_1(\theta) = \int_{\theta}^{b} h_1(x)\,dx, \quad \text{for all } \theta \in (a,b);          (4.43)

    1/k_2(\theta) = \int_{a}^{\theta} h_2(x)\,dx, \quad \text{for all } \theta \in (a,b).          (4.44)
The statistics X_(1) and X_(n) are the smallest and largest order statis-
tics, respectively. Tate used essentially method 3 to obtain the follow-
ing distribution function estimators for the Type I and Type II families
defined above:

    Type I:  \hat{F}(x;X_{(1)}) =
    \begin{cases}
      0, & a < x \le X_{(1)} < b, \\
      \dfrac{1}{n} + \left(1 - \dfrac{1}{n}\right)\dfrac{\int_{X_{(1)}}^{x} h_1(u)\,du}{\int_{X_{(1)}}^{b} h_1(u)\,du}, & a < X_{(1)} < x < b;
    \end{cases}          (4.45)

    Type II: \hat{F}(x;X_{(n)}) =
    \begin{cases}
      \left(1 - \dfrac{1}{n}\right)\dfrac{\int_{a}^{x} h_2(u)\,du}{\int_{a}^{X_{(n)}} h_2(u)\,du}, & a < x < X_{(n)} < b, \\
      1, & a < X_{(n)} \le x < b.
    \end{cases}          (4.46)

Both of the above estimators are mixed distributions with a jump
of 1/n at x = X_(1) in the first case and at x = X_(n) in the second case.
Hence, by Theorem 3.2, no minimum variance unbiased estimator of the
density function exists for either type. Two examples of these distri-
butions will now be given to illustrate the above estimators.
The Pareto Distribution. This distribution has a Type I density with
k_1(θ) = θ, h_1(x) = 1/x², a = 0 and b = ∞. The estimator in (4.45)
takes the form

    \hat{F}(x;X_{(1)}) =
    \begin{cases}
      0, & 0 < x \le X_{(1)} < \infty, \\
      1/n + (1 - 1/n)(1 - X_{(1)}/x), & 0 < X_{(1)} < x < \infty.
    \end{cases}          (4.47)

Note that if n = 1, F̂ in (4.47) reduces to the empirical distribution
function.
The Uniform Distribution. This is a Type II distribution with
k_2(θ) = 1/θ, h_2(x) = 1, a = 0 and b = +∞. The estimator in (4.46)
takes the form

    \hat{F}(x;X_{(n)}) =
    \begin{cases}
      (n-1)x/nX_{(n)}, & 0 < x < X_{(n)} < \infty, \\
      1, & 0 < X_{(n)} \le x < \infty.
    \end{cases}          (4.48)

The following figure illustrates the estimator in (4.48).

[Figure: a line of slope (n-1)/nX_(n) from the origin, with a jump of 1/n at x = X_(n).]

Figure 4.1 The U.M.V.U.E. of (x/θ)I_(0,θ)(x)
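As a numerical illustration (not part of the thesis), the unbiasedness of (4.48) can be verified by integrating the estimator against the density n m^(n-1)/θ^n of the largest order statistic; the result should be the true distribution function x/θ.

```python
def expect_F_hat(x, theta, n, steps=100_000):
    # E[F_hat(x; X_(n))] for the estimator (4.48), integrating against the
    # density n m^(n-1)/theta^n of the largest order statistic on (0, theta)
    h = theta / steps
    total = 0.0
    for i in range(steps):
        m = (i + 0.5) * h                                  # midpoint rule
        density = n * m**(n - 1) / theta**n
        f_hat = 1.0 if m < x else (n - 1) * x / (n * m)
        total += f_hat * density * h
    return total

theta, n, x = 2.0, 5, 0.6
assert abs(expect_F_hat(x, theta, n) - x / theta) < 1e-4   # unbiased: x/theta
```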
This example serves to illustrate a point that emphasizes the
subtle difference between the work in this thesis and the usual point
estimation problem. It is simple to show that if n ≥ 2, (n-1)/nX_(n)
is the unique minimum variance unbiased estimator of 1/θ, and yet there
exists no such estimator for the density

    p_\theta(x) =
    \begin{cases}
      1/\theta, & 0 < x < \theta, \\
      0, & \text{elsewhere}.
    \end{cases}          (4.49)

The point is that in the first instance all that is required is an
unbiased estimator of the parametric function 1/θ; and in the second
case an unbiased estimator is required of the function specified in
(4.49). Thus, the notion of a uniform (in x) unbiased estimator does
create a different estimation problem.
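The first claim can be checked in one line against the density n m^{n-1}/θ^n of X_(n):

\[
E_\theta\!\left[\frac{n-1}{n X_{(n)}}\right]
= \int_0^\theta \frac{n-1}{n m}\cdot\frac{n m^{n-1}}{\theta^n}\,dm
= \frac{n-1}{\theta^n}\int_0^\theta m^{n-2}\,dm
= \frac{n-1}{\theta^n}\cdot\frac{\theta^{n-1}}{n-1}
= \frac{1}{\theta}, \qquad n \ge 2.
\]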
The Two-Parameter Exponential Distribution. The charac-
terizing vector for this family of distributions is

    \{\theta_2^{-1}\exp[-(x-\theta_1)/\theta_2]\; (\theta_1 < x < \infty),\; R,\; R \times (0,\infty);\; (X_{(1)}, \sum_{i=1}^{n} X_i)\}.          (4.50)

Tate obtained the following distribution function estimator using
method 1:

    \hat{F}(x;T) =
    \begin{cases}
      0, & x < X_{(1)}, \\
      1 - \left(1 - \dfrac{1}{n}\right)\left(1 - \dfrac{x - X_{(1)}}{Y}\right)^{n-2}, & X_{(1)} \le x \le X_{(1)} + Y, \\
      1, & x > X_{(1)} + Y,
    \end{cases}          (4.51)

where Y = \sum_{i=1}^{n} X_i - nX_{(1)}.

Clearly, F̂ is not absolutely continuous but is a mixed distribution
with mass 1/n at x = X_(1). Hence, no unbiased estimator of the density
function exists for this family. However, if θ_1 is known to be equal to
zero (say), then the estimator for this sub-family is obtained by putting
X_(1) = 0 and changing the exponent to n-1 and (1 - 1/n) to 1 in (4.51).
This estimator is absolutely continuous and, therefore, the required
density estimator is given by

    \hat{p}(x;T) =
    \begin{cases}
      \dfrac{n-1}{T}\left(1 - \dfrac{x}{T}\right)^{n-2}, & 0 < x < T, \\
      0, & \text{elsewhere},
    \end{cases}          (4.52)

where T = \sum_{i=1}^{n} X_i. The degree of estimability in this case is two; one
degree for θ_2 and one degree for the empirical distribution function.
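A numerical check of (4.52) (illustrative, not from the thesis): T is a Gamma(n, θ_2) variable, and integrating the estimator against its density should recover the exponential density (1/θ_2)e^{-x/θ_2} at x.

```python
from math import exp, factorial

def p_hat(x, t, n):
    # the estimator (4.52); zero unless 0 < x < T
    return (n - 1) / t * (1 - x / t)**(n - 2) if 0 < x < t else 0.0

def gamma_pdf(t, n, theta):
    # density of T = X_1 + ... + X_n for exponential(theta) observations
    return t**(n - 1) * exp(-t / theta) / (factorial(n - 1) * theta**n)

n, theta, x = 5, 2.0, 1.3
h, steps = 1e-3, 60_000
# midpoint-rule approximation to E[p_hat(x; T)]
expect = sum(p_hat(x, (i + 0.5) * h, n) * gamma_pdf((i + 0.5) * h, n, theta) * h
             for i in range(steps))
assert abs(expect - exp(-x / theta) / theta) < 1e-5   # the theta_1 = 0 density at x
```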
5. SUMMARY

The work in this thesis is concerned with the problem of unbiased
estimation of probability density functions. The general model con-
sidered is that of a μ-dominated family 𝒫 of probability measures P on
a σ-finite measure space (X, a, μ). Given n observations
X^(n) = (X_1, X_2, ..., X_n) on a random variable X (with values in X), the
object is to find estimators p̂ of p, the probability density function
corresponding to P, such that

    E_P[\hat{p}(x)] = p(x), \quad \text{for all } P \in \mathcal{P} \text{ and all } x \in X.

The uniform in x property of such estimators p̂ is the feature that dis-
tinguishes this problem from the usual problem of point estimation.
Here, an estimator of a function is sought.
In chapter three definitions of unbiasedness for both density
function and probability measure estimators are given. After giving a
suitable definition of μ-continuity for an estimator P̂ of P, two theorems
are established that give necessary and sufficient conditions that
guarantee the existence of unbiased estimators p̂ of p. It is shown in
Theorem 3.1 that such a p̂ exists if and only if there exists an unbiased
estimator P̂ of P that is absolutely continuous with respect to the
original dominating measure μ. Moreover, if such a P̂ exists, then it is
shown that the Radon-Nikodym derivative dP̂/dμ is an unbiased estimator of p.
Theorem 3.2 gives a stronger result if there exists a complete and
sufficient statistic T for 𝒫: a unique minimum variance unbiased estimator
of p exists if and only if P^T_C is absolutely continuous with respect to
μ. Moreover, if such a p̂ exists, it is given by dP^T_C/dμ, which is also a
probability density function for almost all (P_T) t. (Here, P^T_C is the
conditional probability measure P^T on a^(n), given T, restricted to the
sub-σ-algebra C of a^(n) whose sets are cylinders with bases in a fixed
but arbitrary coordinate space of X^(n).) Hence, for a particular
family of distributions, if it is possible to derive an estimator of P
which is absolutely continuous with respect to μ, then an unbiased
estimator of p exists and is obtained by differentiating the probability
measure estimator with respect to μ.
In chapter four the general theory is specialized to well-known
families of distributions on the real line that are either absolutely
continuous with respect to ordinary Lebesgue measure or absolutely
continuous with respect to counting measure on a sequence of points.
In the Lebesgue-continuous case it is shown, for example, that for the
normal distribution with unknown mean and variance, if the sample
size exceeds or equals three then a minimum variance unbiased estimator
of the normal density function exists. This estimator has some inter-
esting properties when compared with the corresponding maximum likeli-
hood estimator obtained by substituting the maximum likelihood esti-
mators of the parameters for the parameters in the functional form of
the normal density. For example, for any fixed value of the sufficient
statistic the estimator is zero outside a finite interval (cf. the maxi-
mum likelihood estimator). It is also interesting to note that the
normal density function is estimated by a Student's t-distribution,
whereas the likelihood principle demands that the estimator should also
be a normal distribution. Examples of minimum variance unbiased esti-
mators of distribution functions which are not absolutely continuous
are the family of all Lebesgue-continuous distributions on the real line
and the uniform distribution on (0,θ). Estimators for the binomial dis-
tribution and a general power series type distribution are obtained to
give examples of the procedures for discrete distributions.

Interesting sample size considerations are observed in many of the
examples illustrating the absolutely continuous case. For example, in
the normal distribution it has already been noted that at least three
observations are required to obtain an unbiased estimator. The maximum
likelihood procedure, on the other hand, requires a minimum of only two
observations. It is apparent in several of the examples that the degree
of estimability can be decomposed into one degree for the empirical dis-
tribution function and the degree for the index parameter in the func-
tional form of the density. The degree for the parameter is the smallest
number of observations that are required to have an unbiased estimator
of that parameter.
As in any area of research some problems still remain to be solved.
Some of these are:

1. Find conditions that guarantee the existence of Bayes, minimax
and invariant estimators of density functions.

2. Find equivalent conditions on 𝒫 that guarantee P^T_C to be abso-
lutely continuous with respect to μ.

3. Compare, for example, maximum likelihood estimators with
unbiased estimators of density functions, when the latter exist.

4. The likelihood ratio is often used to partition the sample
space in the construction of decision rules. In discrimination proce-
dures, for example, it is sometimes necessary to estimate the likeli-
hood ratio and hence the resultant partition of the sample space. A
study of the properties of such procedures when unbiased estimators of
density functions are used to estimate these partitions would be inter-
esting.
6. LIST OF REFERENCES

Anderson, G. D. 1969. A comparison of methods for estimating a probability density function. Ph.D. thesis, University of Washington.

Bartlett, M. S. 1963. Statistical estimation of density functions. Sankhya 25:245-254.

Barton, D. E. 1961. Unbiased estimation of a set of probabilities. Biometrika 48:227-229.

Basu, A. P. 1964. Estimates of reliability for some distributions useful in life testing. Technometrics 6:215-219.

Basu, D. 1955, 1958. On statistics independent of a complete sufficient statistic. Sankhya 15:377-380; 20:223-226.

Brunk, H. D. 1965. Conditional expectation given a σ-lattice and applications. Ann. Math. Stat. 36:1339-1350.

Cacoullos, T. 1966. Estimation of a multivariate density. Ann. Inst. Stat. Math. 18:178-189.

Cencov, N. N. 1962. Evaluation of an unknown distribution density from observations. Soviet Mathematics 3:1559-1562.

Elkins, A. T. 1968. Cubical and spherical estimation of multivariate probability density. J. Am. Stat. Assoc. 63:1495-1513.

Fix, E., and Hodges, J. L., Jr. 1951. Nonparametric discrimination: consistency properties. Report Number 4, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas, February, 1951.

Folks, J. L., Pierce, D. A., and Stewart, C. 1965. Estimating the fraction of acceptable product. Technometrics 7:43-50.

Kolmogorov, A. N. 1950. Unbiased estimates. Izvestia Akad. Nauk SSSR, Seriya Matematiceskaya 14:303-326. Amer. Math. Soc. Translation No. 98.

Koopman, B. O. 1936. On distributions admitting sufficient statistics. Trans. Amer. Math. Soc. 39:399-409.

Kronmal, R., and Tarter, M. 1968. The estimation of probability densities and cumulatives by Fourier series methods. J. Am. Stat. Assoc. 63:925-952.

Laurent, A. G. 1963. Conditional distribution of order statistics and distribution of the reduced i-th order statistic of the exponential model. Ann. Math. Stat. 34:652-657.

Lehmann, E. L. 1959. Testing Statistical Hypotheses. Wiley, New York.

Loftsgaarden, D. O., and Quesenberry, C. P. 1965. A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 36:1049-1051.

Manija, G. M. 1960. The mean square error of an estimator of a normal probability density. Trudy Vycisl. Centra Akad. Nauk Gruzin. S.S.R. 1:75-96.

Murthy, V. K. 1965. Estimation of probability density. Ann. Math. Stat. 36:1027-1031.

Murthy, V. K. 1966. Nonparametric estimation of multivariate densities with applications. Proceedings of an International Symposium on Multivariate Analysis. Academic Press, New York.

Nadaraya, E. A. 1965. On non-parametric estimates of density functions and regression curves. Theory of Probability and Its Applications 10:186-190.

Noack, A. 1950. A class of random variables with discrete distributions. Ann. Math. Stat. 21:127-132.

Parzen, E. 1962. On estimation of a probability density function and mode. Ann. Math. Stat. 33:1065-1076.

Pitman, E. J. G. 1936. Sufficient statistics and intrinsic accuracy. Proc. Camb. Philos. Soc. 32:567-579.

Pugh, E. L. 1963. The best estimate of reliability in the exponential case. Opns. Res. 11:57-61.

Robertson, T. 1967. On estimating a density which is measurable with respect to a σ-lattice. Ann. Math. Stat. 38:482-493.

Rosenblatt, M. 1956. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27:832-837.

Roy, J., and Mitra, S. 1957. Unbiased minimum variance estimation in a class of discrete distributions. Sankya 18:371-378.

Sathe, Y. S., and Varde, S. D. 1969. On minimum variance unbiased estimation of reliability. Ann. Math. Stat. 40:710-714.

Schwartz, S. C. 1967. Estimation of probability density by an orthogonal series. Ann. Math. Stat. 38:1261-1265.

Tate, R. F. 1959. Unbiased estimation: functions of location and scale parameters. Ann. Math. Stat. 30:341-366.

Washio, Y., Morimoto, H., and Ikeda, N. 1956. Unbiased estimation based on sufficient statistics. Bull. Math. Stat. 6:69-94.

Watson, G. S., and Leadbetter, M. R. 1963. On the estimation of the probability density. I. Ann. Math. Stat. 34:480-491.

Wegman, E. J. 1968. A note on estimating a unimodal density. Institute of Statistics Mimeo Series No. 599.

Whittle, P. 1958. On the smoothing of probability density functions. J. R. Stat. Soc. (B) 20:334-343.

Woodroofe, M. 1967. On the maximum deviation of the sample density. Ann. Math. Stat. 38:475-481.