
Statistics & Probability Letters 38 (1998) 363-368


A note on bias robustness of the median¹

Zhiqiang Chen*

Department of Mathematics, William Paterson University, Wayne, NJ 07470, USA

Received March 1997; received in revised form November 1997

Abstract

In this article, it is proved that the median is a minimax bias functional (with respect to many distances, including the Kolmogorov distance) among all location equivariant functionals if the distribution of interest is symmetric and unimodal. This parallels Huber's well-known result (1964). We also prove that the median is no longer a minimax bias functional, with respect to several definitions of bias including the contamination bias, if the symmetry assumption is violated. © 1998 Elsevier Science B.V. All rights reserved

AMS classification: primary 62F10; 62G35

Keywords: Bias robustness; Median; Minimaxity

1. Introduction

For a parametric family of distributions $\{F_\theta(x) = F(x - \theta)\}$, and for a location estimator $T$, the $\epsilon$-contamination-bias functional is defined by

$$b_T(\epsilon, F_\theta) = \sup_G |T((1 - \epsilon)F_\theta + \epsilon G) - \theta|,$$

where $G$ is a c.d.f. and $\epsilon \in (0, 1)$. Under the condition that $F$ is absolutely continuous and symmetric at the origin with a density decreasing away from the center, Huber (1964, 1981) showed that the median is a minimax contamination-bias functional among all location equivariant functionals. That is, it is the best location estimator among all location equivariant estimators, in the sense that $\mathrm{median} = \arg\min_T b_T(\epsilon, F_\theta)$ for each $\epsilon \in (0, 1)$, the minimum being taken over all location equivariant $T$.
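To make the definition concrete, here is a small numerical sketch (ours, not part of the original note; the choice $F = \Phi$, the standard normal, is an illustrative assumption). Contaminating with a point mass far to the right moves the median of $(1 - \epsilon)\Phi + \epsilon\,\delta_z$ to the point $m$ solving $(1 - \epsilon)\Phi(m) = \frac{1}{2}$:

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p):
    """Standard normal quantile, by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 0.1
# G = (1 - eps) * Phi + eps * delta_z with z large: for x < z we have
# G(x) = (1 - eps) * Phi(x), so Med(G) solves (1 - eps) * Phi(m) = 1/2.
m = phi_inv(0.5 / (1.0 - eps))
print(f"median under {eps:.0%} point-mass contamination: {m:.4f}")
```

The shift $m$ is the bias incurred at this particular contaminating $G$; the supremum over all $G$ defines $b_T(\epsilon, F_\theta)$ above.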

Recall that the contamination discrepancy between two probability measures $P$ and $Q$ is defined by

$$d_c(Q, P) = \inf\{\epsilon \ge 0 : Q = (1 - \epsilon)P + \epsilon R\},$$

where $R$ is a probability measure. Let $b_V(\epsilon)$ be the total variation gauge as in Donoho and Liu (1988), i.e., $b_V(\epsilon) = \sup_{d(F_\eta, F_0) \le \epsilon} |\eta|$, where the distance $d$ is the total variation distance. For any location equivariant

* E-mail: [email protected]. ¹ Partially supported by a grant from the Center for Research, College of Science and Health, William Paterson University.

0167-7152/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0167-7152(98)00049-2


estimator $T$, He and Simpson (1993) established a global lower bound for the contamination-bias functional as follows:

$$b_T(\epsilon, F_\theta) \ge \tfrac{1}{2}\, b_V\!\left(\frac{\epsilon}{1 - \epsilon}\right).$$

As a consequence, they re-established Huber's result by showing that the above lower bound is attained by the median when the distribution is symmetric and unimodal. But a natural question remains: is the median a minimax bias estimator without the symmetry assumption? At the end of this note, we will provide a negative answer to this question.

Besides the contamination bias, one can also consider a more general bias functional as in Donoho and Liu (1988). Let $H$ be the space of all probability measures (or distributions) on $\mathbb{R}$, and let $\Theta \subset \mathbb{R}$ be an unbounded open set which forms the space of all possible location parameters. Note that in this article we will not distinguish a distribution function from the corresponding probability measure. For any distance or discrepancy $d$ on $H$, the $\epsilon$-neighborhood-bias functional of an estimator $T$ evaluated at $F \in H$ is defined by

$$b(T, d, \epsilon, F) = \sup_{\{G : d(G, F) \le \epsilon\}} |T(G) - T(F)|.$$

There are many available distances or discrepancies on the space $H$; the contamination bias is only a special case of this definition. Notice that the contamination discrepancy is not a distance. A theoretically more appealing definition of bias is to use a true distance in the above definition. For instance, one can use the total variation distance $d$; that is, for two probability measures $P$ and $Q$, $d(P, Q) = \sup_A |P(A) - Q(A)|$, where the supremum is over all measurable sets $A$. In view of the fact that the sample distribution is not "close" to the underlying distribution under the total variation distance, a good choice of distance is to take the supremum over a V-C class of sets (Vapnik and Červonenkis, 1971) instead of all measurable sets; the empirical process theory then guarantees the "closeness" between the sample distribution and the underlying distribution. In one dimension, the family of all half-lines and the family of sets $\{(a, b]\}$ are examples of V-C classes, which give rise to the Kolmogorov distance and the Kuiper distance, respectively.
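The two distances can be compared numerically. The following sketch (our illustration; the pair $F = N(0,1)$, $G = N(0,4)$ is an arbitrary assumption) approximates both on a grid. The Kuiper distance adds the largest deviations in both directions, while the Kolmogorov distance keeps only the larger one, so the Kuiper distance always dominates:

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal c.d.f. with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Grid approximation of the one-sided sup-deviations between
# F = N(0, 1) and G = N(0, 4) (standard deviation 2).
xs = [i / 100.0 for i in range(-1000, 1001)]
d_plus = max(norm_cdf(x) - norm_cdf(x, sigma=2.0) for x in xs)
d_minus = max(norm_cdf(x, sigma=2.0) - norm_cdf(x) for x in xs)

kolmogorov = max(d_plus, d_minus)  # sup over half-lines (-inf, x]
kuiper = d_plus + d_minus          # sup over intervals (a, b]
print(f"Kolmogorov: {kolmogorov:.4f}, Kuiper: {kuiper:.4f}")
```

For this symmetric pair the deviations in the two directions are equal, so the Kuiper distance is exactly twice the Kolmogorov distance; for a pure location shift of the same shape the two distances coincide.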

When a distance on the space of all probability measures, and hence the resulting bias functional, is given, a natural question arises: is the median a minimax bias estimator among all location equivariant estimators? Taking the same approach as in He and Simpson (1993), we will use a global lower bound of Donoho and Liu (1988) to establish a robustness result parallel to Huber's. That is, under the same conditions as those in Huber's result, the median attains the global lower bound for the biases with respect to distances including the Kolmogorov, the Kuiper and the total variation distances. Therefore, the median is the best bias-robust estimator among all location equivariant estimators. We also point out that the median is no longer a minimax-bias functional when the distribution of interest is asymmetric.

2. Results

Donoho and Liu (1988) (Propositions 4.2 and 4.3) established a global lower bound for functional biases with respect to a variety of neighborhoods. A particular version of the global lower bound, which can be proved by an easy triangle inequality argument, is stated below as a lemma. For $F \in H$, define $F_\theta = F(\cdot - \theta)$ and $b(2\epsilon, F) = \sup_{\{\eta : d(F_\eta, F) \le 2\epsilon\}} |T(F_\eta) - T(F)|$, where $T$ is a location estimator and the distance $d$ is of the form $d(F, G) = \sup_H |F(H) - G(H)|$, the supremum being taken over a class of sets $\{H\}$. Notice that such distances include the Kolmogorov, the Kuiper and the total variation distances.

Lemma 1. $\sup_{\{\theta : d(F_\theta, F) \le 2\epsilon\}} b(T, d, \epsilon, F_\theta) \ge \tfrac{1}{2} b(2\epsilon, F)$.


Also, if $T$ is location equivariant then, letting $T(F) = 0$ without loss of generality, we have $T(F_\eta) = \eta$ and

$$b(T, d, \epsilon, F) \ge \tfrac{1}{2} b(2\epsilon),$$

where $b(2\epsilon) = \sup_{d(F_\eta, F) \le 2\epsilon} |\eta|$.

Next, we establish bias bounds for the median (denoted Med).

Proposition 2. For an absolutely continuous c.d.f. $F$, we have:
1. $\{\mathrm{Med}(G) : d(G, F) \le \epsilon\} \subseteq [F^{-1}(\tfrac{1}{2} - \epsilon),\, F^{-1}(\tfrac{1}{2} + \epsilon)]$, where $d$ can have different choices, including the Kolmogorov distance, the Kuiper distance and the total variation distance.
2. $\{\mathrm{Med}(G) : d_c(G, F) \le \epsilon\} \subseteq [F^{-1}(\tfrac{1}{2} - \tfrac{\epsilon}{2(1 - \epsilon)}),\, F^{-1}(\tfrac{1}{2} + \tfrac{\epsilon}{2(1 - \epsilon)})]$.

In both cases, the end points are attainable.

Proof. In part one, for a c.d.f. $G$ such that $d(G, F) \le \epsilon$, let $y = \mathrm{Med}(G)$; by the definition of the median, both $G(y) \ge \tfrac{1}{2}$ and $G(y-) \le \tfrac{1}{2}$. If $F(y) > \tfrac{1}{2} + \epsilon$, we have $F(y-) > \tfrac{1}{2} + \epsilon$, hence $F(y-) - G(y-) > \epsilon$, therefore $d(G, F) > \epsilon$. On the other hand, if $F(y) < \tfrac{1}{2} - \epsilon$, then $G(y) - F(y) > \epsilon$, hence $d(G, F) > \epsilon$. This implies that $\{\mathrm{Med}(G) : d(G, F) \le \epsilon\} \subseteq [F^{-1}(\tfrac{1}{2} - \epsilon),\, F^{-1}(\tfrac{1}{2} + \epsilon)]$.

Moreover, the end points are attainable. For instance, let $\theta = F^{-1}(\tfrac{1}{2} + \epsilon)$ and define

$$G(x) = \begin{cases} \max\{F(x) - \epsilon,\, 0\} & \text{if } x < \theta, \\ F(x) & \text{if } x \ge \theta. \end{cases}$$

Then one can verify that $d(G, F) \le \epsilon$ and $\mathrm{Med}(G) = \theta$. This completes the proof of part one. In the second part, for a c.d.f. $G$ such that $d_c(G, F) \le \epsilon$, since $F(\mathrm{Med}(G)) \ge (1 - \epsilon)^{-1}(G(\mathrm{Med}(G)) - \epsilon) \ge \tfrac{1}{2} - \tfrac{\epsilon}{2(1 - \epsilon)}$, we get $\mathrm{Med}(G) \ge F^{-1}(\tfrac{1}{2} - \tfrac{\epsilon}{2(1 - \epsilon)})$. Similarly, from $(1 - \epsilon)F(\mathrm{Med}(G)-) \le G(\mathrm{Med}(G)-) \le \tfrac{1}{2}$ we can show that $\mathrm{Med}(G) \le F^{-1}(\tfrac{1}{2} + \tfrac{\epsilon}{2(1 - \epsilon)})$. The bounds are attainable: for instance, let $H = (1 - \epsilon)F + \epsilon \delta_z$ with $z > F^{-1}(\tfrac{1}{2} + \tfrac{\epsilon}{2(1 - \epsilon)})$; then $\mathrm{Med}(H) = F^{-1}(\tfrac{1}{2} + \tfrac{\epsilon}{2(1 - \epsilon)})$. Hence, the proof is completed. □

As a direct consequence of the previous proposition, we get:

Corollary 3. For an absolutely continuous c.d.f. $F$, we have

$$b(\mathrm{Med}, d, \epsilon, F) = \max\left\{ F^{-1}(\tfrac{1}{2} + \epsilon) - F^{-1}(\tfrac{1}{2}),\; F^{-1}(\tfrac{1}{2}) - F^{-1}(\tfrac{1}{2} - \epsilon) \right\},$$

and

$$b_{\mathrm{Med}}(\epsilon, F) = \max\left\{ F^{-1}\!\left(\tfrac{1}{2} + \tfrac{\epsilon}{2(1 - \epsilon)}\right) - F^{-1}(\tfrac{1}{2}),\; F^{-1}(\tfrac{1}{2}) - F^{-1}\!\left(\tfrac{1}{2} - \tfrac{\epsilon}{2(1 - \epsilon)}\right) \right\}.$$
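As a numerical sketch (ours; the choice $F = \Phi$, the standard normal, is an assumption for illustration), both bias expressions of Corollary 3 reduce to simple quantile differences:

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p):
    """Standard normal quantile, by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 0.1
med = phi_inv(0.5)  # 0 for the standard normal

# b(Med, d, eps, F): neighborhood bias (Kolmogorov/Kuiper/TV distance)
b_nbhd = max(phi_inv(0.5 + eps) - med, med - phi_inv(0.5 - eps))

# b_Med(eps, F): contamination bias, with offset eps / (2 (1 - eps))
off = eps / (2.0 * (1.0 - eps))
b_cont = max(phi_inv(0.5 + off) - med, med - phi_inv(0.5 - off))

print(f"neighborhood bias: {b_nbhd:.4f}, contamination bias: {b_cont:.4f}")
```

For symmetric $F$ the two maxima are attained at both endpoints simultaneously; the contamination bias is smaller here because, for the same $\epsilon$, the offset $\epsilon/(2(1 - \epsilon))$ is smaller than $\epsilon$.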

The next result parallels Huber's result stated at the beginning of the introductory section.

Proposition 4. Suppose that $F$ is absolutely continuous with a symmetric density which is decreasing away from the center. Then the median is a minimax bias functional among all location equivariant estimators, with respect to the Kolmogorov distance, the Kuiper distance and the total variation distance, respectively.

Proof. Without loss of generality, assume that $F^{-1}(\tfrac{1}{2}) = 0$. If $F$ is symmetric, then, by the previous corollary,

$$b(\mathrm{Med}, d, \epsilon, F_\theta) = F^{-1}(\tfrac{1}{2} + \epsilon) - F^{-1}(\tfrac{1}{2}) = \tfrac{1}{2}\left[ F^{-1}(\tfrac{1}{2} + \epsilon) - F^{-1}(\tfrac{1}{2} - \epsilon) \right].$$


Since the global lower bound is $\tfrac{1}{2} b(2\epsilon) = \tfrac{1}{2} \sup_{d(F_\eta, F) \le 2\epsilon} |\eta|$, we only need to show that $\sup_{d(F_\eta, F) \le 2\epsilon} |\eta| = F^{-1}(\tfrac{1}{2} + \epsilon) - F^{-1}(\tfrac{1}{2} - \epsilon)$, which is geometrically obvious (or can be seen by a direct computation). □
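The "geometrically obvious" step can be checked numerically. For a symmetric unimodal $F$ (we take the standard normal as an illustrative assumption), the Kolmogorov distance between $F$ and its shift $F_\eta$ is $2F(\eta/2) - 1$, attained at the midpoint of the shift, so $d(F_\eta, F) \le 2\epsilon$ is equivalent to $|\eta| \le F^{-1}(\tfrac{1}{2} + \epsilon) - F^{-1}(\tfrac{1}{2} - \epsilon)$:

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

eta = 0.8
# Grid approximation of the Kolmogorov distance between Phi and its
# shift Phi(. - eta).
xs = [i / 1000.0 for i in range(-8000, 8001)]
d_grid = max(abs(phi(x - eta) - phi(x)) for x in xs)

# For a symmetric unimodal F, the sup is attained at x = eta / 2.
d_closed = 2.0 * phi(eta / 2.0) - 1.0
print(f"grid: {d_grid:.6f}, closed form: {d_closed:.6f}")
```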

We will now use the minimax bias functional $T_\epsilon$ introduced in Donoho and Liu (1988) (Proposition 4.2) to show that the median is not optimal for asymmetric distributions. Recall that for a parametric family $P_\theta$ and a given (unknown) $P$ with (known) $\epsilon > 0$, let $S_\epsilon(P)$ be the set of all parameter values $\theta$ such that $d(P, P_\theta) \le \epsilon$; then $T_\epsilon$ is defined as the center of the set $S_\epsilon(P)$, i.e., $T_\epsilon = \arg\min_t \max_{\theta \in S_\epsilon(P)} |t - \theta|$.

For each $\epsilon > 0$, $\theta \in S_\epsilon(P)$ implies that $d(P, P_\theta) \le \epsilon$, and therefore $d(P_\eta, P_{\theta + \eta}) \le \epsilon$ for location equivariant distances, including the Kolmogorov, the Kuiper and the total variation distances. This means that $\theta + \eta \in S_\epsilon(P_\eta)$; hence the set $S_\epsilon(P)$ is location equivariant and so is $T_\epsilon$. Note that $T_\epsilon(P)$ may be different for different $\epsilon$.

Proposition 5. Suppose that $F$ is an absolutely continuous and asymmetric distribution. Then the median is not a bias-optimal functional among all location equivariant estimators, for biases with respect to the Kolmogorov, the Kuiper and the total variation distances.

Proof. Assume that $\mathrm{Med}(F) = 0$. Since the distribution $F$ is not symmetric, there is an $\epsilon > 0$ such that $F^{-1}(\tfrac{1}{2} + \epsilon) \ne -F^{-1}(\tfrac{1}{2} - \epsilon)$; say $F^{-1}(\tfrac{1}{2} + \epsilon) > -F^{-1}(\tfrac{1}{2} - \epsilon)$. Then $b(\mathrm{Med}, d, \epsilon, F_\theta) = F^{-1}(\tfrac{1}{2} + \epsilon)$. Now we will show that $b(\mathrm{Med}, d, \epsilon, F_\theta) > \tfrac{1}{2} b(2\epsilon)$. If not, since $\tfrac{1}{2} b(2\epsilon) = \tfrac{1}{2} \sup_{d(F_\eta, F) \le 2\epsilon} |\eta| = \sup\{\beta > 0 : d(F_\beta, F_{-\beta}) \le 2\epsilon\}$, there exists a $\beta$ with $d(F_\beta, F_{-\beta}) \le 2\epsilon$ such that $\beta \ge F^{-1}(\tfrac{1}{2} + \epsilon)$. Therefore, $F(\beta) \ge \tfrac{1}{2} + \epsilon$ and $F(-\beta) < \tfrac{1}{2} - \epsilon$ (because $\beta > -F^{-1}(\tfrac{1}{2} - \epsilon)$), which implies that $d(F_\beta, F_{-\beta}) > 2\epsilon$, and we have a contradiction. Hence, $b(\mathrm{Med}, d, \epsilon, F_\theta)$ is greater than the lower bound $\tfrac{1}{2} b(2\epsilon)$. Since the minimax functional $T_\epsilon$ attains the lower bound (Donoho and Liu, 1988, Proposition 4.3), we conclude that the median is not a bias-optimal functional when the symmetry assumption is violated. □
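A concrete asymmetric example (our own; the standard exponential as $F$ is an assumption for illustration): both sides of the comparison in the proof are available in closed form, and the median's bias strictly exceeds the lower bound.

```python
import math

eps = 0.05

def q(p):
    """Quantile of the standard exponential: F(x) = 1 - exp(-x)."""
    return -math.log(1.0 - p)

med = q(0.5)  # = ln 2

# Bias of the median under the Kolmogorov distance (Corollary 3)
b_med = max(q(0.5 + eps) - med, med - q(0.5 - eps))

# For the exponential, d_K(F_eta, F) = F(|eta|), so the gauge is
# b(2 eps) = sup{ |eta| : 1 - exp(-|eta|) <= 2 eps } = -ln(1 - 2 eps)
half_gauge = -0.5 * math.log(1.0 - 2.0 * eps)

print(f"median bias {b_med:.5f} exceeds lower bound {half_gauge:.5f}")
```

Here $b(\mathrm{Med}, d, \epsilon, F) = -\ln(1 - 2\epsilon)$ while $\tfrac{1}{2} b(2\epsilon) = -\tfrac{1}{2}\ln(1 - 2\epsilon)$, so the median's bias is exactly twice the lower bound for this distribution.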

Since Huber (1964, 1981), it has been well known that the median is a minimax contamination-bias functional for certain symmetric distributions. When the distribution is not symmetric, we have just shown non-minimaxity of the median for the bias with respect to several distances including the Kolmogorov distance. Next, we will prove that the median is not a minimax bias functional for the contamination bias either. To this end, let us consider a version of the minimax estimator $T_\epsilon$ in the setting of the contamination neighborhood. For $F \in H$, let $\{F_\eta = F(\cdot - \eta)\}$ be a parametric family of distributions. For a given $\epsilon > 0$, and for any $G \in H$ such that $d_c(G, F_\eta) \le \epsilon$, let $S_\epsilon(G)$ be the set of all parameter values $\theta$ such that $d_c(G, F_\theta) \le \epsilon$. Since we are considering one-dimensional distributions, the convex hull of $S_\epsilon(G)$, $\mathrm{Co}\, S_\epsilon(G)$, is an interval. Let $T_m(G)$ be the mid-point of the interval $\mathrm{Co}\, S_\epsilon(G)$. Since $\theta \in S_\epsilon(G)$ means that $d_c(G, F_\theta) \le \epsilon$, there exist a sequence of numbers $\delta_n$ and a sequence of distributions $Q_n$ such that $G = (1 - \delta_n)F_\theta + \delta_n Q_n$ and $\delta_n \to d_c(G, F_\theta) \le \epsilon$; therefore $G_\zeta = G(\cdot - \zeta) = (1 - \delta_n)F(\cdot - (\theta + \zeta)) + \delta_n Q_n(\cdot - \zeta)$. That is, $d_c(G_\zeta, F_{\theta + \zeta}) \le \epsilon$, and hence $\theta + \zeta \in S_\epsilon(G_\zeta)$. This implies that both the set $S_\epsilon(G)$ and the estimator $T_m$ are location equivariant. Note that the definition of $T_m(G)$ may depend on $\epsilon$.

Proposition 6. $T_m$ is optimal in the contamination bias; that is, $b_{T_m}(\epsilon, F) = \tfrac{1}{2} b_V\!\left(\frac{\epsilon}{1 - \epsilon}\right)$.

Proof. Consider a $G$ with $d_c(G, F) \le \epsilon$. We claim that

$$\theta_1, \theta_2 \in S_\epsilon(G) \implies d_V(F_{\theta_1}, F_{\theta_2}) = \sup_A |F_{\theta_1}(A) - F_{\theta_2}(A)| \le \frac{\epsilon}{1 - \epsilon}.$$

By the definition of $\theta_i \in S_\epsilon(G)$, we have $d_c(G, F_{\theta_i}) \le \epsilon$. Since $d_c(G, F) = \inf\{\epsilon \ge 0 : G = (1 - \epsilon)F + \epsilon R\}$, by this very definition there exist a sequence of numbers $r_n$ and a sequence of c.d.f.s $R_n$ such that $G =$


$(1 - r_n)F + r_n R_n$, with $r_n$ decreasing to $d_c(G, F)$. Therefore, $d_c(G, F_{\theta_i}) \le \epsilon$, $i = 1, 2$, implies that there exist two sequences of numbers $r_n$ and $s_n$ and two sequences of c.d.f.s $Q_n$ and $R_n$ such that, as $n \to \infty$, $r_n$ and $s_n$ decrease to $d_c(G, F_{\theta_1})$ and $d_c(G, F_{\theta_2})$, respectively. Moreover, for all $n$,

$$G = (1 - r_n)F_{\theta_1} + r_n Q_n$$

and

$$G = (1 - s_n)F_{\theta_2} + s_n R_n.$$

Without loss of generality, let $d_c(G, F_{\theta_1}) \le d_c(G, F_{\theta_2})$; then we can select subsequences of $r_n$ and $s_n$ if necessary, again calling them $r_n$ and $s_n$, such that $r_n \le s_n$. Since $(1 - s_n)F_{\theta_2} + s_n R_n = (1 - r_n)F_{\theta_1} + r_n Q_n$, we get

$$(1 - r_n)(F_{\theta_1} - F_{\theta_2}) = s_n(R_n - F_{\theta_2}) + r_n(F_{\theta_2} - Q_n).$$

Now, for any measurable set $A$,

$$(1 - r_n)|F_{\theta_1}(A) - F_{\theta_2}(A)| = |s_n(R_n - F_{\theta_2})(A) + r_n(F_{\theta_2} - Q_n)(A)|.$$

Let us discuss the right-hand side of the above equation in two cases:
1. If $(R_n - F_{\theta_2})(A)$ and $(F_{\theta_2} - Q_n)(A)$ have different signs, then $|s_n(R_n - F_{\theta_2})(A) + r_n(F_{\theta_2} - Q_n)(A)| \le \max\{|s_n(R_n - F_{\theta_2})(A)|,\, |r_n(F_{\theta_2} - Q_n)(A)|\} \le s_n$.
2. If both $(R_n - F_{\theta_2})(A)$ and $(F_{\theta_2} - Q_n)(A)$ have the same sign, assume that both are positive (otherwise consider the complement event $A^c$); then $R_n(A) \ge F_{\theta_2}(A) \ge Q_n(A)$, hence $0 \le s_n(R_n - F_{\theta_2})(A) + r_n(F_{\theta_2} - Q_n)(A) = s_n R_n(A) - r_n Q_n(A) - (s_n - r_n)F_{\theta_2}(A) \le s_n$.

Therefore, as a consequence of the above discussion, $(1 - r_n) \sup_A |F_{\theta_1}(A) - F_{\theta_2}(A)| \le s_n$. Since $s_n \ge r_n$ and $s_n \to d_c(G, F_{\theta_2})$, we get

$$\sup_A |F_{\theta_1}(A) - F_{\theta_2}(A)| \le \frac{s_n}{1 - r_n} \le \frac{s_n}{1 - s_n} \xrightarrow{n \to \infty} \frac{d_c(G, F_{\theta_2})}{1 - d_c(G, F_{\theta_2})}.$$

Since $d_c(G, F_{\theta_2}) \le \epsilon$ implies $d_c(G, F_{\theta_2})/[1 - d_c(G, F_{\theta_2})] \le \epsilon/(1 - \epsilon)$ and $\sup_A |F_{\theta_1}(A) - F_{\theta_2}(A)|$ is a constant, we get $\sup_A |F_{\theta_1}(A) - F_{\theta_2}(A)| \le \epsilon/(1 - \epsilon)$. This proves the claim.

The above claim implies that $\{(\theta_1, \theta_2) : \theta_1, \theta_2 \in S_\epsilon(G)\} \subseteq \{(\theta_1, \theta_2) : d_V(F_{\theta_1}, F_{\theta_2}) \le \epsilon/(1 - \epsilon)\}$. Therefore

$$\tfrac{1}{2} b_V\!\left(\frac{\epsilon}{1 - \epsilon}\right) = \tfrac{1}{2} \max_{d_V(F_{\theta_1}, F_{\theta_2}) \le \epsilon/(1 - \epsilon)} |\theta_1 - \theta_2| \ge \tfrac{1}{2} \max_{\theta_1, \theta_2 \in S_\epsilon(G)} |\theta_1 - \theta_2|.$$

Now we are ready to prove the minimaxity. Without loss of generality (because $T_m$ is location equivariant), assume that $T_m(F) = 0$, and notice that $0 \in S_\epsilon(G)$ when $d_c(G, F) \le \epsilon$; therefore $|T_m(G)| \le \max_{\theta \in S_\epsilon(G)} |\theta - T_m(G)|$. Since $T_m(G)$ is the mid-point of the interval $\mathrm{Co}\, S_\epsilon(G)$, $\max_{\theta \in S_\epsilon(G)} |\theta - T_m(G)| = \tfrac{1}{2} \max_{\theta_1, \theta_2 \in S_\epsilon(G)} |\theta_1 - \theta_2|$. Combining these facts, we have

$$b_{T_m}(\epsilon, F) = \sup_{\{G : d_c(G, F) \le \epsilon\}} |T_m(G)| \le \sup_{\{G : d_c(G, F) \le \epsilon\}} \max_{\theta \in S_\epsilon(G)} |\theta - T_m(G)| = \sup_{\{G : d_c(G, F) \le \epsilon\}} \tfrac{1}{2} \max_{\theta_1, \theta_2 \in S_\epsilon(G)} |\theta_1 - \theta_2| \le \tfrac{1}{2} b_V\!\left(\frac{\epsilon}{1 - \epsilon}\right).$$


Since $b_T(\epsilon, F) \ge \tfrac{1}{2} b_V\!\left(\frac{\epsilon}{1 - \epsilon}\right)$ whenever $T$ is location equivariant (see He and Simpson, 1993), we get $b_{T_m}(\epsilon, F) = \tfrac{1}{2} b_V\!\left(\frac{\epsilon}{1 - \epsilon}\right)$. □

Corollary 7. Suppose that $F$ is an asymmetric and continuous distribution. Then the median is not a minimax contamination-bias functional among all location equivariant estimators.

Proof. By the same reasoning as in the proof of Proposition 5, the contamination bias of the median is strictly greater than the lower bound $\tfrac{1}{2} b_V\!\left(\frac{\epsilon}{1 - \epsilon}\right)$. On the other hand, this global lower bound is attained by the modified minimax bias estimator $T_m$; therefore, the median is not optimal in the contamination bias for asymmetric distributions. □

Acknowledgements

The author is grateful to the referee for providing many valuable suggestions for this revision.

References

Donoho, D., Liu, R., 1988. The "automatic" robustness of minimum distance functionals. Ann. Statist. 16, 552-586.
He, X., Simpson, D.G., 1993. Lower bounds for contamination bias: globally minimax versus locally linear estimation. Ann. Statist. 21, 314-337.
Huber, P.J., 1964. Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.
Huber, P.J., 1981. Robust Statistics. Wiley, New York.
Vapnik, V.N., Červonenkis, A.J., 1971. Necessary and sufficient conditions for the convergence of means to their expectations. Theory Probab. Appl. 26, 532-553.