

Statistics & Probability Letters 48 (2000) 283–286

Breakdown point of Schuster–Narvarte’s location estimator

Zhiqiang Chen ∗,1

Department of Mathematics, William Paterson University, Wayne, NJ 07470, USA

Received January 1998; received in revised form November 1999

Abstract

The Schuster–Narvarte (Ann. Statist. 1 (1973) 1096–1104) location parameter of a distribution F, the center of a symmetric distribution closest to F, is an appealing definition for the center of distributions. It is noticed that this parameter is a special case of Donoho and Liu's (Ann. Statist. 16 (1988) 552–586) semiparametric minimum distance functional. For this estimator, an algorithm is available in Schuster and Narvarte (1973); asymptotic results were established in Rao et al. (Ann. Statist. 3 (1975) 862–873) and in Arcones and Giné (Ann. Statist. 19 (1991) 1496–1511); validity of the bootstrap is proved in Arcones and Giné (1991), while simulation results can be found in Schuster and Barker (Comm. Statist. Simulation Comput. 16 (1987) 69–84). In this article, we complement the study by establishing the robustness properties of the Schuster–Narvarte estimator. A general result for the (enlargement) finite sample breakdown point is established. In particular, when the underlying distribution is symmetric, the limiting finite sample breakdown point is almost surely 1/3. © 2000 Elsevier Science B.V. All rights reserved.

MSC: primary: 62F35

Keywords: Bias; Finite sample breakdown point; Kolmogorov distance; Schuster–Narvarte's parameter; Semiparametric minimum distance functional

∗ Corresponding author. Tel.: +1-973-720-3382; Fax: +1-973-720-2263. E-mail address: [email protected] (Z. Chen).

1 Partially supported by the Center for Research, College of Science and Health, and by ART at William Paterson University.

0167-7152/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0167-7152(00)00008-0

1. Introduction

Donoho and Liu (1988) defined and studied the minimum distance (MD) functional, which can be described as follows. Let $\{F_\theta\}$ be a parametric family of probability measures and let $\mu$ be a distance (or a discrepancy) on the space of all probability measures. For any probability distribution function (or probability measure) $F$, the MD functional $\tilde\theta$ is defined as $\tilde\theta = \arg\inf_\theta \mu(F, F_\theta)$, i.e., $\tilde\theta$ is a point which minimizes the quantity $\mu(F, F_\theta)$. They showed that, for many choices of the distance $\mu$, $\tilde\theta$ has excellent robustness properties in terms of breakdown point and sensitivity. In a brief subsection, they applied the MD functional to the semiparametric setting as follows. Let $\Gamma$ be the family of all symmetric probability measures; $\Gamma$ can be viewed as a semiparametric family of distributions because $\Gamma = \{F_\theta : F_\theta = S(\cdot - t)\}$, where $t \in \mathbb{R}$, $S$ is a distribution symmetric about the origin, and $\theta = (t, S)$. Therefore, by using various distances, the first (location) component of the resulting MD functional defines a class of location parameters. When there is more than one solution minimizing $\mu(F, F_\theta)$, the location parameter can be defined as the geometric center of the convex hull of the set of all minimizers. This class of location parameters is appealing because the center of any distribution $F$ is defined as the center of symmetry of a symmetric distribution closest to $F$ with respect to a certain distance. This is an intuitive extension of the "center" of a distribution to asymmetric distributions. For robustness purposes, they define a discrepancy $D(\theta_1, \theta_2) = |t_1 - t_2|$ on the parameter space, which only pays attention to the location parameter $t$. The breakdown point of these location parameters is discussed, and they conjectured that it is as good as that of the induced parametric MD functional with respect to the Kolmogorov, Kuiper, variation and Hellinger distances.

Coincidentally, to estimate the center of symmetry, Schuster and Narvarte (1973) introduced an interesting nonparametric empirical estimator $\delta_n$, as well as a computational algorithm. The estimator $\delta_n$ is defined as a point $a$ which minimizes the quantity $\max_x |F_n(x) + F_n((2a - x)-) - 1|$. See also the closely related article of Rao et al. (1975), where asymptotic results are established. Using this idea, a location parameter $\delta$ (called the Schuster–Narvarte parameter) for any distribution $F$ (or sample) can be defined as a point $a$ which minimizes the quantity $\max_x |F(x) + F((2a - x)-) - 1|$. It is worthwhile to note that, when $\mu$ is the Kolmogorov distance, the resulting location parameter of the semiparametric MD functional coincides with the Schuster–Narvarte parameter (e.g., as an implication of Donoho and Liu, 1988, Lemma 5.2). There are other known parameters defined in terms of the location component of the semiparametric MD functional under other distances (or discrepancies). For example, Hodges' location parameter (Hodges, 1967) for a distribution $F$ is the semiparametric MD functional under the Wasserstein distance, that is, for c.d.f.s $F$ and $Q$, $w(F, Q) = \int |F(x) - Q(x)|\,dx$; and the Hodges–Lehmann location parameter (Hodges and Lehmann, 1963) is that of the semiparametric MD functional under the discrepancy $\mu_2(F, Q) = \int (F(x) - Q(x))^2\,dx$. For the above-mentioned parameters and the corresponding quantities $\inf_{Q \in \Gamma} d(Q, F)$, which are measures of symmetry, extensive studies have been conducted concerning the asymptotic properties (e.g., see Arcones and Giné, 1991; Rao et al., 1975) and the validity of the bootstrap (Schuster and Barker (1987) for simulation results, Arcones and Giné (1991) for the proofs). The robustness aspect is not mentioned in the literature.

The initial objective of this research is to establish robustness results for the Schuster–Narvarte estimator; after noticing the paper of Donoho and Liu (1988), we also include some results for other cases. In the next section, we first prove that $\frac{1}{3}$ is a sharp general upper bound for the finite sample breakdown point of the location parameter $T$ defined with respect to the Kolmogorov, the Kuiper and the variation distances, respectively. A sharp lower bound for the finite sample breakdown point of the Schuster–Narvarte estimator is established, which implies that, when the underlying distribution is symmetric, the almost sure limit of the finite sample breakdown point is $\frac{1}{3}$. The finite sample breakdown points of the Hodges and Hodges–Lehmann estimators are also given, with proofs omitted.
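The definition of the Schuster–Narvarte estimator can be made concrete with a small brute-force sketch. It is not the algorithm of Schuster and Narvarte (1973); it only evaluates the criterion $\max_x |F_n(x) + F_n((2a - x)-) - 1|$ at candidate centers and returns the midpoint of the minimizers found. The function names `sn_criterion` and `sn_estimate` are hypothetical, and the choice of candidates (pairwise averages of the sample plus one point between consecutive averages) is an assumption that relies on the criterion changing only when $a$ crosses such an average.

```python
from bisect import bisect_right
from itertools import combinations_with_replacement


def sn_criterion(sample, a):
    """max_x |F_n(x) + F_n((2a - x)-) - 1| for the empirical c.d.f. F_n."""
    x = sorted(float(v) for v in sample)
    # F_n((2a - x)-) = 1 - G_n(x), where G_n is the empirical c.d.f. of the
    # reflected sample 2a - X_i, so the criterion equals max_x |F_n(x) - G_n(x)|.
    reflected = sorted(2.0 * a - v for v in x)
    worst = 0
    # Both step functions jump only at sample points or reflected points.
    for p in x + reflected:
        diff = bisect_right(x, p) - bisect_right(reflected, p)
        worst = max(worst, abs(diff))
    return worst / len(x)


def sn_estimate(sample):
    """Brute-force search; returns the midpoint of the minimizers found."""
    x = sorted(float(v) for v in sample)
    # Assumption: the criterion can only change when a passes (X_i + X_j) / 2.
    walsh = sorted({(u + v) / 2.0 for u, v in combinations_with_replacement(x, 2)})
    # Also probe one point inside each interval between consecutive averages.
    between = [(lo + hi) / 2.0 for lo, hi in zip(walsh, walsh[1:])]
    candidates = walsh + between
    values = [sn_criterion(x, a) for a in candidates]
    best = min(values)
    minimizers = [a for a, v in zip(candidates, values) if v == best]
    # Center of the convex hull of the minimizing candidates.
    return (min(minimizers) + max(minimizers)) / 2.0
```

For a sample that is exactly symmetric, such as `[0, 1, 2, 3]`, the criterion vanishes at the center of symmetry and `sn_estimate` returns that center. The search costs roughly $n^2$ criterion evaluations and depends on exact floating-point ties, so it is meant for illustration on small samples only.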

2. Definitions and results

A popular measure of robustness is the finite sample breakdown point, which itself has two versions. We follow the enlargement sample version of Donoho and Gasko (1992), which is the smallest proportion of bad observations that must be added to the sample to totally upset the estimator. More precisely, for an estimator $T$, let $X^{(n)} = \{X_1, X_2, \ldots, X_n\}$ be an ordered sample of size $n$ and let $Y^{(m)}$ be a set of $m$ arbitrarily chosen (bad) points on the line. The finite sample breakdown point is defined as

$$\varepsilon^* = \inf_m \left\{ \frac{m}{m+n} : \sup_{Y^{(m)}} \left| T(X^{(n)} \cup Y^{(m)}) - T(X^{(n)}) \right| = \infty \right\}.$$

For any probability distribution function $F$, let $F^t$ denote the c.d.f. symmetric about $t \in \mathbb{R}$ such that $F^t(x) = \frac{1}{2}[F(x) - F((2t - x)-) + 1]$. Let $d$ be the Kolmogorov, the Kuiper or the total variation distance. Lemma 5.2 of Donoho and Liu (1988) proved that $\inf_{t \in \mathbb{R}} d(F, F^t) = \inf_{Q \in \Gamma} d(F, Q)$, where $\Gamma$ is the set of all symmetric distributions. Therefore, the location parameter $T$ resulting from the MD functional can be equivalently defined as $T = \arg\inf_{t \in \mathbb{R}} d(F, F^t)$.
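For the Kolmogorov distance, the link between this definition of $T$ and the Schuster–Narvarte criterion follows directly from the formula for $F^t$. The short derivation below is added for the reader's convenience and uses only the definitions above, with $d$ taken to be the Kolmogorov distance.

```latex
\begin{aligned}
F(x) - F^t(x)
  &= F(x) - \tfrac{1}{2}\bigl[F(x) - F((2t - x)-) + 1\bigr]
   = \tfrac{1}{2}\bigl[F(x) + F((2t - x)-) - 1\bigr], \\
d(F, F^t)
  &= \sup_x \bigl|F(x) - F^t(x)\bigr|
   = \tfrac{1}{2}\,\sup_x \bigl|F(x) + F((2t - x)-) - 1\bigr| .
\end{aligned}
```

Minimizing $d(F, F^t)$ over $t$ is therefore the same as minimizing the Schuster–Narvarte quantity, which is why the two location parameters coincide in the Kolmogorov case.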

Proposition 1. For the location parameters resulting from the semiparametric MD functional with respect to the Kolmogorov, the Kuiper and the total variation distances, $\frac{1}{3}$ is an upper bound for the respective finite sample breakdown point $\varepsilon^*$.

Proof. Let $m$ be the number of contaminating points such that $m > n/2$; pick a large $w$ so that $w > X_n$, and let $Y^{(m)} = \{w\}$ consist of $m$ bad points coinciding at the same site $w$. Let $u$ be the median when $n$ is odd and $u = X_{n/2}$ when $n$ is even, and denote $y = (w + X_1)/2 > \max_{X_i \in X^{(n)}} X_i$ and $z = (w + u)/2$. Let $d$ be any of the three distances mentioned in the proposition. For any $x < y$, it is routine to show that $d(F_{X^{(n)} \cup Y^{(m)}}, F^x_{X^{(n)} \cup Y^{(m)}}) \ge m/(2(m + n))$. On the other hand, by direct computation,

$$d(F_{X^{(n)} \cup Y^{(m)}}, F^z_{X^{(n)} \cup Y^{(m)}}) = \max\left\{ \frac{n - m}{2(m + n)}, \frac{m - i}{2(m + n)}, \frac{n/2}{2(m + n)} \right\},$$

where $i \ge 1$ is the number of observations coinciding at $u$. Since

$$\max\left\{ \frac{n - m}{2(m + n)}, \frac{m - i}{2(m + n)}, \frac{n/2}{2(m + n)} \right\} \le \frac{m - 1}{2(m + n)},$$

we have $d(F_{X^{(n)} \cup Y^{(m)}}, F^z_{X^{(n)} \cup Y^{(m)}}) \le d(F_{X^{(n)} \cup Y^{(m)}}, F^x_{X^{(n)} \cup Y^{(m)}})$, and therefore $T \ge y$. Since $w$ can be made arbitrarily large, $y$ can be arbitrarily large; hence the estimator can be broken down. Therefore $\varepsilon^* \le \frac{1}{3}$.
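The comparison at the heart of this proof can be checked numerically on a toy example. The sketch below is only an illustration for the Kolmogorov distance: the clean sample, the values of $m$ and $w$, and the helper name `kolmogorov_to_symmetrized` are all hypothetical choices, and the helper uses the fact that $d(F, F^t) = \frac{1}{2}\max_x |F(x) + F((2t - x)-) - 1|$ for an empirical c.d.f. $F$.

```python
from bisect import bisect_right


def kolmogorov_to_symmetrized(sample, t):
    """d(F, F^t) = (1/2) max_x |F(x) + F((2t - x)-) - 1| for the empirical c.d.f. F."""
    x = sorted(float(v) for v in sample)
    # F((2t - x)-) = 1 - G(x), with G the empirical c.d.f. of the reflected sample.
    reflected = sorted(2.0 * t - v for v in x)
    worst = max(abs(bisect_right(x, p) - bisect_right(reflected, p))
                for p in x + reflected)
    return worst / (2.0 * len(x))


clean = [0.0, 1.0, 2.0, 3.0]        # hypothetical clean sample, n = 4
n, m, w = len(clean), 3, 100.0      # m > n/2 bad points, all placed at w
contaminated = clean + [w] * m

u = clean[n // 2 - 1]               # u = X_{n/2} for even n
z = (w + u) / 2.0                   # far-away candidate center used in the proof
floor_inside = m / (2.0 * (m + n))  # lower bound stated for centers x < y

d_at_z = kolmogorov_to_symmetrized(contaminated, z)
# Distances for a grid of centers inside the convex hull of the clean sample.
d_inside = [kolmogorov_to_symmetrized(contaminated, c / 4.0) for c in range(13)]

print(d_at_z, floor_inside, min(d_inside))
```

With these inputs the distance at $z$ falls below `floor_inside`, while every center in the hull of the clean data stays at or above it, so the minimizer is dragged away from the clean sample as $w$ grows.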

For the Schuster–Narvarte parameter, we have the following further results on the finite sample breakdown point.

Proposition 2. $[1 - 2\min_x d(F_n, F_n^x)]/[3 - 2\min_x d(F_n, F_n^x)] \le \varepsilon^* \le \frac{1}{3}$. In particular, if the underlying distribution is symmetric, then as $n$ goes to infinity, the limit of the finite sample breakdown point is almost surely $\frac{1}{3}$.

Proof. The upper bound was established in the previous proposition. For the lower bound, let $X^{(n)}$ be a sample of size $n$, let $Y^{(m)}$ be a set of $m$ bad points and let $F_{X^{(n)} \cup Y^{(m)}}$ be the empirical distribution based on $X^{(n)} \cup Y^{(m)}$. If $m < n$, then for any point $y \notin \mathrm{Co}\,X^{(n)}$, the convex hull of $X^{(n)}$, by the definitions and a simple computation we have $d(F_{X^{(n)} \cup Y^{(m)}}, F^y_{X^{(n)} \cup Y^{(m)}}) \ge \frac{1}{2}(n - m)/(n + m)$. On the other hand, letting $T = \arg\inf_{t \in \mathbb{R}} d(F_n, F_n^t)$, one gets $d(F_{X^{(n)} \cup Y^{(m)}}, F^T_{X^{(n)} \cup Y^{(m)}}) \le \frac{1}{2}[m + 2n\,d(F_n, F_n^T)]/(n + m)$. If $(n - m)/(n + m) > [m + 2n\,d(F_n, F_n^T)]/(n + m)$, or equivalently $m < n(1 - 2d(F_n, F_n^T))/2$, we have $d(F_{X^{(n)} \cup Y^{(m)}}, F^y_{X^{(n)} \cup Y^{(m)}}) > d(F_{X^{(n)} \cup Y^{(m)}}, F^T_{X^{(n)} \cup Y^{(m)}})$ for any $y \notin \mathrm{Co}\,X^{(n)}$, which implies that $T(F_{X^{(n)} \cup Y^{(m)}}) \in \mathrm{Co}\,X^{(n)}$; hence the estimator will not be broken down. Therefore $\varepsilon^* \ge [1 - 2d(F_n, F_n^T)]/[3 - 2d(F_n, F_n^T)]$.

Moreover, if $F$ is a symmetric distribution, we have $\min_t d(F, F^t) = 0$. Since $d(F_n, F_n^T) \to \min_t d(F, F^t)$ a.s. as $n \to \infty$ (e.g., Arcones and Giné, 1991), taking the limit in the above bounds gives $\lim_{n \to \infty} \varepsilon^* = \frac{1}{3}$ a.s.
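The passage from the condition on $m$ to the lower bound compresses a little algebra. Spelled out, with $d_n$ as a shorthand (introduced here only for brevity) for $d(F_n, F_n^T)$:

```latex
\begin{aligned}
\frac{n - m}{n + m} > \frac{m + 2n\,d_n}{n + m}
  &\iff n - m > m + 2n\,d_n
   \iff m < \frac{n(1 - 2d_n)}{2}, \\
\varepsilon^* \ge \left.\frac{m}{m + n}\right|_{m = n(1 - 2d_n)/2}
  &= \frac{n(1 - 2d_n)/2}{n(1 - 2d_n)/2 + n}
   = \frac{1 - 2d_n}{3 - 2d_n}.
\end{aligned}
```

The second line uses that breakdown requires $m \ge n(1 - 2d_n)/2$ and that $m/(m + n)$ is increasing in $m$.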

Remark 1. A finite sample breakdown point of $\frac{1}{3}$ means that the bad observations must make up at least $\frac{1}{3}$ of the enlarged sample (that is, $m \ge n/2$) to totally upset the estimator. For practical purposes, a breakdown point of 10% or higher is good; therefore the Schuster–Narvarte estimator has an excellent finite sample breakdown point.
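To get a feel for these numbers, the lower bound of Proposition 2 can be evaluated directly. The sketch below is a plain arithmetic helper under the assumption that the minimal distance $d_n = \min_x d(F_n, F_n^x)$ has already been computed by some other means; the function names are hypothetical.

```python
import math


def breakdown_lower_bound(d_n):
    """Lower bound (1 - 2 d_n) / (3 - 2 d_n) from Proposition 2."""
    if not 0.0 <= d_n <= 0.5:
        raise ValueError("d_n is expected to lie in [0, 1/2]")
    return (1.0 - 2.0 * d_n) / (3.0 - 2.0 * d_n)


def max_tolerated_bad_points(n, d_n):
    """Largest integer m with m < n (1 - 2 d_n) / 2 (never below 0).

    For such m the proof of Proposition 2 keeps the estimate inside
    the convex hull of the clean sample.
    """
    threshold = n * (1.0 - 2.0 * d_n) / 2.0
    return max(math.ceil(threshold) - 1, 0)


print(breakdown_lower_bound(0.0))          # exactly symmetric sample
print(max_tolerated_bad_points(10, 0.25))  # noticeably asymmetric sample
```

The bound decreases as the sample becomes less symmetric: it equals $\frac{1}{3}$ when $d_n = 0$ and drops to $0$ when $d_n = \frac{1}{2}$.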


Remark 2. Using arguments similar to the above, we can prove that the limits of the finite sample breakdown points for the Hodges and Hodges–Lehmann estimators are $\frac{1}{4}$ and $1 - \sqrt{2}/2$, respectively. The proofs are omitted.

Remark 3. More theoretical results on functional breakdown points using contamination neighborhoods and certain true neighborhoods are available from the author; for example, when the distribution is symmetric, $\frac{1}{3}$ is also the contamination breakdown point for the Schuster–Narvarte estimator.

Acknowledgements

The author would like to thank Professor Evarist Giné for his constant guidance and encouragement and for sending references.

References

Arcones, M., Giné, E., 1991. Some bootstrap tests of symmetry for univariate continuous distributions. Ann. Statist. 19, 1496–1511.

Donoho, D.L., Gasko, M., 1992. Breakdown properties of location estimators based on half space depth and projected outlyingness. Ann. Statist. 20, 1803–1827.

Donoho, D.L., Liu, R.C., 1988. The "automatic" robustness of minimum distance functionals. Ann. Statist. 16, 552–586.

Hodges Jr., J.L., 1967. Efficiency in normal samples and tolerance of extreme values for some estimates of location. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. University of California Press, Berkeley, pp. 163–186.

Hodges Jr., J.L., Lehmann, E., 1963. Estimates of location based on rank tests. Ann. Math. Statist. 34, 598–611.

Rao, P.V., Schuster, E.F., Littell, R.C., 1975. Estimation of shift and center of symmetry based on Kolmogorov-Smirnov statistics. Ann. Statist. 3, 862–873.

Schuster, E.F., Narvarte, J.A., 1973. A new nonparametric estimator of the center of a symmetric distribution. Ann. Statist. 1, 1096–1104.

Schuster, E.F., Barker, R.C., 1987. Using the bootstrap in testing symmetry versus asymmetry. Comm. Statist. Simulation Comput. 16, 69–84.