for generalized hill dept of statistics l a stefanski … · leonard a. stefanaki department of...

24
RD-RI69 348 OPTIMALLY BOUNDED SCORE FUNCTIONS FOR GENERALIZED LSi- LINEAR MODELS WITH APPL..(U) NORTH CAROLINA UNIV AT CHAPEL HILL DEPT OF STATISTICS L A STEFANSKI ET AL. UNCLASSIFIED APR 85 RFOSR-TR-85-866 F49620-82-C-8869 F/G 12/1 N mE0 0hhhon oEEo so ___ol

Upload: others

Post on 12-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

RD-RI69 348 OPTIMALLY BOUNDED SCORE FUNCTIONS FOR GENERALIZED LSi-LINEAR MODELS WITH APPL..(U) NORTH CAROLINA UNIV ATCHAPEL HILL DEPT OF STATISTICS L A STEFANSKI ET AL.

UNCLASSIFIED APR 85 RFOSR-TR-85-866 F49620-82-C-8869 F/G 12/1 N

mE0 0hhhon oEEo

so ___ol

Page 2: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

-..~Ag IIIg.=gII

* 125 1 1.4 16

kW

MICROCOPY RESOLUTION TEST CHART

NiATIONAL. "EAU OF STAMADS IS3-A

t-o

%-%

--

,.

%-%. l* ••! • e• - i... ' . ' " " . . - ,; .,N.":""" "'. "\:" .'''' --., ".."..'."." ,,,-. '..-_',.--,_

Page 3: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

NONCTASSIFIEDSECURITY CL.ASIXICA VIOO

AD-A 160 348 TION PAGE

Nonclassified C ITIUINAALSLT FRPR

* 21L SECURITY CLASSIFICAION4 AUTWORITY .DSRBUIWV LIITOFEPT

ft DGCLASSIFICATONJDOGRADING &C40L Approved for public releasS;idistriblitlon ulited.

C PERFORMING ORGANIZATION REPORT NUMBEREIIS) L MONITORING ORGANIZATION REPORT NUME11R11%

____ ____ ____ ___ ___ imOSR-TR 850866*NAME OF PERFORMING0ORGANIZATION ELOFFIE SYMBOL. 7&. NAME OF MONITORING ORGANIZATION

Univ. of North Carolina JAir Force Office of Scientific ResearchS&. ADORESS (City. SkMM end ZjP Codes- 7b. ADDRESS (City. Sisif and ZIP Cafts

Department of StatisticsPhillips Hall, 039ACarolina Canvus_______

SOL NAME OF PUNOING1SPONSORINO Eb OFFICE SYMBOL 9. PRO41CUREMENT I "MENTy IENTI ICATIO NUMBERORGANIZATION a i ppilicblel

* AFOSR Z 462 3 009* as A00RE38 (City. Stolenad ZIP Code) IC. SOURCE OF FUNDING NOB. ___________

Boiling Air Force Base PROGRAM PROJECT TASK " OREK UNIT

Washington, DC 20332 ELEMENT NO. t o. NO. NO.61102F I 2304 I A5

"Uptiinlly Bounid Score Panctions' for Gm. Aized Linear Ais witk[ Applications to* 12. PERSONAL AUTHORM ES)TT W~5 1M7M35resIXV

Stefanski, Leonard A., Carroll, Raymnd J. mnd Ruppert, David134L TYPO Of REPORT 13b. TIM* COVERED 14. DATE OF REPORT (Y.. A#& Day) IS. PAGE COUNT

Itechnical IFROM 1IL...4 . TO 9/95L April 1985 1810. SUPPLEMENTARY NOTATION

* I. COSATI COOlS IL SUBJECT TERMS (Continue on reverse it nceseen and identify by block mnmber)

F144LO GROUP SU.O. Bounded influence, Generalized linear models, Influential::T--points, Logistic regression, Ou~tliers, Robustness.

* 10. ASTRACT (Continue on weuwg if nwee- and identify by block numborl

* ~ ~ ~ ~ -l -e td ~t~ bounde score functions for estimating regression parameters in a* generalizeud-lpieamdl. ar orexedreutobandb rsr Wech(92

fonralthe linear iiel n proide aesxped prooflt otid Krasker an Welsch irstordecdinfor stnermoeng proiy.Th aplicatonf of thse resulhs loist orerso

is studied in some detail with an example gi en comparing the bounded influence estimatorwith maximmn likelihood. 7A 7'-

FILEGORYELECTEUTIC FL COYOCT 15 1

20. CIS7ILTION/AVAJL.AILI-Y :)F ABSTRACT 21. ABSTRACT SECURITY C.ASSiFICAT

22&. NAM4E OF RESPONSIBLI INOIVIOLJAL UES 22. TELIEPHONE6 NUMIBER 22C. OFFICE SYMBOL

* Bran W Wodruf, MJ, UAF nciudo Auwe Code,00 r FOR 143.3 Woorf 1010 0USAJA (202) 767-5026 AFOSR/NM

00FOM143,3AMGOTONOFI Is OSSOLETE. N4CIASSIFIE)SECURITY CLASSIF ICATION OF TMIlS PAGE

% *1 .* *

Page 4: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

-POSR.TR- 85-0866

OPTIMALL B3I1U. SCME FU3CIMS

FM QU0M1ZM LINE NL1-'

KIT A??LICAMMS TO LOGISTIC

IMUSIM

by

Leonard A. StefanakiDepartment of Economic and Social Statistics

Cornell UniversityIthaca, New York 15853

Raymond J. CarrollDepartment of Statistics-

University of North CarolinaChapel Hill, North Carolina 27514

David RuppertDepartment of Statistics

University of North Carolina

Chapel Hill, North Carolina 27514

Accesslon For

'" NTIS GRA&IDTIC TAB ..

Unannounced" (JJustification

:o By ..

Distribution/

Availability CodesAvail and/or

Dist Special

85 10 11 131

Page 5: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

4 SUDUSAY

We study optimally bounded score. functions. for estimating regression

parameters in a generalized linear model. Our work extends results

* obtained by Krasker & Wtelsch (1982) for the linear model and provides. a

simple proof of Krasker and Welach's first order condition for strong

optimality. The application of these results to logistic regression is.

studied in some detail vith an example given comparing the bounded in-

fluence estimator with maximum likelihood.

Some key wards: Bounded influence; Generalized linear models- Influential

points; Logistic regression; Outliers; Robustness.

!,I

%

io

T ' " v ...

: :z:;.. ; ;: '.:;_ - , .. 2::.-,..:,., ,: .. :..:....,:..,..:..-... . . ............ , ,..... ... .. . . . .

Page 6: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

In a generalized linear model (KcCullagh & Nelder, 1983, Ch. 2) a

response variable T and covariate vector X are related via a conditional

density of the form

f(ylx) - exp{(y-h(x T))q(x T)%(x)/q + C(yu)

The functions h(.) and q(-) are subject to certain restrictions, * is a

-vector of regression parameters, a is a scale parameter and u(x) is a known

weight function. In this paper we study the problem of robustly estimating

a when m(.) a 1 and a ia known. Models of this type include logistic and

prohit regresaon, Poisson regression, and certain models. used in modeling

lifetime data. In the case where w(-) a I but a is unknown the methods

presented in Section 2 are still applicahle with some modification to allow

for joint estimation of a, c.f. Kraaker & Welsch (1982).

Our motivation for seeking robust estimator& is the same as that

encountered in the context of linear model -- maximum likelihood estimation

is very sensitive to outlying data. For the case of logistic regression,

Pregibon (1981, 1982) has documented the nonrobustness of the maximum

likelihood estimator and expounded the benefits. of diagnostics. as well as

robust or resistant fitting procedures.

Much of the work on robust estimation concerns finding estimators

which sacrifice little efficiency at the assumed model while providing

protection against outliers and model violations. We follow this course

finding hounded influence estimators minimizing certain functionals of the

asymptotic covariance matrix. Related work includes. that of Hampel (1918),

Krasker (1980), and Krasker & Welsch (1982).

Two important issues when fitting models to data are (i) identi-

fication of outliers and influential cases and (ii) accommodation of these

%.5. o .% . -o ,

%. ' ' . . .

Page 7: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

observations. Frequently when influential cases are present, the fitted

model is not representative of the bulk of the data. To rectify this, one

can simply delete influential cases and refit via standard methods, but

this approach lacks a theory for inference and testing; the effects. of case

"" deletion upon the distributions of estimators is not well understood, even

asymptotically.

The robus.t techniques studied here provide a method of accommodating

* anomalous data. They allow continuous downweighting of influential cases

and are amenable to asymptotic inference. Also, together with more direct

diagnostics, residuals and weights from a bounded influence fit can be used

* to detect exceptional observations..

In Section 2 we present some general theory; this is specialized to

the case of logistic regression in Section 3; proofs of theorems are given

in an appendix.

". HE EEAL TY

"... 2.1 The regzression aodel

We study regression models in which the dependent variable Y and

*'*. explanatory p-vector X have a density of the form

g(y.x;eo ) * f(y;x T )s(x). (2.1)

The conditionpl density of Y given X-x is f(ytx T ) and depends on the

Tunknown parameter 0 only through x ; s(x) is the marginal density of X.

a

Expectation with respect to g(yx;e) is denoted by E0 while E9,x indicates

conditional expectation corresponding to f(y;x Te). Model (2.1) includes

many generalized linear models (McCullagh & Nelder, 1983, Ch. 2).

Suppose (YtX,)x (i-1,....,n) are independent copies of (Y,X). Under

regularity conditions the maximum likelihood estimator of 0 satisfies

n

V 1 (YXi,0) " 0,i-i

b," J -" . ' .. , ' .. " 4 " ". , ,r,." ,- ,, , - - . . . . . . .

,:, - ,. --%, ,: ' ~ ~~. * .2;;:,- :_ ,'.-:' , ,9 \,;,,.'" ..... *.-.., . . .. •..'..-......-.......

Page 8: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

-3-

where A(y.X.G) - 8IaO@l*o(f(y;z T)) , and n(; - 0 ) converges in distri-

4 bution to a p-dimensional normal random variate with mean zero and co-

variance matrix V(S) - E 1 T)

* 2.2 N-esrimatars and thetr influence curves

We will consider estimators 0 satisfying

nI vyivxis 0i-I Ci

P P Pfor suitably chosen functions * from R x R x R to R . We require that

be unbiased, i.e.,

Eej1,(Y,X,6)} - 0 (2.2)

Under regularity conditions (Huber, 1967), 0 is consistent and asymp-

totically normal with influence curve

IC (y,x,O) - D7l(o)*(y,x,B) (2.3)

where D (6) - -8/80[E fI(YXM (2.4)

Write *(G) and X(0) for *(Y,X,Q) and A(Y,X.0) reapectively. Assuming

that integration and differentation can be interchanged in (2.2) and (2.4)

it is easy to show that

TD (0) - E {*(e)T(o)}. (2.5)

* Nov let

W (0) - E (e)* T(01) (2.6)

It then follows (Huber, 1967) that the asymptotic variance of n (0 - 0* 0

is

V ) D (0 )W (9

1*6 For robustness we want IC to be bounded; for efficiency we want V to

be small. In the next section we define a norm for IC and outline a

I

-" =,V V Va- .- l,* . ,.-* .. . . ,4 . :.,.._ .-?- . .-:./ ?..,.,...

Page 9: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

theory vhich suggests efficient bounded scare functions.-D

S23 A scalaer Aasure of Influenc and an optxal acore fuactrfa

As a scalar measure of maximum influence we employ a definition of

. sensitivity introduced by Stahel in his. Swiss Federal Institute of Tech-

nology Ph.D. thesis, and by Krasker & Welach (1982). The oelf stsadardlzed

sensitivity of the estimator e is defined as

T

a(*) sup sup I - sup- (lC Tv; 1C.)T( I -

sup (*wI)) (2.7)(y,x)

For a generalized linear model s(*) has- a natural interpretation in terms

of the link function, e.g., in logistic regression &(W) measures the

maximum normalized influence of (y,x) on an estimated logit in that X IC

T- Tis the influence curve for X, 0 and A, V X is the asymptotic variance

of I, e Although this paper studies only the self standardized

sensitivity we believe that useful estimators can also be obtained by

bounding other measures of influence, such as. fitted values.

For maximum likelihood V = 3 and, in general, a(L) - + * * To obtain

robustness we limit attention to only those estimators 0 for which

a(S) < b < 0 (2.8)

Such an estimator is said to have bounded influence with hound b.

Consider the score function

El (yX,)- ( -c)minf{1, b2I((-C) TB-I(A-C))} , (2.9)

where A-=(yx,e) and C - C(O) and B - B(O) are functions of 0 definedpxl pxp

implicitly by the equations

"_-.' ," .. . '. .;,, ,' . " ".' , '- -' ' ". . . ..'- . " . . . • - ." -** ~ . . . .. . * . * -

Page 10: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

~-5-

Ee~*5 i(y,x6e} - 0, () - E, 51 , 1 1. (2.10)

With C(S) and B(S) as defined, Bis unbiased and V (0) - B(e), so

that by (2.7), *B1 has bounded sensitivity.

The vector C(O) and matrix B(S) are analogous to robust multivariate

location and scatter functional* for A(XX,S), (Maronna, 1976). For

sufficiently large b solutions C(S) and-B(s) satisfying (2.10) exist, and

Tas b tends to infinity these tend to zero and E{A ) respectively. Equation

* (2.9) shows that is similar to a weighted maximum likelihood score vith

r~ -1weights depending on the distance (A-C) B (A-C); as b tends to ii-finity

the weighting factor tends. to one and to L.

For the normal theory linear model 0 I Is the score function found by

Krasker & Welsch (1982), who show that if there exists a score *pt

satisfying (2.2) and (2.8) which minimizes V in the strong sense of

positive definiteness, i.e., V - V * 0 for all *, then it must be of

the form (2.9). That oBI possesses slilar optimality properties is seen

"* in Corollary 1.1 below.

THEO 1. f for a given choice of b > 0 equations (2.10) possess the

solution (C(O), B(S)). then Ainislizes tzrV V;) among all

.. sacisfyitg (2.2) and

T -1sup (IC V IC) < bl . (2.11)

(y'x) *Eh'ith the exception of multiplication by a constant matrix, 10 is unique

alaost surely.

Any score function op for which V -V a 0 for all w will bept opt

called strongly efficient; we now state the following corollary.

... . . . . . . ..

Page 11: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

COMAAr 1.1. MI there exists an unbiased, stronSly e.ficlnt sco pt

satisfying (2.8), then ispt 10 equivalent to 'BI whenever the latter is

defined.

* Rearks 1. In Theorem 1 the conditions for optimality of *BI depend on BI:i tsel thrugh-1

itself through VI. This is somewhat-disconcerting. Nevertheless *

does satisfy an optimality property and this result allows us to prove

Corollary 1.1.

* 2. Working within the class. of score functions of the form L(y,x,O)"(yx,$)

where m is a scalar weight function, Krasker & Welsch (1982) find the

optimal form of 4. Theorem 1 and its corollary show that *B1 is optimal

over a such larger class of functions and hence yield a technically

stronger result than Krasker and Welech's. Also our proof is. somewhat

simpler than Krasker and Welsch's.

3. Ruppert (1985) has shown that a strongly efficient score need not

exist, in which case Corollary 1.1 is vacuous. In fact, we know of no case

with p a 2 where a strongly efficient score has been shown to exist.

However, the result given in Corollary 1.1 is still of interest; Ruppert

(1985) uses it in his counter example.

4. The proofs of Theorem I and its corollary are presented in the appendix.

2.4 A one-step estizator

Write * BI " VuI(y 'xteBC) to indicate dependence on B and C. Theorem

1 suggests the estimator aBi obtained by solving

n a a

i-B

where 1 - Bi(YiVXi1OB(6),C(O))

and C(O) and B(O) are defined implicitly by the equations

%

Page 12: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

-7-

n ,,Z K y (i,4(S)} - 0 (2.12)

B(G) I E 6, it (4 ()* 1(8)) 'T (2.13)

i-1 i-i " £

In the linear model (Krasker & Welsch, 1982) symmetry implies C(s) a 0

so that finding 60 is greatly simplified. For non-linear models solving

for i is much more difficult, so we suggest the following one-step

procedure. Let 9 be an initial root-n consistent estimator of 00

Compute B(O) and C(O) iteratively from (2.12) and (2.13). Define1n -1(

^(1) n- 11 Di-

I n T

where D(e) - I E it (0)'T (Y,Xi, e)fi-I el,xi

This construction is similar to Bickel's (1975) Type II one-step

procedure. Under regularity conditions 0BI is consistent and asymp-

totically normal with covariance matrix V (a ) D (Q )B(6 )(D_ -1 ))T,B1 o B o o DI1 o

which is consistently estimated by V a D (0) (0)(D-()) T. To pre'erve

finite' sample robustness we suggest that S also be resistant to outliers.

3. APPLICATION TO LOGISTIC RMESSIO

3.1 The logistic model

Logistic regression is a special case of model (2.1) in which Y is an

", indicator variable such that

TP(T-1JX-x) - F(x e ), F(t) - 1I/(l+exp(-t)).0

The general applicability of this form of binary regression is discussed by

Berkson (1951), Cox (1970), and Efrn (1975). The likelihood score is

T- (y-F(x))x and the maximum likelihood estimator is consistent

:.YXO (-( )) n h

"%

Page 13: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

- and an asymptotically normal with covartance matrix V(o )

~F 'I(XI 0 ))OJI where F(l (t) - (dtdt)F(t).

3.2 Constructing the one-step e"CYmAtor for the lof1stYc odel.

The irstste inompuing( 1)T si entails finding an easily computed.The irstste in o~utng BI

robust, root-n consistent estimator S. Me find an optimal score function

. from among the class,

TN- i:*(y,x,O) - (y-F(x O))W(x,G)}

where (.,) is. a p-vector valued function of x and 0 but not y. The

advantage, in terms of computational simplicity, of restricting attention

to score functions. in Nis that condition (2.2) is automatically satisfied

and it is not necessary to estimate a robust location functional.

The estimator we propose, and call a bounded leverage estimator,

corresponds to the score

'BL= (y - F(xT0))x minf41,b2/(mZ(x T)xTQ1(0)x)),

where Q. U Q() is an implicitly defined function of 0 satisfying

Q(G) - E iF( )(XT a n'n,b1/(m(XTB)XTQ X)HI, (2.14)0A

and m(-) is the function m(t) - max(F(t), l-F(t)). In L. A. Stefanski's

University of North Carolina Ph.D. thesis it is shown that in order for

(2.14) to possess a solution Q > 0, it is necessary that

b2 > p / E a F (1?XTa)ImZ(XTO)). (2.15)

Condition (2.15) is generally not sufficient however. Note that with Q

satisfying (2.14), W .9 and by (2.7), O8 has bounded influence.

We are able to restrict attention to only those * in X and still

Tobtain bounded influence simply because the absolute residual ly -F(x $)I

is bounded. However, takes a pessimistic view in downweighting

observations in accordance with their maximum potential influence deter-

mined by their position in the design space and by 9. The term leverage

is often used to denote potential influence (Cook & Weisburg. 1983) and

-p. . :-: , ,,, ,: .< : ., .. .. .. . .. .-:....... ... .. .. ..- .. . . - .. ._ ,-V,,, , m ,i ~ • .- . , . :.:....:.-..."..-. . ".,. ... ...-

Page 14: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

' -9-

hence the name bounded leverage. Potential influence is often far greater

than the actual influence when the observation is veil fit by the model.

Although downveighting such points results in a loss of efficiency for BL

this will not affect the efficiency of our one-step estimator. Also, as

the following results show, *BL is the most efficient score in X.

*TWIE 2. rf for a given choice of b > 0 equation (2.14) possesses the

solution Q > 0 then minimizes Cr(V V ) on all * in M satisfying

sup (IC TV- IC ) < b2

(yx) 10 BL 1-

With the exception of multiplication by a constant matrix, 4'EL Is unique

* almost surely.

CaX AR Y 2.1. If there exists a stro gly efficient score ept In N, then

*opt is equivalent to * L whenever the latter Is defined.

Remark 1. Proofs are similar to those of Theorem 1 and its corollary and

will not be given.

2. The extent to which Theorem 2 generalizes to other regression models

is limited, since it requires that A(yxO) be a bounded function o.f y.

Our initial estimator 0 is obtained by solving

nI (Y - (X ))X mini{12/(M2(XT- T--1 U-i(Y F(. Il. 6)Xp (8)2% )I

where Q(G) satisfies

-() T T IT T--Q(6) n F (X * )X X minil, b 1( 2 (O)X iQ W

.. ...... ....... .. . . . ....... ,_.;-'--.-, - . . . .-.-.-.... ,. . ..-..... ..:- -e.- .. _i_ _ _m -= .= = , , ,., _. , . , .,o,..,.... -, -.p

Page 15: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

For the one-step construction in Section 2.4 to work it is necessary that

be root-n consistent. In Stefanski's Ph.D. thesis it is shown that

nf(i-t ) is asymptotically normal with covariance matrix VBL(8 0

1D 1)Q(Bo)(D (0o)) provided:BL o o BLo*(i) b is sufficiently large,

(ii) EIIIXII 2) (0 -,

(1) T I -(iii) EIF (XTO)XXTjIjXjI -1 is positive definite,

(iv) (aIaQ)EJ(X9,.,Q)) is nonsingular where

J(Xe,Q) - Q - F(1)(XTO)XXTain{1,bz/(mt(XTo)XT Q X)j.

The key assumptions are (iii) and (iv) which are similar to Assumption 7 of

Krasker & Welach (1982).

As an estimate of VBL we use V D (0)Q)(D-(0)) where

n

(*) a n- I E X (Y1XiQ(6))A(¥X )- -i'X BL

An algorithm for computing a and BI for logistic regression models

appears in Stefanski's Ph.D. thesis. To fully specify the algorithm one

must deiermine the bound b. For 0 this was chosen as a constant multiple

of b(G) where

b (0) p t p [ -1 n TO(1) T /u(X~ q)j! F ( i

see (2.15). For the examples in the next section we took the bound to be

(1.5)b(i); this same bound was then used for the one-step estimator ,

The choice (1.5)b(O) was suggested by experience; it is sufficiently small

to provide protection from extreme observations yet large enough to avoid

computational problems.

h- , .. • *, - . -. ." *.".. " "... .. '...-.. . ",. --. -•- , .. .. ...

-- -- mi*i * pa l i n ' i " ' ,' . .. ,-, a ,- . . -'.' -.

Page 16: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

7 .- 11-

~33 .r'svo .e

-p We apply our results to data relating participation in the U.S. Food

Stamp Program to various socioeconomic indicators. The data. which are

available from the first author, were selected at random from a cohort of

over 2000 elderly citizens. The covariates are, (i) tenancy, indicating

home ownership; (i) supplemental income, indicating whether some form of

supplemental security income is. received; and (iii) monthly income. In our

sample of 150 there were 24 cases of participation.

The researcher who supplied these data had been using probit regres-

sion with monthly income entering linearly in the model. A fit of the

logistic model with covariates tenancy, supplemental income, and (monthly

income)110 produced Table 1(a).

Apart from the constant C, 0 is a weighted maximum likelihood eati-

mator with weights 40 a win lb2/((A i-C)TB - (A - C))) , where X i - A(Yi"Xi'e)'

see equation (2.9). Estimated weights, &i less than one indicate in-

.. fluential or ill-fitting obaervations. For the analysis in Table l(a) 4O -

0.69. Z6 - 0.4O, 9 0.98 and 1 " 0.62 were the only weighta leas

than one. Since these observations correspond to the four largest incomes

-. among those receiving foad stamps a transformation of income is indicated.4_

In Table t(b) we present the analysis with log(monthly income + 1)

replacing (monthly income)I10. This transformation substantially reduces

the leverage of large income values but increases the leverage of small

income values. For this model the bounded influence estimator downweighted

only two observations with Q5 a 0.21 and &66 - 0.76. Case 66 has the

largest income among those participating while case 5 has the smallest

income among those not participating. Apparently cases 05 and 066 are

influencing the maximum likelihood fit; this is indicated to a great extent

by the bounded influence analysis and even more so by the maximum likeli-

' '" , ..- .- ' .-°- . -.'.*** "*. ". --.. . .- .•. . .

~~~-p*~ ~ ~~PL %I*~'**~*******.

Page 17: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

* -12-

hood fit with the two outlying cases removed.

An advantage of robust methods. over maximum likelihood is that

residual plots are more reliable for uncovering outlier&. This is illus-

trated in Figure. 1. Residuals (Cox (1970), p. 96; Pregibon (1981)) are

*il plotted for both the maximum likelihood and bounded influence fits.

3.4 Cancjusiaons.

Our bounded influence procedure provides a method of fitting meaning-

ful models in the presence of anomalous data. Sine similar models can be

*. obtained by diagnosing and deleting outliers it is worth emphasizing that,

unlike the method of case deletion, robust methods are amenable to

asymptotic inference; this feature is important whenever hypothesis testing

or confidence region* are objectives.

Robust procedures also supply useful diagnostic tools for model

building. Variable selection, as well as estimation, can be influenced by

anomalous data; Pregibon (1982) cites such an example. Often robust methods

suggeativariables appropriate for modeling the bulk of the data vhich would

otherwise go undetected in a standard maximum likelihood analysis. Can-

versely, with non-resiatant fitting, a variable might be used in the model

simply to accommodate a single outlier. In addition to variable selection,

the weights and residuals from a robust fit provide useful supplement& to

more direct diagnostics. For example, with the food stamp data, an

analyst, seeing the impact of case five, might question the validity of

that observation or the appropriateness of the model aver the full range of

incomes.

The research of the first two authors was supported by the Air Force

Office of Scientific Research while that of the third author was supported

, . .- .. . - ... . . ...%. . .: .,: '' ', ? .< ,-, ... ... .... ,. pa . . . .

Page 18: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

~-13-

by the National Science Foundation. Ve also acknovledge the helpful

comenta of the referees.

Proofs ot Tbeorem I and Corolary I

Theorem 1 is a generalization of Appendix A in Hampel (1978) and the

proof given here uses techniques found in Krasker (1980).

Proof of rhorem 1. Let , be any c etitor to *zI Without loss of

generality assume that I - ICy, i.e. that * is in canonical form in the

sense of Hampel (1974). Thin is equivalent to assuming

E fi(Y.X,)AT (YI,xe)l - I *pp (A.1)and implies V (6) - E 1VY'X'el*TlYXe)I

Nov vrite A for X(Y,XO) and * for *(YX,O). If * satiefies. (A.1) and

(2.2) then

TE. {(D3 (L-C.) - *)( (.-c,) - =

D; 41(A'C)(J-c) )(D I TV .(e)M

* Therefore tr(V V ) is, neglecting an additive constant independent of

*, proportional to

(.( -) V 1 (.-C.) -1f. (A.2)

Define * * V ; in terms of , (A.2) becomes

* E,{I,-VDI.-B I (A.3)

Note that 1,I2 - T~* -1 and thus subject to (2.11), equation (A.3) is

minimized, as a function of #, by

T-;1-l -i1T(AC)VB= V I(A-C)min {l,bhz((A-C) DuVi(Dl) (3-c)).(A.4)

Condition (A.1) insures that # Is unique almost surely. Equations. (2.3),(1 -1- -IT -

(2.6), and (2.10) imply DI1Vai(D) - B thus in terms of *,

(A.4) becomes proving the theorem. /1

"k 5%%%5 ~ * ~~~,* * ;lV ; *i -' , .S ". ', .:&.. . -. . , . .-

Page 19: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

Prof of Corollazy 1.1% Again assume that all scores are in canonical form

and satisfy (2.2). Define

S. 4: sp * -lTV-1S &p *V 4 ' S bt up£j1.fI bsu

(y,x) (yx)

We must show that if there exists *4pt in S such that V *t V for all' -1 1 o°pt

in S. then Vapt is equivalent to DBIIBI. Clearly D1 I4* 1 is in S.

thus by assumption V S VI Prom this it fallov that

T

. Ygopt

t T S-1 T -1opt UIopt opt optapt

and hence *opt is. in Sm. Let I S ft $la. The set Iis nonempty; it

-1contains D3 1*S1 and Opt . For any * in I we know V*opt S V and hence

tr(V V-1 ) S tr(V -1 )*Opt B V -1for all * in I. But Theorem I proves that Dj*,,. when defined, is the

almost everywhere unique miniamter of tr(V )I a'!,,ng all * in J'. Tentf

equivalence of 4op and 4'~ follows. II

op I

Page 20: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

• . -15-

References

Berkmon, J. (1951). Why I prefer logists to probitm. iometrics 7, 327-39.

Bickel, P. A. (1975). One-step Huber estimates in the linear model. J.

An. Statist. Assoc. 70, 428-34.

Cook, D. R. & Weisberg, S. (1983). Comment on "Minimax Aspects of

Bonded-Influence Regression" by Huber. J. Am. Statist. Assoc. 78,

74-5.

Cox, D. R. (1970). Analysis of Boary Rata. London" Nethuen.

SEfron, B. (1975). The efficiency of logistic regression compared to normal

discriainant analysis. J. An. Statist. Assoc. 70, 892-8.

Hampel, F. Rt. (1974). The influence curve and its role in robust esti-

mation. J. &.. St atis. Assoc. 69, 383-94.

Hampel, F. R. (1978). Optimally bounding the Gross-Error-Sensitivity and

the influence of Position in factor space. 1P8 Proceedixgs of the

AA Statistrica Computing Section, 59-64.

Huber, P. J. (1967). The behavior of maximum likelihood estimates under

non-standard conditions. Proceeding. of the Fifth Berkeley .mposigu

on Nathemetical Statstica and Probabil.ty. 1, Ed. L. M. LeCau and J.

Neysan, pp. 221-33, University of California Press.

Krasker, W. S. (1980). Estimation in linear regression models with

disparate data points. Econometric. 48, 1333-46.

Krasker, W. S. & Welasch, R. E. (1982). Efficient bounded influence

regression estimation using alternative definitions of sensitivity.

J. As. S atist. Assoc. 77, 595-605.

Haronna, R. A. (1976). Robust N-estimators of multivariate location and

scatter. Ann. Statist. 4., 51-67.

"; ' .%.,.- _.* *... ..., ...-. ..- ; S % % %. * *

Page 21: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

-1-7-T. 7 -16-

McCullaSh, P. & Nelder, J. A. (1983). 0rallt.edIin Uear odoJ. London:

Chapman and Hall.

Pregihan. D. (1981). Logistic regression diagnostic&. Ann. Stltast. 9.

705-24.

Pregibon, D. (1982). Resistant fits for some commonly used logistic models

with medical applications. Afawt,"ics 3G. 485-98.

Ruppert, D. (1985). On the bounded-influence regression estimator of

Krasker and Welsch. 1. Am. Statist. Assoc. 80, 205-08.

9.

9'.

I

4*.....

,-.. .-

.9S. %::::

- *,%

, -•

Page 22: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

-17-

Table I (a). Estimates for the logistic regression model with covariatestenancy, supplemental income, and (monthly income)I10; p-values in

* parentheses.

supplementalintercept tenancy income (monthly income)/10

6 -0.34 -1.76 0.78 -0.01

ML (0.5287) (0.0009) (0.1259) (0.1122)

e -0.16 -1.75 0.77 -0.02

(0.7872) (0.0014) (0.1360) (0.0826)

0(1 -0.20 -1.76 0.78 -0.02(0.6006) (0.0012) (0.1300) (0.0922)

Table 1 (b). Estimates for the logistic regression model with covariatestenancy, supplemental income, and log(monthly income + 1); p-values inparentheses.

supplemental log(monthly- intercept tenancy income income + 1)

L 0.93 -1.85 0.90 -0.33* (0.5681) (0.0005) (0.0737) (0.2228)

0 4.14 -1.81 0.75 -0.86(0.1030) (0.0007) (0.1444) (0.0430)

(1) 4.02 -1.81 0.76 -0.84(0.1100) (0.0006) (0.1416) (0.0465)

. 6.88 -2.02 0.76 -1.33ML(0.0160) (0.0004) (0.1586) (0.0062)

* With cases #5 and #66 excluded.

*, *,*.* ; .* -. .. , . . , ,, . . . . . . .. .. , .. -,

Page 23: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

IiI

166

5 0o

residual 4

3 -20 0

0 0 S S

0 O0o 0 0

a 00 0

00

00

0-1 20 4o 60 11

inicte by cice o;rsdasfotebuddifuneftb

a - bs

Cox ( pr artI y

* Figure 1. R~esidual Plots tfor FS Data. Maximum ikelihood residuals are'" Indicated by cirlts 'o'" residuals from the bounded influence fit by'." asterisks '*'. F'ar both estimation procedures residuals are defined am in

" Cox (1970). p. 96. Negligible residuals have been omitted for clarity•

U- -. *

** * ** ) .

", ++,'+,+',,+., ,,++.+,,,,, .. ,. P "? •P~~ *.. "......' ''',"...""...+'

Page 24: FOR GENERALIZED HILL DEPT OF STATISTICS L A STEFANSKI … · Leonard A. Stefanaki Department of Economic and Social Statistics Cornell University Ithaca, New York 15853 Raymond J

FILMED

11-85

DTIC% %. ~~*.Y .%