The Resistant Line and Related Regression Methods

Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association. http://www.jstor.org The Resistant Line and Related Regression Methods Author(s): Iain M. Johnstone and Paul F. Velleman Source: Journal of the American Statistical Association, Vol. 80, No. 392 (Dec., 1985), pp. 1041- 1054 Published by: on behalf of the Taylor & Francis, Ltd. American Statistical Association Stable URL: http://www.jstor.org/stable/2288572 Accessed: 11-08-2015 18:18 UTC Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/ info/about/policies/terms.jsp JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. This content downloaded from 132.248.228.200 on Tue, 11 Aug 2015 18:18:38 UTC All use subject to JSTOR Terms and Conditions

DESCRIPTION

An excellent article on methods for exploratory data analysis à la Tukey.

TRANSCRIPT

The Resistant Line and Related Regression Methods

IAIN M. JOHNSTONE and PAUL F. VELLEMAN*

In a bivariate $(x, y)$ scatterplot the three-group resistant line is that line for which the median residual in each outer third of the data (ordered on $x$) is zero. It was proposed by Tukey as an exploratory method resistant to outliers in $x$ and $y$ and suited to hand calculation. A dual plot representation of the procedure yields a fast, convergent algorithm, nonparametric confidence intervals for the slope, consistency, influence function, and asymptotic normality results. Monte Carlo results show that small sample efficiencies exceed their asymptotic values in important cases. Both breakdown and efficiency are adequate for exploratory work.
Replacing medians by other M-estimators of location increases efficiency substantially without affecting breakdown or computational complexity. The method thus combines bounded influence and acceptable efficiency with conceptual and computational simplicity.

KEY WORDS: Bounded influence; Robust regression; Dual plot; Variance-reduction swindle; Brown-Mood regression; Berry-Esseen theorem; M estimator; Confidence interval; Repeated median regression; Breakdown.

1. INTRODUCTION

1.1 Summary

Tukey (1970, chap. 10) proposed the (three group) resistant line as an exploratory data-analytic tool for quickly fitting a straight line to bivariate $(x, y)$ data. The data points are divided into three groups according to smallest, middle, or largest $x$ values, and the line with an equal number of points above and below it in each of the outer groups is fitted. The resulting parameter estimates are resistant to the effects of data points extreme in $y$, or $x$, or both. (For detailed discussions including examples, see Velleman and Hoaglin 1981 and Emerson and Hoaglin 1983.) Designed as a "pencil and paper" method, the resistant line is also useful as a computationally quick nonparametric regression in computer-assisted data analyses. Although it compares favorably to other well-known nonparametric linear regression procedures and is widely available in statistics packages (e.g., Minitab and P-Stat), the technique, like many exploratory methods, has not been analyzed theoretically.
This article presents a fast computational algorithm (Section 2), distribution-free confidence intervals (Section 3), and asymptotic results on consistency, influence function, and asymptotic normality (Section 4) for the resistant line. A class of "$\psi$-resistant" lines constructed from M-estimators of location is then introduced (Section 5). While retaining the computational and resistance properties of the resistant line, these estimators can have substantially higher efficiency. We then use the preceding theory and the results of a Monte Carlo simulation experiment of small sample properties (Section 6) to compare the resistant lines with related line-fitting procedures. The experiment uses two new variance reduction "swindles" and provides small-sample results not previously available for most of the estimators studied.

*Iain M. Johnstone is Assistant Professor, Department of Statistics, Stanford University, Stanford, CA 94305. Paul F. Velleman is Associate Professor, Economic and Social Statistics, New York State School of Industrial and Labor Relations, Cornell University, Ithaca, NY 14853. Johnstone's work was supported in part at Cornell by an Australian National University Scholarship, at Stanford by Office of Naval Research Contract N00014-81-K-0340, and at the Mathematical Sciences Research Institute, Berkeley, CA, by the National Science Foundation. The authors thank David Hoaglin, Peter Rousseeuw, John Tukey, and the referees for detailed and helpful comments on drafts of this article; Mark Matthews for Figure 1; and Cornell University for computer funds.
Finally, we discuss extensions to multiple carriers briefly in Section 7.

In considering the merits of various alternatives to least squares regression methods in exploratory work our primary criteria are the following:

1. resistance, including protection from data values extreme in $x$ as well as those extreme in $y$ ("bounded influence")
2. cost of computation
3. asymptotic and small-sample efficiency for both Gaussian and heavy-tailed error distributions.

The resistant line does well for criteria 1 and 2 and in many cases has an efficiency of roughly 60%. Where resistance is important, a value of 60% may not be noticeably different from 100%. Consequently, whether one is interested in such cases in efficiency or in resistance, the resistant line is a valuable tool. The modifications proposed in Section 5 can increase the practical efficiency to roughly 75% under a wide range of error distributions and increase the asymptotic relative efficiency to 80% under classical regression assumptions. (In this article, our use of the terms "resistant," "nonparametric," and "distribution-free" is intended to conform with that of Huber 1981, chap. 1.)

1.2 Background

Tukey's method blends features of two traditional approaches to linear regression. Wald (1940) studied the errors-in-variables problem of fitting a straight line $y = a + bx$ when both $y$ and $x$ are subject to error. He partitioned the observed data points $(x_i, y_i)$ into two groups $L = \{(x_i, y_i): x_i \le \operatorname{med} x_i\}$ and $R = \{(x_i, y_i): x_i > \operatorname{med} x_i\}$ and used an estimator based on group means:

$$b_W = (\bar{y}_R - \bar{y}_L)/(\bar{x}_R - \bar{x}_L). \qquad (1.1)$$

Independently of Wald, Nair and Shrivastava (1942) applied a similar estimator to fixed, equally spaced $x$-values divided into three groups. They showed for Gaussian $y$ that an optimal (minimum variance) estimator of the form (1.1) is obtained by using only $n/3$ points in each of the extreme groups, increasing its efficiency relative to least squares from $3/4$ to $8/9$.
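As a rough illustration of the group-means estimator (1.1), the following sketch computes the slope from the outer groups and the finite-sample efficiency relative to least squares for a fixed design. The function names and the `groups` parameter are illustrative choices, not part of the original article; the efficiency formula assumes iid errors with a common variance.

```python
from statistics import mean

def group_means_slope(points, groups=3):
    """Slope of the form (1.1) using the lowest and highest n // groups points (ordered on x)."""
    pts = sorted(points)
    m = len(pts) // groups                      # size of each outer group
    left, right = pts[:m], pts[-m:]
    dy = mean(y for _, y in right) - mean(y for _, y in left)
    dx = mean(x for x, _ in right) - mean(x for x, _ in left)
    return dy / dx

def efficiency_vs_least_squares(xs, groups=3):
    """Var(least squares slope) / Var(group-means slope) for fixed x and iid errors.

    Var(LS) = sigma^2 / Sxx and Var(group means) = sigma^2 * (2 / m) / dx^2,
    so sigma^2 cancels in the ratio.
    """
    xs = sorted(xs)
    m = len(xs) // groups
    dx = mean(xs[-m:]) - mean(xs[:m])
    xbar = mean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return m * dx * dx / (2 * sxx)

# Equally spaced design: halves (Wald) versus thirds (Nair and Shrivastava).
design = list(range(600))
eff_halves = efficiency_vs_least_squares(design, groups=2)
eff_thirds = efficiency_vs_least_squares(design, groups=3)
```

For a long equally spaced design the two ratios approach 3/4 and 8/9, which is the improvement described in the text.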
Mosteller (1946) noted the advantage of three groups in simple partition-based estimation of the correlation coefficient of a bivariate Gaussian $(x, y)$ population, and he found that the optimal division places about 27% of the data in each of the outer groups. The same optimality calculation applies to the regression problem (Section 4) and in a different context leads to the so-called "27% rule" in psychometrics (compare Kelley 1939 and McCabe 1980 for some references). Further pertinent work is surveyed by Emerson and Hoaglin (1983).

The second approach (Mood 1950; Brown and Mood 1951) is nonparametric: Divide the data into two groups with the split at the median $x$ (as in Wald's method), and find the slope $b_{BM}$ making the median residual in the left group equal to the median of the residuals in the right group. The intercept $a_{BM}$ is then chosen to make the median of all the residuals zero. Let $\tilde{x}_L = \operatorname{med}\{x_i: x_i \in L\}$, $\tilde{y}_L = \operatorname{med}\{y_i: x_i \in L\}$, and so forth. Mood's (1950) algorithm for finding the slope (repeated in Tukey 1970) begins with $b^{(1)} = (\tilde{y}_R - \tilde{y}_L)/(\tilde{x}_R - \tilde{x}_L)$, computes residuals $e^{(1)}$, and iterates

$$b^{(k+1)} = b^{(k)} + (\tilde{e}_R^{(k)} - \tilde{e}_L^{(k)})/(\tilde{x}_R - \tilde{x}_L). \qquad (1.2)$$

Here $\tilde{e}_R^{(k)}$ and $\tilde{e}_L^{(k)}$ are the medians of the $k$th step residuals in the right and left groups, respectively. The iteration continues until the two group medians are equal or the correction is as small as required. This appears to be the only published algorithm for Brown-Mood regression.
However, it fails to converge often enough to make it inappropriate for practical data analysis (e.g., see Bingham and Emerson 1983). Section 2 offers an explanation and an algorithm whose rapid convergence is guaranteed.

In short, the resistant line combines the resistance to outliers of the Brown-Mood median regression with the improved efficiency obtained from three partitions.

In addition to the regression estimators just discussed, we include three others of current interest in the comparisons. Two are more expensive to compute than those discussed thus far, but they reward the extra effort with higher breakdown values or greater efficiencies. These are the median-of-pairwise-slopes procedure studied by Theil (1950) and Sen (1968), $b_{TS} = \operatorname{med}\{(y_j - y_i)/(x_j - x_i) \mid x_j > x_i\}$, which has generally high efficiency, and the related repeated median estimator $b_{RM} = \operatorname{med}_i \operatorname{med}_{j \ne i}\{(y_j - y_i)/(x_j - x_i)\}$ introduced by Siegel (1982), which has an optimal 50% breakdown. The third, least absolute residual ($L_1$) regression (e.g., Laplace 1818 and Bassett and Koenker 1978), has been suggested as an improvement on least squares for heavy-tailed error distributions. It has an influence function that is unbounded in $x$ and thus can be sensitive to extreme $x$ values. With this caveat, $L_1$ regression performs comparably to the $\psi$-resistant lines with respect to criteria 1-3.

Several other interesting estimators were not included in our study. We hope comparisons will be made with related techniques such as bounded-influence robust regressions of the Krasker-Welsch (1982) type, the least median of squares (LMS) estimators proposed by Rousseeuw (1984), and the S-estimators proposed by Rousseeuw and Yohai (1983). Some of our results were first reported in Johnstone and Velleman (1982).
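The two pairwise-slope estimators just described can be written directly from their definitions. The sketch below is a brute-force illustration, assuming distinct $x$ values within each pair; the function names are hypothetical. Python's `statistics.median` averages the two middle values when the count is even, which is one of several possible conventions.

```python
from itertools import combinations
from statistics import median

def pairwise_slope(p, q):
    """Slope of the line joining points p and q."""
    return (q[1] - p[1]) / (q[0] - p[0])

def theil_sen_slope(points):
    """Median of all pairwise slopes (pairs with equal x are skipped)."""
    return median(pairwise_slope(p, q)
                  for p, q in combinations(points, 2) if p[0] != q[0])

def repeated_median_slope(points):
    """Siegel's repeated median: median over i of the median over j != i of b_ij."""
    inner = []
    for i, p in enumerate(points):
        slopes = [pairwise_slope(p, q)
                  for j, q in enumerate(points) if j != i and q[0] != p[0]]
        inner.append(median(slopes))
    return median(inner)

# Points on y = 1 + 2x with one gross outlier in y at x = 3.
data = [(x, 1 + 2 * x) for x in range(7)]
data[3] = (3, 100)
```

Both estimators cost $O(n^2)$ slope evaluations in this naive form, which is the extra expense the text refers to.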
2. PROPERTIES OF THE RESISTANT LINE

Let data points $(x_1, y_1), \ldots, (x_n, y_n)$ be given with $x_1 \le x_2 \le \cdots \le x_n$. Partition the $x$-values into three groups $L$, $M$, and $R$ containing $n_L$, $n_M$, and $n_R$ data points, respectively ($n_L + n_M + n_R = n$), where $x_L = \max\{L\} < x_R = \min\{R\}$ and $M \subset (x_L, x_R)$ might be empty. (See Velleman and Hoaglin 1981 for a detailed algorithm.) We shall occasionally abuse notation by writing $i \in L$ for $x_i \in L$, and so forth. The resistant line slope estimate $b_{RL}$ is then defined as the solution to

$$\operatorname{med}_{i \in L}(y_i - b x_i) = \operatorname{med}_{i \in R}(y_i - b x_i). \qquad (2.1)$$

The intercept estimate $a_{RL}$ may, for example, be chosen to make the median residuals of both groups zero. Brown-Mood regression is the special case in which $n_M = 0$, $n_L = n_R = n/2$. The resistant line is a special case of Kildea's (1981) weighted median estimators that uses only 0-1 weights.

2.1 A Dual Plot

The dual plot interchanges the roles of points and lines. Ignore the intercept, $a$, and plot each residual, $e_i(b) = y_i - x_i b$, against $b$. This yields a line with slope $-x_i$ and intercept $y_i$ to correspond to the original data point $(x_i, y_i)$. Two dual lines $e_i(b)$ and $e_j(b)$ will intersect at a point having $b$-value $b_{ij} = (y_j - y_i)/(x_j - x_i)$, namely, the slope of the line in $(x, y)$ space joining $(x_i, y_i)$ and $(x_j, y_j)$ (interpreted as $\infty$ if $x_i = x_j$). For an illustrative example, see Emerson and Hoaglin (1983). Dolby (1960) and Daniels (1954) have used the dual plot in simple linear regression contexts. Fisher (1983) discussed these and other graphical nonparametric regression methods.
The median trace of the left group is the piecewise-linear function $T_L(b) = \operatorname{med}_{i \in L}(y_i - x_i b)$. If $n_L$ is odd, $T_L$ consists of line segments from the dual lines in $L$. If $n_L$ is even, the line segments in $T_L$ average adjacent pairs of dual lines. Since $x_L < x_R$, the right median trace $T_R(b) = \operatorname{med}_{i \in R}(y_i - b x_i)$ intersects $T_L$ exactly once. The $b$ value of this intersection is the slope of the resistant line (see Figure 1). This graphical construction shows that $b_{RL}$ always exists and is unique.

The dual plot reveals the difficulties encountered by the Mood-Tukey algorithm. Let $\tilde{x}_R = \operatorname{med}_{i \in R} x_i$, $\tilde{y}_R = \operatorname{med}_{i \in R} y_i$, and so forth, and $\Delta x = \tilde{x}_R - \tilde{x}_L$. Then the first estimated slope is $b^{(1)} = (\tilde{y}_R - \tilde{y}_L)/\Delta x$. On the dual plot the distance between the median traces at any slope, $b$, is the difference between the median residuals in the left and right groups. Thus, from (1.2), $b^{(k+1)} = b^{(k)} + [T_R(b^{(k)}) - T_L(b^{(k)})]/\Delta x$. Now, if

$$\left| [T_R(b^{(k)}) - T_L(b^{(k)})]/\Delta x \right| > 2\,|b^{(k)} - b_{RL}|, \qquad (2.2)$$

then $|b^{(k+1)} - b_{RL}| > |b^{(k)} - b_{RL}|$, and this step of the iteration has made the approximation worse. Inequality (2.2) can be written as

$$\left| \frac{T_R(b^{(k)}) - T_R(b_{RL})}{b^{(k)} - b_{RL}} - \frac{T_L(b^{(k)}) - T_L(b_{RL})}{b^{(k)} - b_{RL}} \right| > 2\,\Delta x, \qquad (2.3)$$

which bounds the difference in the slopes of the median traces near $b_{RL}$. A median trace can be steep only if an $x$ value in its group is large in magnitude. Thus the iteration can be stymied by high-leverage points with extraordinary $x$-values. See Emerson and Hoaglin (1983) for a specific example constructed by Andrew Siegel.
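The median traces and the iteration (1.2) are easy to experiment with. The sketch below uses hypothetical function names and a small data set constructed for this illustration (it is not the Siegel example cited above): the right group contains a high-leverage point at $x = 100$ whose dual line carries the right median trace near the solution $b_{RL} = 1$, so condition (2.3) holds there and the iteration settles into a two-cycle instead of converging.

```python
from statistics import median

def median_trace(group, b):
    """T(b): median residual y - x*b over the points of one outer group."""
    return median(y - x * b for x, y in group)

def mood_iteration(left, right, steps=20):
    """Iterate (1.2) from b(1) and return every slope visited."""
    dx = median(x for x, _ in right) - median(x for x, _ in left)
    b = (median(y for _, y in right) - median(y for _, y in left)) / dx
    history = [b]
    for _ in range(steps):
        gap = median_trace(right, b) - median_trace(left, b)
        if gap == 0:                 # group medians agree: b solves (2.1)
            break
        b += gap / dx
        history.append(b)
    return history

# Illustrative data (an assumption of this sketch).
left = [(0, 0), (1, -5), (2, 5)]
right = [(10, -50), (11, 50), (100, 100)]
slopes_visited = mood_iteration(left, right, steps=50)
```

On this data both traces vanish at $b = 1$, yet every slope the iteration visits stays several units away from it.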
Figure 1. The dual plot displays each data point, $(x, y)$, as the line $e = y - xb$. Lines are shown only for points in the left or right groups of the data set. The median traces, $T_L$ and $T_R$, follow the median residual, $e$, for each slope, $b$. The abscissa of their intersection is the resistant line slope, $b_{RL}$.

2.2 An Algorithm for Resistant Lines

The dual plot suggests an algorithm for the resistant line that will always converge. The zero of the piecewise-linear monotone function $E(b) = T_L(b) - T_R(b)$ yields the resistant line slope. (With this sign convention $E$ is increasing in $b$, because every dual line in $R$ is steeper than every dual line in $L$.) Among algorithms for finding zeros of functions, the one known as ZEROIN (Wilkinson 1967; Dekker 1969; Brent 1973) is well suited to this problem. The algorithm requires choice of a tolerance, tol, which is the amount of error in $b$ that can be tolerated, and an interval $(b_{\min}, b_{\max})$ to search. Brent (1973) showed that this algorithm will always converge in fewer than $[\log_2((b_{\max} - b_{\min})/\mathrm{tol1})]^2$ steps, where $\mathrm{tol1} = .5\,\mathrm{tol} + 2.0 \times \epsilon \times |b_{\max}|$ and $\epsilon$ is the machine epsilon (i.e., the smallest number for which the machine's floating point representations of $1.0 + \epsilon$ and $1.0$ differ). Brent claims that the algorithm can usually be expected to converge much more quickly; for our application, our experience confirms this. Each step requires the median of the residuals in each of the outer groups. Asymptotically, the median of $n$ numbers can be found with $O(n)$ comparisons (e.g., see Aho, Hopcroft, and Ullman 1974, pp. 97-99). For moderate sample size, however, $n/3$ is too small to make the large overhead of specialized methods worthwhile, so in practice $O(n \log n)$ median-finding algorithms are usually used. The convergence rate of ZEROIN is independent of $n$, so this algorithm requires $O(n)$ operations. Because $E(b)$ is monotone, any $b_{\max}$ and $b_{\min}$ that satisfy $\operatorname{sign}(E(b_{\max})) \ne \operatorname{sign}(E(b_{\min}))$ are suitable; the first two estimates from Mood's original iteration often suffice. Velleman and Hoaglin (1981) provided portable programs for finding resistant lines using this algorithm.

2.3 Unbiasedness

Consider the bivariate linear model $y_i = \alpha + \beta x_i + \epsilon_i$ $(i = 1, \ldots, n)$, in which the $\{\epsilon_i\}$ are iid and (a) the $X_i$ are fixed or (b) the $X_i$ are iid and the $\epsilon_i$ are independent of them. The results are given for model a and follow for model b by conditioning on $X$. If $n_L$ and $n_R$ are both odd, then $b_{RL}$ is median unbiased for $\beta$ without any assumption of symmetry on the $\epsilon_i$. Thus

$$P\{b_{RL} \ge \beta\} = P\{\operatorname{med}_{i \in R} \epsilon_i \ge \operatorname{med}_{i \in L} \epsilon_i\} \ge 1/2,$$

and similarly $P\{b_{RL} \le \beta\} \ge 1/2$. Suppose now also that the errors $\epsilon_i$ are symmetric about 0 and independent, but not necessarily identically distributed. In this case $b_{RL}$ and $a_{RL}$ are symmetrically distributed about $\beta$ and $\alpha$, respectively, and thus are median unbiased and unbiased (if $E|\epsilon_i| < \infty$) for all values of $n_L$ and $n_R$. To see this, note that $\{b_{RL} - \beta > c\} = \{\operatorname{med}_R(\epsilon_i - c x_i) > \operatorname{med}_L(\epsilon_i - c x_i)\}$ and that $\{b_{RL} - \beta < -c\}$ is equivalent to the same event with every $\epsilon_i$ replaced by $-\epsilon_i$.

2.4 Breakdown Bounds

The (gross error) breakdown point of an estimator measures its ability to resist the effects of outliers (Hampel 1971). Loosely, the breakdown point is the largest fraction of the data that can be changed arbitrarily without changing the estimator beyond all bounds. The resistant line has a breakdown value of $(1/n)\min\{[(n_L - 1)/2], [(n_R - 1)/2]\}$ because we need only alter half of the data points in one of the outer groups to take complete control of the slope estimate. If $n_L = n_M = n_R$, the breakdown value is approximately equal to $1/6$.
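The zero-finding approach of Section 2.2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Velleman-Hoaglin program: plain bisection stands in for ZEROIN, the outer groups are simply the lowest and highest $n // 3$ points, the largest $x$ in the left group is assumed to be strictly smaller than the smallest $x$ in the right group, and the function name is hypothetical.

```python
from statistics import median

def resistant_line(points, halvings=200):
    """Three-group resistant line (slope, intercept) by bisection on the trace gap."""
    pts = sorted(points)
    k = len(pts) // 3
    left, right = pts[:k], pts[-k:]

    def gap(b):
        # T_R(b) - T_L(b); strictly decreasing in b when max x in L < min x in R.
        return (median(y - b * x for x, y in right)
                - median(y - b * x for x, y in left))

    lo, hi = -1.0, 1.0
    while gap(lo) < 0:               # widen until gap(lo) >= 0 >= gap(hi)
        lo *= 2
    while gap(hi) > 0:
        hi *= 2
    for _ in range(halvings):        # fixed count guarantees termination
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    slope = (lo + hi) / 2
    # At the solution both outer medians coincide; use their average as intercept.
    intercept = (median(y - slope * x for x, y in left)
                 + median(y - slope * x for x, y in right)) / 2
    return slope, intercept

# Small example: the middle third contains wild y values that play no role.
sample = [(0, 0), (1, 5), (2, 1),
          (5, 100), (6, -50), (7, 0),
          (10, 10), (11, 3), (12, 20)]
fit_slope, fit_intercept = resistant_line(sample)
```

Each evaluation of `gap` costs two medians over $n/3$ points, and the number of evaluations does not depend on $n$, which mirrors the operation count argued in the text.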
A Probabilistic Breakdown Value. Although a breakdown value of $1/6$ is certainly resistant (least squares and least-absolute-value regression both have breakdowns of 0%), it is poorer than comparable values for Brown-Mood (25%), Theil-Sen regression (29%), or repeated median regression (50%). However, breakdown concentrates on worst-case behavior. In reality, not all subsets of size $n/6$ are equally dangerous for the resistant line; for the line to break down, all bad points must lie in the same outer third. If we restrict attention to gross errors in $y$ (so that the $x$-values are held fixed) and assume that extreme $y$-values occur independently with probability $\alpha$ (but continue to assume that they deviate in the same direction), we can calculate a probabilistic breakdown value (compare Donoho and Huber 1982, section 5.1). This gives a plausible bad-case rather than worst-case bound on the protection available. For the resistant line (including Brown-Mood), breakdown occurs if there are more than $[(n_L - 1)/2]$ (respectively, $[(n_R - 1)/2]$) extreme $y$ values in the left (right) groups that deviate in the same direction. For convenience take $n_L = n_R = 2k + 1$; then the breakdown probability becomes $1 - \Pr\{\mathrm{Bin}(2k + 1, \alpha) \le k\}^2 = 1 - I_{1-\alpha}(k + 1, k + 1)^2$, where $I_x(p, q)$ is Pearson's incomplete beta function.

Table 1 gives some values for the breakdown probability under the preceding random localized gross error model. For the resistant line, $n_L = n_R = n/3$, and for Brown-Mood regression $n_L = n_R = [n/2]$, which yields a larger value of $k$ and hence lower breakdown probabilities.
We also give results for repeated median and Theil-Sen regression for comparison. In these cases, the possibility of breakdown is affected only by the size of the subset drawn so that the breakdown probability is given by a single binomial probability. Table 1 shows, for example, that for sample size 27, in 90% of cases the resistant line can tolerate about 25% contamination with outliers (and this still assumes that all outliers reinforce each other in location and sign). For practical applications, a contamination level near 25% seems appropriate as a guide in determining the applicability of the resistant line method. Note that the Theil-Sen method is much more sensitive to gross error probabilities close to and exceeding its nominal breakdown point of 29%.

Table 1. Illustrative Breakdown Probabilities

                                   alpha
Method                     .20      .25      .33
n = 27
  Repeated median, LMS     .0002    .003     .04
  Brown-Mood               .014     .05      .19
  Resistant line           .04      .10      .26
  Theil-Sen                .07      .25      .56
  Least squares            .998     .9996    .99998
n = 63
  Repeated median, LMS     0        0        .002
  Brown-Mood               .0002    .003     .05
  Resistant line           .002     .013     .10
  Theil-Sen                .02      .14      .37
  Least squares            ~1.0     ~1.0     ~1.0

NOTE: alpha is the probability of gross error; LMS is least median of squares.

Here, and in Section 6, it is instructive to note the trade-offs among our criteria of speed, resistance, and efficiency that occur in current regression methods. High-breakdown lines such as least median of squares (Rousseeuw 1984) can require extensive computation or the solution of a minimization problem that may possess many local minima. Highly efficient estimators either involve more computation (M-estimators, Krasker-Welsch) or sacrifice resistance (least squares). The quick, resistant pencil-and-paper methods discussed here can only aim for moderate efficiency levels.
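The resistant line and Brown-Mood rows of Table 1 follow from the binomial expression for the breakdown probability given above. The sketch below evaluates it directly with exact binomial sums rather than the incomplete beta function; the function names are illustrative, and the least squares helper encodes the assumption that a single gross error is enough to break that estimator.

```python
from math import comb

def binom_cdf(k, n, p):
    """Pr{Bin(n, p) <= k} by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def breakdown_probability(n_outer, alpha):
    """1 - Pr{Bin(n_outer, alpha) <= k}^2 with k = [(n_outer - 1) / 2].

    n_outer is the common size of the two outer groups; breakdown occurs when
    either group holds more than k gross errors.
    """
    k = (n_outer - 1) // 2
    return 1 - binom_cdf(k, n_outer, alpha) ** 2

def least_squares_breakdown_probability(n, alpha):
    """Probability that at least one of n points is a gross error."""
    return 1 - (1 - alpha) ** n

# n = 27: resistant line uses n/3 = 9 points per outer group.
rl_20 = breakdown_probability(9, 0.20)
rl_25 = breakdown_probability(9, 0.25)
ls_20 = least_squares_breakdown_probability(27, 0.20)
```

For $n = 27$ these evaluate to roughly .04, .10, and .998, in line with the corresponding table entries.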
2.5 Resistant Line and Repeated Median Regression

The repeated median regression estimate of Siegel (1982),

$$b_{RM} = \operatorname{med}_i \operatorname{med}_{j \ne i} b_{ij},$$

pays for its optimal gross error breakdown of nearly 50% with a computational complexity of $O(n^2)$ operations. Interestingly, modifying the repeated median by computing $b_{ij}$ only for $x_i \in L$ and $x_j \in R$ is equivalent to using the $O(n)$ resistant line estimator.

Proposition 2.1. If $n_L$ and $n_R$ are both odd, then $b_{RL} = \operatorname{med}_{r \in R} \operatorname{med}_{l \in L} b_{lr}$.

Proof. 1°. In the dual plot, let the intersection of $e_r(b) = y_r - x_r b$ $(x_r \in R)$ with $T_L(b)$ occur at $b_r$. From the definition $T_L(b_r) = \operatorname{med}_{l \in L} e_l(b_r)$ and from the relation $e_l(b_r) \lessgtr T_L(b_r)$, which holds, respectively, as $b_{lr} \gtrless b_r$, it follows (since $n_L$ is odd) that $b_r = \operatorname{med}_{l \in L} b_{lr}$. 2°. Since $b_{RL}$ satisfies $\operatorname{med}_{r \in R} e_r(b_{RL}) = T_L(b_{RL})$ by definition, it is enough to note that $e_r(b_{RL}) \lessgtr T_L(b_{RL})$, respectively, as $b_r \lessgtr b_{RL}$.

Remark. In an ordered set $x_1 \le x_2 \le \cdots \le x_{2n}$ of even cardinality, we use the terms lomed $x_i$ and himed $x_i$ for $x_n$ and $x_{n+1}$, respectively. If $n_L$ is even, the conclusion of 1° is weakened to $\operatorname{lomed}_{l \in L} b_{lr} \le b_r \le \operatorname{himed}_{l \in L} b_{lr}$, and if both $n_L$ and $n_R$ are even (for example), the conclusion of the proposition is just that

$$\operatorname{lomed}_{r \in R} \operatorname{lomed}_{l \in L} b_{lr} \le b_{RL} \le \operatorname{himed}_{r \in R} \operatorname{himed}_{l \in L} b_{lr}.$$

3. CONFIDENCE INTERVALS FOR $\beta$

Distribution-free confidence intervals for the slope, $\beta$, in a regression model may be derived from an analog of the method that gives interval estimates for the population median. Consider Model a of Section 2.3, and assume that the $\epsilon_i$ have a continuous distribution; the results extend to the random-carriers model as before.
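Proposition 2.1 of Section 2.5 lends itself to a quick numerical check. The sketch below, with hypothetical function names and made-up data, computes the nested median of left-to-right slopes for odd group sizes and confirms that the two median traces agree at that slope.

```python
from statistics import median

def nested_median_slope(left, right):
    """med over r in R of med over l in L of b_lr (group sizes assumed odd)."""
    return median(
        median((yr - yl) / (xr - xl) for xl, yl in left)
        for xr, yr in right
    )

def trace_gap(left, right, b):
    """T_R(b) - T_L(b); zero exactly at the resistant line slope."""
    return (median(y - b * x for x, y in right)
            - median(y - b * x for x, y in left))

# Illustrative outer groups with three points each.
L_group = [(0, 0), (1, 5), (2, 1)]
R_group = [(10, 10), (11, 3), (12, 20)]
b_nested = nested_median_slope(L_group, R_group)
```

The nested median needs only $n_L n_R$ slopes rather than all pairs, and by the proposition it can be replaced entirely by the zero-finding computation of Section 2.2.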
Write $e_i(\beta) = y_i - x_i\beta$ for the residual of $(x_i, y_i)$ at $\beta$ (ignoring $\alpha$) in the dual plot. Let $e^L_{(r)}(\beta)$ denote the $r$th smallest residual in the left group, $\{y_i - x_i\beta,\ i \in L\}$, and define $e^R_{(r)}(\beta)$ similarly in the right group. Assume that $n_L = n_R = m$ (say) for convenience: There is no difficulty in extending the theory or computations to unequal $n_L$ and $n_R$. As a convention, set $s = m + 1 - r$. Let $\hat\beta_r$ be the unique solution of

$$e^L_{(r)}(\beta) = e^R_{(s)}(\beta). \qquad (3.1)$$

Existence and uniqueness of $\hat\beta_r$ follow from the dual plot as before. Of course, if $m$ is odd then $\hat\beta_{(m+1)/2} = \hat\beta_{RL}$, and, with an appropriate choice of intercept $\alpha$, finding $\hat\beta_r$ amounts to locating the line in $(x, y)$ space such that $r$ data points in the left group lie on or below the line and $r$ points in the right group lie on or above the line. Clearly $\hat\beta_1 \ge \hat\beta_2 \ge \cdots \ge \hat\beta_m$, and we now show that pairs $(\hat\beta_s, \hat\beta_r)$ $(r < s)$ yield distribution-free confidence intervals for $\beta$.

The function $g(\beta) = e^R_{(s)}(\beta) - e^L_{(r)}(\beta)$ is strictly decreasing, and hence if the model $y_i = \alpha + \beta x_i + \epsilon_i$ obtains,

$$P_\beta\{\beta > \hat\beta_r\} = P_\beta\{e^L_{(r)}(\beta) > e^R_{(s)}(\beta)\} = P\{\epsilon^L_{(r)} > \epsilon^R_{(s)}\},$$

where $\epsilon^L_{(1)} \le \cdots \le \epsilon^L_{(m)}$ and $\epsilon^R_{(1)} \le \cdots \le \epsilon^R_{(m)}$ are the ordered $\epsilon_i$ in the outer groups. The quantity $U^r_m$ = the number of values in the right group that exceed $\epsilon^L_{(r)}$ is an exceedance statistic and has probability mass function

$$P(U^r_m = x) = \binom{m - x + r - 1}{m - x}\binom{m - r + x}{x} \Big/ \binom{2m}{m}, \quad x = 0, 1, \ldots, m, \qquad (3.2)$$

as may be seen from a simple combinatorial argument. Now $P_\beta\{\beta > \hat\beta_r\} = P(U^r_m < r)$, and by a symmetric argument (since $s = m + 1 - r$), $P_\beta\{\hat\beta_s > \beta\} = P(U^r_m < r)$ also. Thus the interval $[\hat\beta_s, \hat\beta_r]$ covers $\beta$ with probability $1 - 2P(U^r_m < r)$. The latter expression has been tabulated by Epstein (1954) for $m = 2(1)15(5)20$, $r \le m$, and more extensively by Bechtel (1982).

Remarks

1. Finiteness of the set $\{\hat\beta_1, \ldots, \hat\beta_m\}$ limits the number of possible confidence levels. This can lead to large gaps between successive possible confidence levels if $m$ is small.

2. The preceding confidence interval can be inverted in the usual way to give a test of $H_0: \beta = \beta_0$ against general alternatives. The test is based on the number of data points in the left group that lie on or above the line with slope $\beta_0$ and intercept chosen so that $m$ points in $L \cup R$ lie on or above it. This test is discussed in the case $n_M = 0$, using normal approximations, by Hogg (1975), along with extensions to percentile regression. Other related distribution-free tests for both slope and intercept were given by Brown and Mood (1951), Daniels (1954), and Adichie (1967).

3. If, instead of Models a or b, we assume that, given $x$, $\alpha + \beta x$ is the $(100r/m)$th percentile of the $Y$ distribution, then an estimate of the kind considered by Hogg (1975) is obtained (with $n_M$ not restricted to be zero) by solving for $\beta$ in an analog of (3.1): $e^L_{(r)}(\beta) = e^R_{(r)}(\beta)$.

Asymptotic Approximation for $P(U^r_m < r)$. [...]

[...] or both. Assume that [...], and we take $\operatorname{med} G = (\operatorname{lomed} G + \operatorname{himed} G)/2$. Define the resistant line slope functional $\beta_{RL}(H)$ as the crossing point of zero of the function

$$h(\beta) = \operatorname{med}(Y - \beta X \mid X \ge x_R) - \operatorname{med}(Y - \beta X \mid X \le x_L).$$

Existence and uniqueness are clear because the slope of $h(\beta)$ is always less than $-(x_R - x_L)$ and $h(\beta)/\beta \to [\operatorname{med}(X \mid X \le x_L) - \operatorname{med}(X \mid X \ge x_R)]$ as $\beta \to \infty$.

4.1 Consistency

Fisher consistency of $\beta_{RL}(H)$ holds for linear regression models of the form $Y = \alpha + \beta X + \epsilon$, where $\operatorname{med}(\epsilon \mid X) = \operatorname{med} \epsilon$ almost surely. Indeed, $\operatorname{med}(Y - \beta X \mid X \ge x_R) = \operatorname{med}(\alpha + \epsilon \mid X \ge x_R) = \operatorname{med}(\alpha + \epsilon) = \operatorname{med}(\alpha + \epsilon \mid X \le x_L) = \operatorname{med}(Y - \beta X \mid X \le x_L)$ so that $\beta_{RL}(H) = \beta$.
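Returning briefly to the confidence intervals of Section 3, the coverage probability $1 - 2P(U^r_m < r)$ can be evaluated from the exceedance distribution (3.2) with exact integer binomial coefficients. The sketch below is illustrative only; the function names are hypothetical, and it assumes equal outer group sizes $m$ and $r \le m/2$ so that $r < s$.

```python
from math import comb

def exceedance_pmf(m, r, x):
    """P(U = x) from (3.2): x of the m right-group values exceed the r-th
    smallest of the m left-group values."""
    return comb(m - x + r - 1, m - x) * comb(m - r + x, x) / comb(2 * m, m)

def coverage(m, r):
    """Coverage 1 - 2 P(U < r) of the interval [beta_hat_s, beta_hat_r], s = m + 1 - r."""
    tail = sum(exceedance_pmf(m, r, x) for x in range(r))
    return 1 - 2 * tail

# Available confidence levels when each outer group has m = 9 points.
levels = {r: coverage(9, r) for r in range(1, 5)}
```

Listing `levels` for a small $m$ makes the first remark above concrete: only a handful of confidence levels are attainable, and neighbouring levels can be far apart.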
In location problems the median of iid samples from $G$ is consistent if $\operatorname{lomed} G = \operatorname{himed} G$. To describe the analogous strong consistency properties of $\beta_{RL}(H)$, we introduce

$$h^*(\beta) = \operatorname{himed}(Y - \beta X \mid X \ge x_R) - \operatorname{lomed}(Y - \beta X \mid X \le x_L),$$

and

$$h_*(\beta) = \operatorname{lomed}(Y - \beta X \mid X \ge x_R) - \operatorname{himed}(Y - \beta X \mid X \le x_L).$$

Clearly $h^*(\beta) \ge h(\beta) \ge h_*(\beta)$, and the unique zeros $\beta^*$, $\beta_{RL}$, $\beta_*$ of the respective functions satisfy $\beta^* \ge \beta_{RL} \ge \beta_*$.

Example 4.1. Suppose that $H$ places equal mass on the four points $\{(-1, -1), (-1, 0), (1, 0), (1, 1)\}$. Then $h^*(\beta) = 2 - 2\beta$, $h(\beta) = 1 - 2\beta$, $h_*(\beta) = -2\beta$, and $\beta^* = 1$, $\beta_{RL} = 1/2$, $\beta_* = 0$. For samples $H_n$ of size $n$ from $H$ it is clearly possible to have $\hat\beta_{RL}(H_n) = 0$ or $1$.

Let $H_n$ be a sequence of bivariate distributions with corresponding partition points $x_{L,n} < x_{R,n}$, probabilities $p_{L,n}$, $p_{R,n}$, and resistant line estimators $\hat\beta^{(n)}$. The notations $H_n \Rightarrow H$ and $\mathcal{L}(X_n) \Rightarrow \mathcal{L}(X)$ denote convergence in distribution of distribution functions and random variables, respectively.

Proposition 4.2. Suppose that $H_n \Rightarrow H$, and that $p_{L,n} \to p_L$, $p_{R,n} \to p_R$. Then $\beta_* \le \liminf \hat\beta^{(n)} \le \limsup \hat\beta^{(n)} \le \beta^*$.

Proof. We prove that $\beta_* \le \liminf \beta_*^{(n)} \le \limsup \beta^{*(n)} \le \beta^*$. It is enough to check, for example, that $\liminf h_*^{(n)}(b) \ge h_*(b) > 0$ for any $b$ [...]

[...] $I\{x > 0\} - I\{x < 0\}$ is the influence function of the median and $C_H = [2g(0)\,p_L(\mu_R - \mu_L)]^{-1}$. Qualitatively, (4.1) accords with intuition from dual plots. The position of the point $(\beta, y - \beta x)$ with respect to the appropriate (left if $x \le x_L$, right if $x \ge x_R$) median trace evaluated at $\beta$ determines the direction of change in slope. The analogy between this influence function and that of the median in location problems is pursued further in Section 5. For an arbitrary bivariate distribution $H$, the form of the influence function remains the same except that $C_H$ takes on two values (given in Appendix B) according as $x \ge x_R$ or $x \le x_L$.
4.3 Asymptotic Normality

A standard heuristic argument (e.g., Huber 1981) asserts that if $H_n$ denotes the empirical distribution function of a sample $(X_i, Y_i)$ of size $n$ from $H$, then the statistic $\hat\beta_{RL}(H_n)$ is asymptotically normal with mean $\beta_{RL}(H)$ and asymptotic variance given by $E_H[IC(\beta_{RL}, H, (X, Y))]^2$. If $H \in \mathcal{H}_r$, then we use (4.1) to obtain

$$\operatorname{AsVar} \sqrt{n}\,[\hat\beta_{RL}(H_n) - \beta_{RL}(H)] = \frac{1}{2 p_L\, g^2(0)\,(\mu_R - \mu_L)^2} = \frac{1}{\operatorname{var} X_G} \cdot \frac{1}{4 g^2(0)},$$

where $X_G$ denotes the grouped random variable defined by

$$X_G = \begin{cases} \mu_R & \text{with probability } p_R, \\ \mu_L & \text{with probability } p_L, \\ (p_R \mu_R + p_L \mu_L)/(p_R + p_L) & \text{with probability } 1 - (p_R + p_L), \end{cases}$$

and having variance equal to $(\mu_R - \mu_L)^2 p_L p_R/(p_L + p_R)$, or $(\mu_R - \mu_L)^2 p_L/2$ in the symmetric case when $p_L = p_R$. We state a version of the asymptotic normality theorem for the resistant line slope estimator, proved in Appendix C for the more general setting of Section 5.

Theorem 4.3. Suppose that $(X_i, Y_i)$, $i = 1, \ldots, n$, is an iid sample from $H \in \mathcal{H}_r$. Suppose also that $E|X| < \infty$; $\epsilon$ has a continuous positive density at its median, 0; $n_L/n \to p_L$, $n_R/n \to p_R$; and (for convenience only) $p_L = p_R$. Then

$$\sqrt{n}\,(\hat\beta_{RL} - \beta) \to \mathrm{Gau}\big(0, \{2 p_L (\mu_R - \mu_L)^2 g^2(0)\}^{-1}\big).$$

Remarks. 1. If $p_L \ne$ [...] $P(\,\cdot > c) = P(Z > W) + o(1)$, and this leads to the claimed conclusion. In practice, however, the groups $L$ and $R$ are not chosen with reference to fixed values $x_L$ and $x_R$, but rather by using sample quantiles. To cover these cases it is convenient to use the Berry-Esseen method of Appendix C.
4.4 Asymptotic Efficiency

Given the marginal distribution $F$ of $X$, the partition point $x_L$ (and from symmetry $x_R$ also) can be determined to minimize asymptotic variance by maximizing $p_L(\mu_R - \mu_L)^2$. If $F$ is symmetric about 0 and $p_L = p_R$, then $\mu_R = -\mu_L$, and it suffices to maximize

$$h(x_L) = \Bigl\{\int_{-\infty}^{x_L} s \, dF(s)\Bigr\}^2 \Big/ F(x_L). \tag{4.2}$$

If $EX^2 < \infty$, then the Cauchy-Schwarz inequality shows that $h(x_L) \to 0$ as $x_L \to -\infty$. However, if the tails of $F$ vary as $|x|^{-\alpha}$ for $\alpha \le 2$, then $h(x)$ fails to converge to zero as $x \to -\infty$ [compare the remarks following (4.3)]. When $EX^2 < \infty$ and $F$ has a density $f$, any critical point $x_0 \in (-\infty, 0)$ of (4.2) satisfies $\mu_L(x_0) = 2x_0$, and a sufficient condition for $x_0$ to maximize (4.2) is that $f$ be unimodal and $F(x_0) \ge$ […].

Specific cases were considered by Nair and Shrivastava (1942), Bartlett (1949), Barton and Casley (1958), and Gibson and Jowett (1957a), who obtained $2\sigma^2[F(x_L)(\mu_R - \mu_L)^2]^{-1}$ as the variance of the Wald estimator (1.1) under normal theory assumptions. We list a couple of these cases and remark that Section 5 shows that, asymptotically, the optimal choice of division points depends only on the marginal distribution of $X$ and not on the particular M estimator applied to the residuals. If $X$ is uniformly distributed on some interval, or places equal mass at equally spaced points, then (4.2) is maximized by $F(x_L) = 1/3$, which lends support to the choice of "thirds" as the partition points. If $F(x) = \Phi(x)$, then (4.2) becomes $\phi^2(x_L)/\Phi(x_L)$ [where $\phi(x) = \Phi'(x)$], which is maximized by $x_L = -.6121$, corresponding to $\Phi(x_L) = .2702$, and yields the "27% rule" described in the Introduction.

Consider now the asymptotic relative efficiency of the resistant line estimator for models in $\mathcal{H}_r$. Let $\{T_n\}$ be any sequence of regression invariant estimates of $\beta$ based on samples $(X_1, Y_1), \ldots, (X_n, Y_n)$ of size $n$; that is, $T_n(X_i, \alpha + \beta X_i + \varepsilon_i) = T_n(X_i, \varepsilon_i) + \beta$.
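The Gaussian case can be verified directly. The sketch below (illustrative only; the function names and the bisection bracket are assumptions) uses $\int_{-\infty}^{x} s\,\phi(s)\,ds = -\phi(x)$, so that $h(x) = \phi^2(x)/\Phi(x)$ and $\mu_L(x) = -\phi(x)/\Phi(x)$, and solves the critical-point condition $\mu_L(x_0) = 2x_0$ by bisection.

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def h(x_l):
    """Criterion (4.2) for F = Phi: phi(x)^2 / Phi(x)."""
    return phi(x_l) ** 2 / Phi(x_l)

def optimal_cut(lo=-3.0, hi=-0.01, iterations=100):
    """Solve mu_L(x) = 2x on (lo, hi) by bisection (bracket chosen by hand)."""
    def f(x):
        return -phi(x) / Phi(x) - 2.0 * x   # mu_L(x) - 2x, positive at lo, negative at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = optimal_cut()
print(round(x0, 4), round(Phi(x0), 4))
```

The root lies near $-0.612$ with $\Phi(x_0)$ near $0.270$, consistent with the "27% rule".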
The local asymptotic minimax results of Hájek (1972, theorem 4.2) imply that for approximately smooth models in $\mathcal{H}_r$,

$$\liminf_{n \to \infty} n E_{\alpha,\beta}(T_n - \beta)^2 \ge \{\operatorname{var} X \cdot E[g'(\varepsilon)/g(\varepsilon)]^2\}^{-1} = \sigma^2_{X,g},$$

and that equality can occur only if $\sqrt{n}(T_n - \beta)$ is asymptotically $N(0, \sigma^2_{X,g})$. Therefore it is natural to consider the ratio (compare Section 6)

$$\sigma^2_{X,g}/[\operatorname{AsVar} \hat\beta] = \{\operatorname{var} X_G/\operatorname{var} X\}\{4g^2(0)/I_g\}, \tag{4.3}$$

where $I_g = E[g'(\varepsilon)/g(\varepsilon)]^2$ is the Fisher information of the density $g$. Clearly (4.3) can be made arbitrarily small by appropriate choice of $g$. This situation could, in theory, be remedied by replacing the median with an (adaptive, efficient) estimate of location (compare Section 5). The ratio depending on the $X$ distribution can also be made arbitrarily small, however, by taking densities for $X$ with tails of order $x^{-(3+\epsilon)}$ for $\epsilon > 0$. For example, if $f_X(x) = \tfrac{1}{2}(\alpha - 1)|x|^{-\alpha} I\{|x| > 1\}$, then $\sup_{x_L}$ […]

[…] $T^*(F) = \sup\{t : \lambda(F, t) > 0\}$. The statement and proof of Proposition 4.2 remain valid (with the obvious changes) under the extra technical restriction that $\psi$ be bounded. This facilitates the proof that $T^*(F)$ and $T_*(F)$ are semicontinuous (see Appendix A). In particular, if $H \in \mathcal{H}_r$ with $T(\mathcal{L}(\varepsilon))$ uniquely defined, then $\hat\beta_\psi$ is strongly consistent for $\beta$.

If $\psi$ is bounded, the $\psi$-resistant line retains the property of bounded influence in both $x$ and $y$, possessed by the ordinary resistant line. Let $H_\delta = (1 - \delta)H + \delta$ […]. Under appropriate regularity conditions, the argument in Appendix B shows that if $H \in \mathcal{H}_r$ and $\varepsilon$ has density $g$ with $T(\mathcal{L}(\varepsilon)) = 0$, and $E|X|$ […]

[…] [which is minimax over contamination neighborhoods of the Gaussian distribution (Huber 1964)] with $k = 1.5$ raises the ARE with respect to least squares (for uniform $X$ and Gaussian $Y$) from 57% to 85%, while retaining good breakdown behavior.

Redescending $\psi$

The use of nonmonotone ("redescending") $\psi$-functions has been advocated to afford greater protection from outliers.
Solutions are no longer guaranteed to exist or be unique, however, and care is required in estimating the scale, $\sigma$ [compare (5.2)]. Without either unimodality assumptions on the error distribution or a lower bound on the tuning constant, the resulting estimators need not even be consistent (e.g., Diaconis and Freedman 1982). Nevertheless, such estimators perform well in practice. We have included a $\psi$-resistant line using a one-step biweight (e.g., see Mosteller and Tukey 1977) in the experiment discussed in Section 6. A heuristic argument (outlined in Appendix D) shows, under regularity conditions ensuring asymptotic uniqueness of the solution, that the asymptotic normality result (5.4) is preserved for a wide class of nonmonotone $\psi$-functions including the biweight. Furthermore, the Gaussian efficiency of the biweight is close to that of the optimal "hyperbolic tangent" redescending estimators (Hampel, Rousseeuw, and Ronchetti 1981).

6. SMALL-SAMPLE EFFICIENCIES

We studied small-sample properties of 10 simple regression methods under the standard linear model assumptions, $\mathcal{H}_r$ (Section 4.2), using a Monte Carlo experiment.
The experiment design was a complete crossing of three sample sizes ($n$ = 10, 22, and 40) by two $x$-configuration specifications [fixed equispaced and random Gaussian(0, 1)] by three error distributions [Gaussian(0, 1), a mixture of 90% Gaussian(0, 1) and 10% Gaussian(0, 9), and "slash"]. The Gaussian mixture is one for which the mean and median have approximately equal asymptotic variances. The "slash" distribution, generated as a unit Gaussian variate divided by an independent uniform(0, 1) variate, has infinite variance but is not as peaked as the Cauchy density.

The methods studied fall into three categories:

I. nonresistant (breakdown value = 0) methods with computing effort $O(n)$: least squares (LS), least absolute residual (LAR) [algorithm from International Mathematical and Statistical Libraries (IMSL) 1980], and the partitioned line method of Nair, Shrivastava, and Bartlett (NSB).

II. resistant methods with computing effort $O(n^2)$: repeated median (RM) and Sen's version of the overall median pairwise slope (SEN).

III. resistant methods with computing effort $O(n)$: Brown-Mood regression (BM), resistant line (RL), $\psi$-resistant line using one biweight step ($c = 6$) to improve each median and the overall median absolute deviation from the median (MAD) as a scale estimate (RLBW), the $\psi$-resistant line using one biweight step but a local MAD computed only within each partition (RLBW3), and the $\psi$-resistant line using a fully iterated Huber $\psi$ function and local scale (RLHU3).

All of the partition-based lines (NSB, BM, RL, RLBW, RLBW3, RLHU3) were computed using a modified version of the code published by Velleman and Hoaglin (1981) for the resistant line. Technical details of the experiment are reported in Appendix E.
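For orientation, the plain resistant line (RL) can be sketched in a few lines. This is not the Velleman and Hoaglin (1981) code: it is a minimal illustration that assumes outer groups of size $\lfloor n/3 \rfloor$, $x$-values that separate the two outer groups, and a simple bisection on the slope in place of the published algorithm. The function name `resistant_line` and its arguments are hypothetical.

```python
from statistics import median

def resistant_line(x, y, tol=1e-10):
    """Slope and intercept making the median residual equal in both outer thirds.

    Assumptions of this sketch: outer groups hold n // 3 points each, and the
    largest x in the left group is smaller than the smallest x in the right group.
    """
    pts = sorted(zip(x, y))
    k = len(pts) // 3
    if k == 0:
        raise ValueError("need at least three points")
    left, right = pts[:k], pts[-k:]
    if left[-1][0] >= right[0][0]:
        raise ValueError("outer thirds must be separated in x")

    def group_median(group, b):
        return median(yi - b * xi for xi, yi in group)

    def delta(b):
        # Right-minus-left median residual; nonincreasing in the slope b.
        return group_median(right, b) - group_median(left, b)

    lo, hi = -1.0, 1.0
    while delta(lo) < 0:        # widen the bracket until delta(lo) >= 0
        lo *= 2.0
    while delta(hi) > 0:        # ... and delta(hi) <= 0
        hi *= 2.0
    for _ in range(200):        # bisection on the slope
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if delta(mid) > 0:
            lo = mid
        else:
            hi = mid
    slope = 0.5 * (lo + hi)
    # At the solution both group medians coincide; their common value is the intercept.
    intercept = 0.5 * (group_median(left, slope) + group_median(right, slope))
    return slope, intercept

# Points on y = 2 + 3x with one gross outlier in the right-hand group.
xs = list(range(9))
ys = [2 + 3 * xi for xi in xs]
ys[8] = 100
slope, intercept = resistant_line(xs, ys)
print(round(slope, 6), round(intercept, 6))
```

With one wild value among the three right-hand points, the fitted line still follows the remaining data, which is the resistance property the experiment is designed to quantify.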
The experiment employed a "Gaussian over independent" variance reduction "swindle" (for example, see Simon 1976 or Goodfellow and Martin 1976). We also developed a new swindle for this study to improve performance on non-Gaussian error distributions. The swindle is defined in Appendix E and discussed more extensively in Johnstone and Velleman (1984, 1985). The swindles increase the effective number of replications, typically by factors of 3-8. (That is, the standard errors of our efficiency estimates are as small as those of a naive simulation experiment with between 3 and 8 times the number of replications.) Occasionally (especially for the more efficient estimators) swindle gain factors reached 160.

Table 2 presents the small-sample efficiencies for fixed equispaced $x$. The efficiencies have been computed relative to the Cramér-Rao lower bound for each situation, so 100% efficiency will only be attainable in the Gaussian case here. In this sense, the efficiencies reported for the mixed Gaussian and slash error distributions are conservative. The actual efficiencies would be obtained from the variance of the Pitman estimator for each situation, which could be well estimated by the score function swindle (compare Johnstone and Velleman 1985, section 4).

Few small-sample results have been published for any of these methods. Siegel (1982) reported efficiencies for the RM at sample size 10 and 20 (for fixed $x$, .69 and .73; for Gaussian $X$, .53 and .61, respectively) that agree with ours.
Except for slash-distributed errors at sample size 10 (where all estimators, predictably, did poorly), the resistant line performs well, usually exceeding 60% efficiency. The increase in efficiency over asymptotic values it exhibits for Gaussian and mixed Gaussian errors at smaller sample sizes is a pleasant benefit in a technique often computed by hand for small data sets. This increase appears to derive from the similar behavior of the median, which is 100% efficient (relative to the mean) in Gaussian samples of one or two and declines in efficiency to an ARE of $2/\pi$. The reverse trend for slash errors appears to be due to breakdown of the median for the samples of $n/3$ points in the outer partitions of small samples.

The resistant line is thus appropriate for hand calculation on small samples with the caveat that data sets should be a bit larger (at least 15, perhaps 20, preferably 40) if very heavy-tailed error distributions are suspected.

Among the estimators ordinarily found only with the aid of a computer, the biweight $\psi$-resistant line (RLBW) performs quite well. It has the highest maximin efficiency ("triefficiency") at sample sizes 22 and 40 among methods studied here. It clearly dominates at moderate sample sizes among estimators with bounded influence and inexpensive [$O(n)$] computation order. Because the biweight is a single-step improvement of the medians in the resistant line, this estimator can be computed by hand if a programmable calculator is available. For its favorable combination of properties, we prefer this estimator among those studied here.

When the scale estimate is computed only from the local partition (RLBW3, RLHU3), we can relax the assumption of homoscedastic errors, but we will pay about 5% in efficiency if that assumption is in fact true.
The LAR line performs similarly to the RLBW. Its primary weakness is that it is not resistant to the effects of large fluctuations at extreme $x$ values.

The method of Theil and Sen, and Siegel's repeated median, reward the $O(n^2)$ computing expense with higher Gaussian efficiencies and (for the repeated median) higher breakdown. Their efficiency advantage, however, is eroded at the more severe error distributions, and their computations are usually too complex for hand calculation and can be too expensive to apply to large data sets.

Table 2. Efficiency of Slope Estimates Relative to the Cramér-Rao Lower Bound: Fixed Equispaced x (by distribution of residuals)

                  Gaussian                 Mixed Gaussian                  Slash             Triefficiency
Method     n=10   n=22   n=40    n=∞   n=10   n=22   n=40    n=∞   n=10   n=22   n=40    n=∞   n=22   n=40
I
LS          100    100    100    100   69.8   69.8   69.8   69.8      0      0      0      0      0      0
LAR        61.9   63.3   63.6   63.7   62.9   67.2   68.4   69.7   36.2   60.9   68.0   77.3   60.9   63.6
           (.4)   (.5)   (.8)          (.5)   (.5)   (.7)         (1.3)  (1.7)  (1.1)
NSB        89.1   88.9   88.9   88.9   62.2   62.1   62.0   62.0      0      0      0      0      0      0
II
RM         69.7   73.0   72.3          67.1   75.0   75.3          37.0   59.4   65.9          59.4   65.9
           (.4)   (.4)   (.7)          (.6)   (.4)   (.7)          (.9)   (.7)  (1.1)
SEN        87.9   91.4   93.0   95.5   81.5   89.1   92.0   95.8   35.1   56.0   62.5   70.9   56.0   62.5
           (.2)   (.2)   (.3)          (.4)   (.3)   (.3)         (1.1)   (.7)  (1.1)
III
BM         50.0   49.0   49.2   47.8   51.2   51.5   53.1   52.3   29.7   46.2   53.6   58.0   46.2   49.2
           (.4)   (.6)   (.8)          (.5)   (.6)   (.8)          (.9)   (.8)  (1.1)
RL         65.3   60.5   58.8   56.6   64.0   63.6   63.0   62.0   16.8   51.0   62.2   68.7   51.0   58.8
           (.4)   (.5)   (.8)          (.6)   (.5)   (.8)         (2.5)  (1.0)  (1.1)
RLHU3      73.5   68.6   66.9   85.7   65.6   69.4   70.8   86.0    8.7   37.8   54.4   58.5   37.8   54.4
           (.3)   (.5)   (.7)          (.6)   (.5)   (.7)         (1.5)  (1.3)  (1.3)
RLBW3      65.4   70.4   73.9   81.2   61.8   73.2   77.9   86.8   16.1   52.9   68.2   79.7   52.9   68.2
           (.4)   (.5)   (.8)          (.6)   (.5)   (.6)         (2.7)  (1.2)  (1.2)
RLBW       76.2   77.8   78.2   81.2   71.3   79.9   82.1   86.8   17.6   61.5   72.6   79.7   61.5   72.6
           (.4)   (.4)   (.6)          (.6)   (.4)   (.5)         (2.8)  (1.2)  (1.1)
CRLB variance
            .10  .0454   .025    1/n  .1256  .0571  .0314 1.26/n  .4854  .2206  .1214 4.85/n

NOTE: I: breakdown = 0%, cost = O(n); II: breakdown = 29% (SEN), 50% (RM), cost = O(n²); III: breakdown = 25% (BM), 16% (others), cost = O(n); italicized values are derived from theory; boldface values are triefficient (minimums at sample sizes across three distributions). Results for n = 10 are based on 10,000 replications; results for n = 22 are based on 5,000 replications; results for n = 40 are based on 2,000 replications. LAR is least absolute residual; NSB is Nair-Shrivastava (1942), Bartlett (1949); RM is repeated median (Siegel 1982); SEN is Sen (1968), Theil (1950); BM is Brown-Mood (1950); RL is resistant line (Tukey 1970); RLBW3 is ψ-resistant line, one-step biweight, local MAD, c = 6; RLHU3 is ψ-resistant line, Huber ψ (e.g., Huber 1981), local MAD, c = 1.5; RLBW is ψ-resistant line, one-step biweight, global MAD, c = 6; CRLB is $(n\sigma_x^2 I(g))^{-1}$, with $\sigma_x$ scaled to 1.0 (compare Appendix E); MAD is median absolute deviation derived from the median; standard errors are in parentheses.

The efficiencies reported in Table 3 for random Gaussian $X$ follow the same general pattern, but they are somewhat lower than the corresponding fixed-$x$ efficiencies. For the resistant line family estimators, some of the loss is due to the suboptimal placement of $x_L$ and $x_R$ at the 33% rather than the 27% points in $x$.

We have been unable to account for the discrepancy between the trend of the small-sample efficiencies and the asymptotic values for the Huber estimates for Gaussian errors in both situations. Various internal checks suggest that the other entries in the tables are reliable.
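The triefficiency columns are simply the minimum, at a given sample size, of the efficiencies across the three error distributions. The short sketch below recomputes them for three rows of Table 2; the dictionary layout and the function name `triefficiency` are illustrative choices, and the numbers are copied from the table.

```python
# Efficiencies (%) at n = 22 and n = 40 for fixed equispaced x, from Table 2.
TABLE2 = {
    "LAR":  {"Gaussian": (63.3, 63.6), "Mixed Gaussian": (67.2, 68.4), "Slash": (60.9, 68.0)},
    "RL":   {"Gaussian": (60.5, 58.8), "Mixed Gaussian": (63.6, 63.0), "Slash": (51.0, 62.2)},
    "RLBW": {"Gaussian": (77.8, 78.2), "Mixed Gaussian": (79.9, 82.1), "Slash": (61.5, 72.6)},
}

def triefficiency(row):
    """Minimum efficiency across the three distributions at each sample size."""
    return tuple(min(values) for values in zip(*row.values()))

for name, row in TABLE2.items():
    print(name, triefficiency(row))
```

The recomputed minima match the tabulated triefficiencies for these rows, and RLBW has the largest value at both sample sizes.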
7. MULTIPLE REGRESSION

Multiple-regression models can be estimated with partition-based lines by using the univariate methods as a primitive from which a multiple regression is built (e.g., Gibson and Jowett 1957b), for example, by "sweeping" (Andrews 1974; Beaton and Tukey 1974) or projection pursuit (Friedman and Tukey 1974). All the data points are used in each sweep or projection, so the one-dimensional resistance and breakdown bound are retained. Although hand computation is no longer feasible, a relatively simple computational algorithm with fair efficiency is possible. The tentative proposals here have not yet been evaluated in detail comparable to the one-dimensional case.

One extension of Equation (2.1) requires residuals from a multiple-regression fit to have zero partition-based line slopes against each carrier: for $p$ carriers (including the constant carrier) and coefficient vector $\beta$, define $e = y - X\beta$. We seek to satisfy the $p$ simultaneous conditions

$$T_{i \in L_j}(e_i) = T_{i \in R_j}(e_i), \quad j = 2, \ldots, p; \qquad T(e_i) = 0 \tag{7.1}$$

for M-estimator $T$ and partition sets $(L_j, R_j)$ defined by the rank ordering of carrier $x_j$.

Andrews's sweeping method is one way to approximate a solution to (7.1). Suppose that $n$ observations $(X_{1i}, \ldots, X_{pi}, Y_i)$ are available on $p$ carriers and one response variable. Arrange the data as rows in an $n \times (p + 1)$ matrix, $D = [X_1, X_2, \ldots, X_p, Y]$, where $X_k$, $Y$ are $n$-dimensional column vectors. The $k$th sweep operator, $R_k$, when applied to $D$ replaces each $X_j$ ($j > k$) and $Y$ by $X_j - b_{jk}X_k$ and $Y - b_k X_k$, respectively, where $b_{jk}$ and $b_k$ are regression coefficients derived by some method such as a $\psi$-resistant line. Applying $R_1$, $R_2$, through $R_p$ in turn [a total of $p(p-1)/2 + p$ univariate fits] and retaining the coefficients $b_k$ leads to the fit

$$Y_i = Y_i^{(c)} + \sum_{k=1}^{p} b_k X_{ki}^{(c)}, \tag{7.2}$$

where $X_k^{(c)}$ and $Y^{(c)}$ are the current contents of $D$. If desired, the process can be iterated.
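The sweep sequence can be sketched as follows. This is an illustration of the bookkeeping only, under loudly stated assumptions: the univariate primitive here is a through-origin median of ratios (`ratio_slope`, a hypothetical stand-in, not the $\psi$-resistant line), chosen because it reduces to the median when the carrier is the constant. The names `sweep_fit` and `ratio_slope` do not come from the article.

```python
from statistics import median

def ratio_slope(x, y):
    """Stand-in primitive (an assumption of this sketch): median of y_i / x_i
    over points with x_i != 0. For a constant carrier it is the median of y."""
    return median(yi / xi for xi, yi in zip(x, y) if xi != 0)

def sweep_fit(carriers, response, primitive=ratio_slope):
    """Apply sweeps R_1, ..., R_p in turn.

    Returns the coefficients b_k, the swept carriers X_k^(c), and the swept
    response Y^(c), so that Y = Y^(c) + sum_k b_k X_k^(c) holds term by term.
    """
    cols = [list(c) for c in carriers]
    resid = list(response)
    coefs = []
    for k in range(len(cols)):
        xk = cols[k]
        # Sweep carrier k out of every later carrier ...
        for j in range(k + 1, len(cols)):
            bjk = primitive(xk, cols[j])
            cols[j] = [v - bjk * u for u, v in zip(xk, cols[j])]
        # ... and out of the response, retaining b_k.
        bk = primitive(xk, resid)
        resid = [v - bk * u for u, v in zip(xk, resid)]
        coefs.append(bk)
    return coefs, cols, resid

# Constant carrier plus one x-carrier; response exactly 2 + 3x.
const = [1.0] * 6
x2 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
yv = [2.0 + 3.0 * v for v in x2]
coefs, swept, resid = sweep_fit([const, x2], yv)
fitted = [sum(b * col[i] for b, col in zip(coefs, swept)) for i in range(len(yv))]
print(coefs, resid)
```

For $p = 2$ the routine performs $p(p-1)/2 + p = 3$ univariate fits. The coefficients refer to the swept carriers, so here the fit reads $9.5 + 3(x - 2.5) = 2 + 3x$.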
The method clearly depends on the ordering of the $x$-variables, and it is not affinely equivariant. Usually one must choose an ordering for the variables. Sweeping has the same breakdown as its univariate primitive (1/6 if resistant line is used). In contrast with some more elaborate methods based on robust covariances for the carriers, the breakdown does not decrease with increasing $p$ (Huber 1981, chap. 8). Now let the observations form an iid sample from the model $Y = \sum_k \beta_k X_k + \varepsilon$, where the carrier $(X_1, \ldots, X_p)$ is independent of $\varepsilon$. It can be shown, for example, under mild nondegeneracy assumptions on $(X_1, \ldots, X_p)$ that the estimated coefficients in (7.2) obtained from a resistant line primitive are $\sqrt{n}$-consistent [i.e., $b_k - \beta_k = O_p(1/\sqrt{n})$]. Emerson and Hoaglin (1985) discuss a version of the swept resistant-line multiple regression and present some examples.

Table 3. Efficiency of Slope Estimates Relative to the Cramér-Rao Lower Bound: Random Gaussian X (by distribution of residuals)

                  Gaussian                 Mixed Gaussian                  Slash             Triefficiency
Method     n=10   n=22   n=40    n=∞   n=10   n=22   n=40    n=∞   n=10   n=22   n=40    n=∞   n=22   n=40
I
LS        114.0  105.0  100.0    100   80.0   72.8   70.5   69.7      0      0      0      0      0      0
        (.9)(.4)(.3)(.9)(.9)(1.0)
LAR        63.0   62.6   62.4   63.7   60.3   67.5   67.4   69.7   27.8   53.2   63.7   77.3   53.2   62.4
        (.5)(.7)(.7)(.8)(.8)(.9)(1.6)(1.2)
NSB        79.7   80.1   78.7   79.3   56.9   54.8   54.5   55.4      0      0      0      0      0      0
        (.6)(.6)(.6)(.8)(.8)(1.0)
II
RM         56.1   61.7   64.2          52.0   64.7   65.1          23.0   46.0   56.5          46.0   56.5
        (.9)(.8)(.8)(1.0)(.8)(1.0)(1.6)(1.0)
SEN        78.3   84.8   88.0   91.2   71.7   82.4   85.6   91.5   27.6   46.3   56.5   67.7   46.3   56.5
        (.7)(.7)(.5)(.9)(.8)(.8)(1.1)(1.6)
III
BM         35.1   37.4   42.4   40.5   34.0   41.0   45.0   44.4   16.1   32.4   44.3   49.1   32.4   42.4
        (.8)(.7)(.8)(.8)(.9)(.9)(.9)(.9)
RL         54.0   52.8   50.8   50.5   51.4   54.8   55.9   55.3    7.4   40.3   51.9   61.3   40.3   50.8
        (.8)(.7)(.8)(1.0)(.8)(1.0)(1.7)(1.2)
RLHU3      63.3   60.6   58.5   73.4   55.6   60.6   62.7   76.8    1.4   30.3   44.5   62.9   30.3   58.5
        (.7)(.8)(.8)(.9)(.8)(1.0)(.6)(1.2)
RLBW3      53.6   62.3   64.6   71.0   50.0   64.2   69.2   77.5    7.0   43.4   58.2   71.1   43.4   58.2
        (.7)(.8)(.8)(1.0)(.9)(1.0)(1.7)(1.3)
RLBW       62.8   69.2   69.8   71.0   62.8   70.7   73.0   77.5    7.0   50.7   63.0   71.1   50.7   63.0
CRLB variance
          .1429  .0526  .0270    1/n  .1794  .0661  .0339 1.26/n  .6934  .2554  .1312 4.85/n

NOTE: I: breakdown = 0%, cost = O(n); II: breakdown = 29% (SEN), 50% (RM), cost = O(n²); III: breakdown = 25% (BM), 16% (others), cost = O(n); italicized values are derived from theory; boldface values are triefficient (minimums at sample sizes across three distributions). Results for n = 10 are based on 10,000 replications; results for n = 22 are based on 5,000 replications; results for n = 40 are based on 2,000 replications. LAR is least absolute residual; NSB is Nair-Shrivastava (1942), Bartlett (1949); RM is repeated median (Siegel 1982); SEN is Sen (1968), Theil (1950); BM is Brown-Mood (1950); RL is resistant line (Tukey 1970); RLBW3 is ψ-resistant line, one-step biweight, local MAD, c = 6; RLHU3 is ψ-resistant line, Huber ψ (e.g., Huber 1981), local MAD, c = 1.5; RLBW is ψ-resistant line, one-step biweight, global MAD, c = 6; CRLB is $((n - 3)\sigma_x^2 I(g))^{-1}$ with $\sigma_x = 1.0$, obtained by conditioning on $X$; MAD is median absolute deviation derived from the median; standard errors are in parentheses, listed in the order of the entries to which they apply.

In the projection-pursuit approach, one fixes a unit vector $u \in R^p$ and finds a univariate resistant line $\hat Y_i = b_{RL}(u)(u \cdot X_i) + a_{RL}(u)$. The direction $u^*$ is chosen to minimize a resistant "figure of merit" $I(u)$, which might be a trimmed sum of squared deviations $(Y_i - \hat Y_i(u))^2$. The resulting coefficient estimates are $b_{RL}(u^*)u^*$ and have the advantage of being orthogonally equivariant. Of course, projection pursuit generally requires more computation than sweeping.
No such multiple regression will compete in efficiency with, for example, the bounded influence regressions of Krasker and Welsch (1982). However, they provide a less expensive exploratory-quality bounded-influence multiple-regression estimate that will be sufficient for many applications. When greater efficiency is desired, the $\psi$-resistant multiple regression will probably provide a better starting value for the Krasker-Welsch iteration than least squares or least absolute residual regression.

APPENDIX A: ADDENDA TO CONSISTENCY RESULTS

We show first that under the condition of Proposition 4.2, $\mathcal{L}(Y - bX \mid X \le x_L, H_n) \xrightarrow{D} \mathcal{L}(Y - bX \mid X$