
Robust estimators for additive models using backfitting

Graciela Boente
Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Argentina

Alejandra Martínez
Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Argentina

and Matias Salibián-Barrera
University of British Columbia, Canada

Abstract

Additive models provide an attractive setup to estimate regression functions in a nonparametric context. They provide a flexible and interpretable model, where each regression function depends only on a single explanatory variable and can be estimated at an optimal univariate rate. It is easy to see that most estimation procedures for these models are highly sensitive to the presence of even a small proportion of outliers in the data. In this paper, we show that a relatively simple robust version of the backfitting algorithm (consisting of using robust local polynomial smoothers) corresponds to the solution of a well-defined optimization problem. This formulation allows us to find mild conditions to show that these estimators are Fisher consistent and to study the convergence of the algorithm. Our numerical experiments show that the resulting estimators have good robustness and efficiency properties. We illustrate the use of these estimators on a real data set where the robust fit reveals the presence of influential outliers.

Keywords: Fisher-consistency; Kernel weights; Robust estimation; Smoothing

1 Introduction

Consider a general regression model, where a response variable Y ∈ R is related to a vector X = (X1, . . . , Xd)t ∈ R^d of explanatory variables through the following non-parametric regression model:

Y = g0(X) + σ0 ε . (1)

The error ε is assumed to be independent from X and centered at zero, while σ0 is the error scale parameter. When ε has a finite first moment, we have the usual regression representation E(Y | X) = g0(X). Standard estimators for g0 can thus be derived relying on local estimates of the conditional mean, such as kernel polynomial regression estimators. It is easy to see that such procedures can be seriously affected either by a small proportion of outliers in the response variable, or when the distribution of Y | X has heavy tails. Note, however, that even when ε does not have a finite first moment, the function g0(X) can still be interpreted as a location parameter for the distribution of Y | X. In this case, local robust estimators can be used to estimate the regression function as, for example, the local M-estimators proposed in Boente and Fraiman (1989) and the local medians studied in Welsh (1996).
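As a small illustration of the local medians mentioned above, the sketch below (our own minimal Python version; the smoother studied in Welsh, 1996, is a proper statistical procedure with bandwidth theory) estimates the centre of Y | X = x0 by the median of the responses in a window around x0, and shows that a single gross outlier barely moves it.

```python
import numpy as np

def local_median(x0, x, y, h):
    """Local median: the median of the responses whose covariate lies
    within a window of half-width h around x0."""
    mask = np.abs(x - x0) <= h
    return np.median(y[mask])

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
idx = np.argmin(np.abs(x - 0.5))   # a design point inside the window at 0.5
y_out = y.copy()
y_out[idx] = 50.0                  # one gross vertical outlier
clean = local_median(0.5, x, y, 0.1)
dirty = local_median(0.5, x, y_out, 0.1)
print(clean, dirty)                # nearly identical: the median resists it
```

A local mean computed on the same window would instead shift by roughly 50/n_window.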

Unfortunately, both robust and non-robust non-parametric regression estimators are affected by the curse of dimensionality, which is caused by the fact that the expected number of observations in local neighbourhoods decreases exponentially as a function of d, the number of covariates. This results in regression estimators with a very slow convergence rate. Stone (1985) showed that additive models can avoid these problems and produce non-parametric multiple regression estimators with a univariate rate of convergence. In an additive model, the regression function is assumed to satisfy

g0(X) = µ0 + ∑_{j=1}^{d} g0,j(Xj) , (2)

where µ0 ∈ R and g0,j : R → R, 1 ≤ j ≤ d, are unknown smooth functions with E(g0,j(Xj)) = 0. Such a model retains the ease of interpretation of linear regression models, where each component g0,j can be thought of as the effect of the j-th covariate on the centre of the conditional distribution of Y. Moreover, Linton (1997), Fan et al. (1998) and Mammen et al. (1999) obtained different oracle properties showing that each additive component can be estimated as well as when all the other ones are known.

Several algorithms to fit additive models have been proposed in the literature. In this paper, we focus on the backfitting algorithm as introduced in Friedman and Stuetzle (1981) and discussed further in Buja et al. (1989). The backfitting algorithm can be intuitively motivated by observing that, if (2) holds, then

g0,j(x) = E( Y − µ0 − ∑_{ℓ≠j} g0,ℓ(Xℓ) | Xj = x ) . (3)

Hence, given a sample, the backfitting algorithm iteratively estimates the components g0,j, 1 ≤ j ≤ d, using a univariate smoother of the partial residuals in (3) as functions of the j-th covariate. This algorithm is widely used due to its flexibility (different univariate smoothers can be used), ease of implementation and intuitive motivation. Furthermore, it has been shown to work very well in simulation studies (Sperlich et al. 1999) and applications, although its statistical properties are difficult to study due to its iterative nature. Some results regarding its bias and conditional variance can be found in Opsomer and Ruppert (1997), Wand (1999) and Opsomer (2000).
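The iteration just described can be sketched in a few lines. The following is our own simplified Python illustration of classical backfitting, using a Nadaraya-Watson smoother with a Gaussian kernel (the paper's smoothers and bandwidth choices differ); each pass smooths the partial residuals against one covariate and re-centres the fitted component.

```python
import numpy as np

def kernel_smooth(x, r, h):
    """Nadaraya-Watson smoother of partial residuals r against covariate x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # Gaussian weights
    return (w * r[None, :]).sum(axis=1) / w.sum(axis=1)

def backfit(X, y, h=0.1, n_iter=20):
    """Classical backfitting: cycle through coordinates, smoothing the
    partial residuals of (3), then re-centre each component to mean zero."""
    n, d = X.shape
    mu = y.mean()
    g = np.zeros((n, d))
    for _ in range(n_iter):
        for j in range(d):
            r = y - mu - g.sum(axis=1) + g[:, j]   # partial residuals for j
            g[:, j] = kernel_smooth(X[:, j], r, h)
            g[:, j] -= g[:, j].mean()              # identifiability constraint
    return mu, g

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 2))
y = 4 * (X[:, 0] - 0.5) ** 2 + np.sin(np.pi * X[:, 1]) + 0.05 * rng.normal(size=300)
mu, g = backfit(X, y)
mse = np.mean((y - (mu + g.sum(axis=1))) ** 2)
print(mse)   # small in-sample residual mean square
```

Note the lack of robustness: a single huge response would be spread by the linear smoother across all fitted components, which is the failure mode the paper addresses.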

When second moments exist, Breiman and Friedman (1985) showed that, under certain regularity conditions, the backfitting procedure finds functions m1(X1), . . . , md(Xd) minimizing E(Y − µ0 − ∑_{j=1}^{d} mj(Xj))^2 over the space of functions with E[mj(Xj)] = 0 and finite second moments. In other words, even if the regression function g0 in (1) does not satisfy the additive model (2), the backfitting algorithm finds the orthogonal projection of the regression function onto the linear space of additive functions in L2. Equivalently, backfitting finds the closest additive approximation (in the L2 sense) to E(Y | X1, . . . , Xd). Furthermore, the backfitting algorithm is a coordinate-wise descent algorithm minimizing the squared loss functional above. The sample version of the algorithm solves a system of nd × nd normal equations and corresponds to the Gauss-Seidel algorithm for linear systems of equations.

If the smoother chosen to estimate (3) is not resistant to outliers, then the estimated additive components can be seriously affected by a relatively small proportion of atypical observations. Given the local nature of non-parametric regression estimators, we will be concerned with the case where outliers are present in the response variable. Bianco and Boente (1998) considered robust estimators for additive models using kernel regression, which are a robust version of those defined in Baek and Wehrly (1993). The main drawback of this approach is that it assumes that Y − g0,j(Xj) is independent from Xj, which is difficult to justify or verify in practice. Outlier-resistant fits for generalized additive models have been considered recently in the literature. When the variance is a known function of the mean and the dispersion parameter is known, we refer to Alimadad and Salibián-Barrera (2012) and Wong et al. (2014), who consider robust fits based on backfitting and penalized splines M-estimators, respectively. In the case of model (1), the approach of Wong et al. (2014) reduces to that of Oh et al. (2007), which is an alternative based on penalized splines. On the other hand, Croux et al. (2011) provides a robust fit for generalized additive models with nuisance parameters using penalized splines, but no theoretical support is provided for their method.

In this paper, we consider an intuitively appealing way to obtain robust estimators for model (1) which combines the backfitting algorithm with robust univariate smoothers. For example, one can consider those proposed in Boente and Fraiman (1989), Härdle and Tsybakov (1988), Härdle (1990) and Oh et al. (2007). One of the main contributions of this paper is to show that this intuitive approach to obtain a robust backfitting algorithm is well justified. Specifically, we show that applying the backfitting algorithm using the robust non-parametric regression estimators of Boente and Fraiman (1989) corresponds to minimizing E[ρ((Y − µ0 − ∑_{j=1}^{d} mj(Xj))/σ0)] over functions m1(X1), . . . , md(Xd) with E[mj(Xj)] = 0, where ρ is a loss function. Furthermore, this robust backfitting corresponds to a coordinate-wise descent algorithm and can be shown to converge. We also establish sufficient conditions for these robust backfitting estimators to be Fisher consistent for the true additive components when (2) holds. Our numerical experiments confirm that these estimators have very good finite-sample properties, both in terms of robustness, and efficiency with respect to the classical approach when the data do not contain outliers. These robust estimators cannot be interpreted as orthogonal projections of the regression function onto the space of additive functions of the predictors. However, the first-order conditions for the minimum of this optimization problem are closely related to the robust conditional location functional defined in Boente and Fraiman (1989).

The rest of the paper is organized as follows. In Section 2, we show that the robust backfitting algorithm mentioned above corresponds to a coordinate-descent algorithm to minimize a robust functional using a convex loss function. We also prove that the resulting estimator is Fisher consistent, which means that the solution to the population version of the problem is the object of interest (in our case, the true regression function). The convergence of this algorithm is studied in Section 2.1, while its finite-sample version using local M-regression smoothers is presented in Section 3. The results of our numerical experiments conducted to evaluate the performance of the proposed procedure are reported in Section 4. To study the sensitivity of the robust backfitting with respect to single outliers, Section 5 provides a numerical study of the empirical influence function. Finally, in Section 6 we illustrate the advantage of using robust backfitting on a real data set. All proofs are relegated to the Appendix.

    2 The robust backfitting functional

In this section, we introduce a population-level version of the robust backfitting algorithm. By showing that the robust backfitting corresponds to a coordinate-descent algorithm to minimize a "robust functional", we are able to find sufficient conditions for the robust backfitting to be Fisher-consistent.

In what follows, we will assume that (Xt, Y )t is a random vector satisfying the additive model (2), where Y ∈ R and X = (X1, . . . , Xd)t, that is,

Y = µ0 + ∑_{j=1}^{d} g0,j(Xj) + σ0 ε . (4)

As is customary, to ensure identifiability of the components of the model, we will further assume that E g0,j(Xj) = 0, 1 ≤ j ≤ d. When second moments exist, it is easy to see that

the backfitting estimators solve the following minimization problem

min_{(ν,m) ∈ R × H_ad} E( Y − ν − ∑_{j=1}^{d} mj(Xj) )^2 , (5)

where H_ad = { m(x) = ∑_{j=1}^{d} mj(xj), mj ∈ Hj } ⊂ H, H = { r(x) : E(r(X)) = 0, E(r^2(X)) < ∞ }, and Hj is the Hilbert space of measurable functions mj of Xj with zero mean and finite second moment, i.e., E mj(Xj) = 0 and E mj^2(Xj) < ∞. The solution to (5) is characterized by its residual Y − µ − g(X) being orthogonal to H_ad. Since this space is spanned by Hℓ, 1 ≤ ℓ ≤ d, the solution of (5) satisfies E( Y − µ − ∑_{j=1}^{d} gj(Xj) ) = 0 and E( Y − µ − ∑_{j=1}^{d} gj(Xj) | Xℓ ) = 0, for 1 ≤ ℓ ≤ d, from where it follows that µ = E(Y) and gℓ(Xℓ) = E( Y − µ − ∑_{j≠ℓ} gj(Xj) | Xℓ ), 1 ≤ ℓ ≤ d. Given a sample, the backfitting algorithm iterates the above system of equations replacing the conditional expectations with non-parametric regression estimators (e.g. local polynomial smoothers).

To reduce the effect of outliers on the regression estimates, we replace the square loss function in (5) by a function with bounded derivative, such as the Huber or Tukey's loss functions. For these losses, ρc(u) = c^2 ρ1(u/c), where c > 0 is a tuning constant to achieve a given efficiency. The Huber-type loss corresponds to ρ1 = ρh and the Tukey's loss to ρt, where ρh(u) = u^2/2 if |u| ≤ 1 and ρh(u) = |u| − 1/2 otherwise, while ρt(u) = min(3u^2 − 3u^4 + u^6, 1). Other possible choices are ρ1(u) = √(1 + u^2) − 1, which is a smooth approximation of the Huber function, and ρ1(u) = u arctan(u) − 0.5 ln(1 + u^2), which has derivative ρ′1(u) = arctan(u). The bounded derivative of the loss function controls the effect of outlying values in the response variable (sometimes called "vertical outliers" in the literature).
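The two main losses above can be written down directly; a minimal Python sketch (the tuning constants shown are the common 95%-efficiency choices used later in Section 4) makes the boundedness visible:

```python
import numpy as np

def rho_huber(u, c=1.345):
    """Huber loss rho_c(u) = c^2 rho_1(u/c): quadratic centre, linear tails."""
    a = np.abs(u)
    return np.where(a <= c, 0.5 * a ** 2, c * a - 0.5 * c ** 2)

def rho_tukey(u, c=4.685):
    """Tukey bisquare loss rho_c(u) = c^2 min(3t^2 - 3t^4 + t^6, 1), t = u/c.
    Bounded: any gross outlier contributes at most c^2."""
    t = np.minimum(np.abs(u) / c, 1.0)
    return c ** 2 * (3 * t ** 2 - 3 * t ** 4 + t ** 6)

# The squared loss at u = 100 would be 5000; the robust losses are far smaller.
print(float(rho_huber(100.0)), float(rho_tukey(100.0)))  # ≈ 133.6 and ≈ 21.9
```

Huber's loss grows only linearly in the tails (bounded derivative), while Tukey's loss is itself bounded, which is why its score function redescends.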

Formally, our objective function is given by

Υ(ν, m) = E ρ( (Y − ν − ∑_{j=1}^{d} mj(Xj)) / σ0 ) , (6)

where ρ : R → [0,∞) is even, ν ∈ R and the functions mj ∈ Hj, 1 ≤ j ≤ d. Let P be a distribution in R^{d+1} and let (Xt, Y )t ∼ P. Define the functional (µ(P), g(P)) as the solution of the following optimization problem:

(µ(P), g(P)) = argmin_{(ν,m) ∈ R × H_ad} Υ(ν, m) , (7)

where g(P)(X) = ∑_{j=1}^{d} gj(P)(Xj) ∈ H_ad.

To prove that the functional in (7) is Fisher-consistent and to derive first-order conditions for the point where it attains its minimum value, we will need the following assumptions:

E1 The random variable ε has a density function f0(t) that is even, non-increasing in |t|, and strictly decreasing for |t| in a neighbourhood of 0.

R1 The function ρ : R → [0,∞) is continuous, non-decreasing, ρ(0) = 0, and ρ(u) = ρ(−u). Moreover, if 0 ≤ u < v with ρ(v) < sup_t ρ(t), then ρ(u) < ρ(v).

A1 Given functions mj ∈ Hj, if P(∑_{j=1}^{d} mj(Xj) = 0) = 1 then, for all 1 ≤ j ≤ d, we have P(mj(Xj) = 0) = 1.

Remark 2.1. Assumption E1 is a standard condition needed to ensure Fisher-consistency of an M-location functional (see, e.g. Maronna et al., 2006). Assumption R1 is satisfied by the so-called family of "rho functions" in Maronna et al. (2006), which include many commonly used robust loss functions, such as those mentioned above. Since the loss function ρ can be chosen by the user, this assumption is not restrictive. Finally, assumption A1 allows us to write the functional g(P) in (7) uniquely as g(P) = ∑_{j=1}^{d} gj(P).

Assumption A1 appears to be the most restrictive and deserves some discussion. It is closely related to the identifiability of the additive model (4) and holds if the explanatory variables are independent from each other. Indeed, let us denote by (x, Xα) the vector with the α-th coordinate equal to x and the other ones equal to Xj, j ≠ α, and by m(x) = ∑_{j=1}^{d} mj(xj), for mj ∈ Hj. For any fixed 1 ≤ α ≤ d, the condition P(m(X) = 0) = 1 implies that, for almost every xα, P(m(xα, Xα) = 0 | Xα = xα) = 1. Using that the components of X are independent, we obtain that P(m(xα, Xα) = 0) = 1, which implies that ∫ m(xα, uα) dF_{Xα}(u) = 0. Note that since E mj(Xj) = 0 for all j, ∫ m(xα, uα) dF_{Xα}(u) = mα(xα) + ∫ ∑_{j≠α} mj(uj) dF_{Xα}(u) = mα(xα). Hence, mα(xα) = 0 for almost every xα, as desired. However, if the components of X are not independent, then P(m(xα, Xα) = 0 | Xα = xα) = 1 does not imply ∫ m(xα, uα) dF_{Xα}(u) = 0. This has already been observed by Hastie and Tibshirani (1990, page 107). The fact that H_ad is closed in H entails that, under mild assumptions, the minimum of E(Y − m(X))^2 over H_ad exists and is unique. However, the individual functions mj(xj) may not be uniquely determined, since the dependence among the covariates may lead to more than one representation for the same surface (see also Breiman and Friedman, 1985). In fact, condition A1 is analogous to assumption 5.1 of Breiman and Friedman (1985). It is also worth noticing that Stone (1985) gives conditions to ensure that A1 holds. Indeed, Lemma 1 in Stone (1985) implies Proposition 2.1 below, which gives weak conditions for the unique representation and hence, as shown in Theorem 2.1 below, for the Fisher-consistency of the functional g(P). Its proof is omitted since it follows straightforwardly.
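The failure of A1 under dependence is easy to exhibit numerically. In this deliberately extreme sketch (our own toy example, with X1 = X2 perfectly dependent), two non-trivial mean-zero components cancel exactly, so the additive representation of the zero surface is not unique:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 1000)
X1 = x
X2 = x                        # perfectly dependent covariates (concurvity)
m1 = X1 - X1.mean()           # E[m1(X1)] = 0, but m1 is not the zero function
m2 = -(X2 - X2.mean())        # E[m2(X2)] = 0
s = m1 + m2                   # identically zero on the data
print(np.max(np.abs(s)), np.std(m1))   # sum is 0 everywhere, yet m1 varies
```

With independent covariates this cannot happen: as the argument above shows, each component would have to vanish almost surely.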

Proposition 2.1. Assume that X has compact support S and that its density fX is bounded in S and such that inf_{x∈S} fX(x) > 0. Let Vj = mj(Xj) be random variables such that P(∑_{j=1}^{d} Vj = 0) = 1 and E(Vj) = 0; then P(Vj = 0) = 1.

The next theorem establishes the Fisher-consistency of the functional (µ(P), g(P)). In other words, it shows that the solution to the optimization problem (7) is given by the target quantities to be estimated under model (4).

Theorem 2.1. Assume that the random vector (Xt, Y )t ∈ R^{d+1} satisfies (4) and let P stand for its distribution.

a) If E1 and R1 hold, then Υ(ν, m) in (6) achieves its unique minimum over R × H_ad at (µ(P), g(P)) = (µ(P), ∑_{j=1}^{d} gj(P)) when µ(P) = µ0 and P( ∑_{j=1}^{d} gj(P)(Xj) = ∑_{j=1}^{d} g0,j(Xj) ) = 1.

b) If in addition A1 holds, the unique minimum (µ(P), g(P)) = (µ(P), ∑_{j=1}^{d} gj(P)) satisfies µ(P) = µ0 and P( gj(P)(Xj) = g0,j(Xj) ) = 1 for 1 ≤ j ≤ d.

It is worth noticing that a minimizer (µ(P), g(P)) of (7) always exists if ρ is a strictly convex function, even if E1 does not hold. If in addition A1 holds, the minimizer will have a unique representation.

For ν ∈ R, x = (x1, . . . , xd)t ∈ R^d and m = (m1, . . . , md)t ∈ H1 × H2 × · · · × Hd, let Γ(ν, m, x) = (Γ0(ν, m), Γ1(ν, m, x1), . . . , Γd(ν, m, xd))t, where

Γ0(ν, m) = E[ ψ( (Y − ν − ∑_{j=1}^{d} mj(Xj)) / σ0 ) ]

Γℓ(ν, m, xℓ) = E[ ψ( (Y − ν − ∑_{j=1}^{d} mj(Xj)) / σ0 ) | Xℓ = xℓ ] , 1 ≤ ℓ ≤ d . (8)

Our next theorem shows that it is possible to choose the solution g(P) of (7) so that its additive components gj = gj(P) satisfy first-order conditions which are generalizations of those corresponding to the classical case where ρ(u) = u^2.

Theorem 2.2. Let ρ be a differentiable function satisfying R1 and such that its derivative ρ′ = ψ is bounded and continuous. Let (Xt, Y )t ∼ P be a random vector such that (µ(P), g(P)) is a minimizer of Υ(ν, m) over R × H_ad, where µ(P) ∈ R and g(P) = ∑_{j=1}^{d} gj(P) ∈ H_ad, i.e., (µ(P), g(P)) is the solution of (7). Then, (µ(P), g(P)) satisfies the system of equations Γ(ν, m, x) = 0 almost surely PX.

Remark 2.2. a) It is also worth mentioning that if (Xt, Y )t satisfies (4) with the errors satisfying E1, then Γ(µ0, g0, x) = 0. Moreover, if the model is heteroscedastic,

Y = g0(X) + σ0(X) ε = µ0 + ∑_{j=1}^{d} g0,j(Xj) + σ0(X) ε , (9)

where the errors ε are symmetrically distributed and the score function ψ is odd, then (µ0, g0) satisfies

E[ ψ( (Y − µ0 − ∑_{j=1}^{d} g0,j(Xj)) / σ0(X) ) ] = 0 ,

E[ (1/σ0(X)) ψ( (Y − µ0 − ∑_{j≠ℓ} g0,j(Xj) − g0,ℓ(Xℓ)) / σ0(X) ) | Xℓ ] = 0 , 1 ≤ ℓ ≤ d ,

which provides a way to extend the robust backfitting algorithm to heteroscedastic models.

b) Assume now that missing responses can arise in the sample, that is, we have a sample (Xti, Yi, δi)t, 1 ≤ i ≤ n, where δi = 1 if Yi is observed and δi = 0 if Yi is missing, and (Xti, Yi)t satisfy an additive heteroscedastic model (9). Let (Xt, Y, δ)t be a random vector with the same distribution as (Xti, Yi, δi)t. Moreover, assume that responses may be missing at random (mar), i.e., P(δ = 1 | (X, Y)) = P(δ = 1 | X) = p(X). Define (µ(P), g(P)) = argmin_{(ν,m) ∈ R × H_ad} Υδ(ν, m), where

Υδ(ν, m) = E[ δ ρ( (Y − ν − ∑_{j=1}^{d} mj(Xj)) / σ0(X) ) ] = E[ p(X) ρ( (Y − ν − ∑_{j=1}^{d} mj(Xj)) / σ0(X) ) ] .

Arguments analogous to those considered in the proof of Theorem 2.1 allow us to show that, if E1 and R1 hold, Υδ(ν, m) achieves its unique minimum at (ν, m) ∈ R × H where ν = µ0 and P(m(X) = ∑_{j=1}^{d} g0,j(Xj)) = 1. Besides, if in addition A1 holds, the unique minimum satisfies µ(P) = µ0 and P(gj(P)(Xj) = g0,j(Xj)) = 1; that is, the functional is Fisher-consistent.

On the other hand, the proof of Theorem 2.2 can also be generalized to the case of a homoscedastic additive model (4) with missing responses. Effectively, when inf_x p(x) > 0, using the mar assumption, it is possible to show that there exists a unique measurable solution g̃ℓ(x) of λℓ,δ(x, a) = 0, where

λℓ,δ(x, a) = E{ p(X) ψ( (Y − µ(P) − ∑_{j≠ℓ} gj(P)(Xj) − a) / σ0 ) | Xℓ = x } .

More precisely, let Γδ(ν, m, x) = (Γ0,δ(ν, m), Γ1,δ(ν, m, x1), . . . , Γd,δ(ν, m, xd))t with m = (m1, . . . , md)t and

Γ0,δ(ν, m) = E[ p(X) ψ( (Y − ν − ∑_{j=1}^{d} mj(Xj)) / σ0 ) ]

Γℓ,δ(ν, m, xℓ) = E[ p(X) ψ( (Y − ν − ∑_{j≠ℓ} mj(Xj) − mℓ(Xℓ)) / σ0 ) | Xℓ = xℓ ] , 1 ≤ ℓ ≤ d .

Arguments similar to those considered in the proof of Theorem 2.2 allow us to show that if there exists a unique minimizer (µ(P), g(P)) ∈ R × H_ad of Υδ(ν, m), then (µ(P), g(P)) is a solution of Γδ(ν, m, x) = 0. Note that instead of a simplified approach, a propensity score approach can also be considered, taking δ/p(X) instead of δ. In this case, Υδ(ν, m) = Υ(ν, m) defined in (6) and Γℓ,δ = Γℓ defined in (8). The propensity approach is useful when preliminary estimates of the missing probability are available; otherwise, the simplified approach is preferred.

2.1 The population version of the robust backfitting algorithm

In this section, we derive an algorithm to solve (7) and study its convergence. For simplicity, we will assume that the vector (Xt, Y )t is completely observed and that it satisfies (4). By Theorem 2.2, the robust functional (µ(P), g(P)) satisfies (8). To simplify the notation, in what follows we will write µ = µ(P) and gj = gj(P), 1 ≤ j ≤ d, and ∑_{s=ℓ}^{m} as will be understood as 0 if m < ℓ. The robust backfitting algorithm is given in Algorithm 1.

Algorithm 1 Population version of the robust backfitting

1: Let ℓ = 0 and g^(0) = (g^(0)_1, . . . , g^(0)_d)t be an initial set of additive components (for example, g^(0) = 0) and µ^(0) an initial location parameter.
2: repeat
3:   ℓ ← ℓ + 1
4:   for j = 1 to d do
5:     Let R^(ℓ)_j = Y − µ^(ℓ−1) − ∑_{s=1}^{j−1} g̃^(ℓ)_s(Xs) − ∑_{s=j+1}^{d} g^(ℓ−1)_s(Xs)
6:     Let g̃^(ℓ)_j solve E[ ψ( (R^(ℓ)_j − g̃^(ℓ)_j(Xj)) / σ0 ) | Xj = x ] = 0 a.s.
7:   end for
8:   for j = 1 to d do
9:     g^(ℓ)_j = g̃^(ℓ)_j − E[g̃^(ℓ)_j(Xj)]
10:  end for
11:  Let µ^(ℓ) solve E[ ψ( (Y − µ^(ℓ) − ∑_{j=1}^{d} g^(ℓ)_j(Xj)) / σ0 ) ] = 0
12: until convergence

Our next theorem shows that each step ℓ of the algorithm above reduces the objective function Υ(µ^(ℓ), g^(ℓ)).

Theorem 2.3. Let ρ be a differentiable function satisfying R1 and such that its derivative ρ′ = ψ is a strictly increasing, bounded and continuous function with lim_{t→+∞} ψ(t) > 0 and lim_{t→−∞} ψ(t) < 0. Let (µ^(ℓ), g^(ℓ))_{ℓ≥1} = (µ^(ℓ), g^(ℓ)_1, . . . , g^(ℓ)_d)_{ℓ≥1} be the sequence obtained with Algorithm 1. Then, {Υ(µ^(ℓ), g^(ℓ))}_{ℓ≥1} is a decreasing sequence, so that the algorithm converges.

3 The sample version of the robust backfitting algorithm

In practice, given a random sample (Xti, Yi)t, 1 ≤ i ≤ n, from the additive model (4), we apply Algorithm 1 replacing the unknown conditional expectations with univariate robust smoothers. Different smoothers can be considered, including splines, kernel weights or even nearest neighbours with kernel weights. In what follows we describe the algorithm for kernel polynomial M-estimators.

Let K : R → R be a kernel function and let Kh(t) = (1/h) K(t/h). The estimators of the solutions of (8) using kernel M-polynomial estimators of order q ≥ 0 are given by the solution to the following system of equations:

(1/n) ∑_{i=1}^{n} ψ( (Yi − µ̂ − ∑_{j=1}^{d} ĝj(Xi,j)) / σ̂0 ) = 0

(1/n) ∑_{i=1}^{n} Khj(Xi,j − xj) ψ( (Yi − µ̂ − ∑_{ℓ≠j} ĝℓ(Xi,ℓ) − ∑_{s=0}^{q} βs,j Zi,j,s) / σ̂0 ) Zi,j(xj) = 0 , 1 ≤ j ≤ d ,

where Zi,j(xj) = (Zi,j,0, Zi,j,1, . . . , Zi,j,q)t with Zi,j,s = (Xi,j − xj)^s, 0 ≤ s ≤ q. Then, we have ĝj(xj) = β0,j, 1 ≤ j ≤ d. The corresponding algorithm is described in detail in Algorithm 2. The same procedure can be applied to the complete sample when responses are missing.
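For q = 0, the second estimating equation above reduces to a kernel-weighted M-location problem at each point, which is commonly solved by iteratively reweighted least squares (IRLS). The following is our own minimal Python sketch of this step (the authors' implementation is in R; function names and the IRLS solver are our illustrative choices), using Huber's ψ and the Epanechnikov kernel:

```python
import numpy as np

def psi_huber(u, c=1.345):
    """Huber score function: linear in the centre, constant in the tails."""
    return np.clip(u, -c, c)

def local_constant_m(x0, x, r, h, sigma, n_iter=100):
    """Local constant (q = 0) kernel M-estimate at x0: solve
    sum_i K_h(x_i - x0) psi((r_i - b) / sigma) = 0 for b via IRLS."""
    kw = np.maximum(0.75 * (1.0 - ((x - x0) / h) ** 2), 0.0)  # Epanechnikov
    b = np.median(r[kw > 0])                 # robust starting value
    for _ in range(n_iter):
        u = (r - b) / sigma
        safe = np.where(np.abs(u) < 1e-10, 1.0, u)
        w = kw * np.where(np.abs(u) < 1e-10, 1.0, psi_huber(u) / safe)
        b_new = np.sum(w * r) / np.sum(w)    # weighted mean update
        if abs(b_new - b) < 1e-10:
            b = b_new
            break
        b = b_new
    return b

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 300)
r = 2 * np.pi * np.sin(np.pi * x) + 0.5 * rng.normal(size=300)
r[:30] = 15.0                               # 10% vertical outliers
b = local_constant_m(0.5, x, r, 0.1, 0.5)
print(b)                                    # close to 2*pi despite the outliers
```

Each IRLS step is a weighted least-squares fit with weights ψ(u)/u, which for a monotone ψ converges to the root of the estimating equation.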

Remark 3.1. A possible choice of the preliminary scale estimator σ̂0 is obtained by calculating the mad of the residuals obtained with a simple and robust nonparametric regression estimator, such as local medians. In that case we have σ̂0 = mad_{1≤i≤n} { Yi − Ŷi }, where Ŷi = median_{1≤j≤n} { Yj : |Xj,k − Xi,k| ≤ hk, ∀ 1 ≤ k ≤ d }. The bandwidths hk are preliminary values to be selected, or alternatively they can be chosen as the distance between Xi,k and its r-th nearest neighbour among {Xj,k}_{j≠i}.
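A direct Python sketch of this preliminary scale estimator follows (our own illustrative code; we use the usual normal-consistency factor 1.4826 for the MAD, which Remark 3.1 does not spell out):

```python
import numpy as np

def mad(u):
    """Median absolute deviation, scaled for consistency at the normal
    (the 1.4826 factor is an assumption; the text just says 'mad')."""
    return 1.4826 * np.median(np.abs(u - np.median(u)))

def sigma_hat(X, y, h):
    """Preliminary scale as in Remark 3.1: MAD of residuals from a
    local-median fit over cube neighbourhoods |X_jk - X_ik| <= h_k."""
    n, d = X.shape
    yhat = np.empty(n)
    for i in range(n):
        mask = np.all(np.abs(X - X[i]) <= h, axis=1)
        yhat[i] = np.median(y[mask])
    return mad(y - yhat)

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(400, 2))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.5 * rng.normal(size=400)
s = sigma_hat(X, y, np.array([0.15, 0.15]))
print(s)   # roughly the true error scale 0.5, inflated a little by smoothing bias
```

Because both the local fit and the dispersion measure are medians, a moderate fraction of outlying responses would leave σ̂0 essentially unchanged.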

4 Numerical studies

This section contains the results of a simulation study designed to compare the behaviour of our proposed estimator with that of the classical one. We generated N = 500 samples of size n = 500 for models with d = 2 and d = 4 components. We considered samples without outliers and also samples contaminated in different ways. We also included in our

Algorithm 2 The sample version of the robust backfitting

1: Let ℓ = 0 and ĝ^(0) = (ĝ^(0)_1, . . . , ĝ^(0)_d)t be an initial set of additive components (for example, ĝ^(0) = 0), and let σ̂0 be a robust residual scale estimator. Moreover, let µ̂^(0) be an initial location estimator, such as an M-location estimator of the responses.
2: repeat
3:   ℓ ← ℓ + 1
4:   for j = 1 to d do
5:     for i0 = 1 to n do
6:       Let xj = Xi0,j
7:       for i = 1 to n do
8:         Let Zi,j(xj) = (1, (Xi,j − xj), (Xi,j − xj)^2, . . . , (Xi,j − xj)^q)t and R̂^(ℓ)_{i,j} = Yi − µ̂^(ℓ−1) − ∑_{s=1}^{j−1} g̃^(ℓ)_s(Xi,s) − ∑_{s=j+1}^{d} ĝ^(ℓ−1)_s(Xi,s)
9:       end for
10:      Let β̂j(xj) = (β̂0j(xj), β̂1j(xj), . . . , β̂qj(xj))t be the solution to
         (1/n) ∑_{i=1}^{n} Kh(Xi,j − xj) ψ( (R̂^(ℓ)_{i,j} − β̂j(xj)t Zi,j(xj)) / σ̂0 ) Zi,j(xj) = 0 .
11:      Let g̃^(ℓ)_j(xj) = β̂0j(xj)
12:    end for
13:  end for
14:  for j = 1 to d do
15:    ĝ^(ℓ)_j = g̃^(ℓ)_j − ∑_{i=1}^{n} g̃^(ℓ)_j(Xi,j)/n
16:  end for
17:  Let µ̂^(ℓ) solve (1/n) ∑_{i=1}^{n} ψ( (Yi − µ̂^(ℓ) − ∑_{j=1}^{d} ĝ^(ℓ)_j(Xi,j)) / σ̂0 ) = 0
18: until convergence

experiment cases where the response variable may be missing, as described in Remark 2.2. All computations were carried out using an R implementation of our algorithm, publicly available on-line at http://www.stat.ubc.ca/~matias/soft.html.

To generate missing responses, we first generated observations (Xti, Yi)t satisfying the additive model Y = g0(X) + u = µ0 + ∑_{j=1}^{d} g0,j(Xj) + u, where u = σ0 ε. Then, we generated {δi}_{i=1}^{n} independent Bernoulli random variables such that P(δi = 1 | Yi, Xi) = P(δi = 1 | Xi) = p(Xi), where we used a different function p for each value of d. When d = 2, we used p(x) = p2(x) = 0.4 + 0.5 (cos(x1 + 0.2))^2, which yields around 31.5% of missing responses. For d = 4, we set p(x) = p4(x) = 0.4 + 0.5 (cos(x1 x3 + 0.2))^2, which produces approximately 33% of missing Yi's. In addition, we also consider the case where all responses are observed, i.e., p(x) ≡ 1.

We compared the following estimators:

• The classical backfitting estimator adapted to missing responses, denoted ĝbc.

• A robust backfitting estimator ĝbr,h using Huber's loss function with tuning constant c = 1.345. This loss function is such that ρ′c(u) = ψc(u) = min(c, max(−c, u)).

• A robust backfitting estimator ĝbr,t using Tukey's bisquare loss function with tuning constant c = 4.685. This loss function satisfies ρ′c(u) = ψc(u) = u (1 − (u/c)^2)^2 I_{[−c,c]}(u).

The univariate smoothers were computed using the Epanechnikov kernel K(u) = 0.75 (1 − u^2) I_{[−1,1]}(u). We used both local constants and local linear polynomials, which correspond to q = 0 and q = 1 in Algorithm 2, denoted in all tables and figures with a subscript of 0 and 1, respectively.
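The kernel and the redescending Tukey score just described are short one-liners; this small Python check (our own, purely illustrative) confirms that the Epanechnikov kernel integrates to one and that gross outliers receive a zero score under Tukey's ψ:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def psi_tukey(u, c=4.685):
    """Tukey bisquare score psi_c(u) = u (1 - (u/c)^2)^2 on [-c, c];
    it redescends to zero, so gross outliers get zero weight."""
    return np.where(np.abs(u) <= c, u * (1 - (u / c) ** 2) ** 2, 0.0)

u = np.linspace(-1, 1, 200_001)
area = epanechnikov(u).sum() * (u[1] - u[0])   # Riemann sum of the kernel
print(area, float(psi_tukey(100.0)))           # ≈ 1.0 and 0.0
```

The contrast with Huber's ψ, which stays at ±c in the tails, explains the behaviour seen later under heavy contamination: Tukey-based fits discard gross outliers entirely.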

The performance of each estimator ĝj of g0,j, 1 ≤ j ≤ d, was measured through the following approximated integrated squared error (ise):

ise(ĝj) = (1 / ∑_{i=1}^{n} δi) ∑_{i=1}^{n} ( g0,j(Xij) − ĝj(Xij) )^2 δi ,

where Xij is the j-th component of Xi, and δi = 0 if the i-th response was missing and δi = 1 otherwise. An approximation of the mean integrated squared error (mise) was obtained by averaging the ise above over all replications. A similar measure was used to compare the estimators of the regression function g0 = µ0 + ∑_{j=1}^{d} g0,j.

4.1 Monte Carlo study with d = 2 additive components

In this case, the covariates were generated from a uniform distribution on the unit square, Xi = (Xi,1, Xi,2)t ∼ U([0, 1]^2), the error scale was σ0 = 0.5 and the overall location µ0 = 0. The additive components were chosen to be

g0,1(x1) = 24 (x1 − 0.5)^2 − 2 , g0,2(x2) = 2π sin(π x2) − 4 . (10)

Optimal bandwidths with respect to the mise can be computed assuming that the other components in the model are known (see, for instance, Härdle et al., 2004). Since the explanatory variables are uniformly distributed, the optimal bandwidths for the local constant (q = 0) and local linear (q = 1) estimators are the same and very close to our choice of h = (0.10, 0.10).
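This simulation design is easy to reproduce; the sketch below (our own Python code; the paper's experiments use R) generates one sample from (10), including mar responses via the function p2 of Section 4, and verifies empirically that both components are centred, as the identifiability constraint requires:

```python
import numpy as np

def g01(x):
    """First additive component in (10); E[g01(X1)] = 0 for X1 ~ U(0, 1)."""
    return 24 * (x - 0.5) ** 2 - 2

def g02(x):
    """Second additive component in (10); E[g02(X2)] = 0 for X2 ~ U(0, 1)."""
    return 2 * np.pi * np.sin(np.pi * x) - 4

rng = np.random.default_rng(6)
n, sigma0, mu0 = 500, 0.5, 0.0
X = rng.uniform(0, 1, size=(n, 2))
y = mu0 + g01(X[:, 0]) + g02(X[:, 1]) + sigma0 * rng.normal(size=n)
# Missing responses via p2 from Section 4 (around 31.5% missing):
p = 0.4 + 0.5 * np.cos(X[:, 0] + 0.2) ** 2
delta = rng.binomial(1, p)
c1, c2, miss = g01(X[:, 0]).mean(), g02(X[:, 1]).mean(), 1 - delta.mean()
print(c1, c2, miss)   # both means near 0; missing rate near 0.315
```

Analytically, E[24(X1 − 0.5)^2 − 2] = 24/12 − 2 = 0 and E[2π sin(πX2) − 4] = 2π · (2/π) − 4 = 0, matching the sample checks.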

For the errors, we considered the following settings:

• C0: ui ∼ N(0, σ0^2).

• C1: ui ∼ (1 − 0.15) N(0, σ0^2) + 0.15 N(15, 0.01).

• C2: ui ∼ N(15, 0.01) for all i's such that Xi ∈ D0.3, where Dη = [0.2, 0.2 + η]^2.

• C3: ui ∼ N(10, 0.01) for all i's such that Xi ∈ D0.09, where Dη is as above.

• C4: ui ∼ (1 − 0.30) N(0, σ0^2) + 0.30 N(15, 0.01) for all i's such that Xi ∈ D0.3.

Case C0 corresponds to samples without outliers, and it will illustrate the loss of efficiency incurred by using a robust estimator when it may not be needed. The contamination setting C1 corresponds to a gross-error model where all observations have the same chance of being contaminated. On the other hand, case C2 is highly pathological in the sense that all observations with covariates in the square [0.2, 0.5] × [0.2, 0.5] are severely affected, while C3 is similar but in the region [0.2, 0.29] × [0.2, 0.29]. The difference between C2 and C3 is that the bandwidths we used (h = 0.10) are wider than the contaminated region in C3. Finally, case C4 is a gross-error model with a higher probability of observing an outlier, but these are restricted to the square [0.2, 0.5] × [0.2, 0.5]. Figure 1 illustrates these contamination scenarios on one randomly generated sample. The panels correspond to settings C2, C3 and C4, with solid triangles indicating contaminated cases.
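As a concrete reading of the mixture notation, the gross-error setting C1 can be simulated as follows (our own sketch; note that N(15, 0.01) denotes mean 15 and variance 0.01, i.e. standard deviation 0.1):

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma0 = 500, 0.5
# C1: each error is N(15, 0.01) with probability 0.15, else N(0, sigma0^2)
contaminated = rng.uniform(size=n) < 0.15
u = np.where(contaminated,
             rng.normal(15.0, 0.1, size=n),    # sd 0.1, i.e. variance 0.01
             rng.normal(0.0, sigma0, size=n))
rate, centre = contaminated.mean(), u[contaminated].mean()
print(rate, centre)   # about 0.15 contaminated, tightly clustered near 15
```

Because the outliers sit in a tight cluster 30 error standard deviations above zero, they devastate least-squares fits while remaining easy for a bounded ψ to discount.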

(a) C2  (b) C3  (c) C4

Figure 1: Scatter plots of the covariates (x1, x2)t, with solid triangles indicating observations with contaminated response variables, for contamination settings C2, C3 and C4. The square regions indicate the sets Dη for each scenario.

    Tables 1 to 3 report the obtained values of the mise when estimating the regression function g0 = µ0 + g0,1 + g0,2 and each additive component g0,1 and g0,2, respectively. Estimates obtained using local constant (q = 0) and local linear smoothers (q = 1) are indicated with a subscript "0" and "1", respectively. To complement the results reported in Tables 1 to 3, Tables 4 to 6 report the ratio between the mean integrated squared error (mise) obtained under each contamination setting and under clean data, denoted mise(Cj)/mise(C0).
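    The summaries in these tables can be reproduced mechanically once the integrated squared error (ise) of each replication is available. The sketch below uses synthetic placeholder ise values; only the definitions of mise and the reported ratio are taken from the text.

    ```python
    import numpy as np

    def ise(g_hat, g_true, a=0.0, b=1.0):
        """Integrated squared error of one fit, approximated on an evaluation grid."""
        return np.mean((g_hat - g_true) ** 2) * (b - a)

    # mise is the average of ise over Monte Carlo replications; the ratio tables
    # report mise(Cj)/mise(C0). The arrays below are illustrative placeholders.
    ise_by_setting = {"C0": np.array([0.010, 0.012, 0.008]),
                      "C1": np.array([0.300, 0.250, 0.350])}
    mise = {c: v.mean() for c, v in ise_by_setting.items()}
    ratio_C1 = mise["C1"] / mise["C0"]      # mise(C1)/mise(C0)
    ```
    
    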

    Consider first the situation where there are no missing responses. As expected, whenthe data do not contain outliers the robust estimators are slightly less efficient than the


                      p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
          ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1      ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1
    C0    0.0704   0.0703   0.0718   0.0077   0.0079   0.0079       0.0741   0.0756   0.0790   0.0110   0.0113   0.0113
    C1    5.8437   0.1306   0.0730   5.9026   0.0620   0.0091       6.1657   0.1457   0.0800   6.3125   0.0897   0.0272
    C2    8.5823   0.2470   0.0753   8.5218   0.1471   0.0086       10.0841  0.3550   0.0837   10.0100  0.3040   0.0396
    C3    0.1560   0.0722   0.0719   0.0930   0.0090   0.0080       0.1882   0.0786   0.0790   0.1224   0.0127   0.0113
    C4    0.9159   0.0764   0.0718   0.8523   0.0122   0.0080       1.1017   0.0840   0.0782   1.0307   0.0169   0.0115

    Table 1: mise of the estimators of the regression function g0 = µ0 + g0,1 + g0,2 under different contaminations and missing mechanisms.

                      p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
          ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1      ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1
    C0    0.0445   0.0445     0.0454     0.0096   0.0097     0.0097         0.0493   0.0502     0.0521     0.0138   0.0140     0.0140
    C1    0.3638   0.0503     0.0459     0.3903   0.0152     0.0103         0.5176   0.0592     0.0523     0.5904   0.0334     0.0271
    C2    3.5490   0.1348     0.0473     3.3751   0.0665     0.0101         3.7935   0.1641     0.0547     3.5914   0.0942     0.0176
    C3    0.0935   0.0468     0.0454     0.0490   0.0103     0.0097         0.1084   0.0532     0.0521     0.0599   0.0147     0.0140
    C4    0.4351   0.0514     0.0453     0.3547   0.0119     0.0098         0.4910   0.0587     0.0517     0.3991   0.0166     0.0141

    Table 2: mise of the estimators of the additive component g0,1 under different contaminations and missing mechanisms.

                      p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
          ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1      ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1
    C0    0.0405   0.0406     0.0415     0.0106   0.0107     0.0107         0.0465   0.0476     0.0497     0.0157   0.0159     0.0159
    C1    0.3559   0.0494     0.0418     0.3907   0.0160     0.0113         0.4945   0.0599     0.0498     0.5731   0.0273     0.0184
    C2    3.1891   0.0932     0.0434     3.3314   0.0633     0.0110         4.0168   0.1572     0.0520     4.1969   0.1642     0.0385
    C3    0.0683   0.0403     0.0416     0.0480   0.0111     0.0107         0.0893   0.0476     0.0496     0.0695   0.0165     0.0159
    C4    0.3237   0.0391     0.0415     0.3415   0.0119     0.0108         0.4195   0.0465     0.0492     0.4433   0.0177     0.0160

    Table 3: mise of the estimators of the additive component g0,2 under different contaminations and missing mechanisms.


                            p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
                    ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1     ĝbr,h,1  ĝbr,t,1      ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1
    mise(C1)/mise(C0)  82.9580  1.8577   1.0161   766.6964  7.8275   1.1499        83.1557  1.9285   1.0124   575.4987 7.9530   2.4127
    mise(C2)/mise(C0)  121.8359 3.5141   1.0492   1106.9072 18.5640  1.0907        136.0025 4.6984   1.0594   912.5855 26.9607  3.5103
    mise(C3)/mise(C0)  2.2147   1.0277   1.0015   12.0853   1.1315   1.0051        2.5383   1.0407   0.9996   11.1610  1.1286   1.0060
    mise(C4)/mise(C0)  13.0019  1.0869   0.9997   110.7086  1.5411   1.0138        14.8589  1.1118   0.9898   93.9705  1.4974   1.0160

    Table 4: Ratio between the mise of the estimators of the regression function g0 = µ0 + g0,1 + g0,2 under the considered contaminations and under clean data for the two missing mechanisms.

                            p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
                    ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1   ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1
    mise(C1)/mise(C0)  8.1696   1.1298     1.0113     40.5953  1.5601     1.0552       10.5007  1.1803     1.0044     42.6524  2.3876     1.9396
    mise(C2)/mise(C0)  79.6889  3.0283     1.0427     351.0462 6.8407     1.0395       76.9559  3.2722     1.0495     259.4529 6.7405     1.2555
    mise(C3)/mise(C0)  2.0994   1.0500     1.0020     5.0949   1.0586     1.0024       2.1986   1.0609     1.0008     4.3251   1.0529     1.0031
    mise(C4)/mise(C0)  9.7694   1.1541     0.9993     36.8950  1.2287     1.0063       9.9606   1.1701     0.9934     28.8339  1.1895     1.0073

    Table 5: Ratio between the mise of the estimators of the additive component g0,1 under the considered contaminations and under clean data for the two missing mechanisms.

                            p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
                    ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1   ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1
    mise(C1)/mise(C0)  8.7994   1.2173     1.0062     36.7398  1.4951     1.0528       10.6429  1.2567     1.0022     36.4372  1.7173     1.1599
    mise(C2)/mise(C0)  78.8384  2.2953     1.0454     313.3083 5.8990     1.0285       86.4525  3.2998     1.0461     266.8154 10.3392    2.4245
    mise(C3)/mise(C0)  1.6873   0.9925     1.0010     4.5158   1.0365     1.0020       1.9229   0.9997     0.9985     4.4195   1.0386     1.0027
    mise(C4)/mise(C0)  8.0022   0.9615     1.0000     32.1152  1.1092     1.0054       9.0289   0.9769     0.9896     28.1834  1.1154     1.0063

    Table 6: Ratio between the mise of the estimators of the additive component g0,2 under the considered contaminations and under clean data for the two missing mechanisms.


    least squares one, although the differences in the estimated mise's are well within the Monte Carlo margin of error. For the contamination cases C1 and C2, when using either local constant or local linear smoothers, the mise of the classical estimator for g0 is notably larger than those of the robust estimators (more than 40 times larger). This difference is smaller when estimating each component g0,1 and g0,2, but remains fairly large nonetheless. In general, when the data contain outliers, we note that the robust estimators give noticeably better regression estimates (both for g0 and its components) than the classical one. The estimator based on Tukey's bisquare function is very stable across all the contamination cases considered here, while for the Huber one there is a slight increase in mise for the estimated additive components under the contamination setting C1 and a larger one under C2. The advantage of the estimators based on Tukey's loss function over those based on the Huber one becomes more apparent when one inspects the ratio between the mise's obtained with and without outliers given in Tables 4 to 6.

    It is worth noting that, for clean data, local linear smoothers achieve much smaller mise values than local constants. This may be explained by the well-known bias-reducing property of local polynomials, particularly near the boundary. This behaviour can also be observed across contamination settings for the robust estimators.

    Interestingly, the results of our experiments with missing responses yield the same overall conclusions. The estimators based on the subset of complete data remain Fisher-consistent, but, as one would expect, they all yield larger mise's due to the smaller sample sizes used to compute them. Moreover, the estimators based on Tukey's loss function are still preferable, although they are slightly less efficient than those based on the Huber loss when q = 0. When comparing the behaviour of the robust local constant and local linear estimators one notices that, as above, local linear estimators have smaller mise's. However, the ratios of mise reported in Tables 4 to 6 show that under C2 the mise's of the local linear estimators of g0 with missing responses are more than 4 times larger than those obtained with local constants. This effect is mainly due to the poor estimation of g0,2 and of the location parameter µ0, as suggested by the reported results.

    We also looked at the median ise across the simulation replications to determine whether the differences in the observed average ise's are representative of most samples or were influenced by poor fits obtained in a few bad samples. Tables 7 to 9 report the median over replications of the ise, denoted Medise, for the estimators of the regression function g0 and each additive component when d = 2. In addition, Tables 10 to 12 report the ratio between the median integrated squared error (Medise) obtained under each contamination setting and under clean data, denoted Medise(Cj)/Medise(C0). The median ise results show that the robust estimators behave similarly when p(x) ≡ 1 and when responses may be missing. Furthermore, the performances of Tukey's local linear and local constant smoothers are equally good.
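    The point of comparing Medise with mise is that the median is insensitive to a handful of badly fitted replications. A small sketch, with synthetic placeholder ise values, shows how a single poor fit inflates the mean but leaves the median unchanged:

    ```python
    import numpy as np

    def medise(ise_values):
        """Median across replications of the integrated squared error."""
        return np.median(np.asarray(ise_values))

    # 99 typical replications plus one badly fitted sample (values illustrative)
    ise_values = np.array([0.01] * 99 + [5.0])
    mean_ise = ise_values.mean()      # pulled up by the single bad replication
    med_ise = medise(ise_values)      # stays at the typical value 0.01
    ```

    Large mise/Medise discrepancies therefore flag results driven by a few bad samples rather than by typical behaviour.
    
    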

    Based on the above results we recommend using local linear smoothers computed withTukey’s loss function, although in some situations local constants may give better results,


                      p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
          ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1      ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1
    C0    0.0700   0.0700   0.0714   0.0073   0.0074   0.0074       0.0733   0.0752   0.0784   0.0105   0.0108   0.0107
    C1    5.7681   0.1256   0.0713   5.8235   0.0587   0.0086       6.0524   0.1379   0.0793   6.1475   0.0676   0.0126
    C2    8.4718   0.2370   0.0747   8.4056   0.1333   0.0082       9.7880   0.2982   0.0825   9.7619   0.1794   0.0118
    C3    0.1375   0.0717   0.0710   0.0753   0.0083   0.0075       0.1535   0.0783   0.0781   0.0892   0.0122   0.0108
    C4    0.8481   0.0762   0.0712   0.7852   0.0113   0.0075       0.9826   0.0827   0.0776   0.9236   0.0157   0.0110

    Table 7: Medise of the estimators of the regression function g0 under different contaminations and missing mechanisms.

                      p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
          ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1      ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1
    C0    0.0430   0.0433     0.0440     0.0068   0.0069     0.0070         0.0465   0.0478     0.0501     0.0099   0.0099     0.0099
    C1    0.3296   0.0489     0.0443     0.3597   0.0130     0.0073         0.4574   0.0557     0.0498     0.5377   0.0194     0.0109
    C2    3.5188   0.1292     0.0453     3.3350   0.0613     0.0073         3.7000   0.1526     0.0523     3.4773   0.0750     0.0108
    C3    0.0865   0.0452     0.0441     0.0406   0.0078     0.0070         0.0981   0.0510     0.0502     0.0498   0.0109     0.0099
    C4    0.4121   0.0500     0.0436     0.3325   0.0096     0.0070         0.4404   0.0567     0.0496     0.3520   0.0131     0.0100

    Table 8: Medise of the estimators of the additive component g0,1 under different contaminations and missing mechanisms.

                      p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
          ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1      ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1
    C0    0.0382   0.0383     0.0391     0.0071   0.0072     0.0072         0.0430   0.0443     0.0462     0.0102   0.0103     0.0103
    C1    0.3180   0.0465     0.0398     0.3517   0.0130     0.0078         0.4503   0.0563     0.0463     0.5036   0.0193     0.0114
    C2    3.1744   0.0874     0.0409     3.3084   0.0575     0.0077         3.9785   0.1258     0.0489     4.1319   0.0884     0.0113
    C3    0.0620   0.0381     0.0387     0.0409   0.0077     0.0072         0.0767   0.0437     0.0463     0.0588   0.0111     0.0104
    C4    0.2993   0.0369     0.0391     0.3171   0.0086     0.0073         0.3722   0.0434     0.0461     0.3955   0.0122     0.0105

    Table 9: Medise of the estimators of the additive component g0,2 under different contaminations and missing mechanisms.


                                p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
                        ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1     ĝbr,h,1  ĝbr,t,1      ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1
    Medise(C1)/Medise(C0)  82.3814  1.7957   0.9991   802.7378  7.9218   1.1601        82.6104  1.8354   1.0113   584.4446 6.2633   1.1736
    Medise(C2)/Medise(C0)  120.9971 3.3876   1.0457   1158.6577 17.9962  1.1054        133.5979 3.9676   1.0525   928.0673 16.6135  1.0978
    Medise(C3)/Medise(C0)  1.9638   1.0254   0.9951   10.3761   1.1264   1.0095        2.0946   1.0416   0.9961   8.4822   1.1289   1.0078
    Medise(C4)/Medise(C0)  12.1124  1.0899   0.9971   108.2387  1.5199   1.0123        13.4123  1.1003   0.9906   87.8085  1.4519   1.0276

    Table 10: Ratio between the Medise of the estimators of the regression function g0 under the considered contaminations and under clean data for the two missing mechanisms.

                                p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
                        ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1   ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1
    Medise(C1)/Medise(C0)  7.6679   1.1314     1.0061     52.8814  1.8807     1.0462       9.8420   1.1645     0.9930     54.2267  1.9519     1.0976
    Medise(C2)/Medise(C0)  81.8740  2.9874     1.0294     490.3515 8.8710     1.0445       79.6110  3.1894     1.0435     350.6805 7.5627     1.0876
    Medise(C3)/Medise(C0)  2.0131   1.0458     1.0019     5.9736   1.1294     1.0016       2.1097   1.0655     1.0019     5.0231   1.0977     0.9964
    Medise(C4)/Medise(C0)  9.5897   1.1565     0.9911     48.8814  1.3842     1.0081       9.4753   1.1843     0.9894     35.5014  1.3224     1.0083

    Table 11: Ratio between the Medise of the estimators of the additive component g0,1 under the considered contaminations and under clean data for the two missing mechanisms.

                                p(x) ≡ 1                                                 p2(x) = 0.4 + 0.5 cos²(x1 + 0.2)
                        ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1   ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1
    Medise(C1)/Medise(C0)  8.3252   1.2142     1.0177     49.2713  1.8064     1.0717       10.4835  1.2699     1.0020     49.1453  1.8770     1.1082
    Medise(C2)/Medise(C0)  83.1079  2.2814     1.0453     463.4817 8.0080     1.0595       92.6170  2.8382     1.0571     403.2129 8.6146     1.0931
    Medise(C3)/Medise(C0)  1.6226   0.9934     0.9911     5.7296   1.0781     0.9914       1.7847   0.9856     1.0022     5.7381   1.0852     1.0044
    Medise(C4)/Medise(C0)  7.8349   0.9617     0.9990     44.4295  1.1974     1.0120       8.6646   0.9798     0.9963     38.5909  1.1879     1.0214

    Table 12: Ratio between the Medise of the estimators of the additive component g0,2 under the considered contaminations and under clean data for the two missing mechanisms.


    in particular when responses are missing and some neighbourhoods contain too few observations.

    4.2 Monte Carlo study with d = 4 additive components

    For this model we generated covariates Xi = (Xi1, Xi2, Xi3, Xi4) ∼ U([−3, 3]⁴), independent errors εi ∼ N(0, 1) and σ0 = 0.15. We used the following additive components: g0,1(x1) = x1³/12, g0,2(x2) = sin(−x2), g0,3(x3) = x3²/2 − 1.5, and g0,4(x4) = exp(x4)/4 − (e³ − e⁻³)/24. As in dimension d = 2, optimal bandwidths with respect to the mise were computed assuming that the other components in the model are known, resulting in hMISEopt = (0.36, 0.38, 0.34, 0.29). However, it was difficult to obtain a reliable estimate for the residual scale σ0 using these bandwidths (see Remark 3.1), since many 4-dimensional neighbourhoods did not contain sufficient observations. To solve this problem we used hσ = (0.93, 0.93, 0.93, 0.93) to estimate σ0 (we expect an average of 5 points in each 4-dimensional neighbourhood using hσ). We then applied the backfitting algorithm with the optimal bandwidths hMISEopt.
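    A quick numerical check, sketched below, confirms that each of the four additive components above integrates to zero against the U([−3, 3]) marginal, which is the centering needed for identifiability once the intercept µ0 is included. The grid size is an arbitrary choice.

    ```python
    import numpy as np

    def trapezoid(y, x):
        """Plain trapezoid rule (avoids depending on np.trapz naming changes)."""
        return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

    # The four additive components from the text
    components = [
        lambda x: x**3 / 12,
        lambda x: np.sin(-x),
        lambda x: x**2 / 2 - 1.5,
        lambda x: np.exp(x) / 4 - (np.exp(3) - np.exp(-3)) / 24,
    ]

    x = np.linspace(-3.0, 3.0, 200001)          # fine grid on the support
    means = [trapezoid(g(x), x) / 6.0 for g in components]   # E[g(X)], X ~ U(-3, 3)
    # each entry of `means` is numerically zero
    ```

    The first two components are odd functions, so they center automatically; the constants 1.5 and (e³ − e⁻³)/24 are exactly the U([−3, 3]) means of x²/2 and exp(x)/4.
    
    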

    In addition to the settings C0 and C1 described above, we modified the contamination setting C2 so that ui ∼ N(15, 0.01) for all i such that Xi,j ∈ [−1.5, 1.5] for all 1 ≤ j ≤ 4.

    Tables 13 to 17 report the mise for the different estimators, contamination settings and missing mechanisms. Ratios of mise's between contaminated and clean settings, together with median ise's, are reported in Tables 18 to 32.

    As observed in Section 4.1, our experiments with and without missing responses yield similar conclusions regarding the advantage of the robust procedure over the classical backfitting and the benefits of Tukey's loss function over the Huber one. As in dimension d = 2, when responses are missing all the mise's are slightly inflated, as is to be expected. It is also not surprising that when the data do not contain outliers (C0), the robust estimators have a slightly larger mise than their classical counterparts. However, when outliers are present, both robust estimators provide substantially better performance than the classical one, giving results similar to those for clean data. Also as noted for the model with d = 2, the mise's of the estimators based on Tukey's bisquare score function are more stable across the different contamination settings than those using Huber's score function. However, one difference with the results in dimension d = 2 should be pointed out. Under the gross-error model C1, the local linear smoother performs worse than the local constant when missing responses arise. This difference is highlighted when one looks at the ratio of the corresponding mise's. However, the median ise's reported in Tables 23 to 27 are more stable, which indicates that the difference in mise's may be due to a few samples. In fact, due to the relatively large number of missing observations that may be present in this setting (around 33%), around 6% of the replications produce robust fits with Tukey's loss function with atypically large values of mise, which are due to sparsely populated neighbourhoods. Based on these observations, we also recommend using Tukey's local linear estimators.
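    The "around 33%" missing proportion quoted above can be checked by simulating the second missing mechanism, p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2), used in the d = 4 tables. This is an illustrative sketch; the sample size and seed are arbitrary.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def p4(x):
        """Probability of observing the response under the second missing scheme."""
        return 0.4 + 0.5 * np.cos(x[:, 0] * x[:, 2] + 0.2) ** 2

    x = rng.uniform(-3, 3, size=(200000, 4))      # U([-3, 3]^4) covariates
    observed = rng.random(len(x)) < p4(x)         # response observed with prob. p4(x)
    miss = 1 - observed.mean()                    # empirical missing proportion
    # `miss` comes out close to one third, matching the ~33% quoted in the text
    ```

    Since p4(x) ranges over [0.4, 0.9], some regions of the covariate space lose up to 60% of their responses, which explains the sparsely populated neighbourhoods mentioned above.
    
    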


                      p(x) ≡ 1                                                 p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2)
          ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1      ĝbc,0    ĝbr,h,0  ĝbr,t,0  ĝbc,1    ĝbr,h,1  ĝbr,t,1
    C0    0.0123   0.0125   0.0125   0.0023   0.0023   0.0023       0.0135   0.0142   0.0150   0.0033   0.0033   0.0033
    C1    7.3345   0.0525   0.0134   7.6095   0.0497   0.0046       8.4183   0.0699   0.0192   8.8581   0.1563   0.0932
    C2    4.8441   0.0360   0.0128   4.8224   0.0221   0.0025       5.9889   0.0392   0.0148   6.0121   0.0258   0.0038

    Table 13: mise of the estimators of the regression function g0 = µ0 + g0,1 + g0,2 + g0,3 + g0,4 under different contaminations and missing mechanisms.

                      p(x) ≡ 1                                                 p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2)
          ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1      ĝ1,bc,0  ĝ1,br,h,0  ĝ1,br,t,0  ĝ1,bc,1  ĝ1,br,h,1  ĝ1,br,t,1
    C0    0.0046   0.0047     0.0047     0.0020   0.0020     0.0020         0.0059   0.0060     0.0061     0.0030   0.0030     0.0030
    C1    0.5547   0.0085     0.0049     0.6356   0.0066     0.0021         0.8355   0.0107     0.0063     0.9703   0.0250     0.0178
    C2    0.9691   0.0089     0.0047     0.9897   0.0060     0.0020         1.1446   0.0105     0.0062     1.1960   0.0073     0.0030

    Table 14: mise of the estimators of the additive component g0,1 under different contaminations and missing mechanisms.

                      p(x) ≡ 1                                                 p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2)
          ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1      ĝ2,bc,0  ĝ2,br,h,0  ĝ2,br,t,0  ĝ2,bc,1  ĝ2,br,h,1  ĝ2,br,t,1
    C0    0.0025   0.0025     0.0025     0.0016   0.0016     0.0016         0.0035   0.0035     0.0036     0.0024   0.0024     0.0024
    C1    0.5332   0.0062     0.0027     0.6189   0.0081     0.0026         0.7802   0.0101     0.0038     0.8982   0.0226     0.0136
    C2    0.9298   0.0066     0.0026     0.9482   0.0055     0.0016         1.1950   0.0083     0.0037     1.2339   0.0067     0.0024

    Table 15: mise of the estimators of the additive component g0,2 under different contaminations and missing mechanisms.

                      p(x) ≡ 1                                                 p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2)
          ĝ3,bc,0  ĝ3,br,h,0  ĝ3,br,t,0  ĝ3,bc,1  ĝ3,br,h,1  ĝ3,br,t,1      ĝ3,bc,0  ĝ3,br,h,0  ĝ3,br,t,0  ĝ3,bc,1  ĝ3,br,h,1  ĝ3,br,t,1
    C0    0.0091   0.0092     0.0092     0.0042   0.0042     0.0042         0.0136   0.0141     0.0146     0.0082   0.0082     0.0082
    C1    0.5991   0.0137     0.0095     0.6741   0.0106     0.0052         0.9195   0.0273     0.0157     1.0679   0.0449     0.0337
    C2    1.0137   0.0155     0.0094     1.0007   0.0085     0.0042         1.2200   0.0207     0.0144     1.2256   0.0126     0.0082

    Table 16: mise of the estimators of the additive component g0,3 under different contaminations and missing mechanisms.

      p(x) ≡ 1: ĝ_{4,bc,0}  ĝ_{4,br,h,0}  ĝ_{4,br,t,0}  ĝ_{4,bc,1}  ĝ_{4,br,h,1}  ĝ_{4,br,t,1}
      p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
C0    0.0070  0.0070  0.0071  0.0036  0.0036  0.0036 | 0.0095  0.0098  0.0103  0.0058  0.0058  0.0058
C1    0.6818  0.0118  0.0072  0.7558  0.0097  0.0037 | 1.0890  0.0235  0.0127  1.2310  0.0583  0.0429
C2    1.0582  0.0125  0.0071  1.0592  0.0078  0.0036 | 1.3678  0.0159  0.0101  1.3877  0.0117  0.0061

Table 17: mise of the estimators of the additive component g_{0,4} under different contaminations and missing mechanisms.


                    p(x) ≡ 1: ĝ_{bc,0}  ĝ_{br,h,0}  ĝ_{br,t,0}  ĝ_{bc,1}  ĝ_{br,h,1}  ĝ_{br,t,1}
                    p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
mise(C1)/mise(C0)   594.6856  4.1927  1.0660  3263.3298  21.3232  1.9603 | 622.6124  4.9188  1.2865  2677.7184  47.1836  28.0937
mise(C2)/mise(C0)   392.7639  2.8789  1.0204  2068.0873   9.4793  1.0625 | 442.9360  2.7631  0.9876  1817.4052   7.8002   1.1449

Table 18: Ratio between the mise of the estimators of the additive component g_0 = µ_0 + Σ_{j=1}^4 g_{0,j} under the considered contaminations and under clean data for the two missing mechanisms.

                    p(x) ≡ 1: ĝ_{1,bc,0}  ĝ_{1,br,h,0}  ĝ_{1,br,t,0}  ĝ_{1,bc,1}  ĝ_{1,br,h,1}  ĝ_{1,br,t,1}
                    p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
mise(C1)/mise(C0)   119.9119  1.8225  1.0448  318.2129  3.3018  1.0502 | 142.1759  1.7651  1.0330  327.3855  8.4212  5.9965
mise(C2)/mise(C0)   209.4957  1.9119  1.0185  495.5000  3.0240  1.0185 | 194.7711  1.7403  1.0134  403.5116  2.4681  1.0207

Table 19: Ratio between the mise of the estimators of the additive component g_{0,1} under the considered contaminations and under clean data for the two missing mechanisms.

                    p(x) ≡ 1: ĝ_{2,bc,0}  ĝ_{2,br,h,0}  ĝ_{2,br,t,0}  ĝ_{2,bc,1}  ĝ_{2,br,h,1}  ĝ_{2,br,t,1}
                    p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
mise(C1)/mise(C0)   213.8436  2.4642  1.0692  385.1917  5.0106  1.6478 | 225.1335  2.8729  1.0599  379.1973  9.5277  5.7400
mise(C2)/mise(C0)   372.9278  2.6408  1.0309  590.1075  3.4041  1.0208 | 344.8200  2.3533  1.0324  520.8911  2.8161  1.0259

Table 20: Ratio between the mise of the estimators of the additive component g_{0,2} under the considered contaminations and under clean data for the two missing mechanisms.

                    p(x) ≡ 1: ĝ_{3,bc,0}  ĝ_{3,br,h,0}  ĝ_{3,br,t,0}  ĝ_{3,bc,1}  ĝ_{3,br,h,1}  ĝ_{3,br,t,1}
                    p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
mise(C1)/mise(C0)    66.0132  1.4880  1.0318  160.3085  2.5264  1.2368 |  67.6382  1.9308  1.0775  130.9593  5.5051  4.1322
mise(C2)/mise(C0)   111.7021  1.6804  1.0150  237.9769  2.0109  1.0087 |  89.7438  1.4627  0.9885  150.2898  1.5440  1.0080

Table 21: Ratio between the mise of the estimators of the additive component g_{0,3} under the considered contaminations and under clean data for the two missing mechanisms.

                    p(x) ≡ 1: ĝ_{4,bc,0}  ĝ_{4,br,h,0}  ĝ_{4,br,t,0}  ĝ_{4,bc,1}  ĝ_{4,br,h,1}  ĝ_{4,br,t,1}
                    p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
mise(C1)/mise(C0)    97.8712  1.6842  1.0244  211.8380  2.7222  1.0340 | 114.1378  2.4048  1.2307  212.2685  10.0466  7.3858
mise(C2)/mise(C0)   151.9140  1.7760  1.0115  296.8837  2.1873  1.0131 | 143.3664  1.6293  0.9839  239.2791   2.0083  1.0541

Table 22: Ratio between the mise of the estimators of the additive component g_{0,4} under the considered contaminations and under clean data for the two missing mechanisms.


      p(x) ≡ 1: ĝ_{bc,0}  ĝ_{br,h,0}  ĝ_{br,t,0}  ĝ_{bc,1}  ĝ_{br,h,1}  ĝ_{br,t,1}
      p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
C0    0.0123  0.0125  0.0125  0.0023  0.0023  0.0023 | 0.0135  0.0142  0.0148  0.0033  0.0033  0.0033
C1    7.2318  0.0497  0.0133  7.5283  0.0421  0.0027 | 8.3171  0.0481  0.0155  8.7854  0.0435  0.0040
C2    4.7574  0.0344  0.0128  4.7560  0.0203  0.0024 | 5.9049  0.0365  0.0147  5.9358  0.0215  0.0035

Table 23: Medise of the estimators of the regression function g_0 under different contaminations and missing mechanisms.
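Throughout these tables, mise and Medise denote the mean and the median, over the Monte Carlo replications, of the integrated squared error (ISE) of a fit, and the ratio tables report the value under contamination relative to the value under clean data. The sketch below illustrates how such summaries could be computed; the function names, grid, and toy "fits" are illustrative assumptions, not the paper's code:

```python
import numpy as np

def ise(g_hat, g_true, grid):
    # Integrated squared error of one fitted curve, approximated by a
    # Riemann sum over an equally spaced evaluation grid.
    dx = grid[1] - grid[0]
    return np.sum((g_hat - g_true) ** 2) * dx

def mise_medise(fits, g_true, grid):
    # fits: (n_replications, len(grid)) array with one estimated curve
    # per Monte Carlo replication; mise/Medise are the mean/median ISE.
    errors = np.array([ise(f, g_true, grid) for f in fits])
    return errors.mean(), np.median(errors)

# Toy illustration of a ratio such as mise(C1)/mise(C0): the same
# estimator evaluated on clean (C0) and contaminated (C1) replications.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 101)
g_true = np.sin(2 * np.pi * grid)
fits_c0 = g_true + 0.05 * rng.normal(size=(500, grid.size))  # clean
fits_c1 = g_true + 0.50 * rng.normal(size=(500, grid.size))  # contaminated
mise_c0, medise_c0 = mise_medise(fits_c0, g_true, grid)
mise_c1, medise_c1 = mise_medise(fits_c1, g_true, grid)
ratio = mise_c1 / mise_c0  # values far above 1 flag sensitivity to outliers
```

The median-based Medise is less affected than mise by a handful of aberrant replications, which is why both summaries are reported.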

      p(x) ≡ 1: ĝ_{1,bc,0}  ĝ_{1,br,h,0}  ĝ_{1,br,t,0}  ĝ_{1,bc,1}  ĝ_{1,br,h,1}  ĝ_{1,br,t,1}
      p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
C0    0.0040  0.0040  0.0040  0.0013  0.0013  0.0013 | 0.0050  0.0051  0.0052  0.0019  0.0019  0.0019
C1    0.5139  0.0077  0.0042  0.5909  0.0059  0.0014 | 0.7747  0.0099  0.0055  0.9060  0.0075  0.0021
C2    0.9560  0.0084  0.0041  0.9677  0.0056  0.0013 | 1.1128  0.0096  0.0053  1.1591  0.0061  0.0020

Table 24: Medise of the estimators of the additive component g_{0,1} under different contaminations and missing mechanisms.

      p(x) ≡ 1: ĝ_{2,bc,0}  ĝ_{2,br,h,0}  ĝ_{2,br,t,0}  ĝ_{2,bc,1}  ĝ_{2,br,h,1}  ĝ_{2,br,t,1}
      p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
C0    0.0020  0.0020  0.0020  0.0011  0.0011  0.0011 | 0.0028  0.0028  0.0029  0.0016  0.0016  0.0016
C1    0.4980  0.0056  0.0022  0.5879  0.0054  0.0012 | 0.7289  0.0071  0.0031  0.8413  0.0064  0.0018
C2    0.9108  0.0062  0.0021  0.9260  0.0050  0.0011 | 1.1789  0.0077  0.0031  1.2041  0.0059  0.0016

Table 25: Medise of the estimators of the additive component g_{0,2} under different contaminations and missing mechanisms.

      p(x) ≡ 1: ĝ_{3,bc,0}  ĝ_{3,br,h,0}  ĝ_{3,br,t,0}  ĝ_{3,bc,1}  ĝ_{3,br,h,1}  ĝ_{3,br,t,1}
      p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
C0    0.0076  0.0077  0.0077  0.0023  0.0023  0.0023 | 0.0102  0.0108  0.0111  0.0044  0.0044  0.0044
C1    0.5686  0.0124  0.0080  0.6297  0.0075  0.0024 | 0.8675  0.0167  0.0110  1.0010  0.0115  0.0046
C2    1.0046  0.0141  0.0079  0.9907  0.0069  0.0023 | 1.1809  0.0177  0.0110  1.1953  0.0096  0.0045

Table 26: Medise of the estimators of the additive component g_{0,3} under different contaminations and missing mechanisms.

      p(x) ≡ 1: ĝ_{4,bc,0}  ĝ_{4,br,h,0}  ĝ_{4,br,t,0}  ĝ_{4,bc,1}  ĝ_{4,br,h,1}  ĝ_{4,br,t,1}
      p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
C0    0.0056  0.0057  0.0058  0.0022  0.0022  0.0022 | 0.0073  0.0076  0.0080  0.0034  0.0034  0.0034
C1    0.6510  0.0106  0.0061  0.7153  0.0078  0.0023 | 1.0298  0.0141  0.0080  1.1628  0.0112  0.0037
C2    1.0484  0.0115  0.0059  1.0447  0.0066  0.0022 | 1.3321  0.0140  0.0079  1.3391  0.0084  0.0035

Table 27: Medise of the estimators of the additive component g_{0,4} under different contaminations and missing mechanisms.


                        p(x) ≡ 1: ĝ_{bc,0}  ĝ_{br,h,0}  ĝ_{br,t,0}  ĝ_{bc,1}  ĝ_{br,h,1}  ĝ_{br,t,1}
                        p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
Medise(C1)/Medise(C0)   587.9280  3.9901  1.0632  3284.7120  18.3599  1.1840 | 616.5264  3.3837  1.0457  2694.8344  13.3422  1.2335
Medise(C2)/Medise(C0)   386.7698  2.7623  1.0253  2075.0990   8.8790  1.0682 | 437.7162  2.5680  0.9951  1820.7418   6.6007  1.0631

Table 28: Ratio between the Medise of the estimators of the additive component g_0 under the considered contaminations and under clean data for the two missing mechanisms.

                        p(x) ≡ 1: ĝ_{1,bc,0}  ĝ_{1,br,h,0}  ĝ_{1,br,t,0}  ĝ_{1,bc,1}  ĝ_{1,br,h,1}  ĝ_{1,br,t,1}
                        p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
Medise(C1)/Medise(C0)   129.7275  1.9127  1.0513  460.2723  4.5320  1.0668 | 156.4803  1.9257  1.0649  466.3972  3.8856  1.0932
Medise(C2)/Medise(C0)   241.3087  2.0909  1.0125  753.7706  4.3051  1.0116 | 224.7727  1.8676  1.0267  596.6858  3.1300  1.0223

Table 29: Ratio between the Medise of the estimators of the additive component g_{0,1} under the considered contaminations and under clean data for the two missing mechanisms.

                        p(x) ≡ 1: ĝ_{2,bc,0}  ĝ_{2,br,h,0}  ĝ_{2,br,t,0}  ĝ_{2,bc,1}  ĝ_{2,br,h,1}  ĝ_{2,br,t,1}
                        p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
Medise(C1)/Medise(C0)   253.9908  2.8292  1.0822  557.0175  5.1005  1.0940 | 258.5182  2.4780  1.0665  524.3816  4.0351  1.1120
Medise(C2)/Medise(C0)   464.5554  3.1424  1.0488  877.3765  4.7631  1.0244 | 418.1190  2.7026  1.0618  750.5317  3.6802  1.0270

Table 30: Ratio between the Medise of the estimators of the additive component g_{0,2} under the considered contaminations and under clean data for the two missing mechanisms.

                        p(x) ≡ 1: ĝ_{3,bc,0}  ĝ_{3,br,h,0}  ĝ_{3,br,t,0}  ĝ_{3,bc,1}  ĝ_{3,br,h,1}  ĝ_{3,br,t,1}
                        p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
Medise(C1)/Medise(C0)    75.1423  1.6025  1.0372  273.7164  3.2475  1.0414 |  85.4013  1.5531  0.9944  226.6848  2.6012  1.0510
Medise(C2)/Medise(C0)   132.7657  1.8189  1.0282  430.6451  3.0177  1.0127 | 116.2487  1.6494  0.9913  270.6875  2.1641  1.0084

Table 31: Ratio between the Medise of the estimators of the additive component g_{0,3} under the considered contaminations and under clean data for the two missing mechanisms.

                        p(x) ≡ 1: ĝ_{4,bc,0}  ĝ_{4,br,h,0}  ĝ_{4,br,t,0}  ĝ_{4,bc,1}  ĝ_{4,br,h,1}  ĝ_{4,br,t,1}
                        p4(x) = 0.4 + 0.5 cos²(x1x3 + 0.2): same six columns
Medise(C1)/Medise(C0)   115.3211  1.8521  1.0528  331.9275  3.6263  1.0442 | 140.1222  1.8468  0.9912  344.1223  3.3286  1.0869
Medise(C2)/Medise(C0)   185.7104  2.0115  1.0223  484.7585  3.0544  1.0200 | 181.2518  1.8305  0.9832  396.2957  2.4830  1.0267

Table 32: Ratio between the Medise of the estimators of the additive component g_{0,4} under the considered contaminations and under clean data for the two missing mechanisms.


5 Empirical Influence

    A well-known measure of robustness of an estimat