testing the equality of two regression curves using linear smoothers

Statistics & Probability Letters 12 (1991) 239-247

North-Holland September 1991

Testing the equality of two regression curves using linear smoothers

Eileen King The Procter and Gamble Company, Cincinnati, OH, USA

Jeffrey D. Hart and Thomas E. Wehrly Department of Statistics, Texas A&M University, College Station, TX 77843, USA

Received July 1990

Revised December 1990

Abstract: Suppose that data (y, z) are observed from two regression models, $y = f(x) + \varepsilon$ and $z = g(x) + \eta$. Of interest is testing the hypothesis $H: f = g$ without assuming that f or g is in a parametric family. A test based on the difference between linear, but nonparametric, estimates of f and g is proposed. The exact distribution of the test statistic is obtained on the assumption that the errors in the two regression models are normally distributed. Asymptotic distribution theory is outlined under more general conditions on the errors. It is shown by simulation that the test based on the assumption of normal errors is reasonably robust to departures from normality. A data analysis illustrates that, in addition to being attractive descriptive devices, nonparametric smoothers can be valuable inference tools.

1. Introduction

A common problem in experimental work is the comparison of two regression curves. Plant physiology and analysis of animal growth curves are examples of areas in which this problem frequently arises. Typically, the two curves correspond to treatment and control groups, and the predictor variable in the regression is time or a covariate. For any given value of the predictor variable, suppose that one measurement is recorded for each regression model and that the measurements across settings of the predictor variable are independent of one another. We shall propose a nonparametric method of testing the hypothesis that the two curves are identical. This is a useful methodology in cases where (a) the experimenter has little (if any) indication of an appropriate parametric regression model, or (b) the experimenter desires a quick and easy significance test without any serious modelling of the two regressions. Our test is a function of the difference between two nonparametric curve estimates, where each estimate is a linear smoother. We feel that comparing plots of, e.g., kernel smoothers is a particularly good way to describe differences between two regression curves. Our test will provide a formal significance test that can be used to supplement such a graphical comparison.

Research supported in part by ONR Contract N00014-85-K-0723.

The setting we shall consider can be described more precisely as follows. The observed data are $(x_i, y_i, z_i)$, $i = 1, \ldots, n$, where

$$ y_i = f(x_i) + \varepsilon_i \quad \text{and} \quad z_i = g(x_i) + \eta_i, \qquad i = 1, \ldots, n. \tag{1.1} $$

The design points, $x_i$, are fixed with $0 < x_1 < x_2 < \cdots < x_n \le 1$, and the regression functions f and g are smooth (at least uniformly continuous on [0, 1]). Each $\varepsilon_i$ is assumed to be normally distributed with mean 0 and variance $\sigma^2$, while each $\eta_i$ is normal with mean 0 and variance $\tau^2$. Furthermore, the random variables $\varepsilon_1, \ldots, \varepsilon_n, \eta_1, \ldots, \eta_n$ are assumed to be mutually independent.

0167-7152/91/$03.50 © 1991 - Elsevier Science Publishers B.V. (North-Holland)

Volume 12, Number 3 STATISTICS & PROBABILITY LETTERS September 1991

The functions f and g may be estimated using linear smoothers. For example, an estimate of $f(x)$ ($x \in [0, 1]$) is $\hat f_h(x) = \sum_{i=1}^n w_i(x; h) y_i$, where $w_1(x; h), \ldots, w_n(x; h)$ are weights that depend on a smoothing parameter h. The weights could be those of a kernel estimator or a smoothing spline; see Eubank (1988) for a discussion of each of these linear estimators.
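As a minimal sketch of such a linear smoother, the Python code below builds the matrix of weights $w_j(x_i; h)$ for a kernel estimator and applies it to a response vector. The normalised (Nadaraya-Watson type) weights, the Epanechnikov kernel, the design points and the names `epanechnikov` and `smoother_matrix` are illustrative assumptions, not necessarily the estimator used later in the paper.

```python
import numpy as np

def epanechnikov(u):
    """Kernel K(u) = 0.75 (1 - u^2) for |u| < 1, and 0 otherwise."""
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def smoother_matrix(x, h):
    """n x n matrix whose (i, j) entry is the weight w_j(x_i; h).

    Assumption: kernel values are normalised so that each row sums to one.
    """
    x = np.asarray(x, dtype=float)
    k = epanechnikov((x[:, None] - x[None, :]) / h)
    return k / k.sum(axis=1, keepdims=True)

# Hypothetical example: evenly spaced design and a smooth response.
n = 20
x = (np.arange(1, n + 1) - 0.5) / n
y = np.sin(2 * np.pi * x)
W = smoother_matrix(x, h=0.2)
f_hat = W @ y  # fitted values at the design points, linear in y
```

The point of the sketch is that the fitted values are a fixed linear map of the responses, which is the property the distribution theory of Section 2 relies on.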

We are interested in testing the hypothesis

$$ H_0: f(x) = g(x) \ \text{for all } x \in [0, 1] \tag{1.2} $$

against the alternative

$$ H_1: f(x) \ne g(x) \ \text{for some } x \in [0, 1]. $$

We propose that these hypotheses be tested using the statistic

$$ T_n = \frac{n^{-1} \sum_{i=1}^n (\hat f_h(x_i) - \hat g_h(x_i))^2}{\hat s^2}, \tag{1.3} $$

where $\hat s^2$ is some estimate of scale. The null hypothesis $H_0$ is rejected in favor of $H_1$ (at level $\alpha$) if the observed value of $T_n$ is greater than the $1 - \alpha$ quantile of $T_n$'s distribution under $H_0$. A motivation for the statistic (1.3) is that its numerator is a consistent estimator of

$$ D = \int_0^1 (f(x) - g(x))^2 r(x)\,dx, $$

where r is the density of the design points $x_i$. Assuming $r(x) > 0$ for all x, and since f and g are continuous, D is zero if and only if $H_0$ is true. Therefore, 'large' values of $T_n$ favor $H_1$ and 'small' values favor $H_0$. It is important to realize that our assumption of normal errors is needed only for formal validity of the proposed test. It turns out (see Section 4) that the normal-theory test is reasonably robust to departures from normality.

The area of hypothesis testing in nonparametric regression is relatively new in the literature. Cox, Koh, Wahba and Yandell (1988) examine the one-sample test of the hypothesis that $f(x)$, $x \in [0, 1]$, is a polynomial of degree $m - 1$ or less versus the alternative that f is 'smooth'. Estimation of f is by the technique of smoothing splines. Eubank and Spiegelman (1990) propose alternative nonparametric smoothing techniques to test the goodness of fit of a linear model. Raz (1990) develops a randomization test to determine if a real-valued response variable and a vector-valued explanatory variable are related. Härdle and Marron (1990) compare two regression curves that have been estimated using nonparametric techniques. They assume the curves are the same up to a parametric transformation of the predictor and response variables. Hence, in their model the hypothesis (1.2) is equivalent to a hypothesis that depends on only a few parameters. Knafl, Sacks, and Ylvisaker (1985) propose methods of constructing confidence bands for regression functions, including the nonparametric case. Hall and Hart (1990) propose a bootstrap test for detecting the difference between regression functions. Other recent work on testing a parametric null hypothesis using nonparametric smoothers includes Dabrowska (1987), Horvath, Yandell and Sen (1990) and Kozek (1990, 1991).

The rest of the paper will be arranged as follows. In Section 2 we define our test statistic using matrix notation and discuss a number of its distributional properties. Implementation of the test in practice and an example involving a dietary experiment with cows are the subject of Section 3. Finally, in Section 4 results of a simulation study are presented. The study suggests that the level of our test is relatively insensitive to departures from normally distributed errors, and that its power compares favorably with that of traditional competitors like the paired t-test.

2. Distributional properties of the test statistic

We first introduce some notation that makes it easier to describe the distribution of our test statistic. Let y and z denote, respectively, the vectors of observations $(y_1, \ldots, y_n)'$ and $(z_1, \ldots, z_n)'$, and define $d = y - z$. Let $W_h$ be the $n \times n$ matrix with $ij$th element $w_j(x_i; h)$. Defining $d_i = y_i - z_i$, $i = 1, \ldots, n$, our estimator of scale is of the type suggested by Rice (1984), namely

$$ \hat\sigma_d^2 = \frac{1}{2(n-1)} \sum_{i=2}^n (d_i - d_{i-1})^2. $$

This estimator may be written in the form $\hat\sigma_d^2 = d'G'Gd/n$, where G is the $(n - 1) \times n$ matrix defined by

$$ G = \sqrt{\frac{n}{2(n-1)}} \begin{pmatrix} -1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix}. $$

Finally, taking $\hat s^2 = \hat\sigma_d^2$, the test statistic (1.3) may be written as a ratio of quadratic forms:

$$ T_n = d'W_h'W_h d / d'G'Gd. $$
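The quadratic-form representation can be checked numerically. The sketch below assumes that G is the first-difference matrix scaled by $\sqrt{n/(2(n-1))}$, which is the scaling that makes $d'G'Gd/n$ equal to a Rice-type estimator with divisor $2(n-1)$. The running-mean smoother matrix and the names `difference_matrix` and `t_statistic` are hypothetical and serve only as an example.

```python
import numpy as np

def difference_matrix(n):
    """(n-1) x n matrix G such that d'G'Gd / n is the Rice-type estimator.

    Assumption: first differences scaled by sqrt(n / (2 (n - 1))).
    """
    first_diff = np.diff(np.eye(n), axis=0)  # row i is e_{i+1} - e_i
    return np.sqrt(n / (2.0 * (n - 1))) * first_diff

def t_statistic(d, W):
    """T_n = d'W'Wd / d'G'Gd for a difference vector d and smoother matrix W."""
    d = np.asarray(d, dtype=float)
    G = difference_matrix(d.size)
    Wd, Gd = W @ d, G @ d
    return float((Wd @ Wd) / (Gd @ Gd))

# Hypothetical data and a simple three-point running-mean smoother.
rng = np.random.default_rng(0)
n = 20
d = rng.standard_normal(n)
idx = np.arange(n)
band = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)
W = band / band.sum(axis=1, keepdims=True)

# Direct evaluation of (1.3): mean squared smooth over the scale estimate.
rice = np.sum(np.diff(d) ** 2) / (2 * (n - 1))
direct = np.mean((W @ d) ** 2) / rice
```

Under these assumptions `direct` and `t_statistic(d, W)` agree up to rounding, because the factor $1/n$ appears in both the numerator and the denominator of (1.3).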

2.1. Null distribution of test statistic

It is clear that, under the null hypothesis (1.2), $T_n$ has the same distribution as $a'W_h'W_h a / a'G'Ga$, where the n components of a are i.i.d. random variables having the standard normal distribution. Therefore, for any t,

$$ P(T_n > t) = P[a'(W_h'W_h - tG'G)a > 0]. \tag{2.1} $$

The probability (2.1) may be calculated for any n and any (fixed) value of the smoothing parameter h by using Theorem 2.1 in Box (1954). Box's theorem implies that

$$ P[a'(W_h'W_h - tG'G)a > 0] = P\Big( \sum_{j=1}^r \lambda_j \chi_j^2 > 0 \Big), \tag{2.2} $$

where $r = \mathrm{rank}(W_h'W_h - tG'G)$, $\lambda_1, \ldots, \lambda_r$ are the real nonzero eigenvalues of $W_h'W_h - tG'G$, and $\chi_1^2, \ldots, \chi_r^2$ are independent chi-squared random variables each with 1 degree of freedom.

To perform the desired hypothesis test there are at least two possible approaches. One could approximate percentiles of $T_n$'s null distribution by using (2.1), (2.2) and one of the suggested procedures for approximating the distribution of linear combinations of independent $\chi^2$ random variables (see, e.g., Box, 1954; Imhof, 1961; and Farebrother, 1990). A second approach is to simply use Monte Carlo methods to approximate the null distribution of $T_n$. The latter approach is the one we have taken in data analyses and simulation studies. For a given set of data, one may perform the hypothesis test by rejecting $H_0$ only when $P < \alpha$, where $\alpha$ is the desired significance level,

$$ P = P[a'W_h'W_h a / a'G'Ga > T_n^{\mathrm{obs}}], $$

and $T_n^{\mathrm{obs}}$ is the observed value of $T_n$. The quantity P may be estimated to any desired degree of precision by generating m independent sets of n i.i.d. standard normal variates and taking m sufficiently large.

2.2. Scale estimation

In principle, any scale estimator of the form $d'Bd$ could be used in the statistic (1.3). For example, one could use $s_d^2 = \sum_{i=1}^n (d_i - \bar d)^2 / n$. Both $s_d^2$ and $\hat\sigma_d^2$ are consistent (as $n \to \infty$) for $\sigma_d^2 = \mathrm{Var}(d_i)$ under the null hypothesis (1.2). However, we prefer $\hat\sigma_d^2$ since it is also consistent for $\sigma_d^2$ under the alternative hypothesis, so long as $\lim_{n \to \infty} \max_{i = 2, \ldots, n} (x_i - x_{i-1}) = 0$. By contrast, $s_d^2$ is consistent for

$$ \sigma^2(f - g) = \sigma_d^2 + \int_0^1 \big[ (f(x) - g(x)) - (\overline{f - g}) \big]^2 r(x)\,dx, $$

where $\overline{f - g} = \int_0^1 (f(x) - g(x)) r(x)\,dx$. It follows that $s_d^2$ is consistent for $\sigma_d^2$ if and only if $f(x) - g(x) = c$ for all x. Since $\sigma^2(f - g) > \sigma_d^2$ for nonconstant alternatives, a test using $\hat\sigma_d^2$ will tend to have larger power than a test using $s_d^2$; hence our preference for $\hat\sigma_d^2$.

2.3. Nonnormal errors

When the errors in model (1.1) are not normally distributed, (2.1) and (2.2) at best only approximately describe the null distribution of the test statistic. However, our experience indicates that this approximation often works quite well even when the sample size is as small as n = 20. Evidence of this will be seen in the simulation study of Section 4. King (1988) shows that, when $\hat f_h$ and $\hat g_h$ are Gasser-Müller type kernel estimators of f and g (see Gasser and Müller, 1979), a properly standardized version of

$$ \sum_{j=1}^n (\hat f_h(x_j) - \hat g_h(x_j))^2 $$

is asymptotically normal as $n \to \infty$, $h \to 0$ and $nh \to \infty$. One can use this fact to perform an asymptotically valid test of $H_0$ vs. $H_1$ even when the normality assumption is inappropriate. The simulation study of King (1988), though, suggests that, when n is small, using the distribution in (2.2) as an approximation tends to yield smaller level error than does the test based on the asymptotic normality result.
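The Monte Carlo approach of Section 2.1 can be sketched in Python as follows. The code estimates the P-value $P[a'W_h'W_h a / a'G'Ga > t]$ by simulating standard normal vectors and, as a cross-check, estimates the same probability through the eigenvalue representation (2.2). The running-mean smoother, the scaled first-difference matrix, the threshold 0.5 and all function names are assumptions made for illustration.

```python
import numpy as np

def mc_p_value(t_obs, W, G, m=8000, seed=0):
    """Estimate P[a'W'Wa / a'G'Ga > t_obs] for a ~ N(0, I) by simulation."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((m, W.shape[1]))  # one null sample per row
    num = np.sum((a @ W.T) ** 2, axis=1)      # a'W'Wa for each sample
    den = np.sum((a @ G.T) ** 2, axis=1)      # a'G'Ga for each sample
    return float(np.mean(num / den > t_obs))

def eigen_p_value(t_obs, W, G, m=8000, seed=1):
    """Estimate the same probability as P(sum_j lambda_j chi2_j > 0)."""
    lam = np.linalg.eigvalsh(W.T @ W - t_obs * (G.T @ G))
    rng = np.random.default_rng(seed)
    chi2 = rng.chisquare(1, size=(m, lam.size))  # independent chi-squared(1)
    return float(np.mean(chi2 @ lam > 0))

# Hypothetical smoother (three-point running mean) and difference matrix.
n = 20
idx = np.arange(n)
band = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)
W = band / band.sum(axis=1, keepdims=True)
G = np.sqrt(n / (2.0 * (n - 1))) * np.diff(np.eye(n), axis=0)

p_mc = mc_p_value(0.5, W, G)
p_eig = eigen_p_value(0.5, W, G)
```

The two estimates should differ only by simulation error. An exact evaluation of (2.2) would replace the second function with a method such as that of Imhof (1961), which is not implemented here.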

2.4. Power of the test

Define $t_\alpha$ to be the $1 - \alpha$ quantile of $T_n$'s distribution under $H_0$. When the alternative hypothesis is true, the power of our test is

$$ P(T_n > t_\alpha) = P[d'(W_h'W_h - t_\alpha G'G)d > 0] = P\Big[ \sum_{j=1}^r \lambda_j \chi_j^2(1, \delta_j^2) > 0 \Big], $$

where the $\lambda_j$'s are as defined in Section 2.1, $\delta_j$ is a linear combination of $(f(x_1) - g(x_1))/\sigma_d, \ldots, (f(x_n) - g(x_n))/\sigma_d$, and the $\chi_j^2(1, \delta_j^2)$ are independent, noncentral chi-squared random variables each with 1 degree of freedom and noncentrality parameter $\delta_j^2$.

King (1988) studied asymptotic power properties of our proposed test when the linear smoother is a Gasser-Müller kernel estimator with smooth, symmetric kernel K having support (-1, 1). Consider a sequence of size $\alpha$ tests for which $h \to 0$ and $nh \to \infty$. For a fixed alternative $f - g$, King (1988) showed that the power of such tests tends to 1 as $n \to \infty$.

King (1988) also studied asymptotic power for a sequence of local alternatives. For a fixed function u (not identical to 0), define the local alternatives $u_n(x) = f(x) - g(x) = u(x)\,(n\sqrt{h})^{-1/2}$, and again let $h \to 0$ and $nh \to \infty$. Then King showed that, as $n \to \infty$, the power of our test tends to

$$ P\left( Z > z_\alpha - \frac{\int_0^1 u^2(x) r(x)\,dx}{\sigma_d^2 \sqrt{B}} \right), $$

where Z has the standard normal distribution with $1 - \alpha$ quantile $z_\alpha$, and B is a constant defined in terms of the kernel convolution $\int K(z)K(z + y)\,dz$.

2.5. Choice of smoothing parameter

To each choice of smoothing parameter h there corresponds a different test of the null hypothesis (1.2). Ideally one would choose h to maximize power. In general, though, the most powerful smoothing parameter will depend on the unknown function $(f - g)/\sigma_d$.

One way of proceeding is to fix h by using a data-based method such as cross-validation (see, e.g., Härdle, Hall and Marron, 1988). There are two potential deficiencies of this approach. First of all, this procedure adds randomness to the test, and hence the distribution theory of Section 2.1 is no longer valid. On the other hand, if one uses simulation to approximate the null distribution, the extra randomness could be accounted for in the simulation. Another objection to using cross-validation is that it is designed for a different purpose than maximizing the power of a test. Cross-validation on the $d_i$'s tends to produce a good point estimate of $f - g$, i.e. one which comes close to minimizing $\sum_i [\hat f_h(x_i) - \hat g_h(x_i) - (f(x_i) - g(x_i))]^2$. Our experience is that, when using a kernel estimate in (1.3), the bandwidth h that maximizes power tends to be quite a bit larger than ones which produce visually good estimates of $f - g$.

In practice the experimenter often has a reasonable idea, at least qualitatively, of the type of alternative to expect. Therefore, another approach would be to choose a bandwidth that is optimal over a class of alternatives of the expected type. Monte Carlo can be used to approximate the optimal bandwidth for any particular alternative $(f - g)/\sigma_d$. If the optimal bandwidth varies significantly over the alternatives of interest, then one could use the test statistic $T_n^* = \sum_j \pi_j T_{n,j}$, where $T_{n,j}$ is the test statistic of form (1.3) which maximizes power for a particular alternative, and $\pi_j$ is the prior probability attached to that particular alternative. The statistic $T_n^*$ is still a ratio of quadratic forms, and hence its distribution is no harder to determine than that of (1.3).
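To see why the weighted statistic remains a ratio of quadratic forms, suppose (as notation assumed here, not taken from the paper) that the statistic for the jth alternative uses bandwidth $h_j$ with smoother matrix $W_{h_j}$, and that every component statistic uses the same scale estimate $d'G'Gd/n$. Then

$$ T_n^* = \sum_j \pi_j \, \frac{d'W_{h_j}'W_{h_j} d}{d'G'Gd} = \frac{d' \Big( \sum_j \pi_j W_{h_j}'W_{h_j} \Big) d}{d'G'Gd}, $$

so that, under these assumptions, (2.1) and (2.2) apply with $W_h'W_h$ replaced by the symmetric matrix $\sum_j \pi_j W_{h_j}'W_{h_j}$.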

3. An illustration

Matis, Wehrly and Ellis (1989) developed nonlinear models for the digesta flow through the gastrointestinal tracts of ruminants. These models were fit to the observed concentration of marker in the feces of four cows given each of two types of diet. The data were originally presented by Ferreiro, Boodoo, Sutton, and Bishop (1980). The experiment was designed to compare the effects of two diets, one a chopped straw and the other a ground and pelleted straw, denoted as treatments C and V, respectively. Four cows were used in the experiment and each received the two diets. The straw was stained with magenta, and each cow was given a meal with the stained straw. The feces were collected at regular time intervals, and the observed concentration of stained particles was recorded.

It was noted that, for each cow, the best fitting models were qualitatively different for the two diets. However, for a given diet the same model fit all the cows quite well. Since the emphasis in Matis, Wehrly and Ellis (1989) was on developing and fitting models, no tests of significance were performed to determine whether the response curves differed either for the two diets given to a single cow, or for two cows given the same diet. We will use our methodology to test whether Cow 7 had different responses to diets C and V, and whether Cows 6 and 8 had different responses to diet C.

Defining

$$ K(u) = 0.75(1 - u^2) I(|u| < 1), $$

estimates of concentration-time curves for Cow 7 were computed as follows. We transformed the time scale to produce equally spaced points on the unit interval, computed the Gasser-Müller (1979) smoother

$$ \hat f_h(x) = \frac{1}{h} \sum_{i=1}^n y_i \int_{(i-1)/n}^{i/n} K\left( \frac{x - s}{h} \right) ds, $$

where $x \in [0, 1]$, and then transformed back to the original time scale using the technique of Carroll and Härdle (1989). Effectively, this procedure produces a kernel estimate whose bandwidth varies with design density. Figure 1 shows the regression estimates and the observed data. The test statistic $T_n$ was computed for each of the bandwidths h = 0.1, 0.2, 0.3, 0.4 and 0.5, where h is measured on the transformed time scale. The P-value was estimated according to the method described in Section 2.1 by using m = 8000 simulations, so that the error of estimation would be less than 0.01 with 95% confidence. The values of the test statistics and P-values are presented in Table 1. A similar analysis was performed to compare the responses of Cows 6 and 8 to Diet C. Figure 2 shows the regression estimates, and Table 2 presents the values of the test statistics and the P-values.

Fig. 1. Kernel regression estimates of the concentration-time curves for Cow 7 under Diets C and V. The observed responses are given by × for Diet C and by △ for Diet V. The solid and dashed lines denote, respectively, Diet C and V estimated response curves. (Horizontal axis: time in hours.)

Table 1
Values of the test statistic $T_n$ for comparing Diets C and V for Cow 7, and the corresponding estimated P-values based on 8000 simulations

h        0.1       0.2       0.3         0.4         0.5
T_n      5.10      4.20      3.22        2.51        2.28
P        0.0000    0.0000    0.000125    0.000125    0.000375

Fig. 2. Kernel regression estimates of the concentration-time curves for Cows 6 and 8 under Diet C. The observed responses are given by × for Cow 6 and by △ for Cow 8. The solid and dashed lines denote, respectively, Cow 6 and 8 estimated response curves. (Horizontal axis: time in hours.)
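For the kernel $K(u) = 0.75(1 - u^2)$ on $(-1, 1)$, the integrals in a Gasser-Müller type smoother have a closed form, so the weight matrix can be written down directly. The sketch below assumes equally spaced cell boundaries $(i-1)/n$ and $i/n$, design points at the cell midpoints, and no boundary correction; the function names are hypothetical.

```python
import numpy as np

def kernel_cdf(u):
    """Integral of K(t) = 0.75 (1 - t^2) from -1 to u, with u clipped to [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def gasser_mueller_matrix(x, h):
    """Matrix with (k, i) entry (1/h) * integral of K((x_k - s)/h) over cell i.

    Assumption: cell i is the interval [(i-1)/n, i/n] on the unit interval.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    edges = np.arange(n + 1) / n                       # s_0, ..., s_n
    upper = kernel_cdf((x[:, None] - edges[None, :-1]) / h)
    lower = kernel_cdf((x[:, None] - edges[None, 1:]) / h)
    return upper - lower

# Hypothetical design: midpoints of n = 50 equal cells, bandwidth h = 0.2.
n, h = 50, 0.2
x = (np.arange(1, n + 1) - 0.5) / n
W = gasser_mueller_matrix(x, h)
interior = (x >= h) & (x <= 1 - h)  # rows unaffected by the boundary
```

In the interior the weights in each row sum to one; near 0 and 1 they sum to less than one, which is the boundary bias that boundary kernels are meant to correct.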

The P-values in Table 1 indicate strong evidence that the curves for the two diets differ for Cow 7. Even smaller P-values were observed for the other three cows in the study. The P-values in Table 2 indicate that the Diet C response curves of Cows 6 and 8 did not differ significantly. Similar pairwise comparisons for the other cows within diets could also be performed for a more complete analysis.

Table 2
Values of the test statistic $T_n$ for comparing Cows 6 and 8 under Diet C, and the corresponding estimated P-values based on 8000 simulations

h        0.1      0.2      0.3      0.4      0.5
T_n      0.69     0.44     0.40     0.39     0.43
P        0.212    0.243    0.196    0.177    0.124

Though one would naturally expect some difference in before and after readings on a given animal, our test provides a means of formally establishing that the observed difference is larger than would be expected from experimental error alone. If the tests applied to each of a few animals are all significant, and the difference in before and after readings is consistently in one direction, then one feels confident that a treatment effect has been detected. When a fairly large number of experimental units is available, it would be more appropriate to perform a repeated measures analysis. In a repeated measures design, covariance between measurements on the same subject would typically be modelled through the error terms. In our analysis this covariance was modelled through the mean functions f and g.

4. A simulation study

A simulation study was conducted to investigate the level error and power of our proposed test. Throughout the study the design points were evenly spaced, and the test statistic T, was based on a kernel smoother of the form

where

$$ K(u) = 0.75(1 - u^2) I(|u| \le 1). $$

To avoid large boundary bias, boundary kernels were used at design points within a bandwidth of either 0 or 1 (see Gasser and Müller, 1979). Tests of nominal level 0.05 and 0.10 were considered, and the following three factors were of interest: (i) the distribution of the errors, (ii) the sample size n and (iii) the bandwidth h. Three models for the errors were used. In two of these $\varepsilon_1, \ldots, \varepsilon_n, \eta_1, \ldots, \eta_n$ were taken to be independent and identically distributed, with $\sqrt{2}\,\varepsilon_i$ having standard normal distribution in one case and $2\varepsilon_i$ distributed as t with 4 degrees of freedom in the other. To investigate how skewness of $\varepsilon_i - \eta_i$ affects the level of our test, we studied a third model in which $\varepsilon_i$ and $\eta_i$ were distributed, respectively, as $3(\chi_4^2 - 4)/\sqrt{80}$ and $(\chi_4^2 - 4)/\sqrt{80}$, where $\chi_4^2$ has the chi-squared distribution with 4 degrees of freedom. In each of the three models $\varepsilon_i - \eta_i$ has variance 1, and hence power comparisons across error models are not confounded by scale differences. Finally, two choices for the sample size were considered, n = 20 and 50, and four bandwidths were used at each n (see Tables 3 and 4).
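The three error models can be simulated as in the following sketch. The scalings are chosen so that $\varepsilon_i - \eta_i$ has variance 1 in every model; in particular the divisor $\sqrt{80}$ in the skewed model is inferred from that requirement (since $\mathrm{Var}(\chi_4^2) = 8$ and $9 \cdot 8 + 8 = 80$) and should be read as an assumption. The function name and model labels are hypothetical.

```python
import numpy as np

def error_differences(model, n, rng):
    """Draw n values of eps_i - eta_i under one of three error models."""
    if model == "normal":
        # sqrt(2) * eps is standard normal, so Var(eps) = 1/2.
        eps = rng.standard_normal(n) / np.sqrt(2)
        eta = rng.standard_normal(n) / np.sqrt(2)
    elif model == "t":
        # 2 * eps is t with 4 degrees of freedom (variance 2), so Var(eps) = 1/2.
        eps = rng.standard_t(4, size=n) / 2
        eta = rng.standard_t(4, size=n) / 2
    elif model == "chi2":
        # Skewed case: centred chi-squared(4) variates with unequal weights.
        eps = 3 * (rng.chisquare(4, size=n) - 4) / np.sqrt(80)
        eta = (rng.chisquare(4, size=n) - 4) / np.sqrt(80)
    else:
        raise ValueError(f"unknown error model: {model}")
    return eps - eta

rng = np.random.default_rng(0)
samples = {m: error_differences(m, 200_000, rng) for m in ("normal", "t", "chi2")}
variances = {m: float(v.var()) for m, v in samples.items()}
```

With a large sample the empirical variances should all be close to 1, although the estimate for the t model converges slowly because of its heavy tails.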



Table 3
Rejection rates of various tests under the null hypothesis (n = 20)

α      Error distribution   D_n     T_n (h = 0.10)   T_n (h = 0.15)   T_n (h = 0.20)   T_n (h = 0.25)   Paired t
0.05   Normal               0.040   0.044            0.040            0.040            0.038            0.050
0.05   χ²                   0.060   0.056            0.070            0.078            0.082            0.074
0.05   t                    0.028   0.042            0.048            0.048            0.052            0.048
0.10   Normal               0.098   0.078            0.090            0.094            0.090            0.106
0.10   χ²                   0.114   0.120            0.118            0.122            0.124            0.108
0.10   t                    0.084   0.096            0.098            0.098            0.104            0.086

Entries in the same row are based on the same set of 500 replications. The error models are explained in the text. χ² refers to the linear combination of independent $\chi_4^2$ random variables, while t refers to the difference of independent $t_4$ variates. All tests based on $T_n$ were performed as described in Section 2 with m = 500.

For the sake of comparison, two tests in addition to the one based on $T_n$ were considered. These were the usual paired t-test (where a pair of observations is $(y_i, z_i)$) and a test based on the statistic

$$ D_n = \frac{\sum_{i=2}^n d_i d_{i-1}}{s_d^2 \sqrt{n - 1}}, $$

where $s_d^2$ is the sample variance of the $d_i$'s. Using a central limit theorem for 1-dependent random variables, it is easy to show that $D_n$ has an asymptotically standard normal distribution under $H_0$. Therefore, in the simulation study we used the test: "reject $H_0$ (at level $\alpha$) if $D_n$ exceeds the $1 - \alpha$ quantile of the standard normal distribution." Essentially, the paired t-test and the test based on $D_n$ are extremes of the $T_n$-based tests, with the t-test corresponding to a very large bandwidth and the test based on $D_n$ corresponding to no smoothing of the data. The statistic $D_n$ is very similar to the Durbin-Watson statistic, which is most often used to detect serial correlation among residuals. See Munson and Jernigan (1989) for another statistic similar to $D_n$.
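A minimal sketch of the unsmoothed comparison statistic follows. It takes $D_n$ to be the sum of lag-one products of the differences divided by the sample variance times $\sqrt{n-1}$; using the divisor $n - 1$ in the sample variance and the name `d_statistic` are assumptions made here.

```python
import numpy as np

def d_statistic(d):
    """D_n = sum_{i=2}^n d_i d_{i-1} / (s_d^2 sqrt(n - 1))."""
    d = np.asarray(d, dtype=float)
    n = d.size
    s2 = d.var(ddof=1)                    # sample variance (n - 1 divisor assumed)
    lag_products = np.sum(d[1:] * d[:-1])
    return float(lag_products / (s2 * np.sqrt(n - 1)))

# Hypothetical example: differences that alternate in sign.
d_alt = np.array([1.0, -1.0, 1.0, -1.0])
value = d_statistic(d_alt)
```

Differences that alternate in sign give a negative value, while a smooth nonzero $f - g$ makes neighbouring differences positively correlated and pushes $D_n$ upward, which is why the one-sided test rejects for large values.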

The results of the level study are given in Tables 3 and 4. At each n and error model 500 replications were performed, and all six of the tests were performed at both $\alpha$ levels in a given replication. Replications thereby play the role of blocks for purposes of comparing tests. Tables 3 and 4 show that the tests based on $T_n$ held their level reasonably well even when the distribution of $\varepsilon_i - \eta_i$ was skewed or long-tailed. With skewed $d_i$'s, there is evidence, in particular at n = 20, that the $T_n$-based tests are liberal, but the excess of the rejection rate over the nominal level in most cases is not large. When the errors are normally distributed, of course, the t-test is exact and our tests based on $T_n$ are exact up to approximating the critical values by Monte Carlo methods. It is thus not surprising that the $T_n$-based test performed well in the normal errors case.

To investigate power, five choices for $u = f - g$ (other than $u = 0$) were made. Each choice was of the form $u(x) = a + bx + c \sin(dx)$, $x \in [0, 1]$, where a, b, c and d are constants. The constants used in the simulation study are given in Table 5.

Fig. 3. Estimated power for the shift alternative $u(x) = 0.47$ (based on 500 replications; n = 50, $\alpha$ = 0.05). (Tests shown: $D_n$, $T_n$ with h = 0.04, 0.10, 0.20 and 0.25, and the paired t-test; error distributions: Normal, Chi(4) and t(4).)


Table 4
Rejection rates of various tests under the null hypothesis (n = 50)

α      Error distribution   D_n     T_n (h = 0.04)   T_n (h = 0.10)   T_n (h = 0.20)   T_n (h = 0.25)   Paired t
0.05   Normal               0.062   0.052            0.050            0.050            0.052            0.060
0.05   χ²                   0.056   0.062            0.066            0.058            0.054            0.038
0.05   t                    0.062   0.072            0.072            0.064            0.064            0.048
0.10   Normal               0.102   0.104            0.108            0.098            0.092            0.110
0.10   χ²                   0.106   0.116            0.122            0.106            0.098            0.088
0.10   t                    0.118   0.128            0.136            0.112            0.110            0.088

See notes of Table 3.

Two of the alternatives correspond to shifts, one is linear and integrates to 0, and the other two oscillate about 0. In conducting the power study, the same sets of simulated errors were used as in the level study. Hence, at a given n and error model, 5 × 500 data sets of the form $d_{ij} = u_j(i/n) + \varepsilon_i - \eta_i$, $j = 1, \ldots, 5$, $i = 1, \ldots, n$, were obtained, where $u_j$ is a given alternative to $H_0$ and $\varepsilon_1 - \eta_1, \ldots, \varepsilon_n - \eta_n$ was a set of differences from the level study.

Here we only display power results for the case n = 50 and $\alpha$ = 0.05 (see Figures 3-5). As would be expected, the t-test was more powerful than all others against shift alternatives. However, when the alternative was such that u alternated between positive and negative values, the t-test had very low power compared to the other tests. The results for n = 50, $\alpha$ = 0.10 were very similar to those at n = 50, $\alpha$ = 0.05. At n = 20, there was no clear pattern to indicate that smoothing usually leads to a more powerful test. For n = 20, except in the case of shift alternatives, the test based on $D_n$ (i.e., no smoothing) had power comparable to or higher than the 'smooth' tests. However, at n = 50, the superiority of tests based on smoothing becomes evident.

In summary, all tests considered did a reasonable job of holding their levels, and so none should be rejected on the grounds of being patently invalid. In terms of power, there was some sensitivity of the results to choice of bandwidth. (One may as well regard the t-test and $D_n$-based tests as special cases of $T_n$-based tests with, respectively, very large and small bandwidths.) The discussion in Section 2.5 provides some ideas on how to choose the bandwidth in practice.

Fig. 4. Estimated power for the straight line alternative $u(x) = 0.814 - 1.628x$ (based on 500 replications; n = 50, $\alpha$ = 0.05). (Tests shown: $D_n$, $T_n$ with h = 0.04, 0.10, 0.20 and 0.25, and the paired t-test; error distributions: Normal, Chi(4) and t(4).)

Fig. 5. Estimated power for the sine curve alternative $u(x) = 0.671 \sin(9.42x)$ (based on 500 replications; n = 50, $\alpha$ = 0.05). (Tests shown: $D_n$, $T_n$ with h = 0.04, 0.10, 0.20 and 0.25, and the paired t-test; error distributions: Normal, Chi(4) and t(4).)

Table 5
Models used in the simulation study, $u(x) = a + bx + c \sin(dx)$

a        b        c       d      $\int_0^1 u^2(x)\,dx$   $\int_0^1 u(x)\,dx$
0        0        0       1      0                       0
0.1565   0        0       1      0.0245                  0.1565
0.4700   0        0       1      0.2209                  0.4700
0.8140   -1.628   0       1      0.2209                  0
0        0        0.671   9.42   0.2213                  0.0712
0        0        1.233   9.42   0.7471                  0.1309

References

Box, G.E.P. (1954), Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification, Ann. Math. Statist. 25, 290-302.

Carroll, R.J. and W. Härdle (1989), Symmetrized nearest neighbor regression estimates, Statist. Probab. Lett. 7, 315-318.

Cox, D., E. Koh, G. Wahba and B.S. Yandell (1988), Testing the (parametric) null model hypothesis in (semiparametric) partial and generalized spline models, Ann. Statist. 16, 113-119.

Dabrowska, D.M. (1987), Non-parametric regression with censored survival time data, Scand. J. Statist. 14, 181-197.

Eubank, R.L. (1988), Spline Smoothing and Nonparametric Regression (Dekker, New York).

Eubank, R.L. and C.H. Spiegelman (1990), Testing the goodness of fit of a linear model via nonparametric regression techniques, J. Amer. Statist. Assoc. 85, 387-392.

Farebrother, R.W. (1990), The distribution of a quadratic form in normal variables, Appl. Statist. 39, 294-309.

Ferreiro, G.J., A.A. Boodoo, J.D. Sutton and C. Bishop (1980), National Institute for Research in Dairying, 1980 Report (Shinfield, UK).

Gasser, T. and H.G. Müller (1979), Kernel estimation of regression functions, in: Gasser and Rosenblatt, eds., Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics No. 757 (Springer, Heidelberg) pp. 23-68.

Hall, P. and J.D. Hart (1990), Bootstrap test for difference between means in nonparametric regression, J. Amer. Statist. Assoc. 85, 1039-1049.

Härdle, W., P. Hall and J.S. Marron (1988), How far are automatically chosen regression smoothing parameters from the optimum? (with discussion), J. Amer. Statist. Assoc. 83, 86-101.

Härdle, W. and J.S. Marron (1990), Semiparametric comparison of regression curves, Ann. Statist. 18, 63-89.

Horvath, L., B.S. Yandell and A. Sen (1990), Convergence of kernel regression estimators, Tech. Rept. No. 869, Dept. of Statist., Univ. of Wisconsin (Madison, WI).

Imhof, J.P. (1961), Computing the distribution of quadratic forms in normal variables, Biometrika 48, 419-426.

King, E.C. (1988), A test of the equality of two regression curves based on kernel smoothers, Ph.D. dissertation, Dept. of Statist., Texas A&M Univ. (College Station, TX).

Knafl, G., J. Sacks and D. Ylvisaker (1985), Confidence bands for regression functions, J. Amer. Statist. Assoc. 80, 683-691.

Kozek, A.S. (1990), A nonparametric test of fit of a linear model, Comm. Statist. - Theory Methods 19(1), 169-179.

Kozek, A.S. (1991), A nonparametric test of fit of a parametric model, J. Multivariate Anal. 37, 66-75.

Matis, J.H., T.E. Wehrly and W.C. Ellis (1989), Some generalized stochastic compartment models for digesta flow, Biometrics 45, 703-720.

Munson, P.J. and R.W. Jernigan (1989), A cubic spline extension of the Durbin-Watson test, Biometrika 76, 39-47.

Raz, J. (1990), Testing for no effect when estimating a smooth function by nonparametric regression: a randomization approach, J. Amer. Statist. Assoc. 85, 132-138.

Rice, J. (1984), Bandwidth choice for nonparametric regression, Ann. Statist. 12, 1215-1230.
