Variable Selection and Function Estimation in Additive Nonparametric Regression Using a Data-Based Prior: Comment
Author(s): Peter Müller
Source: Journal of the American Statistical Association, Vol. 94, No. 447 (Sep., 1999), p. 803
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2669994



Comment

Peter MÜLLER

I congratulate the authors on a delightful article. They bring together ideas and state-of-the-art computational methods for solving several related smaller problems to address the larger encompassing problem of additive nonparametric regression. In doing so, they had to make a sequence of modeling and implementation choices. Some of these decisions are necessarily arbitrary. In the discussion I highlight some of these choices, point out alternatives, and question the choices where I feel appropriate. I do so without questioning the overarching goal of introducing and implementing nonparametric additive models.

The first concern is with the chosen probability model for the unknown component regression functions fj(·). The authors propose using an integrated Wiener process prior plus a linear term. The justification relies mainly on the fact that the posterior mean of fj(·) will be a cubic smoothing spline. If a cubic smoothing spline is attractive, then why not parameterize fj(·) as such? Under the integrated Wiener process assumption, only the posterior mean is a cubic spline, but of course individual realizations of the random function fj are not. I realize that a full-fledged knot selection would involve excessive computational effort. But reasonable constraints could keep complexity within limits. Denison, Mallick, and Smith (1998) successfully illustrated the use of cubic smoothing splines as a basis for random functions.
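The distinction between the posterior mean and individual prior realizations is easy to see by simulation. The following sketch (function name and discretization are mine, purely illustrative) draws one path of an integrated Wiener process on [0, 1]; the path has a continuous first derivative but is visibly rougher than any cubic spline fit, since its second derivative is a Wiener path.

```python
import numpy as np

def integrated_wiener_path(n=1000, T=1.0, seed=0):
    """Simulate one realization of an integrated Wiener process on [0, T].

    W is a standard Wiener process; the integrated process is
    Z(t) = integral_0^t W(s) ds, approximated here by a cumulative
    trapezoidal sum on an equally spaced grid.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    # Wiener path: cumulative sum of independent N(0, dt) increments, W(0) = 0
    W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
    # Integrated path: cumulative trapezoidal rule applied to W
    Z = np.concatenate([[0.0], np.cumsum(0.5 * (W[1:] + W[:-1]) * dt)])
    return t, W, Z
```

Plotting Z against a cubic smoothing spline fit to noisy evaluations of Z makes the point: the spline is the posterior mean under this prior, but Z itself is not a spline.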

A related critical choice is the data-based prior. Although I realize the need for some generic and computationally convenient assumption, I wonder whether "double-dipping" the data could be avoided. For example, Denison et al. (1998) considered additive models with cubic splines as random component regression functions. Instead of data-dependent priors, they used a Poisson distribution on the number k of knots and, conditional on k, a uniform prior on the coefficients. Is this maybe yet another reason to reconsider the choice of prior model and parameterization for fj(·)? Alternatively, staying within the realm of data-dependent priors, I would find expected posterior priors (Perez 1998) very attractive. The idea is to use as prior the posterior based on latent data x*, possibly generated from the empirical distribution, and to marginalize over x* in the final analysis. Computational convenience should be exactly as in the proposed scheme.

Peter Müller is Associate Professor, Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708 (E-mail: [email protected]).
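The expected posterior prior construction can be sketched in a toy model. The code below (my own illustration, not Perez's implementation: the normal mean model with known variance, the latent sample size m, and the function name are all my choices) draws from the prior p(θ) = ∫ p(θ | x*) m(x*) dx* by Monte Carlo, resampling the latent data x* from the empirical distribution of the observed data and averaging the resulting posteriors.

```python
import numpy as np

def expected_posterior_prior_draws(x, n_draws=5000, m=5, sigma=1.0, seed=1):
    """Monte Carlo draws from an expected posterior prior for a normal mean.

    Toy model: data ~ N(theta, sigma^2) with a flat baseline prior on theta.
    Each draw first generates latent data x* of size m from the empirical
    distribution of the observed x (a bootstrap resample), then draws theta
    from the posterior given x*: theta | x* ~ N(mean(x*), sigma^2 / m).
    Averaging over repeated x* marginalizes the latent data.
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        x_star = rng.choice(x, size=m, replace=True)  # latent data from empirical dist.
        draws[i] = rng.normal(x_star.mean(), sigma / np.sqrt(m))
    return draws
```

The resulting prior is centered at the data but inflated by both the posterior spread given x* and the variability of x* itself, which is exactly the sense in which it avoids double-dipping a single fixed data set.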

I could not find an explicit statement of the marginal prior on J, but conclude from the discussion that a uniform p(J) ∝ const is implied. Of course, uniform in the indicator vector J is not uniform in the number of included terms. Especially when the number of possible candidates is large, this seems an unreasonable choice. Including an explicit subjective prior on J would be very easy. On a related issue, the use of uniform priors on variance parameters is somewhat unusual. The authors comment at the end of Section 2.2 that this seems to work well in practice, but maybe this is true only for problems with relatively sharply determined variance parameters.
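The induced prior on model size is easy to compute by enumeration. A uniform prior over the 2^p indicator vectors puts Binomial(p, 1/2) mass on the number of included terms, so it concentrates near p/2 rather than spreading evenly over model sizes; the sketch below (function name mine) makes this explicit for small p.

```python
import numpy as np
from itertools import product

def induced_size_distribution(p):
    """Prior on the number of included terms when each of the 2^p
    indicator vectors J in {0,1}^p receives equal prior probability.

    Enumerates all vectors and tallies how many have each size; the
    result is the Binomial(p, 1/2) pmf: C(p, k) / 2^p.
    """
    counts = np.zeros(p + 1)
    for J in product([0, 1], repeat=p):
        counts[sum(J)] += 1
    return counts / 2 ** p
```

For p = 10 candidate predictors, the empty model gets prior probability 1/1024 while models with 5 terms get about 0.246 in aggregate, which is the asymmetry the comment objects to.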

In Figure 7 the authors show estimated component regression functions. Although I understand the technical reason for the somewhat strange appearance of the error bounds, I wonder whether alternative ways to summarize and display the inference would be more appropriate. For example, posterior means and standard deviation bounds for fj(·) + β0 would look very similar, except for removing the artifact of the zero uncertainty at si = 0.
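Such a summary is straightforward to produce from MCMC output. The sketch below (my own construction; the array layout and function name are assumptions, not the authors' code) adds the intercept draw-by-draw to the component-function draws before computing pointwise bands, so the intercept's uncertainty propagates and the band no longer pinches to zero at the anchoring point.

```python
import numpy as np

def summarize_shifted_fit(f_draws, beta0_draws):
    """Pointwise posterior mean and +/- 2 sd bands for f_j(.) + beta0.

    f_draws: (n_mcmc, n_grid) array of MCMC draws of the component
             function evaluated on a grid.
    beta0_draws: (n_mcmc,) matching draws of the intercept.
    Adding the intercept per draw (not after averaging) carries its
    posterior uncertainty into the displayed band.
    """
    g = f_draws + beta0_draws[:, None]
    mean = g.mean(axis=0)
    sd = g.std(axis=0)
    return mean, mean - 2 * sd, mean + 2 * sd
```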

Finally, the authors contend that few papers in the literature consider model averaging and variable selection in nonparametric regression. Maybe a reference to recent literature on Bayesian inference for CART models (Chipman, George, and McCulloch 1998; Denison et al. 1999b) and MARS models (Denison et al. 1999a) might be appropriate.

ADDITIONAL REFERENCES

Chipman, H., George, E., and McCulloch, R. (1998), "Bayesian CART Model Search," Journal of the American Statistical Association, 93, 935-960.

Denison, D. G. T., Mallick, B. K., and Smith, A. F. M. (1999a), "Bayesian MARS," unpublished manuscript submitted to Statistics and Computing.

——— (1999b), "A Bayesian CART Algorithm," Biometrika, 85, 363-377.

Perez, J. M. (1998), "Development of Expected Posterior Prior Distributions for Model Comparison," Ph.D. thesis, Purdue University, Dept. of Statistics.

© 1999 American Statistical Association
Journal of the American Statistical Association
September 1999, Vol. 94, No. 447, Theory and Methods
