

arXiv:math/0701088v1 [math.ST] 3 Jan 2007

Statistical Science
2006, Vol. 21, No. 3, 376–399
DOI: 10.1214/088342306000000105
© Institute of Mathematical Statistics, 2006

Design Issues for Generalized Linear Models: A Review

André I. Khuri, Bhramar Mukherjee, Bikas K. Sinha and Malay Ghosh

Abstract. Generalized linear models (GLMs) have been used quite effectively in the modeling of a mean response under nonstandard conditions, where discrete as well as continuous data distributions can be accommodated. The choice of design for a GLM is a very important task in the development and building of an adequate model. However, one major problem that handicaps the construction of a GLM design is its dependence on the unknown parameters of the fitted model. Several approaches have been proposed in the past 25 years to solve this problem. These approaches, however, have provided only partial solutions that apply in only some special cases, and the problem, in general, remains largely unresolved. The purpose of this article is to focus attention on the aforementioned dependence problem. We provide a survey of various existing techniques dealing with the dependence problem. This survey includes discussions concerning locally optimal designs, sequential designs, Bayesian designs and the quantile dispersion graph approach for comparing designs for GLMs.

Key words and phrases: Bayesian design, dependence on unknown parameters, locally optimal design, logistic regression, response surface methodology, quantile dispersion graphs, sequential design.

André I. Khuri is Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611, USA, e-mail: [email protected]. Bhramar Mukherjee is Assistant Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611, USA, e-mail: [email protected]. Bikas K. Sinha is Professor, Division of Theoretical Statistics and Mathematics, Indian Statistical Institute, Kolkata 700 108, India, e-mail: [email protected]. Malay Ghosh is Distinguished Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611, USA, e-mail: [email protected].

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2006, Vol. 21, No. 3, 376–399. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

In many experimental situations, the modeling of a response of interest is carried out using regression techniques. The precision of estimating the unknown parameters of a given model depends to a large extent on the design used in the experiment. By design is meant the specification of the levels of the factors (control variables) that influence the response.

The tools needed for the adequate selection of a design and the subsequent fitting and evaluation of the model, using the data generated by the design, have been developed in an area of experimental design known as response surface methodology (RSM). This area was initially developed for the purpose of determining optimum operating conditions in the chemical industry. It is now used in a variety of fields and applications, not only in the physical and engineering sciences, but also in the biological, clinical, and social sciences. The article by Myers, Khuri and Carter (1989) provides a broad review of RSM (see also Myers, 1999). In addition, the three books by Box and Draper (1987), Myers and Montgomery (1995) and Khuri and Cornell (1996) give a comprehensive coverage of the various techniques used in RSM.


Most design methods for RSM models in the present statistical literature were developed around agricultural, industrial and laboratory experiments. These designs are based on the standard general linear model where the responses are assumed to be continuous (quite often, normally distributed) with uncorrelated errors and homogeneous variances. However, clinical or epidemiological data, for example, quite frequently do not satisfy these assumptions. For example, data which consist primarily of human responses tend to be more variable than is expected under the assumption of homogeneous variances. There is much less control over data collected in a clinical setting than over data observed in a laboratory or an industrial setting. Furthermore, most biological data are correlated due to particular genetic relationships. There are also many situations in which clinical experiments tend to yield discrete data. Dose-response experiments are one good example where the responses are binary in most cases. Furthermore, several responses may be observed for the same patient. For example, in addition to the standard binary response of success or failure, some measure of side effects of the treatment may be of importance. Since responses are measured from the same subject, they will be correlated, and hence considering each response to be independent of others may lead to erroneous inferences.

Due to the nature of the data as described above, a statistical analysis of the data using standard linear models will be inadequate. For such data, generalized linear models (GLMs) would be more appropriate. The latter models have proved to be very effective in several areas of application. For example, in biological assays, reliability and survival analysis and a variety of applied biomedical fields, GLMs have been used for drawing statistical conclusions from acquired data sets. In multicenter clinical trials, estimates of individual hospital treatment effects are obtained by using GLMs (see Lee and Nelder, 2002). In entomology, GLMs are utilized to relate changes in insect behavior to changes in the chemical composition of a plant extract (Hern and Dorn, 2001). Diaz et al. (2002) adopted some GLMs in order to study the spatial pattern of an important tree species. In climatology, GLMs are used to study the basic climatological pattern and trends in daily maximum wind speed in certain regions (see Yan et al., 2002). Also, Jewell and Shiboski (1990) used GLMs to examine the relationship between the risk of HIV (human immunodeficiency virus) infection and the number of contacts with other partners.

In all of the above examples and others, the cornerstone of modeling is the proper choice of designs needed to fit GLMs. The purpose of a design is the determination of the settings of the control variables that result in adequate predictions of the response of interest throughout the experimental region. Hence, GLMs cannot be used effectively unless they are based on efficient designs with desirable properties. Unfortunately, little work has been done in developing such designs. This is mainly due to a serious problem caused by the dependence of a design on the unknown parameters of the fitted GLM.

In this article, we address the aforementioned design dependence problem by providing a survey of various approaches for tackling this problem. The article is organized as follows: Section 2 presents an introduction to GLMs. Section 3 describes criteria for the choice of a GLM design, and introduces the design dependence problem. The various approaches for solving this problem are discussed in Sections 4 (locally optimal designs), 5 (sequential designs), 6 (Bayesian designs) and 7 (quantile dispersion graphs). The article ends with some concluding remarks in Section 8.

2. GENERALIZED LINEAR MODELS

As a paradigm for a large class of problems in applied statistics, generalized linear models have proved very effective since their introduction by Nelder and Wedderburn (1972). GLMs are a unified class of regression models for discrete and continuous response variables, and have been used routinely in dealing with observational studies. Many statistical developments in terms of modeling and methodology in the past twenty years may be viewed as special cases of GLMs. Examples include logistic regression for binary responses, linear regression for continuous responses and log-linear models for counts. A classic book on the topic is McCullagh and Nelder (1989). In addition, the more recent books by Lindsey (1997), McCulloch and Searle (2001), Dobson (2002) and Myers, Montgomery and Vining (2002) provide added insights into the application and usefulness of GLMs.

There are three components that define GLMs. These components are:

(i) The elements of a response vector $y$ are distributed independently according to a certain probability distribution considered to belong to the exponential family, whose probability mass function (or probability density function) is given by

\[
f(y, \theta, \phi) = \exp\left[\frac{\theta y - b(\theta)}{a(\phi)} + c(y, \phi)\right], \tag{2.1}
\]

where $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions; $\theta$ is a canonical parameter and $\phi$ is a dispersion parameter (see McCullagh and Nelder, 1989, page 28).

(ii) A linear regression function, or linear predictor, in $k$ control variables $x_1, x_2, \ldots, x_k$ of the form

\[
\eta(x) = f^T(x)\beta, \tag{2.2}
\]

where $f(x)$ is a known $p$-component vector-valued function of $x = (x_1, x_2, \ldots, x_k)^T$, $\beta$ is an unknown parameter vector of order $p \times 1$ and $f^T(x)$ is the transpose of $f(x)$.

(iii) A link function $g(\mu)$ which relates $\eta$ in (2.2) to the mean response $\mu(x)$ so that $\eta(x) = g[\mu(x)]$, where $g(\cdot)$ is a monotone differentiable function. When $g$ is the identity function and the response has the normal distribution, we obtain the special class of linear models.
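The way components (ii) and (iii) fit together can be sketched in a few lines of Python. This is only an illustration: the names `linear_predictor`, `LINKS` and `mean_response` are hypothetical and do not come from the article, and only three common links are included.

```python
import numpy as np

def linear_predictor(f_x, beta):
    """eta(x) = f(x)^T beta, as in (2.2)."""
    return float(np.dot(f_x, beta))

# Each entry holds a link g and its inverse h = g^{-1}.
# The selection of links is an illustrative assumption.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: np.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + np.exp(-eta))),
    "log": (lambda mu: np.log(mu), lambda eta: np.exp(eta)),
}

def mean_response(f_x, beta, link="logit"):
    """mu(x) = h[eta(x)], where h is the inverse of the chosen link g."""
    _, h = LINKS[link]
    return float(h(linear_predictor(f_x, beta)))

# Example: f(x) = (1, x)^T with x = 0 and beta = (0, 1) gives eta = 0.
mu_at_zero = mean_response([1.0, 0.0], [0.0, 1.0], link="logit")
```

With the identity link the mean equals the linear predictor, which is the ordinary linear model mentioned in (iii).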

GLMs have several areas of application ranging from medicine to economics, quality control and sample surveys. Applications of the logistic regression model, expanded with the popularity of case-control designs in epidemiology, now provide a basic tool for epidemiologic investigation of chronic diseases. Similar methods have been extensively used in econometrics. Probit and logistic models play a key role in all forms of assay experiments. The log-linear model is the cornerstone of modern approaches to the analysis of contingency table data, and has been found particularly useful for medical and social sciences. Poisson regression models are widely employed to study rates of events such as disease outcomes. The complementary log–log model arises in the study of infectious diseases (e.g., in HIV disease transmission and AIDS as illustrated in Jewell and Shiboski, 1990), and more generally, in the analysis of survival data associated with clinical and longitudinal follow-up studies.

Traditionally, the exponential family model adopted for the study of GLMs deals with a linear function of the response variable involving the unknown parameters of interest. This covers most of the experimental situations arising in practice. However, some special members, such as the curved exponential family of distributions, are not covered. Thus, GLMs should be further generalized to include such members.

As was pointed out earlier, all known response surface techniques were developed within the framework of linear models under the strong assumptions of normality and equal variances concerning the error distribution. One important area that needs further investigation under the less rigid structure of generalized linear models is the choice of design.

3. CHOICE OF DESIGN

By a choice of design we mean the determination of the settings of the control variables that yield an estimated (predicted) response with desirable properties. The mean response, $\mu(x)$, at a point $x$ in a region of interest, $R$, is given by

\[
\mu(x) = h[f^T(x)\beta] = h[\eta(x)], \tag{3.1}
\]

where $\eta(x)$ is the linear predictor in (2.2), and $h$ is the inverse function of the link function $g$. An estimate of $\mu(x)$ is obtained by replacing $\beta$ in (3.1) with $\hat{\beta}$, the maximum likelihood estimate of $\beta$, that is,

\[
\hat{\mu}(x) = h[f^T(x)\hat{\beta}]. \tag{3.2}
\]

The variance of $\hat{\mu}(x)$ is approximately given by (see Khuri, 1993, page 198)

\[
\mathrm{Var}[\hat{\mu}(x)] = \frac{1}{\phi}\left[\frac{d\mu(x)}{d\eta(x)}\right]^2 f^T(x)(X^T W X)^{-1} f(x), \tag{3.3}
\]

where $\phi$ is the dispersion parameter (determined by the exponential family considered), $X$ is a matrix whose rows consist of $f^T(x)$ at the various settings of $x$ used in a particular design and $W$ is a diagonal matrix of the form

\[
W = \mathrm{diag}(w_1, w_2, \ldots, w_n), \tag{3.4}
\]

where $n$ is the number of experimental runs, and

\[
w_u = \frac{1}{\phi\sigma_u^2}\left(\frac{d\mu_u}{d\eta_u}\right)^2, \quad u = 1, 2, \ldots, n, \tag{3.5}
\]

where $\sigma_u^2$ is the variance of $y_u$, the response value at the $u$th experimental run, and $d\mu_u/d\eta_u$ denotes the derivative of $\mu(x)$ with respect to $\eta(x)$ evaluated at the setting of $x$ at the $u$th experimental run ($u = 1, 2, \ldots, n$).


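The estimation bias incurred in $\hat{\mu}(x)$ is approximately given by (see Robinson and Khuri, 2003)

\[
\mathrm{Bias}[\hat{\mu}(x)] = f^T(x)(X^T W X)^{-1} X^T W \xi \,\frac{d\mu(x)}{d\eta(x)} + \frac{1}{2\phi}\, f^T(x)(X^T W X)^{-1} f(x)\,\frac{d^2\mu(x)}{d\eta^2(x)}, \tag{3.6}
\]

where

\[
\xi = -\frac{1}{2\phi}\, W^{-1} Z_d F 1_n
\]

and

\[
Z_d = \mathrm{diag}(z_{11}, z_{22}, \ldots, z_{nn}),
\]

where $z_{uu}$ is the $u$th diagonal element of $Z = X(X^T W X)^{-1} X^T$, $F = \mathrm{diag}(f_{11}, f_{22}, \ldots, f_{nn})$ with

\[
f_{uu} = \frac{1}{\phi\sigma_u^2}\left[\frac{d^2\mu_u}{d\eta_u^2}\right]\left[\frac{d\mu_u}{d\eta_u}\right], \quad u = 1, 2, \ldots, n,
\]

and $1_n$ is an $n \times 1$ vector of 1's. Here $d^2\mu_u/d\eta_u^2$ denotes the second-order derivative of $\mu(x)$ with respect to $\eta(x)$ evaluated at the $u$th experimental run ($u = 1, 2, \ldots, n$).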

A good design is one that minimizes the mean-squared error of $\hat{\mu}(x)$, namely,

\[
\mathrm{MSE}[\hat{\mu}(x)] = E[\hat{\mu}(x) - \mu(x)]^2 = \mathrm{Var}[\hat{\mu}(x)] + \{\mathrm{Bias}[\hat{\mu}(x)]\}^2. \tag{3.7}
\]

This is known as the mean-squared error of prediction (MSEP). One major problem in doing this minimization is that the MSEP depends on $\beta$, the parameter vector in the linear predictor in (2.2), which is unknown. This leads us to the so-called design dependence problem. Other design optimality criteria such as A-, D-, E- and G-optimality, which are variance-based criteria, suffer also from the same problem.
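The dependence of the MSEP on $\beta$ can be made concrete with a small numerical sketch. The code below evaluates the approximations for the variance, the bias and the MSEP of the predicted mean for one specific case chosen here for illustration: a logistic model with $f(x) = (1, x)^T$, one Bernoulli observation per run and $\phi = 1$, so that $\sigma_u^2 = d\mu_u/d\eta_u = \mu_u(1-\mu_u)$. The function name `logistic_msep` and the example design are assumptions, not part of the article.

```python
import numpy as np

def logistic_msep(x_design, beta, x_new):
    """Approximate Var, Bias and MSEP of the predicted mean at x_new.

    Assumptions (illustrative): logistic link, f(x) = (1, x)^T, phi = 1,
    one Bernoulli observation at each design point.
    """
    x_design = np.asarray(x_design, dtype=float)
    beta = np.asarray(beta, dtype=float)
    X = np.column_stack([np.ones_like(x_design), x_design])

    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    d1 = mu * (1.0 - mu)            # dmu/deta, equal to sigma_u^2 here
    d2 = d1 * (1.0 - 2.0 * mu)      # d2mu/deta2

    W = np.diag(d1)                 # w_u = d1^2 / sigma_u^2 = d1
    M_inv = np.linalg.inv(X.T @ W @ X)
    Zd = np.diag(np.diag(X @ M_inv @ X.T))
    F = np.diag(d2)                 # f_uu = d2 * d1 / sigma_u^2 = d2
    xi = -0.5 * np.linalg.inv(W) @ Zd @ F @ np.ones(len(x_design))

    f = np.array([1.0, float(x_new)])
    mu0 = 1.0 / (1.0 + np.exp(-f @ beta))
    d1_0 = mu0 * (1.0 - mu0)
    d2_0 = d1_0 * (1.0 - 2.0 * mu0)

    var = d1_0 ** 2 * (f @ M_inv @ f)
    bias = (f @ M_inv @ X.T @ W @ xi) * d1_0 + 0.5 * (f @ M_inv @ f) * d2_0
    return var, bias, var + bias ** 2

# Same design, same prediction point, two different "true" parameter vectors.
design = [-2.0, -1.0, 0.0, 1.0, 2.0]
msep_a = logistic_msep(design, beta=[0.0, 1.0], x_new=0.5)[2]
msep_b = logistic_msep(design, beta=[0.0, 2.0], x_new=0.5)[2]
```

The two values `msep_a` and `msep_b` differ although the design is unchanged, which is the design dependence problem in miniature: a design cannot be ranked by MSEP without committing to a value of $\beta$.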

3.1 The Design Dependence Problem

In the foregoing section it has been emphasized that in the context of a GLM, the minimization of the mean-squared error of prediction, or of the variances of the parameter estimators, leading to the so-called optimal designs, depends on the values of unknown parameters. Common approaches to solving this problem include:

(a) The specification of initial values, or best "guesses," of the parameters involved, and the subsequent determination of the so-called locally optimal designs.

(b) The sequential approach which allows the user to obtain updated estimates of the parameters in successive stages, starting with the initial values used in the first stage.

(c) The Bayesian approach, where a prior distribution is assumed on the parameters, which is then incorporated into an appropriate design criterion by integrating it over the prior distribution.

(d) The use of the so-called quantile dispersion graphs approach, which allows the user to compare different designs based on quantile dispersion profiles.

We now provide a review of the basic results that have been developed under the aforementioned approaches.

4. LOCALLY OPTIMAL DESIGNS

Binary data under a logistic regression model and Poisson count data are the best known examples to illustrate the implementation of the first approach leading up to a locally optimal design.

4.1 Logistic Regression Model

Let us first discuss the study of optimal designs for binary data under a logistic regression model. The key reference is Mathew and Sinha (2001). Other related references include Abdelbasit and Plackett (1983), Minkin (1987), Khan and Yazdi (1988), Wu (1988) and Sitter and Wu (1993). It is postulated that a binary response variable $y$ assumes the values 0 and 1 and the chance mechanism depends on a nonstochastic quantitative covariate $X$ taking values in a specified domain. Specifically, for $X = x$, $y$ takes the value 1 with probability given by

\[
\pi(x) = \frac{1}{1 + \exp(-\alpha - \beta x)}, \tag{4.1}
\]

where $\alpha$ and $\beta$ are unknown parameters with $\beta > 0$. It may be noted that there are other versions of this model studied in the literature. One version refers to the dose-response model which will be discussed in the next sections. The parameters themselves and also some parametric functions such as $\alpha/\beta$ and percentiles of $\pi(x)$ are of interest to the experimenter. The purpose is to suggest continuous (or approximate) optimal designs, that is, optimum dose levels and their relative weights, following the terminology of continuous design theory. See Pukelsheim (1993). It turns out that the solutions to the optimal design problems mentioned above provide optimum values of $\alpha + \beta x$. Hence, in order to implement such designs in practice, good initial estimates, or guess values, of $\alpha$ and $\beta$ are needed. Whereas the earlier researchers established optimality results case by case, Mathew and Sinha (2001) developed a unified approach to tackle the optimality problems by exploiting the property of Loewner-order domination in the comparison of information matrices. However, in some of these studies the technical details depend crucially on the symmetry of the transformed factor space. For asymmetric domains, Liski et al. (2002) have initiated some studies in the conventional regression setup. Much yet remains to be done there and also in the context of logistic regression.
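A short worked step may help to see why $\alpha/\beta$ and the percentiles are natural targets. Writing $x_\gamma$ for the dose at which the response probability equals $\gamma$ (the symbol $x_\gamma$ is introduced here for illustration only), setting $\pi(x_\gamma) = \gamma$ in the logistic model and solving gives

\[
\alpha + \beta x_\gamma = \log\frac{\gamma}{1-\gamma}
\quad\Longrightarrow\quad
x_\gamma = \frac{1}{\beta}\left[\log\frac{\gamma}{1-\gamma} - \alpha\right].
\]

For $\gamma = 1/2$ the logarithm vanishes and $x_{0.5} = -\alpha/\beta$, so estimating $\alpha/\beta$ is the same as estimating the median dose up to sign. Every percentile is a nonlinear function of $(\alpha, \beta)$, which is one way to see that the corresponding optimal designs must depend on the unknown parameters.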

With reference to the model (4.1), consider now $s$ distinct dose levels $x_1, x_2, \ldots, x_s$, and suppose we wish to obtain $f_i$ observations on $y$ at dose level $x_i$ ($i = 1, 2, \ldots, s$). Let $\sum_{i=1}^{s} f_i = n$. For most efficient estimation of $\alpha$ and $\beta$, or some functions thereof, the exact optimal design problem in this context consists of optimally selecting the number $s$ of distinct dose levels, the $x_i$'s (in a given experimental region) and the $f_i$'s, with respect to a given optimality criterion, for a fixed $n$. This is equivalent to the approximate determination of optimum dose levels with respective optimum (relative) weights, denoted by $p_i$ ($i = 1, 2, \ldots, s$), which sum to 1.

For various application areas, it turns out that the estimation problems that are usually of interest refer to (a) the estimation of $\beta$, or $\alpha/\beta$, or some percentiles of $\pi(x)$ given in (4.1), or (b) the joint estimation of a pair of parameters such as (i) $\alpha$ and $\beta$, (ii) $\beta$ and $\alpha/\beta$, (iii) $\beta$ and a percentile of $\pi(x)$ and (iv) two percentiles of $\pi(x)$. The approach is to start with the asymptotic variance–covariance matrix of the maximum likelihood estimators of $\alpha$ and $\beta$, and then choose the $x_i$'s and the $p_i$'s optimally by minimizing a suitable function, depending on the nature of the problem at hand and the specific optimality criterion applied. For this reason, we consider the information matrix of the two parameters as a weighted combination of component information matrices based on the $x_i$'s, using the $p_i$'s as weights. Then we argue that the asymptotic variance–covariance matrix of the maximum likelihood estimators of the parameters is just the inverse of the weighted information matrix computed above. The optimality functions to be minimized are different scalar-valued functions of the information matrix. The D- and A-optimality criteria are well-known examples. Historically, the D-optimality criterion has received considerable attention in this context and A-optimality has also been considered by some authors; see Sitter and Wu (1993). As was mentioned earlier, the optimum dose levels depend on the unknown parameters $\alpha$ and $\beta$, as is typical in nonlinear settings. In fact, solutions to the optimal design problems mentioned above provide optimum values of $\alpha + \beta x_i$, $i = 1, \ldots, s$. Therefore, while implementing the optimal design in practice, good initial estimates of $\alpha$ and $\beta$ are called for. In spite of this unpleasant feature, it is important to construct the optimal designs in this context; see the arguments in Ford, Torsney and Wu (1992, page 569).

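Following the approximate design theory, a design is denoted by $D = \{(x_i, p_i),\ i = 1, 2, \ldots, s\}$. Therefore, the information matrix for the joint estimation of $\alpha$ and $\beta$ underlying the design $D$ is given by

\[
I(\alpha, \beta) =
\begin{pmatrix}
\displaystyle\sum_{i=1}^{s} p_i \frac{\exp(-a_i)}{(1+\exp(-a_i))^2} &
\displaystyle\sum_{i=1}^{s} p_i x_i \frac{\exp(-a_i)}{(1+\exp(-a_i))^2} \\[2ex]
\displaystyle\sum_{i=1}^{s} p_i x_i \frac{\exp(-a_i)}{(1+\exp(-a_i))^2} &
\displaystyle\sum_{i=1}^{s} p_i x_i^2 \frac{\exp(-a_i)}{(1+\exp(-a_i))^2}
\end{pmatrix}, \tag{4.2}
\]

where $a_i = \alpha + \beta x_i$, $i = 1, \ldots, s$.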
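A minimal sketch of how the information matrix of an approximate design could be evaluated numerically follows. The function names are hypothetical; the factor $\exp(-a)/(1+\exp(-a))^2$ is computed in the algebraically equivalent form $\pi(1-\pi)$ with $\pi = 1/(1+\exp(-a))$.

```python
import numpy as np

def logistic_weight(a):
    """exp(-a) / (1 + exp(-a))^2, computed as pi * (1 - pi)."""
    pi = 1.0 / (1.0 + np.exp(-np.asarray(a, dtype=float)))
    return pi * (1.0 - pi)

def information_matrix(x, p, alpha, beta):
    """2 x 2 information matrix for (alpha, beta) under the design {(x_i, p_i)}."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("design weights must sum to 1")
    w = p * logistic_weight(alpha + beta * x)
    return np.array([[w.sum(), (w * x).sum()],
                     [(w * x).sum(), (w * x * x).sum()]])

# Two-point design at doses -1 and 1 with equal weights, guessed alpha = 0, beta = 1.
info = information_matrix([-1.0, 1.0], [0.5, 0.5], alpha=0.0, beta=1.0)
```

Changing the guessed values of `alpha` and `beta` changes `info` for the same dose levels, so any criterion built on this matrix inherits the parameter dependence.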

Note that except for the factors $\exp(-a_i)/(1+\exp(-a_i))^2$, the information matrix is identical to the one under the usual linear regression of $y$ on a nonstochastic regressor $x$ under the assumption of homoscedastic error structure. This reminds one of the celebrated de la Garza Phenomenon (de la Garza, 1954) which can be explained as follows. Suppose we consider a $p$th-degree polynomial regression of $y$ on $x$ under homogeneous error structure and we start with an $n$-point design where $n > p + 1$. Then, according to this phenomenon, it is possible to come up with an alternative design with exactly $p + 1$ support points such that the two designs have identical information matrices for the entire set of $p + 1$ parameters. This shows that in the homoscedastic scenario, essentially one can confine attention to the collection of designs supported on exactly $p + 1$ points, if the underlying polynomial regression is of degree $p$. On top of this, it is also possible that a particular $(p+1)$-point design dominates another $(p+1)$-point design in the sense that the information matrix (for the entire set of parameters) based on the former design Loewner-dominates that based on the latter. Here, Loewner domination means that the difference of the two information matrices is nonnegative definite. It must be noted that Loewner domination is the best one can hope for, but it is rarely achieved. Pukelsheim (1993) has made some systematic studies of Loewner domination of information matrices. See Liski et al. (2002) for some applications.
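For the simplest case $p = 1$ (straight-line regression with equal error variances), the de la Garza Phenomenon can be checked directly: the information matrix depends on the design only through its first two moments, so any two-point design with the same mean and second moment is equivalent. The sketch below uses an equal-weight two-point design, which is a convenient choice made here and not the only possibility; for strongly skewed designs the resulting points may fall outside the experimental region, in which case unequal weights would be needed.

```python
import numpy as np

def moment_matrix(x, p):
    """Information matrix of the design {(x_i, p_i)} for y = b0 + b1 * x
    with equal error variances."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    F = np.column_stack([np.ones_like(x), x])
    return F.T @ (p[:, None] * F)

def two_point_equivalent(x, p):
    """Equal-weight two-point design with the same mean and second moment.

    Illustrative construction (an assumption of this sketch): points at
    mean - sd and mean + sd, each with weight 1/2.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    m1 = np.sum(p * x)
    sd = np.sqrt(np.sum(p * x * x) - m1 ** 2)
    return np.array([m1 - sd, m1 + sd]), np.array([0.5, 0.5])

# A four-point design on [-1, 1] and its two-point replacement.
x4, p4 = [-1.0, 0.0, 0.5, 1.0], [0.25, 0.25, 0.25, 0.25]
x2, p2 = two_point_equivalent(x4, p4)
same_information = np.allclose(moment_matrix(x4, p4), moment_matrix(x2, p2))
```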

In the present setup, however, the form of the information matrix indicates that we are in a linear regression setup, but with a heteroscedastic error structure. Does the de la Garza Phenomenon still hold in this case? More importantly, do we have Loewner domination here? The Mathew and Sinha (2001) article may be regarded as one seeking answers to the above questions. From their study, it turns out that though Loewner domination is not possible, for D-optimality the class of two-point designs is complete in the entire class of competing designs while for A-optimality, it is so in a subclass of competing designs which are symmetric in some sense. Here, completeness is in the sense that any competing design outside the class is dominated (with respect to the specific optimality criterion) by another design within the class.

We shall briefly explain below the salient features of the arguments in Mathew and Sinha (2001). Note first that for the joint estimation of $\alpha$ and $\beta$, or for that matter, for any two nonsingular transforms of them, the D-optimality criterion seeks to maximize the determinant of the joint information matrix of $\alpha$ and $\beta$. Routine computations yield an interesting representation for this determinant,

\[
\beta^2 |I(\alpha, \beta)| =
\left[\sum_{i=1}^{s} p_i \frac{\exp(-a_i)}{(1+\exp(-a_i))^2}\right]
\cdot
\left[\sum_{i=1}^{s} p_i a_i^2 \frac{\exp(-a_i)}{(1+\exp(-a_i))^2}\right]
-
\left[\sum_{i=1}^{s} p_i a_i \frac{\exp(-a_i)}{(1+\exp(-a_i))^2}\right]^2. \tag{4.3}
\]

Note that for any real number $a$, $\exp(-a)/(1+\exp(-a))^2 = \exp(a)/(1+\exp(a))^2$. Moreover, it also turns out that for a given $D = \{(x_i, p_i),\ i = 1, 2, \ldots, s\}$, there exists a real number $c > 0$ such that

\[
\sum_{i=1}^{s} p_i \frac{\exp(a_i)}{(1+\exp(a_i))^2} = \frac{\exp(c)}{(1+\exp(c))^2}, \tag{4.4}
\]

\[
\sum_{i=1}^{s} p_i a_i^2 \frac{\exp(a_i)}{(1+\exp(a_i))^2} \le c^2\, \frac{\exp(c)}{(1+\exp(c))^2}. \tag{4.5}
\]

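Therefore, for this choice of $c$, the design $D(c) = [(c, 0.5); (-c, 0.5)]$ provides a value of the determinant of the underlying information matrix at least as large as that based on $D$: the cross term $\sum_i p_i a_i \exp(-a_i)/(1+\exp(-a_i))^2$, which enters the determinant through its square with a negative sign, vanishes for the symmetric design $D(c)$, while (4.4) and (4.5) control the remaining two sums. Hence, the class of two-point symmetric designs provides a complete class of D-optimal designs. It is now a routine task to determine specific D-optimal designs for various parametric functions. Of course, the initial solution is in terms of the optimum value of $c$ (which is independent of $\alpha$ and $\beta$) and then we have to transfer it to optimum dose levels, say $x_0$ and $x_{00}$, by using the relations $c = \alpha + \beta x_0$ and $-c = \alpha + \beta x_{00}$. Thus an initial good guess of $\alpha$ and $\beta$ is called for to evaluate $x_0$ and $x_{00}$.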
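The argument can be traced numerically. The sketch below, with hypothetical function names, (1) finds by bisection the value $c > 0$ that matches the first sum for a given design expressed on the scale $a_i = \alpha + \beta x_i$, (2) evaluates the scaled determinant $\beta^2 |I(\alpha, \beta)|$ for both the original design and $D(c)$, and (3) locates by a grid search the $c$ that maximizes the scaled determinant of $D(c)$, which reduces to $c^2 [\exp(c)/(1+\exp(c))^2]^2$. The grid bounds and step are arbitrary choices for illustration.

```python
import numpy as np

def w(a):
    """exp(a) / (1 + exp(a))^2, an even function of a."""
    pi = 1.0 / (1.0 + np.exp(-np.asarray(a, dtype=float)))
    return pi * (1.0 - pi)

def matching_c(a, p, tol=1e-12):
    """Bisection for c > 0 with w(c) = sum_i p_i w(a_i); w decreases on [0, inf)."""
    target = float(np.sum(np.asarray(p) * w(a)))
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def scaled_det(a, p):
    """beta^2 |I(alpha, beta)| written in terms of a_i = alpha + beta * x_i."""
    a = np.asarray(a, dtype=float)
    wa = np.asarray(p, dtype=float) * w(a)
    return wa.sum() * (wa * a * a).sum() - (wa * a).sum() ** 2

def d_optimal_c():
    """Grid search for the c maximizing the scaled determinant of D(c)."""
    grid = np.linspace(0.5, 3.0, 250001)
    return float(grid[np.argmax(grid ** 2 * w(grid) ** 2)])

# An arbitrary three-point design on the a-scale and its symmetric competitor.
a, p = np.array([-2.0, 0.5, 1.0]), np.array([0.3, 0.3, 0.4])
c = matching_c(a, p)
det_D = scaled_det(a, p)
det_Dc = scaled_det(np.array([c, -c]), np.array([0.5, 0.5]))
c_star = d_optimal_c()
```

Given guessed values of $\alpha$ and $\beta$, the dose levels would then follow as `(c_star - alpha) / beta` and `(-c_star - alpha) / beta`, which is where the dependence on the initial guess enters.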

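Again, for A-optimality with respect to the parameters $\alpha$ and $\beta$, some algebraic simplification yields the following function (to be minimized) when we restrict to the subclass of symmetric designs (i.e., designs involving $a_i$ and $-a_i$ with equal weights for every $i$):

\[
\begin{aligned}
\mathrm{Var}(\hat{\alpha}) + \mathrm{Var}(\hat{\beta})
&= \left[\frac{1}{\beta^2}\sum_{i=1}^{s} p_i \frac{\exp(-a_i)}{(1+\exp(-a_i))^2}\,[\alpha^2 + \beta^2 + a_i^2]\right]
\cdot
\left[\frac{1}{\beta^2}\sum_{i=1}^{s} p_i \frac{\exp(-a_i)}{(1+\exp(-a_i))^2}
\cdot \sum_{i=1}^{s} p_i a_i^2 \frac{\exp(-a_i)}{(1+\exp(-a_i))^2}\right]^{-1} \\[1ex]
&= \frac{\alpha^2 + \beta^2}{\sum_{i=1}^{s} p_i a_i^2 \exp(-a_i)/(1+\exp(-a_i))^2}
+ \frac{1}{\sum_{i=1}^{s} p_i \exp(-a_i)/(1+\exp(-a_i))^2}.
\end{aligned} \tag{4.6}
\]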

In view of the existence of the real number $c$ with the properties laid down above, it turns out that the design $D(c) = [(c, 0.5); (-c, 0.5)]$ once again does better than $D$ with respect to A-optimality, at least in the subclass of symmetric designs so defined. It is a routine task to spell out the nature of a specific A-optimal design, which in this case depends on the unknown parameters $\alpha$ and $\beta$ in a twofold manner. First, we have to determine the optimum value for $c$ from given values of the parameters by minimizing the lower bound to (4.6) as a function of $\alpha$, $\beta$ and $c$. Then we have to evaluate the values of the two recommended dose levels using the relations involving $c$ and $-c$, as above. Note that in the context of D-optimal designs, this twofold phenomenon did not arise.
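The twofold dependence can be seen in a small sketch. For the two-point symmetric design $D(c)$ the A-criterion reduces to $(\alpha^2+\beta^2)/(c^2 w(c)) + 1/w(c)$ with $w(c) = \exp(-c)/(1+\exp(-c))^2$. The code below minimizes this over a grid of $c$ values for guessed $\alpha$ and $\beta$ and then converts $\pm c$ into dose levels. The function names, the grid and the parameter values are illustrative assumptions.

```python
import numpy as np

def a_criterion(c, alpha, beta):
    """A-criterion of D(c): (alpha^2 + beta^2) / (c^2 w(c)) + 1 / w(c)."""
    c = np.asarray(c, dtype=float)
    pi = 1.0 / (1.0 + np.exp(-c))
    wc = pi * (1.0 - pi)
    return (alpha ** 2 + beta ** 2) / (c ** 2 * wc) + 1.0 / wc

def a_optimal_two_point(alpha, beta):
    """Grid search for c, then dose levels from c = alpha + beta * x."""
    grid = np.linspace(0.05, 6.0, 5951)          # step 0.001
    c = float(grid[np.argmin(a_criterion(grid, alpha, beta))])
    doses = ((c - alpha) / beta, (-c - alpha) / beta)
    return c, doses

# Step 1 depends on the guess: different (alpha, beta) give different c.
c_small, doses_small = a_optimal_two_point(alpha=0.0, beta=1.0)
c_large, doses_large = a_optimal_two_point(alpha=0.0, beta=5.0)
```

Here the optimum `c` itself moves with the guessed parameters (step one), and the dose levels then depend on them again through the back-transformation (step two). In the D-optimal case only the second step involves $\alpha$ and $\beta$.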


Remark 4.1.1. It so happens that without the symmetry restriction on the class of competing designs, construction of A-optimal designs, in general, is difficult. Numerical computations have revealed that A-optimal designs are still point symmetric but not weight symmetric. An analytical proof of this observation is still lacking. In Mathew and Sinha (2001), E-optimal designs have also been studied.

4.2 Poisson Count Model

Let us now consider Poisson count data and explain the optimality results following Minkin (1993) and Liski et al. (2002). Here we assume that $y$ follows a Poisson distribution with mean $\mu(x)$, $\mu(x) = c(x)\exp[\theta(x)]$, $\theta(x) = \alpha + \beta x$; $\beta < 0$. The emphasis in Minkin (1993) was on most efficient estimation of $1/\beta$ by choosing a design in the nonstochastic factor space of $x$ over $[0, \infty)$. Naturally, only a locally optimal design could be characterized, which turned out to be a two-point design with 21.8% mass at 0 and the rest at 2.557β. Thus a good guess for $\beta$ is called for. Liski et al. (2002) developed a unified theory in Minkin's setup for the derivation of a stronger result on completeness of two-point designs, including the point 0. Consequently, it was much easier to re-establish Minkin's result as well as to spell out explicitly the nature of A-, D-, E- and MV-optimal designs for simultaneous estimation of the two parameters in the model. It may be noted that Fedorov (1972) gave a complete characterization of D-optimal designs in a polynomial regression setup involving several families of heterogeneous variance functions. In an unpublished technical report, Das, Mandal and Sinha (2003) extended the Loewner domination results of Liski et al. (2002) in a linear regression setup involving two specific variance functions. We shall briefly present these results below.

We recall that the setup of Minkin (1993), in a form suitable for our discussion and as suggested in Liski et al. (2002), is

\[
E(y) = \alpha + \beta x; \quad V(y) = v(x)\sigma^2; \quad v(x) = \exp(x); \quad 0 \le x < \infty. \tag{4.7}
\]

As usual, we assume $\sigma^2 = 1$ and confine attention to approximate design theory. Thus, as before, we start with an $s$-point design $D_s = [(x_i, p_i);\ 0 \le x_1 < x_2 < \cdots < x_s;\ \sum_i p_i = 1]$ for $s \ge 2$. It is a routine task to write down the form of the information matrix for the parameters $\alpha, \beta$ underlying the design $D_s$. We call it $I_{D_s}$. Liski et al. (2002) established that, given $D_s$, one can construct a two-point design $D_2^*$ whose information matrix, say $I_{D_2^*}$, dominates $I_{D_s}$. Das, Mandal and Sinha (2003) generalized this result when the above form of $v(x)$ is changed to

\[
\text{(i)}\ v(x) = k^x,\ k \ge 1; \qquad \text{(ii)}\ v(x) = (1+x)^{(1+\gamma)/2},\ \gamma \ge -1. \tag{4.8}
\]

Following Das, Mandal and Sinha (2003), we start with a general variance function $v(x)$ subject to $v(0) = 1$ and $v(x)$ increasing in $x$ over 0 to $\infty$. Next, we take a two-point design of the form $[(a,p); (b,q)]$, where $0 < a < b < \infty$ and $0 < p,\ q = 1-p < 1$. We then ask for which variance functions this two-point design can be Loewner-dominated by another two-point design that includes the point 0. Define in this context two related functions, $\psi(x) = 1/v(x)$ and $\phi(x) = (v(x) - 1)/x$ for every $x > 0$, $\phi(0)$ being the limit of $\phi(x)$ as $x$ tends to 0. Such domination is possible whenever these two functions satisfy the following conditions:

(i) $\phi(0) = 1$;
(ii) $\phi(x)$ is increasing in $x$;
(iii) for some $s$ and $c$, $0 < s < 1$ and $c > 0$, for which

\[
1 - s = [p(1-\psi(a)) + q(1-\psi(b))]/[1-\psi(c)],
\]

$\psi(c)$ satisfies the inequality $\psi(c) < p\psi(a) + q\psi(b)$.

Das, Mandal and Sinha (2003) demonstrate that for both forms of $v(x)$ in (4.8), the above conditions are satisfied. They then argue that this result on Loewner domination (by a two-point design, including the point 0) holds even when one starts with an $s$-point design for $s > 2$. This greatly simplifies the search for specific optimal designs under the different variance structures covered by the above two forms. Minkin's result follows as a special case of (i) in the above model. The details are reported in Das, Mandal and Sinha (2003).
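As a numerical illustration of Loewner domination in the setup (4.7), the following Python sketch computes the information matrix of a design under $v(x) = \exp(x)$ and builds a two-point design containing the point 0 that dominates a given two-point design. The construction used here (matching the first two entries of the information matrix and solving for the second support point by bisection) is an assumption made for this example only; it is not claimed to be the argument of Liski et al. (2002) or of Das, Mandal and Sinha (2003). All function names are hypothetical.

```python
import math

def info_matrix(design, v):
    """Entries (m0, m1, m2) of the 2x2 information matrix [[m0, m1], [m1, m2]]
    for (alpha, beta) in E(y) = alpha + beta*x with Var(y) = v(x), sigma^2 = 1."""
    m0 = m1 = m2 = 0.0
    for x, p in design:
        w = p / v(x)              # p_i * psi(x_i), with psi = 1/v
        m0 += w
        m1 += w * x
        m2 += w * x * x
    return m0, m1, m2

def loewner_dominates(big, small, tol=1e-9):
    """True if big - small is positive semidefinite (2x2 symmetric case)."""
    d0, d1, d2 = (b - s for b, s in zip(big, small))
    return d0 >= -tol and d2 >= -tol and d0 * d2 - d1 * d1 >= -tol

def dominating_design_exp(design):
    """Two-point design on {0, c} whose m0 and m1 match those of `design`
    under v = exp. Illustrative construction (an assumption of this sketch)."""
    m0, m1, _ = info_matrix(design, math.exp)
    target = m1 / (1.0 - m0)      # solve c / (exp(c) - 1) = target
    if not 0.0 < target < 1.0:
        raise ValueError("no support point c > 0 matches this design")
    lo, hi = 1e-9, 1.0
    while hi / math.expm1(hi) > target:   # left side decreases in c
        hi *= 2.0
    for _ in range(200):                  # plain bisection
        mid = 0.5 * (lo + hi)
        if mid / math.expm1(mid) > target:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    mass_c = (1.0 - m0) / (1.0 - math.exp(-c))
    if mass_c > 1.0:
        raise ValueError("matching weights are not a probability vector")
    return [(0.0, 1.0 - mass_c), (c, mass_c)]

design = [(1.0, 0.5), (2.0, 0.5)]
d_star = dominating_design_exp(design)
print(d_star)
print(loewner_dominates(info_matrix(d_star, math.exp),
                        info_matrix(design, math.exp)))
```

For this particular starting design the difference of the two information matrices is zero except for a positive lower-right entry, so the check succeeds; other starting designs may need a different construction.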

5. SEQUENTIAL DESIGN

In the previous section, initial values of the parameters are used as best "guesses" to determine a locally optimal design. Response values can then be obtained on the basis of the generated design. In the sequential approach, experimentation does not stop at this initial stage. Instead, using the information thus obtained, updated estimates of the parameters are developed and then used to determine additional design points in subsequent stages. This process continues until convergence is achieved with respect to a certain optimality criterion, for example, D-optimality. The implementation of such a strategy is feasible provided that the response values in a given stage can be obtained in a short time, as in sensitivity testing. Sequential designs for GLMs were proposed by Wu (1985), Sitter and Forbes (1997) and Sitter and Wu (1999), among others.

The theory of optimal designs, as noted earlier, involves selection of design points with the goal of minimizing some objective function which can often be interpreted as an expected loss. In a sequential framework, with the arrival of each data point, one needs to decide whether to pay for the cost of additional data or to stop sampling and make a decision. Also, if additional observations become necessary and there is an option of selecting samples from more than one population, the choice of a suitable sampling rule is equally important. The latter, on many occasions, amounts to the selection of suitable design points. Thus the issue of optimal designs goes hand in hand with sequential analysis, and together they constitute the area that has become known as sequential design of experiments.

Research on sequential designs can be classified into multiple categories depending on the objective of the researcher. We review here primarily a broad area commonly referred to as "Stochastic Approximation," which was initiated by Robbins and Monro (1951) and was subsequently extended by numerous authors. Here, the problem is one of sequential selection of design points according to some optimality criterion. We also discuss briefly some work on multistage designs.

In Section 5.1, we begin with the stochastic approximation procedure as described in Robbins and Monro (1951). We then consider several extensions and modifications of this pioneering work, and discuss asymptotic properties of the proposed methods. The Robbins–Monro article and much of the subsequent work do not prescribe any stopping rule in this sequential experimentation. There are a few exceptions, and we point out one such result.

Section 5.2 discusses application of the Robbins–Monro method and its extensions for estimating the percentiles of the quantal response curve, in particular, estimation of the median effective dose, popularly known as ED50. We highlight in this context the work of Wu (1985). We also discuss briefly the Bayesian stopping rule as proposed by Freeman (1970). Finally, in Section 5.3, we provide a very short account of multistage designs.

In general, the core of sequential analysis involves sample size determination. This itself is an "optimal" design problem, as stopping rules, in general, are motivated by some optimality criteria. A very succinct account of sequential design of experiments, motivated by several important statistical criteria, appeared in Chernoff's (1972) classic monograph.

5.1 Stochastic Approximation

We begin with a description of the stochastic approximation procedure of Robbins and Monro (RM) (1951). Let $y$ be a random variable such that, conditional on $x$, $y$ has a distribution function (df) $H(y|x)$ with mean $E(y|x) = M(x)$ and variance $V(y|x) = \sigma^2(x)$. The function $M(x)$ is unknown to the experimenter, but is assumed to be strictly increasing so that the equation $M(x) = \alpha$ has a unique root, say $\zeta$. It is desired to estimate $\zeta$ by making successive observations on $y$, say $y_1, y_2, \ldots$, at levels $x_1, x_2, \ldots$. The basic problem is the selection of the design points $x_1, x_2, \ldots$.

RM address this problem as follows: consider a nonstationary Markov chain with an arbitrary initial value $x_1$, and then define recursively

\[
x_{n+1} = x_n - a_n (y_n - \alpha), \quad n = 1, 2, \ldots, \tag{5.1}
\]

where $y_n$ has the conditional distribution function $H(y|x_n)$, and $\{a_n\}$ is a sequence of positive constants satisfying

\[
0 < \sum_{n=1}^{\infty} a_n^2 < \infty.
\]

A finite sample justification of the RM procedure is given by Wu (1985) when $y$ and $x$ are related by a simple linear regression model. In particular, let $y_i = \beta_0 + \beta_1 x_i + e_i$, where the $e_i$ are independent and identically distributed with mean 0. Then the parameter of interest is $\zeta = -\beta_0/\beta_1$, the solution of $\beta_0 + \beta_1 x = 0$. If $\beta_1$ is known, $\hat\beta_{0n} = \bar y_n - \beta_1 \bar x_n$ is the least-squares estimator of $\beta_0$ based on $\{(x_i, y_i);\ i = 1, \ldots, n\}$, where $\bar x_n$ and $\bar y_n$ are the sample means. Thus a natural choice of $x_{n+1}$ is given by $x_{n+1} = -\hat\beta_{0n}/\beta_1 = \bar x_n - \beta_1^{-1}\bar y_n$. It is shown in Lai and Robbins (1979) that $x_{n+1} = \bar x_n - \beta_1^{-1}\bar y_n$ for all $n$ is equivalent to $x_{n+1} = x_n - (\beta_1^{-1}/n)\, y_n$ for all $n$. This is an RM procedure with $a_n = \beta_1^{-1}/n$ and $\alpha = 0$.
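The equivalence between the averaged least-squares rule and the RM recursion with $a_n = \beta_1^{-1}/n$ can be checked numerically. The Python sketch below runs both rules on the same simulated errors in the linear model with known $\beta_1$; the function name, the normal errors and the parameter values are assumptions chosen for illustration.

```python
import random

def lai_robbins_equivalence(beta0, beta1, x1, n_steps, noise_sd=1.0, seed=0):
    """Run both design rules on identical errors and return the two x-sequences."""
    rng = random.Random(seed)
    xs_avg, ys_avg = [x1], []   # rule A: x_{n+1} = xbar_n - ybar_n / beta1
    xs_rm = [x1]                # rule B: x_{n+1} = x_n - y_n / (n * beta1)
    for n in range(1, n_steps + 1):
        e = rng.gauss(0.0, noise_sd)            # shared error term e_n
        y_a = beta0 + beta1 * xs_avg[-1] + e
        y_r = beta0 + beta1 * xs_rm[-1] + e
        ys_avg.append(y_a)
        xbar = sum(xs_avg) / n                  # mean of x_1, ..., x_n
        ybar = sum(ys_avg) / n                  # mean of y_1, ..., y_n
        xs_avg.append(xbar - ybar / beta1)
        xs_rm.append(xs_rm[-1] - y_r / (n * beta1))
    return xs_avg, xs_rm

xs_avg, xs_rm = lai_robbins_equivalence(beta0=1.0, beta1=2.0, x1=5.0, n_steps=200)
print(max(abs(a - b) for a, b in zip(xs_avg, xs_rm)))  # difference between the rules
print(xs_rm[-1])                                       # estimate of zeta = -beta0/beta1
```

The two sequences agree up to floating-point rounding, and the final design point lies close to $\zeta = -\beta_0/\beta_1$.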


RM investigated asymptotic properties of their procedure. Let $b_n = E(x_n - \zeta)^2$. RM showed that for bounded $y$ and strictly increasing $M(x)$, if $a_n = O(n^{-1})$, then $b_n \to 0$ as $n \to \infty$.

A more general result is proved later in Robbins and Siegmund (1971). Rather than the boundedness of $y$ and the monotonicity of $M(x)$, one assumes boundedness of $\sigma(x) + |M(x)|$ by a linear function of $|x|$. These conditions neither imply nor are implied by the original RM conditions. In addition, Robbins and Siegmund (1971) make another assumption which essentially implies that $M(x)$ and $x - \zeta$ are of the same sign. Finally, they assume that

\[
\sum_{n=1}^{\infty} a_n = \infty, \qquad 0 < \sum_{n=1}^{\infty} a_n^2 < \infty, \tag{5.2}
\]

which is strictly weaker than the condition $a_n = O(n^{-1})$ as required in RM.

Fabian (1968) proved the asymptotic normality of $x_n$ under additional assumptions. His conditions are similar to those of Robbins and Siegmund, but (5.2) is replaced by the stronger condition

\[
n^{\gamma} a_n \to a \ (> 0) \tag{5.3}
\]

as $n \to \infty$ for some $\gamma \in (1/2, 1]$. He also needed a Lindeberg-type condition for the second moment of the conditional distribution of $y$ given $x$, and required $M$ to have a positive derivative $M'(\zeta)$ at $\zeta$, with $M'(\zeta) > (2a)^{-1}$ when $\gamma = 1$. The variance of the asymptotic distribution depends on $\sigma(\zeta)$, $a$ and $M'(\zeta)$. It is given by $a\sigma^2(\zeta)/(2M'(\zeta))$ if $\gamma \in (1/2, 1)$ and by $a^2\sigma^2(\zeta)/[2aM'(\zeta) - 1]$ if $\gamma = 1$.

Remark 5.1.1. The best rate of convergence is achieved when $\gamma = 1$, and then the minimum asymptotic variance is given by $\sigma^2(\zeta)/[M'(\zeta)]^2$. This is attained when $a = [M'(\zeta)]^{-1}$.
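The minimization behind Remark 5.1.1 is easy to verify numerically. The Python sketch below codes the two asymptotic variance expressions quoted above and scans a grid of values of $a$ for $\gamma = 1$; the function name and the numerical values of $\sigma^2(\zeta)$ and $M'(\zeta)$ are hypothetical.

```python
def fabian_asymptotic_variance(a, sigma2, m_prime, gamma=1.0):
    """Asymptotic variance of x_n under n**gamma * a_n -> a, as quoted above."""
    if not 0.5 < gamma <= 1.0:
        raise ValueError("gamma must lie in (1/2, 1]")
    if gamma < 1.0:
        return a * sigma2 / (2.0 * m_prime)
    if m_prime <= 1.0 / (2.0 * a):
        raise ValueError("need M'(zeta) > 1/(2a) when gamma = 1")
    return a * a * sigma2 / (2.0 * a * m_prime - 1.0)

sigma2, m_prime = 3.0, 2.0                      # illustrative values only
grid = [0.26 + 0.01 * k for k in range(175)]    # admissible a > 1/(2 M') = 0.25
best_a = min(grid, key=lambda a: fabian_asymptotic_variance(a, sigma2, m_prime))
print(best_a, fabian_asymptotic_variance(best_a, sigma2, m_prime))
print(1.0 / m_prime, sigma2 / m_prime ** 2)     # values stated in Remark 5.1.1
```

Setting the derivative of $a^2/(2aM'(\zeta) - 1)$ with respect to $a$ to zero gives $a = 1/M'(\zeta)$, which is what the grid search recovers.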

In view of Remark 5.1.1, an optimal asymptotic choice of $\{a_n\}$ is given by $a_n = (n + n_0)^{-1} d_n^{-1}$, where $d_n$ is any consistent estimator of $M'(\zeta)$ and $n_0$ is positive. One can interpret $n_0$ as a "tuning parameter" which does not affect the asymptotics at all, but can play a major role when the sample size is small. Also, Fabian (1983) has shown that when the conditional probability density function of $y$ given $x$ is $N(M(x), \sigma^2(x))$, the RM procedure is locally asymptotically minimax. Moreover, Abdelhamid (1973) and Anbar (1973) pointed out independently that even for nonnormal conditional probability density functions, the RM process can be made asymptotically optimal by a suitable transformation of observations.

The next step is to outline procedures which guarantee consistent estimation of $M'(\zeta)$. Venter (1967) addressed this by taking observations in pairs at $x_i \pm c_i$ for a suitable positive sequence of constants $\{c_i\}$, $i \ge 1$. We denote by $y_{i1}$ and $y_{i2}$ the corresponding responses, so that

\[
E(y_{i1}|x_i) = M(x_i - c_i); \quad E(y_{i2}|x_i) = M(x_i + c_i). \tag{5.4}
\]

Let $z_i = (y_{i2} - y_{i1})/(2c_i)$ and $\bar y_i = (y_{i2} + y_{i1})/2$. If $x_i \xrightarrow{P} \zeta$, then estimate $M'(\zeta)$ by $d_n = \bar z_n = \frac{1}{n}\sum_{i=1}^{n} z_i$. Define recursively the design points

\[
x_{n+1} = x_n - (n\tilde d_n)^{-1}(\bar y_n - \alpha), \quad n = 1, 2, \ldots, \tag{5.5}
\]

where $\tilde d_n$ is a truncated version of $d_n$ ensuring that $x_n \xrightarrow{P} \zeta$. Fabian (1968) and later Lai and Robbins (1979) contain relevant asymptotic results.

In spite of all the asymptotic niceties, the RM procedure can be seriously deficient for finite samples. This was demonstrated in the simulation studies of Wu (1985) and Frees and Ruppert (1990). First, the initial choice of $x_1$ heavily influences the recursive algorithm. If $x_1$ is far away from $\zeta$, then the convergence of $x_n$ to $\zeta$ may demand prohibitively large $n$. Second, if all the $x_i$'s are closely clustered around $\zeta$, no estimator of $M'(\zeta)$ can be very accurate, and imprecise estimation of $M'(\zeta)$ leads to imprecise estimation of $\zeta$ through the proposed recursive algorithm, at least for small samples.
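The paired-observation recursion (5.5) can be sketched in a few lines of Python, which also makes it easy to experiment with the starting value $x_1$ and the sample size. The spacing sequence $c_n = n^{-1/4}$, the truncation bounds, the normal errors and the linear test function are all assumptions made for this illustration; they are not taken from Venter (1967).

```python
import random

def venter_procedure(M, alpha, x1, n_steps, noise_sd,
                     d_bounds=(0.5, 5.0), seed=0):
    """Adaptive RM recursion with a slope estimate from paired observations.

    Returns the final design point and the untruncated slope estimate d_n."""
    rng = random.Random(seed)
    x, z_sum, d = x1, 0.0, None
    for n in range(1, n_steps + 1):
        c = n ** -0.25                               # assumed spacing c_n
        y1 = M(x - c) + rng.gauss(0.0, noise_sd)     # response at x_n - c_n
        y2 = M(x + c) + rng.gauss(0.0, noise_sd)     # response at x_n + c_n
        z_sum += (y2 - y1) / (2.0 * c)               # accumulate z_n
        y_bar = 0.5 * (y1 + y2)
        d = z_sum / n                                # d_n, mean of z_1..z_n
        d_trunc = min(max(d, d_bounds[0]), d_bounds[1])   # truncated d_n
        x = x - (y_bar - alpha) / (n * d_trunc)      # recursion (5.5)
    return x, d

# Linear test function with root zeta = 0.5 of M(x) = alpha and slope 2.
x_final, d_final = venter_procedure(lambda x: 2.0 * x, alpha=1.0, x1=3.0,
                                    n_steps=2000, noise_sd=0.1)
print(x_final, d_final)
```

With a small error variance and a linear $M$, the final point is close to $\zeta$ and $d_n$ is close to the true slope; a noisier response or a poor starting value slows this down considerably.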

The original RM paper, and much of the subsequent literature, address sequential design problems without specifying any stopping rule. However, a simple stopping rule can be proposed based on asymptotic considerations. In particular, in the setup of Venter (1967), $d_n$ consistently estimates $M'(\zeta)$. Defining $\hat\sigma_n^2 = (2n)^{-1}\sum_{i=1}^{n}(y_{i1}^2 + y_{i2}^2)$, it can be shown that $\hat\sigma_n^2$ is a consistent estimator of $\sigma^2(\zeta)$. Now invoking the asymptotic normality of $x_n$, a large sample $100(1-\gamma)\%$ confidence interval for $\zeta$ is given by $x_n \pm \Phi^{-1}(1 - \gamma/2)\,\hat\sigma_n/\{d_n(2n)^{1/2}\}$, where $\Phi$ is the cumulative distribution function of the $N(0,1)$ variable. If one decides to construct a confidence interval of length not exceeding $l$, then one can stop at the first $n$ for which the width of the above interval is less than or equal to $l$. Sielken (1973) proposed this stopping rule and proved that as $l \to 0$ (so that $n \to \infty$), the coverage probability converges to $1 - \gamma$. Alternative stopping rules have been proposed by Stroup and Braun (1982) and Wei (1985) when the responses are normal. Ruppert et al. (1984) proposed a stopping rule for Monte Carlo optimization which is designed so that $M(x_n)$ is within a specified percentage of $M(\zeta)$.
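The fixed-width rule described above reduces to a one-line check once the current estimates are available. The Python sketch below computes the half-width $\Phi^{-1}(1-\gamma/2)\,\hat\sigma_n/\{d_n(2n)^{1/2}\}$ and compares the interval length with the target $l$; the function names are hypothetical, and the estimates $\hat\sigma_n$ and $d_n$ are assumed to be supplied by the caller.

```python
import math
from statistics import NormalDist

def ci_half_width(sigma_hat, d_n, n, gamma=0.05):
    """Half-width of the large-sample 100(1 - gamma)% interval for zeta."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)    # standard normal quantile
    return z * sigma_hat / (d_n * math.sqrt(2.0 * n))

def should_stop(sigma_hat, d_n, n, max_length, gamma=0.05):
    """Stop at the first n for which the interval length is at most max_length."""
    return 2.0 * ci_half_width(sigma_hat, d_n, n, gamma) <= max_length

print(ci_half_width(sigma_hat=1.0, d_n=1.0, n=50))
print(should_stop(sigma_hat=1.0, d_n=1.0, n=50, max_length=0.4))
```

In a sequential run, `should_stop` would be called after each pair of observations with the updated estimates.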

Kiefer and Wolfowitz (1952) considered a related problem of locating the point where the regression function is maximized or minimized, instead of finding the root of a regression equation as in RM. This amounts to solving $M'(\zeta) = 0$. The resulting procedure is similar to that of RM, but $y_n - \alpha$ in (5.1) is replaced by an estimate of $M'(\zeta)$. We omit the details.

One of the major applications of the RM stochastic approximation procedure is the estimation of the percentiles of the quantal response curve. We shall specifically discuss this problem in the next section.

5.2 Quantal Response Curves

Consider now the special case when $y$ is a binary variable with

\[
E(y|x) = \pi(x) = [1 + \exp\{-\theta(x - \mu)\}]^{-1}. \tag{5.6}
\]

The independent variable $x$ is the dose level, while the binary response $y$ is a quantal variable. Such a model is called a "dose-response" model. The parameter $\mu$ is the 50% response dose, or ED50. It is easy to recognize this as a logistic model with location parameter $\mu$ and scale parameter $\theta$, and as a simple reparameterization of the model given in (4.1). Also, here $\pi(x)$ is the same as $M(x)$. Equating $\pi(x)$ to $\alpha$ leads to the solution $\zeta = \mu + \theta^{-1}\log\bigl(\frac{\alpha}{1-\alpha}\bigr)$. Hence, efficient estimation of $\zeta$ depends on efficient estimation of $\mu$ and $\theta$.
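The relation between a target response probability $\alpha$ and the corresponding dose $\zeta$ in model (5.6) is simple to code. The Python sketch below uses hypothetical function names and illustrative parameter values.

```python
import math

def response_prob(x, mu, theta):
    """pi(x) in the logistic dose-response model (5.6)."""
    return 1.0 / (1.0 + math.exp(-theta * (x - mu)))

def dose_for_response(alpha, mu, theta):
    """Dose zeta solving pi(zeta) = alpha, i.e. mu + log(alpha/(1-alpha))/theta."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return mu + math.log(alpha / (1.0 - alpha)) / theta

ed50 = dose_for_response(0.5, mu=1.2, theta=3.0)    # equals mu
ed95 = dose_for_response(0.95, mu=1.2, theta=3.0)
print(ed50, ed95, response_prob(ed95, mu=1.2, theta=3.0))
```

Plugging estimates of $\mu$ and $\theta$ into `dose_for_response` gives the corresponding estimate of $\zeta$.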

The direct RM procedure will continue to generate the $x$-sequence with an arbitrary initial value $x_1$, and then define $x_{n+1} = x_n - a_n(y_n - \alpha)$, where one may take $a_n$ as proportional to $n^{-1}$. However, as recognized by many, including RM, for specific distributions there may be a more efficient way of generating the $x$ values. Indeed, simulations by Wu (1985), and subsequently by Frees and Ruppert (1990), demonstrate the poor performance of the RM procedure for small $n$ in this example.

Wu's procedure begins with generating some initial values $x_1, \ldots, x_m$. Then one obtains the MLE's $\hat\mu_m$ and $\hat\theta_m$ of $\mu$ and $\theta$, respectively, based on $\{(x_i, y_i);\ i = 1, \ldots, m\}$ by solving the likelihood equations (i) $\sum_{i=1}^{m} y_i = \sum_{i=1}^{m} \pi(x_i)$ and (ii) $\sum_{i=1}^{m} x_i y_i = \sum_{i=1}^{m} x_i \pi(x_i)$. Let $x_{m+1} = \hat\mu_m + \hat\theta_m^{-1}\log\bigl(\frac{\alpha}{1-\alpha}\bigr)$. Update the MLE's of $\mu$ and $\theta$ to $\hat\mu_{m+1}$ and $\hat\theta_{m+1}$, respectively, based on $\{(x_i, y_i);\ i = 1, \ldots, m+1\}$. Now let $x_{m+2} = \hat\mu_{m+1} + \hat\theta_{m+1}^{-1}\log\bigl(\frac{\alpha}{1-\alpha}\bigr)$, and continue in this manner. The $\{x_n\}$ generated in this way meet the consistency and asymptotic optimality properties mentioned in the previous section. Wu (1985) also suggested modification of the likelihood equations (i) and (ii) by multiplying both sides of the $i$th component by a factor $w(|x_i - x_m|)$, where $w$ is a certain weight function. This can partially overcome vulnerability of the logits at extreme tails.

As mentioned earlier, the case $\alpha = 1/2$ is of special interest since then $\zeta = \mu = $ ED50. In this case, an alternative procedure known as the "up-and-down" procedure for generating the design points was proposed by Dixon and Mood (1948). Specifically, let

\[
x_{n+1} =
\begin{cases}
x_n + \Delta, & \text{if } y_n = 0,\\
x_n - \Delta, & \text{if } y_n = 1,
\end{cases}
\]

where $\Delta$ is somewhat arbitrary. Once again, performance of this procedure depends very much on the choice of a good guess for $x_1$ and $\Delta$. Unless $\Delta$ is made adaptive, the large sample performance of $x_n$ cannot be studied. Wetherill (1963) discusses some modifications of this method.
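The up-and-down rule is deterministic given the observed responses, so it can be written as a short Python function. The function name and the example response sequence are hypothetical.

```python
def up_and_down(x1, delta, responses):
    """Design points generated by the up-and-down rule with fixed step delta.

    responses is the sequence of binary outcomes y_1, y_2, ...; the returned
    list contains x_1 and one further dose for every observed response."""
    xs = [x1]
    for y in responses:
        if y not in (0, 1):
            raise ValueError("responses must be 0 or 1")
        step = delta if y == 0 else -delta   # move up after 0, down after 1
        xs.append(xs[-1] + step)
    return xs

print(up_and_down(x1=0.0, delta=0.5, responses=[0, 0, 1, 0, 1, 1]))
```

Because the step $\Delta$ is fixed, the doses stay on the lattice $x_1 + k\Delta$ and oscillate around the ED50 rather than converge to it.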

One important issue that has not been discussed so far is the choice of the stopping rule. This requires explicit consideration of the trade-off between the cost of further observation and that of less accurate estimation. The study was initiated by Marks (1962), who considered the problem as one of Bayesian sequential design with known $\theta$ and a two-point prior for $\mu$. Later, Freeman (1970) considered the problem, once again for known $\theta$, but with a conjugate prior for $\mu$. In addition, he considered special cases of one, two or three dose levels. For one dose level, he introduced the prior

\[
p(\mu \mid r_0, n_0) \propto \frac{\exp(r_0\theta(\zeta - \mu))}{[1 + \exp(\theta(\zeta - \mu))]^{n_0}}.
\]

This amounts to a Beta$(r_0, n_0 - r_0)$ prior for the response probability $\alpha$ at dose level $\zeta$, with prior parameters $0 < r_0 < n_0$.

With squared error loss plus cost and a uniform prior, that is, $r_0 = 1$ and $n_0 = 2$, Freeman (1970) considered the usual backward induction argument to set up the necessary equations for finding the Bayes stopping rule. The stopping rule cannot be found analytically, but Freeman provided the numerical algorithm for finding the solution. He also considered situations with two or three possible doses, but did not provide a general algorithm. Thus, what is needed here is an approximation of the "optimal" stopping rule. A Bayesian approach using dynamic programming is hard to implement in its full generality. It appears that a suitable approximation of the Bayes stopping rule which retains at least the asymptotic optimality of the regular Bayes stopping rule is the right approach toward solving this problem.

5.3 Multistage Designs

In most practical situations, it is more realistic to adopt a multistage design rather than a sequential design, since continuous updating of the inferential procedure with each new observation may not be feasible. We discuss one such application as considered in Storer (1989) in the context of Phase I clinical trials. These trials are intended to estimate the maximum tolerable dose (MTD) of a new drug. Although a strict quantitative definition of MTD does not exist in clinical trials, very often the 33rd percentile of the tolerance distribution is taken to define MTD.

From a clinician's perspective, an optimal design is one in which the MTD is defined by the dose at which the trial stops. Storer (1989, 1990) argues that the design problem should be viewed instead as an efficient way of generating samples, wherein the design and analysis are robust to the vagaries of patient treatment in a clinical setting. He first introduces four single-stage designs, and then proposes two two-stage designs by combining some of these single-stage designs. The details are available in the two cited papers. Storer implemented his proposal for MTD estimation in a dose-response setting with three logistic curves.

6. BAYESIAN DESIGNS

Application of Bayesian design theory to generalized linear models is a promising route to avoid the design dependence problem. One approach, as discussed in Section 3, is to design an experiment for a fixed best guess of the parameters, leading to a "locally optimal" (Chernoff, 1953) design. Locally optimal designs for nonlinear models were first suggested in the seminal paper by Box and Lucas (1959). As suggested in Box and Lucas (1959), another natural approach to solve this problem is to express the uncertainty in the parameters through a prior distribution on the parameters. The Bayesian design problem for normal linear models has been discussed in Owen (1970), Brooks (1972, 1974, 1976, 1977), Chaloner (1984), Pilz (1991) and DasGupta (1996). Chaloner and Verdinelli (1995) present an excellent overview of Bayesian design ideas and their applications. Atkinson and Haines (1996) discuss local and Bayesian designs specifically for nonlinear and generalized linear models.

6.1 Bayesian Design Criteria

The Bayesian design criteria are often integrated versions of classical design optimality criteria, where the integration is carried out with respect to the prior distribution on $\beta$ [$\beta$ is as in (2.2)]. Most of the Bayesian criterion functions are based on normal approximations to the posterior distribution of the vector of parameters $\beta$, as computations involving the exact posterior distribution are often intractable. Several such approximations to the posterior are available (Berger, 1985) and involve either the observed or the expected Fisher information matrix. The most common form of such normal approximation states that, under standard regularity conditions, the posterior distribution of $\beta$ is $N(\hat\beta, [nI(\hat\beta, \xi)]^{-1})$, where $\hat\beta$ is the maximum likelihood estimate (MLE) of $\beta$ and, for any given design measure $\xi$, the expected Fisher information matrix is denoted by $I(\beta, \xi)$. We shall assume, as usual, that the design measure $\xi$ puts relative weights $(p_1, p_2, \ldots, p_k)$ at $k$ distinct points $(x_1, \ldots, x_k)$, respectively, with $\sum_{i=1}^{k} p_i = 1$. The prior distribution for $\beta$ is used as the predictive distribution of $\hat\beta$, and consequently the Bayesian optimality criteria can be viewed as approximations to the expected posterior utility functions under the prior distribution $p(\beta)$. The first criterion, which is an analogue of the D-optimality criterion in classical optimality theory, is given by

\[
\phi_1(\xi) = E_\beta[\log\det I(\beta, \xi)]. \tag{6.1}
\]

Maximizing this function is equivalent to approximately maximizing the expected increase in the Shannon information, or maximizing the expected Kullback–Leibler distance between the posterior and prior distributions (Lindley, 1956; DeGroot, 1986; Bernardo, 1979).

The next criterion is of interest when the only quantity to be estimated is a function of $\beta$, say $h(\beta)$. In such situations, the approximate asymptotic variance of the MLE of $h(\beta)$ is

\[
c(\beta)^T [I(\beta, \xi)]^{-1} c(\beta),
\]

where the $i$th element of $c(\beta)$ is $c_i(\beta) = \partial h(\beta)/\partial \beta_i$. The Bayesian c-optimality criterion approximates the posterior expected utility (assuming a squared error loss) under the prior $p(\beta)$ as

\[
\phi_2(\xi) = -E_\beta\{c(\beta)^T [I(\beta, \xi)]^{-1} c(\beta)\}. \tag{6.2}
\]

As in the generalization from c- to A-optimality in classical optimality theory, if one is interested in estimating several functions of $\beta$ with possibly different weights attached to them, and if $B(\beta)$ is the weighted average of the individual matrices of the form $c(\beta)c(\beta)^T$, then the criterion to be maximized is

\[
\phi_3(\xi) = -E_\beta\{\mathrm{tr}\, B(\beta)[I(\beta, \xi)]^{-1}\}. \tag{6.3}
\]

The functions $c(\beta)$ and $B(\beta)$ may not depend on $\beta$ if one is considering only linear functions of $\beta$.

Since the interpretation of the Bayesian alphabetic optimality criteria, as approximations to expected utility, is based on normal approximations to the posterior, Clyde and Chaloner (1996, 2002) suggest several approaches to verify normality through imposing constraints and discuss how to attain such multiple design objectives in this context. Other design criteria which can be related to a Bayesian perspective appear in Tsutakawa (1972, 1980), Zacks (1977) and Pronzato and Walter (1987). How well the Bayesian criteria actually approximate the expected utility in small samples is not very well known. Some illustrations are presented in Atkinson et al. (1993), Clyde (1993a) and Sun, Tsutakawa and Lu (1996). Dawid and Sebastiani (1999) attempt to connect this type of criterion-based Bayesian design to a purely decision-theoretic utility-based approach to Bayesian experimental design.

Muller and Parmigiani (1995) and Muller (1999) suggest estimating the exact posterior utility through Markov chain Monte Carlo (MCMC) methods instead of using analytical approximations to the posterior distribution. They embed the integration and maximization of the posterior utility function by curve fitting of Monte Carlo samples. This is done by simulating draws from the joint parameter/sample space, evaluating the observed utilities and then fitting a smooth surface through the simulated points. The fitted surface acts as an estimate of the expected utility surface, and the optimal design can then be found deterministically by studying the extrema of this surface. Parmigiani and Muller (1995), Clyde, Muller and Parmigiani (1995) and Palmer and Muller (1998) contain applications of these simulation-based stochastic optimization techniques.

6.2 Bayesian Optimality and Equivalence Theorems

All of the Bayesian criteria mentioned above are concave over the space of all probability measures on the design space $\mathcal{X}$. The equivalence theorem for establishing optimality of a design for linear models (Whittle, 1973) has been extended to the nonlinear case by White (1973, 1975), Silvey (1980), Ford, Torsney and Wu (1992), Chaloner and Larntz (1989) and Chaloner (1993).

Let $\xi$ be any design measure on $\mathcal{X}$. Let the measure $\bar\xi$ put unit mass at a point $x \in \mathcal{X}$, and let the measure $\xi'$ be defined as

\[
\xi' = (1 - \varepsilon)\xi + \varepsilon\bar\xi \quad \text{for } \varepsilon > 0.
\]

Let $I^*(\xi)$ and $I^*(\bar\xi)$ denote the information matrices corresponding to the design measures $\xi$ and $\bar\xi$, respectively. Then the information matrix for $\xi'$ is

\[
I^*(\xi') = (1 - \varepsilon)I^*(\xi) + \varepsilon I^*(\bar\xi).
\]

The key quantity in the equivalence theorem is the directional derivative of a criterion function $\phi$ at the design $\xi$ in the direction of $\bar\xi$, usually denoted by $d(\xi, x)$ and defined as

\[
d(\xi, x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon}\bigl[\phi\{I^*(\xi')\} - \phi\{I^*(\xi)\}\bigr].
\]

The general equivalence theorem states that in order for a design $\xi^*$ to be optimal, the directional derivative function in the direction of all single-point design measures has to be nonpositive; thus,

\[
\sup_{x \in \mathcal{X}} d(\xi^*, x) = 0.
\]

It also states that if $\phi$ is differentiable, then at the support points of the optimal design the directional derivative function $d(\xi, x)$ should vanish. The usual approach to evaluating Bayesian optimal designs for GLMs is to find a candidate design by numerical optimization of the criterion function. Verification of global optimality is then done by studying the directional derivative function $d(\xi, x)$ for the candidate design under consideration.

Example. Chaloner and Larntz (1989) and Zhu and Wong (2001) consider the problem of estimating quantiles in a dose-response experiment, relating the dose level $x$ of a drug to the probability of a response at level $x$, namely, $\pi(x)$. A popular model is the simple logistic model as described in (5.6), namely,

\[
\log(\pi(x)/(1 - \pi(x))) = \theta(x - \mu). \tag{6.4}
\]


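Here, $\beta = (\mu, \theta)^T$ and the Fisher information matrix is

\[
I(\beta, \xi) =
\begin{pmatrix}
\theta^2 t & -\theta t(\bar x - \mu)\\
-\theta t(\bar x - \mu) & s + t(\bar x - \mu)^2
\end{pmatrix},
\]

where $w_i = \pi(x_i)(1 - \pi(x_i))$, $t = \sum_{i=1}^{k} p_i w_i$, $\bar x = t^{-1}\sum_{i=1}^{k} p_i w_i x_i$ and $s = \sum_{i=1}^{k} p_i w_i (x_i - \bar x)^2$. Then the Bayesian D-optimality criterion as given in (6.1) simplifies to the form

\[
\phi_1(\xi) = E_\beta[\log \theta^2 t s]. \tag{6.5}
\]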
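The criterion (6.5) can be evaluated numerically for any candidate design. The Python sketch below assumes independent uniform priors on $\mu$ and $\theta$ and approximates the expectation with a midpoint rule on a rectangular grid; the grid size, the function names and the integration rule are assumptions of this sketch, not a description of any published implementation. A degenerate range such as `(3, 3)` is treated as a point prior.

```python
import math

def logistic_weight(x, mu, theta):
    """w = pi(x) * (1 - pi(x)) under the logistic model (6.4)."""
    p = 1.0 / (1.0 + math.exp(-theta * (x - mu)))
    return p * (1.0 - p)

def design_summaries(design, mu, theta):
    """Return (t, xbar, s) for a design given as [(x_i, p_i), ...]."""
    pw = [(x, p * logistic_weight(x, mu, theta)) for x, p in design]
    t = sum(w for _, w in pw)
    xbar = sum(w * x for x, w in pw) / t
    s = sum(w * (x - xbar) ** 2 for x, w in pw)
    return t, xbar, s

def midpoints(lo, hi, grid):
    """Midpoint-rule nodes on [lo, hi]; a single node for a point prior."""
    if lo == hi:
        return [lo]
    h = (hi - lo) / grid
    return [lo + (k + 0.5) * h for k in range(grid)]

def phi1(design, mu_range, theta_range, grid=40):
    """Approximate E[log(theta^2 * t * s)] under independent uniform priors."""
    mus = midpoints(mu_range[0], mu_range[1], grid)
    thetas = midpoints(theta_range[0], theta_range[1], grid)
    total = 0.0
    for mu in mus:
        for theta in thetas:
            t, _, s = design_summaries(design, mu, theta)
            total += math.log(theta ** 2 * t * s)
    return total / (len(mus) * len(thetas))

two_point = [(-0.515, 0.5), (0.515, 0.5)]
print(phi1(two_point, mu_range=(0, 0), theta_range=(3, 3)))       # point prior
print(phi1(two_point, mu_range=(-0.5, 0.5), theta_range=(2, 4)))  # uniform priors
```

For the point prior $\mu \equiv 0$, $\theta \equiv 3$ and the two-point design with equal weights at $\pm 0.515$, the function returns approximately $-2.99$, which agrees in magnitude with the criterion value reported for that case in Table 1.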

Recall that the parameter $\mu$ is the median effective dose (ED50) or median lethal dose (LD50). If the goal is estimating $\mu$, $c(\beta) = (1, 0)^T$ does not depend on the parameters. More generally, one may want to estimate the dose level $x_0$ at which the probability of a response is a fixed number, say, $\alpha$. Clearly, $x_0 = \mu + \gamma/\theta$, where $\gamma = \log[\alpha/(1 - \alpha)]$, and $x_0$ is a nonlinear function of the unknown parameters. In this case, $c(\beta) = (1, -\gamma/\theta^2)^T$ does depend on the parameters. The Bayesian c-optimality criterion [as in (6.2)] for estimating any percentile of the logistic response curve reduces to

\[
\phi_2^{\gamma}(\xi) = -E_\beta\bigl\{\theta^{-2}\bigl[t^{-1} + (\gamma - \theta(\bar x - \mu))^2\,\theta^{-2} s^{-1}\bigr]\bigr\}. \tag{6.6}
\]

This can also be written as a special case of A-optimality as in (6.3), with $B(\beta) = c(\beta)c(\beta)^T$, namely,

\[
B(\beta) = B_\gamma(\beta) =
\begin{pmatrix}
1 & -\gamma/\theta^2\\
-\gamma/\theta^2 & \gamma^2/\theta^4
\end{pmatrix},
\]

where $B(\beta)$ is denoted as $B_\gamma(\beta)$ in this case to reflect its dependence on the value of the constant $\gamma$. If one wants to estimate ED50 and ED95 simultaneously with weight 0.5 each, then $B(\beta) = 0.5B_0(\beta) + 0.5B_{2.944}(\beta)$ (note that $\log[0.95/(1 - 0.95)] = 2.944$, implying $\gamma = 2.944$ for ED95).
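The percentile criterion (6.6) can be evaluated in the same way as the D-criterion. The Python sketch below again assumes independent uniform priors and a midpoint rule (both assumptions of this sketch), with hypothetical function names, and reuses the quantities $t$, $\bar x$ and $s$ defined above.

```python
import math

def logistic_weight(x, mu, theta):
    """w = pi(x) * (1 - pi(x)) under the logistic model (6.4)."""
    p = 1.0 / (1.0 + math.exp(-theta * (x - mu)))
    return p * (1.0 - p)

def design_summaries(design, mu, theta):
    """Return (t, xbar, s) for a design given as [(x_i, p_i), ...]."""
    pw = [(x, p * logistic_weight(x, mu, theta)) for x, p in design]
    t = sum(w for _, w in pw)
    xbar = sum(w * x for x, w in pw) / t
    s = sum(w * (x - xbar) ** 2 for x, w in pw)
    return t, xbar, s

def midpoints(lo, hi, grid):
    """Midpoint-rule nodes on [lo, hi]; a single node for a point prior."""
    if lo == hi:
        return [lo]
    h = (hi - lo) / grid
    return [lo + (k + 0.5) * h for k in range(grid)]

def phi2(design, gamma, mu_range, theta_range, grid=40):
    """Approximate the c-optimality criterion (6.6) for the percentile with
    logit gamma, under independent uniform priors on mu and theta."""
    mus = midpoints(mu_range[0], mu_range[1], grid)
    thetas = midpoints(theta_range[0], theta_range[1], grid)
    total = 0.0
    for mu in mus:
        for theta in thetas:
            t, xbar, s = design_summaries(design, mu, theta)
            shift = gamma - theta * (xbar - mu)
            total += (1.0 / t + shift ** 2 / (theta ** 2 * s)) / theta ** 2
    return -total / (len(mus) * len(thetas))

gamma_95 = math.log(0.95 / 0.05)                    # about 2.944 for ED95
design = [(-0.8, 0.093), (0.8, 0.907)]
print(phi2(design, gamma_95, mu_range=(0, 0), theta_range=(3, 3)))
```

For the point prior $\mu \equiv 0$, $\theta \equiv 3$ and the design with weights 0.093 and 0.907 at $\mp 0.8$, the function returns approximately $-2.19$, which agrees in magnitude with the corresponding entry of Table 2.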

For the Bayesian D- and A-optimality criteria, as given in (6.1) and (6.3), the directional derivative functions turn out to be, respectively,

\[
d(\xi, x) = E_\beta\bigl[\mathrm{tr}\{I(\beta, \bar\xi)[I(\beta, \xi)]^{-1}\}\bigr] - p,
\]

\[
d(\xi, x) = E_\beta\bigl[\mathrm{tr}\{B(\beta)[I(\beta, \xi)]^{-1} I(\beta, \bar\xi)[I(\beta, \xi)]^{-1}\}\bigr] + \phi_3(\xi),
\]

where $\bar\xi$, as defined earlier, is the measure with unit mass on $x \in \mathcal{X}$. Bayesian c-optimality can be viewed as a special case of A-optimality with $B(\beta) = c(\beta)c(\beta)^T$.

For our logistic model (6.4), the directional derivative function with the $\phi_1$ criterion reduces to

\[
d(\xi, x) = E_\beta\{w(x, \beta)[t^{-1} + (x - \bar x)^2 s^{-1}]\} - 2,
\]

where $w(x, \beta) = \pi(x)(1 - \pi(x))$. For estimating any percentile of the dose-response curve, the directional derivative function corresponding to the $\phi_2$ criterion for c-optimality [as in (6.2)] reduces to

\[
d(\xi, x) = E_\beta\bigl\{w(x, \beta)(\theta s t)^{-2}\,[t(\bar x - x)(\bar x - \mu) + s]^2\bigr\} + \phi_2^{\gamma}(\xi).
\]

Chaloner (1987) and Chaloner and Larntz (1988, 1989) developed the use of such Bayesian design criteria. The general equivalence theorem for these criteria can be derived under suitable regularity conditions (Chaloner and Larntz, 1989; Chaloner, 1993; see also Lauter, 1974, 1976; Dubov, 1977). The directional derivative $d(\xi, x)$ is evaluated over the range of possible values of $x$ to check the global optimality of a candidate design.
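The optimality check based on the directional derivative can be sketched numerically for the $\phi_1$ criterion of the logistic model. The Python code below evaluates $d(\xi, x)$ on a grid of doses; the midpoint-rule integration over independent uniform priors, the dose grid $[-3, 3]$ and the function names are assumptions of this sketch.

```python
import math

def logistic_weight(x, mu, theta):
    """w(x, beta) = pi(x) * (1 - pi(x)) under the logistic model (6.4)."""
    p = 1.0 / (1.0 + math.exp(-theta * (x - mu)))
    return p * (1.0 - p)

def design_summaries(design, mu, theta):
    """Return (t, xbar, s) for a design given as [(x_i, p_i), ...]."""
    pw = [(x, p * logistic_weight(x, mu, theta)) for x, p in design]
    t = sum(w for _, w in pw)
    xbar = sum(w * x for x, w in pw) / t
    s = sum(w * (x - xbar) ** 2 for x, w in pw)
    return t, xbar, s

def midpoints(lo, hi, grid):
    """Midpoint-rule nodes on [lo, hi]; a single node for a point prior."""
    if lo == hi:
        return [lo]
    h = (hi - lo) / grid
    return [lo + (k + 0.5) * h for k in range(grid)]

def d_phi1(design, x, mu_range, theta_range, grid=40):
    """Directional derivative of phi_1 at `design` toward unit mass at x."""
    mus = midpoints(mu_range[0], mu_range[1], grid)
    thetas = midpoints(theta_range[0], theta_range[1], grid)
    total = 0.0
    for mu in mus:
        for theta in thetas:
            t, xbar, s = design_summaries(design, mu, theta)
            total += logistic_weight(x, mu, theta) * (1.0 / t + (x - xbar) ** 2 / s)
    return total / (len(mus) * len(thetas)) - 2.0

def max_directional_derivative(design, mu_range, theta_range,
                               x_lo=-3.0, x_hi=3.0, steps=600, grid=40):
    """Largest value of d(design, x) over an equally spaced grid of doses."""
    xs = [x_lo + (x_hi - x_lo) * k / steps for k in range(steps + 1)]
    return max(d_phi1(design, x, mu_range, theta_range, grid) for x in xs)

candidate = [(-0.515, 0.5), (0.515, 0.5)]   # local design for mu = 0, theta = 3
poor = [(-0.2, 0.5), (0.2, 0.5)]
print(max_directional_derivative(candidate, (0, 0), (3, 3)))
print(max_directional_derivative(poor, (0, 0), (3, 3)))
```

For the candidate design the maximum is zero up to the rounding of the support points, as the equivalence theorem requires, whereas the second design produces a clearly positive maximum and so cannot be optimal.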

6.3 Binary Response Models

The Bayesian design literature outside the normallinear model is vastly restricted to binary responsemodels. Tsutakawa (1972, 1980), Owen (1975) andZacks (1977) all have considered optimal design prob-lems for binary response models from a Bayesianperspective. Many of these designs are restricted toequally spaced points with equal weights at eachpoint. Chaloner and Larntz (1989) investigate theBayesian D-optimality and A-optimality criteria withseveral choices of B(β) in the context of a binaryresponse logistic regression model with a single de-sign variable x. As illustrated in the above example,Chaloner and Larntz (1989) consider the problemof finding ED50 and ED95, and of estimating thevalue of x at which the success probability equalsγ with γ having a uniform distribution on [0,1].The last problem is defined as average percentileresponse point estimation. The optimal designs areobtained through implementing the simplex algo-rithm of Nelder and Mead (1965). They assume in-dependent uniform priors on the parameters andevaluate the expectation over the prior distributionthrough numerical integration routines. They noticethat the number of support points of the optimaldesign grows as the support of the prior becomeswider and the designs in general are not equispacedand not supported with equal weights. Smith andRidout (1998) extend the computational algorithm

Page 14: Design Issues for Generalized Linear Models: A Review · Design Issues for Generalized Linear Models: A Review Andr´e I. Khuri, Bhramar Mukherjee, Bikas K. Sinha and Malay Ghosh

14 A. I. KHURI, B. MUKHERJEE, B. K. SINHA AND M. GHOSH

Table 1

Bayesian optimal designs for the dose-response model in (6.4) with optimality criterion being the average log determinant [asgiven in (6.5)]

Criterion Number

Prior on µ Prior on θ value of doses (n) Design points Weights

U[−2,2] U[1,5] 4.35 8 −1.952,−1.289,−0.762,−0.257, 0.119,0.124,0.126,0.132,

0.257,0.762,1.289,1.952 0.132,0.126,0.124, 0.119U[−2,2] U[2,4] 4.35 8 −1.906,−1.093,−0.488,−0.000a , 0.132,0.156,0.149,0.064,

0.000a,0.488,1.093,1.906 0.064,0.149,0.156, 0.132U[−2,2] U[2.9,3.1] 4.36 6 −1.882,−0.965,−0.287, 0.140,0.190,0.170,

0.287,0.965,1.882 0.170,0.190,0.140U[−0.5,0.5] U[1,5] 3.43 3 −0.661,0.0,0.661 0.389,0.222,0.389U[−0.5,0.5] U[2,4] 3.27 3 −0.581,0.0,0.581 0.457,0.087,0.457U[−0.5,0.5] U[2.9,3.1] 3.21 2 −0.546,0.546 0.5,0.5U[−0.1,0.1] U[1,5] 3.28 2 −0.499,0.499 0.5,0.5U[−0.1,0.1] U[2,4] 3.07 2 −0.512,0.512 0.5,0.5U[−0.1,0.1] U[2.9,3.1] 3.00 2 −0.516,0.516 0.5,0.5µ ≡ 0 θ ≡ 3 2.99 2 −0.515,0.515 0.5,0.5

aThe rounded-off value 0.000 is exactly evaluated as 0.0003.

of Chaloner and Larntz (1988) to find locally and Bayesian optimal designs for binary response models with a wide range of link functions and uniform, beta or bivariate normal prior distributions on the parameters.

There is wide interest in the binary response model in the context of dose-response bioassays. Markus et al. (1995) consider a Bayesian approach to find a design which minimizes the expected mean-squared error of an estimate of ED50 with respect to the joint prior distribution on the parameters of the response distribution. Sun, Tsutakawa and Lu (1996) reiterate that approximation of the expected utility function, using the usual Bayesian design criteria, can be poor, and they introduce a penalized risk criterion for Bayes optimal design. They illustrate that the chance of having an extreme posterior variance could be avoided by sacrificing a small amount of posterior risk by adding the penalty term. Kuo, Soyer and Wang (1999) use a nonparametric Bayesian approach assuming a Dirichlet process prior on the quantal response curve. They adopt the simulation-based curve fitting ideas of Muller and Parmigiani (1995) to reduce computational time

Table 2

Bayesian optimal designs for the dose-response model in (6.4) when the variance of the estimate of ED95 is considered as the optimality criterion [as given in (6.6) with γ = 2.944 corresponding to ED95]

| Prior on µ | Prior on θ | Criterion value | Number of doses (n) | Design points | Weights |
|---|---|---|---|---|---|
| U[−2,2] | U[1,5] | 9.39 | 8 | −2.401, −1.484, −0.959, −0.416, 0.129, 0.693, 1.305, 2.230 | 0.015, 0.040, 0.086, 0.118, 0.123, 0.125, 0.127, 0.366 |
| U[−2,2] | U[2,4] | 6.58 | 6 | −1.514, −0.905, −0.300, 0.359, 1.085, 2.096 | 0.034, 0.113, 0.187, 0.199, 0.190, 0.277 |
| U[−2,2] | U[2.9,3.1] | 6.09 | 6 | −1.458, −0.852, −0.327, 0.274, 1.031, 2.079 | 0.042, 0.116, 0.173, 0.197, 0.210, 0.262 |
| U[−0.5,0.5] | U[1,5] | 6.72 | 4 | −1.550, −0.001, 0.641, 1.282 | 0.189, 0.083, 0.178, 0.551 |
| U[−0.5,0.5] | U[2,4] | 3.52 | 3 | −0.898, −0.028, 0.938 | 0.083, 0.134, 0.783 |
| U[−0.5,0.5] | U[2.9,3.1] | 2.96 | 2 | −0.127, 0.912 | 0.138, 0.862 |
| U[−0.1,0.1] | U[1,5] | 5.98 | 3 | −1.631, 0.487, 1.159 | 0.226, 0.190, 0.585 |
| U[−0.1,0.1] | U[2,4] | 2.80 | 2 | −0.938, 0.780 | 0.164, 0.836 |
| U[−0.1,0.1] | U[2.9,3.1] | 2.25 | 2 | −0.759, 0.794 | 0.102, 0.898 |
| µ ≡ 0 | θ ≡ 3 | 2.19 | 2 | −0.800, 0.800 | 0.093, 0.907 |

appreciably. Clyde, Muller and Parmigiani (1995) and Flournoy (1993) also use the logistic regression model and present Bayesian design and analysis strategies for two very interesting applications.

Zhu, Ahn and Wong (1998) and Zhu and Wong (2001) consider the optimal design problem for estimating several percentiles for the logistic model with different weights on each of them (the weights depending on the degree of interest the experimenter has in estimating each percentile). They use a multiple objective criterion which is a convex combination of individual objective criteria. They modify the logit-design software of Chaloner and Larntz (1989) to optimize this compound criterion. Zhu and Wong (2001) compare the Bayes optimal designs with sequential designs proposed by Rosenberger and Grill (1997) for estimating the quartiles of a dose-response curve. Zhu and Wong (2001) note that the sequential design, based on a generalized Polya urn model, is comparable to the Bayes optimal design. When compared to the locally compound optimal designs in Zhu and Wong (2001), it was noted, as anticipated, that the Bayesian design performs better than the locally optimal design, if the specified parameters are far from the true parameters. Berry and Fristedt (1985), Berry and Pearson (1985), Parmigiani (1993) and Parmigiani and Berry (1994) examine several clinical design problems using Bayesian ideas not limited to just dose-response studies.

Example 1. Using the program by Smith and Ridout (1998), we evaluated the Bayesian optimal designs for several choices of prior and criterion functions in the context of the dose-response model mentioned in (6.4). We assumed that there is some prior knowledge on the range of µ and the sign of θ, which is often realistic. We arbitrarily chose the best guess of µ to be 0 and of θ to be 3 and assumed uniform priors of different spreads around these centers. Table 1 contains optimal designs for several choices of priors when one uses the Bayesian D-optimality criterion as given in (6.5). Table 2 contains the optimal designs when one uses the Bayesian c-optimality criterion for minimizing the variance of the estimate of the 95th percentile of the logistic response curve. The explicit formula for this variance is given in (6.6) with γ = log(0.95/(1 − 0.95)) = 2.944 for estimating ED95.

The program optimizes the design criterion under consideration for a fixed value of n, the number of doses. The user has to vary n manually. Then one chooses the design which optimizes the criterion under consideration for the smallest number of doses. The global optimality of the design is then checked by evaluating the directional derivative d(ξ, x) over all possible values of x. The results are fairly clear, as noted in Chaloner and Larntz (1989): as we increase the spread of the prior distribution, the number of support points of the optimal design increases. With decreasing variability in the prior distribution, the designs become closer to the locally optimal design for µ = 0 and θ = 3. While for the D-optimality criterion the optimal designs in Table 1 are symmetric about 0, for the c-optimality criterion for estimating ED95, the designs in Table 2 are asymmetric, tending to put more weight toward larger doses. One also notes that the designs for a given prior on µ are robust with respect to the choice of prior on θ, but the converse is not true. The optimal designs change quite appreciably as the uncertainty in the prior information on µ changes. We also experimented with independent normal priors and essentially noted the same basic patterns. The results are not included here.
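As a rough numerical check on the tables, the sketch below evaluates both criteria for a given design under the logistic model with linear predictor θ(x − µ). Equations (6.5) and (6.6) are not reproduced in this section, so their forms here are assumptions: the D-criterion is taken as the prior average of −log det of the per-observation information matrix, and the ED95 criterion as the variance of the estimate of µ + γ/θ obtained by the delta method. The function names and the 16-node Gauss–Legendre rule are illustrative choices, not those of the Smith and Ridout program.

```python
import numpy as np

def info_matrix(points, weights, mu, theta):
    """Per-observation Fisher information for (mu, theta) when
    logit(pi(x)) = theta * (x - mu)."""
    x = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    p = 1.0 / (1.0 + np.exp(-theta * (x - mu)))
    v = w * p * (1.0 - p)                          # design weight times p(1 - p)
    g = np.stack([np.full_like(x, -theta), x - mu])  # gradient of the linear predictor
    return (g * v) @ g.T

def neg_log_det(points, weights, mu, theta):
    """Assumed form of the D-criterion: -log det of the information matrix."""
    _, logdet = np.linalg.slogdet(info_matrix(points, weights, mu, theta))
    return -logdet

def ed_variance(points, weights, mu, theta, gamma):
    """Assumed form of the c-criterion: delta-method variance of mu + gamma/theta."""
    c = np.array([1.0, -gamma / theta**2])
    return c @ np.linalg.solve(info_matrix(points, weights, mu, theta), c)

def prior_average(criterion, mu_range, theta_range, n_nodes=16):
    """Average criterion(mu, theta) over independent uniform priors
    using a tensor-product Gauss-Legendre rule."""
    nodes, wts = np.polynomial.legendre.leggauss(n_nodes)
    mus = 0.5 * (mu_range[1] - mu_range[0]) * nodes + 0.5 * sum(mu_range)
    thetas = 0.5 * (theta_range[1] - theta_range[0]) * nodes + 0.5 * sum(theta_range)
    total = sum(wm * wt * criterion(m, t)
                for wm, m in zip(wts, mus) for wt, t in zip(wts, thetas))
    return total / 4.0                             # the weights sum to 2 in each dimension

gamma = np.log(0.95 / (1 - 0.95))                  # logit of 0.95
# Last rows of Tables 1 and 2 (point priors mu = 0, theta = 3).
d_local = neg_log_det([-0.515, 0.515], [0.5, 0.5], 0.0, 3.0)
c_local = ed_variance([-0.800, 0.800], [0.093, 0.907], 0.0, 3.0, gamma)
# Table 1 row with priors U[-0.1, 0.1] on mu and U[2.9, 3.1] on theta.
d_bayes = prior_average(
    lambda m, t: neg_log_det([-0.516, 0.516], [0.5, 0.5], m, t),
    (-0.1, 0.1), (2.9, 3.1))
```

Under these assumed forms the three values agree with the tabulated criterion values 2.99, 2.19 and 3.00 to two decimals, which suggests the assumptions match the criteria used for the tables, at least for narrow priors.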

6.4 Exact Results

Chaloner (1993) characterizes the φ1-optimal designs for priors with two support points for logistic regression models with a known slope. She also provides sufficient conditions for a one-point design to be optimal under both local optimality and a Bayesian criterion with a nondegenerate prior distribution for a general nonlinear model. As anticipated, the conditions basically reduce to the support of the prior being sufficiently small. Dette and Neugebauer (1996) provide a sufficient condition for the existence of a Bayesian optimal one-point design for one-parameter nonlinear models in terms of the shape of the prior density. Haines (1995) presents an elegant geometric explanation of these results for priors with two support points. Dette and Sperlich (1994) and Mukhopadhyay and Haines (1995) consider exponential growth models with one parameter and derive analytical expressions for the weights and design points for the optimal Bayesian design. For more than one parameter and a dispersed prior distribution, analytical results are extremely hard to obtain and numerical optimizations are so far the only route.

Sebastiani and Settimi (1997) use the equivalence theorem to establish the local D-optimality of a two-point design suggested by Ford, Torsney and Wu

(1992) for a simple logistic model with design region bounded at one end. They also suggest an efficient approximation to the D-optimal designs which requires less precise knowledge of the model parameters.

Matthews (1999) considered Bayesian designs for the logistic model with one qualitative factor at ±1 and derived closed-form expressions for the weights and support points which minimize the asymptotic variance of the MLE of the log-odds ratio. He also studied the effect of prior specification on the design and noticed that as uncertainty on the log-odds ratio increases, the design becomes more unbalanced.

6.5 More Than One Explanatory Variable

Atkinson et al. (1995) consider a dose-response experiment in which the male and female insects under study react differently, and consider the model

log(π(x, z)/(1− π(x, z))) = α+ θx+ γz,

where π(x, z) is the probability of death of the insect at dose level x and z is 0 for males and 1 for females. Assuming the proportion of males and females to be equal, they illustrate that the larger the separation between the two groups, the larger is the cardinality of the support for the locally D-optimal design. They also consider a Bayesian version of this design problem by imposing a trivariate normal distribution on the three parameters and notice that the design is robust with respect to uncertainty in the parameters for this problem. The paper also considers designs for estimating ED95 for the two groups separately as well as the two groups combined together.

Sitter and Torsney (1995) consider locally D-optimal designs when the model contains two quantitative variables. Burridge and Sebastiani (1992) consider a generalized linear model with two design variables and a linear predictor of the form η = αx1 + θx2 and obtain locally D-optimal designs. Burridge and Sebastiani (1994) obtain D-optimal designs for a generalized linear model when observations have variance proportional to the square of the mean. They allow for any number of possible predictors, but their results are restricted to the case of power link functions. They establish that under certain conditions on the parameters of a model, the traditional “change one factor at a time” designs are D-optimal. They also conduct a numerical study to compare the efficiency of classical factorial designs to the optimal ones and suggest some efficient compromise designs. Sebastiani and Settimi (1998) obtain D-optimal designs for a variety of nonlinear models with an arbitrary number of covariates under certain conditions on the Fisher information matrix.

In a more recent article by Smith and Ridout (2003), optimal Bayesian designs are obtained for bioassays involving two parallel dose-response relationships where the main interest is in estimating the relative potency of a test drug or test substance. They consider a model for the probability of a response as

π(x, z) = F(α + θ(x − ρz)),    (6.7)

where z is 0 or 1 representing the two substances (called the standard and test substance, resp.) and F^{−1} is a link function. The parameter ρ is the relative log potency of the test substance compared to the standard substance. Smith and Ridout consider local and Bayesian D-optimal designs as well as the Ds-optimal design which is appropriate when interest is mainly in a subset of parameters (here θ and ρ), the others (here α) being considered as nuisance parameters. The designs are obtained numerically and optimality is verified by using the corresponding directional derivative function. This model contains one quantitative and one qualitative predictor with no interaction, and as discussed in Chapter 13 of Atkinson and Donev (1992), the local D-optimal designs for the two substances (z = 0 and z = 1) are identical. The number of support points for each substance is also the same as for the corresponding local D-optimal design with a single-substance experiment.

To illustrate the proposed designs, Smith and Ridout (2003) use a data set from Ashton (1972, page 59) which provides the number of Chrysanthemum aphids killed out of the number tested at different doses of two substances. One can model the response probability [as given in (6.7)] as a function of x = log(dose) and z = factor representing the two substances. Smith and Ridout (2003) investigate the problem of finding optimal designs under various link functions, optimality criteria and prior choices. Since there are still very few illustrations of Bayesian optimal design in the multiparameter situation, this numerical example is interesting in its own right.

Smith and Ridout (2005) apply their work on multiparameter dose-response problems to obtain Bayesian optimal design in a three-parameter binary

dose-response model with control mortality as a parameter. In many bioassays where the response is death, death may sometimes occur due to natural causes unrelated to the stimulus. In such instances, the response is attributed to natural mortality, also called control mortality. The model, including control mortality in a dose-response setup, as considered in Smith and Ridout (2005), is

π(x) = λ+ (1− λ)F [θ(x− µ)].

The control mortality parameter λ is treated as a nuisance parameter and a wide range of prior distributions and criteria is investigated.
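To make the role of the nuisance parameter concrete, the following sketch computes the Fisher information contributed by a single binary observation at dose x under the control mortality model π(x) = λ + (1 − λ)F[θ(x − µ)]. It assumes a logistic F, which is only one of the link functions the model allows, and the function name is hypothetical.

```python
import numpy as np

def control_mortality_info(x, mu, theta, lam):
    """Information matrix for (mu, theta, lam) from one Bernoulli response
    with pi(x) = lam + (1 - lam) * F(theta * (x - mu)), logistic F assumed."""
    eta = theta * (x - mu)
    F = 1.0 / (1.0 + np.exp(-eta))
    dF = F * (1.0 - F)                       # derivative of the logistic cdf
    pi = lam + (1.0 - lam) * F
    grad = np.array([
        -(1.0 - lam) * theta * dF,           # d pi / d mu
        (1.0 - lam) * (x - mu) * dF,         # d pi / d theta
        1.0 - F,                             # d pi / d lam
    ])
    return np.outer(grad, grad) / (pi * (1.0 - pi))

# With lam = 0 the (mu, theta) block reduces to the two-parameter logistic case.
M0 = control_mortality_info(x=0.5, mu=0.0, theta=3.0, lam=0.0)
```

Each observation contributes a rank-one matrix, so a design needs at least three distinct doses before the weighted sum of these matrices can be nonsingular in all three parameters.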

A general treatment of Bayesian optimal designs with any number of explanatory variables seems to be in order. The computations get much more involved with an increase in the number of parameters as integration and optimization need to be carried out in a higher-dimensional space. The simulation-based methods proposed in Muller (1999) may be of particular importance in these high-dimensional problems.

6.6 Miscellaneous Issues in Bayesian Design

Multiresponse models. Draper and Hunter (1967) consider multiresponse experiments in nonlinear problems and adopt locally optimal or sequential design as their design strategy. Very little work has been done for multiresponse experiments for GLMs using Bayesian design ideas. Hatzis and Larntz (1992) consider a nonlinear multiresponse model with the probability distribution for the responses given by a Poisson random process. They consider locally D-optimal designs which minimize the generalized variance (volume of the confidence ellipsoid) of the estimated parameters for given specific values of the parameters. They also discuss the case when only a subset of the parameters is of interest, leading to the local Ds-optimality criterion. They use a generalized simulated annealing algorithm along with the Nelder and Mead (1965) simplex algorithm. Obtaining Bayesian optimal designs for nonlinear multiresponse models will certainly pose some computational challenges as numerical integration needs to be carried out in a higher-dimensional space. Heise and Myers (1996) provide methods for producing D-optimal designs for bivariate logistic regression models. Zocchi and Atkinson (1999) use D-optimal design theory to obtain designs for multinomial logistic regression models.

Sequential Bayesian designs. Ridout (1995) considers a limiting dilution model for a binary response in a seed testing experiment. Suppose that n samples of seed are tested; the ith sample (i = 1, . . . , n) contains $x_i$ seeds and yields a binary response variable $y_i$, where $y_i = 0$ if the sample is free of infection and $y_i = 1$ otherwise. Let $\pi_i$ = P(ith sample is infected) and θ = the proportion of infected seeds in the population. Then the limiting dilution model is $y_i \sim \mathrm{Bernoulli}(\pi_i)$ with $\pi_i = 1 - (1 - \theta)^{x_i}$, i = 1, . . . , n. The author reparametrizes the problem in terms of λ = log{−log(1 − θ)} and the design criterion is based on expectations of functions of the Fisher information for λ with respect to some uniform prior. Single-stage and three-stage designs are developed and compared when the sample sizes are restricted to small numbers. The three-stage designs are found to be much more efficient than single-stage designs. This is an interesting example of multistage sequential Bayesian design for one-parameter nonlinear problems where the sample size is constrained to be small. Mehrabi and Matthews (1998) also consider the problem of implementing Bayesian optimal designs for limiting dilution assay models. Zacks (1977) proposes two-stage Bayesian designs for a similar problem. Freeman (1970), Owen (1975), Kuo (1983) and Berry and Fristedt (1985) have also done work in the sequential domain.
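The dependence of the design on the unknown parameter can be seen directly in this one-parameter model. The sketch below takes 1 − (1 − θ)^x as the probability that a sample of x seeds gives y = 1, writes t = x e^λ = −x log(1 − θ), and uses the resulting per-sample Fisher information for λ, which works out to t²/(e^t − 1). The grid search and the value θ = 0.01 are illustrative assumptions; Ridout's actual criterion averages functions of this information over a prior and is not reproduced here.

```python
import numpy as np

def info_lambda(x, lam):
    """Fisher information for lam = log(-log(1 - theta)) from one sample of
    x seeds, when P(y = 1) = 1 - (1 - theta)**x = 1 - exp(-x * exp(lam))."""
    t = x * np.exp(lam)
    return t**2 / np.expm1(t)

# The information depends on (x, lam) only through t, so maximize over t.
t_grid = np.arange(0.01, 5.0, 1e-4)
t_star = t_grid[np.argmax(info_lambda(t_grid, 0.0))]   # lam = 0 gives t = x

# Locally optimal sample size for a hypothetical guess theta = 0.01.
theta_guess = 0.01
x_star = t_star / (-np.log(1.0 - theta_guess))
```

The maximizing value of t solves t = 2(1 − e^{−t}), roughly 1.59, so the locally optimal number of seeds per sample is inversely proportional to −log(1 − θ). A poor initial guess of θ therefore leads directly to a poor sample size, which is what motivates the prior-averaged and multistage designs.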

Number of support points. In classical design theory, an upper bound on the number of support points for an optimal design is usually obtained by invoking Caratheodory’s theorem, as the information matrix depends on a finite number of moments of the design measure ξ. For D-optimality, the optimal design typically has as many support points as the number of unknown parameters in the model with equal weights at each point (Silvey, 1980; Pukelsheim, 1993). This property helps to obtain the optimal design analytically but has the drawback that there is not a sufficient number of support points to allow for any goodness of fit checking. This type of upper bound result applies to local optimality criteria and Bayesian optimality criteria for linear models. For nonlinear models with a continuous prior distribution there is no such bound available. The space of possible Fisher information matrices is now infinite dimensional and Caratheodory’s theorem cannot be invoked. For most concave optimality criteria, if the prior distribution has k support points, then there exists a Bayesian optimal design which

is supported on at most kp(p+1)/2 different points (Dette and Neugebauer, 1996), where p is the number of parameters, but no such bounds are available for a continuous prior distribution. Chaloner and Larntz (1989) first illustrated how the number of support points of the optimal design changes as the prior becomes more dispersed. This is an advantage of the Bayesian design as it allows for possible model checking with the observed data. How to incorporate model uncertainty in the paradigm remains an important unresolved issue (Steinberg, 1985; DuMouchel and Jones, 1994).

Sensitivity to prior specification. Robustness of the design to the prior distribution is a desirable property. DasGupta and Studden (1991), DasGupta, Mukhopadhyay and Studden (1992) and Toman and Gastwirth (1993, 1994) developed a framework for robust experimental design in a linear model setup. Efforts to propose robust Bayesian experimental designs for nonlinear models and generalized linear models are needed.

Prior for inference. Tsutakawa (1972) argues that when Bayesian inference is considered appropriate, it may be desirable to use two separate priors, one for constructing designs and the other for subsequent inference. Many practitioners believe in incorporating prior information for constructing designs, but carry out the analysis through maximum likelihood or other frequentist procedures. Using a design prior with small variability and an inference prior with an inflated variance, as recommended in Tsutakawa (1972), raises philosophical issues for discussion. Etzioni and Kadane (1993) and Lindley and Singpurwalla (1991) address this dichotomy of using informative priors for design and noninformative priors for the subsequent statistical analysis.

Statistical software. Optimal Bayesian designs for multiparameter nonlinear problems with a diffuse prior distribution are very difficult to find analytically and can only be obtained numerically. Chaloner and Larntz (1988) made the first effort in this direction. They introduced FORTRAN77 programs for obtaining Bayesian optimal designs for logistic regression with a single explanatory variable. Smith and Ridout (1998) introduced an enhanced version of their program called DESIGNV1, which provides a wider range of link functions (not only logistic) as considered in Ford, Torsney and Wu (1992) along with a greater ensemble of prior distributions and optimality criteria. The program SINGLE by Spears, Brown and Atkinson (1997), available in StatLib, also offers the logistic and log–log link functions with various choices of priors and an automated procedure to determine the number of support points. Smith and Ridout (2003) extended their software to DESIGNV2 to accommodate two explanatory variables, one quantitative, the other dichotomous. All these programs use the Nelder–Mead (1965) optimization algorithm. The expectation over a prior distribution is computed by numerical quadrature formulae (usually Gauss–Legendre or Gauss–Hermite quadrature).
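The two quadrature rules mentioned above correspond to the two most common prior families: Gauss–Legendre for a uniform prior on an interval and Gauss–Hermite for a normal prior. A minimal sketch of how a prior expectation can be approximated with each rule is given below; the function names are illustrative and are not taken from any of the programs cited, and f is assumed to accept NumPy arrays.

```python
import numpy as np

def expect_uniform(f, lo, hi, n_nodes=20):
    """Approximate E[f(U)] for U ~ Uniform(lo, hi) by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    u = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)   # map [-1, 1] to [lo, hi]
    return 0.5 * np.sum(weights * f(u))             # density 1/(hi - lo) times Jacobian

def expect_normal(f, mean, sd, n_nodes=20):
    """Approximate E[f(Z)] for Z ~ Normal(mean, sd^2) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = mean + np.sqrt(2.0) * sd * nodes            # change of variable for weight exp(-t^2)
    return np.sum(weights * f(z)) / np.sqrt(np.pi)

second_moment_uniform = expect_uniform(lambda u: u**2, 0.0, 3.0)   # exact value is 3
second_moment_normal = expect_normal(lambda z: z**2, 1.0, 2.0)     # exact value is 1 + 4 = 5
```

In a design program, f would be the design criterion viewed as a function of one parameter with the design held fixed; for several independent parameters the one-dimensional rules are combined as a tensor product, which is one reason the cost grows quickly with the number of parameters.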

Flexible design software developed by Clyde (1993b) is built within XLISP-STAT (Tierney, 1990). This allows evaluation of exact and approximate Bayesian optimal designs for linear and nonlinear models. Locally optimal designs and non-Bayesian optimal designs for linear models can also be obtained as special cases of Bayesian designs. This requires the NPSOL FORTRAN library of Gill et al. (1986) to be installed in the system. The simulation-based ideas for obtaining optimal designs (Muller, 1999) can also be implemented through XLISP-STAT.

The use of Bayesian design depends greatly on updating and maintaining the existing programs and making them known to practitioners, in addition to including similar software in other standard statistical software packages.

7. DESIGN COMPARISONS USING QUANTILE DISPERSION GRAPHS

A fourth approach to the design dependence problem was recently introduced by Robinson and Khuri (2003). Their approach is based on studying the distribution of the mean-squared error of prediction (MSEP) in (3.7) throughout the experimental region R. For a given design D, let $Q_D(p, \beta, \lambda)$ denote the pth quantile of the distribution of MSEP on $R_\lambda$, where $R_\lambda$ represents the surface of a region obtained by shrinking the experimental region R using a shrinkage factor λ, and β is the parameter vector in the linear predictor in (2.2). By varying λ we can cover the entire region R. In order to assess the problem of unknown β, a parameter space C to which β is assumed to belong is specified. Subsequently, the minimum and maximum of $Q_D(p, \beta, \lambda)$ over C are obtained. We therefore get the extrema

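$$
\begin{aligned}
Q_D^{\max}(p, \lambda) &= \max_{\beta \in C}\{Q_D(p, \beta, \lambda)\},\\
Q_D^{\min}(p, \lambda) &= \min_{\beta \in C}\{Q_D(p, \beta, \lambda)\}.
\end{aligned}
\tag{7.1}
$$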

Plotting $Q_D^{\max}(p, \lambda)$ and $Q_D^{\min}(p, \lambda)$ against p results in the so-called quantile dispersion graphs (QDGs) of the MSEP. A desirable feature of a design D is to have close and small values of $Q_D^{\max}$ and $Q_D^{\min}$ over the range of p (0 ≤ p ≤ 1). Small values of $Q_D^{\max}$ indicate small MSEP values on $R_\lambda$, and the closeness of $Q_D^{\max}$ and $Q_D^{\min}$ indicates robustness to changes in the values of β that is induced by the design D. The QDGs provide a comprehensive assessment of the prediction capability of D and can therefore be conveniently utilized to compare two candidate designs by comparing their graphical profiles.

The QDG approach has several advantages:

(1) The design’s performance can be evaluated throughout the experimental region. Standard design optimality criteria base their evaluation of a design on a single measure, such as D-efficiency, but do not consider the quality of prediction inside R.

(2) Estimation bias is taken into consideration in the comparison of designs.

(3) The QDGs provide a clear depiction of the dependence of a design on the unknown parameter vector β. Designs can therefore be easily compared with regard to robustness.

(4) The use of the quantile plots of the MSEP permits the comparison of designs for GLMs with several control variables.

7.1 Logistic Regression Models

Robinson and Khuri (2003) applied the QDG approach to comparing designs for logistic regression models of the form

$$
\pi(x) = \frac{1}{1 + e^{-f^{T}(x)\beta}},
\tag{7.2}
$$

where $f^{T}(x)\beta$ defines the linear predictor in (2.2) and π(x) is the probability of success at $x = (x_1, x_2, \ldots, x_k)'$, thus µ(x) = π(x). Robinson and Khuri showed that, in this case, (3.7) takes the form

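$$
\begin{aligned}
\mathrm{MSE}[\pi(x)]
&= \pi^{2}(x)[1 - \pi(x)]^{2} f^{T}(x)(X^{T}WX)^{-1}f(x)\\
&\quad + \bigl\{\pi(x)[1 - \pi(x)]\, f^{T}(x)(X^{T}WX)^{-1}X^{T}W\zeta\\
&\qquad + \tfrac{1}{2}\,\pi(x)[1 - \pi(x)][1 - 2\pi(x)]\, f^{T}(x)(X^{T}WX)^{-1}f(x)\bigr\}^{2},
\end{aligned}
$$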

where W is the same as in (3.4) with $w_u = m_u \pi_u (1 - \pi_u)$, u = 1, 2, . . . , n, and ζ is an n × 1 vector whose uth element is $z_{uu}(\pi_u - 0.5)$, where $z_{uu}$ is the uth diagonal element of $Z = X(X^{T}WX)^{-1}X^{T}$. Here, $m_u$ denotes the number of experimental units tested at the uth experimental run, and $\pi_u$ is the value of π(x) at $x_u$, the vector of design settings at the uth experimental run (u = 1, 2, . . . , n).

Robinson and Khuri considered quantiles of a scaled version of MSE[π(x)], namely,

$$
\mathrm{SMSE}[\pi(x)] = \frac{N}{\pi(x)[1 - \pi(x)]}\,\mathrm{MSE}[\pi(x)],
\tag{7.3}
$$

where $N = \sum_{u=1}^{n} m_u$. Formula (7.1) can then be applied with $Q_D(p, \beta, \lambda)$ now denoting the pth quantile of SMSE[π(x)] on the surface of $R_\lambda$. Two numerical examples were presented in Robinson and Khuri (2003) to illustrate the application of the QDG approach to comparing designs for logistic regression models. The following example provides another illustration of this approach in the case of logistic regression.

Example 2. Sitter (1992) proposed a minimax procedure to obtain designs for the logistic regression model (7.2), where

$$
f^{T}(x)\beta = \theta(x - \mu).
$$

This model is the same as the one given in (5.6). The designs proposed by Sitter are intended to be robust to poor initial estimates of µ and θ. The numerical example used by Sitter concerns sport fishing in British Columbia, Canada, where x is the amount of increased fishing cost. The binary response designates fishing or not fishing for a given x. Thus π(x) is the probability of wanting to fish for a given increase in fishing cost.

Using the D-optimality criterion, Sitter compared a locally D-optimal design against his minimax D-optimal design. The first design was based on the initial estimates µ0 = 40, θ0 = 0.90 for µ and θ, and consisted of two points, namely, x = 38.28, 41.72, with equal allocation to the points. For the second design, Sitter assumed the parameter space C : 33 ≤ µ ≤ 47, 0.50 ≤ θ ≤ 1.25. Accordingly, the minimax D-optimal design produced by Sitter entailed equal allocation to the points x = 31.72, 34.48, 37.24, 40.00, 42.76, 45.52, 48.28. This design was intended to be robust over C.
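The two support points of the locally D-optimal design can be recovered with a short computation. The sketch below assumes a two-point design placed symmetrically at µ0 ± c/θ0 with equal weights and an unrestricted design region; under that assumption the determinant of the information matrix is proportional to c²[p(1 − p)]² with p = 1/(1 + e^{−c}), so only the scalar c needs to be optimized. The grid search is an illustrative choice, not Sitter's method.

```python
import numpy as np

def log_det_symmetric(c):
    """log det of the information matrix, up to an additive constant, for an
    equally weighted two-point design at mu +/- c/theta in the logistic model."""
    p = 1.0 / (1.0 + np.exp(-c))
    return 2.0 * np.log(p * (1.0 - p)) + 2.0 * np.log(c)

c_grid = np.arange(0.5, 3.0, 1e-4)
c_star = c_grid[np.argmax(log_det_symmetric(c_grid))]   # about 1.543

mu0, theta0 = 40.0, 0.90                                 # Sitter's initial estimates
local_points = (mu0 - c_star / theta0, mu0 + c_star / theta0)
```

The resulting points agree with the reported 38.28 and 41.72 up to rounding in the second decimal. The optimal c does not depend on θ0, but the design points do, which is the parameter dependence that the minimax design is meant to guard against.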

The QDG approach was used to compare Sitter’s two designs. The same parameter space, C, was considered. The experimental region investigated was

R : 30 ≤ x ≤ 50, and the two designs consisted of 70 runs each with equal weights at the design points.

The scaled mean-squared error of prediction (SMSEP) is given in (7.3). For selected values of (µ, θ) in C, values of SMSEP are calculated throughout the region R for each design. The maximum and minimum quantiles (over C) of the distribution of SMSEP on R are then obtained as in (7.1). Since in this example there is only one control variable, no shrinkage of the region R is necessary. The combined QDGs for the two designs are shown in Figure 1. We note that the dispersion in the quantile values for the minimax D-optimal design is less than that for the locally D-optimal design. This indicates more robustness of the former design to changes in the parameter values. This is consistent with the conclusion arrived at by Sitter that “for most of the region, the minimax D-optimal design performs better than the locally D-optimal design.”
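The computation behind the QDGs in this example can be sketched as follows. The code evaluates the SMSEP in (7.3) on a grid over R, takes quantiles over the grid, and then takes the minimum and maximum of each quantile over a grid of (µ, θ) values in C, as in (7.1). Several details are assumptions made for illustration: the linear predictor θ(x − µ) is written as β0 + β1x with f(x) = (1, x)′, each distinct design point is treated as one run with m_u = 35 or 10 units so that each design has 70 runs, and the grid resolutions are arbitrary. The function names are hypothetical.

```python
import numpy as np

def smse(x_grid, design_x, design_m, mu, theta):
    """Scaled MSE of prediction (7.3) on x_grid for logit(pi) = theta * (x - mu)."""
    beta = np.array([-theta * mu, theta])
    X = np.column_stack([np.ones_like(design_x), design_x])
    pi_u = 1.0 / (1.0 + np.exp(-X @ beta))
    W = np.diag(design_m * pi_u * (1.0 - pi_u))
    A_inv = np.linalg.inv(X.T @ W @ X)
    z_uu = np.einsum("ij,jk,ik->i", X, A_inv, X)     # diagonal of X (X'WX)^{-1} X'
    zeta = z_uu * (pi_u - 0.5)
    F = np.column_stack([np.ones_like(x_grid), x_grid])
    pi = 1.0 / (1.0 + np.exp(-F @ beta))
    v = pi * (1.0 - pi)
    quad = np.einsum("ij,jk,ik->i", F, A_inv, F)     # f'(X'WX)^{-1} f at each grid point
    variance = v**2 * quad
    bias = v * (F @ (A_inv @ X.T @ W @ zeta)) + 0.5 * v * (1.0 - 2.0 * pi) * quad
    return design_m.sum() * (variance + bias**2) / v

def qdg(design_x, design_m, mu_vals, theta_vals, x_grid, probs):
    """Minimum and maximum over the parameter grid of the quantiles of SMSE on x_grid."""
    q = np.array([np.quantile(smse(x_grid, design_x, design_m, m, t), probs)
                  for m in mu_vals for t in theta_vals])
    return q.min(axis=0), q.max(axis=0)

x_grid = np.linspace(30.0, 50.0, 201)                # region R
probs = np.linspace(0.0, 1.0, 21)
mu_vals = np.linspace(33.0, 47.0, 8)                 # grid over C (assumed resolution)
theta_vals = np.linspace(0.50, 1.25, 4)

local_x, local_m = np.array([38.28, 41.72]), np.array([35.0, 35.0])
minimax_x = np.array([31.72, 34.48, 37.24, 40.00, 42.76, 45.52, 48.28])
minimax_m = np.full(7, 10.0)

local_min, local_max = qdg(local_x, local_m, mu_vals, theta_vals, x_grid, probs)
mm_min, mm_max = qdg(minimax_x, minimax_m, mu_vals, theta_vals, x_grid, probs)
```

Plotting the minimum and maximum curves of each design against probs gives a picture analogous to Figure 1; the gap between the two curves of a design is the dispersion discussed above. A finer grid over C, or a numerical optimizer, would approximate the extrema in (7.1) more closely than this coarse grid.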

8. CONCLUSION

The research on designs for generalized linear models is still very much in its developmental stage. Not much work has been accomplished either in terms of theory or in terms of computational methods to evaluate the optimal design when the dimension of the design space is high. The situations where one has several covariates (control variables) or multiple responses corresponding to each subject demand extensive work to evaluate “optimal” or at least efficient designs. The curve fitting approach of Muller and Parmigiani (1995) may be one direction to pursue in higher-dimensional design problems. Finding robust and efficient designs in high-dimensional problems will involve formidable computational challenges and efficient search algorithms need to be developed.

The stochastic approximation literature, as discussed in Section 5, dwells primarily on the development of algorithms for the selection of design points. Similar ideas can be brought into case-control studies where the prime objective is to study the association between a disease (say, lung cancer) and some exposure variables (such as smoking, residence near a hazardous waste site, etc.). Classical case-control studies are carried out by sampling separately from the case (persons affected with the disease) and control (persons without the disease) populations, with the two sample sizes being fixed and often arbitrary. Chen (2000) proposed a sequential sampling procedure which removes this arbitrariness. Specifically, he proposed a sampling rule based on all the accumulated data, which mandates whether the next observation (if any) should be drawn from a case or a control population. He also established certain optimality properties of his proposed sampling rule.

However, like much of the stochastic approximation literature, Chen touched very briefly on the choice of a stopping rule, but without any optimality properties associated with it. It appears that a Bayes stopping rule or some approximation thereof

Fig. 1. Combined QDGs for the minimax D-optimal and D-optimal designs.

can be introduced along with Chen’s sampling rule so that the issues of optimal stopping and choice of designs can be addressed simultaneously.

The use of the quantile dispersion graphs (of the mean-squared error of prediction) provides a convenient technique for evaluating and comparing designs for generalized linear models. The main advantages of these graphs are their applicability in experimental situations involving several control variables, their usefulness in assessing the quality of prediction associated with a given design throughout the experimental region, and their depiction of the design’s dependence on the parameters of the fitted model. There are still several other issues that need to be resolved. For example, the effects of misspecifying the link function and/or the parent distribution of the data on the shape of the quantile plots need to be investigated. In addition, it would be of interest to explore the design dependence problem in multiresponse situations involving several response variables that may be correlated. The multiresponse design problem in a traditional linear model setup was discussed by Wijesinha and Khuri (1987a, b).

ACKNOWLEDGMENTS

The authors would like to thank Professor David Smith of the Medical College of Georgia for making his software available for the evaluation of the Bayesian optimal designs. They also thank the Editor and the two referees for their helpful comments and suggestions.

REFERENCES

Abdelbasit, K. M. and Plackett, R. L. (1983). Experimental design for binary data. J. Amer. Statist. Assoc. 78 90–98. MR0696852

Abdelhamid, S. N. (1973). Transformation of observations in stochastic approximation. Ann. Statist. 1 1158–1174. MR0348944

Anbar, D. (1973). On optimal estimation methods using stochastic approximation procedures. Ann. Statist. 1 1175–1184. MR0351001

Ashton, W. D. (1972). The Logit Transformation with Special Reference to Its Uses in Bioassay. Hafner, New York. MR0381220

Atkinson, A. C., Chaloner, K., Herzberg, A. M. and Juritz, J. (1993). Optimum experimental designs for properties of a compartmental model. Biometrics 49 325–337.

Atkinson, A. C., Demetrio, C. G. B. and Zocchi, S. S. (1995). Optimum dose levels when males and females differ in response. Appl. Statist. 44 213–226.

Atkinson, A. C. and Donev, A. N. (1992). Optimum Experimental Designs. Clarendon, Oxford.

Atkinson, A. C. and Haines, L. M. (1996). Designs for nonlinear and generalized linear models. In Design and Analysis of Experiments (S. Ghosh and C. R. Rao, eds.) 437–475. North-Holland, Amsterdam. MR1492576

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York. MR0804611

Bernardo, J.-M. (1979). Expected information as expected utility. Ann. Statist. 7 686–690. MR0527503

Berry, D. A. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London. MR0813698

Berry, D. A. and Pearson, L. M. (1985). Optimal designs for clinical trials with dichotomous responses. Statistics in Medicine 4 497–508.

Box, G. E. P. and Draper, N. R. (1987). Empirical Model-Building and Response Surfaces. Wiley, New York. MR0861118

Box, G. E. P. and Lucas, H. L. (1959). Design of experiments in non-linear situations. Biometrika 46 77–90. MR0102155

Brooks, R. J. (1972). A decision theory approach to optimal regression designs. Biometrika 59 563–571. MR0334426

Brooks, R. J. (1974). On the choice of an experiment for prediction in linear regression. Biometrika 61 303–311. MR0368333

Brooks, R. J. (1976). Optimal regression designs for prediction when prior knowledge is available. Metrika 23 221–230. MR0440810

Brooks, R. J. (1977). Optimal regression design for control in linear regression. Biometrika 64 319–325. MR0488525

Burridge, J. and Sebastiani, P. (1992). Optimal designs for generalised linear models. J. Italian Statist. Soc. 1 183–202.

Burridge, J. and Sebastiani, P. (1994). D-optimal designs for generalised linear models with variance proportional to the square of the mean. Biometrika 81 295–304. MR1294893

Chaloner, K. (1984). Optimal Bayesian experimental designs for linear models. Ann. Statist. 12 283–300. MR0733514

Chaloner, K. (1987). An approach to design for generalized linear models. In Model-Oriented Data Analysis (V. V. Fedorov and H. Lauter, eds.) 3–12. Springer, Berlin.

Chaloner, K. (1993). A note on optimal Bayesian design for nonlinear problems. J. Statist. Plann. Inference 37 229–235. MR1243800

Chaloner, K. and Larntz, K. (1988). Software for logistic regression experiment design. In Optimal Design and Analysis of Experiments (Y. Dodge, V. V. Fedorov and H. P. Wynn, eds.) 207–211. North-Holland, Amsterdam.

Chaloner, K. and Larntz, K. (1989). Optimal Bayesian design applied to logistic regression experiments. J. Statist. Plann. Inference 21 191–208. MR0985457

Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review. Statist. Sci. 10 273–304. MR1390519


Chen, K. (2000). Optimal sequential designs of case-control studies. Ann. Statist. 28 1452–1471. MR1805792

Chernoff, H. (1953). Locally optimal designs for estimating parameters. Ann. Math. Statist. 24 586–602. MR0058932

Chernoff, H. (1972). Sequential Analysis and Optimal Design. SIAM, Philadelphia. MR0309249

Clyde, M. A. (1993a). Bayesian optimal designs for approximate normality. Ph.D. dissertation, Univ. Minnesota.

Clyde, M. A. (1993b). An object-oriented system for Bayesian nonlinear design using xlisp-stat. Technical Report 587, School of Statistics, Univ. Minnesota.

Clyde, M. A. and Chaloner, K. (1996). The equivalence of constrained and weighted designs in multiple objective design problems. J. Amer. Statist. Assoc. 91 1236–1244. MR1424621

Clyde, M. A. and Chaloner, K. (2002). Constrained design strategies for improving normal approximations in nonlinear regression problems. J. Statist. Plann. Inference 104 175–196. MR1900525

Clyde, M. A., Muller, P. and Parmigiani, G. (1995). Optimal designs for heart defibrillators. Case Studies in Bayesian Statistics II. Lecture Notes in Statist. 105 278–292. Springer, New York.

Das, P., Mandal, N. K. and Sinha, B. K. (2003). de la Garza phenomenon in linear regression with heteroscedastic errors. Technical report, Dept. Statistics, Calcutta Univ.

DasGupta, A. (1996). Review of optimal Bayes designs. In Design and Analysis of Experiments (S. Ghosh and C. R. Rao, eds.) 1099–1147. North-Holland, Amsterdam. MR1492591

DasGupta, A., Mukhopadhyay, S. and Studden, W. J. (1992). Compromise designs in heteroscedastic linear models. J. Statist. Plann. Inference 32 363–384. MR1190204

DasGupta, A. and Studden, W. J. (1991). Robust Bayesian experimental designs in normal linear models. Ann. Statist. 19 1244–1256. MR1126323

Dawid, A. and Sebastiani, P. (1999). Coherent dispersion criteria for optimal experimental design. Ann. Statist. 27 65–81. MR1701101

DeGroot, M. H. (1986). Concepts of information based on utility. In Recent Developments in the Foundations of Utility and Risk Theory (L. Daboni, A. Montesano and M. Lines, eds.) 265–275. Reidel, Dordrecht.

de la Garza, A. (1954). Spacing of information in polynomial regression. Ann. Math. Statist. 25 123–130. MR0060777

Dette, H. and Neugebauer, H.-M. (1996). Bayesian optimal one-point designs for one-parameter nonlinear models. J. Statist. Plann. Inference 52 17–30. MR1391681

Dette, H. and Sperlich, S. (1994). A note on Bayesian D-optimal designs for a generalization of the exponential growth model. South African Statist. J. 28 103–117. MR1333863

Diaz, M. P., Barchuk, A. H., Luque, S. and Oviedo, C. (2002). Generalized linear models to study spatial distribution of tree species in Argentinean arid Chaco. J. Appl. Statist. 29 685–694. MR1888680

Dixon, W. J. and Mood, A. M. (1948). A method for obtaining and analyzing sensitivity data. J. Amer. Statist. Assoc. 43 109–126.

Dobson, A. J. (2002). An Introduction to Generalized Linear Models, 2nd ed. Chapman and Hall/CRC, Boca Raton, FL. MR2068205

Draper, N. R. and Hunter, W. G. (1967). The use of prior distributions in the design of experiments for parameter estimation in non-linear situations: Multiresponse case. Biometrika 54 662–665. MR0223028

Dubov, E. L. (1977). D-optimal designs for nonlinear models under the Bayesian approach. In Regression Experiments 103–111. Moscow Univ. Press. (In Russian.)

DuMouchel, W. and Jones, B. (1994). A simple Bayesian modification of D-optimal designs to reduce dependence on an assumed model. Technometrics 36 37–47.

Etzioni, R. and Kadane, J. B. (1993). Optimal experimental design for another’s analysis. J. Amer. Statist. Assoc. 88 1404–1411. MR1245377

Fabian, V. (1968). On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39 1327–1332. MR0231429

Fabian, V. (1983). A local asymptotic minimax optimality of an adaptive Robbins–Monro stochastic approximation procedure. Mathematical Learning Models—Theory and Algorithms. Lecture Notes in Statist. 20 43–49. Springer, New York. MR0731719

Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York. MR0403103

Flournoy, N. (1993). A clinical experiment in bone marrow transplantation: Estimating a percentage point of a quantal response curve. Case Studies in Bayesian Statistics. Lecture Notes in Statist. 83 324–336. Springer, New York.

Ford, I., Torsney, B. and Wu, C. F. J. (1992). The use of a canonical form in the construction of locally optimal designs for nonlinear problems. J. Roy. Statist. Soc. Ser. B 54 569–583. MR1160483

Freeman, P. R. (1970). Optimal Bayesian sequential estimation of the median effective dose. Biometrika 57 79–89. MR0293780

Frees, E. W. and Ruppert, D. (1990). Estimation following a sequentially designed experiment. J. Amer. Statist. Assoc. 85 1123–1129. MR1134509

Gill, P. E., Murray, W., Saunders, M. A. and Wright, M. H. (1986). User’s guide for NPSOL (version 4.0): A Fortran package for nonlinear programming. Technical Report SOL 86-2, Dept. Operations Research, Stanford Univ.

Haines, L. (1995). A geometric approach to optimal design for one-parameter non-linear models. J. Roy. Statist. Soc. Ser. B 57 575–598. MR1341325

Hatzis, C. and Larntz, K. (1992). Optimal design in nonlinear multiresponse estimation: Poisson model for filter feeding. Biometrics 48 1235–1248.

Heise, M. A. and Myers, R. H. (1996). Optimal designs for bivariate logistic regression. Biometrics 52 613–624.

Hern, A. and Dorn, S. (2001). Statistical modelling of insect behavioral responses in relation to the chemical composition of test extracts. Physiological Entomology 26 381–390.


Jewell, N. P. and Shiboski, S. (1990). Statistical analysis of HIV infectivity based on partner studies. Biometrics 46 1133–1150.

Khan, M. K. and Yazdi, A. A. (1988). On D-optimal designs for binary data. J. Statist. Plann. Inference 18 83–91. MR0926416

Khuri, A. I. (1993). Response surface methodology within the framework of GLM. J. Combin. Inform. System Sci. 18 193–202. MR1317235

Khuri, A. I. and Cornell, J. A. (1996). Response Surfaces, 2nd ed. Dekker, New York. MR1447628

Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23 462–466. MR0050243

Kuo, L. (1983). Bayesian bioassay design. Ann. Statist. 11 886–895. MR0707938

Kuo, L., Soyer, R. and Wang, F. (1999). Optimal design for quantal bioassay via Monte Carlo methods. In Bayesian Statistics VI (J.-M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 795–802. Oxford Univ. Press, New York.

Lai, T. L. and Robbins, H. (1979). Adaptive design and stochastic approximation. Ann. Statist. 7 1196–1221. MR0550144

Lauter, E. (1974). Experimental design in a class of models. Math. Oper. Statist. 5 379–396. MR0440812

Lauter, E. (1976). Optimal multipurpose designs for regression models. Math. Oper. Statist. 7 51–68. MR0413385

Lee, Y. J. and Nelder, J. A. (2002). Analysis of ulcer data using hierarchical generalized linear models. Statistics in Medicine 21 191–202.

Lindley, D. V. (1956). On a measure of the information provided by an experiment. Ann. Math. Statist. 27 986–1005. MR0083936

Lindley, D. V. and Singpurwalla, N. D. (1991). On the evidence needed to reach agreed action between adversaries, with application to acceptance sampling. J. Amer. Statist. Assoc. 86 933–937. MR1146341

Lindsey, J. K. (1997). Applying Generalized Linear Models. Springer, New York.

Liski, E. P., Mandal, N. K., Shah, K. R. and Sinha, B. K. (2002). Topics in Optimal Design. Lecture Notes in Statist. 163. Springer, New York. MR1933941

Marks, B. L. (1962). Some optimal sequential schemes for estimating the mean of a cumulative normal quantal response curve. J. Roy. Statist. Soc. Ser. B 24 393–400.

Markus, R. A., Frank, J., Groshen, S. and Azen, S. P. (1995). An alternative approach to the optimal design of an LD50 bioassay. Statistics in Medicine 14 841–852.

Mathew, T. and Sinha, B. K. (2001). Optimal designs for binary data under logistic regression. J. Statist. Plann. Inference 93 295–307. MR1822403

Matthews, J. N. S. (1999). Effect of prior specification on Bayesian design for two-sample comparison of a binary outcome. Amer. Statist. 53 254–256.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.

McCulloch, C. E. and Searle, S. R. (2001). Generalized, Linear and Mixed Models. Wiley, New York. MR1884506

Mehrabi, Y. and Matthews, J. N. S. (1998). Implementable Bayesian designs for limiting dilution assays. Biometrics 54 1398–1406.

Minkin, S. (1987). Optimal designs for binary data. J. Amer. Statist. Assoc. 82 1098–1103. MR0922174

Minkin, S. (1993). Experimental design for clonogenic assays in chemotherapy. J. Amer. Statist. Assoc. 88 410–420. MR1224368

Mukhopadhyay, S. and Haines, L. (1995). Bayesian D-optimal designs for the exponential growth model. J. Statist. Plann. Inference 44 385–397. MR1332682

Muller, P. (1999). Simulation-based optimal design. In Bayesian Statistics VI (J.-M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 459–474. Oxford Univ. Press, New York. MR1723509

Muller, P. and Parmigiani, G. (1995). Optimal design via curve fitting of Monte Carlo experiments. J. Amer. Statist. Assoc. 90 1322–1330. MR1379474

Myers, R. H. (1999). Response surface methodology—current status and future directions. J. Quality Technology 31 30–44.

Myers, R. H., Khuri, A. I. and Carter, W. H. (1989). Response surface methodology: 1966–1988. Technometrics 31 137–157. MR1007291

Myers, R. H. and Montgomery, D. C. (1995). Response Surface Methodology. Wiley, New York.

Myers, R. H., Montgomery, D. C. and Vining, G. G. (2002). Generalized Linear Models. Wiley, New York. MR1863403

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Computer J. 7 308–313.

Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. J. Roy. Statist. Soc. Ser. A 135 370–384.

Owen, R. J. (1970). The optimum design of a two-factor experiment using prior information. Ann. Math. Statist. 41 1917–1934. MR0279943

Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. J. Amer. Statist. Assoc. 70 351–356. MR0381185

Palmer, J. L. and Muller, P. (1998). Bayesian optimal design in population models of hematologic data. Statistics in Medicine 17 1613–1622.

Parmigiani, G. (1993). On optimal screening ages. J. Amer. Statist. Assoc. 88 622–628. MR1224388

Parmigiani, G. and Berry, D. A. (1994). Applications of Lindley information measure to the design of clinical experiments. In Aspects of Uncertainty (P. R. Freeman and A. F. M. Smith, eds.) 329–348. Wiley, Chichester. MR1309700

Parmigiani, G. and Muller, P. (1995). Simulation approach to one-stage and sequential optimal design problems. In MODA-4—Advances in Model Oriented Data Analysis (C. P. Kitsos and W. G. Muller, eds.) 37–47. Physica, Heidelberg. MR1366881

Pilz, J. (1991). Bayesian Estimation and Experimental Design in Linear Regression Models, 2nd ed. Wiley, New York. MR1118380

Pronzato, L. and Walter, E. (1987). Robust experiment designs for nonlinear regression models. In Model-Oriented Data Analysis (V. V. Fedorov and H. Lauter, eds.) 77–86. Springer, Berlin.

Pukelsheim, F. (1993). Optimal Design of Experiments. Wiley, New York. MR1211416

Ridout, M. S. (1995). Three-stage designs for seed testing experiments. Appl. Statist. 44 153–162.

Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist. 22 400–407. MR0042668

Robbins, H. and Siegmund, D. (1971). A convergence theorem for non-negative almost positive supermartingales and some applications. In Optimizing Methods in Statistics (J. S. Rustagi, ed.) 233–257. Academic Press, New York.

Robinson, K. S. and Khuri, A. I. (2003). Quantile dispersion graphs for evaluating and comparing designs for logistic regression models. Comput. Statist. Data Anal. 43 47–62. MR2005386

Rosenberger, W. F. and Grill, S. E. (1997). A sequential design for psychophysical experiments: An application to estimating timing of sensory events. Statistics in Medicine 16 2245–2260.

Ruppert, D., Reish, R. L., Deriso, R. B. and Carroll, R. J. (1984). Optimization using stochastic approximation and Monte Carlo simulation (with application to harvesting of Atlantic menhaden). Biometrics 40 535–545.

Sebastiani, P. and Settimi, R. (1997). A note on D-optimal designs for a logistic regression model. J. Statist. Plann. Inference 59 359–368. MR1450507

Sebastiani, P. and Settimi, R. (1998). First-order optimal designs for non-linear models. J. Statist. Plann. Inference 74 177–192. MR1665126

Sielken, R. L. (1973). Stopping times for stochastic approximation procedures. Z. Wahrsch. Verw. Gebiete 26 67–75. MR0341781

Silvey, S. D. (1980). Optimal Design. Chapman and Hall, London. MR0606742

Sitter, R. R. (1992). Robust designs for binary data. Biometrics 48 1145–1155. MR1212858

Sitter, R. R. and Forbes, B. E. (1997). Optimal two-stage designs for binary response experiments. Statist. Sinica 7 941–955. MR1488652

Sitter, R. R. and Torsney, B. (1995). Optimal designs for binary response experiments with two design variables. Statist. Sinica 5 405–419. MR1347597

Sitter, R. R. and Wu, C.-F. J. (1993). Optimal designs for binary response experiments: Fieller, D, and A criteria. Scand. J. Statist. 20 329–341. MR1276697

Sitter, R. R. and Wu, C.-F. J. (1999). Two-stage design of quantal response studies. Biometrics 55 396–402.

Smith, D. M. and Ridout, M. S. (1998). Locally and Bayesian optimal designs for binary dose-response models with various link functions. In COMPSTAT 98 (R. Payne and P. Green, eds.) 455–460. Physica, Heidelberg.

Smith, D. M. and Ridout, M. S. (2003). Optimal designs for criteria involving log(potency) in comparative binary bioassays. J. Statist. Plann. Inference 113 617–632. MR1965132

Smith, D. M. and Ridout, M. S. (2005). Algorithms for finding locally and Bayesian optimal designs for binary dose-response models with control mortality. J. Statist. Plann. Inference 133 463–478. MR2205343

Spears, F. M., Brown, B. W. and Atkinson, E. N. (1997). The effect of incomplete knowledge of parameter values on single- and multiple-stage designs for logistic regression. Biometrics 53 1–10.

Steinberg, D. M. (1985). Model robust response surface designs: Scaling two-level factorials. Biometrika 72 513–526. MR0817565

Storer, B. E. (1989). Design and analysis of phase I clinical trials. Biometrics 45 925–937. MR1029610

Storer, B. E. (1990). A sequential phase II/III trial for binary outcomes. Statistics in Medicine 9 229–235.

Stroup, D. F. and Braun, H. I. (1982). On a new stopping rule for stochastic approximation. Z. Wahrsch. Verw. Gebiete 60 535–554. MR0665744

Sun, D., Tsutakawa, R. K. and Lu, W.-S. (1996). Bayesian design of experiment for quantal responses: What’s promised versus what’s delivered. J. Statist. Plann. Inference 52 289–306. MR1401025

Tierney, L. (1990). LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics. Wiley, New York.

Toman, B. and Gastwirth, J. L. (1993). Robust Bayesian experimental design and estimation for analysis of variance models using a class of normal mixtures. J. Statist. Plann. Inference 35 383–398. MR1229251

Toman, B. and Gastwirth, J. L. (1994). Efficiency robust experimental design and estimation using a data-based prior. Statist. Sinica 4 603–615. MR1309431

Tsutakawa, R. K. (1972). Design of experiment for bioassay. J. Amer. Statist. Assoc. 67 584–590.

Tsutakawa, R. K. (1980). Selection of dose levels for estimating a percentage point of a logistic quantal response curve. Appl. Statist. 29 25–33.

Venter, J. H. (1967). An extension of the Robbins–Monro procedure. Ann. Math. Statist. 38 181–190. MR0205396

Wei, C. Z. (1985). Asymptotic properties of least-squares estimates in stochastic regression models. Ann. Statist. 13 1498–1508. MR0811506

Wetherill, G. B. (1963). Sequential estimation of quantal response curves (with discussion). J. Roy. Statist. Soc. Ser. B 25 1–48.

White, L. V. (1973). An extension of the general equivalence theorem to nonlinear models. Biometrika 60 345–348. MR0326948

White, L. V. (1975). The optimal design of experiments for estimation in nonlinear models. Ph.D. dissertation, Univ. London.

Whittle, P. (1973). Some general points in the theory of optimal experimental design. J. Roy. Statist. Soc. Ser. B 35 123–130. MR0356388

Wijesinha, M. C. and Khuri, A. I. (1987a). The sequential generation of multiresponse D-optimal designs when the variance–covariance matrix is not known. Comm. Statist. Simulation Comput. 16 239–259. MR0883130

Wijesinha, M. C. and Khuri, A. I. (1987b). Construction of optimal designs to increase the power of the multiresponse lack of fit test. J. Statist. Plann. Inference 16 179–192. MR0895758

Wu, C.-F. J. (1985). Efficient sequential designs with binary data. J. Amer. Statist. Assoc. 80 974–984. MR0819603


Wu, C.-F. J. (1988). Optimal design for percentile estimation of a quantal response curve. In Optimal Design and Analysis of Experiments (Y. Dodge, V. V. Fedorov and H. P. Wynn, eds.) 213–223. North-Holland, Amsterdam.

Yan, Z. W., Bate, S., Chandler, R. E., Isham, V. and Wheater, H. (2002). An analysis of daily maximum wind speed in northwestern Europe using generalized linear models. J. Climate 15 2073–2088.

Zacks, S. (1977). Problems and approaches in design of experiments for estimation and testing in non-linear models. In Multivariate Analysis IV (P. R. Krishnaiah, ed.) 209–223. North-Holland, Amsterdam. MR0455232

Zhu, W., Ahn, H. and Wong, W. K. (1998). Multiple-objective optimal designs for the logit model. Comm. Statist. A—Theory Methods 27 1581–1592.

Zhu, W. and Wong, W. K. (2001). Bayesian optimal designs for estimating a set of symmetric quantiles. Statistics in Medicine 20 123–137.

Zocchi, S. S. and Atkinson, A. C. (1999). Optimum experimental designs for multinomial logistic models. Biometrics 55 437–444. MR1705110