modelling inpatient length of stay by a hierarchical mixture regression via the em algorithm

MATHEMATICAL

COMPUTER MODELLING

PERGAMON Mathematical and Computer Modelling 37 (2003) 365-375 www.elsevier.com/locate/mcm

Modelling Inpatient Length of Stay by a Hierarchical Mixture Regression via the EM Algorithm

S. K. NG* Department of Mathematics, University of Queensland

Brisbane, QLD 4072, Australia [email protected]

K. K. W. YAU Department of Management Sciences, City University of Hong Kong

Kowloon, Hong Kong

A. H. LEE Department of Epidemiology and Biostatistics

Curtin University of Technology Perth, WA 6845, Australia

(Received December 2001; revised and accepted October 2002)

Abstract: The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, because a certain proportion of patients sustain a longer stay. However, since the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimates. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration. © 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Clustered data, EM algorithm, Generalized linear mixed models, Mixture distribution, Random effects.

1. INTRODUCTION

The modelling of inpatient length of stay (LOS) has important implications in various aspects of health care management, such as hospital management and planning [1], the management of health care resources [2], and the performance in hospital care [3]. For many health care management systems, it is important to develop a comprehensive analysis of LOS and to identify hospital- and patient-related characteristics influencing LOS variations [4,5]. By targeting relevant factors, appropriate policies can be developed to manage hospital care [6] and health care resources effectively [7].

*Author to whom all correspondence should be addressed. The authors wish to thank the referees for helpful comments. The authors are also grateful to the Health Information Center, Health Department of Western Australia, for providing the neonatal length of stay data. This research was supported in part by grants from Curtin University of Technology and the Research Grants Council of Hong Kong.

0895-7177/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. Typeset by AMS-TeX. PII: S0895-7177(03)00012-S

Many hospital casemix schemes, such as diagnosis related groups, are based on the assumption that outliers of LOS (long-stay patients) have different resource consumption patterns from those of inliers. It is, therefore, important to account for the heterogeneity of LOS [8]. Trimming methods currently used are determined arbitrarily and without theoretical support [9]. A mixture distribution approach appears to be a suitable alternative to describe the heterogeneous LOS variable [8,10-12]. For example, Quantin et al. [8] considered a logistic-Weibull mixture model to estimate the proportion of long-stay patients and identify risk factors that specifically explain an increase in the long-stay subgroup proportion. The outcomes of the analysis enable hospitals to justify a disproportionately high burden of long-stay patients. Lee et al. [10], on the other hand, considered a Gamma mixture regression model to identify the risk factors affecting the LOS for each subgroup and to compare their effects between the long-stay subgroup and the reference subgroup (inliers). The information obtained helps to determine appropriate policies in hospital resource utilization management and contributes towards an early prediction of patients who will require longer hospital care.

The problem motivating the current study on modelling of LOS with random effects is that observations collected from the same hospital are often correlated. The dependence of clustered data (patients nested within hospitals) can lead to spurious associations and misleading inferences [13-15]. There has been limited research on hospital discrepancies as a relevant attribute of LOS variations. The aim of this paper is to develop a statistical methodology for analysing heterogeneous LOS data with the adjustment of random hospital variations. The paper addresses the above two issues by proposing a general hierarchical mixture regression approach to assess determinants of the long-stay proportion (Section 3) and risk factors affecting the LOS for each subgroup (Section 4). Taking into account the determinants of an increase in the proportion of long-stay patients would allow the budgetary allocation to be more equitable. At the same time, the outcomes of the analysis assist clinicians and managers in the preparation of prescriptive policies for a better utilization of resources by comparing and contrasting the significant factors between the outliers and inliers subgroups. The model also provides information on inter-hospital variation and estimates of (random) hospital effects. As a result, the relative efficiency of hospitals may be evaluated based on the predicted random effects.

This paper is organized as follows. In Section 2, a statistical modelling framework for LOS data is presented. In Section 3, we describe the methodology to identify determinants of the incidence of long stay. In Section 4, we describe the approach to assess risk factors for the outliers and inliers subgroups. In Section 5, the proposed hierarchical mixture regression approach is illustrated using a neonatal LOS data set of 206 neonates from 23 hospitals in the state of Western Australia. Implications and remarks on the methodology are given in Section 6.

2. STATISTICAL MODELLING FRAMEWORK FOR LOS DATA

Modelling of the distribution of LOS is based on a mixture of normal distributions. The proposed methodological framework, however, is also appropriate if other distributions are adopted; see, for example, [8,10,16]. The choice of a normal distribution is motivated by its simple form and the availability of software. It is well established that the empirical distribution of LOS is positively skewed; a log-transformation of LOS is, therefore, applied to approximate a normal distribution. Such a transformation of LOS has previously been applied to model variations in maternity LOS [14]. The appropriateness of the normal assumption for the logarithm of LOS can be justified using the profile likelihood approach on the Box-Cox transformation [17], as presented in [18].
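As a rough illustration of the profile likelihood check mentioned above, the following sketch evaluates the Box-Cox profile log-likelihood on a grid of transformation parameters. The data are simulated as log-normal purely for illustration (they are not the neonatal data), and the function name `boxcox_profile_loglik` and the grid are assumptions of this sketch. A maximizer near zero would support the log-transformation.

```python
import math
import random

def boxcox_profile_loglik(x, lam):
    """Profile log-likelihood (up to a constant) of the Box-Cox parameter lam."""
    n = len(x)
    logs = [math.log(v) for v in x]
    if abs(lam) < 1e-12:
        z = logs                                   # lam = 0 is the log-transformation
    else:
        z = [(v ** lam - 1.0) / lam for v in x]
    mean_z = sum(z) / n
    var_z = sum((v - mean_z) ** 2 for v in z) / n  # ML variance of transformed data
    return -0.5 * n * math.log(var_z) + (lam - 1.0) * sum(logs)

# Hypothetical LOS values (days), simulated as log-normal for illustration only.
rng = random.Random(0)
los = [math.exp(rng.gauss(2.4, 0.85)) for _ in range(2000)]

grid = [k / 20.0 for k in range(-20, 21)]          # lam from -1 to 1 in steps of 0.05
best_lam = max(grid, key=lambda lam: boxcox_profile_loglik(los, lam))
print("lambda maximizing the profile likelihood:", best_lam)
```

With real data, one would also compare the profile log-likelihood at zero with its maximum to judge whether the log-transformation lies within a plausible range.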

Let $y_{ij}$ denote the log-transformed LOS of the $j$th individual within the $i$th hospital. Denoting by $M$ the number of hospitals and by $n_i$ the number of observations in the $i$th hospital, the total number of observations is, therefore, $N = \sum_{i=1}^{M} n_i$. The probability density function of $y_{ij}$ takes the form of a two-component normal mixture distribution

$$f(y_{ij}) = \pi \phi_1(y_{ij}) + (1 - \pi)\phi_2(y_{ij}), \qquad (1)$$

where $\pi$ denotes the probability of the patient belonging to the first component (short-stay subgroup), and $\phi_g(y)$ is the $g$th component normal density ($g = 1,2$),

$$\phi_g(y_{ij}) = \exp\left\{-\frac{1}{2}\left[\log\left(2\pi\sigma_g^2\right) + \frac{(y_{ij} - \mu_g)^2}{\sigma_g^2}\right]\right\},$$

for $i = 1, \ldots, M$; $j = 1, \ldots, n_i$, where $\mu_g$ and $\sigma_g^2$ represent the mean and variance of the $g$th normal distribution, respectively. In practice, the number of subgroups can be determined a posteriori based on various criteria [19-21], using existing software such as EMMIX [22]. It is noted that the estimation of the unknown parameters on the basis of the observations $y_{ij}$ is only meaningful if the parameters are identifiable. Titterington et al. [23, Section 3.1] have given a lucid account of the concept of identifiability for mixtures. They pointed out that most finite mixtures of continuous densities, such as mixtures of multivariate normals, are identifiable; an exception is a mixture of uniform densities; see also the discussion in [22, Section 1.14].
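To make model (1) concrete, the short sketch below evaluates the two-component density using the two-component estimates reported later in Table 1 and checks numerically that it integrates to about one. The function names and the integration range are assumptions of this sketch.

```python
import math

def normal_pdf(y, mu, var):
    """Normal density written in the same form as the component density above."""
    return math.exp(-0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var))

def mixture_pdf(y, pi, mu1, var1, mu2, var2):
    """Two-component normal mixture density of model (1)."""
    return pi * normal_pdf(y, mu1, var1) + (1.0 - pi) * normal_pdf(y, mu2, var2)

# Two-component estimates reported in Table 1 (log-LOS scale).
PI, MU1, VAR1, MU2, VAR2 = 0.640, 1.993, 0.636, 3.118, 0.097

# Riemann sum over [-5, 10], a range chosen to cover both components comfortably.
step = 0.001
total = sum(mixture_pdf(-5.0 + k * step, PI, MU1, VAR1, MU2, VAR2) * step
            for k in range(15000))
print("approximate integral of the mixture density:", round(total, 4))
```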

3. IDENTIFICATION OF RISK FACTORS OF LONG-STAY SUBGROUP PROPORTION

As pointed out in Section 1, disparities in hospital- and patient-related characteristics may lead to the formation of a long-stay subgroup, whose proportion may differ from one hospital to another. The advantage of using a mixture model is that it allows a hospital to estimate its proportion of the long-stay subgroup and to compare this with the national figures without arbitrarily defining the limit for long-stay subgroups [8]. This issue is very important because the comparison of proportions for the long-stay subgroup can determine financial decisions and provide hospitals with justification for long-stay patients. The comparison can also be taken into consideration for the assessment of potential interventions or the allocation of resources.

This section describes the identification of determinants for the proportion of the long-stay subgroup, adjusting for random hospital variations. Let $x_{ij}$ be a vector of covariates associated with $y_{ij}$. With respect to model (1), the proportion of each subgroup is now expressed in terms of these covariates by assuming $\pi$ to be a logistic function of $x_{ij}$. Random hospital effects $V_i$ are then adjusted through the linear predictor, viz.

$$\pi(x_{ij}) = \frac{\exp(\xi_{ij})}{1 + \exp(\xi_{ij})}, \qquad \xi_{ij} = w_{ij}^T\gamma + V_i \quad (i = 1, \ldots, M;\ j = 1, \ldots, n_i), \qquad (2)$$

where the superscript $T$ denotes vector transpose, $w_{ij} = [1 \;\; x_{ij}^T]^T$, and $\gamma$ is a vector of logistic regression coefficients. Here, $V_i$ represents the unobservable random effect due to the $i$th hospital, and is taken to be i.i.d. $N(0, \lambda)$. With (2), risk factors that significantly affect the proportion of long-stay patients can be identified.
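The effect of the random hospital intercept in (2) can be seen in a small numerical sketch. The coefficient values, covariates, and hospital effects below are hypothetical and chosen only to show how the mixing proportion shifts with $V_i$.

```python
import math

def mixing_proportion(x, gamma, v_i):
    """pi(x_ij) from (2): logistic function of xi_ij = w_ij' gamma + V_i, with w_ij = [1, x_ij]."""
    w = [1.0] + list(x)
    xi = sum(w_l * g_l for w_l, g_l in zip(w, gamma)) + v_i
    return 1.0 / (1.0 + math.exp(-xi))             # equals exp(xi) / (1 + exp(xi))

gamma = [0.5, -0.3, -0.2]     # hypothetical: intercept and two covariate effects
x_ij = [2.0, 1.0]             # hypothetical covariate values for one patient
for v_i in (-0.5, 0.0, 0.5):  # hypothetical hospital effects
    print(v_i, round(mixing_proportion(x_ij, gamma, v_i), 3))
```

Two patients with identical covariates can therefore have different short-stay probabilities solely because they were treated in different hospitals.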

Adopting the generalized linear mixed model (GLMM) method, the estimation procedure is adapted from McGilchrist [24] and McGilchrist and Yau [25]. The method commences with the best linear unbiased predictor (BLUP) and extends to obtain residual maximum quasi-likelihood (REMQL) estimators. We denote by $\psi$ the vector of unknown parameters $\gamma$, $v$, $\mu_g$, and $\sigma_g^2$ ($g = 1,2$), where $v^T = [V_1, V_2, \ldots, V_M]$. For given initial values of $\lambda$, the BLUP estimators of $\psi$ maximize $l = l_1 + l_2$, where

$$l_1 = \sum_{i=1}^{M}\sum_{j=1}^{n_i} \log f(y_{ij}), \qquad l_2 = -\frac{1}{2}\left[M\log(2\pi\lambda) + \lambda^{-1}v^Tv\right].$$

The BLUP estimate of $\psi$ is obtained as a solution of the equation $\partial l/\partial\psi = 0$, which can be solved via the EM algorithm [26]. The EM algorithm has a number of desirable properties, including its simplicity of implementation and reliable global convergence [27, Section 1.7]. That is, it monotonically increases the likelihood after each iteration and converges to a local maximum of the likelihood function. The global maximum can then be located by applying the EM algorithm from different starting values [22, Section 2.12].
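The monotonicity property and the use of several starting values can be illustrated with a deliberately simplified sketch: a plain two-component normal mixture without covariates or random effects, fitted to simulated data. This is not the hierarchical procedure of this paper; the data, the starting values, and the function names are assumptions made for illustration.

```python
import math
import random

def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_lik(y, pi, mu, var):
    return sum(math.log(pi * normal_pdf(v, mu[0], var[0])
                        + (1.0 - pi) * normal_pdf(v, mu[1], var[1])) for v in y)

def em_fit(y, mu_start, n_iter=300, tol=1e-10):
    """Plain EM for a two-component normal mixture; returns estimates and the log-likelihood trace."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    pi, mu, var = 0.5, list(mu_start), [var_y, var_y]
    trace = [log_lik(y, pi, mu, var)]
    for _ in range(n_iter):
        # E-step: posterior probability of the first component
        a = [pi * normal_pdf(v, mu[0], var[0]) for v in y]
        b = [(1.0 - pi) * normal_pdf(v, mu[1], var[1]) for v in y]
        tau = [ai / (ai + bi) for ai, bi in zip(a, b)]
        # M-step: weighted proportion, means, and variances
        s1 = sum(tau)
        s2 = n - s1
        pi = s1 / n
        mu = [sum(t * v for t, v in zip(tau, y)) / s1,
              sum((1.0 - t) * v for t, v in zip(tau, y)) / s2]
        var = [sum(t * (v - mu[0]) ** 2 for t, v in zip(tau, y)) / s1,
               sum((1.0 - t) * (v - mu[1]) ** 2 for t, v in zip(tau, y)) / s2]
        trace.append(log_lik(y, pi, mu, var))
        if abs(trace[-1] - trace[-2]) < tol:
            break
    return pi, mu, var, trace

# Simulated log-LOS values (hypothetical, not the neonatal data).
rng = random.Random(1)
y = sorted([rng.gauss(2.0, 0.8) for _ in range(130)]
           + [rng.gauss(3.1, 0.3) for _ in range(76)])

# Three deterministic starting pairs for the component means, taken from order statistics.
starts = [(y[20], y[185]), (y[51], y[154]), (y[150], y[60])]
fits = [em_fit(y, s) for s in starts]
best = max(fits, key=lambda f: f[3][-1])
print("best final log-likelihood:", round(best[3][-1], 3))
```

Each trace is non-decreasing, while the final values may differ between starts; keeping the fit with the largest final log-likelihood mirrors the multiple-start strategy described above.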

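On the $(k+1)$th iteration, the E-step of the EM algorithm involves the calculation of the Q-function,

$$Q\left(\psi; \psi^{(k)}\right) = \sum_{i=1}^{M}\sum_{j=1}^{n_i}\left\{\tau_{ij}^{(k)}\left[\log\pi(x_{ij}) + \log\phi_1(y_{ij})\right] + \left(1 - \tau_{ij}^{(k)}\right)\left[\log\left(1 - \pi(x_{ij})\right) + \log\phi_2(y_{ij})\right]\right\} + l_2,$$

where

$$\tau_{ij}^{(k)} = \frac{\pi^{(k)}(x_{ij})\,\phi_1\left(y_{ij}; \mu_1^{(k)}, \sigma_1^{2(k)}\right)}{\pi^{(k)}(x_{ij})\,\phi_1\left(y_{ij}; \mu_1^{(k)}, \sigma_1^{2(k)}\right) + \left(1 - \pi^{(k)}(x_{ij})\right)\phi_2\left(y_{ij}; \mu_2^{(k)}, \sigma_2^{2(k)}\right)}$$

is the current estimated posterior probability that $y_{ij}$ belongs to the first subgroup ($g = 1$). From the practical health care (hospital) management point of view, it will be useful to identify patients in the outliers subgroup ($g = 2$), so that further studies can be focused on this subgroup. The classification strategy is to assign each patient to the component of the mixture to which it has the highest posterior probability of belonging [22, Section 1.15]. That is, a patient is classified into the first group if the final estimated posterior probability $\hat{\tau}_{ij} > 0.5$; otherwise, he/she is classified into the second group. The M-step of the EM algorithm provides the updated estimate $\psi^{(k+1)}$ that maximizes the Q-function with respect to $\psi$. It follows that the M-step involves solving the nonlinear equations

$$\sum_{i=1}^{M}\sum_{j=1}^{n_i}\left(\tau_{ij}^{(k)} - \pi(x_{ij})\right)w_{ij,l} = 0 \quad \text{for } \gamma_l \ (l = 1, \ldots, d+1),$$

$$\sum_{j=1}^{n_i}\left(\tau_{ij}^{(k)} - \pi(x_{ij})\right) - \lambda^{-1}V_i = 0 \quad \text{for } V_i \ (i = 1, \ldots, M),$$

and the following closed-form equations for $\mu_g$ and $\sigma_g^2$ ($g = 1,2$):

$$\mu_g^{(k+1)} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{n_i}\tau_{gij}^{(k)}\,y_{ij}}{\sum_{i=1}^{M}\sum_{j=1}^{n_i}\tau_{gij}^{(k)}}, \qquad \sigma_g^{2(k+1)} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{n_i}\tau_{gij}^{(k)}\left(y_{ij} - \mu_g^{(k+1)}\right)^2}{\sum_{i=1}^{M}\sum_{j=1}^{n_i}\tau_{gij}^{(k)}},$$

with $\tau_{1ij}^{(k)} = \tau_{ij}^{(k)}$ and $\tau_{2ij}^{(k)} = 1 - \tau_{ij}^{(k)}$, where $d$ is the number of covariates and $w_{ij,l}$ is the $l$th element of the vector $w_{ij}$. The MINPACK routine HYBRD1 [28] is adopted to find a solution to the system of nonlinear equations for $\gamma$ and $v$. The REMQL estimate of the variance component $\lambda$ is then obtained as follows. Denote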

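$$B = \begin{bmatrix} W^T \\ Z^T \end{bmatrix}\left[-\frac{\partial^2 l_1}{\partial\xi\,\partial\xi^T}\right]\begin{bmatrix} W & Z \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & \lambda^{-1}I_M \end{bmatrix}, \qquad \xi = W\gamma + Zv,$$

where $W$ and $Z$ are the design matrices of $\gamma$ and $v$, respectively, and $I_M$ is an identity matrix of dimension $M$. Matrix $B$ and its inverse are partitioned conformally to $\gamma \mid v$ as

$$B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}, \qquad B^{-1} = A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$

Using results in McGilchrist and Yau [25], we have

$$\hat{\lambda} = M^{-1}\left(\operatorname{tr}A_{22} + \hat{v}^T\hat{v}\right),$$

$$\operatorname{var}(\hat{\gamma}) = A_{11},$$

$$\operatorname{var}(\hat{\lambda}) = 2\left[\lambda^{-2}\left(M - 2\lambda^{-1}\operatorname{tr}A_{22}\right) + \lambda^{-4}\operatorname{tr}\left(A_{22}^2\right)\right]^{-1}.$$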
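Returning to the nonlinear M-step system for $\gamma$ and $v$, the sketch below solves a system of that kind with SciPy's `root` using `method="hybr"`, which SciPy documents as the MINPACK modified Powell hybrid method (the family to which HYBRD1 belongs). It assumes the score equations take the penalized logistic form implied by (1) and (2), namely $\sum_i\sum_j(\tau_{ij} - \pi(x_{ij}))w_{ij,l} = 0$ and $\sum_j(\tau_{ij} - \pi(x_{ij})) - V_i/\lambda = 0$. The data, the posterior probabilities `tau`, and the value of `lam` are simulated placeholders, so this is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import root

def m_step_equations(params, W, hosp, tau, lam, n_hosp):
    """Score equations for gamma (first W.shape[1] entries) and v (remaining n_hosp entries)."""
    p = W.shape[1]
    gamma, v = params[:p], params[p:]
    xi = W @ gamma + v[hosp]                          # linear predictor of (2)
    resid = tau - 1.0 / (1.0 + np.exp(-xi))           # tau_ij - pi(x_ij)
    eq_gamma = W.T @ resid
    eq_v = np.bincount(hosp, weights=resid, minlength=n_hosp) - v / lam
    return np.concatenate([eq_gamma, eq_v])

rng = np.random.default_rng(0)
n_hosp, n = 5, 200
hosp = rng.integers(0, n_hosp, size=n)                # hypothetical hospital labels
W = np.column_stack([np.ones(n), rng.normal(size=n)]) # intercept plus one covariate
tau = rng.uniform(0.05, 0.95, size=n)                 # stand-in for E-step posteriors
lam = 0.5                                             # assumed current variance component

sol = root(m_step_equations, x0=np.zeros(W.shape[1] + n_hosp),
           args=(W, hosp, tau, lam, n_hosp), method="hybr")
gamma_hat, v_hat = sol.x[:2], sol.x[2:]
print(sol.success, float(np.abs(sol.fun).max()))
```

The penalty term $V_i/\lambda$ is what keeps the intercept and the hospital effects jointly estimable, since without it the hospital indicators would be collinear with the intercept.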

The significance of each parameter $\gamma_l$ ($l = 1, \ldots, d+1$) associated with a covariate on the proportion of long-stay patients can be determined based on the standard error of the estimate. The odds ratio can then be calculated for each significant factor to estimate the associated relative risk of belonging to the long-stay subgroup. The estimated random hospital effects $V_i$ and their standard errors provide useful information as to whether hospitals differ significantly in the proportions of the long-stay subgroup, after adjustment for patient case mix. Since $\pi$ in (2) is the probability of belonging to the short-stay component, a negative significant random effect indicates a higher proportion of long-stay patients in a hospital. The relative loading of hospitals may be compared based on these predicted random hospital effects. The result improves the forecast of future bed requirements to aid hospital care planning. It also enables hospitals to justify a disproportionately high burden of long-stay patients and provides additional insights towards a global budget allocation system.

4. IDENTIFICATION OF RISK FACTORS ON LOS FOR OUTLIERS AND INLIERS SUBGROUPS

The identification of risk factors that affect the LOS for each subgroup is particularly useful for clinicians and managers in the preparation of prescriptive policies for a better utilization of resources. By comparing and contrasting the significant factors such as socio-economic status between the outliers and inliers subgroups, appropriate strategic planning and financial negotiation can be prescribed to improve economic efficiency. Moreover, early identification of patients at high risk of prolonged LOS will also allow physicians to treat those patients more aggressively, and permit their families to estimate the costs of long-term hospital care and predict the scheduled day of discharge. In this section, we describe how model (1) can be used to assess the risk factors of each subgroup, adjusting for random hospital variations.

As motivated by Leung et al. [14], the probability density function for each subgroup is modelled by a normal distribution with mean expressed in terms of the covariates $x_{ij}$ as

$$\mu_g(x_{ij}) = w_{ij}^T\beta_g + U_{gi} \quad (i = 1, \ldots, M;\ j = 1, \ldots, n_i;\ g = 1,2), \qquad (3)$$

where $\beta_g$ is the vector of regression coefficients and $U_{gi}$ represents the unobservable random effect of the $i$th hospital on the $g$th component mean, and is taken to be i.i.d. $N(0, \theta_g)$. With (3), risk factors that significantly influence the LOS of the long-stay patients subgroup can be identified. Under this formulation, the vector of unknown parameters becomes $\psi = (\pi, \beta_g^T, u_g^T, \sigma_g^2)$ for $g = 1,2$, where $u_1^T = [U_{11}, U_{12}, \ldots, U_{1M}]$ and $u_2^T = [U_{21}, U_{22}, \ldots, U_{2M}]$. For given initial values of $\theta_g$ ($g = 1,2$), the BLUP estimators of $\psi$ maximize $l = l_1 + l_2$, where

$$l_1 = \sum_{i=1}^{M}\sum_{j=1}^{n_i}\log f(y_{ij}), \qquad l_2 = -\frac{1}{2}\sum_{g=1}^{2}\left[M\log(2\pi\theta_g) + \theta_g^{-1}u_g^Tu_g\right].$$

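The BLUP estimate of $\psi$ via the EM algorithm is obtained in a similar manner as in Section 3. On the $(k+1)$th iteration, the E-step of the EM algorithm calculates the Q-function

$$Q\left(\psi; \psi^{(k)}\right) = \sum_{i=1}^{M}\sum_{j=1}^{n_i}\left\{\tau_{ij}^{(k)}\left[\log\pi + \log\phi_1(y_{ij}; x_{ij}, \psi)\right] + \left(1 - \tau_{ij}^{(k)}\right)\left[\log(1 - \pi) + \log\phi_2(y_{ij}; x_{ij}, \psi)\right]\right\} + l_2,$$

where

$$\tau_{ij}^{(k)} = \frac{\pi^{(k)}\phi_1\left(y_{ij}; x_{ij}, \psi^{(k)}\right)}{\pi^{(k)}\phi_1\left(y_{ij}; x_{ij}, \psi^{(k)}\right) + \left(1 - \pi^{(k)}\right)\phi_2\left(y_{ij}; x_{ij}, \psi^{(k)}\right)}.$$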

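The M-step involves solving the nonlinear equations

$$\sum_{i=1}^{M}\sum_{j=1}^{n_i}\tau_{gij}^{(k)}\left(y_{ij} - \mu_g(x_{ij})\right)w_{ij,l} = 0 \quad \text{for } \beta_{gl} \ (g = 1,2;\ l = 1, \ldots, d+1),$$

$$\sum_{j=1}^{n_i}\frac{\tau_{gij}^{(k)}\left(y_{ij} - \mu_g(x_{ij})\right)}{\sigma_g^2} - \frac{U_{gi}}{\theta_g} = 0 \quad \text{for } U_{gi} \ (g = 1,2;\ i = 1, \ldots, M),$$

where $\tau_{1ij}^{(k)} = \tau_{ij}^{(k)}$ and $\tau_{2ij}^{(k)} = 1 - \tau_{ij}^{(k)}$, and the following closed-form equation for $\pi$:

$$\pi = \frac{\sum_{i=1}^{M}\sum_{j=1}^{n_i}\tau_{ij}^{(k)}}{N}.$$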
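The E-step posterior probabilities and the closed-form update for $\pi$ can be sketched in a few lines. The observations and the component means $\mu_g(x_{ij})$ below are hypothetical; the mixing proportion and variances are borrowed from Table 1 only to give plausible magnitudes. The last line applies the 0.5 classification rule described in Section 3.

```python
import math

def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(y, mu1, mu2, pi, var1, var2):
    """Posterior probability tau_ij that each observation belongs to the first component."""
    tau = []
    for y_ij, m1, m2 in zip(y, mu1, mu2):
        a = pi * normal_pdf(y_ij, m1, var1)
        b = (1.0 - pi) * normal_pdf(y_ij, m2, var2)
        tau.append(a / (a + b))
    return tau

# Hypothetical log-LOS values with observation-specific component means mu_g(x_ij).
y = [1.1, 2.0, 2.9, 3.2, 3.4]
mu1 = [1.8, 2.0, 2.1, 2.0, 2.2]
mu2 = [3.0, 3.1, 3.1, 3.2, 3.3]
tau = e_step(y, mu1, mu2, pi=0.64, var1=0.636, var2=0.097)

pi_new = sum(tau) / len(tau)                    # closed-form M-step update for pi
groups = [1 if t > 0.5 else 2 for t in tau]     # assign to the more probable component
print([round(t, 3) for t in tau], round(pi_new, 3), groups)
```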

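The REMQL estimates of the variance components $\theta_1$ and $\theta_2$ are obtained as follows. Denote by $B$ the matrix defined analogously to that of Section 3, where $\mu_g = W\beta_g + Zu_g$ ($g = 1,2$). Matrix $B$ and its inverse $A = B^{-1}$ are partitioned conformally to $\beta_1 \mid \beta_2 \mid u_1 \mid u_2$ as $A = [A_{rs}]$ ($r, s = 1, \ldots, 4$). Analogously to Section 3, $\hat{\theta}_1 = M^{-1}\left(\operatorname{tr}A_{33} + \hat{u}_1^T\hat{u}_1\right)$ and $\hat{\theta}_2 = M^{-1}\left(\operatorname{tr}A_{44} + \hat{u}_2^T\hat{u}_2\right)$, with

$$\operatorname{var}\begin{bmatrix}\hat{\theta}_1 \\ \hat{\theta}_2\end{bmatrix} = 2\begin{bmatrix}\theta_1^{-2}\left(M - 2\theta_1^{-1}\operatorname{tr}A_{33}\right) + \theta_1^{-4}\operatorname{tr}\left(A_{33}^2\right) & \theta_1^{-2}\theta_2^{-2}\operatorname{tr}\left(A_{34}A_{43}\right) \\ \theta_1^{-2}\theta_2^{-2}\operatorname{tr}\left(A_{34}A_{43}\right) & \theta_2^{-2}\left(M - 2\theta_2^{-1}\operatorname{tr}A_{44}\right) + \theta_2^{-4}\operatorname{tr}\left(A_{44}^2\right)\end{bmatrix}^{-1}.$$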


The significance of each parameter $\beta_{gl}$ ($g = 1,2$; $l = 1, \ldots, d+1$) associated with a covariate on the LOS for each subgroup can be determined using the standard error of the estimate. The significant factors mainly responsible for the variation of LOS in the outliers and inliers subgroups can then be compared. This information on different sets of significant determinants for each subgroup provides useful guidance to hospitals with respect to economic decisions such as budgeting expenditures in targeted groups. Moreover, it provides indications to clinicians on how their medical practice influences LOS [29] and to managers on how a hospital care strategy impacts on the LOS. The estimated random hospital effects $U_{gi}$ and their standard errors provide useful information as to whether there is significant hospital variation in LOS for each subgroup, after adjustment for patient case mix. A positive significant random effect indicates a prolonged LOS in a hospital. As a result, the relative efficiency of hospitals may be evaluated based on these predicted random hospital effects. This provides additional insights towards performance-based policy planning.

5. APPLICATION TO NEONATAL LOS DATA

This study is based on the 1998/1999 data for 206 neonates from 23 hospitals in the state of Western Australia. The outcome variable (neonatal LOS) is defined to be the number of days from admission to discharge. For this sample, the LOS ranged from one to 46 days. In addition, concomitant information on the number of diagnoses, number of procedures, indigenous status (non-Aboriginal/Aboriginal), and admission type (elective/emergency) is also available. These variables are considered as potential determinants of neonatal LOS.

Based on the various statistical criteria outlined in Section 2, the application of mixture models analysis identified two subgroups for neonatal LOS using the EMMIX program [22]. Table 1 displays the results for these selection criteria and the maximum likelihood estimates of the parameters. The two-component normal mixture distribution is found to provide a better fit than the single-density function, yet its fit cannot be enhanced by adding an extra component to the model. The estimated proportion of the long-stay subgroup is approximately 36%.
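The AIC and BIC values reported in Table 1 can be reproduced from the maximized log-likelihoods. The sketch assumes the standard definitions $\mathrm{AIC} = -2\log L + 2p$ and $\mathrm{BIC} = -2\log L + p\log N$ with $p = 3g - 1$ free parameters for a $g$-component univariate normal mixture ($g$ means, $g$ variances, $g - 1$ mixing proportions) and $N = 206$.

```python
import math

N = 206   # number of neonates
# (number of components g, maximized log-likelihood) as reported in Table 1
fits = [(1, -260.340), (2, -244.306), (3, -241.380)]

def n_params(g):
    """g means + g variances + (g - 1) free mixing proportions."""
    return 3 * g - 1

criteria = []
for g, loglik in fits:
    p = n_params(g)
    aic = -2.0 * loglik + 2.0 * p
    bic = -2.0 * loglik + p * math.log(N)
    criteria.append((g, round(aic, 1), round(bic, 1)))
    print(g, round(aic, 1), round(bic, 1))
# prints:
# 1 524.7 531.3
# 2 498.6 515.3
# 3 498.8 525.4
```

Both criteria are smallest for two components, which agrees with the selection marked in Table 1.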

Table 1. Normal mixture distribution fits.

| Number of components $g$ | Estimates | Log-likelihood | AIC | BIC | p-value¹ for $g$ (versus $g + 1$) |
|---|---|---|---|---|---|
| 1 | $\pi_1 = 1.000$; $\mu_1 = 2.398$, $\sigma_1^2 = 0.733$ | -260.340 | 524.7 | 531.3 | 0.01 |
| 2 | $\pi_1 = 0.640$, $\pi_2 = 0.360$; $\mu_1 = 1.993$, $\sigma_1^2 = 0.636$; $\mu_2 = 3.118$, $\sigma_2^2 = 0.097$ | -244.306 | 498.6* | 515.3* | |
| 3 | $\pi_1 = 0.066$, $\pi_2 = 0.552$, $\pi_3 = 0.382$; $\mu_1 = 0.563$, $\sigma_1^2 = 0.227$; $\mu_2 = 2.097$, $\sigma_2^2 = 0.353$; $\mu_3 = 3.149$, $\sigma_3^2 = 0.093$ | -241.380 | 498.8 | 525.4 | |

¹ based on 99 bootstrap replications
* selected number of components

5.1. Risk Factors for the Proportion of Long-Stay Subgroup

The results from fitting the hierarchical logistic mixture model of Section 3 are summarized in Table 2. The number of diagnoses, the number of procedures, and Aboriginality are significant determinants of the long-stay probability. The number of diagnoses and number of procedures are related to the complications and co-morbidities that the neonate had during hospitalization, with a larger number indicating a more severe condition and consequently an increased long-stay subgroup proportion [12]. Adjusted odds ratios can also be calculated to estimate the associated relative risk of belonging to the long-stay subgroup. For example, relative to non-Aboriginal patients, the odds of belonging to the long-stay subgroup increased about five-fold ($\exp(1.638)$) for Aboriginal patients. This is partly because hospital doctors tend to keep the child (and possibly the mother) longer to ensure drug compliance or treatment of other illness, and partly due to logistical problems in arranging transport back to their rural/remote home settlements [7].
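The five-fold figure quoted above is simply the exponential of the reported coefficient. The sketch below checks that arithmetic and, as an illustration only, shows how such an odds ratio would shift a probability; the baseline probability of 0.36 is an assumption borrowed from the overall long-stay proportion and is not a model-based estimate for any patient group.

```python
import math

coef_aboriginal = 1.638                       # log-odds coefficient quoted in the text
odds_ratio = math.exp(coef_aboriginal)
print("odds ratio:", round(odds_ratio, 2))    # about 5.14

def shift_probability(p_baseline, odds_ratio):
    """Apply an odds ratio to a baseline probability and convert back to a probability."""
    odds = p_baseline / (1.0 - p_baseline) * odds_ratio
    return odds / (1.0 + odds)

p_shifted = shift_probability(0.36, odds_ratio)   # hypothetical baseline of 0.36
print("shifted probability:", round(p_shifted, 3))
```

An odds ratio of about five does not mean a five-fold probability; the change in probability depends on the baseline.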

The estimate of the variance component $\lambda$ is not significant, indicating that the variation in LOS is not attributable to differences between hospitals. The identification and quantification of risk factors, after adjusting for the random hospital effects, if any, are important in hospital care management (such as discharge planning) as well as hospital resource allocation.

Table 2. Summary of hierarchical logistic mixture model fit.

* p - value < 0.05

5.2. Risk Factors on LOS for the Outliers and Inliers Subgroups

The results from fitting the hierarchical mixture regression model of Section 4 are summarized in Table 3. As shown in the current application, the sets of significant risk factors can differ between the inliers and outliers subgroups. For the inliers subgroup (first component), it is found that a longer stay is typically sustained by those patients with a larger number of diagnoses and a larger number of procedures. On the other hand, for the long-stay subgroup (second component), emergency-admitted Aboriginal patients with a larger number of diagnoses appear to have prolonged hospitalization. Estimates of the variance components $\theta_1$ and $\theta_2$ are both significantly nonzero, implying that the variation in LOS is partially due to the differences among providers. The identification of risk factors, after adjusting for the random hospital effects, provides useful information for clinicians on how their medical practice influences LOS. Moreover, it helps hospital care managers in the preparation of prescriptive policies for a better utilization of resources by comparing the significant factors between the outliers and inliers subgroups.

The quantification of inter-hospital variation by the estimated random effects provides additional insights for state health authorities to assess hospital efficiency and strategic funding policy. Those hospital characteristics affecting the LOS for both outliers and inliers subgroups suggest that state funding allocation should provide the incentives to measure and control for such differences, in order to maintain equity of resource distribution between hospitals.
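The significance flags in Table 3 can be checked from the reported estimates and standard errors. The sketch assumes a two-sided Wald test with a standard normal reference, which is an assumption of this illustration, since the text states only that significance is judged from the standard error. Only entries whose digits are legible in Table 3 are used.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided p-value for H0: coefficient = 0, using a standard normal reference."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))

# (label, estimate, standard error) copied from Table 3
rows = [
    ("diagnoses, first component", 0.117, 0.034),
    ("procedures, second component", 0.029, 0.018),
    ("Aboriginal, first component", 0.231, 0.151),
    ("Aboriginal, second component", 0.328, 0.045),
    ("emergency, first component", -0.211, 0.139),
    ("emergency, second component", 0.206, 0.051),
]
flags = {}
for label, est, se in rows:
    p = wald_p_value(est, se)
    flags[label] = p < 0.05
    print(f"{label}: z = {est / se:.2f}, p = {p:.4f}, significant = {p < 0.05}")
```

Under this assumption the flags agree with the asterisks in Table 3: the Aboriginal and emergency effects are significant only in the second component.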

Table 3. Summary of hierarchical mixture regression fit.

| Variable | Reference category | First component: estimate (S.E.) | Second component: estimate (S.E.) |
|---|---|---|---|
| (Intercept) | | 1.525* (0.157) | 2.311* (0.100) |
| Number of diagnoses | | 0.117* (0.034) | 0.02&3* (0.013) |
| Number of procedures | | o.ok33* (0.041) | 0.029 (0.018) |
| Indigenous status: Aboriginal | Non-Aboriginal | 0.231 (0.151) | 0.328* (0.045) |
| Admission type: Emergency | Elective | -0.211 (0.139) | 0.206* (0.051) |
| Variance component | | 0.223* (0.110) | 0.165* (0.057) |

* p-value < 0.05

6. DISCUSSION

We have developed a general hierarchical mixture regression approach for a comprehensive analysis of LOS data, with the adjustment of random hospital variations. Unlike the linear mixed model of Leung et al. [14], arbitrary trimming of the data is unnecessary, nor is the number of LOS subgroups defined a priori. The study highlights significant heterogeneity in the LOS and significant hospital effects for a clustered sample (patients nested within hospitals). It demonstrates how the random hospital effects can be adjusted within a mixture-modelling framework, which is a viable approach for modelling mixture distributions with random effects. Indeed, the proposed approach extends the mixture model of Quantin et al. [8] for identifying determinants of the long-stay proportion to account for inter-hospital variations (refer to Section 3). By targeting relevant risk factors on the proportion of long-stay patients, hospitals can manage the health care resources and the budgetary allocation effectively based on the hospital- and patient-related characteristics. At the same time, it shows how the mixture models approach of Xiao et al. [12] and Lee et al. [10] can be extended to adjust for the random hospital effects to identify risk factors on the LOS of the long-stay patients subgroup (refer to Section 4). The findings will enable hospitals to develop appropriate strategic planning and after-care policies.

In the paper, a mixture model of normal distributions is used. The layout of the methodology should be sufficiently clear for the development of a general mixture model with other distributions (such as Weibull) adopted for each subgroup. In Sections 3 and 4, we highlight the estimation procedure of the GLMM using the BLUP approach to derive the residual maximum quasi-likelihood (REMQL) estimators. The EM algorithm is applied to obtain the parameter estimates. Alternatively, other procedures such as the Bayesian approach using Gibbs sampling may be adopted. The Gibbs sampler is a Markov Chain Monte Carlo (MCMC) method for estimating the posterior distribution of interest [22, Chapter 4; 30]. With the Gibbs sampling approach, a random variate is generated directly from the conditional distribution of a subvector of $\psi$ given all the other parameters in $\psi$ and the observed data $y_{ij}$. The process is then cycled through all the parameters iteratively, each time drawing from the corresponding conditional distribution. Under mild regularity conditions, the joint distribution converges to the desired posterior distribution [31]. A detailed description of the Gibbs sampler can be found in [32; 33, Chapter 6], while a Bayesian formulation of the GLMM is also available in the literature [34].
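The cycling scheme described above can be illustrated on a toy target that has nothing to do with the GLMM posterior: a standard bivariate normal with correlation $\rho$, whose two full conditionals are univariate normals. The target, the function name, and the tuning values are assumptions chosen only because the conditionals are available in closed form.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_draws, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)       # conditional standard deviation
    a, b = 0.0, 0.0
    draws = []
    for t in range(n_draws + burn_in):
        a = rng.gauss(rho * b, sd)       # draw a from its conditional given b
        b = rng.gauss(rho * a, sd)       # draw b from its conditional given the new a
        if t >= burn_in:
            draws.append((a, b))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
n = len(draws)
mean_a = sum(a for a, _ in draws) / n
mean_b = sum(b for _, b in draws) / n
cov = sum((a - mean_a) * (b - mean_b) for a, b in draws) / n
var_a = sum((a - mean_a) ** 2 for a, _ in draws) / n
var_b = sum((b - mean_b) ** 2 for _, b in draws) / n
corr = cov / math.sqrt(var_a * var_b)
print("sample correlation of the draws:", round(corr, 3))
```

In the mixture setting, the subvectors cycled through would instead be blocks such as the component labels, the regression coefficients, the random effects, and the variance components, each drawn from its own conditional distribution.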

Although the focus of this paper is on the hospital as the unit of clustering, the methodology can be extended to analyze variations in other settings. For example, patients may be nested under different health regions or local districts within the state. Such a hierarchical analysis is of interest to state health authorities to assess variations in service consumption between different population groups. Similarly, with appropriate modifications, the method is also applicable to the analysis of longitudinal data, where repeated episodes of some events such as discharge and readmissions [35] for each patient are monitored.


REFERENCES

1. G. Riley, J. Lubitz, M. Gornick et al., Medicare beneficiaries: Adverse outcomes after hospitalization for eight procedures, Medical Care 31 (10), 921-949, (1993).
2. C. Beaver, Y. Zhao, S. McDermind and D. Hindle, Casemix-based funding of Northern Territory public hospitals: Adjusting for severity and socio-economic variations, Health Economics 7 (1), 53-61, (1998).
3. A.H. Leyland and F.A. Boddy, Measuring performance in hospital care: Length of stay in gynaecology, European Journal of Public Health 7 (2), 136-143, (1997).
4. R. Rental, M.J. Kiess, S. DesHarnais and K. Reutter, Applications for risk-adjusted outcome measures, Quality Assurance in Health Care 3, 283-292, (1991).
5. J. Xiao, D. Douglas, A.H. Lee and S.R. Vemuri, A Delphi evaluation of the factors influencing length of stay in Australian hospitals, International Journal of Health Planning and Management 12 (3), 207-218, (1997).
6. D. Conrad, T. Wickizer, C. Maynard et al., Managing care, incentives, and information: An explanatory look inside the “Black Box” of hospital efficiency, Health Services Research 31 (3), 235-259, (1996).
7. A.H. Lee and J. Codde, Determinants of length of stay: Implications on differential funding for rural and metropolitan hospitals, Australian Health Review 23 (4), 126-133, (2000).
8. C. Quantin, E. Sauleau, P. Bolard et al., Modeling of high-cost patient distribution within renal failure diagnosis related group, Journal of Clinical Epidemiology 52 (3), 251-258, (1999).
9. A.H. Lee, J. Xiao, S.R. Vemuri and Y. Zhao, A discordancy test approach to identify outliers of length of hospital stay, Statistics in Medicine 17 (19), 2199-2206, (1998).
10. A.H. Lee, A.S.K. Ng and K.K.W. Yau, Determinants of maternity length of stay: A gamma mixture risk-adjusted model, Health Care Management Science 4 (4), 249-255, (2001).
11. G. Taylor, S. McClean and P. Millard, Geriatric-patient flow-rate modelling, IMA Journal of Mathematics Applied in Medicine and Biology 13 (4), 297-307, (1996).
12. J. Xiao, A.H. Lee and S.R. Vemuri, Mixture distribution analysis for length of hospital stay for efficient funding, Socio-Economic Planning Sciences 33 (1), 39-59, (1999).
13. R.C. Blair, J.J. Higgins, M.E. Topping and A.L. Mortimer, An investigation of the robustness of the t-test to unit of analysis violations, Educational and Psychological Measurement 43, 69-80, (1983).
14. K.M. Leung, R.M. Elashoff, K.S. Rees et al., Hospital- and patient-related characteristics determining maternity length of stay: A hierarchical linear model approach, American Journal of Public Health 88 (3), 377-381, (1998).
15. R.D. Gibbons, Mixed-effects models for mental health services research, Health Services and Outcomes Research Methodology 1 (2), 91-129, (2000).
16. S. McClean and P. Millard, Patterns of length of stay after admission in geriatric medicine: An event history approach, Statistician 42 (3), 263-274, (1993).
17. G.E.P. Box and D.R. Cox, The analysis of transformations (with discussion), Journal of the Royal Statistical Society B 26 (2), 211-252, (1964).
18. T.J. Thompson, P.J. Smith and J.P. Boyle, Finite mixture models with concomitant information: Assessing diagnostic criteria for diabetes, Applied Statistics 47 (3), 393-404, (1998).
19. H. Akaike, Information theory and an extension of the maximum likelihood principle, In Second International Symposium on Information Theory, (Edited by B.N. Petrov and F. Csaki), pp. 267-281, Akadémiai Kiadó, Budapest, (1973).
20. G. Schwarz, Estimating the dimension of a model, Annals of Statistics 6 (2), 461-464, (1978).
21. G.J. McLachlan, On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture, Applied Statistics 36 (3), 318-324, (1987).
22. G.J. McLachlan and D. Peel, Finite Mixture Models, Wiley, New York, (2000).
23. D.M. Titterington, A.F.M. Smith and U.E. Makov, Statistical Analysis of Finite Mixture Distributions, Wiley, New York, (1985).
24. C.A. McGilchrist, Estimation in generalised mixed models, Journal of the Royal Statistical Society B 56 (1), 61-69, (1994).
25. C.A. McGilchrist and K.K.W. Yau, The derivation of BLUP, ML, REML estimation methods for generalized linear mixed models, Communications in Statistics: Theory and Methods 24 (12), 2963-2980, (1995).
26. A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society B 39 (1), 1-38, (1977).
27. G.J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley, New York, (1997).
28. J.J. Moré, B.S. Garbow and K.E. Hillstrom, User Guide for MINPACK-1, ANL-80-74, Argonne National Laboratory, Chicago, IL, (1980).
29. M.B. Bobek, L. Hoffman-Hogg, N. Bair et al., Utilization patterns, relative costs, and length of stay following adoption of MICU sedation guidelines, Formulary 36 (9), 664-673, (2001).
30. A.E. Gelfand and A.F.M. Smith, Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85 (410), 398-409, (1990).
31. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (6), 721-741, (1984).
32. G. Casella and E.I. George, Explaining the Gibbs sampler, American Statistician 46 (3), 167-174, (1992).
33. M.A. Tanner, Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, Third Edition, Springer-Verlag, New York, (1996).
34. S.L. Zeger and R.M. Karim, Generalized linear models with random effects, Journal of the American Statistical Association 86 (413), 79-86, (1991).
35. B. Danielsen, A.G. Castles, C.L. Damberg and J.B. Gould, Newborn discharge timing and readmissions: California, 1992-1995, Pediatrics 106 (1), 31-39, (2000).
