

Journal of Statistical Planning and Inference 137 (2007) 594–610
www.elsevier.com/locate/jspi

A balanced multi-level rotation sampling design and its efficient composite estimators

Y.S. Park a,∗, J.W. Choi b, K.W. Kim c

a Department of Statistics, Korea University, 5-1 Anam-Dong, Sungbuk-gu, Republic of Korea
b NCHS, CDC, 3311 Toledo RD, Hyattsville, MD 20782, USA
c Department of Informational Statistics, Korea University, 208 Seochang-ri, Jochiwon, Chungnam, Republic of Korea

Received 10 January 2005; received in revised form 30 August 2005; accepted 18 December 2005
Available online 20 February 2006

Abstract

We present a multi-level rotation sampling design which includes most of the existing rotation designs as special cases. When an estimator is defined under this sampling design, its variance and bias remain the same over survey months, which is not the case under other existing rotation designs. Using the properties of this multi-level rotation design, we derive the mean squared error (MSE) of the generalized composite estimator (GCE), incorporating the two types of correlations arising from rotating sample units. We show that the MSEs of other existing composite estimators currently used can be expressed as special cases of the GCE. Furthermore, since the coefficients of the GCE are unknown and difficult to determine, we present the minimum risk window estimator (MRWE) as an alternative estimator. This MRWE has the smallest MSE under this rotation design and yet is easy to calculate. The MRWE is unbiased for monthly and yearly changes and preserves the internal consistency in total. Our numerical study shows that the MRWE is as efficient as the GCE and more efficient than the existing composite estimators and, unlike the regression composite estimators, does not suffer from the drift problem [Fuller W.A., Rao J.N.K., 2001. A regression composite estimator with application to the Canadian Labour Force Survey. Surv. Methodol. 27 (2001) 45–51].

© 2006 Elsevier B.V. All rights reserved.

Keywords: Three-way balancing; Minimum risk estimator; Generalized regression estimator; Time-in-sample bias; Recall bias

1. Introduction

In a monthly rotation sampling survey, respondents are frequently asked to recall past events going back one or more previous months (or any time units) to obtain more information on the events. Often a recording system or diary is used to recall such events. If one has to recall one or more previous months, the rotation design is called a multi-level rotation design. If there is no recall, it is called a one-level rotation design (Binder and Hidiroglou, 1988).

We present a class of multi-level rotation designs which possess desirable properties in rotating sample units (i.e., three-way balancing as described below). This class encompasses most rotation designs currently used as special cases. We also derive general formulas for the variance and bias of a generalized composite estimator (GCE). These formulas apply to other known composite estimators as special cases. Furthermore, we propose an alternative estimator to the GCE, the minimum risk window estimator (MRWE), which is simple and yet found to be more efficient than other composite estimators.

∗ Corresponding author. E-mail address: [email protected] (Y.S. Park).

0378-3758/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2005.12.007

Many types of rotation designs are used in surveys, each with a different recall level and a different pattern of return times for sample units. For example, the US Current Population Survey (CPS) and Canadian Labour Force Survey (CLFS) use a one-level rotation design, whereas the US quarterly Consumer Expenditure Interview Survey (CEQ) is a three-level rotation design, and the US National Crime Victimization Survey (NCVS) is a six-level rotation design (Bailar, 1989). Monthly samples in these surveys are balanced in the sense that each monthly sample includes all possible numbers of interview times, all rotation groups, and an equal number of recall levels. We call a rotation design with such balancing a “monthly balanced design”, and from it we can obtain an unbiased estimator for change (e.g., monthly or yearly). Apart from such monthly balancing, some of these designs have rotation plans in which the replacement between incoming and outgoing units depends only on their interview times in sample and not on the survey months. We call this a “time-invariant rotation plan”, from which we can obtain estimators whose variances do not vary over the survey months.

These two properties (i.e., monthly balanced design and time-invariant rotation plan) exist when the monthly sample from a multi-level rotation design is balanced on time in sample, rotation group, and recall level as described in the next section. We call such a rotation design a three-way balanced rotation design. The previous general classes of rotation designs (McLaren and Steel, 2000; Steel and McLaren, 2000; Cantwell, 1990; Park et al., 2001) are all special cases of this three-way balanced design. We show the necessary and sufficient conditions for a rotation design to be in this class of three-way balanced designs. We then define the GCE, a linear combination based on the current as well as the past information, and show that all the existing composite estimators are special cases of the GCE. One of the biggest advantages of developing the three-way balanced design is that the GCE and all the currently used composite estimators, such as the AK composite estimator (Lent et al., 1999) and regression composite estimators (Singh et al., 2001), have the same variances over survey months due to the time-invariant rotation plan and are unbiased for monthly and yearly changes due to the monthly balancing. This is not true when the composite estimators are defined under other rotation designs.

Using the necessary and sufficient conditions for the three-way balancing, we derive the mean squared error (MSE) of the GCE. This MSE incorporates two types of correlations (the first-order correlation arising from repeated measurements of the same unit, and the second-order correlation arising from different units of the same rotation group) and two types of biases (time-in-sample bias arising from different interview times, and recall bias arising from different levels of recall).

Although the GCE is a general estimator, it cannot be used while its coefficients are unknown. Thus, we derive the minimum risk GCE (MRGE) (i.e., the GCE with minimum MSE) by optimizing the coefficients of the GCE. However, it is difficult to use the MRGE in practice since its coefficients still involve unknown parameters which are difficult to determine. To overcome this deficiency of the MRGE, we propose an alternative estimator, the MRWE, derived from the $\kappa$-month window estimator (Bell, 2001). The MRWE is easier to calculate than the MRGE and preserves the internal consistency (e.g., the estimates of Employed, Unemployed, and Not in Labour Force should add up to the total population) as seen in the regression estimators. Our numerical examples show that the MRWE is as good as the MRGE in efficiency.

The remainder of this paper is divided into five sections. Section 2 describes the basic concepts of the three-way balanced design and shows the necessary and sufficient conditions for three-way balancing. Section 3 derives the MSE of the GCE and presents the MRWE. We estimate the difference of time-in-sample biases and the difference of recall biases. Section 4 investigates the relationship between the GCE and other estimators (i.e., the multi-level versions of the AK and regression composite estimators). Section 5 presents numerical examples to evaluate the MRWE compared to the theoretical MRGE. We also compare the MRWE to the AK and regression composite estimators under various scenarios of correlations and biases. Section 6 offers some concluding remarks.

2. Rotation sampling design

In a multi-level rotation sampling design, it is important to determine the levels of recall, frequencies of interviewing, numbers of times to be interviewed, and the number of rotation groups. Proper choice of these factors can reduce bias and variation in estimation. Thus, in this section, we discuss the proper combination of these factors by investigating the relationship among them. Section 2.1 describes the basic concept of their relationship. Section 2.2 provides the necessary and sufficient conditions for a rotation design to be balanced in three ways.


Fig. 1. The three-way balanced 2-4-2(3) design.

We assume that our population is divided into an appropriate number of rotation groups and that each group contains enough units to last the entire survey, each unit representing its group in the monthly sample.

We describe a general rotation plan for an $\ell$-level rotation design, where the recall level $\ell = 1, 2, \ldots$. In this $\ell$-level rotation design, every sample unit has to recall from one previous month to $\ell - 1$ previous months, so that each sample unit provides $\ell$ months of information in a single interview. We distinguish these $\ell$ months of information by their recall times. For example, when a unit recalls two months before, we call it the unit with recall time 2.

Denote the number of months for a unit to be in sample by $r_{1i}$ and out of sample by $r_{2i}$ for the $i$th cycle, out of $m$ cycles in total. When a sample unit is selected from each rotation group, this unit returns to the sample every $\ell$th month until its $r_{11}$th interview and is out of the sample for the next $r_{21} + \ell - 1$ successive months. Then the same sample unit is again interviewed every $\ell$th month until its $(r_{11} + r_{12})$th interview and is out of the sample for the next $r_{22} + \ell - 1$ months. This procedure is repeated for $m$ cycles until this sample unit returns to the sample for its final, $\left(\sum_{i=1}^{m} r_{1i}\right)$th, interview. We call this design the $r_{11} - \cdots - r_{2,m-1} - r_{1m}(\ell)$ design. Thus, the level $\ell$ not only indicates the number of previous months recalled but also implies that there is a gap between the inclusion times of a sample unit in the survey. When $m = 1$, it is called the $\ell$-level $r_{11}$ in-then-out design and expressed by $r_{11} - 0(\ell)$. For example, the Canadian Labour Force Survey is a 6-0(1) design and the Consumer Expenditure Interview Survey (CEQ) is a 4-0(3) design.

Fig. 1 illustrates the 2-4-2(3) design with four rotation groups (i.e., level $\ell = 3$, months in the sample $r_{11} = r_{12} = 2$, months out of sample $r_{21} = 4$, and number of cycles $m = 2$). In this design, a sample unit is in sample every third month for two times, out of sample for the next six months, and returns to the sample every third month for another two times. Finally the unit completely retires from the sample. To see this rotation plan, Fig. 1(a) uses the notation $(\nu, g)$ to indicate the index for the $\nu$th sample unit in the $g$th rotation group, and $u_i$ indicates the sample unit interviewed for the $i$th time ($i = 1, 2, \ldots, 4$) in any given month. The symbols “|” and “‖” above the sample unit $u_i$ are the same unit $u_i$ with respective recall times 1 and 2.

Reading Fig. 1(a) in a perpendicular direction along the survey month for each unit denoted by $(\nu, g)$, we see the rotation plan of the 2-4-2(3) design. For example, the unit $(3, 3)$ is interviewed at $t$ for the second time, is out of sample for the next six months, and then returns at $t + 7$ for the third interview. This unit will be back in the sample at $t + 10$ for its last interview. The sample unit $(3, 3)$ reports three months of information at each interview. For example, this unit reports the current information for month $t + 7$ by interview (denoted by $u_3$) and the information for months $t + 6$ and $t + 5$ by recall of one and two months ago, respectively (denoted by | and ‖, respectively).
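The interview pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `interview_offsets` and `months_reported`, and the encoding of a design as two lists plus a level, are introduced here and are not notation from the paper.

```python
def interview_offsets(in_sample, out_sample, level):
    """Months, counted from the first interview, at which one unit is interviewed.

    in_sample  -- [r_11, ..., r_1m], number of interviews in each cycle
    out_sample -- [r_21, ..., r_2,m-1], out-of-sample parameters between cycles
    level      -- recall level (months of information reported per interview)
    """
    if len(out_sample) != len(in_sample) - 1:
        raise ValueError("need exactly one out-of-sample period between cycles")
    offsets = []
    month = 0
    for cycle, n_interviews in enumerate(in_sample):
        for k in range(n_interviews):
            offsets.append(month)
            if k < n_interviews - 1:
                month += level  # next interview within the same cycle
        if cycle < len(out_sample):
            # out of sample for r_2i + level - 1 months, back the month after
            month += out_sample[cycle] + level
    return offsets


def months_reported(offsets, level):
    """For each interview month, the months covered (current month first)."""
    return [[m - j for j in range(level)] for m in offsets]
```

For the 2-4-2(3) design, `interview_offsets([2, 2], [4], 3)` gives offsets 0, 3, 10, 13. Measured from the second interview, these are the months $t$, $t + 7$, and $t + 10$ of the unit (3, 3) example.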

2.1. Three-way balanced rotation designs

We formally define the three-way balanced multi-level rotation design.

Definition 1. The $r_{11} - r_{21} - \cdots - r_{2,m-1} - r_{1m}(\ell)$ design is balanced in three ways if it has $G$ rotation groups, where $G = \sum_{i=1}^{m} r_{1i}$, and satisfies the following three properties:

(i) For each survey month, all $G$ rotation groups are represented in the sample, and each rotation group is represented by its $\ell$ different sample units, one with recall time 0, another with recall time 1, ..., and the last with recall time $\ell - 1$.

(ii) For each survey month and for each recall time $j$, $j = 0, 1, \ldots, \ell - 1$, the monthly sample is balanced in such a way that one of the $G$ sample units is interviewed for the first time, one for the second time, ..., and one for the $G$th time.

(iii) For every span of $G$ survey months and for each recall time, each of the $G$ rotation groups contributes its $G$ sample units, of which one sample unit is interviewed for the first time, ..., and one for the $G$th time.

Fig. 2. The three-way balanced 4-0(3) design and the CEQ design. $(\nu, g)$ is the index for the $\nu$th sample unit in the $g$th rotation group; $u_i$ indicates the sample unit interviewed for the $i$th time ($i = 1, 2, \ldots, 4$) in any given month; and “|” and “‖” above the sample unit $u_i$ are the same unit $u_i$ with respective recall times 1 and 2.

Properties (i) and (ii) are the precise conditions for a rotation design to be a monthly balanced rotation design, while (iii) leads to the time-invariant rotation plan.

To illustrate these three properties of the three-way balanced design, we construct Fig. 1(b) with respect to survey months and rotation groups for each recall time, ignoring $\nu$ from Fig. 1(a). In each picture, all four rotation groups and interview times from 1 to 4 are included in any survey month. This shows properties (i) and (ii). We also observe property (iii) by considering any perpendicular span of four survey months in each picture.

The two designs (i.e., 4-0(3) and CEQ) in Fig. 2 follow the same three-level 4-in-then-out rotation plan. In particular, Fig. 2(b) is the rotation plan currently used in the CEQ. Note that each sample unit is actually interviewed five times in the CEQ, but the first interview data are discarded since they are used only for bounding. Bounding is a technique to prevent misdating the occurrence of an event, and it usually reduces the time-in-sample bias (Silberstein and Jacobs, 1989).

Example 1 (A three-way balanced rotation design and a non-three-way balanced design). It is easy to see from Fig. 2(a) and (b) that both the 4-0(3) design and the CEQ satisfy conditions (i) (all rotation groups are in sample) and (ii) (all the interview times are in the sample) of Definition 1. But Fig. 2(c) shows that the 4-0(3) design meets condition (iii) (all interview times for each rotation group within any span of four months), while the CEQ does not. For example, when we consider rotation group 1 with recall time 0 from month $t$ to month $t + 3$ in Fig. 2(c), this rotation group contains only the units interviewed for the first and fourth times but not the units interviewed for the second and third times. Therefore, 4-0(3) is a three-way balanced rotation design, whereas the CEQ is a monthly balanced design but not balanced in three ways.

The rotation plan of the 4-0(3) design depends only on interview times in sample and not on the survey month. In Fig. 2(c), for example, $u_4$ at $t$ and $u_3$ at $t + 1$ are from the same rotation group, and this is true for any two consecutive survey months. Similarly, as another example, $u_4$ and $u_1$ are from the same rotation group as long as their survey months are three months apart. However, the rotation plan of the CEQ depends on both time in sample and survey month. For example, the two $u_4$'s at $t$ and $t + 1$ are from the same rotation group but those at $t + 2$ and $t + 3$ are not (i.e., one is from group 1 and the other is from group 2). This dependency on survey month implies that the variance of any estimator from the CEQ depends on the survey month when there is a correlation between two sample units from the same rotation group (i.e., the second-order correlation).

When we consider a one-level rotation design (i.e., $\ell = 1$), properties (i)–(iii) of Definition 1 for three-way balancing reduce to the properties required for the previous two-way balanced design (Park et al., 2001). Property (i) implies that the number of groups in the monthly sample is equal to the total number of times a sample unit is interviewed during the entire survey. This indicates that the three-way balanced design also includes Cantwell's balanced design (Cantwell, 1990) as a special case.

2.2. Necessary and sufficient conditions for three-way balancing

We now investigate the relationship among the in-sample period $r_{1i}$, out-sample period $r_{2i}$, number of cycles $m$, and level of recall $\ell$ to establish the necessary and sufficient conditions for the three-way balancing. The conditions are presented by imposing certain rules on the numbers $r_{1i}$, $r_{2i}$, $m$, and $\ell$ for the $r_{11} - r_{21} - \cdots - r_{2,m-1} - r_{1m}(\ell)$ design as follows. Let $g_t(i)$ be the rotation group interviewed for the $i$th time at month $t$.

Theorem 2. An $r_{11} - \cdots - r_{2,m-1} - r_{1m}(\ell)$ rotation design is balanced in three ways and follows a time-invariant rotation plan if and only if there is a unique integer $m_k$, $1 \le m_k \le G$, satisfying

$$\operatorname{mod}_G\left\{ m_k + k + (m_k - 1)(\ell - 1) + \sum_{i=1}^{m-1} r_{2i}\, I\!\left[m_k > \sum_{\eta=1}^{i} r_{1\eta}\right] \right\} = 0 \quad \text{and} \quad g_t(m_k) = g_{t+1}(m_{k-1}) \quad \text{for each } k = 0, 1, \ldots, G - 1, \tag{1}$$

where $I[\cdot] = 1$ if $[\cdot]$ is true and 0 otherwise, $m_{k-1} = m_{G-1}$ for $k = 0$, and $g_t(m_k) \ne g_t(m_{k_1})$ for $m_k \ne m_{k_1}$.

The proof of Theorem 2 and other proofs are shown in the Appendix. The $m_0, \ldots, m_{G-1}$ are alternative expressions of the interview times in sample. These $m_k$'s are used to identify which two sample units overlap and/or are from the same rotation group. Hence, by appropriately defining the matrices using these $m_k$'s, we can derive the variances, covariances, and biases of the estimators in Sections 3 and 4.

Example 2 (2-4-2(3) design). We have $r_{11} = r_{12} = 2$, $r_{21} = 4$, $m = 2$, $\ell = 3$, and $G = 4$ in the 2-4-2(3) design. Thus, (1) is reduced to

$$\operatorname{mod}_4\left(m_k + k + 2(m_k - 1) + 4\, I[m_k > 2]\right) = 0.$$

Using this, we obtain $m_0 = 2$, $m_1 = 3$, $m_2 = 4$, and $m_3 = 1$ and the relationships $g_t(2) = g_{t+1}(1)$, $g_t(3) = g_{t+1}(2)$, $g_t(4) = g_{t+1}(3)$, and $g_t(1) = g_{t+1}(4)$. Namely, the two sample units interviewed for the second time at month $t$ and for the first time at month $t + 1$ are from the same rotation group, the two units interviewed for the third time at $t$ and for the second time at $t + 1$ are from the same rotation group, and so on. This can also be observed from Fig. 1(a).
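The modular part of condition (1) is easy to check by brute force. The sketch below is an illustration only: the function name `interview_time_map` and the list-based design encoding are assumptions made here, and the code tests only the congruence in (1), not the accompanying rotation-group condition $g_t(m_k) = g_{t+1}(m_{k-1})$.

```python
def interview_time_map(in_sample, out_sample, level):
    """Solve the congruence in condition (1) for each k = 0, ..., G-1.

    in_sample  -- [r_11, ..., r_1m]
    out_sample -- [r_21, ..., r_2,m-1]
    level      -- recall level
    Returns [m_0, ..., m_{G-1}], or None if some k does not have exactly
    one solution m_k in 1..G.
    """
    G = sum(in_sample)
    # cumulative in-sample counts r_11, r_11 + r_12, ... used by the indicator
    cumulative = []
    total = 0
    for r in in_sample:
        total += r
        cumulative.append(total)
    result = []
    for k in range(G):
        solutions = []
        for m_k in range(1, G + 1):
            # sum of r_2i over the cycles i that interview m_k has already passed
            gap = sum(r2 for r2, c in zip(out_sample, cumulative) if m_k > c)
            if (m_k + k + (m_k - 1) * (level - 1) + gap) % G == 0:
                solutions.append(m_k)
        if len(solutions) != 1:
            return None
        result.append(solutions[0])
    return result
```

Calling `interview_time_map([2, 2], [4], 3)` reproduces the values $m_0 = 2$, $m_1 = 3$, $m_2 = 4$, $m_3 = 1$ of Example 2.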

When we consider a one-level rotation design (i.e., $\ell = 1$) with the same in-sample and out-sample periods (i.e., $r_{1i} = r_1$, $r_{2i} = r_2$ for all $i = 1, 2, \ldots, m$) and $r_2$ is a multiple of $r_1$, it is shown that (1) is reduced to $\operatorname{mod}_{m r_1}\{m^* r_1 + (m^* - 1) r_2\} = 0$ for $m^* = 1, 2, \ldots, m$. This is the condition for the previous two-way balanced design presented by Park et al. (2001).

3. Variance and bias

For the interview times in sample $i = 1, 2, \ldots, G$ and the recall level $j = 0, 1, \ldots, \ell - 1$, let $x_{tij}$ be a generalized regression estimator of a characteristic of interest for month $t$ from the sample unit interviewed for the $i$th time at month $t + j$ by recalling the $j$th previous month (Särndal, 1980). We can define an extension of the previous GCE (Park et al., 2001) to $\ell$-level rotation designs at month $t$ as

$$y_t = \sum_{i=1}^{G} \sum_{j=0}^{\ell-1} a_{ij} x_{tij} - \omega \sum_{i=1}^{G} \sum_{j=0}^{\ell-1} b_{ij} x_{t-1,i,j} + \omega y_{t-1} = \mathbf{a}'\mathbf{X}_t - \omega \mathbf{b}'\mathbf{X}_{t-1} + \omega y_{t-1}, \tag{2}$$

where $0 \le \omega < 1$, $\mathbf{a} = (a_{10}, \ldots, a_{G0}, \ldots, a_{G,\ell-1})'$ and $\mathbf{b} = (b_{10}, \ldots, b_{G0}, \ldots, b_{G,\ell-1})'$ with $\mathbf{a}'\mathbf{1} = \mathbf{b}'\mathbf{1} = 1$, and $\mathbf{X}_t = (x_{t10}, \ldots, x_{tG0}, \ldots, x_{t,G,\ell-1})'$.

The estimator $y_t$ is a weighted mean determined by the three unknown coefficients (i.e., $\mathbf{a}$, $\mathbf{b}$, and $\omega$) and is efficient when repeated measurements are correlated. Since $\sum_{i,j} a_{ij} = \sum_{i,j} b_{ij} = 1$ and $0 \le \omega < 1$, $y_t$ is a weighted estimator of the $x_{tij}$'s and estimates the same characteristic as the $x_{tij}$. Its first term $\mathbf{a}'\mathbf{X}_t$ is the monthly level estimate, and it is adjusted by the estimates of the previous months with the weight $\omega$ given to the previous month.
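One step of the recursion in (2) can be written as a short function. This is a minimal sketch under stated assumptions: the name `gce_update`, the flat-list storage of the coefficient and data vectors, and the use of `omega` for the weight on the previous month are choices made here for illustration.

```python
def gce_update(a, b, omega, x_now, x_prev, y_prev, tol=1e-9):
    """One step of the composite recursion: a'X_t - omega*b'X_{t-1} + omega*y_{t-1}.

    a, b     -- coefficient vectors, each summing to 1
    omega    -- weight on the previous month, 0 <= omega < 1
    x_now    -- elementary estimates for month t, same ordering as a and b
    x_prev   -- elementary estimates for month t-1
    y_prev   -- composite estimate for month t-1
    """
    if not 0 <= omega < 1:
        raise ValueError("omega must satisfy 0 <= omega < 1")
    if abs(sum(a) - 1) > tol or abs(sum(b) - 1) > tol:
        raise ValueError("a and b must each sum to 1")
    level_part = sum(ai * xi for ai, xi in zip(a, x_now))
    previous_part = sum(bi * xi for bi, xi in zip(b, x_prev))
    return level_part - omega * previous_part + omega * y_prev
```

With `omega = 0` the function returns the current-month level estimate alone. A larger `omega` gives more weight to the difference between last month's composite and last month's `b`-weighted level estimate.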

3.1. Variance of the GCE

The repeated interviews of the same sample unit are likely to be correlated (the first-order correlation). Furthermore, the sample units in the same rotation group are often close to each other since the rotation often occurs among neighboring sampling units. This leads to the second-order correlations. These two types of correlations are incorporated into our variance.

The interview-time-in-sample and recall level of a sample unit may also have some impact on the variance of the data. Hence, we do not assume that the variance remains the same, but rather that it varies with the interview-time-in-sample $i$ and recall level $j$. Thus, the variance and covariance of $x_{tij}$ and $x_{t+t_1,i_1,j_1}$ can be summarized as, for $i, i_1 = 1, \ldots, G$ and $j, j_1 = 0, 1, \ldots, \ell - 1$,

$$\operatorname{Cov}(x_{tij}, x_{t+t_1,i_1,j_1}) = \begin{cases} \sigma_{ij}^2 & \text{if } t_1 = 0,\ i = i_1 \text{ and } j = j_1, \\ \rho_{1t_1}\sigma_{ij}\sigma_{i_1 j_1} & \text{if both are from the same unit}, \\ \rho_{2t_1}\sigma_{ij}\sigma_{i_1 j_1} & \text{if both are from different units of the same group}, \\ 0 & \text{otherwise}, \end{cases} \tag{3}$$

where $\rho_{1t_1}$ is the first-order correlation of two different measures on the same unit and $\rho_{2t_1}$ is the second-order correlation of two different measures on two different units of the same rotation group between months $t$ and $t + t_1$ for $t_1 = 0, 1, \ldots$.

In practice, the unknown parameters $\rho_{1t}$, $\rho_{2t}$, and $\sigma_{ij}$ in (3) can be estimated by a design-based method. Since the covariance structure of (3) explicitly implies its stationarity over month $t$, our approach is a mixture of model-based and design-based methods.

Denote $V_k = \operatorname{Cov}(\mathbf{X}_t, \mathbf{X}_{t-k})$ for $k \ge 0$. The explicit form of this covariance matrix $V_k$ is given in Lemma A.1 in the Appendix using two matrices $L_1$ and $L_2$. $L_1$ identifies whether or not two measurements are from the same sample unit and $L_2$ identifies whether or not they are from the same rotation group. Let $B_{1,0} = \sum_{k=0}^{\infty} \omega^k V_{1+k}$ and $B_{1,1} = \sum_{k=0}^{\infty} \omega^{k+1} V_k$. Then we have the following result.

Theorem 3. Suppose that an $\ell$-level rotation design is balanced in three ways. Then under the covariance structure (3), the variance of $y_t$ is

$$(1 - \omega^2)\operatorname{Var}(y_t) = \mathbf{a}'Q_1\mathbf{a} + \omega^2 \mathbf{b}'Q_1\mathbf{b} - 2\omega \mathbf{b}'Q_2\mathbf{a},$$

where $Q_1 = V_0 + 2\omega B_{1,0}$ and $Q_2 = B_{1,0} + B_{1,1}'$.
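As a sanity check on Theorem 3, consider a special case that is assumed here purely for illustration: measurements from different months are uncorrelated, so $V_k = 0$ for $k \ge 1$, and $\omega$ denotes the weight on the previous month in (2). Under this assumption the formula of the theorem can be compared with a direct calculation from the recursion.

```latex
% Special case assumed for illustration: V_k = 0 for all k >= 1.
\begin{align*}
B_{1,0} &= \sum_{k=0}^{\infty} \omega^{k} V_{1+k} = 0, &
B_{1,1} &= \sum_{k=0}^{\infty} \omega^{k+1} V_{k} = \omega V_0, \\
Q_1 &= V_0, &
Q_2 &= \omega V_0 .
\end{align*}
% Theorem 3 then gives
\[
(1-\omega^2)\operatorname{Var}(y_t)
  = \mathbf{a}'V_0\mathbf{a} + \omega^2\,\mathbf{b}'V_0\mathbf{b}
    - 2\omega^2\,\mathbf{b}'V_0\mathbf{a}.
\]
% Direct check from (2): X_t is uncorrelated with everything before month t, so
\begin{align*}
\operatorname{Var}(y_t)
  &= \operatorname{Var}(\mathbf{a}'\mathbf{X}_t)
     + \omega^2 \operatorname{Var}\!\left(y_{t-1} - \mathbf{b}'\mathbf{X}_{t-1}\right) \\
  &= \mathbf{a}'V_0\mathbf{a}
     + \omega^2\left[\operatorname{Var}(y_{t-1}) + \mathbf{b}'V_0\mathbf{b}
     - 2\,\mathbf{b}'V_0\mathbf{a}\right],
\end{align*}
% using Cov(b'X_{t-1}, y_{t-1}) = b'V_0 a; setting Var(y_t) = Var(y_{t-1})
% and rearranging recovers the expression above.
```

The two routes agree, which is consistent with the signs and powers of $\omega$ in the theorem.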

The proof is in the Appendix. For some integers $t_0, t_1 > 0$, we are also interested in three other variances: those of the change (i.e., $y_t - y_{t-t_0}$), the aggregate (i.e., $S_{t,t_0} = \sum_{t_1=0}^{t_0-1} y_{t-t_1}$), and the change of two aggregates (i.e., $S_{t,t_0} - S_{t-t_1,t_0} = \sum_{i=0}^{t_0-1} y_{t-i} - \sum_{i=0}^{t_0-1} y_{t-t_1-i}$ for $t_1 \ge t_0$). These variances have the same form as $\operatorname{Var}(y_t)$ except for different $Q_1$ and $Q_2$, as shown in Theorem A.2 provided in the Appendix.

If the second-order correlation $\rho_{2t} = 0$ for all $t$, the $L_2$ matrix is no longer necessary in variance estimation for the $r_{11} - r_{21} - \cdots - r_{2,m-1} - r_{1m}(\ell)$ design because the matrix $L_2$ is related only to the second-order correlation, as shown in (A.6). When $\rho_{2t} = 0$, the correlation structure in the monthly balanced design is the same as that in the three-way balanced design (see Example 1 to distinguish the monthly balanced design from the three-way balanced design). Thus, the variance for any monthly balanced rotation design can be obtained by simply letting $L_2 = 0$ in Theorems 3 and A.2. One special case is the variances in Cantwell (1990), and other special cases are discussed in Section 4.


3.2. Bias of the generalized composite estimator

Non-sampling errors in a rotation design arise from many sources. Some of them are reporting errors, including misdating the occurrence of an event (i.e., telescoping), failing to report events that actually occurred (i.e., omission), the influence of previous interviews on subsequent interviews (i.e., panel conditioning), non-response, and respondent burden from repeated interviews. These may be incorporated into two classes of non-sampling errors in a multi-level rotation design, which are the time-in-sample bias and the recall bias (Bailar, 1989; Kalton et al., 1989). We assume that these two biases can be expressed in the additive model shown below (Bailar, 1989).

Denote the time-in-sample bias of interview time $i$ by $\alpha_i$ and the recall bias of recall level $j$ by $\beta_j$. We assume that $E(x_{tij}) = \mu_t + \alpha_i + \beta_j$ for month $t$, where $\mu_t$ is the true mean to be estimated. Let $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_G)'$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_{\ell-1})'$. Then it can be expressed in matrix form, for $k = 0, 1, \ldots$,

$$E(\mathbf{X}_{t-k}) = \mu_{t-k}\mathbf{1} + \mathbf{1}_\ell \otimes \boldsymbol{\alpha} + \boldsymbol{\beta} \otimes \mathbf{1}_G, \tag{4}$$

where $\otimes$ denotes the Kronecker product, and $\mathbf{1}_\ell$ and $\mathbf{1}_G$ are the $\ell \times 1$ and $G \times 1$ vectors of ones, respectively. Since $E(y_t) = \mathbf{a}'E(\mathbf{X}_t) - \omega \mathbf{b}'E(\mathbf{X}_{t-1}) + \omega E(y_{t-1})$, solving this equality recursively gives the following result.

Lemma 4. For the three-way balanced design,

$$E(y_t) = \mu_t + \frac{1}{1 - \omega}(\mathbf{a}' - \omega \mathbf{b}')\left[(\mathbf{1}_\ell \otimes \boldsymbol{\alpha}) + (\boldsymbol{\beta} \otimes \mathbf{1}_G)\right],$$

where the second term is the bias of $y_t$.
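The bias term of Lemma 4 is a single weighted sum once the stacked bias vector is written out. The sketch below is illustrative only: the names `stacked_bias` and `gce_bias`, the symbols `alpha` and `beta` for the time-in-sample and recall biases, and `omega` for the weight in (2) are assumptions made here. The element for interview time $i$ and recall level $j$ is stored at position $jG + i$ (zero-based $i$), matching the ordering of the vector in (2).

```python
def stacked_bias(alpha, beta):
    """Vector (1 kron alpha) + (beta kron 1): entry alpha_i + beta_j at j*G + i."""
    return [a_i + b_j for b_j in beta for a_i in alpha]


def gce_bias(a, b, omega, alpha, beta):
    """Bias of the composite estimator: (a' - omega*b') c / (1 - omega)."""
    if not 0 <= omega < 1:
        raise ValueError("omega must satisfy 0 <= omega < 1")
    c = stacked_bias(alpha, beta)
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a and b must have length G * level")
    weighted = sum((a_k - omega * b_k) * c_k for a_k, b_k, c_k in zip(a, b, c))
    return weighted / (1 - omega)
```

When `a` and `b` are equal, the factor $1 - \omega$ cancels and the bias is simply the `a`-weighted average of the elementary biases, whatever the value of `omega`.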

The three-way balancing implies that each monthly sample contains the $G$ interview-times-in-sample from 1 to $G$ for any fixed recall time $j$, so that

$$E\left(\sum_{i=1}^{G} x_{tij}\right) = G\mu_t + \sum_{i=1}^{G} \alpha_i + G\beta_j. \tag{5}$$

Perpendicular balancing in three-way balanced designs indicates $G$ sample units interviewed for the $i$th time ($i = 1, \ldots, G$) for any consecutive $G$ months, so that

$$E\left(\sum_{t=1}^{G} x_{tij}\right) = \sum_{t=1}^{G} \mu_t + G\alpha_i + G\beta_j. \tag{6}$$

Therefore, unbiased estimators of $\beta_j - \beta_0$ and $\alpha_i - \alpha_1$ are obtained from (5) and (6), respectively, and they are given by

$$\frac{1}{TG^2}\sum_{t=1}^{TG}\sum_{i=1}^{G}(x_{tij} - x_{ti0}) \ \text{ for } \beta_j - \beta_0 \quad \text{and} \quad \frac{1}{TG\ell}\sum_{t=1}^{TG}\sum_{j=0}^{\ell-1}(x_{tij} - x_{t1j}) \ \text{ for } \alpha_i - \alpha_1, \tag{7}$$

where we assume that the survey has been conducted for at least $TG$ months for a positive integer $T$.
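The two averages in (7) can be computed directly from a three-dimensional array of elementary estimates. The following is a minimal sketch under stated assumptions: the function names and the nested-list layout `x[t][i][j]` are introduced here, interview times are indexed from zero (so `i = 0` is the first interview), and the number of months supplied is required to be a multiple of $G$.

```python
def recall_bias_diff(x, j):
    """Estimate (recall bias at level j) - (recall bias at level 0).

    x[t][i][j] is the elementary estimate for month t, interview time i
    (zero-based), and recall level j. len(x) plays the role of T*G.
    """
    months, G = len(x), len(x[0])
    if months % G != 0:
        raise ValueError("number of months must be a multiple of G")
    total = sum(x[t][i][j] - x[t][i][0] for t in range(months) for i in range(G))
    return total / (months * G)


def time_in_sample_bias_diff(x, i):
    """Estimate (bias at interview time i) - (bias at the first interview)."""
    months, G, levels = len(x), len(x[0]), len(x[0][0])
    if months % G != 0:
        raise ValueError("number of months must be a multiple of G")
    total = sum(x[t][i][j] - x[t][0][j] for t in range(months) for j in range(levels))
    return total / (months * levels)
```

With noise-free data generated from the additive model, that is, a monthly mean plus a time-in-sample bias plus a recall bias, both functions return the corresponding bias differences up to floating-point rounding.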

3.3. Minimum risk generalized composite estimator

We consider four types of estimands for the GCE. They are the monthly level, monthly level change, aggregate level, and aggregate level change (i.e., $y_t$, $y_t - y_{t-t_0}$, $S_{t,t_0} = \sum_{t_1=0}^{t_0-1} y_{t-t_1}$, and $S_{t,t_0} - S_{t-t_1,t_0} = \sum_{i=0}^{t_0-1} y_{t-i} - \sum_{i=0}^{t_0-1} y_{t-t_1-i}$ for $t_0 \ge 1$ and $t_1 \ge t_0$). Defining specific values of $t_0$ and $t_1$, we assume that there are $n$ estimands of interest and denote them by $z_{th}$ for $h = 1, 2, \ldots, n$.

We derive the coefficients $\mathbf{a}$ and $\mathbf{b}$ of $y_t$ in (2) by minimizing a weighted sum of the MSEs of the $n$ estimands. To do so, set the objective function $O$,

$$O = \sum_{h=1}^{n} \gamma_h \operatorname{MSE}(z_{th}) - \lambda_1(\mathbf{1}'\mathbf{a} - 1) - \lambda_2(\mathbf{1}'\mathbf{b} - 1),$$

where the $\lambda$'s are the Lagrange multipliers and the $\gamma$'s are the weights which represent the relative importance one wishes to assign to the estimands (see Park et al., 2001; Fuller and Rao, 2001 for details of $\gamma_h$).

Since the square of the bias in Lemma 4 takes the same form as the variance in Theorems 3 and A.2, it is clear that, for the three-way balanced $r_{11} - r_{21} - \cdots - r_{2,m-1} - r_{1m}(\ell)$ design, $\operatorname{MSE}(z_{th})$ can be expressed as

$$\operatorname{MSE}(z_{th}) = \mathbf{a}'C_{1h}\mathbf{a} + \mathbf{b}'C_{2h}\mathbf{a} + \mathbf{b}'C_{3h}\mathbf{b}, \tag{8}$$

where the matrices $C_{1h}$, $C_{2h}$, and $C_{3h}$ are appropriately defined by Theorems 3, A.2, and Lemma 4.

Using (8) and the usual Lagrange multiplier method, we minimize the objective function $O$ and obtain the following compromise coefficients to have the MRGE in the three-way balanced $r_{11} - r_{21} - \cdots - r_{2,m-1} - r_{1m}(\ell)$ design.

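Lemma 5. For the three-way balanced design, the compromise coefficients $\mathbf{a}$ and $\mathbf{b}$ for the MRGE are given by

$$\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix} = \begin{pmatrix} C_1 + C_1' & C_2' - \mathbf{1}'R_1\mathbf{1}(C_1 + C_1')^{-1}C_2' \\ C_2 - \mathbf{1}'R_3\mathbf{1}(C_3 + C_3')^{-1}C_2 & C_3 + C_3' \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{1}'R_1 \\ \mathbf{1}'R_3 \end{pmatrix},$$

where $C_k = \sum_{h=1}^{n} \gamma_h C_{kh}$ for $k = 1, 2, 3$, $R_1 = \mathbf{1}(C_1 + C_1')^{-1}\mathbf{1}'$, and $R_3 = \mathbf{1}(C_3 + C_3')^{-1}\mathbf{1}'$.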
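The origin of such a linear system can be sketched from the first-order conditions of the objective function. The display below is an illustration, not the paper's proof: it assumes the quadratic form (8) summed over the estimands, writes $\lambda_1, \lambda_2$ for the Lagrange multipliers, and introduces the shorthand $M_1 = C_1 + C_1'$ and $s_1 = \mathbf{1}'M_1^{-1}\mathbf{1}$, which do not appear in the original.

```latex
% First-order conditions of O, with C_k the weighted sums of the C_{kh}.
\begin{align*}
\frac{\partial O}{\partial \mathbf{a}}
  &= (C_1 + C_1')\,\mathbf{a} + C_2'\,\mathbf{b} - \lambda_1 \mathbf{1} = \mathbf{0}, \\
\frac{\partial O}{\partial \mathbf{b}}
  &= C_2\,\mathbf{a} + (C_3 + C_3')\,\mathbf{b} - \lambda_2 \mathbf{1} = \mathbf{0}, \\
\mathbf{1}'\mathbf{a} &= 1, \qquad \mathbf{1}'\mathbf{b} = 1 .
\end{align*}
% Eliminating lambda_1 with the constraint 1'a = 1
% (shorthand for this sketch: M_1 = C_1 + C_1', s_1 = 1' M_1^{-1} 1):
\begin{align*}
\mathbf{a} &= M_1^{-1}\left(\lambda_1 \mathbf{1} - C_2'\,\mathbf{b}\right), &
\lambda_1 &= \frac{1 + \mathbf{1}'M_1^{-1}C_2'\,\mathbf{b}}{s_1}.
\end{align*}
```

Treating $\lambda_2$ in the same way and substituting both multipliers back leaves a linear system in $(\mathbf{a}, \mathbf{b})$ alone, which is the kind of block system that Lemma 5 inverts.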

Observe that the a and b in Lemma 5 depend on the first- and second-order correlations (i.e., �1 and �2) as well asan unknown weight � which is usually determined by a grid search (Park et al., 2001). As long as the second-ordercorrelation is not zero, the a and b vary with month t . In this case, the GCE is biased for changes in the monthlybalanced design (e.g., CEQ).

The MRGE cannot satisfy the internal consistency in total unless we designate one of the components (usually theleast important one) as a residual. This component is obtained by subtraction from the known total. For example, if welet “Not in Labour Force” be the least important in the three components of “Employed”‘, “Unemployed”, and “Not inLabour Force”, we can obtain “Not in Labour Force” by subtraction from the known population of age 15 years andover.

Therefore, we develop an alternative estimator (i.e., the MRWE) which satisfies the internal consistency and can be used for the monthly balanced design, which includes the three-way balanced design as a special case.

3.4. Minimum risk window estimator

The � month window estimator of Bell (2001) can be written in the r11 − r21 −· · ·− r2,m−1 − r1m(�) rotation designas

yt,B� =�−1∑j=0

d′jXt−j ,

where dj is a �G × 1 vector to be determined with d′01 = 1 for j = 0 and d′

j 1 = 0 for j = 1, . . . , � − 1. The varianceand bias of yt,B� are the immediate results from Sections 3.2 and 3.3:

Var(yt,B�) = D′V�D and Bias(yt,B�) =�−1∑j=0

d′j [(1� ⊗ �) + (� ⊗ 1G)], (9)

where D′ = (d′0, d′

1, . . . , d′�−1) and V� = Cov(Xt ,Xt−1, . . . ,Xt−�−1). Since the bias of yt,B� is free from time t , it is

also unbiased for the change.Let yt,i,B� be the � month window estimator for the ith part level estimator so that the parts add up to the total (e.g.,

the i indicates Employed or Unemployed which satisfies Employed + Unemployed = Labour Force). As in the GCE,


let zBtih be the hth estimand based on yt,i,B� and denote MSE(zB

tih) = D′iC

BihDi without loss of generality from (9) for

i = 1, . . . , I .Two constraints are needed to have level estimator unbiased and the internal consistency in total. The first is the

unbiased constraint which is 1′di0=1 and 1dij =0 for j=1, . . . , �−1 for each i. The second constraint is∑I

i=1 yt,i,B� =Kt where Kt is a known population control total. For instance, when I =2, Labour Force (i.e., yt,1,B�) + Not in LabourForce (i.e, yt,2,B�) = Population of the age of 15 years and over (i.e, Kt).

Because∑I

i=1 yt,i,B� = Kt can be rewritten by X′i,t :t−�+1Di + ∑I

k=1,k �=i yt,k,B� = Kt where X′i,t :t−�+1

= (X′it ,X

′i,t−1, . . . ,X

′i,t−�+1), above two constraints can be written as RDi = ei where R is a � × �G� matrix

with (R)pq = 1 when q = (p − 1)�G + 1 to p�G for p = 1, . . . , � − 1, (R)pq = X′i,t :t−�+1 for p = �, and the

remaining elements are zeros; ei is a � × 1 vector with zeros as its elements except the first being 1 and the last beingKt −∑I

k �=i yt,k,B� . Then, we obtain the coefficients of Di by minimizing∑n

h=1 hMSE(zBtih) under the two constraints:

\[ D_i = M_i^{-1}R'\,[R M_i^{-1} R']^{-1} e_i, \qquad (10) \]

where $i = 1, 2, \ldots, I$, $M_i = \sum_{h=1}^{n} w_h\,(C^{B}_{ih} + C^{B\prime}_{ih})$, and $\sum_{h=1}^{n} w_h = 1$.
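As a sketch of how (10) can be evaluated, the snippet below computes $M^{-1}R'[RM^{-1}R']^{-1}e$ for a toy problem. It assumes NumPy; the matrix `M`, the two-month window with two coefficients per month, and the function name `window_coefficients` are illustrative assumptions. Only the two unbiasedness rows of $R$ are included here (current-month coefficients sum to one, lagged coefficients sum to zero); the control-total row of the paper's $R$ is omitted for brevity.

```python
import numpy as np

def window_coefficients(M, R, e):
    """Return D = M^{-1} R' [R M^{-1} R']^{-1} e, the form of Eq. (10)."""
    Minv_Rt = np.linalg.solve(M, R.T)       # M^{-1} R'
    mu = np.linalg.solve(R @ Minv_Rt, e)    # [R M^{-1} R']^{-1} e
    return Minv_Rt @ mu

# Toy symmetric positive definite M (assumed values, not from the paper).
M = np.array([[2.0, 0.5, 0.3, 0.1],
              [0.5, 2.0, 0.1, 0.3],
              [0.3, 0.1, 2.0, 0.5],
              [0.1, 0.3, 0.5, 2.0]])
# Row 1: coefficients for month t sum to 1; row 2: coefficients for month t-1 sum to 0.
R = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
e = np.array([1.0, 0.0])

D = window_coefficients(M, R, e)
print(D, R @ D)
```

Using `np.linalg.solve` twice avoids forming explicit inverses, which is usually the numerically safer choice.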

Let D∗i denote the Di given in (10) by ignoring the second-order correlation (i.e., we set �2 = 0). Let y∗

t,i,B�be the

yt,i,B� with the D∗i . This y∗

t,i,B�can be used for any r11 − r21 − · · · − r2,m−1 − r1m(�) design including the CEQ and

NCVS because D∗i remains the same during the entire survey period. This y∗

t,i,B�is called the MRWE. In Section 5 we

show that the MRWE is comparable to MRGE even if for �2 �= 0.

4. GCE and other composite estimators

Various estimators in one-level rotation sampling designs have been proposed to improve their efficiencies by utilizing the information from previous months. The AK composite estimator (Gurney and Daly, 1965; Lent et al., 1999) and the regression composite estimators (Singh et al., 2001; Gambino et al., 2001) are typical examples. These estimators can be easily obtained as special cases from the GCE as discussed below.

To investigate the relationship between the GCE and the composite estimators, let xt be the grand mean of a studyvariable xt , xm,t−1 and xm,t the means of the overlapped rotation groups at month t − 1 and t , respectively, and xB,t

the mean of the unmatched rotation groups at month t . Further, let � be fraction of the sample overlapped between thetwo successive months in the r11 − · · · − r2,m−1 − r1m(�) rotation design.

Extending the one-level case (Lent et al., 1999), we can show that, in the $r_{11}-\cdots-r_{2,m-1}-r_{1m}(\ell)$ rotation design, the “AK” composite estimator can be written as

\[ y_{t,AK} = \frac{G + m(K-1) - Am}{G}\,\bar{x}_{m,t} + \frac{(1-K+A)m}{G}\,\bar{x}_{B,t} - K\bar{x}_{m,t-1} + K y_{t-1,AK}, \qquad (11) \]

where $G = \sum_{i=1}^{m} r_{1i}$ and $A$ and $K$ are parameters to be determined.
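The recursion in (11) can be sketched in a few lines. This is an illustration under assumed inputs rather than a production estimator: the function name `ak_update` and the values of `G`, `m`, `A`, and `K` are hypothetical choices made only to exercise the formula.

```python
def ak_update(xm_t, xB_t, xm_prev, y_prev, G, m, A, K):
    """One step of the AK composite recursion in the form of Eq. (11)."""
    w_matched = (G + m * (K - 1) - A * m) / G    # weight on matched-group mean at t
    w_unmatched = (1 - K + A) * m / G            # weight on unmatched-group mean at t
    return w_matched * xm_t + w_unmatched * xB_t - K * xm_prev + K * y_prev

# Illustrative parameter values (assumptions, not taken from the paper).
G, m, A, K = 8, 2, 0.2, 0.4

# The two current-month weights sum to one, so a constant series is reproduced.
weights_sum = (G + m * (K - 1) - A * m) / G + (1 - K + A) * m / G
y = ak_update(5.0, 5.0, 5.0, 5.0, G, m, A, K)
print(weights_sum, y)
```

Because the weights on the two current-month means add to one and the two lagged terms cancel for a constant input, the estimator leaves a constant series unchanged, which is a quick sanity check on any implementation.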

The regression composite estimation uses some external information as well as information from the previous month to calibrate the weights of the sampling units (Singh et al., 2001; Gambino et al., 2001). Following Fuller and Rao (2001), we discuss a single study variable with no external information (i.e., the univariate case), in which the regression composite estimator can be written as

yt,FR = ((1 − ��t )� + ��t )xm,t + (1 − ��t )(1 − �)xB,t − �t xm,t−1 + �t yt−1,FR, (12)

where �t is the regression coefficient of xt on xt−1 and 0���1 is to be determined. In particular, the MR1 and MR2estimators (Singh, 1996) are special cases of (12) when � = 0 and 1, respectively. The yt,FR in (12) can be regarded asthe general case by considering �t to be the regression coefficient of xt on xt−1 after the xt and xt−1 are adjusted forexternal variables and other study variables.

The stationary covariance structure given in (3) implies that xtij can be expressed as a trend stationary process (i.e.,the sum of the true fixed mean �t and an error in which the error follows a stationary process such as a first-orderautoregression, see Fuller and Rao, 2001). As �t in (12) remains almost the same over the time t , we let �t = �0 for allt throughout this section.


We show that the composite estimators of (11) and (12) are special cases of our GCE and that their variances are also special cases of those of the GCE as follows.

Define B = {(i, j) : i = 1, r11 + 1, . . . ,∑m−1

k=0 r1k + 1, j = � − 1} where r10 = 0 and D = {(i, j) : i = r11, r11 +r12, . . . ,

∑mk=1 r1k, j = 0}. The B and D indicate the set of the indices for the unmatched sample units at month t and

t − 1, respectively. Thus it follows that, in the r11 − · · · − r2,m−1 − r1m(�) rotation design,

xB,t = 1

m

∑(i,j)∈B

xtij , xm,t = 1

G� − m

∑(i,j)/∈B

xtij and xm,t−1 = 1

G� − m

∑(i,j)/∈D

xt−1,i,j . (13)

Since our GCE is defined as yt = ∑Gi=1

∑�−1j=0aij xtij − �

∑Gi=1

∑�−1j=0 bij xt−1,i,j + �yt−1 with

∑Gi=1

∑�−1j=0 aij =∑G

i=1∑�−1

j=0 bij = 1, the AK estimator and the regression composite estimator are the special cases of the GCE by(11)–(13). More precisely, the AK composite estimator is obtained by setting

� = K, ai,j = 1 − K + A

G�for (i, j) ∈ B, aij = G� + m(K� − 1) − Am

G�(G� − m)for (i, j) /∈B,

bi,0 = 0 for (i, j) ∈ D and bij = 1

G� − mfor (i, j) /∈D. (14)

The regression composite estimator (12) is obtained by

� = �0, aij = (1 − �)(1 − ��0)

mfor (i, j) ∈ B, aij = � + ��0(1 − �)

G� − mfor (i, j) /∈B,

bi,0 = 0 for (i, j) ∈ D and bij = 1

G� − mfor (i, j) /∈D. (15)

Therefore, the variances and biases of the AK and the regression composite estimators, including the MR1 and MR2 estimators, are special cases obtained by plugging (14) and (15) into Theorems 3, A.2, and Lemma 4.

In particular, the bias of the regression estimator yt,FR in (12) is given by

(1 − �0)Bias(yt,FR) = � + ��0(1 − �) − �0

G� − m

⎛⎝�

G∑i=1

�i + G

�−1∑j=0

�j

⎞⎠+ G�(1 − �)(1 − ��0) − m

(G� − m)m

(m−1∑k=0

�∑ki=0 r1i+1 + m��−1

)

+ �0

G� − m

(m∑

k=1

�∑ki=1 r1i

+ m�0

). (16)

The biases of the MR1 and MR2 are obtained by letting � = 0 and 1 in (16), respectively.Note that �0 in (16) becomes � when a variable of interest is regulated by a first-order autoregression with a parameter

�. When � is close to one, the bias in (16) approaches infinity. This danger is recognized by Fuller and Rao (2001) andBell (2001). This will be further discussed in Section 5. Since the bias of yt,FR (16) is free from time t and thus remainsthe same for all time, any change estimator yt,FR − yt−k,FR for k > 0 must be unbiased. However, this is not the casewhen a time dependent coefficient �t is used in place of �0 in Eq. (16).
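The blow-up described above comes from the factor multiplying the bias on the left-hand side of (16): dividing through by one minus the regression coefficient inflates whatever sits on the right. The sketch below shows only that factor; the name `drift_factor` is hypothetical, and the right-hand side of (16), which also depends on the coefficient, is deliberately left out.

```python
def drift_factor(coef):
    """Multiplier 1 / (1 - coef) applied to the right-hand side of Eq. (16)."""
    if not 0.0 <= coef < 1.0:
        raise ValueError("coef must lie in [0, 1)")
    return 1.0 / (1.0 - coef)

# Coefficient values matching the correlations used in the numerical study.
factors = {c: drift_factor(c) for c in (0.8, 0.9, 0.95)}
for c, f in factors.items():
    print(f"coefficient {c:.2f} -> multiplier {f:.1f}")
```

Under this simplification, moving the coefficient from 0.8 to 0.95 roughly quadruples the multiplier, which matches the direction of the drift problem discussed by Fuller and Rao (2001).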

5. Numerical study

We start this section by investigating the impact of correlations on the variances of the MRGE for the yearly mean and yearly mean change using three rotation designs. The three rotation designs are the 3-level 2-4-2 design (i.e., the 2-4-2(3) design) in Fig. 1 and the 3-level 4-in-then-out design (i.e., 4-0(3)) in Fig. 2(a), both of which are balanced in three ways, and the CEQ in Fig. 2(b), which is not a three-way balanced design.

Under a variance of 100 for the variable $x_{tij}$ (i.e., $\sigma^2_{ij} = 100$) and no bias, Table 1 shows the variances of the MRGE when the first-order correlation is $\rho_{1t} = \rho_1^{t}$ and the second-order correlation is $\rho_{2t} = \rho_2 \cdot \rho_{1t}$, for $\rho_1 = 0.4, 0.6, 0.8$ and $\rho_2 = 0, 0.3, 0.6$. The MRGE is obtained by minimizing $0.7\,\mathrm{Var}(S_{t,12}) + 0.3\,\mathrm{Var}(S_{t,12} - S_{t-12,12})$, where $S_{t,12}$ is the yearly mean and $S_{t,12} - S_{t-12,12}$ is the yearly mean change. The variances of both the yearly mean and the yearly mean change increase as $\rho_1$ or $\rho_2$ increases, regardless of design. Thus, ignoring the second-order correlation $\rho_2$ underestimates the variances of $S_{t,12}$ and $S_{t,12} - S_{t-12,12}$. This underestimation becomes more serious as the first-order correlation $\rho_1$ becomes larger. This is the reason why we should include the second-order correlation in calculating the variance.

Table 1
Variances of yearly mean and yearly mean change for varying ρ1 and ρ2 and for three different designs

Var(MRGE)               Design      ρ1:    0.4    0.4    0.4    0.6    0.6    0.6    0.8    0.8    0.8
                                    ρ2:    0.0    0.3    0.6    0.0    0.3    0.6    0.0    0.3    0.6
St,12                   4-0(3)            1.36   1.73   2.11   1.95   2.87   3.78   2.86   5.13   7.39
(yearly mean)           2-4-2(3)          1.25   1.64   2.02   1.70   2.66   3.62   2.41   4.78   7.15
                        CEQ               1.36   1.74   2.12   1.95   2.89   3.81   2.86   5.19   7.48
St,12 − St−12,12        4-0(3)            2.62   3.33   4.05   3.61   5.27   6.92   4.75   8.08  11.40
(yearly mean change)    2-4-2(3)          2.42   3.16   3.90   3.17   4.90   6.63   4.00   7.50  10.99
                        CEQ               2.62   3.34   4.06   3.61   5.31   6.98   4.86   8.42  11.61

Table 1 also shows that the 2-4-2(3) design is better than both the 4-0(3) design and the CEQ for all $\rho_1$ and $\rho_2$. Compared with the 4-0(3) design and the CEQ, the relative efficiency of the 2-4-2(3) design increases as $\rho_1$ increases, while it decreases as $\rho_2$ increases. This means that a design with positive out-of-sample months (i.e., $r_{2i} > 0$, which we call a design gap) is generally better than a design without a design gap. However, this is true only when there is no external telescoping (i.e., the tendency to draw events into the reference period when the events actually occurred during the design gap). Since external telescoping is not avoidable in a multi-level rotation design, a design with a design gap needs bounding to prevent it. Thus, one needs to be careful in choosing a rotation design so as to control the cost of bounding when adopting a design gap.
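The relative-efficiency statement can be checked directly from the yearly-mean rows of Table 1. The short sketch below takes the ratio of the 4-0(3) variance to the 2-4-2(3) variance for each correlation pair; the variable names are arbitrary and the numbers are copied from the table.

```python
rho1_values = [0.4, 0.6, 0.8]
rho2_values = [0.0, 0.3, 0.6]

# Yearly-mean variances from Table 1, ordered by rho1 and then rho2.
var_4_0 = [1.36, 1.73, 2.11, 1.95, 2.87, 3.78, 2.86, 5.13, 7.39]
var_2_4_2 = [1.25, 1.64, 2.02, 1.70, 2.66, 3.62, 2.41, 4.78, 7.15]

# Ratio above 1 means the 2-4-2(3) design has the smaller variance.
ratio = {}
for i, r1 in enumerate(rho1_values):
    for j, r2 in enumerate(rho2_values):
        k = 3 * i + j
        ratio[(r1, r2)] = var_4_0[k] / var_2_4_2[k]

for key in sorted(ratio):
    print(key, round(ratio[key], 3))
```

In these rows the ratio exceeds one everywhere and falls as the second-order correlation grows for each fixed first-order correlation; the increase with the first-order correlation is clearest when the second-order correlation is zero.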

Table 1 also shows that the 4-0(3) design is slightly better than the CEQ. It indicates that the 4-0(3) design can be a good alternative to the CEQ because the rotation plan of the 4-0(3) design does not vary with survey time (i.e., a time-invariant rotation plan) while the CEQ has a time-dependent rotation plan as described in Section 2.

We compare the MRWE to the MRGE, the AK composite estimator, and the regression composite estimators (i.e., MR1, MR2, and Fuller and Rao’s estimator (FR)). Because any sample unit in the $r_{11}-\cdots-r_{2,m-1}-r_{1m}(\ell)$ rotation design stays in the sample for $\ell G$ months, the MRWE is defined with an $\ell G$-month window.

A comparative study is conducted under four different situations. The first situation is the exponentially decaying correlations $\rho_{1t} = \rho_1^{t}$ and $\rho_{2t} = \rho_2 \cdot \rho_{1t}$ for $\rho_1 = 0.6, 0.8, 0.9$ and $\rho_2 = 0.0, 0.3$ with $\sigma^2_{ij} = 100$ and no bias in the 4-0(3) design (Table 2). The second situation is the same as the first except that we use the 12-0(1) design to evaluate the level effect by comparison with the first situation (Table 3). The two designs have the same overlapping percentage between different months and the same monthly sample size, but have different correlation structures. The third situation concerns how a small bias affects the efficiency of the six estimators when the first-order correlation is close to one (i.e., $\rho_1 = 0.8, 0.9, 0.95$) (Table 4). The last situation uses real values of correlations and biases from data in the 6-0(1) design (i.e., the Canadian Labour Force Survey) (Table 5). To determine the coefficients of the MRGE and MRWE as well as the tuning parameter of the FR in (12), we minimize $\tfrac{1}{4}$(MSE of monthly level + MSE of monthly change + MSE of yearly mean + MSE of yearly mean change).

Table 2 shows that the MRWE is as good as the MRGE for all characteristics. These two estimators are generally better than the remaining four estimators, except for the monthly change, for which the MR2 is barely better. We also observe that even a small value of the second-order correlation (0.3 in this example) increases the variances more than we expected. This increase can be partially explained as follows. Although the 12-0(1) design in Table 3 has the same overlapping percentage as the 4-0(3) design in Table 2, the two designs have totally different correlation structures. For example, the second-order correlations occur in only one pair of sample units between any two consecutive months in the 12-0(1) design, while they occur in 25 pairs of sample units between two consecutive months in the 4-0(3) design. As a result, the influence of the second-order correlation on the variances in the 4-0(3) design (Table 2) is much greater than in the 12-0(1) design (Table 3). This implies that, as the second-order correlation increases, the variance in multi-level rotation designs increases much more than the variance in one-level rotation designs does.

Table 2
Variances of the six estimators in the 4-0(3) design with no bias

Characteristics     ρ1     ρ2        MRGE    MRWE      AK     MR1     MR2      FR
Monthly level       0.6    0.0       8.02    8.03    8.21    8.01    8.38    8.01
                           0.3      11.09   11.11   11.30   11.11   11.43   11.11
                    0.8    0.0       7.47    7.49    7.50    7.51    8.46    7.49
                           0.3      11.71   11.74   11.75   11.80   12.55   11.78
                    0.9    0.0       6.68    6.75    6.98    6.95    8.60    6.89
                           0.3      11.69   11.74   11.88   12.64   13.86   12.59
Monthly change      0.6    0.0       7.13    7.12    7.04    7.21    7.02    7.16
                           0.3       9.46    9.47    9.41    9.57    9.39    9.52
                    0.8    0.0       3.82    3.82    3.70    3.98    3.57    3.89
                           0.3       5.31    5.36    5.26    5.54    5.16    5.45
                    0.9    0.0       2.09    2.13    2.06    2.36    1.80    2.25
                           0.3       2.91    2.97    2.90    3.36    2.83    3.25
Yearly sum          0.6    0.0       2.00    1.99    2.20    2.00    2.26    2.02
                           0.3       2.91    2.91    3.09    2.92    3.14    2.93
                    0.8    0.0       3.02    3.01    3.16    3.05    4.04    3.10
                           0.3       5.25    5.25    5.36    5.29    6.08    5.33
                    0.9    0.0       3.66    3.65    3.88    3.78    5.83    3.86
                           0.3       7.33    7.32    7.47    7.91    9.46    7.97
Yearly sum change   0.6    0.0       3.70    3.69    3.99    3.70    4.13    3.72
                           0.3       5.34    5.33    5.61    5.35    5.71    5.37
                    0.8    0.0       4.99    4.95    5.22    4.97    6.16    5.00
                           0.3       8.26    8.24    8.44    8.26    9.22    8.29
                    0.9    0.0       4.96    5.06    5.85    4.98    6.25    4.93
                           0.3       8.72    8.65    9.21    9.11   10.09    9.08
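The equal-weight criterion used to tune the estimators can be evaluated on one block of Table 2. The sketch below averages the four reported characteristics at first-order correlation 0.9 and second-order correlation 0.3, treating the table's four rows as the four terms of the criterion (an assumption, since the table labels the aggregate rows "yearly sum"); the variable names are arbitrary.

```python
estimators = ["MRGE", "MRWE", "AK", "MR1", "MR2", "FR"]

# Table 2 entries for first-order correlation 0.9 and second-order correlation 0.3.
rows = {
    "monthly level":     [11.69, 11.74, 11.88, 12.64, 13.86, 12.59],
    "monthly change":    [2.91, 2.97, 2.90, 3.36, 2.83, 3.25],
    "yearly sum":        [7.33, 7.32, 7.47, 7.91, 9.46, 7.97],
    "yearly sum change": [8.72, 8.65, 9.21, 9.11, 10.09, 9.08],
}

# Equal weights of 1/4 on each characteristic.
criterion = {
    name: sum(rows[c][k] for c in rows) / 4
    for k, name in enumerate(estimators)
}
ranking = sorted(criterion, key=criterion.get)

for name in ranking:
    print(name, round(criterion[name], 4))
```

On this block the MRGE and MRWE have the two smallest averages and are nearly tied, even though the MR2 has the smallest variance for the monthly change alone.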

Table 3 also shows that the MR1 is comparable to the MRGE and MRWE for monthly level and that the MR2 is better than the MRGE and MRWE for monthly change. However, this is not the case in the presence of biases (see Tables 4 and 5). The MR1 is better than the MR2 for monthly level while the MR1 is worse than the MR2 for monthly change, as previously indicated (Singh et al., 2001; Fuller and Rao, 2001). Furthermore, we observe that the MR1 is better than the MR2 for the aggregate characteristics (i.e., yearly mean and yearly mean change).

In Table 4, we examine the drift problem (see Fuller and Rao, 2001 for the definition) with high first-order correlations $\rho_1 = 0.8$, 0.9, and 0.95 and with arbitrary recall biases of 3, 2, and −2 (for recall times 0, 1, and 2, respectively), which are very small relative to the variance $\sigma^2_{ij} = 100$. Table 4 shows the MSEs of the six estimators under this situation. Compared to the MSEs in Table 2, where no bias is assumed, the MSEs of the MRGE and MRWE in Table 4 increase only marginally for each characteristic. For the monthly level and the yearly mean, Table 4 also shows that the MSEs of the MR1, MR2, and FR increase sharply as the first-order correlation $\rho_1$ increases, compared to the corresponding MSEs in Table 2. This confirms that the regression composite estimators suffer from the drift problem even for a small bias as the first-order correlation approaches 1.

Table 5 provides the MSEs of the six estimators for the Employed Agriculture Data from the Canadian Labour Force Survey. The Employed Agriculture Data has high first-order correlations (i.e., $\rho_{11} = 0.955$, $\rho_{12} = 0.926$, $\rho_{13} = 0.901$, . . .). In Table 5, we use the same estimates of the first-order correlations, second-order correlations, and time-in-sample biases as those in Kumar and Lee (1983) and Park et al. (2001). The MSEs of the MRWE are smaller than those of the MR1 for all characteristics, while the MSEs of the MRWE are slightly larger than those of the AK, MR2, and FR for the monthly change and yearly mean change. However, the MSEs of the MRWE are much smaller than those of the AK, MR2, and FR for the monthly level and yearly mean. Thus, the MRWE is more stable than the other estimators. Because of the high first-order correlations and the existence of the time-in-sample biases, the MR2 is almost useless for the monthly level and yearly mean due to its extremely large MSEs.


Table 3
Variances of the six estimators in the 12-0(1) design with no bias

Characteristics     ρ1     ρ2        MRGE    MRWE      AK     MR1     MR2      FR
Monthly level       0.6    0.0       8.02    8.03    8.21    8.01    8.38    8.02
                           0.3       8.11    8.13    8.32    8.14    8.43    8.13
                    0.8    0.0       7.47    7.49    7.50    7.51    8.46    7.50
                           0.3       7.77    7.84    7.78    7.93    8.57    7.87
                    0.9    0.0       6.68    6.75    6.98    6.95    8.60    6.88
                           0.3       7.27    7.39    7.41    7.91    8.94    7.77
Monthly change      0.6    0.0       7.13    7.12    7.04    7.21    7.02    7.11
                           0.3       7.07    7.08    7.02    7.17    7.00    7.08
                    0.8    0.0       3.82    3.82    3.70    3.98    3.57    3.81
                           0.3       3.72    3.78    3.67    3.96    3.56    3.79
                    0.9    0.0       2.09    2.13    2.06    2.36    1.80    2.15
                           0.3       2.00    2.10    2.00    2.40    1.84    2.19
Yearly sum          0.6    0.0       2.00    1.99    2.20    2.00    2.26    2.04
                           0.3       2.07    2.07    2.26    2.08    2.29    2.11
                    0.8    0.0       3.02    3.01    3.16    3.05    4.04    3.16
                           0.3       3.33    3.31    3.40    3.38    4.12    3.45
                    0.9    0.0       3.66    3.65    3.88    3.78    5.83    3.96
                           0.3       4.31    4.26    4.39    4.61    6.08    4.71
Yearly sum change   0.6    0.0       3.70    3.69    3.99    3.70    4.13    3.75
                           0.3       3.83    3.81    4.09    3.83    4.18    3.87
                    0.8    0.0       4.99    4.95    5.22    4.97    6.16    5.06
                           0.3       5.36    5.33    5.49    5.39    6.27    5.43
                    0.9    0.0       4.96    5.06    5.85    4.98    6.25    4.93
                           0.3       5.53    5.44    5.94    5.62    6.50    5.54

Table 4
MSE of the six estimators in the 4-0(3) design with negligibly small recall biases of 3, 2, and −2 (recall times 0, 1, and 2)

Characteristics     ρ1     ρ2        MRGE    MRWE      AK     MR1     MR2      FR
Monthly level       0.80   0.0       7.57    7.56   10.38   10.49   16.40   10.87
                           0.3      11.80   11.81   14.63   14.79   20.49   15.16
                    0.90   0.0       6.75    6.81    9.86   13.90   34.52   15.20
                           0.3      11.76   11.80   14.76   19.59   39.78   20.89
                    0.95   0.0       5.73    6.06    9.53   26.33  101.63   31.11
                           0.3      11.58   11.54   14.79   47.11  121.60   51.89
Monthly change      0.80   0.0       3.88    3.84    3.70    3.98    3.57    3.89
                           0.3       5.37    5.38    5.26    5.54    5.16    5.45
                    0.90   0.0       2.12    2.14    2.06    2.36    1.80    2.25
                           0.3       2.95    2.99    2.90    3.36    2.83    3.25
                    0.95   0.0       1.17    1.28    1.25    1.56    0.90    1.43
                           0.3       1.70    1.72    1.65    3.52    2.89    3.40
Yearly sum          0.80   0.0       3.04    3.04    6.04    6.04   11.99    6.47
                           0.3       5.27    5.28    8.24    8.28   14.02    8.70
                    0.90   0.0       3.68    3.67    6.75   10.73   31.75   12.16
                           0.3       7.34    7.34   10.35   14.86   35.38   16.27
                    0.95   0.0       3.83    3.86    7.20   24.00  100.06   28.97
                           0.3       8.69    8.66   11.84   41.41  116.52   46.33
Yearly sum change   0.80   0.0       5.04    5.02    5.22    4.97    6.16    5.00
                           0.3       8.31    8.30    8.44    8.26    9.22    8.29
                    0.90   0.0       5.00    5.12    5.85    4.98    6.25    4.93
                           0.3       8.76    8.71    9.21    9.11   10.09    9.08
                    0.95   0.0       4.07    4.53    6.11    4.27    4.68    4.08
                           0.3       7.25    7.08    8.17   15.09   15.33   14.94


Table 5
MSEs of the six estimators for employed agriculture in the Canadian Labour Force Survey

Design    Characteristics           MRGE      MRWE        AK       MR1        MR2        FR
6-0(1)    Monthly level           347.45    361.56   1293.30    434.06   24045.13    884.58
          Monthly change          177.70    182.52    174.34    210.53     159.55    158.40
          Yearly mean             173.63    176.39   1120.80    246.40   23878.37    706.11
          Yearly mean change      243.92    272.33    264.88    249.03     327.93    250.13

6. Concluding remarks

In this paper, we make three main contributions to rotation sampling design. First, we present the three-way balanced multi-level rotation design in which the variance and bias of a composite estimator remain the same over the survey months. Second, we derive the MSE of the GCE, incorporating the first- and second-order correlations, time-in-sample bias, and recall bias. This MSE is quite general and applicable to other composite estimators such as the AK, MR1, MR2, and FR. Third, we propose the minimum risk window estimator (MRWE). The MSE of the MRWE is almost the same as that of the MRGE, which has the smallest MSE. However, the MRWE is easy to calculate and preserves the internal consistency in total. The MRWE can also be used for the monthly balanced design, which includes the three-way balanced design as a special case. Our numerical examples show that the MRWE is generally better than other estimators such as the AK, MR1, MR2, and FR.

Although we present a general class of rotation designs which includes most currently used designs, some other rotation designs are still left out of our current study, such as the US Survey of Income and Program Participation (David, 1985) and the Monthly Retail Trade Survey (Huang, 1984). These special designs have neither a balanced monthly sample nor a time-invariant rotation plan. Thus, we will extend our future work to include those designs.

7. Appendix

Proof of Theorem 2. Assume that an $\ell$-level rotation design is balanced in three ways and time-invariant. Then there exists a unique integer $m_k$, $1 \le m_k \le G$, satisfying

\[ g_t(m_k) = g_{t+k+1}(1), \qquad (A.1) \]

where $k = 0, 1, \ldots, G-1$. Observe that, from (A.1), $m_{G-1} = 1$ by (iii) of Definition 2.1 and thus $g_t(m_0) = g_{t+1}(m_{G-1})$. Since the time-invariant property and (A.1) imply $g_{t+1}(m_{k-1}) = g_{t+1+k}(1)$ for $k = 1, 2, \ldots, G-1$, we have $g_{t+k+1}(1) = g_t(m_k) = g_{t+1}(m_{k-1})$ for $k = 1, 2, \ldots, G-1$, which is the second claim.

Now, we show that the $m_k$'s satisfying (A.1) also satisfy

\[ \operatorname{mod}_G\Bigl\{ m_k + k + (m_k - 1)(\ell - 1) + \sum_{i=1}^{m-1} r_{2i}\, I\bigl[m_k > \textstyle\sum_{s=1}^{i} r_{1s}\bigr] \Bigr\} = 0. \qquad (A.2) \]

Let $m_k = \sum_{i=0}^{n_{1k}} r_{1i} + n_{2k}$ w.l.o.g., where $0 \le n_{1k} \le m-1$ and $n_{2k} = 1, 2, \ldots, r_{1,n_{1k}+1}$ for all $k$. Then, since each sample unit returns to the sample every $\ell$th month in an $\ell$-level rotation design, (A.1) implies that

\begin{align*}
g_t(m_k) &= g_{t+k+1}(1) = g_{t+k+1+\ell}(2) = \cdots = g_{t+k+1+(r_{11}-1)\ell}(r_{11}) \\
&= g_{t+k+1+(r_{11}-1)\ell+r_{21}+\ell}(r_{11}+1) = \cdots = g_{t+k+1+(r_{11}-1)\ell+r_{21}+r_{12}\ell}(r_{11}+r_{12}).
\end{align*}

Repeat this procedure until we have

\[ g_t(m_k) = g_{\,t+k+1+\ell\sum_{i=0}^{n_{1k}+1} r_{1i} + \sum_{i=0}^{n_{1k}} r_{2i} - \ell}\Bigl(\sum_{i=0}^{n_{1k}+1} r_{1i}\Bigr), \qquad (A.3) \]

where $r_{20} = 0$.


Similarly, it can also be shown that

\begin{align*}
g_t(m_k) &= g_t\Bigl(\sum_{i=0}^{n_{1k}} r_{1i} + n_{2k}\Bigr) = g_{t+\ell}\Bigl(\sum_{i=0}^{n_{1k}} r_{1i} + n_{2k} + 1\Bigr) \\
&= g_{t+2\ell}\Bigl(\sum_{i=0}^{n_{1k}} r_{1i} + n_{2k} + 2\Bigr) = \cdots = g_{t+(r_{1,n_{1k}+1}-n_{2k})\ell}\Bigl(\sum_{i=0}^{n_{1k}+1} r_{1i}\Bigr). \qquad (A.4)
\end{align*}

Since the same interview time is repeated every $G$ months from (iii) of Definition 2.1, (A.3) and (A.4) indicate that $t + k + 1 + \ell\sum_{i=0}^{n_{1k}+1} r_{1i} + \sum_{i=0}^{n_{1k}} r_{2i} - \ell - (t + (r_{1,n_{1k}+1} - n_{2k})\ell)$ is a multiple of $G$ or, equivalently,

\[ \operatorname{mod}_G\Bigl\{ \Bigl(\sum_{i=0}^{n_{1k}} r_{1i} + n_{2k}\Bigr)\ell + \sum_{i=0}^{n_{1k}} r_{2i} - \ell + k + 1 \Bigr\} = 0. \qquad (A.5) \]

Finally, since $m_k = \sum_{i=0}^{n_{1k}} r_{1i} + n_{2k}$ and $n_{2k} = 1, \ldots, r_{1,n_{1k}+1}$, we have $n_{1k} = \sum_{i=1}^{m-1} I[m_k > \sum_{s=1}^{i} r_{1s}]$. This implies that $\sum_{i=0}^{n_{1k}} r_{2i}$ is equal to $\sum_{i=1}^{m-1} r_{2i}\, I[m_k > \sum_{s=1}^{i} r_{1s}]$. Thus, we have (A.2).

We show the sufficiency for three-way balancing. The uniqueness of $m_k$ and the mod operator imply that $\{m_0, m_1, \ldots, m_{G-1}\}$ is a permutation of the $G$ integers from 1 to $G$. Thus, since $g_t(m_k) \ne g_t(m_k')$ for $m_k \ne m_k'$, the monthly sample at month $t$ contains all $G$ rotation groups and interview times. This is also true for $t+1$ since $g_t(m_k) = g_{t+1}(m_{k-1})$. Hence (i) and (ii) of Definition 2.1 for recall time $j = 0$ hold by mathematical induction. By the relation $g_t(m_k) = g_{t+1}(m_{k-1})$, we have

\begin{align*}
g_t(m_k) &= g_{t+1}(m_{k-1}) = g_{t+2}(m_{k-2}) = \cdots = g_{t+k}(m_0) = g_{t+k+1}(m_{G-1}) \\
&= \cdots = g_{t+G-1}(m_{k+1}) = g_{t+G}(m_k).
\end{align*}

This shows (iii).

To show the conditions (i)–(iii) for the other levels of recall $j = 1, 2, \ldots, \ell-1$, let $g^{(j)}_t(i)$ be the rotation group with interview time $i$ and recall time $j$ at month $t$. Following this notation, observe that $g_t(m_k) \equiv g^{(0)}_t(m_k)$. Since each rotation group reports the information of the current month as well as the $\ell - 1$ previous months in an $\ell$-level rotation design, the following is immediate: for each $i$,

\[ g^{(j)}_t(i) = g_{t+j}(i) \quad \text{for } j = 0, 1, \ldots, \ell-1. \]

This implies that conditions (i)–(iii) of Definition 2.1 hold for $j = 1, 2, \ldots, \ell-1$ once they hold for $j = 0$. This completes the proof. □

Lemma A.1 (for $V_k = \mathrm{Cov}(X_t, X_{t-k})$). Define a $G \times G$ matrix $L_1^{t}$ by $(L_1^{t})_{p,q} = 1$ if $t^*_p = t^*_q - t$ for $p \le q$ and 0 otherwise, where $t^*_p = (p-1)\ell + \sum_{j=1}^{m-1} r_{2j}\, I[p > \sum_{s=1}^{j} r_{1s}]$ for $p = 1, 2, \ldots, \sum_{i=1}^{m} r_{1i}$. Also, define another $G \times G$ matrix $L_2^{t} = L^{t} - L_1^{t}$, where $L^{t} = L^{t-1} \cdot L$ with

\[
(L)_{p,q} =
\begin{cases}
1 & \text{if } (p,q) \in \{(m_k, m_{k-1});\ k = 1, 2, \ldots, G-1\} \text{ or } (p,q) = (m_0, m_{G-1}), \\
0 & \text{otherwise.}
\end{cases}
\]

Let $x_{tj} = (x_{t1j}, x_{t2j}, \ldots, x_{tGj})'$ for $j = 0, 1, \ldots, \ell-1$. Using the two identification matrices $L_1$ and $L_2$, we have the following lemma.


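Lemma A.1. Suppose that an $\ell$-level rotation design is balanced in three ways. Then, under the covariance structure (3), the $G \times G$ covariance matrix is given by

\[ \mathrm{Cov}(x_{tj}, x_{t+t_1,j_1}) = \rho_{1t_1}\,\Sigma_j L_1^{t_1-j+j_1}\,\Sigma_{j_1} + \rho_{2t_1}\,\Sigma_j L_2^{t_1-j+j_1}\,\Sigma_{j_1}, \qquad (A.6) \]

where $j, j_1 = 0, 1, \ldots, \ell-1$ and $\Sigma_j = \mathrm{diag}(\sigma_{1j}, \sigma_{2j}, \ldots, \sigma_{Gj})$.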

Proof. First we show that the $L_1^{t_1}$ and $L_2^{t_1}$ matrices completely identify two sample units at months $t$ and $t + t_1$.

The $r_{11}-\cdots-r_{2,m-1}-r_{1m}(\ell)$ rotation plan implies that a sample unit introduced for the first time at month $t$ is interviewed for the $p$th time at month $t + t^*_p$, where $t^*_p = (p-1)\ell + \sum_{j=1}^{m-1} r_{2j}\, I[p > \sum_{s=1}^{j} r_{1s}]$ for $p = 1, 2, \ldots, \sum_{i=1}^{m} r_{1i}$. That is, the sample unit interviewed for the $p$th time at month $t$ is the same unit as the sample unit interviewed for the $q$th time at month $t + t_1$ for $t_1 \ge 0$ only if $t^*_q = t^*_p + t_1$. This relationship can be expressed by the $G \times G$ matrix $L_1^{t_1}$ in which $(L_1^{t_1})_{p,q} = 1$ if $t^*_p = t^*_q - t_1$ for $p \le q$ and 0 otherwise.
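The interview offsets and the same-unit indicator matrix described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the rotation pattern is passed as lists `r1` (in-sample counts) and `r2` (gaps), and reading the 2-4-2(3) design as `r1 = [2, 2]`, `r2 = [4]`, level 3 and the 4-0(3) design as `r1 = [4]`, `r2 = []`, level 3 is an assumption about the notation.

```python
def interview_offsets(r1, r2, level):
    """Months from the first interview to the p-th one, for p = 1..sum(r1)."""
    cum_r1 = []
    total = 0
    for r in r1:
        total += r
        cum_r1.append(total)
    offsets = []
    for p in range(1, total + 1):
        # Add the gap r2[j] once p lies beyond the first j+1 in-sample blocks.
        gap = sum(r2[j] for j in range(len(r2)) if p > cum_r1[j])
        offsets.append((p - 1) * level + gap)
    return offsets

def same_unit_matrix(offsets, t1):
    """Indicator matrix: entry (p, q) is 1 when offsets[p] == offsets[q] - t1 and p <= q."""
    G = len(offsets)
    return [[1 if p <= q and offsets[p] == offsets[q] - t1 else 0
             for q in range(G)] for p in range(G)]

offsets_242 = interview_offsets([2, 2], [4], 3)
offsets_40 = interview_offsets([4], [], 3)
L1 = same_unit_matrix(offsets_40, 3)
print(offsets_242, offsets_40, L1)
```

With list indices starting at zero, row `p` and column `q` of `L1` correspond to interview numbers `p + 1` and `q + 1` in the text.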

Because the $m_k$'s satisfy the relation $g_t(m_k) = g_{t+1}(m_{k-1})$ from Theorem 2.1, the relation can be expressed as $g_{t+1} = L g_t$, where $g_t = (g_t(1), \ldots, g_t(G))'$. That is, the two rotation groups interviewed for the $i$th and $j$th times at months $t$ and $t+1$ are the same when $(L)_{i,j} = 1$. In general, $g_t(i) = g_{t+t_1}(j)$ if $(L^{t_1})_{i,j} = 1$ for $t_1 = 1, 2, \ldots$, since $g_{t+t_1} = L g_{t+t_1-1} = \cdots = L^{t_1} g_t$. Thus, the matrix $L^{t_1}$ is used to identify whether or not two sample units interviewed at the two months $t$ and $t + t_1$ are from the same rotation group.

Since $L_2^{t_1} = L^{t_1} - L_1^{t_1}$, when there are two sample units interviewed for the $p$th and $q$th times at months $t$ and $t + t_1$, respectively, they are different but from the same rotation group if $(L_2^{t_1})_{p,q} = 1$.

Therefore, the $L_1^{t_1}$ and $L_2^{t_1}$ matrices completely identify two sample units as follows. Assume that the two sample units are interviewed for the $p$th and $q$th times at respective months $t$ and $t + t_1$. Then, they are the same if $(L_1^{t_1})_{p,q} = 1$ and $(L_2^{t_1})_{p,q} = 0$, while they are two different units but from the same rotation group if $(L_1^{t_1})_{p,q} = 0$ and $(L_2^{t_1})_{p,q} = 1$.

We now prove the covariance of (A.6). Since $x^{(j)}_{t,i}$ and $x^{(j_1)}_{t+t_1,i_1}$ are measured by the respective two sample units interviewed for the $i$th and $i_1$th times at months $t + j$ and $t + t_1 + j_1$, these two sample units are the same if $(L_1^{|t+t_1+j_1-t-j|})_{i,i_1} = (L_1^{|t_1+j_1-j|})_{i,i_1} = 1$. Therefore, we have

\[ \mathrm{Cov}(x^{(j)}_t, x^{(j_1)}_{t+t_1}) = \rho_{1t_1}\,\Sigma_j L_1^{|t_1+j_1-j|}\,\Sigma_{j_1} \qquad (A.7) \]

if any pair of $x^{(j)}_t$ and $x^{(j_1)}_{t+t_1}$ is from the same sample unit.

Similarly, $\mathrm{Cov}(x^{(j)}_{t,i}, x^{(j_1)}_{t+t_1,i_1}) = \rho_{2t_1}\sigma_{ij}\sigma_{i_1 j_1}$ if $(L_2^{|t_1+j_1-j|})_{i,i_1} = 1$. This gives

\[ \mathrm{Cov}(x^{(j)}_t, x^{(j_1)}_{t+t_1}) = \rho_{2t_1}\,\Sigma_j L_2^{|t_1+j_1-j|}\,\Sigma_{j_1} \qquad (A.8) \]

if any pair of $x^{(j)}_t$ and $x^{(j_1)}_{t+t_1}$ is measured from different sample units but from the same rotation group. Combining (A.7) and (A.8), we have the desired result. □

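Proof of Theorem 3. Observe that $y_{t-1} = \sum_{j=0}^{\infty} (\omega^{j} a'X_{t-1-j} - \omega^{j+1} b'X_{t-1-j-1})$ by recursively solving (2). It is also easy to show that

\[ (1-\omega^2)\,\mathrm{Var}(y_t) = a'\,\mathrm{Var}(X_t)\,a + \omega^2 b'\,\mathrm{Var}(X_{t-1})\,b - 2\omega\, b'\,\mathrm{Cov}(X_{t-1}, X_t)\,a + 2\omega\,\mathrm{Cov}(y_{t-1}, a'X_t) - 2\omega\,\mathrm{Cov}(y_{t-1}, \omega b'X_{t-1}). \]

This completes the proof by Lemma A.1. □

Theorem A.2 (Variances of (a) $y_t - y_{t-t_0}$, (b) $S_{t,t_0}$, and (c) $S_{t,t_0} - S_{t-t_1,t_0}$). Let $V_{t_1} = \mathrm{Cov}(X_t, X_{t+t_1})$ for $t_1 \ge 0$ and $B_{n_1,n_2} = \sum_{k=0}^{\infty} \omega^{k+n_2} V_{n_1+k-n_2}$ for nonnegative integers $n_1 \ge n_2$. We also let

\begin{align*}
P_1(t_1) &= \frac{2(1-\omega^{t_1})}{1-\omega^2}\,(V_0 + 2\omega B_{1,0}) - 2\sum_{n=0}^{t_1-1} B_{t_1,n}, \\
P_2(t_1) &= -\frac{4\omega(1-\omega^{t_1})}{1-\omega^2}\,(B_{1,0} + B'_{1,1}) + 2\sum_{n=0}^{t_1-1} (\omega B_{t_1+1,n} + B'_{t_1,n+1}), \\
P_3(t_1) &= \omega^2 P_1(t_1)
\end{align*}

for $t_1 \ge 1$. Then, by the same approach used in Theorem 3, we have the three variances.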


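Theorem A.2. Under the same conditions as in Theorem 3,

(a) \[ \mathrm{Var}(y_t - y_{t-t_0}) = a'P_1(t_0)a + b'P_2(t_0)a + b'P_3(t_0)b, \]

(b) \begin{align*}
\mathrm{Var}(S_{t,t_0}) &= a'\Bigl(t_0^2 Q_1 - \sum_{t_1=1}^{t_0-1} (t_0 - t_1)P_1(t_1)\Bigr)a \\
&\quad + b'\Bigl(t_0^2 Q_2 - \sum_{t_1=1}^{t_0-1} (t_0 - t_1)P_2(t_1)\Bigr)a \\
&\quad + b'\Bigl(t_0^2 \omega^2 Q_1 - \sum_{t_1=1}^{t_0-1} (t_0 - t_1)P_3(t_1)\Bigr)b, \text{ and}
\end{align*}

(c) \begin{align*}
\mathrm{Var}(S_{t,t_0} - S_{t-t_1,t_0}) &= a'\Bigl(\sum_{t_2=-t_0+1}^{t_0-1} (t_0 - |t_2|)P_1(t_1 - t_2) - 2\sum_{t_2=1}^{t_0-1} (t_0 - t_2)P_1(t_2)\Bigr)a \\
&\quad + b'\Bigl(\sum_{t_2=-t_0+1}^{t_0-1} (t_0 - |t_2|)P_2(t_1 - t_2) - 2\sum_{t_2=1}^{t_0-1} (t_0 - t_2)P_2(t_2)\Bigr)a \\
&\quad + b'\Bigl(\sum_{t_2=-t_0+1}^{t_0-1} (t_0 - |t_2|)P_3(t_1 - t_2) - 2\sum_{t_2=1}^{t_0-1} (t_0 - t_2)P_3(t_2)\Bigr)b.
\end{align*}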

References

Bailar, B.A., 1989. Information needs, surveys, and measurement errors. In: Panel Surveys. Wiley, New York, pp. 1–24.
Bell, P., 2001. Comparison of alternative labour force survey estimators. Surv. Methodol. 27, 53–63.
Binder, D.A., Hidiroglou, M.A., 1988. Sampling in time. In: Handbook of Statistics, 6: Sampling. Elsevier Science, New York, pp. 187–211.
Cantwell, P.J., 1990. Variance formulae for composite estimators in rotation designs. Surv. Methodol. 16, 153–163.
David, M., 1985. The design and development of SIPP. J. Econom. Soc. Meas. 13, 215–224.
Fuller, W.A., Rao, J.N.K., 2001. A regression composite estimator with application to the Canadian Labour Force Survey. Surv. Methodol. 27, 45–51.
Gambino, J., Kennedy, B., Singh, M.P., 2001. Regression composite estimation for the Canadian Labour Force Survey: evaluation and implementation. Surv. Methodol. 27, 65–74.
Gurney, M., Daly, J.F., 1965. A multivariate approach to estimation in periodic sample surveys. Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 247–257.
Huang, E.T., 1984. An imputation study for the Monthly Retail Trade Survey. Statistical Research Division Report Series, Bureau of the Census.
Kalton, G., Kasprzyk, D., McMillen, D.B., 1989. Information needs, surveys, and measurement errors. In: Panel Surveys. Wiley, New York, pp. 249–270.
Kumar, S., Lee, H., 1983. Evaluation of composite estimation for the Canadian Labor Force Survey. Surv. Methodol. 9, 403–408.
Lent, J., Miller, S.M., Cantwell, P.J., Duff, M., 1999. Effect of composite weights on some estimates from the Current Population. J. Official Statist. 15, 431–448.
McLaren, C.H., Steel, D.G., 2000. The impact of different rotation patterns on the sampling variance of seasonally adjusted and trend estimates. Surv. Methodol. 26, 163–172.
Park, Y.S., Kim, K.W., Choi, J.W., 2001. One-level rotation design balanced on time in monthly sample and in rotation group. J. Amer. Statist. Assoc. 96, 1483–1496.
Särndal, C.E., 1980. On π-inverse weighting versus best linear unbiased weighting in probability sampling. Biometrika 67, 639–650.
Silberstein, A.R., Jacobs, C.A., 1989. Symptoms of repeated interview effects in the consumer expenditure interview survey. In: Panel Surveys. Wiley, New York, pp. 289–303.
Singh, A.C., 1996. Combining information in survey sampling by modified regression. Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 120–129.
Singh, A.C., Kennedy, B., Wu, S., 2001. Regression composite estimation for the Canadian Labour Force Survey with a rotating panel design. Surv. Methodol. 27, 33–44.
Steel, D.G., McLaren, C.H., 2000. The effect of different rotation patterns on the revisions of trend estimates. J. Official Statist. 16, 61–76.