
Dyadic Data 1

A Multilevel Structural Equation Model for Dyadic Data

Jason T. Newsom

Portland State University

RUNNING HEAD: Multilevel SEM for Dyadic Data

Submitted for publication.

Draft: 8/2/01

Address correspondence to Jason T. Newsom, Ph.D., Institute on Aging, School of Community

Health, Portland State University, P.O. Box 751, Portland, OR 97207-0751, email:

[email protected]. Partial support for preparation of this article was provided by AG 15159

and AG14130 from the National Institute on Aging. The author wishes to thank Joop Hox and

David Morgan for helpful discussions and Patrick Curran and Tor Neilands for comments on an

earlier draft.




Abstract

Dyadic data involving couples, twins, or parent-child pairs are common in the social sciences,

but available statistical approaches are limited in the types of hypotheses that can be tested with

dyadic data. A novel structural modeling approach, based on latent growth curve model

specifications, is proposed for use with dyadic data. The approach allows researchers to test

more sophisticated causal models, incorporate latent variables, and estimate more complex error

structures than is currently possible using hierarchical linear modeling or multilevel structural

equation models. A brief introduction to multilevel regression and latent growth curve models

is given, and the equivalence of the statistical model for nested and longitudinal data is

explained. Possible expansion of the strategy for application with small groups and with

unbalanced data is briefly discussed.



Social scientists commonly study dyadic data, as found in research on marital couples

(Raudenbush, Brennan, & Barnett, 1995), twins (Kendler, Karkowski, & Prescott, 1999),

employees and supervisors (Fleenor, McCauley, & Brutus, 1996), or doctors and patients

(Goldberg, Cohen, & Rubin, 1998). When data are collected from both individuals of the dyad,

scores from the two members of a dyad are usually dependent. Data from each member of the dyad cannot be treated as independent observations without underestimating standard errors, which inflates the Type I error rate. In addition, many measures, such as household

income, are measured at the dyadic level rather than the individual level. Other variables,

although measured at the individual level, partially reflect experiences or circumstances that are

common to both members of the couple (e.g., marital quality) and thus contain both individual-level and group-level components. For these reasons, dyadic data can be considered

"hierarchically structured."

Multilevel regression (MLR; also known as hierarchical linear models or HLM; Bryk &

Raudenbush, 1992; Kreft & de Leeuw, 1998; Snijders & Bosker, 1999) and multilevel structural

equation models (multilevel SEM: Muthen, 1989, 1994; Muthen & Satorra, 1989) are two approaches to analyzing hierarchically structured data in which individuals are nested within groups.¹ MLR can be conceptualized as a two-level regression model, with the level-1 model

represented by the usual regression equation generated for each group. The level-2 model

involves prediction of intercept and slope coefficients obtained from the level-1 models. These

models are now commonly used throughout the social sciences and have been applied to dyadic

data by a number of researchers (Barnett, Marshall, Raudenbush, & Brennan, 1993; Ozer,

Barnett, Brennan, & Sperling, 1998; Segal & Hershberger, 1999).


The multilevel SEM approach to hierarchically structured data is a recently developed

approach (e.g., Muthen, 1989, 1997a; Muthen & Satorra, 1995; McArdle & Hamagami, 1996) that can estimate measurement, path, or full structural models at the individual and group levels (usually termed the within and between levels, respectively). The multilevel SEM approach involves analyzing separate covariance matrices at the within (level-1) and between (level-2) levels. The ability to estimate measurement error is an important advantage over regression-based methods, but the approach is limited to separate models at the within and between levels, so hypotheses involving causal relations between the two levels cannot be tested. Moreover, slope variability across groups cannot be estimated with the technique (Kaplan, 2000). Multilevel SEM can also

be cumbersome to implement because separate within and between covariance matrices must be

estimated and read into an SEM software package. The within and between covariance matrices

are then tested using a multigroup structural model with special model specifications (Muthen,

1997a; McArdle & Hamagami, 1996). Although matrix estimation has been automated and

model estimation has been simplified with Mplus software (Muthen & Muthen, 2001), other

software packages require more laborious procedures to estimate multilevel models.

Growth curve analysis, another common application of MLR, is statistically equivalent to

MLR with hierarchically structured data, because repeated measures are nested within

individuals (Bryk & Raudenbush, 1992). Although latent growth curve analysis has an

analogous statistical relationship to multilevel SEM, the actual implementation or model

specification of the two types of models is quite different. The latent growth curve specification

does not have the same limitations as multilevel SEM in that there are no restrictions on

relationships between level-1 and level-2 variables, and it is convenient to implement in any SEM software package.


The statistical equivalence of the multilevel and growth curve models allows for the

possibility of using a latent growth curve approach to model specification with certain types of

nested data. Based on this rationale, I describe below a novel approach to the analysis of dyadic

data that employs a latent growth curve model specification. The approach has advantages over

the MLR approach to dyadic data without the limitations of multilevel SEMs, because latent

variables can be used, more complex error structures can be estimated, more sophisticated causal

hypotheses can be tested, and model specification is convenient with any SEM package.

I begin by giving a brief overview of latent growth models and hierarchical linear

models. A general familiarity with structural equation models is assumed, but familiarity with

multilevel regression models is not assumed. After illustrating the equivalence of MLR growth

models and latent growth models, I describe a new approach to analysis of dyadic data using a

growth curve formulation. Finally, I discuss some possibilities for how this approach may be

expanded for use with small groups and in situations in which group sizes are not equal.

Latent Growth Curve Models

Both MLR and SEM can be used to analyze longitudinal data by estimating individual growth

curves. The statistical models for growth curve analysis and hierarchical analysis are identical,

because repeated measures can be considered to be nested within individuals (Bryk &

Raudenbush, 1992). Thus, growth curve models, in general, are a special case of the multilevel

model. The SEM approach, called latent growth curve modeling (McArdle, 1988; Meredith &

Tisak, 1990), has advantages compared to multilevel regression because of the ability to estimate

measurement error when multiple indicators are used and the ability to specify complex error

structures (Willett, 1994). Although multilevel structural models and latent growth curve models are statistically identical, their implementation with structural modeling packages is quite different. Implementation of the latent growth model does not require estimation of separate

covariance matrices nor does it require a multigroup structural model. Latent growth models

estimate latent intercepts and slopes representing an individual's initial level and change in the

dependent variable over time. The variability in intercepts or slopes across individuals and the

factors that explain the variability are often of interest to researchers. Most importantly, latent

growth curve models afford considerable flexibility for researchers, because the slopes and

intercepts can be incorporated into more complex models as predictors or outcomes.

Latent growth curve models estimate individual growth curves or "trajectories" by using

repeated measures as indicators of two latent variables, an intercept variable ($\eta_0$) and a slope variable ($\eta_1$). The interpretation of the intercept variable depends on how the loadings on the slope factor are fixed. For instance, one approach to defining the slope variable is to fix loadings to values 0, 1, 2, 3, 4, ..., t-1, sometimes referred to as "time codes." In this case, the intercept latent variable represents the initial value, because the first loading on $\eta_1$ is set to 0. One can also "center" these loadings by setting the middle time point to 0 (e.g., -3, -2, -1, 0, 1, 2, 3), so that the intercept factor represents the average value of y at the middle time point, $t_0$. Figure 1 illustrates a

simple growth curve model with four time points.
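The loading patterns just described can be generated mechanically. The short Python sketch below is illustrative only: the function name `time_code_loadings` is hypothetical, and it assumes equally spaced occasions. It returns the loading matrix as a list of rows, with the intercept column fixed at 1 and the slope column holding the time codes.

```python
def time_code_loadings(t, center=False):
    """Return the t x 2 loading matrix (list of rows) of a linear growth model.

    Column 0 holds the intercept loadings (all fixed to 1); column 1 holds the
    slope loadings, or "time codes". With center=True the codes are shifted so
    that the middle occasion is coded 0 (assumes equally spaced occasions).
    """
    shift = (t - 1) / 2 if center else 0.0
    return [[1.0, k - shift] for k in range(t)]


# Four occasions coded 0, 1, 2, 3: the intercept factor is the initial level.
print(time_code_loadings(4))
# Seven occasions, centered: the intercept factor is the level at the middle occasion.
print([row[1] for row in time_code_loadings(7, center=True)])  # [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
```

With an even number of occasions, centering produces half-unit codes (e.g., -1.5, -.5, .5, 1.5), so the intercept then refers to a point midway between the two middle occasions.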

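Algebraically, the latent growth curve model is represented by the following formulas. For simplicity, the LISREL "all-y" notation is used throughout (Hayduk, 1987):

level-1 equation (measurement model):

$$y_{ti} = \lambda_{t0}\eta_{0i} + \lambda_{t1}\eta_{1i} + \epsilon_{ti} \tag{1.1}$$

level-2 equations (structural model):

$$\eta_{0i} = \alpha_0 + \zeta_{0i} \tag{1.2}$$

$$\eta_{1i} = \alpha_1 + \zeta_{1i} \tag{1.3}$$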

In equation (1.1), $y_{ti}$ is the dependent variable. The subscripts t and i indicate a measurement for an individual, i, at time point t. $\eta_{0i}$ is a latent variable that represents the level-1 intercept, $\eta_{1i}$ is a latent variable that represents the relationship between the time code and the dependent variable (i.e., the growth trajectory), and $\lambda_{t0}$ and $\lambda_{t1}$ are the loadings for each time point on the intercept latent variable ($\eta_0$) and the slope latent variable ($\eta_1$), respectively. The measurement intercept, $\nu_t$, associated with each indicator ($\boldsymbol{\nu}$ vector) is assumed to be zero, as it is in most traditional structural models and in latent growth curve models (e.g., Muthen, 1997b; Willett & Sayer, 1994), and therefore is not shown above. To simplify, no level-2 predictors are presented in (1.2) or (1.3), but predictors of the intercepts or slopes could be included. In the level-2 equations, $\alpha_0$ and $\alpha_1$ are the intercepts, or the average values of $\eta_0$ and $\eta_1$ across individuals, and $\zeta_{0i}$ and $\zeta_{1i}$ are residuals. The variances and covariance of the residuals are found in the $\boldsymbol{\Psi}$ matrix.

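More traditionally, the model is represented by grouping the variables into matrices:

$$\mathbf{y}_i = \boldsymbol{\Lambda}\boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i \tag{1.4}$$

$$\boldsymbol{\eta}_i = \boldsymbol{\alpha} + \boldsymbol{\zeta}_i \tag{1.5}$$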

In equations (1.4) and (1.5), $\boldsymbol{\Lambda}$ is a t × 2 matrix of loadings relating the t indicators, one for each time point, to the 2 latent variables, $\eta_0$ and $\eta_1$. The first column of $\boldsymbol{\Lambda}$, which corresponds to $\eta_0$, consists of all 1s, because each loading for this variable is set equal to 1 to define it as the intercept. $\boldsymbol{\epsilon}_i$ is the vector of residual errors for each indicator at each time point. $\boldsymbol{\alpha}$ is a 2 × 1 vector containing the latent means for the intercept and slope, representing the average intercept across individuals and the average slope (i.e., trajectory) across individuals, respectively. $\boldsymbol{\zeta}_i$ contains the level-2 error terms, and their covariance matrix, $\boldsymbol{\Psi}$, provides the variances of the intercepts and slopes, $\eta_0$ and $\eta_1$, and their covariance. The variances of the intercepts and slopes are therefore obtained by estimating the $\boldsymbol{\Psi}$ matrix.
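The matrix form of the model implies a mean vector and a covariance matrix for the observed repeated measures, and estimation amounts to matching these to the sample moments. The Python sketch below computes the implied moments, $\boldsymbol{\mu} = \boldsymbol{\Lambda}\boldsymbol{\alpha}$ and $\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta}_\epsilon$, where $\boldsymbol{\Theta}_\epsilon$ (the covariance matrix of the residuals) is notation introduced here for illustration. The parameter values are hypothetical, the helper names are illustrative, and plain lists are used to keep the sketch dependency-free.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def implied_moments(lam, alpha, psi, theta):
    """Model-implied means and covariances of a latent growth model.

    mu    = Lambda alpha                  (measurement intercepts fixed at zero)
    Sigma = Lambda Psi Lambda' + Theta    (Theta: covariance matrix of epsilon)
    """
    mu = [row[0] for row in matmul(lam, [[a] for a in alpha])]
    core = matmul(matmul(lam, psi), transpose(lam))
    sigma = [[core[i][j] + theta[i][j] for j in range(len(core))] for i in range(len(core))]
    return mu, sigma


# Hypothetical parameter values, chosen only for illustration.
lam = [[1.0, float(t)] for t in range(4)]          # occasions coded 0..3
alpha = [2.0, 0.5]                                 # mean intercept, mean slope
psi = [[1.0, 0.2], [0.2, 0.3]]                     # Var/Cov of zeta_0 and zeta_1
theta = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
mu, sigma = implied_moments(lam, alpha, psi, theta)
print(mu)  # [2.0, 2.5, 3.0, 3.5]
```

The implied means trace the average trajectory, $\alpha_0 + \lambda_{t1}\alpha_1$, and the implied variances grow over time whenever the slope variance is positive.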

Although the formulae will not be presented here, it is possible within the latent growth curve framework to use multiple indicators of each construct at each time point (e.g., McArdle, 1988; Duncan, Duncan, Strycker, Li, & Alpert, 1999). In this formulation, latent variables at each time point are then used as indicators of second-order latent intercept and slope variables. The

resulting model accounts for residual variation at each time point that is not accounted for by the

growth parameters and measurement error. In addition, correlated error structures across time

points are possible.

Multilevel Regression Models

Multilevel regression models (or hierarchical linear models) estimate predictive

relationships when the data are nested or hierarchically structured, as in the case of students

nested within schools. The statistical model used for hierarchically structured data is the same

statistical model used for longitudinal analysis of individual growth curves. With growth curve

models, longitudinal data measurements are considered to be nested within individuals. In

general, a multilevel regression with a single level-1 predictor and no level-2 predictors can be

written with two sets of equations:

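Level-1 equation:

$$y_{ig} = \beta_{0g} + \beta_{1g}x_{ig} + r_{ig} \tag{1.6}$$

Level-2 equations:

$$\beta_{0g} = \gamma_{00} + u_{0g} \tag{1.7}$$

$$\beta_{1g} = \gamma_{10} + u_{1g} \tag{1.8}$$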

Equation (1.6) is the familiar regression equation, with $r_{ig}$ representing error or unexplained variance. The subscripts i and g indicate whether a value is for each individual or each group. In equation (1.7), the intercept values for each group serve as the dependent variable. For simplicity's sake, there are no predictors in equation (1.7) or (1.8). In (1.7), $\gamma_{00}$ is the intercept (the mean of all group intercepts), and $u_{0g}$ is the error or remaining variance. The variance of $u_{0g}$ across the groups gives an estimate of the variability of the intercept values, $\beta_{0g}$, across groups. Because the intercepts represent adjusted means for each group (i.e., adjusting or controlling for the effects of $x_{ig}$), $\mathrm{Var}(u_{0g})$ is the variance of the adjusted means for each group. In MLR texts, $\mathrm{Var}(u_{0g})$ is typically referred to as $\tau_{00}$.

In the third equation, (1.8), the slope estimates, $\beta_{1g}$, obtained from the level-1 regression model for each group serve as values of the dependent variable. $\gamma_{10}$ is the intercept in this equation and represents the average of all slopes, $\beta_{1g}$, interpreted as the average effect of $x_{ig}$ on the dependent variable across all groups. $u_{1g}$ is the error term, and its variance, $\mathrm{Var}(u_{1g})$, represents the variability of the slopes across groups (i.e., the variability in the relationship between x and y across the groups). $\mathrm{Var}(u_{1g})$ is customarily referred to as $\tau_{11}$. One can also examine the covariance (or correlation) between the slopes and intercepts across groups, $\mathrm{Cov}(u_{0g}, u_{1g}) = \tau_{01}$.

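By substituting equations (1.7) and (1.8) into equation (1.6), the MLR model can be expressed as a single regression equation,

$$y_{ig} = \gamma_{00} + u_{0g} + (\gamma_{10} + u_{1g})x_{ig} + r_{ig} \tag{1.9}$$

or, by rearranging the terms,

$$y_{ig} = \gamma_{00} + \gamma_{10}x_{ig} + u_{0g} + u_{1g}x_{ig} + r_{ig} \tag{1.10}$$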

If growth curve models are tested, the level-1 x-variable is replaced by time codes, $x_t$ (e.g., 0, 1, 2, 3, ..., t-1). The dependent variable at each time point is regressed on the time code at level 1. Instead of individuals nested within groups, repeated measures are nested within individuals; in other words, level 2 consists of individuals rather than groups. In growth models, $\mathrm{Var}(u_{0i})$ (i.e., $\tau_{00}$) represents the variability of the initial or baseline value of y across individuals, $\mathrm{Var}(u_{1i})$, or $\tau_{11}$, is the variation of the growth across individuals, and $\mathrm{Cov}(u_{0i}, u_{1i})$, or $\tau_{01}$, is the covariation of the initial value and the growth in y across individuals.

Comparing the SEM and MLR growth models

With a single measure at each time point, the two approaches to growth models are essentially

identical. The parallels between the SEM and MLR approaches can be seen by comparing their

algebraic formulas (see Table 1).

These formulas are fully equivalent, although this may not be apparent at first glance. In SEM, fixed loadings take the place of the level-1 predictor values, and latent variables take the place of the level-1 regression coefficients. In equation (1.1), the level-1 intercept is represented by the product term $\lambda_{t0}\eta_{0i}$, which combines the loadings for the latent intercept variable and the intercept variable itself. The $\boldsymbol{\Lambda}$ matrix is analogous to the X matrix in matrix regression, in which the first column is a vector of 1s used to produce the intercept. By setting the loadings for the intercept ($\eta_{0i}$) to 1, the product $\lambda_{t0}\eta_{0i}$ in (1.1) is simply equivalent to the intercept term, $\beta_{0g}$, of equation (1.6). The next term in equation (1.1), $\lambda_{t1}\eta_{1i}$, representing the slope factor, can also be considered identical to the slope term in equation (1.6) as long as the loadings in the $\boldsymbol{\Lambda}$ matrix are set to values that would be used as predictors in growth curve analysis, such as 0, 1, 2, 3, ..., t-1. Here, $\lambda_{t1}$ in (1.1) is equivalent to the time code $x_t$ that replaces $x_{ig}$ in (1.6). Because the $\lambda$s are equivalent to the xs and the $\eta$s are equivalent to the $\beta$s, it would make more sense to re-express equation (1.1) as

$$y_{ti} = \eta_{0i}\lambda_{t0} + \eta_{1i}\lambda_{t1} + \epsilon_{ti} \tag{1.11}$$

By estimating the means and variances of $\eta_{0i}$ and $\eta_{1i}$, we can obtain estimates of the average latent intercept and average latent slope and the extent to which they vary across individuals.
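The equivalence can also be checked numerically. Under the combined MLR growth equation, the covariance of two repeated measures is $\tau_{00} + \tau_{01}(x_t + x_s) + \tau_{11}x_t x_s$, plus $\sigma^2$ on the diagonal. This is exactly what $\boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}'$ plus a diagonal residual matrix produces when the slope loadings equal the time codes and $\boldsymbol{\Psi}$ holds $\tau_{00}$, $\tau_{01}$, and $\tau_{11}$ (likewise, the means match with $\gamma_{00} = \alpha_0$ and $\gamma_{10} = \alpha_1$). The Python sketch below uses hypothetical values and illustrative function names, and it assumes a homogeneous residual variance across occasions, the usual MLR default.

```python
def mlr_covariance(codes, tau00, tau01, tau11, sigma2):
    """Cov(y_t, y_s) implied by the combined MLR growth equation
    y_t = g00 + g10*x_t + u0 + u1*x_t + r_t:
    tau00 + tau01*(x_t + x_s) + tau11*x_t*x_s, plus sigma2 when t == s."""
    n = len(codes)
    return [[tau00 + tau01 * (codes[t] + codes[s]) + tau11 * codes[t] * codes[s]
             + (sigma2 if t == s else 0.0) for s in range(n)] for t in range(n)]


def lgc_covariance(codes, psi00, psi01, psi11, theta):
    """Lambda Psi Lambda' + theta * I, with Lambda rows [1, x_t]."""
    lam = [[1.0, x] for x in codes]
    psi = [[psi00, psi01], [psi01, psi11]]
    n = len(codes)
    return [[sum(lam[t][a] * psi[a][b] * lam[s][b] for a in range(2) for b in range(2))
             + (theta if t == s else 0.0) for s in range(n)] for t in range(n)]


codes = [0.0, 1.0, 2.0, 3.0]
# Hypothetical variance components; the same numbers are passed to both forms.
mlr = mlr_covariance(codes, tau00=1.0, tau01=0.2, tau11=0.3, sigma2=0.5)
lgc = lgc_covariance(codes, psi00=1.0, psi01=0.2, psi11=0.3, theta=0.5)
gap = max(abs(mlr[t][s] - lgc[t][s]) for t in range(4) for s in range(4))
print(gap < 1e-12)  # True: both parameterizations imply the same covariances
```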


Because growth models and two-level hierarchical regression models are identical

statistical models, as illustrated above, it is possible to specify a multilevel SEM for certain

hierarchical data situations using the same model specifications as those used in latent growth

models. I start by describing the data requirements necessary with dyadic data (e.g., couples,

twins, mother-child dyads). I then describe two model specification options, a single-indicator

model and a second-order multiple indicator model, giving an example of each. The second-

order multiple indicator model is then used in an example illustrating the use of a level-2 (dyad-

level) predictor.

Data characteristics

At minimum, a single dependent measure obtained from each member of the dyad is needed.² Multiple indicators for each individual can also be used, with a minimum of three indicators for each member of the dyad. Dyad members will be assumed to be non-exchangeable; that is, there is a basis for distinguishing the two members of each dyad in the same way in all dyads.

Examples might include husbands and wives, mother and child, first born and second born, or

caregiver and care recipient. The data set should be configured in the so-called "repeated

measures" format, in which each case in the data matrix contains information about both

members of the dyad. For example, each record contains information about the husband and the

wife, recorded under different variable names (e.g., y1h, y2h, y3h, y4w, y5w, y6w). This

configuration is analogous to that used for latent growth curve analysis.
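If the data arrive with one record per person, they can be reshaped into the one-record-per-dyad layout described above. The Python sketch below assumes a hypothetical input of (dyad id, role, item responses) tuples and a hypothetical naming scheme that appends the role code to the item name (e.g., y1h, y1w). It rejects unexpected role codes, because the specification relies on members being distinguishable in the same way in every dyad.

```python
from collections import defaultdict


def to_dyad_records(rows, roles=("h", "w")):
    """Reshape person-level rows into one record per dyad.

    rows: iterable of (dyad_id, role, {item: value}) tuples (hypothetical layout).
    Returns {dyad_id: {item + role: value}}, e.g. keys "y1h" ... "y3w".
    """
    wide = defaultdict(dict)
    for dyad_id, role, items in rows:
        # Non-exchangeable dyads: every member must carry one of the agreed roles.
        if role not in roles:
            raise ValueError(f"unexpected role {role!r} in dyad {dyad_id}")
        for item, value in items.items():
            wide[dyad_id][f"{item}{role}"] = value
    return dict(wide)


people = [
    (1, "h", {"y1": 4, "y2": 5, "y3": 3}),
    (1, "w", {"y1": 5, "y2": 5, "y3": 4}),
    (2, "h", {"y1": 2, "y2": 3, "y3": 3}),
    (2, "w", {"y1": 3, "y2": 4, "y3": 2}),
]
records = to_dyad_records(people)
print(records[1])
```

A dyad with a missing member simply ends up with fewer keys, which the complete-data assumption stated in the missing-data footnote rules out.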

Example data set

To illustrate, I will use an example from a study I conducted recently examining

interactions between spousal caregivers and care recipients. There are 116 couples (232

individuals), in which each member of the couple was interviewed separately. I examine five

items from the Veit and Ware (1983) positive affect subscale of the Mental Health Inventory.

Items such as "How much of the time have you felt the future look hopeful and promising?" were

rated on a 6-point scale of frequency of occurrence. Thus, the analysis is based on 10 variables: 5 items for caregivers and 5 items for care recipients. In the single-indicator model, illustrated

first, the five items are averaged for caregivers and for care recipients.

Single-indicator models

Intercept-only model. I first take the simplest case in which there is only one measure (i.e.,

indicator) of positive affect for caregivers and for care recipients (the measure was computed by

averaging the five items for each). This model specification follows that of the growth curve model described above for the case in which only two time points are tested, and it is depicted in Figure 2. The basic model, which I will call the intercept-only model, includes no level-1 or level-2 predictors. At level 2, there is only one equation, because no slope is obtained from level 1. Using the multilevel regression notation, this is the model given by the following separate equations:

Level-1 equation:

$$y_{ig} = \beta_{0g} + r_{ig}$$

Level-2 equation:

$$\beta_{0g} = \gamma_{00} + u_{0g}$$

This model can be shown to be equal to a random-effects ANOVA (Raudenbush, 1993), in which $\beta_{0g}$ is the mean score for each dyad and the estimate of the variance of the residual, $r_{ig}$, is the within-group variation, usually designated by $\sigma^2$. $\gamma_{00}$, the level-2 intercept, represents the average of the dyad means, and the estimate of the variance of $u_{0g}$, known as $\tau_{00}$, is the between-group variation of the means.
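Because the intercept-only model is a random-effects ANOVA, its two variance components can be approximated by hand. The Python sketch below uses the classical method-of-moments (ANOVA) estimators for balanced groups, $\hat\sigma^2 = MS_W$ and $\hat\tau_{00} = (MS_B - MS_W)/n$ with n = 2 members per dyad. This is a swapped-in technique for illustration, not the maximum likelihood estimation used by HLM or Mplus, so its results will generally differ somewhat; the function name and scores are made up.

```python
def anova_variance_components(dyads):
    """Method-of-moments (ANOVA) estimates for balanced groups such as dyads.

    dyads: list of equal-length tuples of scores, one tuple per dyad.
    Returns (within, between): within estimates sigma^2 and between estimates
    tau_00, truncated at zero when MS_between < MS_within.
    """
    g = len(dyads)
    n = len(dyads[0])
    means = [sum(d) / n for d in dyads]
    grand = sum(means) / g
    ss_within = sum((y - m) ** 2 for d, m in zip(dyads, means) for y in d)
    ms_within = ss_within / (g * (n - 1))
    ms_between = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    between = max((ms_between - ms_within) / n, 0.0)
    return ms_within, between


# Made-up positive affect scores (member 1, member 2) for three dyads.
within, between = anova_variance_components([(4, 6), (2, 2), (3, 5)])
print(round(within, 3), round(between, 3))  # 1.333 1.667
```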

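Using the equivalent structural modeling notation, these equations would be:

Level-1 equation:

$$y_{ig} = \lambda_{i0}\eta_{0g} + \epsilon_{ig}, \qquad \lambda_{i0} = 1$$

Level-2 equation:

$$\eta_{0g} = \alpha_0 + \zeta_{0g}$$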

One can calculate the ratio of between-dyad variation to total variation using the intraclass correlation coefficient, which indicates the degree to which dyad members have similar scores on the dependent variable. In HLM notation, the intraclass correlation coefficient is expressed as the ratio of between-dyad variation to the sum of within- and between-dyad variation:

$$\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2}$$

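Using the SEM approach, the measurement residual variance represents the within-dyad variation and the variance of the latent intercept, $\eta_0$, represents the between-group variation. So, the intraclass correlation coefficient is given by:

$$\rho = \frac{\psi_{00}}{\psi_{00} + \theta_{\epsilon}}$$

where $\psi_{00}$ represents the variance of $\eta_0$ and $\theta_{\epsilon}$ represents the (common) measurement residual variance of the two dyad members.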

Results for the intercept-only model. Parallel analyses were conducted using Mplus

(Version 2, Muthen & Muthen, 2001) and HLM 5 (Raudenbush, Bryk, Cheong, & Congdon,

2000). The results reported in Table 2 indicate nearly identical findings with the two statistical

packages. The average positive affect score for dyads was 4.189. The between-dyad variance, given by the random effect for the intercept in HLM and by the variance of the intercept variable in Mplus, was approximately .28 and was significantly different from zero in both analyses. A significant variance indicates that the average positive affect score for each couple varies across the dyads. The within-dyad variance was .570, indicating greater variation within dyads than between dyads. The intraclass correlation coefficient was approximately .33, indicating that couples had moderately similar positive affect scores. This association between caregiver and care recipient positive affect indicates that an OLS regression treating the individuals as independent cases would not be appropriate and would underestimate the standard errors in statistical tests.
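As an arithmetic check, the reported intraclass correlation follows directly from the two variance estimates. The helper name below is illustrative, and the inputs are the rounded values given in the text (a between-dyad variance of about .28 and a within-dyad variance of .570), so the result matches the reported .33 only up to rounding.

```python
def intraclass_correlation(between, within):
    """Between-dyad share of total variance: tau_00 / (tau_00 + sigma^2) in HLM
    notation, or psi_00 / (psi_00 + theta) in the SEM specification."""
    return between / (between + within)


# Rounded variance estimates reported above for the intercept-only model.
icc = intraclass_correlation(between=0.28, within=0.570)
print(round(icc, 2))  # 0.33
```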

Difference model. A predictor at level 1 can be added by incorporating a slope variable to represent the difference between dyad members (e.g., gender). In the context of the care recipient study, a variable designating whether an individual was a caregiver or care recipient was used. A mean slope significantly different from zero would indicate a significant difference between caregivers and care recipients on positive affect. The difference model is depicted in Figure 3. Two latent variables are defined: a latent intercept, $\eta_0$, and a latent slope, $\eta_1$. There is only one indicator for each member of the dyad, and both latent variables are defined by the same two indicators. The intercept variable, $\eta_0$, is defined by fixing loadings on each of the two indicators to 1. The slope variable, $\eta_1$, is defined by fixing loadings on the same two indicators to 0 and 1. The average intercept and average slope across couples (i.e., $\gamma_{00}$ and $\gamma_{10}$ in MLR notation) are obtained by estimating the latent means, which most SEM software packages do not estimate by default. The variances of the intercepts and slopes across couples are given by the variances of $\eta_0$ and $\eta_1$ (i.e., the $\boldsymbol{\Psi}$ matrix).

Results for the difference model. As above, parallel analyses were conducted using Mplus and

HLM 5. With SEM or MLR, it is not possible to estimate variances (i.e., random effects) for the slopes and intercepts simultaneously with dyadic data.³ One can obtain variance estimates by running separate models that fix the random effect (variance) of the intercept, the slope, or the correlation between them to a constant (most typically, zero). Separate analyses suggested that the slope variance was not significantly different from zero; therefore, the variance of the slope (and the covariance between the slope and intercept) was fixed to zero.
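The MLR counting argument behind this limitation, given in footnote 3, can be made concrete. The Python sketch below (function names are illustrative) compares the number of unique elements in a dyad's 2 x 2 level-1 covariance matrix, $n_j(n_j+1)/2$, with the number of random parameters, $(q+1)(q+2)/2 + 1$, needed when the intercept and all q level-1 slopes are random. Passing this count is a necessary condition only; it does not by itself guarantee identification.

```python
def unique_covariance_elements(n_j):
    """Distinct variances and covariances among n_j level-1 units in a group."""
    return n_j * (n_j + 1) // 2


def random_parameters_required(q):
    """Random intercept, q random slopes, all their covariances, plus sigma^2."""
    return (q + 1) * (q + 2) // 2 + 1


for q in (0, 1):
    need, have = random_parameters_required(q), unique_covariance_elements(2)
    # need <= have is necessary (not sufficient) for identification.
    status = "count permits estimation" if need <= have else "too many parameters"
    print(f"q={q}: {need} parameters vs {have} elements -> {status}")
```

For dyads, the random-intercept model (q = 0) needs 2 parameters, whereas adding a random slope (q = 1) needs 4, one more than the 3 available elements.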

Analyses are presented for dummy coding of the caregiving variable (0 and 1) and for

group-mean centering (-.5 and +.5). When dummy coding is used, the intercept represents the

average score for caregivers (because they were coded as 0). When group-mean centering is used, the average intercept represents the grand mean for all couples (caregivers and care recipients combined). In multilevel regression, group-mean centering is achieved by subtracting the dyad mean of the predictor from each individual's value (this can be done automatically in the HLM software). To obtain the group-mean centered solution using the SEM approach, however, one simply sets the loadings of the slope variable to -.5 and +.5, rather than 0 and 1. For the single

indicator model with no level-2 predictors, the overall model fit cannot be evaluated because the

model is just identified. However, slope and intercept estimates, their variance estimates, and

their significance tests are available.

As can be seen in Table 3, the means for the intercept and slope variables and their

standard errors are identical in the SEM method and the MLR method. The average intercept in

the dummy coded example of 4.222 represents the average positive affect for caregivers, and the

average intercept in the centered example of 4.189 represents the average positive affect for

caregivers and recipients combined. Notice that the average intercept obtained with centering

differs little from that obtained with dummy coding. This will not always be true, but, in this

case, it is because there is very little difference between caregivers and care recipients on

positive affect scores. The average slope of -.067, which is not significantly different from zero,

represents the difference between caregivers and care recipients on positive affect.
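The two codings describe the same model, reparameterized. The Python sketch below (illustrative names, using the rounded estimates reported above) recovers the implied caregiver and care recipient means from the dummy-coded solution and shows that the centered intercept equals the dummy-coded intercept plus half the slope, about 4.1885, which agrees with the reported 4.189 up to rounding.

```python
def member_means(alpha0, alpha1, slope_loadings):
    """Implied mean of each dyad member: alpha0 + (slope loading) * alpha1."""
    return [alpha0 + lam * alpha1 for lam in slope_loadings]


# Dummy-coded estimates reported in the text: caregivers coded 0, recipients 1.
alpha0_dummy, alpha1 = 4.222, -0.067
caregiver, recipient = member_means(alpha0_dummy, alpha1, (0.0, 1.0))

# Centered coding (-.5, +.5) keeps the slope and moves the intercept to the
# midpoint of the two member means.
alpha0_centered = alpha0_dummy + 0.5 * alpha1
centered = member_means(alpha0_centered, alpha1, (-0.5, 0.5))
print(caregiver, recipient, alpha0_centered)
```

Because both codings imply the same member means, model fit is unchanged; only the meaning of the intercept differs.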

Second-order factor model

The above approach may be useful if only one item or measure is available for each

individual, but the single-indicator model assumes no measurement error. A second-order factor

model approach, similar to the second-order growth curve approach described by McArdle

(1988) and referred to as the curve-of-factor-scores (CUFFS) model, is also possible. In this model, parallel first-order latent variables are defined for each member of the dyad, each measured by multiple indicators. To define the intercept and slope, fixed paths are set leading from two second-order factors (which, for continuity, I will continue to refer to as $\eta_0$ and $\eta_1$, respectively) to the two first-order factors (referred to here as $\eta_{cg}$ and $\eta_{cr}$). Both second-order loadings are set to 1 for the intercept factor, $\eta_0$. As with the single-indicator model, a choice of uncentered dummy coding (0, 1) or group-mean centered coding (-.5, +.5) for the slope, $\eta_1$, provides different interpretations for the intercept. With dummy coding, the intercept represents the mean of the dyad members who are assigned 0 on the slope variable. With centering, the intercept represents the average of the means of both dyad members. Figure 4 illustrates the model using dummy coding.
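A compact way to read the specification just described is to write out its two layers of equations. The following is a sketch consistent with the description and with dummy coding, not formulae given in the original; the indicator index j, the member subscript m, the dyad subscript g, and fixing the first loading to 1 for scaling are assumptions made here for illustration.

$$
\begin{aligned}
y_{jmg} &= \lambda_j \eta_{mg} + \epsilon_{jmg}, \qquad j = 1, \dots, 5;\; m \in \{cg, cr\};\; \lambda_1 = 1 \\
\eta_{cg,g} &= 1 \cdot \eta_{0g} + 0 \cdot \eta_{1g} + \zeta_{cg,g} \\
\eta_{cr,g} &= 1 \cdot \eta_{0g} + 1 \cdot \eta_{1g} + \zeta_{cr,g} \\
\eta_{0g} &= \alpha_0 + \zeta_{0g}, \qquad \eta_{1g} = \alpha_1 + \zeta_{1g}
\end{aligned}
$$

The loadings $\lambda_j$ carry no member subscript because parallel items are constrained to have equal loadings for both members, the indicator intercepts are fixed at zero, and the constraint $\mathrm{Var}(\zeta_{cg,g}) = \mathrm{Var}(\zeta_{cr,g})$ supplies a single within-dyad variance. Replacing the slope loadings 0 and 1 with -.5 and +.5 gives the centered version.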

1 A number of other statistical approaches have been developed to analyze dyadic data (e.g.,

Cook, 1994; Kenny & Cook, 1999). I focus only on the multilevel regression and multilevel

structural modeling approaches as alternatives, however, because they are designed to address

particular research questions which conceptualize the data as hierarchically structured.

2 It is theoretically possible to impute missing data using the expectation maximization (EM)

algorithm (e.g., Little & Rubin, 1987) or estimate models with missing data using Full

Information Maximum Likelihood (FIML; e.g., Arbuckle, 1996) provided data are missing at

random, but I will begin with the assumption that there are complete data for all dyads. I will

return to the issue of missing data below.

3 The reason it is not possible to estimate both variances (or "random effects") differs for MLR and SEM. In MLR, such a model is not identified, because the number of random parameters that can be estimated is limited by the number of unique variance-covariance matrix elements within each level-2 unit (Snijders & Bosker, 1999). For a model with a random intercept and a random slope for a level-1 predictor x, the variance-covariance matrix among the two level-1 units in a dyad is given by:

$$\begin{pmatrix} \tau_{00} + 2\tau_{01}x_1 + \tau_{11}x_1^2 + \sigma^2 & \tau_{00} + \tau_{01}(x_1 + x_2) + \tau_{11}x_1x_2 \\ \tau_{00} + \tau_{01}(x_1 + x_2) + \tau_{11}x_1x_2 & \tau_{00} + 2\tau_{01}x_2 + \tau_{11}x_2^2 + \sigma^2 \end{pmatrix}$$

where $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ are the variance estimates for the intercept and slope and the covariance between intercept and slope, respectively, $\sigma^2$ is the variance of the level-1 residual, and $x_1$ and $x_2$ are the values of the predictor variable for the two cases in a dyad (Goldstein, 1999). The level-1 variance-covariance matrix for dyadic data (i.e., 2 cases per group, or $n_j = 2$), given above, contains $n_j(n_j+1)/2$, or $2(2+1)/2 = 3$, unique elements. A model containing a single level-1 predictor with random effects estimated for the intercept and slope, such as that in equations (1.6) through (1.8), requires 4 parameters to be estimated: $\tau_{00}$, $\tau_{11}$, $\tau_{01}$, and $\sigma^2$. In general, $(q+1)(q+2)/2 + 1$ parameters are required (where q is the number of level-1 predictors) for a model with all possible random variances and covariances. With dyadic data, a model with a random intercept and slope therefore has one more parameter to be estimated than there are covariance elements available, leading to identification problems.

In SEM, the limitation is the degrees of freedom, based on the overall covariance matrix and the number of parameters estimated in the model. The number of covariance elements is $v(v+1)/2$, where v is the number of variables, and the degrees of freedom for a structural model are $v(v+1)/2 - p$, where p is the number of estimated parameters. For the present model, the number of available covariance elements is $2(2+1)/2 = 3$. If both the means and the variances of $\eta_0$ and $\eta_1$ are estimated, the model will have negative degrees of freedom. In SEM, including one or more additional variables in the model will lead to an overall model that is identified (although such models are not necessarily empirically identified), whereas the addition of variables in MLR will not produce an identified model without constraints on the random parameters.

Several details of the specification are important. First, it is important to investigate the measurement model and ensure that the fit of the first-order factors is adequate before proceeding. Second, the intercepts for the indicators in the first-order measurement equations, $\nu$, are set to zero. This is a common specification in most traditional structural models (Bollen, 1987), but some software packages estimate these intercepts by default when mean structures are requested. Third, loadings should be constrained to be equal for parallel items in the first-order factors. This constraint is an assumption of factorial invariance across members of the dyad, is similar to the latent growth curve assumption of longitudinal factorial invariance (Meredith & Tisak, 1982; Nesselroade, 1983), and can be tested using chi-square difference tests. If the constraints are not imposed and factor patterns are allowed to differ, the second-order factors may be confounded with dyad differences in measurement. Fourth, the disturbances for the first-order factors are set equal to one another, which provides a single estimate of the within-group variance. The intraclass correlation can be obtained from the estimate of the first-order disturbance variance and the estimate of the variance of $\eta_0$ in an intercept-only model, using the SEM intraclass correlation formula given earlier with the disturbance variance in place of the measurement residual variance.⁴

Fifth, the second-order factor model will be theoretically identified when three or more indicators are used. However, when the variance of the intercept or slope is near zero in the population, estimation difficulties may arise due to empirical underidentification. When a population value is near zero, the sample estimate may be negative due to sampling error. This leads to estimation difficulties in SEM, because the estimated covariance matrix of the latent variables is then singular or not positive definite. In larger samples, however, there is less risk of empirical underidentification, because the sample estimate will be nearer to the population value. Under these circumstances, the researcher may wish to set this variance to zero (or, rather, a very small positive value) in order to obtain a solution, provided there are no other difficulties with the model specification. Within the structural modeling literature, setting a variance or other parameter to a fixed value in order to obtain a solution is controversial, but setting random effects (i.e., variance estimates) to zero is fairly common practice within the MLR literature.


One important reason for the difference in these standards involves the interpretation of the variance estimate. In SEM, a latent variable is rarely expected to have zero variance in the population, so a zero or negative variance estimate typically indicates a problem with model specification. In multilevel analysis, however, a nonsignificant or near-zero variance simply means there is little difference among groups in the level-1 intercepts or slopes. For example, it is not unreasonable to assume that the difference in affect between caregivers and care recipients is of approximately equal magnitude in all dyads.
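A negative variance estimate means the matrix of latent variances and covariances is not positive definite, which is one concrete form the estimation difficulties described above take. The minimal NumPy sketch below checks positive definiteness through the eigenvalues of a 2 × 2 matrix for η0 and η1. The numeric values are hypothetical, chosen only to mimic a slope variance whose sample estimate falls just below zero; the replacement value of .0001 mirrors the fixed slope variance used in the difference model below.

```python
import numpy as np


def is_positive_definite(matrix: np.ndarray) -> bool:
    """True if a symmetric matrix has all strictly positive eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(matrix) > 0))


# Hypothetical latent covariance matrix for (eta0, eta1): intercept variance,
# slope variance, and their covariance (set to 0 here). The slope variance is
# a small negative estimate, as can happen when the population value is near zero.
psi_estimated = np.array([[0.30, 0.00],
                          [0.00, -0.02]])

# Same matrix with the slope variance fixed to a small positive value.
psi_fixed = psi_estimated.copy()
psi_fixed[1, 1] = 0.0001

print(is_positive_definite(psi_estimated))  # False
print(is_positive_definite(psi_fixed))      # True
```

Fixing the value restores positive definiteness, but it does so by assumption: the fixed model asserts that the slope barely varies across dyads, which is why the interpretation of near-zero variance discussed above matters.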

Example

Measurement model fit. A single-factor measurement model of positive affect with five indicators fit the data well. The fit was significantly improved by including a correlation between two measurement errors ("How much of the time has your daily life been full of things that were interesting to you?" and "How happy, satisfied, or pleased have you been with your personal life?") for both the caregiver and care recipient factors. These two items are the only items concerning daily-life satisfaction rather than hope for the future or a positive mood. The resulting two-factor model (one factor each for caregivers and care recipients) had a nonsignificant chi-square (χ²(36, N = 116) = 40.20, p = .29) and alternative fit indices (TLI = .990, RMSEA = .032) that indicated a good fit.
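The RMSEA values reported in this section can be reproduced from each model's chi-square, degrees of freedom, and sample size with the usual point-estimate formula, RMSEA = sqrt(max(χ² − df, 0) / (df × N)). This is a sketch under one stated assumption: programs differ on whether the denominator uses N or N − 1, and the N version happens to reproduce all four values reported here. The function name rmsea is illustrative.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation.
    Assumption: n in the denominator (some programs use n - 1 instead)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))


# (chi-square, df) for the models reported in this section, N = 116
models = {
    "measurement": (40.20, 36),
    "intercept-only": (67.03, 46),
    "difference": (65.08, 44),
    "level-2 predictor": (75.976, 54),
}

for name, (chi_sq, df) in models.items():
    print(f"{name}: RMSEA = {rmsea(chi_sq, df, 116):.3f}")
# prints 0.032, 0.063, 0.064, 0.059 in that order
```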

Intercept-only model. To obtain an estimate of the intraclass correlation with the second-order model, a single intercept factor was defined, with its loadings set to 1. This model also fit the data well, although the chi-square was significant (χ²(46, N = 116) = 67.03, p = .02, TLI = .960, RMSEA = .063). The estimated mean of η0 was 4.435, representing the average latent-variable affect score. The variance estimate for η0 was .311, representing variance between dyads. The within-group variance estimate is obtained from the disturbances of ηcg and ηcr (set equal) and was .478. Using these values, the estimate of the intraclass correlation is .311/(.311 + .478) = .394. Note that this value is somewhat larger than the estimate obtained with the single-indicator model (.33). The difference is due to attenuation from measurement error in the single-indicator model.
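The intraclass correlation is the between-dyad variance divided by the sum of the between- and within-dyad variances. The minimal sketch below reproduces the values in the text from the second-order estimates and from the SEM column of Table 2; the function name icc is illustrative. It also checks an identity for the single-indicator intercept-only model, stated here as an assumption about notation: with both loadings fixed to 1 and equal error variances, the implied covariance matrix is psi·11′ + theta·I, so the caregiver-recipient correlation it implies equals the ICC.

```python
import numpy as np


def icc(between_var: float, within_var: float) -> float:
    """Intraclass correlation: share of total variance lying between dyads."""
    return between_var / (between_var + within_var)


# Second-order model: variance of eta0 (.311) and equal first-order disturbances (.478)
print(round(icc(0.311, 0.478), 3))  # 0.394

# Single-indicator SEM estimates from Table 2: intercept variance .281, within-dyad .570
print(round(icc(0.281, 0.570), 2))  # 0.33

# The same value is the caregiver-recipient correlation implied by the
# intercept-only model, Sigma = psi * 1 1' + theta * I (both loadings fixed to 1).
psi, theta = 0.281, 0.570
ones = np.ones((2, 1))
sigma = psi * ones @ ones.T + theta * np.eye(2)
implied_r = sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])
print(round(implied_r, 2))  # 0.33
```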

Difference model. Next, a model estimating the slope with centered coding was tested. This model provides a test of whether, on average, there is a difference between caregivers' and care recipients' affect (i.e., the mean of the slope variable, η1) and whether this difference varies significantly across couples (i.e., the variance of η1). An initial test of the model in which both the intercept and slope variances were estimated resulted in estimation difficulties. Sensitivity tests suggested that this was a result of minimal variance in the slope across dyads (as also suggested by the single-indicator model), indicating similar caregiver-recipient differences in affect. Consequently, the model was estimated with the slope variance fixed to a near-zero value (.0001) to identify the model. The model fit the data well (χ²(44, N = 116) = 65.08, p = .02, TLI = .958, RMSEA = .064). The mean of η0 was significantly different from zero (4.434, p < .001), and the mean of η1 was nonsignificant (-.076, ns), indicating no overall difference between caregivers' and care recipients' positive affect.
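The choice between dummy and centered coding changes only what η0 means, not the partner means the model implies. Table 3 reports the single-indicator version of this model under both codings, and the minimal NumPy sketch below multiplies each fixed loading matrix by the corresponding intercept and slope means. It assumes caregivers are the member coded 0 (dummy) or -.5 (centered), consistent with the 0/1 coding described for Figure 3.

```python
import numpy as np

# Rows: caregiver, care recipient. Columns: eta0 (intercept), eta1 (slope).
# Assumption: the caregiver is the member coded 0 (dummy) or -.5 (centered).
LAMBDA_DUMMY = np.array([[1.0, 0.0],
                         [1.0, 1.0]])
LAMBDA_CENTERED = np.array([[1.0, -0.5],
                            [1.0,  0.5]])

# Single-indicator means from Table 3 (SEM column): intercept, slope
means_dummy = LAMBDA_DUMMY @ np.array([4.222, -0.067])
means_centered = LAMBDA_CENTERED @ np.array([4.189, -0.067])

print(means_dummy)     # approximately [4.222, 4.155]
print(means_centered)  # approximately [4.2225, 4.1555]
```

The two sets of implied partner means agree to within the rounding of the reported intercepts. Under dummy coding η0 is the caregiver mean; under centered coding it is the midpoint of the two partners. The slope, the recipient-minus-caregiver difference, is -.067 in both.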

Level-2 predictors.

Using either of the above specifications, it is possible to examine latent intercept and slope variables in more complex models, as either exogenous or endogenous variables. This is a distinct advantage over existing multilevel SEM capabilities, and the flexibility to use slopes and intercepts as predictors expands the types of hypotheses that can be tested. In MLR, level-2 predictors are often used to explain variability in intercepts or slopes. The test of whether a level-2 variable predicts the slope variable is referred to as a "cross-level interaction" (e.g., Kreft & de Leeuw, 1998), because a significant effect of the level-2 variable on the slope means that the effect of a level-1 predictor depends on the value of the level-2 predictor. In the SEM model, the level-1 equation is unchanged, but an additional variable, η2, and a path coefficient are needed in each of the level-2 equations. The SEM model thus parallels the MLR model in which a level-2 predictor, z, has been added.
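The two sets of equations can be sketched side by side in conventional notation. This sketch rests on assumed notation rather than reproducing the original equations: the MLR symbols (β, γ, u, r) and the level-1 predictor x_ij (the within-dyad member code, 0/1 or -.5/+.5) follow common two-level usage, λ_i stands for the corresponding fixed slope loading for member i, and β02 and β12 are hypothetical labels for the paths from η2 to η0 and η1.

```latex
\begin{aligned}
&\textbf{MLR} && \textbf{SEM} \\
&y_{ij} = \beta_{0j} + \beta_{1j}\,x_{ij} + r_{ij}
  && y_{ij} = \eta_{0j} + \lambda_{i}\,\eta_{1j} + \varepsilon_{ij} \\
&\beta_{0j} = \gamma_{00} + \gamma_{01}\,z_j + u_{0j}
  && \eta_{0j} = \alpha_0 + \beta_{02}\,\eta_{2j} + \zeta_{0j} \\
&\beta_{1j} = \gamma_{10} + \gamma_{11}\,z_j + u_{1j}
  && \eta_{1j} = \alpha_1 + \beta_{12}\,\eta_{2j} + \zeta_{1j}
\end{aligned}
```

Term by term, γ01 and γ11 correspond to the paths from η2, and the residual variances of u0j and u1j correspond to those of ζ0j and ζ1j. A cross-level interaction in the MLR model corresponds to a nonzero path from η2 to η1.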

To illustrate the use of a level-2 variable, a new model was tested that uses the number of relatives providing help as a predictor. Older couples in which one member has difficulties with daily activities (e.g., climbing stairs) often report negative reactions to outside help they receive from other family members, such as a daughter or son (Newsom, 1999). Although care recipients often report negative emotional reactions when they must receive help from a spouse or other family member, little is known about how couples react. Because the previous model constrained the variance of the slope to be near zero, the path predicting the slope variable is omitted from this model. In other models, however, the researcher seeking to explain variability in dyadic differences across couples may wish to predict the slope variable instead of, or in addition to, the intercept variable.

Fit indices suggested a good fit to the data, although the chi-square was significant (χ²(54, N = 116) = 75.976, p = .03, TLI = .957, RMSEA = .059). The number of family members or friends providing assistance in addition to the spouse was negatively associated with couples' positive affect (b = -.321, p < .01). This finding suggests that couples may perceive help from others as intrusive or interfering and may prefer to handle daily functioning tasks on their own (Newsom, 1999; Newsom & Schulz, 1998).

Generalization of the Approach

There are several possibilities for building upon the dyadic models discussed here. In addition to level-2 predictors, the dyadic model can be expanded to include level-1 predictors to explain within-dyad variability. If the second-order factor approach were used, level-1 predictors would involve paths to the first-order factors. Raudenbush (1993) has shown that intercept-only models are equivalent to the random-effects ANOVA model and that incorporating level-1 covariates in such models makes them equivalent to random-effects ANCOVA models. The use of latent variables would further extend the random-effects ANCOVA model to include latent variables. Applications of the dyadic model would also extend to traditional within-subjects ANOVA designs (Judd, Kenny, & McClelland, 2001), such as pretest-posttest designs or other within-subjects studies in which the researcher wishes to compare two different instruments assessing the same individual (Snijders & Bosker, 1999).

Another application of the dyadic model is the potential generalization of the approach to data involving larger groups (e.g., 3, 4, or 5 individuals per group), such as family units or small working groups. Such applications, however, would likely be limited to the nonexchangeable case, in which each group member has a role consistently present across groups. As the number of individuals per group increases, models would also become more complex. Of particular difficulty is the specification of the difference (slope) variable. A simple difference variable becomes more complicated to implement, because n_j − 1 dummy variables are needed to represent the comparisons (where n_j is the number of individuals per group). A nested test of the means for the full set of dummy variables would provide an omnibus test of differences among group members. Inclusion of dummy variables would not always be necessary, however. When differences among group members are not of interest, or the researcher is solely interested in examining or explaining within-group or between-group variability, intercept-only models could be used.
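For a group with n_j members in fixed roles, the loading matrix generalizes the dyadic one: the intercept factor has all loadings equal to 1, and each of the n_j − 1 dummy factors has a loading of 1 for one non-reference member and 0 elsewhere, so the mean of each dummy factor is that member's difference from the reference member. A minimal NumPy sketch, in which the function name dummy_loading_matrix and the choice of member 0 as the default reference are illustrative assumptions:

```python
import numpy as np


def dummy_loading_matrix(n_members: int, reference: int = 0) -> np.ndarray:
    """Fixed loadings for an intercept factor plus n_members - 1 dummy factors.

    Row i is group member i. Column 0 is the intercept (all loadings 1).
    Each remaining column is 1 for exactly one non-reference member and 0
    otherwise, so its factor mean is that member's difference from the reference.
    """
    if n_members < 2:
        raise ValueError("need at least two members per group")
    others = [m for m in range(n_members) if m != reference]
    lam = np.zeros((n_members, n_members))
    lam[:, 0] = 1.0
    for col, member in enumerate(others, start=1):
        lam[member, col] = 1.0
    return lam


print(dummy_loading_matrix(2))
# [[1. 0.]
#  [1. 1.]]
print(dummy_loading_matrix(3).shape)  # (3, 3): intercept plus two dummy factors
```

For a dyad the result is the 0/1 slope coding used for caregivers and care recipients. Comparing a model with all dummy-factor means fixed to zero against one in which they are free would give the nested omnibus test described above.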

Although the examples presented above are based on complete data for dyads, application of these models would not necessarily be restricted to complete data. MLR is commonly employed with unequal group sizes (i.e., unbalanced n), with a minimum of one case per group (Bryk & Raudenbush, 1992). Longford (1993) discusses the similarity between missing data estimation with the expectation maximization (EM) algorithm and random coefficient models with unbalanced n. The increasing availability and use of full information maximum likelihood (FIML) estimation for missing data in structural modeling software makes it possible to implement multilevel dyadic or small-group models with unbalanced data, and these models should produce solutions comparable to those obtained with MLR for unbalanced data.
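The casewise likelihood that FIML maximizes can be sketched for the intercept-only dyadic model: each dyad contributes the multivariate normal log-density of whichever partners' scores are observed, using only the matching elements of the model-implied mean vector and covariance matrix. This is a minimal NumPy sketch, not the implementation in any particular SEM program. The parameter values (mean, intercept variance psi, within-dyad variance theta) are borrowed from the SEM column of Table 2 purely for illustration, and the three data rows are hypothetical.

```python
import numpy as np


def casewise_loglik(y: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """FIML contribution of one case: multivariate normal log-density of the
    observed elements of y only (NaN marks a missing value), using the
    corresponding sub-vector of mu and sub-matrix of sigma."""
    obs = ~np.isnan(y)
    y_o, mu_o = y[obs], mu[obs]
    sigma_o = sigma[np.ix_(obs, obs)]
    k = y_o.size
    diff = y_o - mu_o
    _, logdet = np.linalg.slogdet(sigma_o)
    quad = diff @ np.linalg.solve(sigma_o, diff)
    return float(-0.5 * (k * np.log(2 * np.pi) + logdet + quad))


# Intercept-only dyadic model; illustrative values from Table 2 (SEM column)
mean, psi, theta = 4.189, 0.281, 0.570
mu = np.full(2, mean)
sigma = psi * np.ones((2, 2)) + theta * np.eye(2)

# Hypothetical data: (caregiver, care recipient) scores; NaN = partner missing
dyads = np.array([[4.6, 4.0],
                  [3.8, np.nan],
                  [np.nan, 4.4]])

total = sum(casewise_loglik(row, mu, sigma) for row in dyads)
print(round(total, 3))
```

An estimation routine would search for the mean, psi, and theta that maximize this total. Dyads with one missing partner still contribute information about the mean and the total variance, which is how unbalanced groups enter the likelihood.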

Summary and Conclusions

The SEM approach described above offers distinct advantages over existing multilevel

structural equation models applied to dyadic data. First, slopes and intercepts can be used as

predictors, outcomes, or mediators in models with any combination of level-1 (individual) or


level-2 (dyad) variables, greatly enhancing flexibility compared to current multilevel SEM

methods. Many psychological, sociological, and organizational theories posit reciprocal

causation between individual and group behavior, but the current multilevel SEM approach is

limited to examining between-group and within-group variables separately. Second, the dyadic

structural equation model is simple to implement in any available SEM software and requires no

special matrix preparation or software features.

The proposed approach also provides advantages over an MLR approach to dyadic data.

MLR models do not allow researchers to predict level-2 variables with level-1 variables.

Moreover, the SEM approach allows one to test model fit and compare nested models. The

approach also offers greater flexibility than MLR models, because correlated measurement errors

can be incorporated and error homogeneity can be tested. The ability to test for measurement

invariance across members of the dyad allows researchers to rule out differences due to

nonequivalent measurement. In addition, because the dyadic SEM approach is implemented

within the traditional SEM framework, other features of SEM, such as multigroup analysis or

growth curve analysis, can be integrated.

The proposed approach is, of course, not without limitations. The dyadic SEM model is restricted to data with nonexchangeable partners. Thus, it may not be possible to implement such models in research on gay couples, monozygotic twins, or student friendship interactions. The single-indicator model described above has some of the same limitations as MLR models: the intercept and slope variances cannot be estimated simultaneously, and measurement error is not estimated. In addition, as with the second-order factor model example presented here, smaller sample sizes may increase the likelihood of empirical identification problems when population variances are near zero. In general, the behavior of estimates and standard errors across various sample sizes and model specifications is as yet unknown, although, as shown here, the estimates are equivalent to those obtained with MLR. Despite these limitations, the dyadic structural equation model is a flexible approach that gives researchers a variety of model testing options previously unavailable.


References

Arbuckle, J.L. (1996). Full information estimation in the presence of incomplete data. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques. Mahwah, NJ: Lawrence Erlbaum Associates.

Barnett, R.C., Marshall, N.L., Raudenbush, S.W., & Brennan, R.T. (1993). Gender and the relationship between job experiences and psychological distress: A study of dual-earner couples. Journal of Personality and Social Psychology, 64, 794-806.

Bassiri, D. (1988). Large and small sample properties of maximum likelihood estimates

for the hierarchical linear model. Unpublished doctoral dissertation, Michigan State University.

Bryk, A.S., & Raudenbush, S.W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Cook, W.L. (1994). A structural equation model of dyadic relationships within the family

system. Journal of Consulting and Clinical Psychology, 62, 500-509.

Duncan, T.E., Duncan, S.C., Strycker, L.A., Li, F., & Alpert, A. (1999). An introduction

to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ:

Erlbaum.

Fleenor, J.W., McCauley, C.D., & Brutus, S. (1996). Self-other rating agreement and leader effectiveness. Leadership Quarterly, 7, 487-506.

Goldberg, A.I., Cohen, G., & Rubin, A.E. (1998). Physician assessments of patient compliance with medical treatment. Social Science & Medicine, 47, 1873-1876.

Goldstein, H. (1999). Multilevel statistical models (2nd ed.). London: Arnold. [On-line book]. Available: http://www.arnoldpublishers.com/support/goldstein.htm.

Judd, C.M., Kenny, D.A., & McClelland, G.H. (2001). Estimating and testing mediation and moderation in within-subject designs. Psychological Methods, 6, 115-134.

Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Thousand Oaks, CA: Sage.

Kendler, K.S., Karkowski, L.M., & Prescott, C.A. (1999). Causal relationship between stressful life events and the onset of major depression. American Journal of Psychiatry, 837-848.

Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. London: Sage.

Little, R.J.A., & Rubin, D.B. (1987). Statistical analysis with missing data. New York:

Wiley & Sons.

McArdle, J.J. (1988). Dynamic but structural equation modeling of repeated measures data. In J.R. Nesselroade & R.B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed.). New York: Plenum Press.

McArdle, J.J., & Hamagami, F. (1996). Multilevel models from a multiple group structural equation perspective. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 89-124). Mahwah, NJ: Erlbaum.

Meredith, W., & Tisak, J. (1982). Canonical analysis of longitudinal and repeated

measures data with stationary weights. Psychometrika, 47, 47-67.

Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.

Muthen, B. (1989). Latent variable modeling in heterogeneous populations.

Psychometrika, 54, 557-585.

Muthen, B. (1994). Multilevel covariance structure analysis. Sociological Methods &

Research, 22, 376-398.

Muthen, B. (1997a). Latent variable modeling with longitudinal and multilevel data. In A. Raftery (Ed.), Sociological methodology (pp. 453-480). Boston: Blackwell Publishers.

Muthen, B. (1997b). Latent variable growth modeling with multilevel data. In M. Berkane (Ed.), Latent variable modeling with applications to causality (pp. 149-161). New York: Springer Verlag.

Muthen, L.K., & Muthen, B.O. (2001). Mplus: The comprehensive modeling program

for applied researchers (Version 2 user’s guide). Los Angeles: Muthen & Muthen.

Muthen, B., & Satorra, A. (1989). Multilevel aspects of varying parameters in structural

models. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 87-99). San Diego, CA:

Academic.

Muthen, B., & Satorra, A. (1995). Complex sample data in structural equation modeling. In P. Marsden (Ed.), Sociological methodology 1995 (pp. 216-316).

Nesselroade, J.R. (1983). Temporal selection and factor invariance in the study of

development and change. Life-span development and behavior, 5, 59-87.

Newsom, J.T. (1999). Another side to caregiving: Negative reactions to being helped. Current Directions in Psychological Science, 8, 183-187.

Newsom, J.T., & Schulz, R. (1998). Caregiving from the recipient's perspective:

Negative reactions to being helped. Health Psychology, 17, 172-181.

Ozer, E.M., Barnett, R.C., Brennan, R.T., & Sperling, J. (1998). Does childcare involvement increase or decrease distress among dual-earner couples? Journal of Women's Health: Research on Gender, Behavior, and Policy, 4, 285-311.

Raudenbush, S.W. (1993). Hierarchical linear models and experimental design. In L.K.

Edwards (Ed.), Applied analysis of variance in behavioral science. New York: Marcel Dekker.


Raudenbush, S.W., Brennan, R.T., & Barnett, R.C. (1995). A multivariate hierarchical

model for studying psychological change within married couples. Journal of Family Psychology,

9, 161-174.

Segal, N.L., & Hershberger, S.L. (1999). Cooperation and competition between twins:

Findings from a prisoner's dilemma game. Evolution and Human Behavior, 20, 29-51.

Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect

correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381.


Table 1. A comparison of the SEM and MLR growth curve models.

SEM MLR

Level-1

Level-2

Table 2. Results for the intercept-only models.

                          MLR            SEM
Intercept mean (SE)       4.189 (.070)   4.189 (.070)
Intercept variance        .285           .281
Within-dyad variance      .570           .570

Table 3. Mean and variance estimates when the caregiving variable is uncentered/dummy coded (0, 1) and group-mean centered (-.5, +.5).

                 Means (SE)                      Variances (SE)
                 MLR            SEM              MLR          SEM
Dummy coded
  Intercept      4.222 (.082)   4.222 (.086)     .284 (NA)    .282 (.083)
  Slope          -.067 (.098)   -.067 (.099)     0            0
Centered
  Intercept      4.189 (.070)   4.189 (.070)     .284 (NA)    .282 (.083)
  Slope          -.067 (.099)   -.067 (.099)     0            0

NA = not available (in HLM, the standard errors of the random effects are not reported).

Figure 1. An example of a latent growth curve model with four time points.

[Figure: path diagram with observed variables yt1, yt2, yt3, and yt4 and latent factors η0 (Intercept) and η1 (Slope).]

Note: yt1-yt4 represent repeated measures of a single variable. Loadings are set to 1 for η0 and to 0, 1, 2, and 3 for η1.

Figure 2. Multilevel model for dyadic data: Intercept-only single-indicator model.

[Figure: path diagram in which y1cg and y2cr both load on η0 (Intercept), with loadings fixed to 1.]

Note: y1cg represents caregiver's positive affect, y2cr represents care recipient's positive affect. Error variances for y1cg and y2cr are set equal.

Figure 3. Multilevel model for dyadic data: The single-indicator difference model.

[Figure: path diagram in which y1cg and y2cr load on η0 (Intercept) and η1 (Slope).]

Note: y1cg represents caregiver's positive affect, y2cr represents care recipient's positive affect. Loadings for η1 are set to 0 for caregivers and 1 for care recipients to define η0 as the average score for caregivers taken across couples.

Figure 4. Multilevel model for dyadic data: Specification 3.

[Figure: second-order path diagram in which indicators y1cg-y5cg load on ηcg, indicators y6cr-y10cr load on ηcr, and ηcg and ηcr load on η0 (Intercept) and η1 (Slope).]

Note: y1cg-y5cg represent responses from caregivers, y6cr-y10cr represent responses from care recipients. ηcg is a latent variable for caregivers, and ηcr is a latent variable for care recipients. η0 and η1 represent the intercept and the slope, respectively. The label a indicates disturbances that are set to be equal.


Footnotes

4 In the difference model that includes η1, an intraclass correlation can also be computed, but it is considered an "adjusted" intraclass correlation.
