Dyadic Data 1
A Multilevel Structural Equation Model for Dyadic Data
Jason T. Newsom
Portland State University
RUNNING HEAD: Multilevel SEM for Dyadic Data
Submitted for publication.
Draft: 8/2/01
Address correspondence to Jason T. Newsom, Ph.D., Institute on Aging, School of Community
Health, Portland State University, P.O. Box 751, Portland, OR 97207-0751, email:
[email protected]. Partial support for preparation of this article was provided by AG 15159
and AG14130 from the National Institute on Aging. The author wishes to thank Joop Hox and
David Morgan for helpful discussions and Patrick Curran and Tor Neilands for comments on an
earlier draft.
Abstract
Dyadic data involving couples, twins, or parent-child pairs are common in the social sciences,
but available statistical approaches are limited in the types of hypotheses that can be tested with
dyadic data. A novel structural modeling approach, based on latent growth curve model
specifications, is proposed for use with dyadic data. The approach allows researchers to test
more sophisticated causal models, incorporate latent variables, and estimate more complex error
structures than is currently possible using hierarchical linear modeling or multilevel structural
equation models. A brief introduction to multilevel regression and latent growth curve models
is given, and the equivalence of the statistical model for nested and longitudinal data is
explained. Possible expansion of the strategy for application with small groups and with
unbalanced data is briefly discussed.
Social scientists commonly study dyadic data, as found in research on marital couples
(Raudenbush, Brennan, & Barnett, 1995), twins (Kendler, Karkowski, & Prescott, 1999),
employees and supervisors (Fleenor, McCauley, & Brutus, 1996), or doctors and patients
(Goldberg, Cohen, & Rubin, 1998). When data are collected from both individuals of the dyad,
scores from each individual in a dyad are usually dependent. Treating data from each member of
the dyad as independent observations leads to underestimated standard errors and, consequently,
to an inflated Type I error rate. In addition, many measures, such as household
income, are measured at the dyadic level rather than the individual level. Other variables,
although measured at the individual level, partially reflect experiences or circumstances that are
common to both members of the couple (e.g., marital quality) and thus contain individual-level
and group-level components. For these reasons, dyadic data can be considered
"hierarchically structured."
Multilevel regression (MLR; also known as hierarchical linear models or HLM; Bryk &
Raudenbush, 1992; Kreft & de Leeuw, 1998; Snijders & Bosker, 1999) and multilevel structural
equation models (multilevel SEM; Muthen, 1989, 1994; Muthen & Satorra, 1989) are two
approaches to analyzing hierarchically structured data in which individuals are nested within
groups.1 MLR can be conceptualized as a two-level regression model, with the level-1 model
represented by the usual regression equation generated for each group. The level-2 model
involves prediction of intercept and slope coefficients obtained from the level-1 models. These
models are now commonly used throughout the social sciences and have been applied to dyadic
data by a number of researchers (Barnett, Marshall, Raudenbush, & Brennan, 1993; Ozer,
Barnett, Brennan, & Sperling, 1998; Segal & Hershberger, 1999).
The multilevel SEM approach to hierarchically structured data is a recently developed
method (e.g., Muthen, 1989, 1997a; Muthen & Satorra, 1995; McArdle & Hamagami, 1996)
that can estimate measurement, path, or full structural models at the individual and group levels
(usually termed within and between levels, respectively). The approach involves
analyzing separate covariance matrices at the within (level 1) and between (level 2) levels. The
ability to estimate measurement error is an important advantage over regression-based methods,
but the approach is limited to separate models at the within and between levels, and hypotheses
involving causal relations between the two levels cannot be tested. Moreover, slope variability
across groups cannot be estimated with the technique (Kaplan, 2000). Multilevel SEM can also
be cumbersome to implement because separate within and between covariance matrices must be
estimated and read into an SEM software package. The within and between covariance matrices
are then tested using a multigroup structural model with special model specifications (Muthen,
1997a; McArdle & Hamagami, 1996). Although matrix estimation has been automated and
model estimation has been simplified with Mplus software (Muthen & Muthen, 2001), other
software packages require more laborious procedures to estimate multilevel models.
Growth curve analysis, another common application of MLR, is statistically equivalent to
MLR with hierarchically structured data, because repeated measures are nested within
individuals (Bryk & Raudenbush, 1992). Although latent growth curve analysis has an
analogous statistical relationship to multilevel SEM, the actual implementation or model
specification of the two types of models is quite different. The latent growth curve specification
does not have the same limitations as multilevel SEM in that there are no restrictions on
relationships between level-1 and level-2 variables and it is convenient to implement in any
software package.
The statistical equivalence of the multilevel and growth curve models allows for the
possibility of using a latent growth curve approach to model specification with certain types of
nested data. Based on this rationale, I describe below a novel approach to the analysis of dyadic
data that employs a latent growth curve model specification. The approach has advantages over
the MLR approach to dyadic data without the limitations of multilevel SEMs, because latent
variables can be used, more complex error structures can be estimated, more sophisticated causal
hypotheses can be tested, and model specification is convenient with any SEM package.
I begin by giving a brief overview of latent growth models and hierarchical linear
models. A general familiarity with structural equation models is assumed, but familiarity with
multilevel regression models is not assumed. After illustrating the equivalence of MLR growth
models and latent growth models, I describe a new approach to analysis of dyadic data using a
growth curve formulation. Finally, I discuss some possibilities for how this approach may be
expanded for use with small groups and in situations in which group sizes are not equal.
Latent Growth Curve Models
Both MLR and SEM can be used to analyze longitudinal data by estimating individual growth
curves. The statistical models for growth curve analysis and hierarchical analysis are identical,
because repeated measures can be considered to be nested within individuals (Bryk &
Raudenbush, 1992). Thus, growth curve models, in general, are a special case of the multilevel
model. The SEM approach, called latent growth curve modeling (McArdle, 1988; Meredith &
Tisak, 1990), has advantages compared to multilevel regression because of the ability to estimate
measurement error when multiple indicators are used and the ability to specify complex error
structures (Willett, 1994). Although multilevel structural models and latent growth
curve models are statistically identical, their implementation with structural modeling packages is quite
different. Implementation of the latent growth model does not require estimation of separate
covariance matrices, nor does it require a multigroup structural model. Latent growth models
estimate latent intercepts and slopes representing an individual's initial level and change in the
dependent variable over time. The variability in intercepts or slopes across individuals and the
factors that explain the variability are often of interest to researchers. Most importantly, latent
growth curve models afford considerable flexibility for researchers, because the slopes and
intercepts can be incorporated into more complex models as predictors or outcomes.
Latent growth curve models estimate individual growth curves or "trajectories" by using
repeated measures as indicators of two latent variables, an intercept variable ($\eta_0$) and a slope
variable ($\eta_1$). The interpretation of the intercept variable depends on how the loadings on the
slope factor are fixed. For instance, one approach to defining the slope variable is to fix loadings
to values 0, 1, 2, 3, 4, ..., t-1, sometimes referred to as "time codes." In this case, the intercept
latent variable represents the initial value, because the first loading on $\eta_1$ is set to 0. One can
also "center" these loadings by setting the middle time point to 0 (e.g., -3, -2, -1, 0, 1, 2, 3), giving
the intercept factor the average value of y at the middle time point, $t_0$. Figure 1 illustrates a
simple growth curve model with four time points.
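The two coding schemes described above can be sketched in a few lines of Python. This is only an illustration of how the fixed loadings are laid out; the function name `growth_loadings` and its `centered` flag are hypothetical and do not correspond to any SEM package.

```python
def growth_loadings(n_times, centered=False):
    """Return (intercept_loadings, slope_loadings) for a linear growth model.

    Hypothetical helper, for illustration only.
    """
    # Loadings on the intercept factor are all fixed to 1.
    intercept = [1.0] * n_times
    # Uncentered time codes: 0, 1, ..., t-1.
    slope = [float(t) for t in range(n_times)]
    if centered:
        # Shift the codes so that the middle of the time range is 0.
        middle = (n_times - 1) / 2
        slope = [code - middle for code in slope]
    return intercept, slope


ones, codes = growth_loadings(4)
_, centered_codes = growth_loadings(7, centered=True)
print(codes)           # [0.0, 1.0, 2.0, 3.0]
print(centered_codes)  # [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
```

With an even number of time points the centered codes fall on half units (e.g., -1.5, -.5, .5, 1.5), so the intercept then refers to the midpoint between the two middle occasions.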
Algebraically, the latent growth curve model is represented by the following formulas.
For simplicity, the Lisrel "all-y" notation is used throughout (Hayduk, 1987):
level-1 equation (measurement model):
$$y_{ti} = \lambda_{ti}\eta_{0i} + \lambda_{ti}\eta_{1i} + \varepsilon_{ti} \tag{1.1}$$
level-2 equations (structural model):
$$\eta_{0i} = \alpha_0 + \zeta_{0i} \tag{1.2}$$
$$\eta_{1i} = \alpha_1 + \zeta_{1i} \tag{1.3}$$
In equation (1.1), $y_{ti}$ is the dependent variable. The subscripts i and t indicate a measurement
within an individual, i, for each time point, t. $\eta_{0i}$ is a latent variable that represents the level-1
intercept, $\eta_{1i}$ is a latent variable that represents the relationship between the time code and the
dependent variable (i.e., the growth trajectory), and $\lambda_{ti}$ are the loadings for each time point on the
intercept latent variable ($\eta_0$), which are fixed to 1, and on the slope latent variable ($\eta_1$), which are
fixed to the time codes. The measurement intercept
associated with each loading ($\Lambda$ matrix) is assumed to be zero, as it is in most traditional structural
models and in latent growth curve models (e.g., Muthen, 1997b; Willett & Sayer, 1994), and
therefore is not shown above. To simplify, no level-2 predictors are presented in (1.2) or (1.3), but
predictors of the intercepts or slopes could be included. In the level-2 equations, $\alpha_0$ and $\alpha_1$ are
the intercepts, or the average values of $\eta_0$ and $\eta_1$ for all individuals. $\zeta_0$ and $\zeta_1$ are residuals. The
variances of the residuals are found in the $\Psi$ matrix.
More traditionally, the structural model is represented by grouping each variable into
matrices:
$$y = \Lambda\eta + \varepsilon \tag{1.4}$$
$$\eta = \alpha + \zeta \tag{1.5}$$
In equations (1.4) and (1.5), $\Lambda$ is a t X 2 matrix representing the relationship between the 2 latent
variables, $\eta_0$ and $\eta_1$, and the t indicators, one for each time point. The first column in $\Lambda$, which
corresponds to $\eta_0$, consists of all 1s, because each loading for this variable is set equal to 1
to define it as the intercept. $\varepsilon$ is the vector of residual errors for each indicator at each time
point. $\alpha$ is a 2 X 1 vector containing latent means for the intercept and slope, representing the
average intercept across individuals and the average slope (i.e., trajectory) across individuals,
respectively. $\zeta$ is a vector of error terms (i.e., the $\zeta_0$ and $\zeta_1$ elements), and its covariance
matrix, $\Psi$, provides information about the variances of the intercepts and slopes, $\eta_0$ and $\eta_1$, and
their covariance. The variances of the intercepts and slopes are obtained by estimating the $\Psi$ matrix.
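To make the roles of these matrices concrete, the sketch below computes the means and covariances of the repeated measures implied by a loading matrix, a vector of latent means, and a covariance matrix of the latent residuals. It relies on the standard SEM result that the implied mean vector is the loading matrix times the latent means and the implied covariance matrix is loadings times latent covariances times transposed loadings, plus residual variances; that result is not spelled out in the text. All numeric values are hypothetical, and a single residual variance shared by all time points is assumed for simplicity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


# Loading matrix for four time points: column 1 = intercept, column 2 = time codes.
Lambda = [[1, 0], [1, 1], [1, 2], [1, 3]]
alpha = [[4.0], [0.5]]          # hypothetical latent means (intercept, slope)
Psi = [[1.0, 0.1], [0.1, 0.2]]  # hypothetical latent variances and covariance
theta = 0.5                     # hypothetical residual variance, equal at all times

# Implied means of y: Lambda * alpha.
mu = [row[0] for row in matmul(Lambda, alpha)]

# Implied covariance of y: Lambda * Psi * Lambda' + theta * I.
Sigma = matmul(matmul(Lambda, Psi), transpose(Lambda))
for t in range(len(Sigma)):
    Sigma[t][t] += theta

print(mu)  # [4.0, 4.5, 5.0, 5.5]
```

The implied means rise by the average slope at each occasion, and the first diagonal element of `Sigma` is simply the intercept variance plus the residual variance, because the first time code is 0.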
Although the formulae will not be presented here, it is possible within the latent growth curve
framework to use multiple indicators of each construct at each time point (e.g., McArdle, 1988;
Duncan, Duncan, Strycker, Li, & Alpert, 1999). In this formulation, latent variables at each time
point are then used as indicators of second-order latent intercept and slope variables. The
resulting model accounts for residual variation at each time point that is not accounted for by the
growth parameters and measurement error. In addition, correlated error structures across time
points are possible.
Multilevel Regression Models
Multilevel regression models (or hierarchical linear models) estimate predictive
relationships when the data are nested or hierarchically structured, as in the case of students
nested within schools. The statistical model used for hierarchically structured data is the same
statistical model used for longitudinal analysis of individual growth curves. With growth curve
models, longitudinal data measurements are considered to be nested within individuals. In
general, a multilevel regression with a single level-1 predictor and no level-2 predictors can be
written with two sets of equations:
Level-1 equation:
$$y_{ig} = \beta_{0g} + \beta_{1g}x_{ig} + r_{ig} \tag{1.6}$$
Level-2 equations:
$$\beta_{0g} = \gamma_{00} + u_{0g} \tag{1.7}$$
$$\beta_{1g} = \gamma_{10} + u_{1g} \tag{1.8}$$
Equation (1.6) is the familiar regression equation, with r representing error or unexplained
variance. The subscripts i and g indicate whether the value is for each individual or each group.
In equation (1.7), the intercept values for each group serve as the dependent variable. For
simplicity's sake, there are no predictors in equation (1.7) or (1.8). In (1.7), $\gamma_{00}$ is the intercept (mean of all group
intercepts), and $u_{0g}$ is the error or remaining variance. The variance of $u_{0g}$ across the groups gives
an estimate of the variability of the intercept values, $\beta_{0g}$, across groups. Because the intercepts
represent adjusted means for each group (i.e., adjusting or controlling for the effects of $x_{ig}$),
$\mathrm{var}(u_{0g})$ is the variance of the adjusted means for each group. In MLR texts, $\mathrm{var}(u_{0g})$ is
typically referred to as $\tau_{00}$.
In the third equation, (1.8), the slope estimates, $\beta_{1g}$, obtained from the level-1 regression model for
each group serve as values of the dependent variable. $\gamma_{10}$ is the intercept in this equation, and
represents the average of all slopes, $\beta_{1g}$, interpreted as the average effect of $x_{ig}$ on the dependent
variable across all groups. $u_{1g}$ is the error term and its variance, $\mathrm{var}(u_{1g})$, represents the
variability of the slopes across groups (i.e., the variability in the relationship between x and y
across the groups). $\mathrm{var}(u_{1g})$ is customarily referred to as $\tau_{11}$. One can also examine the
covariances or the correlations between the slopes and intercepts as they covary across groups,
$\tau_{01}$.
By substituting the level-2 equations (1.7) and (1.8) into the level-1 equation (1.6), the MLR model can be expressed as a single
regression equation,
$$y_{ig} = (\gamma_{00} + u_{0g}) + (\gamma_{10} + u_{1g})x_{ig} + r_{ig} \tag{1.9}$$
or, by rearranging the terms,
$$y_{ig} = \gamma_{00} + \gamma_{10}x_{ig} + u_{0g} + u_{1g}x_{ig} + r_{ig} \tag{1.10}$$
If growth curve models are tested, the level-1 x-variable is replaced by time codes, $x_t$ (e.g.,
0, 1, 2, 3, ..., t-1). The dependent variable at each time point is regressed on the time code at
level 1. Instead of individuals nested within groups, repeated measures are nested within
individuals. In other words, level 2 consists of individuals rather than groups. In growth models,
$\mathrm{var}(u_0)$ (i.e., $\tau_{00}$) represents the variability of the initial or baseline value of y across
individuals, $\mathrm{var}(u_1)$, or $\tau_{11}$, is the variation of the growth across individuals, and $\mathrm{cov}(u_0, u_1)$, or
$\tau_{01}$, is the covariation of the initial value and the growth in y across individuals.
Comparing the SEM and MLR growth models
With a single measure at each time point, the two approaches to growth models are essentially
identical. The parallels between the SEM and MLR approaches can be seen by comparing their
algebraic formulas (see Table 1).
These formulas are fully equivalent, although this may not be apparent at first glance. In
SEM, loadings are used in place of level-1 regression coefficients. In the latent growth level-1 equation (1.1), the level-1
intercept is represented by the product term $\lambda_{ti}\eta_{0i}$, which refers to the loadings for the latent intercept
variable and the intercept variable itself. The $\Lambda$ matrix is analogous to the X matrix in matrix
regression, in which the first column is a vector of 1s used to produce the intercept. By setting
the loadings for the intercept ($\eta_{0i}$) to 1, the product of the loadings and the intercept ($\lambda_{ti}\eta_{0i}$) of (1.1) is
simply equivalent to the intercept term, $\beta_0$, of the MLR level-1 equation (1.6). The next term in equation (1.1), $\lambda_{ti}\eta_{1i}$,
representing the slope factor, can also be considered identical to the slope in equation (1.6) as long as
the loadings in the $\Lambda$ matrix are set to values that would be used as predictors in growth curve
analysis, such as 0, 1, 2, 3, ..., t-1. Here, $\lambda_{ti}$ in (1.1) is equivalent to the time code $x_t$ that replaces the predictor in (1.6). Because $\lambda$'s are equivalent
to x's and the $\eta$'s are equivalent to $\beta$'s, it would make more sense to re-express equation (1.1) as,
$$y_{ti} = \eta_{0i} + \eta_{1i}\lambda_{ti} + \varepsilon_{ti}$$
By estimating the means and variances of $\eta_{0i}$ and $\eta_{1i}$, we can obtain estimates of the average
latent intercept and average latent slope and the extent to which they vary across individuals.
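The equivalence argued above can be checked numerically for a single individual. The sketch below computes predicted scores once in the latent growth form (fixed loadings multiplied by the individual's intercept and slope values) and once in the regression form (intercept plus slope times the time code). The function names and the values 2.0 and 0.5 are hypothetical, chosen only for illustration.

```python
def sem_predicted(eta0, eta1, time_codes):
    """Predicted scores from the latent growth form: loading * factor."""
    intercept_loadings = [1.0] * len(time_codes)  # fixed to 1
    slope_loadings = list(time_codes)             # fixed to the time codes
    return [l0 * eta0 + l1 * eta1
            for l0, l1 in zip(intercept_loadings, slope_loadings)]


def mlr_predicted(beta0, beta1, time_codes):
    """Predicted scores from the regression form: intercept + slope * x."""
    return [beta0 + beta1 * x for x in time_codes]


codes = [0, 1, 2, 3]
# Hypothetical individual with an intercept of 2.0 and a slope of 0.5.
print(sem_predicted(2.0, 0.5, codes))  # [2.0, 2.5, 3.0, 3.5]
print(mlr_predicted(2.0, 0.5, codes))  # [2.0, 2.5, 3.0, 3.5]
```

The two functions differ only in which quantity is treated as the fixed multiplier, which is the point of the comparison: the loadings play the role of the x values and the latent variables play the role of the coefficients.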
A Multilevel Structural Equation Model for Dyadic Data
Because growth models and two-level hierarchical regression models are identical
statistical models, as illustrated above, it is possible to specify a multilevel SEM for certain
hierarchical data situations that use the same model specifications as those used in latent growth
models. I start by describing the data requirements necessary with dyadic data (e.g., couples,
twins, mother-child dyads). I then describe two model specification options, a single-indicator
model and a second-order multiple indicator model, giving an example of each. The second-
order multiple indicator model is then used in an example illustrating the use of a level-2 (dyad-
level) predictor.
Data characteristics
At minimum, a single dependent measure obtained from each member of the dyad is
needed.2 Multiple indicators for each individual can also be used, with a minimum of three
indicators for each member of the dyad. Dyads will be assumed to be nonexchangeable. That is,
there is a basis for distinguishing the members of each dyad in an identical manner across all dyads.
Examples might include husbands and wives, mother and child, first born and second born, or
caregiver and care recipient. The data set should be configured in the so-called "repeated
measures" format, in which each case in the data matrix contains information about both
members of the dyad. For example, each record contains information about the husband and the
wife, recorded under different variable names (e.g., y1h, y2h, y3h, y4w, y5w, y6w). This
configuration is analogous to that used for latent growth curve analysis.
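As an illustration of this data layout, the sketch below converts records stored one row per person into one row per dyad. It is a minimal example under assumed input conventions: the keys `dyad`, `role`, and `y`, the function name `long_to_wide`, and the scores are all hypothetical. Items are numbered within each member (y1h, y1w) rather than consecutively across members as in the variable names above; either convention works as long as it is applied identically to every dyad.

```python
def long_to_wide(rows):
    """Convert one-row-per-person records into one-row-per-dyad records.

    `rows` is a list of dicts with hypothetical keys: 'dyad' (dyad id),
    'role' ('h' for husband, 'w' for wife), and 'y' (list of item scores).
    """
    wide = {}
    for row in rows:
        record = wide.setdefault(row["dyad"], {"dyad": row["dyad"]})
        for item_number, score in enumerate(row["y"], start=1):
            # Variable names carry the role suffix, e.g. y1h or y1w.
            record[f"y{item_number}{row['role']}"] = score
    return [wide[d] for d in sorted(wide)]


long_rows = [
    {"dyad": 1, "role": "h", "y": [4, 5, 3]},
    {"dyad": 1, "role": "w", "y": [2, 4, 4]},
    {"dyad": 2, "role": "h", "y": [6, 5, 5]},
    {"dyad": 2, "role": "w", "y": [3, 3, 2]},
]
wide_rows = long_to_wide(long_rows)
print(wide_rows[0])
# {'dyad': 1, 'y1h': 4, 'y2h': 5, 'y3h': 3, 'y1w': 2, 'y2w': 4, 'y3w': 4}
```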
Example data set
To illustrate, I will use an example from a study I conducted recently examining
interactions between spousal caregivers and care recipients. There are 116 couples (232
individuals), in which each member of the couple was interviewed separately. I examine five
items from the Veit and Ware (1983) positive affect subscale of the Mental Health Inventory.
Items such as "How much of the time have you felt the future look hopeful and promising?" were
rated on a 6-point scale of frequency of occurrence. Thus the analysis is based on 10 variables—
5 items for caregivers and 5 items for care recipients. In the single-indicator model, illustrated
first, the five items are averaged for caregivers and for care recipients.
Single-indicator models
Intercept-only model. I first take the simplest case, in which there is only one measure (i.e.,
indicator) of positive affect for caregivers and for care recipients (the measure was computed by
averaging the five items for each). This model specification follows that of the growth curve
model described above for the case in which only two time points are tested and is depicted in
Figure 2. The basic model, which I will call the intercept-only model, includes no level-1 or
level-2 predictors. At level 2, there is only one equation, because there is no slope obtained from
level 1. Using the multilevel regression notation, this is the model given by the following
separate equations:
Level-1 equation:
$$y_{ig} = \beta_{0g} + r_{ig}$$
Level-2 equation:
$$\beta_{0g} = \gamma_{00} + u_{0g}$$
This model can be shown to be equal to a random effects ANOVA (Raudenbush, 1993), in
which $\beta_{0g}$ is the mean score for each dyad and the estimate of the variance of the residual, $r_{ig}$, is
the within-group variation, usually designated by $\sigma^2$. $\gamma_{00}$, the level-2 intercept, represents the
average of the dyad means, and the estimate of the variance of $u_{0g}$, known as $\tau_{00}$, is the between-group
variation of the means.
Using the equivalent structural modeling notation, these equations would be:
Level-1 equation:
$$y_{ig} = \lambda_{ig}\eta_{0g} + \varepsilon_{ig}$$
Level-2 equation:
$$\eta_{0g} = \alpha_0 + \zeta_{0g}$$
One can calculate the ratio of between-dyad variation relative to the total
variation using the intraclass correlation coefficient. The intraclass correlation provides
information about the degree to which dyad members have similar scores on the
dependent variable. In HLM notation, the intraclass correlation coefficient is expressed as a
ratio of between to within plus between variation:
$$\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2}$$
Using the SEM approach, the measurement residual represents the within-dyad variation
and the variance of the latent intercept, $\eta_0$, represents the between-group variation. So, the
intraclass correlation coefficient is given by:
$$\rho = \frac{\psi_{00}}{\psi_{00} + \theta_{\varepsilon}}$$
where $\psi_{00}$ represents the variance of $\eta_0$ and $\theta_{\varepsilon}$ represents the variance of the measurement residual.
Results for the intercept-only model. Parallel analyses were conducted using Mplus
(Version 2, Muthen & Muthen, 2001) and HLM 5 (Raudenbush, Bryk, Cheong, & Congdon,
2000). The results reported in Table 2 indicate nearly identical findings with the two statistical
packages. The average positive affect score for dyads was 4.189. The between-dyad variance,
given by the random effect for the intercept in HLM and by the variance of the intercept
variable in Mplus, was approximately .28 and was significantly different from zero in both cases.
A significant variance indicates that the average positive affect score varies
across dyads. The within-dyad variance was .570, indicating that there was greater variation
within dyads than between dyads. The intraclass correlation coefficient was approximately .33,
indicating that dyad members had positive affect scores that were moderately related. The association
between caregiver and care recipient positive affect indicates that an OLS regression assuming
couples were independent cases would not be appropriate and would provide underestimates of
standard errors in statistical tests.
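The intraclass correlation reported above follows directly from the two variance estimates. The short sketch below reproduces the arithmetic using the values given in the text (a between-dyad variance of about .28 and a within-dyad variance of .570); the function name is hypothetical.

```python
def intraclass_correlation(between_var, within_var):
    """Ratio of between-dyad variance to total (between + within) variance."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


icc = intraclass_correlation(between_var=0.28, within_var=0.570)
print(round(icc, 2))  # 0.33
```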
Difference model. A predictor at level 1 can be added by incorporating a slope variable
to represent the difference between dyad members (e.g., gender). In the context of the care
recipient study, a variable designating whether an individual was a caregiver or care recipient
was used. A mean slope significantly different from zero would indicate a significant difference
between caregivers and care recipients on positive affect. The difference model is depicted in
Figure 3. Two latent variables are defined: a latent intercept, $\eta_0$, and a latent slope, $\eta_1$. There is
only one indicator for each member of the dyad. The intercept variable, $\eta_0$, is defined by
fixing loadings on each of the two indicators to 1. The slope variable, $\eta_1$, is defined by fixing
loadings on the same two indicators to 0 and 1. The average intercept and average slope across
couples (i.e., $\gamma_{00}$ and $\gamma_{10}$ in MLR notation) are obtained by estimating the mean structures of each
(not estimated by default in SEM software packages). The variances of the slopes and intercepts
across couples are indicated by the variances of $\eta_0$ and $\eta_1$ (i.e., the $\Psi$ matrix).
Results for the difference model. As above, parallel analyses were conducted using Mplus and
HLM 5. With SEM or MLR, it is not possible to estimate variances (i.e., random effects) for the
slopes and intercepts simultaneously with dyadic data.3 One can obtain variance estimates by
running separate models fixing the random effect (variance) of the intercept, the slope, or the
correlation between them to a fixed value (most typically, zero). Separate analyses suggested that
the slope variance was not significantly different from zero, and, therefore, the variance of the
slope (and the covariance between the slope and intercept) was fixed to zero.
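The identification problem on the MLR side comes down to a simple count, following the argument in footnote 3: two observations per dyad supply only three unique variance-covariance elements, whereas a random intercept, a random slope, their covariance, and the level-1 residual variance make four parameters. The sketch below performs that count; the function names are hypothetical.

```python
def unique_covariance_elements(n_units):
    """Unique elements in an n x n covariance matrix: n(n+1)/2."""
    return n_units * (n_units + 1) // 2


def random_parameters(q):
    """Random parameters with q level-1 predictors, all treated as random:
    (q+1)(q+2)/2 variances and covariances plus one residual variance."""
    return (q + 1) * (q + 2) // 2 + 1


available = unique_covariance_elements(2)  # two members per dyad
needed = random_parameters(1)              # random intercept and one random slope
print(available, needed)  # 3 4
```

With no level-1 predictor (`random_parameters(0)`), only two parameters are needed, which is why the intercept-only model poses no such problem.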
Analyses are presented for dummy coding of the caregiving variable (0 and 1) and for
group-mean centering (-.5 and +.5). When dummy coding is used, the intercept represents the
average score for caregivers (because they were coded as 0). When group-mean centering is used, the
average intercept represents the grand mean for all couples (caregivers and care recipients
combined). In multilevel regression, group-mean centering is achieved by subtracting the
mean of the dyad from each individual's score (this can be done automatically in the HLM
software). To obtain the group-mean centered solution using the SEM approach, however, one
simply sets the loadings of the slope variable to -.5 and +.5, rather than 0 and 1. For the single
indicator model with no level-2 predictors, the overall model fit cannot be evaluated because the
model is just identified. However, slope and intercept estimates, their variance estimates, and
their significance tests are available.
As can be seen in Table 3, the means for the intercept and slope variables and their
standard errors are identical in the SEM method and the MLR method. The average intercept in
the dummy coded example of 4.222 represents the average positive affect for caregivers, and the
average intercept in the centered example of 4.189 represents the average positive affect for
caregivers and recipients combined. Notice that the average intercept obtained with centering
differs little from that obtained with dummy coding. This will not always be true, but, in this
case, it is because there is very little difference between caregivers and care recipients on
positive affect scores. The average slope of -.067, which is not significantly different from zero,
represents the difference between caregivers and care recipients on positive affect.
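The relationship between the two coding schemes can be verified with the estimates just reported. Under 0/1 coding the intercept is the caregiver mean and the slope is the caregiver/care recipient difference, so the intercept under -.5/+.5 coding lies half a slope away. The sketch below applies this to the reported values of 4.222 and -.067; the function name is hypothetical.

```python
def centered_intercept(dummy_intercept, slope):
    """Intercept under -.5/+.5 coding, given the intercept under 0/1 coding.

    Under 0/1 coding the intercept is the mean of the member coded 0 and the
    slope is the member difference; the midpoint lies half a slope away.
    """
    return dummy_intercept + 0.5 * slope


value = centered_intercept(4.222, -0.067)
print(f"{value:.4f}")  # 4.1885
```

The result agrees with the centered intercept of 4.189 reported in the text, up to rounding of the published estimates.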
Second-order factor model
The above approach may be useful if only one item or measure is available for each
individual, but the single-indicator model assumes no measurement error. A second-order factor
model approach, similar to the second-order growth curve approach described by McArdle
(1988) and referred to as the curve-of-factor-scores (CUFFS) model, is also possible. In this model,
two parallel first-order latent variables are specified, one for each member of the dyad, each defined by multiple
indicators. To define the intercept and slope, fixed paths are set leading from two second-order
factors (which, for continuity, I will continue to refer to as $\eta_0$ and $\eta_1$, respectively) to the two
first-order factors (referred to here as $\eta_{cg}$ and $\eta_{cr}$). Both second-order loadings are set to 1 for
the intercept factor, $\eta_0$. As with the single-indicator model, a choice of uncentered, dummy
coding (0, 1) or group-mean centered coding (-.5, +.5) for the slope, $\eta_1$, provides different
interpretations for the intercept. With dummy coding, the intercept represents the mean of dyad
members who are assigned 0 for the slope variable. With centering, the intercept represents the
average of the means of both dyad members. Figure 4 illustrates the model using dummy
coding.
1 A number of other statistical approaches have been developed to analyze dyadic data (e.g.,
Cook, 1994; Kenny & Cook, 1999). I focus only on the multilevel regression and multilevel
structural modeling approaches as alternatives, however, because they are designed to address
particular research questions that conceptualize the data as hierarchically structured.
2 It is theoretically possible to impute missing data using the expectation maximization (EM)
algorithm (e.g., Little & Rubin, 1987) or estimate models with missing data using Full
Information Maximum Likelihood (FIML; e.g., Arbuckle, 1996) provided data are missing at
random, but I will begin with the assumption that there are complete data for all dyads. I will
return to the issue of missing data below.
3 The reason it is not possible to estimate both variances (or "random effects") differs
between MLR and SEM. In MLR, such a model is not identified, because MLR models are limited
in the number of random parameters that can be estimated by the number of variance-covariance
matrix elements within each level-2 unit (Snijders & Bosker, 1999). The variance-covariance
matrix among level-1 units is given by:
$$V = \begin{bmatrix} \tau_{00} + 2\tau_{01}x_1 + \tau_{11}x_1^2 + \sigma^2 & \tau_{00} + \tau_{01}(x_1 + x_2) + \tau_{11}x_1x_2 \\ \tau_{00} + \tau_{01}(x_1 + x_2) + \tau_{11}x_1x_2 & \tau_{00} + 2\tau_{01}x_2 + \tau_{11}x_2^2 + \sigma^2 \end{bmatrix}$$
where $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ are the variance estimates for the intercept and slope and the covariance
between intercept and slope, respectively, and $\sigma^2$ is the variance of the level-1 residual. $x_1$ and
$x_2$ are the values of the predictor variable for the two cases in a dyad (Goldstein, 1999). The level-1
variance-covariance matrix for dyadic data (i.e., 2 cases per group or $n_j$ = 2), given above, contains
$n_j(n_j+1)/2 = 2(2+1)/2$, or 3, unique elements. A model containing a single level-1 predictor
with random effects estimated for the intercept and slope, such as that in equations (1.6) through
(1.8), requires 4 parameters to be estimated: $\tau_{00}$, $\tau_{11}$, $\tau_{01}$, and $\sigma^2$. In general,
$(q+1)(q+2)/2 + 1$ parameters are required (q is the number of level-1 predictors) for a model
with all possible random variances and covariances. With dyadic data, a model with a random
intercept and slope results in 1 too many parameters to be estimated given the number of
covariance elements available, leading to identification problems.
In SEM, the limitation is the degrees of freedom based on the overall covariance matrix and
the number of parameters estimated in the model. The number of covariance elements is
v(v+1)/2, where v is the number of variables. Degrees of freedom for a structural model are
v(v+1)/2 - p, where p is the number of parameters estimated. For the present model, the number
of available covariance elements is 2(2+1)/2 =
3. If both means and variances for $\eta_0$ and $\eta_1$ are estimated, the model will have negative degrees
of freedom. In SEM, including one or more additional variables in the model will lead to an
overall model that is identified (although such models are not necessarily empirically identified),
whereas the addition of variables in MLR will not produce an identified model without
constraints on the random parameters.
Several details of the specification are important. First, it is important to investigate the
measurement model and ensure that the fit of the first-order factors is adequate before
proceeding. Second, the intercepts for indicators of the first-order measurement equation are
set to zero. This is a common specification in
most traditional structural models (Bollen, 1987), but, in some software packages, the intercepts may be
estimated by default when mean structures are requested. Third, loadings should be
constrained to be equal for parallel items in the first-order factors. This is an assumption of
factorial invariance across members of the dyad and is similar to the latent growth curve
assumption of longitudinal factorial invariance (Meredith & Tisak, 1982; Nesselroade, 1983); it
can be tested using chi-square difference tests. If the constraints are not imposed and factor
patterns are allowed to differ, the second-order factors may be confounded with dyad differences
in measurement. Fourth, disturbances for the first-order factors are set equal to one another. This
provides a single estimate of the within-group variance. The intraclass correlation can be
obtained using the estimate for the first-order disturbances and the estimate of the variance of $\eta_0$
in an intercept-only model, together with the SEM formula for the intraclass correlation given earlier.4
Fifth, the second-order factor model will be theoretically identified when three or more
indicators are used. However, when the variance of the intercept or slope is near zero
in the population, estimation difficulties may arise due to empirical underidentification. When a
population value is near zero, the sample estimate may be negative due to sampling error. This
leads to difficulties with estimation in SEM, because the estimated covariance matrix cannot be
inverted. In larger samples, however, there will be less risk of empirical underidentification
problems, because the sample estimate will be nearer to the population value. When such
difficulties do arise, the researcher may wish to set this variance to zero (or, rather, a very small
positive value) in order to obtain an estimate, provided there are no other difficulties with the
model specification. Within the structural modeling literature, setting a variance or other
parameter to a fixed value in order to obtain a solution is controversial, but setting random
effects (i.e., variance estimates) to zero is fairly common practice within the MLR literature.
One important reason for the difference in these standards involves the interpretation of the
variance estimate. In SEM, it is unusual to expect zero variance of a latent variable in the
population, and therefore a zero or negative variance typically indicates a problem with model
specification. However, in the context of multilevel analysis, nonsignificant or near zero
variance simply means there is little difference among groups in the level-1 intercepts or slopes.
For example, it is not unreasonable to assume the difference in affect between caregivers and
recipients is of approximately equal magnitude in all dyads.
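The sampling-error point above can be illustrated with a small simulation. The sketch below is not the SEM estimator discussed in this article. As a stand-in it uses the simpler one-way ANOVA (method-of-moments) estimate of between-dyad variance, and the function names, the unit within-dyad variance, the number of replications, and the seed are all assumptions chosen for illustration. It counts how often the between-dyad variance estimate is negative for a given true variance.

```python
import random


def between_dyad_variance(dyads):
    """Method-of-moments (one-way ANOVA) estimate of between-dyad variance.

    `dyads` is a list of (score_member_1, score_member_2) pairs.
    """
    n = 2  # members per dyad
    j = len(dyads)
    means = [(a + b) / 2 for a, b in dyads]
    grand = sum(means) / j
    ms_between = n * sum((m - grand) ** 2 for m in means) / (j - 1)
    # With two members, the within-dyad sum of squares is (a - b)^2 / 2.
    ms_within = sum((a - b) ** 2 / 2 for a, b in dyads) / j
    return (ms_between - ms_within) / n


def share_negative(true_between_var, n_dyads, reps=300, seed=1):
    """Proportion of simulated samples with a negative variance estimate."""
    rng = random.Random(seed)
    sd_between = true_between_var ** 0.5
    negative = 0
    for _ in range(reps):
        dyads = []
        for _ in range(n_dyads):
            u = rng.gauss(0.0, sd_between)  # shared dyad effect
            # Within-dyad variance is fixed at 1 (an assumption).
            dyads.append((u + rng.gauss(0.0, 1.0), u + rng.gauss(0.0, 1.0)))
        if between_dyad_variance(dyads) < 0:
            negative += 1
    return negative / reps


for tau in (0.0, 0.05, 0.3):
    print(tau, share_negative(tau, n_dyads=116))
```

When the true variance is zero, about half of the replications should yield a negative estimate, and the share should shrink as the true variance or the number of dyads increases.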
Example
Measurement model fit. A single-factor measurement model of positive affect with 5
indicators fit the data well. The fit was significantly improved by the inclusion of a correlation
between two measurement errors ("How much of the time has your daily life been full of things
that were interesting to you?" and "How happy, satisfied, or pleased have you been with your
personal life?") for both caregiver and care recipient factors. These two items are the only items
concerning daily life satisfaction rather than hope for the future or a positive mood. The
resulting two-factor model had a nonsignificant chi-square (χ²(36, N = 116) = 40.20, p = .29) and
alternative fit indices (TLI = .990, RMSEA = .032) that indicated a good fit.
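Readers who want to cross-check reported fit statistics can recompute the chi-square p value and the RMSEA from the chi-square, its degrees of freedom, and N. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the RMSEA variant with N (rather than N - 1) in the denominator is an assumption: the article does not state which formula the software used, although this variant agrees with the values reported in the Example section to three decimals.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic with integer df.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2.
    """
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-h)          # Q(1, h)
    else:
        a, q = 0.5, erfc(sqrt(h))    # Q(1/2, h)
    while a < df / 2.0 - 1e-9:
        q += h ** a * exp(-h) / gamma(a + 1.0)
        a += 1.0
    return q


def rmsea(chi2, df, n):
    """RMSEA point estimate, assuming the variant with N in the denominator."""
    return sqrt(max(chi2 - df, 0.0) / (df * n))


# Chi-square values and degrees of freedom reported in the Example section.
reported = {
    "measurement model": (40.20, 36),
    "intercept-only model": (67.03, 46),
    "difference model": (65.08, 44),
    "level-2 predictor model": (75.976, 54),
}
for label, (chi2, df) in reported.items():
    print(label, round(chi2_sf(chi2, df), 3), round(rmsea(chi2, df, 116), 3))
```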
Intercept only model. To obtain an estimate of the intraclass correlation with the second-order
model, a single intercept factor was defined, with its loadings set to 1. This model also fit the
data well, although the chi-square was significant (χ²(46, N = 116) = 67.03, p = .02, TLI = .960,
RMSEA = .063). The mean of the intercept factor was 4.435, representing the average latent-variable affect score.
The variance estimate obtained for the intercept factor was .311, representing variance between dyads. The
within-group variance estimate is obtained from the estimate of the disturbances of the caregiver (cg) and
care recipient (cr) first-order factors (set equal) and was .478. Using these values, the estimate of the intraclass correlation is .394.
Note that this value is somewhat larger than the estimate obtained with the single-indicator
model (.33). The difference is due to measurement error attenuation in the single-indicator
model.
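The intraclass correlation is the between-dyad variance divided by the sum of the between-dyad and within-dyad variances. As a brief sketch (the function name is hypothetical), the two estimates compared above can be reproduced from the variance estimates reported for the second-order model and, for the single-indicator model, from the SEM column of Table 2:

```python
def intraclass_correlation(between_var, within_var):
    """Proportion of total variance that lies between dyads."""
    return between_var / (between_var + within_var)


# Second-order (latent variable) model: intercept variance and disturbance variance.
icc_latent = intraclass_correlation(0.311, 0.478)

# Single-indicator model: intercept variance and within-dyad variance (Table 2, SEM).
icc_single = intraclass_correlation(0.281, 0.570)

print(round(icc_latent, 3), round(icc_single, 2))
```

The within-dyad variance is smaller in the latent variable model because measurement error is removed from it, which is why that intraclass correlation is the larger of the two.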
Difference model. Next, a model estimating the slope with centered coding was tested. This
model provides a test of whether, on average, there is a difference between caregivers' and care
recipients' affect (i.e., the mean of the slope variable) and whether this difference varies
significantly across couples (i.e., the variance of the slope variable). An initial test of the model
in which both the intercept and the slope variance were estimated resulted in estimation
difficulties. Sensitivity tests suggested that this was a result of minimal variance in the slope
across dyads (as also suggested by the single-indicator model), indicating similar
caregiver-recipient differences in affect. Consequently, the model was estimated with the
slope variance set to near zero (.0001) to identify the model. The model fit the data well (χ²(44,
N = 116) = 65.08, p = .02, TLI = .958, RMSEA = .064). The mean intercept value was significantly
different from zero (mean = 4.434, p < .001), and the mean slope value was nonsignificant (mean = -.076,
ns), indicating no overall difference between caregivers' and care recipients' positive affect.
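The effect of the coding scheme on the interpretation of the intercept can be seen by computing the model-implied mean for each dyad member as the intercept mean plus the member's code times the slope mean. The sketch below uses the SEM estimates from Table 3. The function name is hypothetical, and assigning -.5 to caregivers and +.5 to care recipients under centered coding is an assumption inferred from the (0,1) coding described for Figure 3.

```python
def implied_means(intercept_mean, slope_mean, codes):
    """Model-implied mean for each dyad member: intercept + code * slope."""
    return {member: intercept_mean + code * slope_mean
            for member, code in codes.items()}


dummy_codes = {"caregiver": 0.0, "care_recipient": 1.0}
centered_codes = {"caregiver": -0.5, "care_recipient": 0.5}  # assumed assignment

# SEM column of Table 3: intercept and slope means under each coding.
from_dummy = implied_means(4.222, -0.067, dummy_codes)
from_centered = implied_means(4.189, -0.067, centered_codes)

for member in dummy_codes:
    print(member, round(from_dummy[member], 3), round(from_centered[member], 3))
```

Both codings imply the same member means up to rounding of the reported estimates. Under dummy coding the intercept is the mean for the member coded 0, whereas under centered coding it is the average of the two members' means.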
Level-2 predictors.
Using either of the above specifications, it is possible to examine latent intercept and slope
variables in more complex models, either as exogenous or endogenous variables. This is a
distinct advantage over existing multilevel SEM capabilities, and the flexibility to use slopes and
intercepts as predictors expands the types of hypotheses that can be tested. In MLR, level-2
predictors are often used to explain variability of intercepts or slopes. The test of whether a
level-2 variable predicts the slope variable is referred to as a “cross-level interaction” (e.g., Kreft
& de Leeuw, 1998), because a significant level-2 slope means that the effect of a level-1
predictor depends on the value of the level-2 predictor. The SEM statistical model is identical for
level-1 variables,
but an additional variable (with subscript 2) and its path coefficient are needed in each of the level-2 equations.
The SEM model parallels the MLR model in which a level-2 predictor, z, has been added.
Level-1 equation:
Level-2 equations:
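As a point of reference only, an MLR model with one level-1 predictor and one level-2 predictor z is conventionally written as follows. The symbols are those of the standard hierarchical linear model notation (Bryk & Raudenbush, 1992) and are an assumption here; they may differ from the notation used elsewhere in this article.

```latex
% Level-1 equation: individual i within dyad j
y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + r_{ij}

% Level-2 equations: the dyad-level predictor z_j enters each equation
\beta_{0j} = \gamma_{00} + \gamma_{01} z_{j} + u_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} z_{j} + u_{1j}
```

In this notation, the coefficient \gamma_{11} carries the cross-level interaction, because it describes how the level-1 slope changes with the level-2 predictor.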
To illustrate the use of a level-2 variable, a new model is tested which uses the number of
relatives providing help as a predictor. Older couples in which one member has difficulties with
daily activities (e.g., climbing stairs) often report negative reactions to outside help they receive
from other family members, such as a daughter or son (Newsom, 1999). Although care
recipients often report negative emotional reactions when they must receive help from a spouse
or other family member, little is known about how couples react. Because the previous model
constrained the variance of the slope to be near zero, the path predicting the slope variable is
omitted from this model. In other models, however, the researcher seeking to explain variability
in dyadic differences across couples may wish to predict the slope variable instead of or in
addition to the intercept variable.
Fit indices suggested a good fit to the data, although the chi-square was significant (χ²(54,
N = 116) = 75.976, p = .03, TLI = .957, RMSEA = .059). The number of family or friends providing
assistance in addition to the spouse was negatively associated with couples' positive affect
(b = -.321, p < .01). This finding suggests that couples may perceive help from others as intrusive or
interfering, and they may prefer to handle daily functioning tasks on their own (Newsom, 1999;
Newsom & Schulz, 1998).
Generalization of the Approach
There are several possibilities for building upon the dyadic models discussed here. In
addition to level-2 predictors, the dyadic model can also be expanded to include level-1
predictors to explain within-dyad variability. If the second-order factor approach were used,
level-1 predictors would involve paths to the first-order factors. Raudenbush (1993) has shown
the equivalence of intercept-only models to the random effects ANOVA model and the
incorporation of level-1 covariates in such models to be equivalent to random effects ANCOVA
models. Use of latent variables would further develop the random effects ANCOVA model to
include latent variables. Applications of the dyadic model would also extend to traditional
within-subjects ANOVA applications (Judd, Kenny, & McClelland, 2001), such as pretest-
posttest designs or other within-subjects studies in which the researcher wishes to compare two
different instruments assessing the same individual (Snijders & Bosker, 1999).
Another application of the dyadic model is the potential generalization of the approach to
data involving larger groups (e.g., 2, 3, 4, or 5 individuals per group) such as family units or
small working groups. Such applications, however, would likely be limited to the
nonexchangeable case, in which each group member has a role consistently present across
groups. As the number of individuals per group increases, however, models would become more
complex. Of particular difficulty is the specification of the difference slope variable. A simple
difference variable becomes more complicated to implement, because there must be n_j - 1 dummy
variables to represent comparisons (where n_j is the number of individuals per group).
A nested test of the means for the full set of dummy variables would provide an omnibus test of
differences among group members. Inclusion of dummy variables would not be necessary,
however. When differences among group members are not of interest or the researcher is solely
interested in examining or explaining within-group or between-group variability, intercept-only
models could be used.
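To make the dummy-variable requirement concrete, the sketch below builds a fixed loading pattern for a group of a given size: one intercept column of 1s plus one dummy column for each member other than the reference member. The function name and the choice of the first member as the reference category are assumptions for illustration, not a specification taken from this article.

```python
def group_loadings(n_members):
    """Loading pattern for one group: an intercept column plus n_members - 1 dummies.

    Row k holds the fixed loadings for member k. The first member is the
    reference category (an assumption), so all of its dummy loadings are 0.
    """
    rows = []
    for member in range(n_members):
        dummies = [1 if member == k else 0 for k in range(1, n_members)]
        rows.append([1] + dummies)
    return rows


# The dyadic case reduces to the (0,1) coding of the difference model.
print(group_loadings(2))

# A four-member group needs three dummy-coded difference factors.
for row in group_loadings(4):
    print(row)
```

Each added member adds one difference factor, with its own mean and potentially its own variance and covariances, which is one way to see how model complexity grows with group size.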
Although the examples presented above are based on complete data for dyads, application of
these models would not necessarily be restricted to complete data. MLR is commonly employed
with unequal group sizes (i.e., unbalanced n), with a minimum of one case per group (Bryk &
Raudenbush, 1992). Longford (1993) discusses the similarity between missing data estimation
with the expectation maximization (EM) algorithm and random coefficient models with
unbalanced n. The increasing availability and use of structural models with full information
maximum likelihood (FIML) estimation for missing data provides the capability of
implementing multilevel dyadic or small-group models with unbalanced data and should produce
comparable solutions to those obtained with MLR for unbalanced data.
Summary and Conclusions
The SEM approach described above offers distinct advantages over existing multilevel
structural equation models applied to dyadic data. First, slopes and intercepts can be used as
predictors, outcomes, or mediators in models with any combination of level-1 (individual) or
level-2 (dyad) variables, greatly enhancing flexibility compared to current multilevel SEM
methods. Many psychological, sociological, and organizational theories posit reciprocal
causation between individual and group behavior, but the current multilevel SEM approach is
limited to examining between-group and within-group variables separately. Second, the dyadic
structural equation model is simple to implement in any available SEM software and requires no
special matrix preparation or software features.
The proposed approach also provides advantages over an MLR approach to dyadic data.
MLR models do not allow researchers to predict level-2 variables with level-1 variables.
Moreover, the SEM approach allows one to test model fit and compare nested models. The
approach also offers greater flexibility than MLR models, because correlated measurement errors
can be incorporated and error homogeneity can be tested. The ability to test for measurement
invariance across members of the dyad allows researchers to rule out differences due to
nonequivalent measurement. In addition, because the dyadic SEM approach is implemented
within the traditional SEM framework, other features of SEM, such as multigroup analysis or
growth curve analysis, can be integrated.
The proposed approach is, of course, not without limitations. The dyadic SEM model is
restricted to data with nonexchangeable partners. Thus, it may not be possible to implement such
models in research on gay couples, monozygotic twins, or student friendship interactions. The
single-indicator model described above has some of the same limitations as MLR models. In
these models, intercept and slope cannot be tested simultaneously and measurement error is not
estimated. In addition, as with the second-order factor model example presented here, smaller
sample sizes may lead to greater likelihood of problems with empirical identification when
population variances are near zero. In general, the behavior of estimates and standard errors is as
yet unknown with various sample sizes or model specifications, although, as shown here, the
estimates are equivalent to those obtained with MLR. Despite the limitations noted, the dyadic
structural equation model is a flexible approach that provides researchers with a variety of other
model testing options previously unavailable.
References
Arbuckle, J.L. (1996). Full information estimation in the presence of incomplete data. In
G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: Issues
and techniques. Mahwah, NJ: Lawrence Erlbaum Associates.
Barnett, R.C., Marshall, N.L., Raudenbush, S.W., & Brennan, R.T. (1993). Gender and
the relationship between job experiences and psychological distress: A study of dual-earner
couples. Journal of Personality and Social Psychology, 64, 794-806.
Bassiri, D. (1988). Large and small sample properties of maximum likelihood estimates
for the hierarchical linear model. Unpublished doctoral dissertation, Michigan State University.
Bryk, A.S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and
data analysis methods. Newbury Park, CA: Sage.
Cook, W.L. (1994). A structural equation model of dyadic relationships within the family
system. Journal of Consulting and Clinical Psychology, 62, 500-509.
Duncan, T.E., Duncan, S.C., Strycker, L.A., Li, F., & Alpert, A. (1999). An introduction
to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ:
Erlbaum.
Fleenor, J.W., McCauley, C.D., & Brutus, S. (1996). Self-other rating agreement and
leader effectiveness. Leadership Quarterly, 7, 487-506.
Goldberg, A.I., Cohen, G., & Rubin, A.E. (1998). Physician assessments of patient
compliance with medical treatment. Social Science & Medicine, 47, 1873-1876.
Goldstein, H. (1999). Multilevel statistical models (Second Edition). London: Arnold.
[On-line book]. Available: http://www.arnoldpublishers.com/support/goldstein.htm.
Judd, C.M., Kenny, D.A., & McClelland, G.H. (2001). Estimating and testing mediation
and moderation in within-subject designs. Psychological Methods, 6, 115-134.
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Thousand
Oaks, CA: Sage.
Kendler, K.S., Karkowski, L.M., & Prescott, C.A. (1999). Causal relationship between
stressful life events and the onset of major depression. American Journal of Psychiatry, 156, 837-848.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. London: Sage.
Little, R.J.A., & Rubin, D.B. (1987). Statistical analysis with missing data. New York:
Wiley & Sons.
McArdle, J.J. (1988). Dynamic but structural equation modeling of repeated measures
data. In J.R. Nesselroade & R.B. Cattell (Eds.), Handbook of multivariate experimental
psychology (2nd ed.). New York: Plenum Press.
McArdle, J.J., & Hamagami, F. (1996). Multilevel models from a multiple group
structural equation perspective. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced
structural equation modeling: Issues and techniques (pp. 89-124). Mahwah, NJ: Erlbaum.
Meredith, W., & Tisak, J. (1982). Canonical analysis of longitudinal and repeated
measures data with stationary weights. Psychometrika, 47, 47-67.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.
Muthen, B. (1989). Latent variable modeling in heterogeneous populations.
Psychometrika, 54, 557-585.
Muthen, B. (1994). Multilevel covariance structure analysis. Sociological Methods &
Research, 22, 376-398.
Muthen, B. (1997a). Latent variable modeling with longitudinal and multilevel data. In
A. Raftery (Ed.), Sociological Methodology (pp.453-480). Boston: Blackwell Publishers.
Muthen, B. (1997b). Latent variable growth modeling with multilevel data. In M.
Berkane (ed.), Latent variable modeling with applications to causality (pp. 149-161). New York:
Springer Verlag.
Muthen, L.K., & Muthen, B.O. (2001). Mplus: The comprehensive modeling program
for applied researchers (Version 2 user’s guide). Los Angeles: Muthen & Muthen.
Muthen, B., & Satorra, A. (1989). Multilevel aspects of varying parameters in structural
models. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 87-99). San Diego, CA:
Academic.
Muthen, B. & Satorra, A. (1995). Complex sample data in structural equation modeling.
In P. Marsden (ed.), Sociological Methodology 1995, 216-316.
Nesselroade, J.R. (1983). Temporal selection and factor invariance in the study of
development and change. Life-span development and behavior, 5, 59-87.
Newsom, J.T. (1999). Another side to caregiving: Negative reactions to being helped.
Current Directions in Psychological Science, 8, 183-187.
Newsom, J.T., & Schulz, R. (1998). Caregiving from the recipient's perspective:
Negative reactions to being helped. Health Psychology, 17, 172-181.
Ozer, E.M., Barnett, R.C., Brennan, R.T., & Sperling, J. (1998). Does childcare
involvement increase or decrease distress among dual-earner couples. Journal of Women's
Health: Research on Gender, Behavior, and Policy, 4, 285-311.
Raudenbush, S.W. (1993). Hierarchical linear models and experimental design. In L.K.
Edwards (Ed.), Applied analysis of variance in behavioral science. New York: Marcel Dekker.
Raudenbush, S.W., Brennan, R.T., & Barnett, R.C. (1995). A multivariate hierarchical
model for studying psychological change within married couples. Journal of Family Psychology,
9, 161-174.
Segal, N.L., & Hershberger, S.L. (1999). Cooperation and competition between twins:
Findings from a prisoner's dilemma game. Evolution and Human Behavior, 20, 29-51.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect
correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381.
Table 1. A comparison of the SEM and MLR growth curve models.
SEM MLR
Level-1
Level-2
Table 2. Results for the intercept-only models.

                        MLR             SEM
Intercept mean (SE)     4.189 (.070)    4.189 (.070)
Intercept variance      .285            .281
Within-dyad variance    .570            .570
Table 3. Mean and variance estimates when the caregiving variable is uncentered/dummy coded
(0,1) and group-mean centered (-.5,+.5).

                 Means (SE)                      Variances (SE)
                 MLR             SEM             MLR           SEM
Dummy coded
  Intercept      4.222 (.082)    4.222 (.086)    .284 (NA)     .282 (.083)
  Slope          -.067 (.098)    -.067 (.099)    0             0
Centered
  Intercept      4.189 (.070)    4.189 (.070)    .284 (NA)     .282 (.083)
  Slope          -.067 (.099)    -.067 (.099)    0             0

NA = not available (in HLM, standard errors of the random effects are not reported).
Figure 1. An example of a latent growth curve model with four time points.
Note: yt1-yt4 represent repeated measures of a single variable. Loadings are set to 1 for the intercept factor and to 0, 1, 2, and 3 for the slope factor.
Figure 2. Multilevel model for dyadic data: Intercept only single-indicator model.
Note: y1cg represents caregiver's positive affect, y2cr represents care recipient's positive affect. Loadings on the intercept factor are set to 1. Error variances for y1cg and y2cr are set equal.
Figure 3. Multilevel model for dyadic data: The single-indicator difference model.
Note: y1cg represents caregiver's positive affect, y2cr represents care recipient's positive affect. Loadings for the slope factor are set to 0 for caregivers and 1 for care recipients to define the intercept factor as the average score for caregivers taken across couples.
Figure 4. Multilevel model for dyadic data: Specification 3.
Note: y1cg-y5cg represent responses from caregivers, y6cr-y10cr represent responses from care recipients. The first-order factors are a latent variable for caregivers (cg) and a latent variable for care recipients (cr). The second-order factors (subscripts 0 and 1) represent the intercept and the slope, respectively. The label a indicates disturbances that are set to be equal.
Footnotes
4 In the difference model that includes the slope factor, an intraclass correlation can also be computed, but
it is considered an "adjusted" intraclass correlation.