GRAPHICAL TOOLS FOR LINEAR PATH MODELS

Bryant Chen and Judea Pearl University of California, Los Angeles

Rex B. Kline Concordia University

Montréal, Canada

This article introduces tools for analyzing path diagrams using graphical methods, which are better known in epidemiology than among researchers in the behavioral sciences, who would be more familiar with traditional structural equation modeling (SEM). We describe graphical tools that locate covariates for adjustment and instrumental variables, both of which can be used to identify and estimate individual causal effects. These tools also find conditional independences implied by the model and over-identified parameters, which can be used to test the model at the level of local fit. Their application in a real data set is demonstrated with a freely-available computer tool.

Key words: causal effects, counterfactuals, equivalent models, goodness of fit, graphical models, identification, linear regression, misspecification test, path models, structural equation models, causal inference, d-separation

Introduction

Sewall Wright (1920, p. 328) presented the very first path diagram, which summarized hypotheses about genetic and environmental causes of piebald (spotted) coat patterns in guinea-pig litters. Today’s path diagrams have basically the same form as Wright’s and are widely used in structural equation modeling (SEM), which includes path analysis, confirmatory factor analysis, and other methods for analyzing causal models. Although path models¹ are limited to cases in which there is a single observed measure of each hypothetical construct, path analytic techniques are useful in many disciplines, including psychology, sociology, and epidemiology. For example, about 25% of the approximately 500 SEM publications in psychology and education journals reviewed by MacCallum and Austin (2000) were path analytic studies.

It is common practice for authors of path analytic studies to present the diagram of an initial causal model² and perhaps also diagrams for alternative models that represent competing theories for the same observed variables (e.g., Romney, Jenkins, & Bynner, 1992, p. 172). Path diagrams thus serve as graphical communication devices that summarize hypotheses about causal effects or noncausal associations between any pair of variables depicted in the diagram. Pearl (2009) noted that path diagrams capture the ability to predict the consequences of new eventualities or manipulations in addition to counterfactual relations (e.g., “I would still have a headache if it were not for the aspirin”). In contrast, strictly algebraic expressions as well as regression equations lack the notational facility required to make explicit causal assumptions apart from statistical assumptions; that is, graphs provide an appropriate visual language for causal modeling.

¹ While path models are the focus of this paper, some of the graphical tools described here are also applicable to latent variable models.
² In some cases, path models may represent only associational relations among the observed variables. Some of the tools in this paper (e.g., d-separation) are applicable to such models, but we focus on the analysis of causal relations using path models.

Forthcoming, Psychometrika. TECHNICAL REPORT R-469

September 2016

Path diagrams are generally static entities in most SEM articles in that diagrams represent specific models describing an underlying phenomenon. In some cases, numerical results are also displayed, such as values of path coefficients and their standard errors. Exceptions in the traditional SEM literature are graphical methods intended to determine whether certain types of nonrecursive models are identified (Rigdon, 1995). Nonrecursive models have causal loops where at least two variables are each a cause and an effect of the other (possibly indirectly) or have error covariances between pairs of endogenous variables. In contrast, recursive models have no causal loops and no error covariances, but measured exogenous variables are often assumed to covary. Some nonrecursive models are partially recursive, also called semi-Markovian (Pearl, 2009), meaning that they have at least one pair of endogenous variables with correlated error terms but no causal loops. All recursive models are identified, but additional requirements must be satisfied before nonrecursive or partially recursive models are identified (Paxton, Hipp, & Marquart-Pyatt, 2011).

Graphical methods for determining whether a model is identified do not require data, which is fortunate because, as Kenny and Milan (2012) put it, what researcher wants to find out that his or her model is not identified after putting in the effort to collect the data? Additionally, dealing with identification proactively is better than fitting a model of unknown identification status to the data using an SEM computer tool. If the model is not actually identified, the analysis may fail with a warning or error message, but this message is unlikely to pinpoint the source of the problem. This is because SEM computer tools perform only perfunctory checks for empirical identification, such as calculating the model degrees of freedom or estimating correlations between pairs of parameter estimates. Additionally, if the model is really identified but poor start values are specified, the analysis could fail, leaving the researcher to possibly conclude in error that identification is the problem. Graphical identification methods are also generally easier to apply than non-graphical methods, such as the rank condition, which requires knowledge of matrix operations (e.g., Kline, 2016, pp. 161–163).³ The rank condition is also a sufficient identification requirement only for certain kinds of nonrecursive models, such as ones with all possible error covariances (Paxton et al., 2011).

Rigdon (1995) described a set of necessary and sufficient graphical rules for the identification of nonrecursive models where the endogenous variables can be partitioned into sets of recursively-related blocks of size two or less. Blocks of size one are reserved for endogenous variables with strictly recursive relations to other variables. Such blocks are trivially identified. Blocks of size two are reserved for pairs of variables involved in a nonrecursive relation, defined as featuring a direct feedback loop and/or correlated error terms. Blocks of size two with variables that are not causally related (i.e. a pair of variables whose error terms are correlated but neither is a cause of the other) are also identified.

³ A computer program by Bekker, Merckens, and Wansbeek (1994) automatically evaluated the rank condition for some, but not all, types of nonrecursive models.

Presented in Figure 1 are abstractions from Rigdon’s (1995, p. 370) graphical types that represent the minimum required specifications for identifying blocks of two causally related variables, X and Y. In Figure 1(a), identification of the “bow-arc” block containing X and Y requires a single instrument Z. Instruments are variables that are correlated with the causal variable but not with the error term of the outcome variable, and they allow causal effects to be estimated using two-stage least squares (2SLS) regression (Bollen, 2012). In Figure 1(b), identification of the direct feedback loop requires only one instrument if the error covariance is fixed to zero, but two instruments are required when the bow-pattern error covariance is freely estimated as is depicted in Figure 1(c). All other specifications beyond these minimum requirements are irrelevant; that is, they have no bearing on whether the nonrecursive block is identified.
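The role of a single instrument in a bow-arc block can be illustrated with a small simulation. The sketch below is not from the article: the coefficient values, the sample size, and the use of an unmeasured variable `u` to generate the correlated errors are all assumptions made for illustration. With exactly one instrument, the 2SLS estimate reduces to the ratio cov(Z, Y) / cov(Z, X), which is what the code computes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b_zx, b_xy = 0.8, 0.5            # assumed coefficients for Z -> X and X -> Y

z = rng.normal(size=n)           # instrument
u = rng.normal(size=n)           # unmeasured cause that produces the correlated errors
x = b_zx * z + u + rng.normal(size=n)
y = b_xy * x + u + rng.normal(size=n)

# OLS slope of Y on X: biased, because X is correlated with the error term of Y.
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Instrumental-variable estimate (equivalent to 2SLS with a single instrument).
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"assumed effect {b_xy:.2f}, OLS {ols:.3f}, IV {iv:.3f}")
```

In this setup the OLS slope overstates the effect of X on Y, while the instrument-based ratio stays close to the assumed value.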

In this paper, we describe basic graphical tools that extend the identification rules of Rigdon (1995). Like Rigdon’s identification rules, these methods allow the user to address identification both directly and proactively, that is, prior to collecting data. They can also be applied to any recursive or partially recursive model. Finally, these graphical conditions characterize when and how a given path coefficient can be estimated using simple ordinary least squares (OLS) regression, where covariate adjustment identifies causal effects, or can be estimated using a method such as 2SLS, where instruments identify causal effects. As a result, individual coefficients can be estimated directly, allowing causal effects to be computed even when the model is not identified as a whole. Multiple estimators for overidentified effects can also be obtained.

We also describe graphical tools that derive testable implications of the model, which can be used to evaluate the model at the local level of fit and, unlike global tests, are applicable even when the model is not identified. When the model is identified, these tests should be applied in addition to the typical global tests (Vermeulen & Hartmann, 2015), which can mask poor model-data correspondence at the level of residuals (Tomarken & Waller, 2003).

These tools underlie many of the recent breakthroughs in causal analysis using nonparametric models (i.e., models which make no commitment to the functional form of the equations and the error distributions) including methods for the identification of causal effects and counterfactuals, deriving testable implications, analyzing missing data, and assessing external validity. The purpose of this paper is to describe these graphical fundamentals in the hope that they become part of the standard SEM curriculum, enabling researchers to better deal with identification and to better understand and test the full range of hypotheses represented in their path models.

Graph Methods and the Structural Causal Model

Grace et al. (2012) described what they referred to as third-generation SEM⁴ and what Pearl (2009) called the structural causal model (SCM), which is a collection of causal assumptions in equation form combined with graphical methods for deriving the implications of those assumptions. These graphical methods originated in the computer science literature with Pearl’s work on Bayesian networks and machine learning. The SCM has been elaborated by Pearl and others since the 1980s as a method for causal inference based on graphs that are nonparametric generalizations of Wright’s theory of path coefficients.

⁴ Sewall Wright’s pioneering work on path analysis and Karl G. Jöreskog’s synthesis of path analysis and factor analysis in the LISREL framework are, respectively, the first and second generations of SEM.

A directed acyclic graph (DAG) corresponds to either a recursive path model or a partially recursive model. Error covariances, if any, are represented by bidirectional paths rendered as dashed arcs. A directed cyclic graph (DCG) has a causal loop and thus corresponds to a nonrecursive path model. In this paper, we focus on graphical tools applicable to directed acyclic graphs, for which there are more freely-available computer tools. Some of these software packages permit the user to determine whether any given direct or total effect is identified by controlling for covariates that block biasing paths or through the location of instruments. This possibility also helps the researcher to know when multiple estimators of the same causal effect are available. In contrast, standard SEM software almost never signals the possibility for multiple estimators and, as mentioned, is generally unable to conclusively diagnose whether individual causal effects are identified.

The capabilities just described not only help the researcher deal with the problem of identification, they also support both estimation of individual model parameters and evaluation of model fit at the level of the residuals, or local fit. Most path models described in the literature are estimated using simultaneous methods, such as maximum likelihood (ML), that estimate all model parameters together using iterative algorithms. Under conditions that may not hold in many actual studies, simultaneous methods are generally more efficient than single-parameter methods, such as 2SLS. An efficient estimator has lower variation among estimates of the same parameter when analyzing a correctly specified model over random samples compared with a less efficient estimator. But this potential benefit is generally more theoretical than real. Researchers rarely analyze correctly specified models over random samples. Most samples in the SEM literature are ad hoc (convenience) samples, not probability samples where each member of the population has an equal probability of being selected, and estimates of error variation in ad hoc samples do not reflect purely sampling error.

If the model is misspecified, single-parameter methods may actually outperform simultaneous methods because they better isolate the effects of errors due to misspecified segments of the model. In contrast, simultaneous methods may allow errors to propagate and bias parameter estimates throughout the model. As a result, estimates from single-parameter methods may be less distorted by bias than estimates from simultaneous methods (Bollen, Kirby, Curran, Paxton, & Chen, 2007). Given doubt about model specification, single-parameter estimation is likely to be the better option. Another potential drawback of simultaneous estimation methods is that they require fully identified models. If some, but not all causal effects are identified, then the analysis may fail even though estimators of the identified effects are available using single-parameter methods.

Global fit testing is usually emphasized when simultaneous methods are used, but local fit testing can, and should, be conducted, too. Global fit statistics, such as the model chi-square, Bentler Comparative Fit Index (CFI), and Steiger-Lind root mean square error of approximation (RMSEA), among others, measure only average or overall fit of the model to the sample data matrix. Thus, it can and does happen that values of global fit statistics mask poor model-data correspondence at the level of the residuals (Tomarken & Waller, 2003). When this happens, the model should not be retained. A serious flaw in many, if not most, published SEM studies is the failure to report information about local fit (Vermeulen & Hartmann, 2015).

Nomenclature

Next we “translate” from terms and symbols in graph representations to counterparts more recognizable to researchers familiar with traditional (second-generation) SEM. As mentioned, we will focus on graphical tools that analyze directed acyclic graphs. These graphs will be used to represent linear path models, which are the most common type of path model analyzed in the behavioral sciences, but many of these tools were originally developed for nonparametric models that make no commitment to distributional assumptions for any individual variables. Direct effects in such models are likewise nonparametric and represent all forms of the functional relation between cause and effect, such as linear, quadratic, and all higher-order curvilinear trends for a pair of continuous variables. As a result, most of the ideas presented in this work are equally applicable to nonlinear models for noncontinuous outcomes (Pearl, 2009).

In directed acyclic graphs, variables are called nodes or vertices, and paths are called arcs, edges, or links, but here we use the more familiar terms variables and paths. There are two kinds of paths, directional and bidirectional, designated by the respective symbols listed next:

→   and   ↔ (drawn as a dashed arc)     (1)

The first symbol in Equation 1 represents a direct effect, just as in traditional path diagrams. The second symbol designates a noncausal association between a pair of measured variables due to a common unmeasured (and thus latent) cause, or confounder. Thus, a pair of endogenous variables in a DAG connected by this symbol represents the hypothesis of correlated errors, while the absence of such a symbol implies that the error terms are assumed to be independent and uncorrelated.

The analogous symbol for a bidirectional path in traditional path diagrams, or

↔ (drawn as a solid curved two-headed arrow)     (2)

has the same interpretation when it connects the error terms for a pair of endogenous variables as does the second symbol in Equation 1. But when the symbol in Equation 2 connects a pair of measured exogenous variables, it allows for an association due to any unmodeled relation, either noncausal (spurious) due to a confounder or causal, such as when one exogenous variable directly affects another. In graphs presented in this work, we use the symbol for a bidirectional path in Equation 1 to designate error covariances but use the corresponding symbol in Equation 2 to connect pairs of measured exogenous variables that are assumed to covary for any reason whatsoever.

Presented in Figure 2 are the basic kinds of paths that involve at least three variables: causal, confounding, and collider. Causal and confounding paths transmit statistical association between X and Y, but for different reasons: an indirect effect through the intervening variable W (causal; Figure 2(a)) versus a spurious association due to the common cause W (confounding; Figure 2(b)). Causal and confounding paths are described as open because they transmit association between the variables at each end of the path, but either can be closed or blocked by controlling for the intervening variable, W. For example, X and Y are correlated in both Figures 2(a) and 2(b) due to the open path between them, but Y and X are conditionally independent given W. As a result, regressing Y on both X and W (designated as Y on X, W) in Figure 2(a) and Figure 2(b) would result in the coefficient for X being zero. Likewise, the partial correlation between X and Y given W would vanish, or equal zero.

The capability of common causes and intervening variables to block correlation can be illustrated using a simple example. The fact that ice cream sales are correlated with drowning deaths is often used to demonstrate that correlation does not imply causation. When the temperature (T) rises people tend to both buy ice cream and play in the water, resulting in both increased ice cream sales (S) and drowning deaths (D). But if we only consider days with the same temperature or the same number of people engaging in water activities, then the correlation between ice cream sales and drowning will vanish. Similarly, if we control for water activity, then temperature and drownings will no longer be correlated. These results are confirmed by graphical analysis. The relation between ice cream sales and drownings can be modeled using a simple path diagram, or

S ← T → W → D

where W measures water activity. Here, we see that the open path from S to D is blocked by controlling for either T or W. Likewise, the open path from T to D is blocked by controlling for W.
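The blocking claims for this diagram can be checked numerically. The following is a minimal sketch, assuming a linear version of the model with arbitrary illustrative coefficients and standard normal errors (none of these values come from the article). The helper `partial_corr` is a hypothetical name introduced here; it correlates the residuals left after regressing both variables on the conditioning set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Linear version of S <- T -> W -> D with assumed coefficients.
t = rng.normal(size=n)                 # temperature
s = 0.7 * t + rng.normal(size=n)       # ice cream sales
w = 0.6 * t + rng.normal(size=n)       # water activity
d = 0.5 * w + rng.normal(size=n)       # drowning deaths

def partial_corr(a, b, given=()):
    """Correlation of a and b after regressing both on the variables in `given`."""
    design = np.column_stack([np.ones(len(a)), *given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

r_sd = partial_corr(s, d)          # open path: clearly nonzero
r_sd_t = partial_corr(s, d, [t])   # blocked by T: near zero
r_sd_w = partial_corr(s, d, [w])   # blocked by W: near zero
r_td_w = partial_corr(t, d, [w])   # blocked by W: near zero

print(f"r(S,D)={r_sd:.3f}  r(S,D|T)={r_sd_t:.3f}  "
      f"r(S,D|W)={r_sd_w:.3f}  r(T,D|W)={r_td_w:.3f}")
```

Only the unconditional correlation between sales and drownings is noticeably different from zero; each partial correlation differs from zero only by sampling error.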

Whenever two arrowheads collide at a variable along a path, we call that variable a collider. For example, depicted in Figure 2(c) is a collider path where X and Y are specified as independent causes of W, the collider. Similarly,

X ↔ W ↔ Y

is a collider path with W, the collider. Colliders have special significance in regression analysis. Any path with a collider is closed or blocked, which means no association is transmitted between the variables at either end of a collider path, such as X and Y in the figure. But controlling for a collider opens any path going through that collider. For example, X and Y are dependent given W in Figure 2(c), and the regression Y on X and W would yield a nonzero coefficient for X even though the bivariate association between X and Y is zero. Thus, (1) controlling for a common outcome of two unrelated causes induces a spurious association between them; and (2) even if two variables are causally related, controlling for their common outcome adds a spurious component to their observed association. It is also true that controlling for a causal outcome, or descendant, of a collider induces a spurious association between the causes of the collider. For example, if variable A were added to Figure 2(c) as a direct outcome of W (i.e., W → A), then the regression Y on X and A would also induce a spurious association between X and Y.

The effect of colliders on the correlation between variables may seem unintuitive at first, but it is easily understood using a real-world example. It is well known that additional years of education (E) often afford one a greater salary (S). Height (H) also has a positive impact on one’s salary. Assuming that there are no other determinants of salary, the path model consists of a single collider and its causes, or

E → S ← H

Now, we would not expect E and H to be correlated in general. But if we were to look only at people with high salaries, then we would expect E and H to be negatively correlated. If someone is short and has a high salary, then she is more likely to be well educated. Similarly, if we were to control for an effect of salary, say whether the individual drives a high-end car, then we would also expect E and H to be correlated. A short person driving a Ferrari is more likely to be well educated. Thus, we see that when controlling for S or an effect of S, variables E and H become negatively correlated, even though they were not correlated to begin with. Explication of the role of colliders in causal models is a fundamental insight of graph representations with many implications—see Morgan and Winship (2015) for examples.
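The collider behavior described above can be reproduced in a short simulation. This is a sketch under assumed values: the coefficients, the salary cutoff of 1.0, and the continuous "car" score `c` standing in for driving a high-end car are illustrative choices, not quantities from the article.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

e = rng.normal(size=n)                      # education
h = rng.normal(size=n)                      # height, generated independently of education
s = 0.6 * e + 0.4 * h + rng.normal(size=n)  # salary, the collider in E -> S <- H
c = 0.8 * s + rng.normal(size=n)            # car score, an effect of salary (S -> C)

def coef_of_first(y, *xs):
    """OLS coefficient of the first regressor in the regression of y on xs."""
    design = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

b_he = coef_of_first(h, e)         # H on E: about zero
b_he_s = coef_of_first(h, e, s)    # H on E and S: negative
b_he_c = coef_of_first(h, e, c)    # H on E and C: negative, but weaker

high = s > 1.0                     # look only at high earners
r_high = np.corrcoef(e[high], h[high])[0, 1]

print(f"H on E: {b_he:.3f}; given S: {b_he_s:.3f}; given C: {b_he_c:.3f}; "
      f"corr among high earners: {r_high:.3f}")
```

Education and height are unrelated in the full sample, yet a negative association appears as soon as salary, or an effect of salary, enters the regression or defines the subsample.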

The concept of d-separation formalizes and generalizes the above concepts for arbitrary sets of variables, X, Y, and W, in a path model where there could be multiple paths between them. If all paths between X and Y are blocked by W, then we say that X and Y are d-separated given W. In this case, the model predicts that X and Y are conditionally independent given W. This is true for both recursive and nonrecursive linear models (Spirtes, 1995). Formally, a collider-free path between X and Y is blocked by W if the path traverses a member of W; otherwise it is open. A path with a collider is blocked by W if no collider on the path is a member of W or is a cause of a variable in W. Additionally, a path with a collider is blocked by W if the path traverses a member of W that is not a collider on the path. This path is blocked even if W contains a collider on that path or an effect of a collider on that path. When checking whether variables are d-separated, remember that the bidirectional symbol in Equation 2, as used in a traditional path diagram, represents any type of association, causal or noncausal, between a pair of exogenous variables.
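A d-separation check is mechanical enough to sketch in a few lines of Python. The function below does not enumerate paths; it uses the equivalent ancestral moral graph test, which is a substitution made here for brevity. It assumes a DAG given as a list of (cause, effect) pairs; a bidirectional arc would have to be encoded by adding an explicit latent parent of both variables. The function names are hypothetical.

```python
from collections import deque

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG defined by `edges`."""
    given = set(given)
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)

    # 1. Keep only x, y, the conditioning set, and their ancestors.
    keep = ancestors(parents, {x, y} | given)

    # 2. Moralize: link every child to its parents and link co-parents to each other.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            nbrs[child].add(p)
            nbrs[p].add(child)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # 3. Remove the conditioning set and test whether x can still reach y.
    seen, queue = {x}, deque([x])
    while queue:
        for v in nbrs[queue.popleft()] - given:
            if v == y:
                return False
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return True

ice_cream = [("T", "S"), ("T", "W"), ("W", "D")]   # S <- T -> W -> D
salary = [("E", "S"), ("H", "S"), ("S", "C")]      # E -> S <- H, plus S -> C

print(d_separated(ice_cream, "S", "D", {"T"}))     # blocked by the common cause
print(d_separated(salary, "E", "H", {"C"}))        # opened by an effect of the collider
```

Applied to the two examples above, the check agrees with the text: sales and drownings are separated by temperature or by water activity, while education and height are separated only when neither salary nor its effects are controlled.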

Graphical Identification Criteria

In this section, we present two basic graphical methods for identifying a causal effect of X on Y: through instrumental variables and through the selection of covariates that remove noncausal aspects of the association between X and Y.

Covariate Selection

Described next are graphical criteria for determining whether a causal effect from X to Y is identified by controlling for covariates. The back-door criterion identifies the total effect of X on Y by blocking all back-door or confounding paths without generating any new ones:

Back-door criterion. An adjustment set of covariates Z satisfies the back-door criterion relative to the total effect of X on Y if

1. no variable in Z is a direct or indirect outcome of X; and
2. Z d-separates X and Y in the modified path diagram where all paths emanating from X are deleted.

If Z meets the back-door criterion, then the total effect of X on Y is identified and (in linear models) equal to the regression coefficient of X in the regression of Y on X and Z. As a result, the total effect of X on Y can be estimated using OLS.

Three comments are warranted. First, the back-door criterion applies to both linear and nonlinear causal models, but here we consider only linear models with continuous outcomes. Second, the back-door criterion implicitly forbids controlling for a collider (or one of its effects) along a path between X and Y unless the adjustment set also includes another variable that is not a collider along the same path; otherwise, a spurious association between X and Y would be induced. Third, the reason for forbidding effects of X is as follows. In some cases, controlling for effects of X may generate spurious correlation through “virtual colliders” (Pearl, 2009, p. 339). For example, consider Figure 3(a), where the error term of C, Uc, has been explicitly included in the path diagram. Now, we can see that C is in fact a collider and controlling for its effect, D, induces a correlation between X and Uc, which in turn generates a back-door path from X to Y, as shown in Figure 3(b). It is for this reason that the back-door criterion forbids including effects of X in the adjustment set.

Consider Figure 3(c) where the total effect of X on Y is equivalent to the direct effect of X on Y. In addition to the causal path X → Y, there is also a back-door path

X ← S ↔ O → Y     (3)

The regression Y on X and O, where the adjustment set is {O}, would close the back-door path in Equation 3 leaving just the causal path open, which satisfies the back-door criterion. Doing so corresponds to the conventional logic that the causal effect of one variable on another can be estimated by controlling for other direct causes of the same outcome. This is how virtually all standard SEM computer tools work when calculating values of path coefficients. But the regression Y on X and S also closes the same back-door path, so the adjustment set {S} also meets the back-door criterion. The logic of this second regression analysis corresponds to that of controlling for selection factors of the causal variable, in this case S (Morgan & Winship, 2015). This method is just as valid as controlling for other direct causes of the same outcome and is necessary when certain direct causes of the outcome are latent or unobservable variables.

In Figure 3(c), the regression Y on X, O, and S would also close the back-door path in Equation 3, but the adjustment set {O, S} is not minimally sufficient, meaning that there is a proper subset that satisfies the back-door criterion. (A proper subset is a subset that is not equal to the original set.) For simplicity, we consider only minimally sufficient adjustment sets moving forward, but we can see for this example that there are two estimators for the total effect of X on Y in Figure 3(c).
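The two estimators can be compared in a simulated version of this graph. The sketch below assumes the structure described in the text (the causal path X → Y and the back-door path through S and O), generates the S and O association from an unmeasured common cause, and uses arbitrary illustrative coefficients; `coef_x` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
true_effect = 0.4                    # assumed coefficient for X -> Y

latent = rng.normal(size=n)          # unmeasured source of the association between S and O
s = latent + rng.normal(size=n)
o = latent + rng.normal(size=n)
x = 0.7 * s + rng.normal(size=n)
y = true_effect * x + 0.6 * o + rng.normal(size=n)

def coef_x(y, x, *covariates):
    """OLS coefficient of x in the regression of y on x and the covariates."""
    design = np.column_stack([np.ones(len(y)), x, *covariates])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

b_none = coef_x(y, x)       # back-door path left open: biased
b_o = coef_x(y, x, o)       # adjustment set {O}
b_s = coef_x(y, x, s)       # adjustment set {S}

print(f"no adjustment {b_none:.3f}, given O {b_o:.3f}, given S {b_s:.3f}")
```

Both adjusted regressions land near the assumed effect, although the two estimates generally differ in precision.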

Now consider Figure 3(d). In this graph we assume that A and B are independent exogenous variables and the total effect of X on Y is equal to the direct effect. But there are now two noncausal paths between X and Y, one of which includes a collider C:

X ← B → C → Y     (4)
X ← B → C ← A → Y

The first path in Equation 4 is a back-door path and thus is open, but the second path is blocked by the collider C. The regression Y on X and C closes the first path in Equation 4, but opens the second path because C is a collider along the second path. Thus, the adjustment set {C} does not meet the back-door criterion. In contrast, the regression of Y on X and B closes the first path in Equation 4 and leaves the second path closed. As a result, the adjustment set {B} satisfies the back-door criterion.

The adjustment set {A, C} also identifies the total effect of X on Y in Figure 3(d). Controlling for C closes the first path in Equation 4 while opening the second path, but controlling for A closes the second path again. Thus, the regression Y on X, A, and C provides a second estimator for the total effect of X on Y. The adjustment set {A, B} also meets the back-door criterion for the same total effect, but it is not a minimally sufficient set because the proper subset {B} is itself a sufficient set.
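A simulation makes the contrast between these adjustment sets visible. The sketch below assumes that the graph contains exactly the edges implied by the causal path X → Y and the two paths in Equation 4; the coefficients are arbitrary illustrative values and `coef_x` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
true_effect = 0.5                    # assumed coefficient for X -> Y

a = rng.normal(size=n)               # A and B are independent exogenous variables
b = rng.normal(size=n)
x = 0.9 * b + rng.normal(size=n)                     # B -> X
c = 0.9 * b + 0.9 * a + rng.normal(size=n)           # B -> C <- A
y = true_effect * x + 0.4 * c + 0.9 * a + rng.normal(size=n)

def coef_x(y, x, *covariates):
    """OLS coefficient of x in the regression of y on x and the covariates."""
    design = np.column_stack([np.ones(len(y)), x, *covariates])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

b_none = coef_x(y, x)           # first path open: biased upward here
b_c = coef_x(y, x, c)           # {C}: closes the first path but opens the second
b_b = coef_x(y, x, b)           # {B}: satisfies the back-door criterion
b_ac = coef_x(y, x, a, c)       # {A, C}: also satisfies it

print(f"none {b_none:.3f}, C {b_c:.3f}, B {b_b:.3f}, A and C {b_ac:.3f}")
```

With these values, adjusting for C alone replaces one bias with another, whereas {B} and {A, C} both recover the assumed effect.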

The total effect of X on Y in Figure 3(e) consists of a direct effect and an indirect effect through C. The latter variable is not a collider along any path between X and Y, but it is the intervening variable in the indirect effect of X on Y. As a result, controlling for C blocks part of the causal effect of X on Y and should not be included in the adjustment set when estimating the total effect. Note that C is correctly prohibited from being in the adjustment set by condition 2 of the back-door criterion. There are also two back-door paths between X and Y:

X ← A → C → Y (5)
X ← A ↔ B → Y

The only minimally sufficient adjustment set that identifies the total effect of X on Y is {A}; that is, the coefficient for X in the regression of Y on X and A estimates the total effect of X on Y.

The single-door criterion determines whether direct effects can be identified using covariate adjustment:

Single-door criterion. In linear models with continuous outcomes, an adjustment set of covariates Z satisfies the single-door criterion relative to the direct effect of X on Y if

1. no variable in Z is a direct or indirect outcome of Y; and

2. Z d-separates X and Y in the modified graph formed by deleting the path X → Y from the original graph.

If Z meets the single-door criterion, the direct effect of X on Y is identified and equal to the regression coefficient of X in the regression of Y on X and Z. As a result, the direct effect can be estimated using OLS.

In Figure 3(f), the path between X and Y has been removed compared with the original graph in Figure 3(e). In the modified graph there are three paths between X and Y:

X → C → Y (6)
X ← A → C → Y
X ← A ↔ B → Y

The first path in Equation 6 is causal but the rest are back-door paths. An adjustment set that identifies the direct effect of X on Y must block all three paths. There are two minimally sufficient sets that meet the single-door criterion, {A, C} and {B, C}. Each covariate set deactivates all three paths in Equation 6. Thus, the coefficients for X in the regressions of Y on X, A, and C and Y on X, B, and C are both equal to the direct effect. In other words, the direct effect of X on Y is estimated by the coefficients βYX·AC and βYX·BC in the two separate unstandardized population regression equations listed next:

Y = β01 + βYX·AC X + βYA·XC A + βYC·AX C
Y = β02 + βYX·BC X + βYB·XC B + βYC·BX C

Locating Instruments

Next, we state a graphical definition of an instrument:

Instrumental variable. In linear models with continuous outcomes, a variable Z is an instrument for the path X → Y if

1. Z is d-separated from Y in the modified graph formed by deleting the path X → Y from the original model; and

2. Z is not d-separated from X in the modified graph.

If an instrument Z exists for the path X → Y, then the coefficient for that path is identified and can be estimated using 2SLS regression.

Consider the graph in Figure 4(a). The relation between X and Y forms what Brito and Pearl (2003) call a bow-arc, a pair of variables in which one variable is a direct cause of the other and whose error terms are correlated (see also Figure 1(a)).⁵ An unmeasured common cause of X and Y implied by the bidirectional path in Figure 4(a) is explicitly represented in Figure 4(b) as the latent variable U with direct effects on both X and Y. Now, there are two back-door paths between X and Y:

X ← A → B → Y (7)
X ← U → Y

Since U is latent, it cannot appear in an adjustment set, and the second path in Equation 7 cannot be closed through covariate adjustment. Moreover, A is not an instrument for X because in the modified graph without the path X → Y, shown in Figure 4(c), variable A is d-connected to Y by the path

A → B → Y (8)

With no instrument, it might seem that the coefficient for the path X → Y in Figure 4 is not identified unless more measured variables were added, such as a proxy measure of the hypothetical construct represented by U in the diagram.

But we can actually identify the direct effect of X on Y in Figure 4 by creating a conditional instrument (Brito & Pearl, 2002). Here, variable A is rendered an instrument for X after controlling for B, which is designated here as A | B. Controlling for B closes the open path between A and Y in Equation 8, and thus d-separates A and Y in Figure 4(c). At the same time, the variable A | B is not d-separated from X because controlling for B does not sever the direct path from A to X in Figure 4(c). Thus, A | B is a proper instrument for causal variable X. In general, a variable Z is a conditional instrument for the path X → Y when controlling for W if conditions 1 and 2 above are satisfied when controlling for W. Chalak and White (2011) described extended methods for making instruments out of variables that may not be proper instruments in their original state, but can be transformed in order to help identify and thus estimate substantive causal effects.
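The two instrument conditions, with and without a conditioning set, can also be verified by a short program. The sketch below is a minimal illustration, not the algorithm used by any particular tool. It assumes that Figure 4(b) contains the edges implied by Equations 7 and 8 plus X → Y, and it treats the latent U as an ordinary node that is simply never placed in the conditioning set. The helper names `d_separated` and `is_instrument` are hypothetical.

```python
from itertools import combinations

def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated by `given` (ancestral moral graph)."""
    given = set(given)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    # Keep x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    # Moralize: connect each child to its parents and co-parents to each other.
    nbrs = {v: set() for v in keep}
    for c in keep:
        ps = parents.get(c, set())
        for p in ps:
            nbrs[p].add(c)
            nbrs[c].add(p)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Search for a connection that avoids the conditioning set.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v not in seen and v not in given:
            seen.add(v)
            stack.extend(nbrs[v])
    return True

def is_instrument(edges, z, x, y, given=()):
    """Check conditions 1 and 2 for z as an instrument for x -> y,
    optionally after controlling for the variables in `given`."""
    modified = [e for e in edges if e != (x, y)]  # delete the path x -> y
    return (d_separated(modified, z, y, given)
            and not d_separated(modified, z, x, given))

# Figure 4(b), as read from Equations 7 and 8 (an assumption of this sketch).
FIG_4B = [("A", "X"), ("A", "B"), ("B", "Y"),
          ("X", "Y"), ("U", "X"), ("U", "Y")]

print(is_instrument(FIG_4B, "A", "X", "Y"))         # A alone
print(is_instrument(FIG_4B, "A", "X", "Y", {"B"}))  # A | B
```

With these edges, A alone fails condition 1 because of the open path A → B → Y, whereas A | B satisfies both conditions.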

The back-door criterion, single-door criterion, and instrumental variables represent the most basic of the graphical methods for identifying causal effects in linear path models. More powerful graphical methods for identifying path models include generalized instrumental sets (Brito & Pearl, 2002), the half-trek criterion⁶ (Chen, Tian, & Pearl, 2014; Foygel, Draisma, & Drton, 2012), and auxiliary instrumental sets (Chen, Pearl, & Bareinboim, 2016). These methods are not covered here, but more information can be found in the works just cited.

Testable Implications

The concept of d-separation formalizes the intuition that open paths carry associational information between variables and that this flow of information can be blocked by covariate selection. This fundamental concept underlies all the graphical identification criteria considered to this point. Once the data are collected, the same idea can also guide the evaluation of the correspondence between model and data at the level of local fit since d-separation implies certain patterns of conditional independences, or pairs of variables that should be rendered independent after controlling for certain other variables.

⁵ Brito and Pearl (2003) showed that any partially recursive model without bow-arcs is identified.

⁶ The half-trek criterion is also applicable to directed cyclic graphs with causal loops, and a computer tool that implements this identification method is described by Foygel et al. (2012).

For recursive path models, the number of implied conditional independences exactly equals the model degrees of freedom, or dfM. For example, if dfM = 5 for a recursive model, then there are a total of five partial correlations that should vanish (equal zero), given the configuration of paths specified in the model. These five vanishing partial correlations represent all the constraints that the model specification imposes on the covariance matrix. As a result, they represent all the testable implications of the model, assuming that the variables are normally distributed.⁷ The model chi-square test where dfM = 5 for the same graph can be seen as an overall significance test of whether all five partial correlations vanish, but inspecting the values of the sample partial correlations provides all the details behind the omnibus chi-square test.
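One way to see why the two counts agree is to note that, in a recursive model in which every variable is observed and the only covariances are among exogenous variables, each pair of variables that is not directly connected contributes one degree of freedom. The sketch below counts such pairs. It is offered as an illustration under that assumption, and the edge list for Figure 3(d) is our reading of the paths described in the text; `recursive_model_df` is a hypothetical helper name.

```python
from itertools import combinations

def recursive_model_df(variables, edges):
    """Count pairs of variables with no direct connection.

    Assumes a recursive path model with all variables observed, so that
    dfM = (number of variable pairs) - (number of paths and covariances).
    """
    adjacent = {frozenset(e) for e in edges}
    missing = [pair for pair in combinations(variables, 2)
               if frozenset(pair) not in adjacent]
    return len(missing), missing

# Figure 3(d): five variables and six direct effects.
FIG_3D = [("B", "X"), ("B", "C"), ("A", "C"),
          ("A", "Y"), ("C", "Y"), ("X", "Y")]

df, pairs = recursive_model_df("ABCXY", FIG_3D)
print(df, pairs)
```

For these edges the function returns 4 and the pairs (A, B), (A, X), (B, Y), and (C, X). Each nonadjacent pair corresponds to one vanishing partial correlation once a suitable conditioning set is found by d-separation.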

Consider Figure 3(c), where variables S and Y are not d-separated by {O, X}; similarly, variables X and O are not d-separated by {S, Y}. (Recall that the path is equivalent to →, ←, and when it connects two exogenous variables.) Consequently, the graph has no degrees of freedom, and there are no testable implications. This conclusion matches the intuition that models where dfM = 0 test no specific hypothesis. For Figure 3(d), dfM = 4, so there are a total of four vanishing partial correlations, which are listed next:

$\hat{\rho}_{AB} = \hat{\rho}_{AX} = \hat{\rho}_{CX \cdot B} = \hat{\rho}_{BY \cdot ACX} = 0$ (9)

Exogenous variables A and B in Figure 3(d) are specified as independent, so no covariate is needed to render them unrelated. Likewise, there are neither causal paths nor confounding paths in the graph that connect A and X, so they are also predicted to be orthogonal. A confounding path connects variables C and X, but controlling for their common cause, or B, renders this pair independent. Variables B and Y in Figure 3(d) are connected by the three paths listed next:

B → C → Y (10)
B → X → Y
B → C ← A → Y

Controlling for both C and X closes the two causal paths in Equation 10. The last path is blocked by the collider C until C is controlled, which opens it; controlling for variable A closes it again. Thus, the set {A, C, X} d-separates variables B and Y in Figure 3(d) and $\rho_{BY \cdot ACX} = 0$. We leave to the reader as an exercise to prove that Figure 3(e), where dfM = 3, implies the three vanishing partial correlations listed next:

$\hat{\rho}_{BC \cdot A} = \hat{\rho}_{BX \cdot A} = \hat{\rho}_{AY \cdot BCX} = 0$ (11)

The reader can use one of the freely-available computer tools for analyzing directed acyclic graphs described later in this work.

There are ways to derive minimal sets of implied vanishing partial correlations, called basis sets, that consist of the smallest number of conditional independences that imply all others for a particular graph. For the same graph, there may be multiple basis sets, each with the same overall number of conditional independences, that can be derived using different but logically-consistent methods to generate a basis set. We do not cover basis sets here, but see Kang and Tian (2009), Pearl (2009, pp. 142–145), or Shipley (2000, pp. 61–63) for more information.

⁷ Recursive path models with non-normally distributed exogenous variables may impose constraints on third order moments (Shimizu & Kano, 2008).

Hypothesis testing of vanishing partial correlations can be done using a Fisher Z transformation, which is fairly straightforward to implement in standard statistical tools. Additionally, Shipley (2000) proposed a method for simultaneously testing all of the implied vanishing partial correlations. In general, failure of such tests should be taken seriously, and, in the event of failure, the model should be respecified in a theoretically meaningful way. If no such respecification exists, then no model should be retained.
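As an illustration of how little is needed to carry out the Fisher Z test, the sketch below implements the usual large-sample version: the transformed partial correlation, atanh(r), is multiplied by the square root of n − k − 3, where k is the number of variables controlled, and the result is referred to the standard normal distribution. The function name `fisher_z_test` is hypothetical, and the input values (r = −.117 with one covariate and n = 373) are borrowed from the example problem later in this article purely for demonstration.

```python
import math

def fisher_z_test(r, n, k):
    """Two-sided test of H0: partial correlation = 0.

    r: sample partial correlation
    n: sample size
    k: number of variables controlled for (0 for a bivariate correlation)
    Returns the z statistic and its two-sided normal p value.
    """
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

z, p = fisher_z_test(r=-0.117, n=373, k=1)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The normal approximation is a simplifying assumption; statistical packages may report a slightly different p value if they use a t distribution instead.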

In some cases, appreciable departures from zero may fail to be statistically significant in small samples. As a result, we recommend inspecting the actual values of the partial correlations being tested. (Calculation of partial correlations from sample data can be performed using basically any computer tool for general statistical analyses, such as SPSS or SAS/STAT.) The rule of thumb that absolute discrepancies between predicted and sample correlations > .10 may signal appreciable discrepancy between model and data seems reasonable (e.g., Pett et al., 2003; Tabachnick & Fidell, 2013). If the absolute partial correlation is above .10 but the significance test passes, then more data should be collected to ensure that the model is properly evaluated.

It is also possible that trivial departures from zero could be statistically significant in large samples. While failure of the test in such cases is indicative of a non-zero partial correlation, the partial correlation may be so small that it is not necessary to reject the model. As a result, it may be prudent to instead perform a hypothesis test that the partial correlation is below some threshold value, such as .10 in absolute value.
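The article does not prescribe a specific procedure for testing whether a partial correlation falls below a threshold. One possibility, sketched here as an assumption rather than as the authors' method, is a pair of one-sided tests on the Fisher Z scale (the "two one-sided tests" logic of equivalence testing): the partial correlation is declared negligible only if it is significantly above −.10 and significantly below +.10. The names `normal_cdf` and `negligible_correlation_test` are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def negligible_correlation_test(r, n, k, bound=0.10):
    """p value for H0: |rho| >= bound versus H1: |rho| < bound.

    r: sample partial correlation; n: sample size;
    k: number of variables controlled for.
    A small p value supports the claim that |rho| is below `bound`.
    """
    se = 1 / math.sqrt(n - k - 3)
    z_r = math.atanh(r)
    z_low = (z_r - math.atanh(-bound)) / se   # tests rho <= -bound
    z_high = (z_r - math.atanh(bound)) / se   # tests rho >= +bound
    p_low = 1 - normal_cdf(z_low)
    p_high = normal_cdf(z_high)
    return max(p_low, p_high)

# A trivial correlation in a very large (hypothetical) sample.
print(negligible_correlation_test(r=0.02, n=10000, k=1))
# A larger correlation in a moderate sample.
print(negligible_correlation_test(r=-0.117, n=373, k=1))
```

In the first call the p value is essentially zero, so the correlation would be treated as negligible even though a conventional test of ρ = 0 would reject in a sample that large. In the second call the p value is large, so negligibility cannot be claimed.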

Obtaining multiple estimators using the above graphical identification criteria also provides opportunities to test the model. In some cases, these overidentifying restrictions may be equivalent to a conditional independence constraint. In nonrecursive models, however, they may represent new testable implications. Testing whether these overidentifying restrictions hold can be done using the Durbin-Wu-Hausman test, also called the Hausman specification test (Hausman, 1978). This test is commonly implemented in standard statistical tools. Like other statistical significance tests, common sense should be applied when interpreting the results, particularly in very small or very large samples. This caution is also consistent with recent concerns about widespread misinterpretation of p values in significance testing expressed by the American Statistical Association (2016).

Equivalent Models

Because conditional independences represent all of the constraints a recursive model imposes on the covariance matrix, two recursive models that share the same implied conditional independences cannot be distinguished using the sample covariance matrix.⁸ In such cases, we say that the models are covariance equivalent. In nonrecursive models, there may be additional constraints that are not conditional independences. As a result, sharing the same conditional independences is a necessary but insufficient condition for covariance equivalence in nonrecursive models. Most authors of SEM studies fail to acknowledge the existence of equivalent models, which is a serious form of confirmation bias (Hoyle & Isherwood, 2013), but the criteria provided in this paper make it easy for authors of SEM studies to directly address this issue.

⁸ Again, if the variables are normally distributed, then covariance equivalence implies general equivalence, and two such models cannot be distinguished using data at all.

One way to check whether two models share the same conditional independences is to apply the idea of d-separation. But in complex models with many variables, applying the d-separation criterion by hand can be tedious, if not infeasible. Instead, if the models are recursive, we can simply check that the two models have the same skeleton and v-structures (Pearl, 2009, pp. 145–149). The skeleton is obtained by replacing all of the paths in the model’s graph with undirected paths, for which the symbol is ─. A v-structure has a collider whose parents are not connected by a path. For example, in Figure 3(e), the chain

X → Y ← B

is a v-structure, but the chain

A → C ← X

is not a v-structure because A and X are connected by the path A → X. The skeleton and v-structures of a graph can be displayed together by replacing every path in the model's graph with an undirected path unless that path is part of a v-structure.

For example, Figures 5(a)–5(c) are covariance equivalent models. This claim can be verified by checking that these graphs share the same skeleton and v-structures, which are depicted in Figure 5(d). Reversing the direction of the path between the pair X and Z or between the pair Y and Z neither introduces nor destroys a v-structure, but we cannot reverse the path from W to V in any of the equivalent models. Doing so generates a new v-structure, X → W ← V.

If the models are nonrecursive, then we need to make a slight modification before checking that the skeleton and v-structures are the same. For every two variables, X and Y, joined by a bidirected dashed path, we replace the path with a new latent common cause, e.g. X ← L → Y. For example, in Figure 6(a) the bidirected path between A and Y would be replaced by a latent common cause, as shown in Figure 6(c). Once all bidirected paths in the graphs have been replaced in this manner, then we can check whether they share the same skeleton and v-structures. If they do, then the models imply the same set of conditional independences and vanishing partial correlations. Again, this does not imply that the models are covariance equivalent, since nonrecursive models may impose additional constraints on the covariance matrix.

In the traditional SEM literature, the Lee-Hershberger replacing rules (Lee & Hershberger, 1990) are the most widely known method for generating equivalent path models, which in theory should all fit the same data equally well but feature different patterns of paths among the same observed variables. Applying the replacing rules involves reversing certain direct effects (e.g., replacing the path X → Y with the path Y → X) or substituting one kind of path for another (e.g., swapping a direct effect for an error covariance or vice versa). But applying the Lee-Hershberger replacing rules can result in a conditional independence being created or destroyed. If so, then the two models, original and respecified, are not d-separation equivalent, and thus have different testable implications. Such models are not truly equivalent.

The problem just described seems most likely to happen when application of the replacing rules changes the status of a variable as a collider, which has implications for d-separation. Consider the graph in Figure 6(a), which features the three paths between X and Y listed next:

X → A ← U → Y (12)
X → B → A ← U → Y
X → B → Y

where U represents a latent cause of A and Y and thus replaces their error covariance. The first two paths in Equation 12 are blocked by the collider A, but the third path can be closed by controlling for B; thus, Figure 6(a) implies $\rho_{XY \cdot B} = 0$ in a linear model with continuous outcomes.

In Figure 6(b), the direct effect between variables A and B has been reversed relative to the original graph in Figure 6(a). Although this reversal is permitted under the replacing rules because A and B have a common cause, X, doing so destroys the sole conditional independence implied by the original model. To see this, consider the paths between X and Y in Figure 6(b):

X → A ← U → Y (13)
X → B ← A ← U → Y
X → B → Y

Now variable B is a collider in the second path of Equation 13. Controlling for B will close the third path, but doing so opens the second path, and thus, induces a spurious association between X and Y. Consequently, Figure 6(b) does not imply $\rho_{XY \cdot B} = 0$, and the two graphs in Figure 6 are not equivalent. This fact is also apparent from inspecting the v-structures; specifically, the v-structure in Figure 6(c),

B → A ← L

is destroyed when the path between A and B is reversed in Figure 6(d). As a result, the two graphs are not equivalent. Thus, in some cases, the Lee-Hershberger replacing rules may generate or destroy conditional independences. In contrast, covariance equivalent models can easily be generated for a recursive model using the skeleton and v-structures. For nonrecursive models, models generated from the skeleton and v-structures may not be fully covariance equivalent, but will at least share the same implied conditional independences. See Pearl (2009, pp. 146–148) for additional examples.
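The comparison of skeletons and v-structures is simple enough to automate. The following sketch is illustrative only. It assumes that Figures 6(c) and 6(d) contain the edges implied by Equations 12 and 13, with the latent common cause labeled L as in the text, and the helper names `skeleton` and `v_structures` are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    """Set of undirected adjacencies in a directed graph."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Set of (parent pair, collider) triples whose parents are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    return {(frozenset((p, q)), c)
            for c, ps in parents.items()
            for p, q in combinations(sorted(ps), 2)
            if frozenset((p, q)) not in skel}

def same_independences(edges_1, edges_2):
    """True if two DAGs share the same skeleton and v-structures."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))

# Figure 6(c): B -> A; Figure 6(d): A -> B (our reading of Equations 12-13).
FIG_6C = [("X", "A"), ("X", "B"), ("B", "A"),
          ("B", "Y"), ("L", "A"), ("L", "Y")]
FIG_6D = [("X", "A"), ("X", "B"), ("A", "B"),
          ("B", "Y"), ("L", "A"), ("L", "Y")]

print(same_independences(FIG_6C, FIG_6D))
print(v_structures(FIG_6C) - v_structures(FIG_6D))
```

Under these assumptions the two graphs have the same skeleton, but the comparison fails because the v-structure B → A ← L appears only in the first graph.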

Computer Tools

There are freely-available computer tools for analyzing directed acyclic graphs that automatically apply the graphical methods just described. Some of these tools are stand-alone applications for installation on personal computers, but others are online applications that analyze graphs drawn by the user within an Internet browser. All such tools support the evaluation of a path diagram in the planning stages for a study. Some of these computer tools are described next. This list is not comprehensive, but all these tools help researchers to reap the potential benefits of analyzing their graphs and thus testing their ideas before collecting the data:

1. The DAGitty program is a web-based tool for analyzing causal graphs that can also be used offline (Textor, Hardt, & Knüppel, 2011).⁹ It evaluates whether direct or total effects are identified either through covariate selection or through the analysis of instruments, including conditional instruments. It also lists conditional independences implied by the graph.

2. The Belief and Decision Network Tool (Porter et al., 1999–2009) is a Java applet for learning about the concept of d-separation.¹⁰ For example, after drawing a graph onscreen, this program can then be optionally run in “ask the applet” mode, where the user clicks on two focal variables and a set of covariates, and the program automatically indicates whether the focal variables are independent, given those covariates.

3. The dagR package for R (Breitling, 2010) provides a set of functions for drawing, manipulating, and analyzing directed acyclic graphs and also simulating data consistent with the corresponding diagram.¹¹ It can evaluate effects of analyzing different subsets of covariates when estimating causal effects of exposure variables on outcome variables, among other capabilities. Graphs are specified in syntax, but the corresponding graph of the model can be manipulated in the R environment.

Example Problem

Presented in Figure 7(a) is a variation on a path model analyzed by Roth, Wiebe, Fillingim, and Shay (1989). We assume a linear model with continuous variables. The hypotheses are that

1. exercise (E) and mental hardiness (H) covary;

2. variables E and H each indirectly affect health problems (P) through, respectively, physical fitness (F) and stress (S); and

3. the errors of F and P covary; that is, they share an unmeasured common cause.

The relation between F and P forms a bow-pattern error covariance. As a result, there is no adjustment set that d-separates F from P. Additionally, there is no proper instrument for explanatory variable F. For example, variable E is not a proper instrument because in the modified graph without F → P, variables E and P remain d-connected by the back-door path

E ↔ H → S → P

It would seem that the coefficient for F → P is not identified.

In fact, each and every causal effect in Figure 7 is identified if we analyze conditional instruments in the 2SLS method. If we also apply the single-door criterion and the back-door criterion, we can generate additional OLS estimators using adjustment sets. The graph also implies certain vanishing partial correlations, which can be located by the d-separation criterion. The empirical results just mentioned can be generated using standard statistical software, which for this example is SPSS.

⁹ http://www.dagitty.net/

¹⁰ http://aispace.org/bayes/

¹¹ http://cran.r-project.org/web/packages/dagR/

Readers can verify analysis options for this example using DAGitty. Error covariances are represented in DAGitty by specifying latent (unmeasured) variables as common causes, such as U1 and U2 in Figure 7(b). See the appendix for code that automatically generates the graph for this example in DAGitty. The total number of available estimators reported by DAGitty is indicated in parentheses for each causal effect listed next:

1. F → P (3). The DAGitty tool indicates that the coefficient for this direct effect is identified through the conditional instruments E | S and H | S. Controlling for S in each conditional instrument just listed d-separates variables E and H from the outcome P in the modified graph without F → P. It is also true that the same direct effect is identified by the conditional instrument E | H.

2. S → P (3). Two different adjustment sets, E and H, each identify the direct effect: Controlling for either E or H blocks the back-door path S ← H ↔ E → F → P in the modified graph without S → P. The conditional instrument H | (F, E) also identifies the same direct effect.

3. E → F, H → S (3 each). There are no back-door paths between either pair of explanatory and outcome variables; thus, the coefficient for each direct effect can be estimated in bivariate regression (i.e., the adjustment set is ∅, the empty set). Two different instruments are also available for each direct effect, H and S for E → F and E and F for H → S.

4. E → F → P (12). This indirect effect is also a total effect. It is identified in the OLS method by two different adjustment sets, H and S, each of which blocks the back-door path E ↔ H → S → P. The conditional instrument H | S also identifies the total effect because controlling for S d-separates H and P in the modified graph without E → F → P. Because there are three different estimators for each constituent direct effect, E → F and F → P (points 1 and 3), there are also a total of nine product estimators for this indirect (total) effect.

5. H → S → P (10). The adjustment set E identifies the total effect of H on P because controlling for E blocks the back-door path H ↔ E → F → P. Because there are three different estimators for each constituent direct effect, H → S and S → P (points 2 and 3), there are also a total of nine product estimators for this indirect (total) effect.

6. Listed next are five vanishing partial correlations implied by Figure 7:

$\hat{\rho}_{ES \cdot H} = \hat{\rho}_{HF \cdot E} = \hat{\rho}_{FS \cdot H} = \hat{\rho}_{FS \cdot E} = \hat{\rho}_{HP \cdot ES} = 0$

For example, the pair E and S are d-separated in the graph by H, and the pair H and P are d-separated by the pair E and S.
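Readers without access to DAGitty can confirm the five conditional independences in point 6 with a few lines of code. The sketch below is a minimal illustration, not a substitute for the tools described earlier. The edge list follows the DAGitty model code in the appendix, in which the latent variables U1 and U2 stand for the covariance between E and H and the error covariance between F and P; the function name `d_separated` is a hypothetical helper.

```python
from itertools import combinations

def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated by `given` (ancestral moral graph)."""
    given = set(given)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    # Keep x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    # Moralize: connect each child to its parents and co-parents to each other.
    nbrs = {v: set() for v in keep}
    for c in keep:
        ps = parents.get(c, set())
        for p in ps:
            nbrs[p].add(c)
            nbrs[c].add(p)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Search for a connection that avoids the conditioning set.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v not in seen and v not in given:
            seen.add(v)
            stack.extend(nbrs[v])
    return True

# Figure 7(b), following the appendix code (U1 and U2 are latent).
FIG_7B = [("E", "F"), ("F", "P"), ("H", "S"), ("S", "P"),
          ("U1", "E"), ("U1", "H"), ("U2", "F"), ("U2", "P")]

# (variable 1, variable 2, covariates) for each implied independence.
IMPLIED = [("E", "S", {"H"}), ("H", "F", {"E"}), ("F", "S", {"H"}),
           ("F", "S", {"E"}), ("H", "P", {"E", "S"})]

for x, y, z in IMPLIED:
    print(x, y, sorted(z), d_separated(FIG_7B, x, y, z))

# The bow between F and P cannot be closed with measured covariates.
print(d_separated(FIG_7B, "F", "P", {"E", "H", "S"}))
```

Each of the five checks should come out as d-separated, whereas the final line shows that F and P remain connected even after controlling for every other measured variable.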

A veritable cornucopia of estimators and testable implications is available for this example by applying concepts from graph theory. Summary statistics (correlations, standard deviations, and means) for the observed variables in Figure 7 measured within a sample of 373 university students were analyzed in SPSS in order to generate results for this example. Syntax files with the data for this analysis summarized in matrix form and output files for SPSS in both plain text and Adobe PDF format can be downloaded from the supplemental materials page for this article. The two equations described next are helpful in the 2SLS method when analyzing summary statistics instead of raw data: The unstandardized sample coefficient for the path X → Y when the instrument is Z equals

$B_{YX,Z} = \frac{B_{YZ}}{B_{XZ}}$ (14)

where $B_{YZ}$ is the unstandardized bivariate coefficient when regressing Y on Z and $B_{XZ}$ is the analogous coefficient when regressing X on Z.¹² The unstandardized coefficient for the path X → Y when Z | W is the conditional instrument equals

$B_{YX,Z|W} = \frac{B_{YZ \cdot W}}{B_{XZ \cdot W}}$ (15)

where $B_{YZ \cdot W}$ is the sample partial regression coefficient for Z when Y is predicted by both Z and W and $B_{XZ \cdot W}$ is the corresponding coefficient for Z when X is regressed on both Z and W.
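Equations 14 and 15 are easy to evaluate directly from summary statistics. The sketch below shows one way to do so. It relies on two textbook identities that are not spelled out in this article: a bivariate slope equals the correlation times the ratio of standard deviations, and a partial slope with one covariate equals (sd_Y / sd_Z)(r_YZ − r_YW r_ZW) / (1 − r_ZW²). All function names and all numerical inputs are hypothetical and are not the values from Table 1.

```python
def slope(r_yz, sd_y, sd_z):
    """Unstandardized bivariate coefficient from regressing Y on Z."""
    return r_yz * sd_y / sd_z

def partial_slope(r_yz, r_yw, r_zw, sd_y, sd_z):
    """Unstandardized coefficient for Z when Y is regressed on Z and W."""
    return (sd_y / sd_z) * (r_yz - r_yw * r_zw) / (1 - r_zw ** 2)

def iv_ratio(b_yz, b_xz):
    """Ratio estimator for the path X -> Y (the form of Equations 14 and 15)."""
    return b_yz / b_xz

# Hypothetical summary statistics, for illustration only.
sd = {"X": 5.0, "Y": 10.0, "Z": 2.0}
r = {"YZ": 0.20, "XZ": 0.40, "YW": 0.30, "XW": 0.10, "ZW": 0.50}

# Equation 14: Z is an instrument for X -> Y.
b_14 = iv_ratio(slope(r["YZ"], sd["Y"], sd["Z"]),
                slope(r["XZ"], sd["X"], sd["Z"]))

# Equation 15: Z | W is a conditional instrument for X -> Y.
b_15 = iv_ratio(partial_slope(r["YZ"], r["YW"], r["ZW"], sd["Y"], sd["Z"]),
                partial_slope(r["XZ"], r["XW"], r["ZW"], sd["X"], sd["Z"]))

print(round(b_14, 3), round(b_15, 3))
```

Note that the standard deviation of Z cancels in both ratios, so only the standard deviations of X and Y affect the result.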

Reported in the top part of Table 1 are unstandardized estimators of direct effects for this example. Estimators for E → F and H → S vary in both direction and magnitude, and thus are very inconsistent. This pattern attests to likely specification error. For example, the 2SLS results for E → F, −.646 and .719 (see the table), each assumes that the corresponding instrument, respectively, H and S, has no direct effect on F. Additionally, they assume that there is no bidirectional path that connects either instrument just mentioned with F (see Figure 7). At least one of these assumptions is wrong. Estimators of F → P agree in direction (all negative) but appreciably vary in magnitude; the range is −6.927 to −.558. The outlier value, or −6.927, is from the 2SLS analysis that assumes no direct effect between variables H and P. Values of multiple estimators for S → P are all > 0 and generally similar in magnitude (.597–1.161), which is more reassuring. Given inconsistent estimators for three out of four direct effects, it is not surprising that results for two indirect (total) effects are also uneven. For example, estimators in Table 1 for the total effect of E on P range from −4.981 to 4.475. At least one of the assumptions that underlie these estimators is wrong, but it is difficult at this point to discern the precise nature of the problem. Results described next offer additional clues.

Reported in Table 2 are values of sample partial correlations for conditional independences implied by Figure 7. Absolute correlations that exceed .10 are shown in boldface in the table. The results

$\rho_{FS \cdot H} = -.117$ and $\rho_{FS \cdot E} = -.120$

speak to the relative failure of the prediction that F and S are independent, controlling for either H or E (i.e., fitness and stress are unrelated after taking account of either hardiness or exercise). The results in Table 2 are consistent with the erroneous omission from Figure 7 of a path between this pair of variables, such as

F → S, S → F, or F ↔ S

where the last path just listed represents correlated errors. Absolute values of some other partial correlations in the table approach .10, and these results suggest that the omission of direct effects from H to F or P may also be specification errors. Respecification of the original model in Figure 7 based on the partial correlations just described would require a specific rationale, such as the prediction that fitness affects stress instead of vice versa, among other possibilities. In any event, all results considered so far indicate that Figure 7 is clearly not supported by the data.

¹² The estimator resulting from plugging in estimates for $B_{YZ}$ and $B_{XZ}$ into Equation (14) is the Wald estimator.


Fitting the model in Figure 7 to the data for this example with default ML estimation in standard SEM computer tools leads to problems. For example, both LISREL and Mplus generate solutions with no warning or error messages, but the estimated correlation between the parameter estimates for F → P and for the error variance of P is −.920. (Specify option PC in LISREL or Tech1 and Tech3 in Mplus to obtain this result.) Clearly, the estimates for these parameters are highly dependent, which indicates an identification problem. The correlation matrix of parameter estimates is optional output in many standard SEM computer tools, but a less technically sophisticated user may not even know to ask for it. Additionally, without applying the concept of d-separation, the user would not know which zero partial correlations to check using the correlation matrix. Neither Mplus nor LISREL, nor any other standard SEM computer tool, indicates the availability of multiple estimators for causal effects.

Values of selected global fit statistics in ML estimation for this example are listed next:

$\chi^2_M(4) = 10.529$, p = .032
RMSEA = .066, 90% CI [.017, .116]
CFI = .958; SRMR = .051

The model chi-square is marginally significant, and the upper bound of the confidence interval based on the RMSEA is unfavorable, but other results do not indicate grossly poor global fit. Through selective reporting (e.g., omit the CI based on the RMSEA) and ignoring the failed chi-square test, both common practices in the SEM literature (Ropovik, 2015), a researcher could potentially argue for retaining the model. But applying the microscope of graphical tools to this problem tells us that any decision to retain Figure 7, given the data, is imprudent. Readers can also download from the supplemental materials web page the results in LISREL and Mplus just described for simultaneous estimation.

Conclusion

The benefits of causal graphs are usually attributed to their ability to represent theoretical assumptions visibly and transparently, by abstracting away unnecessary algebraic details. Not as widely recognized among practitioners of traditional SEM is the inferential power of analyzing graphs apart from analyzing the data. We demonstrated how a few basic graphical tools can be applied to evaluate the identification status of causal effects, including the characterization of instrumental variables, the recognition of the availability and specific form of multiple estimators of the same causal effect, and the reading of vanishing partial correlations from graphs, which can give rise to new methods for assessing local fit. The same methods can also be used to determine whether two graphs for the same variables are equivalent in terms of their testable implications. We believe that such graphical tools can enrich traditional methods of analyzing path models in SEM.


Appendix

DAGitty Model Code for Figure 7

Copy the text below and paste it in the model code window of DAGitty in order for the path diagram to be automatically drawn. The blank line must be retained in order for the code to correctly execute.

E 1 @-1.300,-0.500
F 1 @-0.700,-0.500
H 1 @-1.300,0.300
P 1 @-0.200,-0.100
S 1 @-0.700,0.300
U1 U @-1.800,-0.100
U2 U @-0.250,-0.450

E F
F P
H S
S P
U1 E H
U2 F P
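For readers who want to work with the same model outside DAGitty, the following sketch reads the model code above into a list of directed edges. It assumes that each line before the blank line declares a variable with a status flag (taking U to mark a latent variable, in line with U1 and U2 in Figure 7) and that each line after the blank line lists a parent followed by its children. The function name parse_dagitty_legacy is hypothetical and is not part of DAGitty.

```python
MODEL_CODE = """\
E 1 @-1.300,-0.500
F 1 @-0.700,-0.500
H 1 @-1.300,0.300
P 1 @-0.200,-0.100
S 1 @-0.700,0.300
U1 U @-1.800,-0.100
U2 U @-0.250,-0.450

E F
F P
H S
S P
U1 E H
U2 F P
"""


def parse_dagitty_legacy(code):
    """Return (nodes, latent, edges) from the two-part model code.

    ASSUMPTIONS: status 'U' marks a latent variable; each line after the
    blank line has the form 'parent child1 child2 ...'.
    """
    node_part, edge_part = code.strip().split("\n\n", 1)
    nodes, latent = [], set()
    for line in node_part.splitlines():
        name, status = line.split()[:2]  # layout coordinates are ignored
        nodes.append(name)
        if status == "U":
            latent.add(name)
    edges = []
    for line in edge_part.splitlines():
        parent, *children = line.split()
        edges.extend((parent, child) for child in children)
    return nodes, latent, edges


nodes, latent, edges = parse_dagitty_legacy(MODEL_CODE)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

Under these assumptions, the latent variables U1 and U2 stand in for the error correlations between E and H and between F and P, respectively.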


References

American Statistical Association (2016). Statement on statistical significance and p-values [Press release]. Retrieved from https://www.amstat.org/newsroom/pressreleases/P-ValueStatement.pdf

Bekker, P. A., Merckens, A., & Wansbeek, T. J. (1994). Identification, equivalent models and computer algebra. Boston: Academic Press.

Bollen, K. A. (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37–72. doi: 10.1146/annurev-soc-081309-150141

Bollen, K. A., Kirby, J. B., Curran, P. J., Paxton, P. M. & Chen, F. (2007). Latent variable models under misspecification: Two-stage least squares (TSLS) and maximum likelihood (ML) estimators. Sociological Methods & Research, 36, 48–86. doi: 10.1177/0049124107301947

Breitling, L. P. (2010). dagR: A suite of R functions for directed acyclic graphs. Epidemiology, 21, 586–587. doi: 10.1097/EDE.0b013e3181e09112

Brito, C., & Pearl, J. (2002). Generalized instrumental variables. In A. Darwiche & N. Freeman (Eds.), Proceedings of the eighteenth conference on uncertainty in artificial intelligence (pp. 85–93). San Francisco: Morgan Kaufmann.

Brito, C., & Pearl, J. (2003). A new identification condition for recursive models with correlated errors. Structural Equation Modeling, 9, 459–474. doi: 10.1207/S15328007SEM0904_1

Chalak, K., & White, H. (2011). Viewpoint: An extended class of instrumental variables for the estimation of causal effects. Canadian Journal of Economics, 44, 1–51. doi: 10.1111/j.1540-5982.2010.01622.x

Chen, B., Pearl, J., & Bareinboim, E. (2016). Incorporating knowledge into structural equation models using auxiliary variables. Unpublished manuscript. Retrieved from https://arxiv.org/pdf/1511.02995.pdf

Chen, B., Tian, J. & Pearl, J. (2014). Testable implications of linear structural equation models. In C. E. Brodley & P. Stone (Eds.), Proceedings of the 28th AAAI conference on artificial intelligence (pp. 2424–2430). Palo Alto, CA: AAAI Press.

Foygel, R., Draisma, J., & Drton, M. (2012). Half-trek criterion for generic identifiability of linear structural equation models. Annals of Statistics, 40, 1682–1713. doi: 10.1214/12-AOS1012

Grace, J. B., Schoolmaster Jr., D. R., Guntenspergen, G. R., Little, A. M., Mitchell, B. R., Miller, K. M., & Schweiger, E. W. (2012). Guidelines for a graph-theoretic implementation of structural equation modeling. Ecosphere, 3(8). doi: 10.1890/ES12-00048.1

Hausman, J. A. (1978). Specification tests in econometrics. Econometrica: Journal of the Econometric Society, 1251–1271.

Hoyle, R. H., & Isherwood, J. C. (2013). Reporting results from structural equation modeling analyses in Archives of Scientific Psychology. Archives of Scientific Psychology, 1, 14–22. doi: 10.1037/arc0000004

Kang, C., & Tian, J. (2009). Markov properties for linear causal models with correlated errors. Journal of Machine Learning Research, 10, 41–70. Retrieved from http://www.jmlr.org/

Kenny, D. A., & Milan, S. (2012). Identification: A nontechnical discussion of a technical issue. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 145–163). New York: Guilford.

Kline, R. B. (2016). Principles and practice of structural equation modeling (3rd ed.). New York: Guilford.

Lee, S., & Hershberger, S. L. (1990). A simple rule for generating equivalent models in covariance structure modeling. Multivariate Behavioral Research, 25, 313–334. doi: 10.1207/s15327906mbr2503_4

MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–236. doi: 10.1146/annurev.psych.51.1.201

Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research (2nd ed.). New York: Cambridge University Press.

Paxton, P., Hipp, J. R., & Marquart-Pyatt, S. T. (2011). Nonrecursive models: Endogeneity, reciprocal relationships, and feedback loops. Thousand Oaks, CA: Sage.

Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). New York: Cambridge University Press.

Pett, M. A., Lackey, N. R., & Sullivan, J. J. (2003). Making sense of factor analysis: The use of factor analysis for instrument development. Thousand Oaks, CA: Sage.

Porter, K., Poole, D., Kisynski, J., Sueda, S., & Knoll, B., Mackworth, A., … Hoos, H., Gorniak, P., & Conati, C. (1999–2009). Belief and Decision Network Tool Version 5.1.10 [Computer software]. Retrieved from http://aispace.org/bayes

Rigdon, E. E. (1995). A necessary and sufficient identification rule for structural models estimated in practice. Multivariate Behavioral Research, 30, 359–383. doi: 10.1207/s15327906mbr3003_4

Romney, D. M., Jenkins, C. D., & Bynner, J. M. (1992). A structural analysis of health-related quality of life dimensions. Human Relations, 45, 165–176. doi: 10.1177/001872679204500204

Ropovik, I. (2015). A cautionary note on testing latent variable models. Frontiers in Psychology, 6, 1715. doi: 10.3389/fpsyg.2015.01715

Roth, D. L., Wiebe, D. J., Fillingim, R. B., & Shay, K. A. (1989). Life events, fitness, hardiness, and health: A simultaneous analysis of proposed stress-resistance effects. Journal of Personality and Social Psychology, 57, 136–142. doi: 10.1037/0022-3514.57.1.136

Shimizu, S., & Kano, Y. (2008). Use of non-normality in structural equation modeling: Application to direction of causation. Journal of Statistical Planning and Inference, 138, 3483–3491. doi: 10.1016/j.jspi.2006.01.017

Shipley, B. (2000). A new inferential test for path models based on directed acyclic graphs. Structural Equation Modeling, 7, 206–218. doi: 10.1207/s15328007sem0702_4

Spirtes, P. (1995, August). Directed cyclic graphical representations of feedback models. In P. Besnard & S. Hanks (Eds.), Proceedings of the 11th conference on uncertainty in artificial intelligence (pp. 491–498). San Francisco: Morgan Kaufmann.

Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Pearson.

Textor, J., Hardt, J., & Knüppel, S. (2011). DAGitty: A graphical tool for analyzing causal diagrams. Epidemiology, 5, 745. doi: 10.1097/EDE.0b013e318225c2be

Tomarken, A. J., & Waller, N. G. (2003). Potential problems with “well-fitting” models. Journal of Abnormal Psychology, 112, 578–598. doi: 10.1037/0021-843X.112.4.578

Vermeulen, I., & Hartmann, T. (2015). Questionable research and publication practices in communication science. Communication Methods and Measures, 9, 189–192. doi: 10.1080/19312458.2015.1096331

Wright, S. (1920). The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proceedings of the National Academy of Sciences, 6, 320–332. doi: 10.1073/pnas.6.6.320


Table 1
Estimators of Causal Effects for Figure 7

Effect       OLS                     2SLS                                           Product
Direct
E → F        .108 (∅)                −.646 (H); .719 (S)                            —
H → S        −.203 (∅)               1.469 (E); −1.637 (F)                          —
F → P        —                       −.558 (E | S); −6.927 (H | S); −.734 (E | H)   —
S → P        .628 (E); .597 (H)      1.161 (H | F, E)                               —
Indirect
E → F → P    −.080 (H); −.059 (S)    1.852 (H | S)                                  −.060, −.748, −.079; .360, 4.475, .474; −.401, −4.981, −.528
H → S → P    −.267 (E)               —                                              −.127, −.121, −.236; .923, .877, 1.706; −1.028, −.977, −1.901

Note. E, exercise; H, mental hardiness; F, physical fitness; S, stress; P, health problems; OLS, ordinary least squares; 2SLS, two-stage least squares; ∅ , empty (null) set. All estimators are unstandardized. Indirect effects are also total effects. Reported in parentheses for OLS estimators is the adjustment set and for 2SLS estimators is the instrument or conditional instrument.
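The product estimators in Table 1 can be reproduced from the direct-effect estimators by simple multiplication. The sketch below, in which the variable and function names are illustrative, multiplies every estimator of the first path by every estimator of the second path and rounds to three decimal places. The estimators of S → P are listed in the order that matches the product entries (the two OLS values, then the 2SLS value).

```python
from itertools import product

# Direct-effect estimators copied from Table 1 (unstandardized).
E_TO_F = [0.108, -0.646, 0.719]     # OLS (empty set), 2SLS (H), 2SLS (S)
F_TO_P = [-0.558, -6.927, -0.734]   # 2SLS (E|S), 2SLS (H|S), 2SLS (E|H)
H_TO_S = [-0.203, 1.469, -1.637]    # OLS (empty set), 2SLS (E), 2SLS (F)
S_TO_P = [0.628, 0.597, 1.161]      # OLS (E), OLS (H), 2SLS (H|F,E)


def product_estimators(first_path, second_path):
    """All pairwise products, one row of values per first-path estimator."""
    return [round(a * b, 3) for a, b in product(first_path, second_path)]


e_f_p = product_estimators(E_TO_F, F_TO_P)
h_s_p = product_estimators(H_TO_S, S_TO_P)
print("E -> F -> P:", e_f_p)
print("H -> S -> P:", h_s_p)
```

The wide spread of the nine products for each indirect effect, including changes of sign, is what the table uses to signal poor local fit: if the model were correct, estimators of the same effect should agree within sampling error.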


Table 2
Sample Values of Vanishing Partial Correlations Implied by Figure 7

Conditional independence    Partial correlation
ES · H                      −.058
HF · E                      .089
FS · H                      −.117
FS · E                      −.120
HP · ES                     −.093

Note. E, exercise; H, mental hardiness; F, physical fitness; S, stress; P, health problems.
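The conditional independences in Table 2 can be checked against the graph with a small d-separation routine. The sketch below is one standard way to do this, not necessarily the algorithm DAGitty uses: it restricts the graph to the ancestors of the variables involved, moralizes it (joins parents that share a child), drops edge directions, removes the conditioning variables, and tests whether the two variables are still connected. The edge list encodes Figure 7 with the latent variables U1 and U2 standing in for the two error correlations; the function names are illustrative.

```python
from itertools import combinations

# Figure 7 as a DAG; U1 and U2 are latent common causes.
EDGES = [("E", "F"), ("F", "P"), ("H", "S"), ("S", "P"),
         ("U1", "E"), ("U1", "H"), ("U2", "F"), ("U2", "P")]


def parent_map(edges):
    """Map every node to the set of its parents."""
    parents = {}
    for a, b in edges:
        parents.setdefault(a, set())
        parents.setdefault(b, set()).add(a)
    return parents


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by the set `given` in the DAG."""
    parents = parent_map(edges)
    # Step 1: keep only x, y, the conditioning set, and their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    # Step 2: moralize and drop directions.
    adjacent = {node: set() for node in keep}
    for child in keep:
        for p in parents[child]:
            adjacent[p].add(child)
            adjacent[child].add(p)
        for p, q in combinations(parents[child], 2):
            adjacent[p].add(q)
            adjacent[q].add(p)
    # Step 3: remove the conditioning set and search for a path from x to y.
    blocked, seen, stack = set(given), {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for neighbor in adjacent[node] - blocked - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return True


TABLE_2 = [("E", "S", ["H"]), ("H", "F", ["E"]), ("F", "S", ["H"]),
           ("F", "S", ["E"]), ("H", "P", ["E", "S"])]
for x, y, given in TABLE_2:
    print(x, y, given, d_separated(EDGES, x, y, given))
```

Each row of Table 2 should come out as d-separated under this encoding, whereas a pair such as E and H with an empty conditioning set should not, because the two variables share the latent cause U1.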


Figure 1. Minimum required specifications in Rigdon’s (1995) graphical method for identifying blocks of two nonrecursively-related variables, X and Y, with instruments, Z; xor, exclusive or (either Z1 or Z2, but not both).

[Figure 1 panels: (a) Bow pattern or error in exogenous; (b) Loop, no error covariance; (c) Loop with error covariance.]


Figure 2. Types of paths that involve three variables.

[Figure 2 panels, each with variables X, W, and Y: (a) Causal; (b) Confounding; (c) Collider.]


Figure 3. Original graph with error term of C explicitly displayed (a), and additionally with correlation induced by conditioning on the virtual collider C displayed (b). Simple graph with two causes of Y, (c), and a larger example (d). An original graph (e), and the original graph modified by deleting the direct effect from X to Y (f).

[Figure 3 panels: (a) Explicit error term of C; (b) Virtual collider; (c) Two causes of Y; (d) Larger example; (e) Original; (f) Modified, no X → Y.]


Figure 4. An original graph with an implied unmeasured common cause, or confounder (a). The graph respecified to show the confounder, U (b). The original graph modified by deleting the direct effect from X to Y (c).

[Figure 4 panels: (a) Implied confounder; (b) Explicit confounder; (c) Modified, no X → Y.]


Figure 5. Example equivalent models (a)–(c). Their equivalence can be verified by checking that they share the same skeleton (d) and v-structure, which is X → W ← Y for graphs (a)–(c).

[Figure 5 panels, each with variables X, Y, Z, W, and V: (a) Equivalent model 1; (b) Equivalent model 2; (c) Equivalent model 3; (d) Skeleton with v-structure.]


Figure 6. An original graph (a) and a modified version generated by applying the replacing rule that is not d-separation equivalent (b). The original graph with bidirected path replaced with latent common cause (c). The modified graph with bidirected path replaced with latent common cause (d).

[Figure 6 panels: (a) Original, $\rho_{XY \cdot B} = 0$; (b) Modified, $\rho_{XY \cdot B} \neq 0$; (c) Original with bidirected path replaced with latent common cause; (d) Modified with bidirected path replaced with latent common cause.]


Figure 7. Example path model expressed as a directed acyclic graph (a). The same model as it would be expressed in the DAGitty computer tool, where U1 and U2 are latent variables (b). E, exercise; H, mental hardiness; F, physical fitness; S, stress; P, health problems.

[Figure 7 panels: (a) Example path model; (b) Representation in DAGitty.]