path analysis


Upload: mohamed-kuthubudeen-t-m

Post on 05-Mar-2016


DESCRIPTION

Path Analysis using SPSS AMOS

TRANSCRIPT

Page 1: Path Analysis

7/21/2019 Path Analysis

http://slidepdf.com/reader/full/path-analysis-56d9ec30145d9 1/16

1

Path Analysis

Overview

Path analysis is an extension of the regression model, used to test the fit of the correlation matrix against two or more causal models which are being compared by the researcher. The model is usually depicted in a circle-and-arrow figure in which single-headed arrows indicate causation. A regression is done for each variable in the model as a dependent on others which the model indicates are causes. The regression weights predicted by the model are compared with the observed correlation matrix for the variables, and a goodness-of-fit statistic is calculated. The best-fitting of two or more models is selected by the researcher as the best model for advancement of theory.

Path analysis requires the usual assumptions of regression. It is particularly sensitive to model specification because failure to include relevant causal variables or inclusion of extraneous variables often substantially affects the path coefficients, which are used to assess the relative importance of various direct and indirect causal paths to the dependent variable. Such interpretations should be undertaken in the context of comparing alternative models, after assessing their goodness of fit as discussed in the section on structural equation modeling (SEM packages are commonly used today for path analysis in lieu of stand-alone path analysis programs). When the variables in the model are latent variables measured by multiple observed indicators, path analysis is termed structural equation modeling, treated separately. We follow the conventional terminology by which path analysis refers to modeling single-indicator variables.

Key Concepts and Terms

o  Estimation. Path estimates may be calculated by OLS regression or by maximum likelihood estimation (MLE), depending on the computer package. Two-Stage Least Squares (2SLS), discussed separately, is another path estimation procedure designed to extend the OLS regression model to situations where non-recursivity is introduced because the researcher must assume the covariances of some disturbance terms are not 0 (this assumption is discussed below).

o  Path model. A path model is a diagram relating independent, intermediary, and dependent variables. Single arrows indicate causation between exogenous or intermediary variables and the dependent(s). Arrows also connect the error terms with their respective endogenous variables. Double arrows indicate correlation between pairs of exogenous variables. Sometimes the arrows in the path model are drawn with a width proportional to the absolute magnitude of the corresponding path coefficients (see below).

o  Causal paths to a given variable include (1) the direct paths from arrows leading to it, and (2) correlated paths from endogenous variables correlated with others which have arrows leading to the given variable. Consider this model:


This model has correlated exogenous variables A, B, and C, and endogenous variables D and E. Error terms are not shown. The causal paths relevant to variable D are the paths from A to D, from B to D, and the paths reflecting common anteceding causes -- the paths from B to A to D, from C to A to D, and from C to B to D. Paths involving two correlations (C to B to A to D) are not relevant. Likewise, paths that go backward (E to B to D, or E to B to A to D) reflect common effects and are not relevant.

Exogenous and endogenous variables. Exogenous variables in a path model are those with no explicit causes (no arrows going to them, other than the measurement error term). If exogenous variables are correlated, this is indicated by a double-headed arrow connecting them. Endogenous variables, then, are those which do have incoming arrows. Endogenous variables include intervening causal variables and dependents. Intervening endogenous variables have both incoming and outgoing causal arrows in the path diagram. The dependent variable(s) have only incoming arrows.

Path coefficient/path weight. A path coefficient is a standardized regression coefficient (beta) showing the direct effect of an independent variable on a dependent variable in the path model. Thus when the model has two or more causal variables, path coefficients are partial regression coefficients which measure the extent of effect of one variable on another in the path model controlling for other prior variables, using standardized data or a correlation matrix as input. Recall that for bivariate regression, the beta weight (the b coefficient for standardized data) is the same as the correlation coefficient, so for the case of a path model with a variable as a dependent of a single exogenous variable (and an error residual term), the path coefficient in this special case is a zero-order correlation coefficient.

Consider this model, based on Bryman, A. and D. Cramer (1990). Quantitative data analysis for social scientists, pp. 246-251.

This model is specified by the following path equations:

Equation 1. satisfaction = b11 age + b12 autonomy + b13 income + e1
Equation 2. income = b21 age + b22 autonomy + e2
Equation 3. autonomy = b31 age + e3

where the b's are the regression coefficients and their subscripts are the equation number and variable number (thus b21 is the coefficient in Equation 2 for variable 1, which is age).

Note: In each equation, only (and all of) the direct priors of the endogenous variable being used as the dependent are considered. The path coefficients, which are the betas in these equations, are thus the standardized partial regression coefficients of each endogenous variable on its priors. That is, the beta for any path (that is, the path coefficient) is a partial weight controlling for other priors for the given dependent variable.
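As a rough illustration of how the three path equations could be estimated by a series of OLS regressions, the following Python sketch standardizes each variable and regresses each endogenous variable on its direct priors. The data are synthetic and the generating coefficients are made up for illustration; they are not Bryman and Cramer's data, and the helper names (zscore, path_betas) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic raw data; the generating coefficients are arbitrary illustrations.
age = rng.normal(40, 10, n)
autonomy = 0.05 * age + rng.normal(0, 1, n)
income = 0.8 * age + 3.0 * autonomy + rng.normal(0, 10, n)
satisfaction = -0.01 * age + 0.6 * autonomy + 0.05 * income + rng.normal(0, 1, n)

def zscore(x):
    """Standardize to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

def path_betas(dependent, *priors):
    """Standardized partial regression coefficients of dependent on its priors."""
    X = np.column_stack([zscore(p) for p in priors])
    # No intercept column is needed because every variable is centered.
    coef, *_ = np.linalg.lstsq(X, zscore(dependent), rcond=None)
    return coef

eq1 = path_betas(satisfaction, age, autonomy, income)  # b11, b12, b13
eq2 = path_betas(income, age, autonomy)                # b21, b22
eq3 = path_betas(autonomy, age)                        # b31

print("Equation 1 betas:", eq1.round(2))
print("Equation 2 betas:", eq2.round(2))
print("Equation 3 beta: ", eq3.round(2))
```

With a single prior, as in Equation 3, the estimated beta coincides with the Pearson correlation between the two variables, which matches the special case described above for bivariate regression.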

Formerly called p coefficients, path coefficients are now called simply beta weights, based on usage in multiple regression models. Bryman and Cramer computed the path coefficients (that is, the standardized regression coefficients or beta weights) to be:


Correlated Exogenous Variables. If exogenous variables are correlated, it is common to label the corresponding double-headed arrow between them with its correlation coefficient.

Disturbance terms. The residual error terms, also called disturbance terms, reflect unexplained variance (the effect of unmeasured variables) plus measurement error. Note that the dependent in each equation is an endogenous variable (in this case, all variables except age, which is exogenous). Note also that the independents in each equation are all the variables with arrows to the dependent.

The effect size of the disturbance term for a given endogenous variable, which reflects unmeasured variables, is (1 - R^2), and its variance is (1 - R^2) times the variance of that endogenous variable, where R^2 is based on the regression in which it is the dependent and those variables with arrows to it are independents. The path coefficient is SQRT(1 - R^2).

The correlation between two disturbance terms is the partial correlation of the two endogenous variables, using as controls all their common causes (all variables with arrows to both). The covariance estimate is the partial covariance: the partial correlation times the product of the standard deviations of the two endogenous variables.

Path multiplication rule: The value of any compound path is the product of its path coefficients. Imagine a simple three-variable compound path where education causes income causes conservatism. Let the regression coefficient of income on education be 1000: for each year of education, income goes up $1,000. Let the regression coefficient of conservatism on income be .0002: for every dollar income goes up, conservatism goes up .0002 points on a 5-point scale. Thus if education goes up 1 year, income goes up $1,000, which means conservatism goes up .2 points. This is the same as multiplying the coefficients: 1000*.0002 = .2. The same principle would apply if there were more links in the path. If standardized path coefficients (beta weights) were used, the path multiplication rule would still apply, but the interpretation is in standardized terms. Either way, the product of the coefficients along the path reflects the weight of that path.
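The multiplication rule can be written out in a few lines of Python. This is only a sketch using the two unstandardized coefficients from the education example; the function name compound_path is hypothetical.

```python
import math

def compound_path(*coefficients):
    """Value of a compound path: the product of the coefficients along it."""
    return math.prod(coefficients)

income_on_education = 1000       # dollars of income per year of education
conservatism_on_income = 0.0002  # scale points per dollar of income

# Effect of one more year of education on conservatism via income.
effect = compound_path(income_on_education, conservatism_on_income)
print(round(effect, 4))
```

A longer path is handled the same way, by passing one more coefficient per additional link.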

Effect decomposition. Path coefficients may be used to decompose correlations in the model into direct and indirect effects, corresponding, of course, to direct and indirect paths reflected in the arrows in the model. This is based on the rule that in a linear system, the total causal effect of variable i on variable j is the sum of the values of all the paths from i to j. Considering "satisfaction" as the dependent in the model above, and considering "age" as the independent, the indirect effects are calculated by multiplying the path coefficients for each path from age to satisfaction:

age -> income -> satisfaction is .57*.47 = .26
age -> autonomy -> satisfaction is .28*.58 = .16
age -> autonomy -> income -> satisfaction is .28*.22*.47 = .03
total indirect effect = .45

That is, the total indirect effect of age on satisfaction is plus .45. In comparison, the direct effect is only minus .08. The total causal effect of age on satisfaction is (-.08 + .45) = .37.
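The decomposition can also be automated by enumerating every directed path from age to satisfaction and multiplying the coefficients along it. The sketch below uses only the six two-decimal coefficients quoted in the text; the function name directed_paths is hypothetical. With these rounded coefficients the indirect effects sum to about .46 and the total effect to about .38, slightly above the .45 and .37 quoted, presumably because the quoted products were computed from less rounded coefficients or were truncated.

```python
# Path coefficients quoted in the text for the Bryman and Cramer model.
coefs = {
    ("age", "autonomy"): 0.28,
    ("age", "income"): 0.57,
    ("age", "satisfaction"): -0.08,
    ("autonomy", "income"): 0.22,
    ("autonomy", "satisfaction"): 0.58,
    ("income", "satisfaction"): 0.47,
}

def directed_paths(source, target, coefs):
    """Yield (variables on the path, product of coefficients) per directed path."""
    def walk(node, trail, product):
        if node == target:
            yield trail, product
            return
        for (cause, effect), weight in coefs.items():
            if cause == node:
                yield from walk(effect, trail + [effect], product * weight)
    yield from walk(source, [source], 1.0)

paths = list(directed_paths("age", "satisfaction", coefs))
direct = sum(p for trail, p in paths if len(trail) == 2)    # one-arrow path
indirect = sum(p for trail, p in paths if len(trail) > 2)   # compound paths
total = direct + indirect

for trail, p in paths:
    print(" -> ".join(trail), round(p, 3))
print("direct:", round(direct, 2), "indirect:", round(indirect, 2),
      "total:", round(total, 2))
```

The recursion terminates because the model is recursive (no feedback loops); a non-recursive model would need a guard against revisiting variables.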


Effect decomposition is equivalent to effects analysis in regression with one dependent variable. Path analysis, however, can also handle effect decomposition for the case of two or more dependent variables.

In general, any bivariate correlation may be decomposed into spurious and total causal effects, and the total causal effect can be decomposed into a direct and an indirect effect. The total causal effect is the coefficient in a regression with all of the model's prior but not intervening variables for x and y controlled (the beta coefficient for the usual standardized solution, the partial b coefficient for the unstandardized or raw solution). The spurious effect is the total effect minus the total causal effect. The direct effect is the partial coefficient (beta for standardized, b for unstandardized) for y on x controlling for all prior variables and all intervening variables in the model. The indirect effect is the total causal effect minus the direct effect, and measures the effect of the intervening variables. Where effects analysis in regression may use a variety of coefficients (partial correlation or regression, for instance), effect decomposition in path analysis is restricted to use of regression.

For instance, imagine a five-variable model in which the exogenous variable Education is correlated with the exogenous variable Skill Level, and both Education and Skill Level are correlated with the exogenous variable Job Status. Further imagine that Education and each of the other two exogenous variables are modeled to be direct causes of Income and also of Median House Value, which are the two dependent variables. We might then decompose the correlation of Education and Income:

1.  Direct effect of Education on Income, indicated by the path coefficient of the single-headed arrow from Education to Income.

2.  Indirect effect due to Education's correlation with Skill Level, and Skill Level's direct effect on Income, indicated by multiplying the correlation of Education and Skill Level by the path coefficient from Skill Level to Income.

3.  Indirect effect due to Education's correlation with Job Status, and Job Status's direct effect on Income, indicated by multiplying the correlation of Education and Job Status by the path coefficient from Job Status to Income.
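The three components above can be added up directly. The following Python sketch uses invented correlations and path coefficients purely to show the arithmetic; none of the numbers come from real data.

```python
# Hypothetical standardized values, chosen only for illustration.
r_edu_skill = 0.50    # correlation of Education and Skill Level
r_edu_job = 0.40      # correlation of Education and Job Status
p_edu_income = 0.30   # path coefficient Education -> Income
p_skill_income = 0.25 # path coefficient Skill Level -> Income
p_job_income = 0.20   # path coefficient Job Status -> Income

direct = p_edu_income                     # component 1
via_skill = r_edu_skill * p_skill_income  # component 2
via_job = r_edu_job * p_job_income        # component 3

# Model-implied correlation of Education and Income.
implied_r = direct + via_skill + via_job
print(round(direct, 3), round(via_skill, 3), round(via_job, 3), round(implied_r, 3))
```

Comparing implied_r with the observed correlation of Education and Income is one way to see how well this part of the model reproduces the data.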

As a second example, decomposition for the same five-variable model is a bit more complex if we wish to break down the correlation of the two dependent variables, Income and Median House Value. Since here, somewhat implausibly, the two dependents are modeled not to have a direct effect from Income to House Value, the true correlation is hypothesized to be zero and all correlations are spurious.

1.  The spurious direct effect of Education as a common anteceding variable directly causing both dependents, indicated by multiplying the path coefficient from Education to Income by the path coefficient of Education to House Value.

2.  The spurious direct effect of Skill Level as a common anteceding variable directly causing both dependents, indicated by multiplying the path coefficient from Skill Level to Income by the path coefficient of Skill Level to House Value.

3.  The spurious direct effect of Job Status as a common anteceding variable directly causing both dependents, indicated by multiplying the path coefficient from Job Status to Income by the path coefficient of Job Status to House Value.

4.  The spurious indirect effect of Education and Skill Level as common anteceding variables directly causing both dependents, indicated by multiplying the path coefficient from Education to Income by the correlation of Education and Skill Level by the path from Skill Level to House Value, and adding the product of the path from Skill Level to Income by the correlation of Education and Skill Level by the path from Education to Median House Value.

5.  The spurious indirect effect of Education and Job Status as common anteceding variables directly causing both dependents, indicated by multiplying the path coefficient from Education to Income by the correlation of Education and Job Status by the path from Job Status to House Value, and adding the product of the path from Job Status to Income by the correlation of Education and Job Status by the path from Education to Median House Value.

6.  The spurious indirect effect of Skill Level and Job Status as common anteceding variables directly causing both dependents, indicated by multiplying the path coefficient from Skill Level to Income by the correlation of Skill Level and Job Status by the path from Job Status to House Value, and adding the product of the path from Job Status to Income by the correlation of Skill Level and Job Status by the path from Skill Level to Median House Value.

7.  The residual effect is the difference between the correlation of Income and Median House Value and the sum of the spurious direct and indirect effects.
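The spurious components listed above can be collected with a little matrix arithmetic. The sketch below is an illustration under invented values: phi (correlations among Education, Skill Level, Job Status), a (paths to Income), b (paths to Median House Value) and the observed correlation are all hypothetical. It relies on the standard result that, with uncorrelated disturbances, the model-implied correlation between the two dependents equals a' phi b.

```python
import numpy as np

# Order of exogenous variables: Education, Skill Level, Job Status.
phi = np.array([[1.0, 0.5, 0.4],
                [0.5, 1.0, 0.3],
                [0.4, 0.3, 1.0]])   # hypothetical correlations
a = np.array([0.30, 0.25, 0.20])    # hypothetical paths to Income
b = np.array([0.35, 0.15, 0.10])    # hypothetical paths to House Value

# Spurious direct effects: one product per common cause.
spurious_direct = a * b

# Spurious indirect effects: one pair of products per correlated pair.
spurious_indirect = {}
for i in range(3):
    for j in range(i + 1, 3):
        spurious_indirect[(i, j)] = phi[i, j] * (a[i] * b[j] + a[j] * b[i])

implied_r = spurious_direct.sum() + sum(spurious_indirect.values())

observed_r = 0.31                    # hypothetical observed correlation
residual = observed_r - implied_r    # the residual effect

print(round(implied_r, 4), round(residual, 4))
```

The term-by-term sum agrees with the matrix product a @ phi @ b, which is a convenient check that no path has been counted twice or left out.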

Correlated exogenous variables. The path weights connecting correlated exogenous variables are equal to the Pearson correlations. When calculating indirect paths, not only direct arrows but also the double-headed arrows connecting correlated exogenous variables are used in tracing possible indirect paths, except:

Tracing rule: An indirect path cannot enter and exit on an arrowhead. This means that you cannot have a path composed of the paths of two correlated exogenous variables, that is, one which chains two double-headed arrows (as in C to B to A to D in the earlier model).

Significance and Goodness of Fit in SEM Path Models 

  OLS vs. SEM. While a series of OLS regressions may be used to implement path analysis, testing individual path coefficients using the standard t or F test from regression output, today it is far more common to use structural equation modeling (SEM) software. This section uses AMOS with a model based on Ingram et al. (2000), used with the kind permission of Karl Wuensch. Use of AMOS is described more fully in the section on structural equation modeling. A structural equation model with simple rather than latent variables is a path model.

  The SEM path model. The path model is drawn as usual in SEM. Illustrated below is the model for the Ingram data, which deals with application to graduate schools. In this model, Attitude, SubNorm, and PBC all predict Intent, while the ultimate dependent variable, Behavior, is predicted by Intent and also directly by PBC. As is customary, the straight arrows represent regression paths for presumed causal relationships, while the curved double-headed arrows represent assumed covariances among the exogenous variables. The endogenous variables are depicted with associated error terms.

In this model, Attitude is the individual's attitude toward graduate school; SubNorm is subjective norms, reflected by attitudes toward graduate school of others around the individual; PBC is perceived behavioral control, which is the individual's perceived level of control over behaviors related to graduate school. Intent is the individual's intent to go to graduate school. Behavior is applying to graduate school.

  Select outputs. Statistical tests and other outputs are selected under View, Analysis Properties, in the AMOS menu system, yielding the dialog shown below:


 

  Path coefficients in standardized and unstandardized form are generated by AMOS by selecting Analyze, Compute Estimates. Unlike the OLS regression method, all parameters are calculated simultaneously. These coefficients may be displayed on the path diagram, and also appear in the output, which is obtained in the menu system by selecting View, Text Output. For this example, note that the paths from SubNorm and PBC to Intent are not significant. That is, Intent is primarily a function of Attitude.


  Overall test of the model. The likelihood ratio chi-square test, also called the model chi-square test or deviance test, assesses the overall fit of the model. A finding of nonsignificance corresponds to an adequate model: one whose model-implied covariance matrix does not differ significantly from the observed covariance matrix. For this example, there is adequate fit:

  Result (Default model)
  Minimum was achieved
  Chi-square = .847
  Degrees of freedom = 2
  Probability level = .655
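The probability level reported above can be checked by hand. For 2 degrees of freedom the upper-tail chi-square probability has the closed form exp(-x/2), so a minimal Python check, which assumes nothing beyond the two numbers in the AMOS output, is:

```python
import math

chi_square = 0.847  # model chi-square from the AMOS output
df = 2              # degrees of freedom from the AMOS output

# The closed form exp(-x/2) holds only for df = 2; other degrees of
# freedom need a general chi-square survival function.
assert df == 2
p_value = math.exp(-chi_square / 2)

print(round(p_value, 3))
```

Because the probability is well above .05, the hypothesis that the model-implied and observed covariance matrices are equal is not rejected.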

However, the likelihood ratio chi-square test cannot be relied upon alone, particularly for large samples, because a finding of significance (rejecting the model) can occur even with very small differences between the model-implied and observed covariance matrices (note below that AMOS labels the likelihood ratio chi-square CMIN). Therefore a large variety of other goodness of fit measures have been devised. Their use and relative merits are described in the section on structural equation modeling. In the output below, the "default model" is the researcher's model. The "saturated model" is the perfectly explanatory but trivial model with all possible arrows. The "independence model" is the model with no regression arrows (straight arrows). Suffice it to say, these goodness of fit measures support the adequacy of the model in the example (for example, RMSEA should be < .05 for a good model and here is .000). Note, however, that in spite of statistically adequate goodness of fit, normally the researcher would drop the non-significant structural arrows indicated above in the path coefficients section.


  Correlations. Also included in AMOS output are the correlations among the exogenous variables (correlations in the upper output) and between the exogenous variables and the endogenous variables (squared multiple correlation in lower output) as illustrated below. The model explains 34.3% of the variance in the dependent variable, Behavior.

  Correlations: (Group number 1 - Default model)

                            Estimate
  SubNorm  <--> PBC         .505
  Attitude <--> SubNorm     .472
  Attitude <--> PBC         .665

  Squared Multiple Correlations: (Group number 1 - Default model)

              Estimate
  Intent      .600
  Behavior    .343
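Applying the disturbance-term relations stated earlier (unexplained share = 1 - R^2, disturbance path coefficient = SQRT(1 - R^2)) to the squared multiple correlations in this output gives the following. This is a sketch of the arithmetic only, not AMOS output.

```python
import math

# Squared multiple correlations reported by AMOS for the example.
r_squared = {"Intent": 0.600, "Behavior": 0.343}

unexplained = {}       # share of variance left to the disturbance term
disturbance_path = {}  # standardized path from the disturbance term
for variable, r2 in r_squared.items():
    unexplained[variable] = 1 - r2
    disturbance_path[variable] = math.sqrt(1 - r2)

for variable in r_squared:
    print(variable, round(unexplained[variable], 3),
          round(disturbance_path[variable], 3))
```

So roughly 40% of the variance in Intent and about 66% of the variance in Behavior is left to unmeasured causes and measurement error.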


  Direct and indirect effects. AMOS will use the multiplication rule automatically to partition overall effects into direct and indirect effects for the endogenous variables (for Intent and Behavior in this example).

  Modification indexes. Modification indexes (MI) may be used to add arrows to the model. The larger the MI, the more adding the corresponding arrow will improve model fit. As discussed in the section on structural equation modeling, MIs should be interpreted in relation to critical ratios (CR), which are a measure of the change in likelihood ratio chi square. And as noted above, nonsignificance of path coefficients is used to drop arrows in a model-building and model-trimming process discussed in the section on structural equation modeling. The conservative approach calls for adding or dropping one arrow at a time, as each change will affect the coefficients. For this example, the MIs are so small that no addition of arrows is warranted. In fact, all MIs are well below the usual lower threshold of 4.0.

Assumptions

Linearity: relationships among variables are linear (though, of course, variables may be nonlinear transforms).

Additivity: there are no interaction effects (though, of course, variables may be interaction crossproduct terms).

Interval level data for all variables, if regression is being used to estimate path parameters. As in other forms of regression modeling, it is common to use dichotomies and ordinal data in practice. If dummy variables are used to code a categorical variable, one must be careful that they are represented as a block in the path diagram (e.g., if an arrow is drawn to one dummy it must be drawn to all others in the set). If an arrow were to be drawn from one dummy variable to another dummy variable in the same set, this would violate the recursivity assumption discussed below.

Residual (unmeasured) variables are uncorrelated with any of the variables in the model other than the one they cause.

o  Disturbance terms are uncorrelated with endogenous variables. As a corollary of the previous assumption, path analysis assumes that for any endogenous variable, its disturbance term is uncorrelated with any other endogenous variable in the model. This is a critical assumption, violation of which may make regression inappropriate as a method of estimating path parameters. This assumption may be violated due to measurement error in measuring an endogenous variable; when an endogenous variable is actually a direct or indirect cause of a variable which the model states is the cause of that endogenous variable (reverse causation); or when a variable not in the model is a cause of an endogenous variable and a variable the model specifies as a cause of that endogenous variable (spurious causation).

o  Low multicollinearity (otherwise one will have large standard errors of the b coefficients used in removing the common variance in partial correlation analysis).

o  No underidentification or underdetermination of the model. For underidentified models there are too few structural equations to solve for the unknowns. Overidentification usually provides better estimates of the underlying true values than does just identification.

Recursivity: all arrows flow one way, with no feedback looping. Also, it is assumed that disturbance (residual error) terms for the endogenous variables are uncorrelated. Recursive models are never underidentified.

Proper specification of the model is required for interpretation of path coefficients. Specification error occurs when a significant causal variable is left out of the model. The path coefficients will reflect the shared covariance with such unmeasured variables and will not be accurately interpretable in terms of direct and indirect effects. In particular, if a variable specified as prior to a given variable is really consequent to it, "we can do ourselves considerable damage" (Davis, 1985: 64) because if a variable is consequent it would be estimated to have no path effect, whereas when it is included as a prior variable in the model, this erroneously changes the coefficients for other variables in the model. Note, however, that while interpretation of path coefficients is inaccurate under specification error, it is still possible to compare the relative fit of two models, perhaps both with specification error.

Appropriate correlation input. When using a correlation matrix as input, it is appropriate to use Pearsonian correlation for two interval variables, polychoric correlation for two ordinals, tetrachoric for two dichotomies, polyserial for an interval and an ordinal, and biserial for an interval and a dichotomy.

Adequate sample size is needed to assess significance. Kline (1998) recommends 10 times as many cases as parameters (or ideally 20 times). He states that 5 times or less is insufficient for significance testing of model effects.

o  The same sample is required for all regressions used to calculate the path model. This may require reducing the data set so that there are no missing values for any of the variables included in the model. This might be achieved by listwise dropping of cases or by data imputation.

Frequently Asked Questions

Does path analysis confirm causation in a model?

No, although this is sometimes said. Everitt and Dunn (1991) note, "However convincing, respectable and reasonable a path diagram... may appear, any causal inferences extracted are rarely more than a form of statistical fantasy". The authors are referring to the fact that ultimately path analysis deals with correlation, not causation of variables. The arrows in path models do indeed reflect hypotheses about causation. However, many models may be consistent with a given dataset. Path analysis merely illuminates which of two or more competing models, derived from theory, is most consistent with the pattern of correlations found in the data. The competing theories may be represented in separate path models with separate path analyses, or may be combined in a single path diagram, in which case the researcher is concerned with comparing the relative importance of different paths within the diagram.

Can path analysis be used for exploratory rather than confirmatory purposes?

Methodologists favor a priori formulation of hypotheses about the results to be obtained from path analysis, not post factum conclusions based on the results. That is, the researcher should be seeking to confirm hypotheses made beforehand. At a minimum, the researcher should posit the sign of each relationship (arrow) in the model, and ideally should go further to posit the arrangement by magnitude of the importance of the independents, or even better yet, the intervals within which the path coefficients will be expected to lie. Since data can support multiple models, path analysis should focus on determining which of two or more theoretically-derived models most conforms to the underlying data.


If path analysis is used in an exploratory way, note that the more models you test, the more likely you are to confirm one just by chance. The actual confidence level is equal to the individual test confidence level raised to the power of the number of models tested. Thus testing 3 models at .95 would mean we were actually operating at about .86 confidence (.95 cubed = .86). At the .99 level, however, actual confidence would still be .95 with 5 models.
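The two figures in this paragraph follow from a one-line calculation. The Python sketch below reproduces them; the function name overall_confidence is hypothetical, and the power rule treats the tests as independent, which is a simplifying assumption.

```python
def overall_confidence(per_test_confidence, n_models):
    """Confidence that none of n independent tests errs by chance."""
    return per_test_confidence ** n_models

three_models_at_95 = overall_confidence(0.95, 3)
five_models_at_99 = overall_confidence(0.99, 5)

print(round(three_models_at_95, 2))
print(round(five_models_at_99, 2))
```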

o  How do I know if my model is "underidentified" and what difference does it make? How does this relate to "recursivity"?

A unique path solution cannot be calculated if a model is underidentified. Identification is defined, and steps the researcher can take to deal with underidentification are described, in a section on structural equation modeling. How to determine beforehand if a model is underidentified, other than by running a path analysis program for sample or fictional data and seeing if there are error flags, is discussed more fully under a second section on structural equation modeling.

How does the significance of a path coefficient compare with the significance of the corresponding regression coefficient?

They are identical. The path coefficient is the beta for the regression of the dependent or other endogenous variable on the other variables with arrows to it. The significance of the beta and b coefficients will be the same, and is displayed on the same line in SPSS regression output. Naturally, all paths in the model should be significant!

o  How do you assess the significance of the total (direct and indirect) effect of exogenous variable x on endogenous variable y?

Run a regression with y as dependent and all others as independents, leaving out any variable which mediates between x and y. The significance of the b or beta for x in this equation is a test of the significance of the total effect.

o  Why might the direct effect be zero? 

There is a fully controlling mediating effect or fully controlling anteceding effect. See further discussion in the section on partial correlation.

o  How are path coefficients related to the correlation matrix for purposes of testing a model?  

First, recognize that computation of the model-estimated correlations and their comparison with observed correlations is best done by relying on a model-estimating program such as LISREL or AMOS. The model path coefficients can be compared to the predicted path coefficients as computed from the correlation matrix, following which the model coefficients can be tested for goodness-of-fit with the predicted coefficients.

The tracing rule is a rule for identifying all the paths, the sum of effects of which is the estimated correlation between two variables in the model. This model-estimated correlation can be compared to the observed correlation to assess the fit of the model to the data. The tracing rule is simply that the model-implied correlation between two variables in a model is the sum of all valid paths (tracings) between the two variables. These include the total effect (which is the sum of direct and indirect effects) plus any associational effects due to correlated exogenous variables. These associational effects are calculated by multiplying the correlation between the exogenous variable under consideration and a second exogenous variable by this second exogenous variable's total effect on the target variable under consideration.

For simplicity, consider a simple model in which A causes B (path n), and both A and B cause D (paths p and q, respectively).


The actual (observed) correlation matrix might look like this:

       A       B       D
A      1     .379   -.652
B              1    -.451
D                      1

When running AMOS or another path analysis program, the path coefficients (standardized regression coefficients) would be:

       A       B       D
A      1     .379   -.562
B              1    -.238
D                      1
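As a rough check on these figures, the two paths into D can be recovered from the observed correlations by solving the standardized regression of D on A and B. The sketch below uses only the three correlations from the example (.379, -.652, -.451). The function name `two_predictor_betas` is an illustrative assumption, and the closed-form solution applies only to the two-predictor case.

```python
def two_predictor_betas(r_x1y: float, r_x2y: float, r_x1x2: float):
    """Standardized weights for y regressed on x1 and x2, computed from correlations."""
    denom = 1.0 - r_x1x2 ** 2
    beta1 = (r_x1y - r_x1x2 * r_x2y) / denom
    beta2 = (r_x2y - r_x1x2 * r_x1y) / denom
    return beta1, beta2

# Observed correlations from the example.
r_ab, r_ad, r_bd = 0.379, -0.652, -0.451

path_ab = r_ab  # B has a single cause, so the path equals the correlation
path_ad, path_bd = two_predictor_betas(r_ad, r_bd, r_ab)

print(round(path_ab, 3), round(path_ad, 3), round(path_bd, 3))  # 0.379 -0.562 -0.238
```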

If one had only the path output and wanted to estimate back to the correlation matrix, one would use these equations, one for each path:

r(AB) = n = .379    (the correlation is the standardized path coefficient in the bivariate case, where the independent is exogenous and the dependent has no other inputs)

r(BD) = q + np = -.238 + .379(-.562) = -.451

r(AD) = p + nq = -.562 + .379(-.238) = -.652
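The three tracing-rule equations can be reproduced directly. This is a small sketch using the path values from the example; the variable names n, p, and q follow the equations, with n as the A-to-B path, p as the A-to-D path, and q as the B-to-D path.

```python
# Path coefficients from the example: n is A->B, p is A->D, q is B->D.
n, p, q = 0.379, -0.562, -0.238

r_ab = n          # only one tracing connects A and B
r_bd = q + n * p  # direct path B->D, plus the tracing back through common cause A
r_ad = p + n * q  # direct path A->D, plus the indirect path A->B->D

print(round(r_ab, 3), round(r_bd, 3), round(r_ad, 3))  # 0.379 -0.451 -0.652
```

The rounded results match the observed correlations, as expected for a model of this kind in which every possible path among the three variables is estimated.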

In testing a model, somewhat similar reasoning is followed to compare model-implied covariance matrices with observed covariance matrices, with smaller differences indicating better goodness of fit.

o  How, exactly, can I compute path coefficients in SPSS? 

The recommended method is to enter a path model into AMOS, which is the SPSS program for structural equation modeling, discussed in that section. However, path coefficients for a recursive model can also be computed in the SPSS regression module. For example, let VARA cause VARB and VARC, and let VARB cause VARC. A series of regressions is conducted with each non-exogenous variable considered as a dependent in turn. For the foregoing model, the path coefficient from VARA to VARB is given by this SPSS code:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT VARB
  /METHOD=ENTER VARA.


The path coefficients from VARA to VARC and from VARB to VARC are given by this second regression command:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT VARC
  /METHOD=ENTER VARA VARB.

o  How do I compute the value of the path from an error term to an endogenous variable?

The path is the square root of (1 - R-squared), where R-squared is from the regression equation for the corresponding dependent variable. Do not use adjusted R-squared.
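As an illustration, the error paths for the A, B, D example used in the tracing-rule discussion can be sketched as follows. The function name `error_path` is hypothetical. R-squared is obtained here from the standardized identity that R-squared equals the sum, over predictors, of each beta times that predictor's correlation with the dependent; in practice one would simply read the unadjusted R-squared from the regression output.

```python
from math import sqrt

def error_path(r_squared: float) -> float:
    """Path from the error term to an endogenous variable: sqrt(1 - R-squared)."""
    return sqrt(1.0 - r_squared)

# B is regressed on A alone, so R-squared is the squared correlation.
r2_b = 0.379 ** 2

# D is regressed on A and B: sum of (beta * correlation with D) over the predictors.
r2_d = (-0.562) * (-0.652) + (-0.238) * (-0.451)

print(round(error_path(r2_b), 3), round(error_path(r2_d), 3))  # 0.925 0.725
```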

o  How can multiple group path analysis determine if the path model differs across groups in my sample?

Multiple group path analysis may be accomplished simply by running a separate path analysis for each group in the sample, then comparing the path estimates. A more sophisticated approach supported by some path analysis and SEM packages involves a second step: impose a cross-group equality constraint on the path estimates, run the analysis again for each group, then see if the goodness-of-fit for the constrained models is as good as for the unconstrained models. If the fit of the constrained model is worse than that for the corresponding unconstrained model, then the researcher concludes that model direct effects differ by group.

o  Could I substitute logistic regression in doing effect decomposition? 

No. Forward paths cannot be decomposed accurately with log linear techniques. See Davis (1985: 48, 59).

o  Can path analysis handle hierarchical/multilevel data? 

Multilevel path analysis is available in Mplus but not yet in SPSS or SAS as of 2011.

o  Are regression and SEM the only approaches to path analysis?  

Partial least squares path analysis is also available, through custom software (not SPSS or SAS, which only support PLS regression). PLS can support small sample models, even where there are more variables than observations, but it is lower in power than SEM approaches.