ICPSR General Structural Equations, Week 4 No. 2

Upload: audrey-cunningham

Post on 02-Jan-2016


Review of solutions for non-normal and missing data (see handout)

Issue #1: My data are not normally distributed

Each variable has a reasonable number of discrete values (10 or more for most, with perhaps the odd variable with 5-6*, but definitely no variables with fewer than 5 discrete values). [*Variables with a smaller number of categories should not be heavily skewed.]

Solution #1: AMOS, LISREL and SAS-CALIS

Transform the data within the stat package (SAS or SPSS) to reduce the level of kurtosis.

COMPUTE LVAR1 = LN(VAR1).
COMPUTE LVAR1 = LN(VAR1 + .1). [if there are 0 values]
COMPUTE VAR1_2 = VAR1**2.
COMPUTE VAR1_INV = 1 / VAR1.

See John Fox’s regression text or other regression texts for more details. Usually, dealing with skewness also deals with kurtosis.

Checks:

DESCRIPTIVES VARIABLES=VAR1 /STATISTICS = SKEWNESS KURTOSIS.

With transformed data, regular ML covariance analysis can be used.
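The effect of the log transformation can be checked outside the stat package as well. The following is a minimal Python sketch, not part of the course materials: the data are simulated, the function names are hypothetical, and the simple moment formulas used here differ slightly from the bias-corrected statistics that SPSS reports.

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean of the cubed standardized scores."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(x):
    """Excess kurtosis: mean of z**4 minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

# Hypothetical right-skewed variable (simulated, not course data).
rng = np.random.default_rng(0)
var1 = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# Same idea as COMPUTE LVAR1 = LN(VAR1).
lvar1 = np.log(var1)

raw_stats = (skewness(var1), excess_kurtosis(var1))
log_stats = (skewness(lvar1), excess_kurtosis(lvar1))

print("raw: skew=%.2f kurtosis=%.2f" % raw_stats)
print("log: skew=%.2f kurtosis=%.2f" % log_stats)
```

If the transformed values fall in roughly the +1 to -1 range, regular ML estimation is a reasonable next step.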

Non-normal data: ADF estimation

Solution #2: AMOS, LISREL, SAS-CALIS

Use an ADF (asymptotically distribution-free) estimator.

AMOS: An option under Analysis Options.

LISREL: Input of an asymptotic covariance matrix (4th moment matrix) is required. To generate such a matrix in PRELIS, check off the asymptotic covariances check box and insert a file name. In LISREL, you will need to add a line to read in this matrix:

CM FI=
AC FI=

And you will need to specify the ADF fit function:

OU ME=WL

SAS: PROC CALIS METHOD=WLS

Important note on the ADF fit function:

Large sample sizes are required. For the acov matrix to be non-singular, N must be greater than p + (1/2)(p)(p+1).

20 variables: N > 230
30 variables: N > 495

Working anywhere near these minima is not recommended.
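The bound is easy to tabulate for other numbers of variables. A small Python sketch follows; the function name adf_min_n is hypothetical, and it only evaluates the formula given above.

```python
def adf_min_n(p):
    """Lower bound on N for a non-singular acov matrix: p + p(p+1)/2."""
    if p < 1:
        raise ValueError("p must be a positive number of observed variables")
    # p * (p + 1) is always even, so integer division is exact.
    return p + p * (p + 1) // 2

for p in (10, 20, 30):
    print(f"{p} variables: N > {adf_min_n(p)}")
```

The bound grows roughly with the square of p, which is why ADF quickly becomes impractical for large models.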

Solution #3: LISREL only. Scaled test statistics.

Use a scaled or adjusted chi-square and standard error calculation (e.g., Satorra-Bentler). Input of an asymptotic covariance matrix is required, so it is necessary to specify an AC= line which points to the asymptotic covariance matrix, but specify ME=ML and not ME=WL in LISREL. Probably, if not definitely, better than ADF for small to moderate-sized samples.


Missing Data

Issue #2: I have missing cases. My data are fairly normally distributed (or I have transformed them to near normality – Kurtosis values in the +1 to -1 range or fairly close to this).

Solution #1: Use EM algorithm to construct imputed covariance matrix [assumes normality]

LISREL: This is an option under PRELIS.

Small limitation: imputed data are treated as “real” data by LISREL (affects N, significance tests).

AMOS: If you have the SPSS Missing Data module, you may be able to generate an imputed covariance matrix. If you do not, a “last ditch” approach would be to “flip” the dataset into SAS (if you have it), use the SAS MI procedure, then “flip” the covariance matrix file back into SPSS. See Appendix in handout.

Solution #2: Use a multiple-group model to explicitly model missing data.

AMOS, LISREL (SAS CALIS will not estimate multiple-group models)

This works if the number of missing data patterns is fairly small (say, fewer than 3-5), or if cleaning up problems with a small number of missing data patterns deals with most of the overall problem.
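Before choosing the multiple-group approach it helps to count how many distinct missing data patterns the dataset contains. The following Python sketch is an illustration only: the function name and the tiny example array are hypothetical, and missing values are assumed to be coded as NaN.

```python
import numpy as np

def missing_patterns(data):
    """Return (pattern, count) pairs, most frequent pattern first.

    A pattern is a tuple of booleans, True where the variable is missing.
    """
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data).astype(int)          # 1 = missing, 0 = observed
    patterns, counts = np.unique(mask, axis=0, return_counts=True)
    order = np.argsort(-counts, kind="stable")  # largest groups first
    return [(tuple(bool(v) for v in patterns[i]), int(counts[i])) for i in order]

nan = np.nan
example = np.array([
    [1.0, 2.0, 3.0],
    [2.0, nan, 1.0],
    [3.0, 1.0, 2.0],
    [1.0, nan, 2.0],
    [2.0, 3.0, nan],
])

result = missing_patterns(example)
for pattern, count in result:
    print(pattern, count)
```

Each pattern would become one group in the multiple-group model, so a long list of small patterns suggests that another approach is needed.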

Solution #3: Use nearest-neighbor imputation

LISREL only. Limitation: for data with a small number of values for each variable, “ties” will be generated. Even with a generous criterion, imputation could easily fail for ½ of the cases.

Small limitation: imputed data are treated as “real” data by LISREL (affects N, significance tests).

If working with Stata files, there is a user routine called hotdeck (see Stata Tech. bulletins #51 and #54). Must be installed. This is not the same as the PRELIS nearest neighbour procedure, but uses some similar principles. With AMOS, must use Stata or PRELIS.

Stata: use Stat-Transfer or DBMS-Copy to convert file to AMOS-readable SPSS .sav file.

Important note, from hotdeck documentation:

If a dataset contains many variables with missing values then it is possible that many of the rows of data will contain at least one missing value. The hotdeck procedure will not work very well in such circumstances. There are more elaborate methods that only replace missing values, rather than the whole row, for imputed values.

PRELIS: More complicated process to move data into SPSS (see point #4 in handout “PRELISQuirks.doc”).

Solution #4: Use FIML estimation [assumes normality]

AMOS: Check off “estimation using means and intercepts” under Analysis Options and then input the dataset with missing values. Amos will not provide modification indices with its version of FIML estimation (some other form of estimation is needed for model-fitting).

LISREL: Must input raw data into LISREL. Declare missing values in PRELIS (already done if an SPSS file is read into PRELIS), save the PRELIS .psf file and then read it into LISREL. Instead of CM FI= or SY FI= :

RA FI=C:\TEMP\MYDATA.PSF

Will also need a DA statement:

Issue #3: My data need to be weighted

Note: sophisticated adjustment of standard errors and test statistics (see Stata documentation) is not available. It is possible to construct some stratified sample problems as multiple group analyses.

Solution #1: Use weighting in generating a covariance matrix to be passed to the SEM program

PRELIS: Under Transformation select Weight Variable before generating the covariance matrix.

*It is not clear if LISREL can handle weighted data in conjunction with FIML estimation. Some other missing data technique may be required.

[Screenshot: Data, Weight Cases menu]

Note: it is not clear if weight variable needs to be rescaled to mean=1.0 (probably a good idea)

AMOS: AMOS will not accept a weighted SPSS dataset. In fact, if you try to get AMOS to work with a dataset where a WEIGHT command has been issued, it may generate an error message. (To unweight data, simply use the commands COMPUTE WTVAR=1.0. and WEIGHT BY WTVAR.) But it should be possible to construct a covariance matrix within SPSS (using weighting) and then pass the “covariance matrix system file” to AMOS.

In SPSS:

WEIGHT BY wgtvar.
CORRELATIONS VARIABLES = [list of variables]
  /MISSING = LISTWISE
  /MATRIX OUT(*).
MCONVERT MATRIX = IN(*) /REPLACE.
SAVE OUTFILE = 'c:\temp\covs1.sav'.
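For readers who want to see what a weighted covariance matrix involves, here is a minimal Python sketch. It is not what SPSS or PRELIS runs internally: the function name and data are hypothetical, and the choice to rescale the weights to mean 1.0 and divide by (sum of weights - 1) is an assumption made for illustration.

```python
import numpy as np

def weighted_cov(data, weights):
    """Weighted covariance matrix with weights rescaled to mean 1.0."""
    data = np.asarray(data, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w * (len(w) / w.sum())            # after rescaling, sum(w) equals N
    mean = (w[:, None] * data).sum(axis=0) / w.sum()
    centered = data - mean
    # Assumed denominator: sum of weights minus 1 (equals N - 1 here).
    return (w[:, None] * centered).T @ centered / (w.sum() - 1.0)

# Hypothetical data: 4 cases, 2 variables, one weight per case.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
w = np.array([0.5, 1.5, 1.0, 1.0])

cov_w = weighted_cov(X, w)
print(cov_w)
```

Because the weights are rescaled inside the function, multiplying all weights by a constant leaves the result unchanged, and equal weights reproduce the ordinary unweighted covariance matrix.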

Coarsely categorized data

Issue #4: My data are at best ordinal (3-5 discrete values per indicator)

Solution #1: Use CVM (categorical variable model) techniques for ordinal data.

PRELIS only: By default, variables with fewer than 15 discrete values are treated as “ordinal” and matrices are not simple covariance matrices. Use the Data, Define Variables menus to alter any defaults. Usually, you will want to generate an asymptotic covariance matrix too.

If there are also missing data, strictly speaking, the use of FIML or EM imputation is not correct. Nearest neighbor approaches (issue #2, solution #3 above) are acceptable.

Solution #2: Resort to “item parcels”

(Best check these variables, with crosstabulations, first.) Add scores of 2 or more variables you believe to be parallel indicators to form single indicators.

Missing data approaches for parcels can be tricky. Consider trying to create parcels with very similar patterns of missingness (same respondents missing, same respondents non-missing across both) and then give the variable a missing value when either of the variables is missing.

Once variables have a sufficient number of discrete values with parceling, if the distributions are not normal, refer to issue #1 for solutions.

If you parcel variables, read the “pro and con” literature (see course outline).
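The parceling rule described above (sum the items, and set the parcel to missing when any item is missing) can be sketched in a few lines of Python. The function name and item scores below are hypothetical, and missing values are assumed to be coded as NaN.

```python
import numpy as np

def make_parcel(*items):
    """Sum item scores row by row; the parcel is NaN if any item is NaN."""
    stacked = np.column_stack([np.asarray(item, dtype=float) for item in items])
    # An ordinary sum propagates NaN, which gives the "missing if either
    # item is missing" rule without any extra logic.
    return stacked.sum(axis=1)

nan = np.nan
item1 = [1, 2, nan, 4]   # hypothetical 5-point items
item2 = [3, nan, 2, 5]

parcel = make_parcel(item1, item2)
print(parcel)
```

Two 5-point items yield a parcel with up to 9 discrete values, which is what moves the indicator toward the range where the issue #1 solutions apply.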


Ordinal Data models

CVM approaches in PRELIS/LISREL.

Example file: Week4Examples\OrdinalData2

See folder for listing of programs, output listings and a codebook for variables used.

Program LisrelU1.ls8 is a simple model based on the PM (polychoric correlation) matrix.

Extensions of the ordinal variable model

Basic form: Threshold parameters, representing the mapping of z* (latent variable, continuous) onto z (coarsely categorized variable), where z has m categories.

These thresholds will be familiar to anyone used to working with logistic regression models (or probit models):

Univariate case:

ln (cumulative odds) = τ(k)

Tau coefficient = ln (cases in kth category or lower / cases in higher categories)

Example:

Distribution of cases across 5 categories: 20 20 30 40 50

Tau1 = ln (20 / (20+30+40+50))

Tau2 = ln (40 / (30+40+50))

Tau3 = ln (70 / (40+50))

Tau4 = ln (110 / 50)
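The same arithmetic can be written as a short Python function. This is only a sketch of the cumulative-logit calculation shown above; the function name is hypothetical, and PRELIS itself may use a different link (for example normal-distribution thresholds).

```python
import math

def cumulative_logit_thresholds(counts):
    """Thresholds tau(k) = ln(cases at or below k / cases above k).

    `counts` holds the number of cases in each ordered category; m categories
    give m - 1 thresholds. Every cumulative split must leave cases on both sides.
    """
    total = sum(counts)
    taus = []
    cumulative = 0
    for count in counts[:-1]:
        cumulative += count
        taus.append(math.log(cumulative / (total - cumulative)))
    return taus

taus = cumulative_logit_thresholds([20, 20, 30, 40, 50])
for k, tau in enumerate(taus, start=1):
    print(f"Tau{k} = {tau:.3f}")
```

The thresholds are necessarily increasing, because each one moves more cases from the denominator into the numerator.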

Polychoric correlations

Polychoric correlations:
- Estimate thresholds from univariate distributions
- Then, minimize a fit function involving reproduced probabilities based on a parameter vector that includes thresholds + ρ (est. correlation)

Categorical Variable Model (ordinal data)

For each of the variables, the mean is fixed to 0 and the standard deviation fixed to 1.0 (otherwise, under-identified)

Parameterization    Mean    Std. dev.    Thresholds
Standard            0.0     1.0          τ1   τ2   τ3    τ4
Alternative         μ1      σ1           0    1    τ3*   τ4*
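The two parameterizations are related by a linear rescaling of the latent variable. The Python sketch below assumes the alternative form is obtained by fixing the first two thresholds to 0 and 1 and letting the mean and standard deviation absorb the change, as the table suggests; the function name and threshold values are hypothetical.

```python
def to_alternative(taus):
    """Convert standard thresholds (mean 0, sd 1) to the alternative form.

    Rescales z* to (z* - tau1) / (tau2 - tau1), so the first two thresholds
    become 0 and 1. Returns (mean, sd, rescaled thresholds).
    """
    t1, t2 = taus[0], taus[1]
    width = t2 - t1
    if width <= 0:
        raise ValueError("thresholds must be strictly increasing")
    mean = -t1 / width          # where the old mean of 0 lands on the new scale
    sd = 1.0 / width            # where the old sd of 1 lands on the new scale
    new_taus = [(t - t1) / width for t in taus]
    return mean, sd, new_taus

# Hypothetical standard-form thresholds for a 5-category item.
mean, sd, new_taus = to_alternative([-1.0, 0.0, 0.5, 1.5])
print(mean, sd, new_taus)
```

Both forms use the same number of free parameters (four here), which is why either one identifies the model.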

Fixing thresholds

“Equal Thresholds”

• Same threshold for 2 variables measured over time (longitudinal data)

• Same threshold for 1 variable measured in two different groups

See Week4Examples/OrdinalData2 files


Longitudinal data

I. Modeling of latent variable mean differences over time

II. More complicated tests (linear growth, quadratic growth, etc.)

See slides from previous class


Applications to longitudinal data

Basic model for assessing latent variable mean change:

Can run this model on X or Y side (LISREL)

Equations:

X1 = a1 + 1.0 L1 + e1
X2 = a2 + b1 L1 + e2
X3 = a3 + b2 L1 + e3
X4 = a4 + 1.0 L2 + e4
X5 = a5 + b3 L2 + e5
X6 = a6 + b4 L2 + e6

Constraints:

b1=b3, b2=b4 (LX=IN)

a1=a4, a2=a5, a3=a6 (TX=IN)

ka1 = 0, ka2 = (to be estimated)
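To see what these constraints imply for the observed means, the model-implied mean vector can be computed as intercepts plus loadings times latent means. The Python sketch below uses made-up parameter values and a hypothetical function name; it illustrates the algebra and does not estimate anything.

```python
import numpy as np

def implied_means(intercepts, loadings, kappa):
    """Implied means of X1..X6 under the equality constraints.

    intercepts: (a1, a2, a3), reused at time 2 (a4=a1, a5=a2, a6=a3)
    loadings:   (1.0, b1, b2), reused at time 2 (b3=b1, b4=b2)
    kappa:      (ka1, ka2), the latent means of L1 and L2
    """
    tx = np.tile(np.asarray(intercepts, dtype=float), 2)
    lam = np.asarray(loadings, dtype=float)
    lx = np.zeros((6, 2))
    lx[:3, 0] = lam     # X1-X3 load on L1
    lx[3:, 1] = lam     # X4-X6 load on L2
    return tx + lx @ np.asarray(kappa, dtype=float)

# Hypothetical values: ka1 fixed to 0, ka2 = 0.5.
mu = implied_means([2.0, 1.5, 1.0], [1.0, 0.8, 1.2], [0.0, 0.5])
print(mu)
```

With equal intercepts and loadings, every difference between the time 1 and time 2 indicator means is carried by ka2, scaled by the indicator's loading.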

The same model (equations and constraints as on the previous slide), now with correlated errors.

Applications to longitudinal data

Model for assessing latent variable mean change

[Figure: three-wave measurement model. Ksi-1 is measured by x1, x2, x3; Ksi-2 by x4, x5, x6; Ksi-3 by x7, x8, x9.]

Usual parameter constraints:

TX(1)=TX(4)=TX(7)

LISREL: EQ TX 1 TX 4 TX 7

AMOS: same parameter name

[Figure: AMOS diagram of the same model. The intercepts are named a1, a2, a3 at every wave; the means of Ksi-1, Ksi-2, Ksi-3 and of the error terms are shown as 0.]

Latent mean parameters:

KA(1) = 0

KA(2) = mean difference parameter #1

KA(3) = mean difference parameter #2

LISREL: KA=FI group 1 KA=FR groups 2,3

[Figure: AMOS diagram of the model. The intercepts are named a1, a2, a3 at every wave; the mean of Ksi-1 is fixed to 0 and the means of Ksi-2 and Ksi-3 are named kappa1 and kappa2.]

Some tests:

Test for change: H0: ka1=ka2=0

Linear change model: ka2 = 2*ka1

Quadratic change model: ka2 = 4*ka1

(Here ka1 and ka2 are mean difference parameters #1 and #2, i.e. KA(2) and KA(3).)

As a causal model:

[Figure: Eta-1 predicting Eta-2, each with its own indicators; the path between them is Beta-1.]

• Beta-1 is the “stability coefficient”.

• The stability coefficient is high if relative rankings are preserved, even if there has been massive change with respect to means.

• In a model with AL(1)=0 and AL(2) free, Beta(2,1) can be high with either (a) AL(1)=AL(2) or (b) AL(1) massively different from AL(2).
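The point that stability is about rankings and not about means can be illustrated with simulated scores. In this Python sketch a plain correlation between two observed variables stands in for the standardized stability coefficient; the data, the size of the shift, and the variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated time 1 scores and two versions of time 2:
# same rankings plus a little noise, with and without a large mean shift.
time1 = rng.normal(loc=0.0, scale=1.0, size=500)
time2_no_shift = time1 + rng.normal(loc=0.0, scale=0.3, size=500)
time2_shifted = time2_no_shift + 10.0   # everyone moves up by 10 units

def corr(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

r_same = corr(time1, time2_no_shift)
r_shift = corr(time1, time2_shifted)

print(f"no mean change:    r = {r_same:.3f}")
print(f"large mean change: r = {r_shift:.3f}")
```

Adding a constant to every time 2 score changes the mean but leaves the correlation untouched, which is why the mean structure (AL) has to be tested separately from Beta.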

Causal models:

[Figure: Ksi-1 and Ksi-2 both predicting Eta-1, with paths gamma1,1 and gamma1,2.]

Ksi-2 as lagged (time 1) version of Eta-1 (could re-specify as an eta variable)

Temporal order in the Ksi-1 → Eta-1 relationship

Causal models:

[Figure: cross-lagged panel model with Ksi-1, Ksi-2, Eta-1 and Eta-2; the cross-lagged paths are labelled ga2,1 and ga1,2.]

Cross-lagged panel coefficients

[Reduced form of model on next slide]

Causal models:

Reciprocal effects, using lagged values to achieve model identification

[Figure: reciprocal-effects model with Ksi-1, Ksi-2, Eta-1 and Eta-2.]

Causal models: a variant

[Figure: TV Use, Political Trust, and Political Trust at Time 2; paths labelled gamma 1,1, gamma 2,1 and Beta 2,1.]

Issue: what does ga(1,1) mean, given concern over causal direction?

Lagged and contemporaneous effects

[Figure: model with both lagged and contemporaneous effects.]

This model is underidentified.

Lagged effects model

[Figure: ksi-1, ksi-2, eta-1 and eta-2.]

Ksi-1 could be an “event” (1/0 dummy variable).

First order model for three-wave data (univariate)

[Figure: one latent variable measured at Time 1, Time 2 and Time 3; both paths between adjacent waves are labelled b1.]

Tests: equivalence of stability coefficients (b1)

Mean differences (see earlier slide)

Second order model for three-wave data (univariate)

[Figure: three-wave model with first-order paths labelled b1 and an added second-order path.]

No longer comparable to b1 (t1 → t2)

Issue: adding appropriate error terms (2nd order)

Multivariate model for three-wave panel data: cross-lagged effects (first order)

[Figure: path diagram.]

Equivalence of parameters: T1 → T2 and T2 → T3

Multivariate model for three-wave panel data: cross-lagged effects (second order)

[Figure: path diagram.]

Multivariate model for four-wave panel data: cross-lagged effects (second order)

[Figure: path diagram.]

Lagged and contemporaneous effects

Three-wave model with constraints:

[Figure: three-wave model in which each of the path labels a, b, c, d, e and f appears twice, once per interval (equality constraints).]

Under many circumstances there will be an empirical under-identification problem, though in theory this model is identified.

Example:

• Canada, Quality of Life data

• In directory \Panel in Week4Examples

Re-expressing parameters: GROWTH CURVE MODELS

Intercept & linear (& sometimes quadratic) terms

Linear Growth Model

[Figure: two-factor LGM. Intercept (mean Parm1) and Slope (mean Parm2) factors with two indicators, V1 at t1 and V2 at t2; intercept loadings 1 and 1; slope loadings 0 and 1; indicator intercepts fixed to 0.]

Linear Growth Model

[Figure: two-factor LGM with Intercept (mean Parm1) and Slope (mean Parm2) factors, where the outcomes at each wave are latent variables LV-t1 and LV-t2, each with its own indicators.]

A bit more complicated with latent variables instead of single manifest variables.

Linear Growth Model

[Figure: two-factor linear growth model. Intercept (mean Parm1) and Slope (mean Parm2) factors with indicators t1, t2, t3; intercept loadings 1, 1, 1; slope loadings 0, 1, 2; indicator intercepts fixed to 0.]

Unspecified 2 factor Growth Curve Model

[Figure: two-factor unspecified growth model. Intercept (mean Parm1) and Slope (mean Parm2) factors with indicators t1, t2, t3; intercept loadings 1, 1, 1; slope loadings 0, 1 and lambda (freely estimated); indicator intercepts fixed to 0.]

3 factor Growth Curve Model

[Figure: Intercept (mean Parm1), Linear (mean Parm2) and Quadratic factors with indicators t1, t2, t3; intercept loadings 1, 1, 1; linear loadings 0, 1, 2; quadratic loadings 0, 1, 4; indicator intercepts fixed to 0.]
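The fixed loadings in these growth models are simply time scores, and the implied mean at each wave is the loading matrix times the growth-factor means. A minimal Python sketch follows; the function names and the factor means are hypothetical, and time is assumed to be coded 0, 1, 2 as in the diagrams.

```python
import numpy as np

def growth_loadings(n_waves, quadratic=False):
    """Fixed loading matrix: intercept = 1, linear = t, quadratic = t**2."""
    t = np.arange(n_waves, dtype=float)      # time scores 0, 1, 2, ...
    columns = [np.ones(n_waves), t]
    if quadratic:
        columns.append(t ** 2)
    return np.column_stack(columns)

def implied_wave_means(loadings, factor_means):
    """Implied mean at each wave (indicator intercepts fixed to 0)."""
    return loadings @ np.asarray(factor_means, dtype=float)

linear = growth_loadings(3)                  # columns: intercept, slope
quad = growth_loadings(3, quadratic=True)    # adds the quadratic column

# Hypothetical factor means: intercept 5.0, slope 0.5, quadratic 0.1.
means_linear = implied_wave_means(linear, [5.0, 0.5])
means_quad = implied_wave_means(quad, [5.0, 0.5, 0.1])

print(linear)
print(quad)
print(means_linear, means_quad)
```

In the unspecified model the last slope loading is not taken from the time score but estimated (lambda), so the shape of the trajectory is left to the data.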
