

Partial Least Squares Path Modeling: Past and Future
CARMA
Ed Rigdon, Georgia State University
Oct. 9, 2015

Partial least squares (PLS), like ordinary least squares (OLS) or maximum likelihood (ML), is a general approach to parameter estimation; path modeling is just one application.

Framework: modeling unobserved conceptual variables. (Diagram: conceptual variable, proxy, and observed variables, connected by mathematical operations.)

The nominalist / naming fallacy: assuming that something labeled X is actually X.

Cliff (1983), de Leeuw (1985)

Framework: modeling unobserved conceptual variables. (Framework diagram repeated.)

Realism vs. operationalism. Realism: the variable exists independently of operations and data. Operationalism: the operations define the variable.

How does PLS path modeling work?

Structural model linking conceptual variables. (Diagram: conceptual variables A, B, C, and D connected by paths g1, g2, g3.)

Statistical model linking proxies; PLS path models are almost always recursive. (Diagram: proxies A*, B*, C*, D* connected by the same paths.)

Observed variables are divided into exclusive blocks. (Diagram: indicator blocks a1-a3, b1-b3, c1-c3, d1-d3 attached to A*, B*, C*, D*.)

Alternating proxies stand in for the conceptual variables.

Outer proxy: a weighted sum of the indicators in that block.

Inner proxy: a weighted sum of the directly connected outer proxies.

Inner weights and outer weights: inner weights link proxies to other proxies; outer weights link proxies to their associated observed variables.

Inner weights are estimated in regressions using the outer proxies A*, B*, C*, D*.
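To make the proxy-building step concrete, here is a minimal numpy sketch for a block A connected to a single block B. The simulated data, the unit starting weights, and the use of the centroid sign rule for the inner weight are illustrative assumptions, not details taken from the slides.

import numpy as np

rng = np.random.default_rng(0)
n = 200

def standardize(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean(axis=0)) / v.std(axis=0)

# Hypothetical standardized indicator blocks (three indicators each) for A and B
XA = standardize(rng.standard_normal((n, 3)))
XB = standardize(rng.standard_normal((n, 3)))

# Outer proxies: re-standardized weighted sums of each block's own indicators.
# Unit weights serve as arbitrary starting values; PLS updates them iteratively.
wA = np.ones(3)
wB = np.ones(3)
A_star = standardize(XA @ wA)
B_star = standardize(XB @ wB)

# Inner proxy for A: a weighted sum of the outer proxies directly connected to A
# (here only B*), using the centroid scheme (the sign of the proxy correlation)
# as the inner weight.
inner_weight_AB = np.sign(np.corrcoef(A_star, B_star)[0, 1])
A_inner = standardize(inner_weight_AB * B_star)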

Outer weights are estimated using the inner proxies, generally using either Mode B or Mode A.

Each block must use one method or the other exclusively.

Mode B regresses each inner proxy on its associated indicators as a set.

(Diagram: A* regressed on its indicators a1, a2, a3, with residual eA.)

Mode A regresses each indicator, one at a time, on its associated inner proxy.

This looks a bit like factor analysis, but: (1) estimation is not simultaneous, and (2) residuals are not formally part of the model and are essentially unconstrained.
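A minimal numpy sketch of the two outer-weight update rules, using hypothetical data and a stand-in inner proxy; a real PLS run would alternate these updates with the proxy-building step until the weights converge.

import numpy as np

rng = np.random.default_rng(1)
n = 200
XA = rng.standard_normal((n, 3))
XA = (XA - XA.mean(axis=0)) / XA.std(axis=0)            # standardized indicators for block A

A_inner = XA @ np.ones(3)                               # stand-in inner proxy for A
A_inner = (A_inner - A_inner.mean()) / A_inner.std()

# Mode B: regress the inner proxy on the block's indicators as a set,
# so the outer weights are multiple-regression coefficients.
w_modeB, *_ = np.linalg.lstsq(XA, A_inner, rcond=None)

# Mode A: regress each indicator, one at a time, on the inner proxy.
# With everything standardized, each weight is just the indicator-proxy correlation.
w_modeA = np.array([np.corrcoef(XA[:, j], A_inner)[0, 1] for j in range(3)])

# Either way, the updated outer proxy is the re-standardized weighted sum.
A_star = XA @ w_modeB
A_star = (A_star - A_star.mean()) / A_star.std()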

(Diagram: indicators a1-a3 regressed on A*, each with its own residual e1-e3.)

PLS routinely standardizes both the indicators and the structural variables:

Unstandardized: high-variance components dominate composites.
Standardized: high-correlation components dominate composites.

Standardization also means that single-predictor regressions can be reversed without changing the coefficients.
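A quick numerical check of the reversibility point, on simulated data (the numbers are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
y = 0.6 * x + rng.standard_normal(500)
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

slope_y_on_x = np.polyfit(zx, zy, 1)[0]
slope_x_on_y = np.polyfit(zy, zx, 1)[0]
r = np.corrcoef(zx, zy)[0, 1]
print(slope_y_on_x, slope_x_on_y, r)   # once both variables are standardized, all three coincide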

Standardized, hence reversible . . . one at a time, so no correlation among predictors . . .

Mode B vs. Mode A is not formative vs. reflective, but rather regression weights vs. correlation weights. Becker et al. (2013), Rigdon (2012)

(Slide contrasting OLS regression weights with correlation weights; Dana & Dawes 2004, Waller & Jones 2010.)

OLS regression weights (Mode B) maximize in-sample R², but are best out-of-sample only when n and true predictability are high.

Dana & Dawes (2004), Becker et al. (2013)

Dana & Dawes (2004)

And correlation weights avoid the surprises of incorrectly signed weights that can emerge due to collinearity.
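A small simulation in the spirit of those findings; the population (two predictors correlated about .95, both with positive effects, samples of n = 100) is my own illustrative choice, not Dana & Dawes's design.

import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 1000
sign_flips_ols = 0
sign_flips_corr = 0
for _ in range(reps):
    x1 = rng.standard_normal(n)
    x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.standard_normal(n)
    y = 0.5 * x1 + 0.4 * x2 + rng.standard_normal(n)    # both true effects positive
    Z = np.column_stack([x1, x2])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    zy = (y - y.mean()) / y.std()
    b_ols = np.linalg.lstsq(Z, zy, rcond=None)[0]                        # regression weights
    b_corr = np.array([np.corrcoef(Z[:, j], zy)[0, 1] for j in (0, 1)])  # correlation weights
    sign_flips_ols += bool((b_ols < 0).any())
    sign_flips_corr += bool((b_corr < 0).any())

# With this much collinearity, the regression weights come out negative in a
# noticeable share of samples; the correlation weights essentially never do.
print(sign_flips_ols, sign_flips_corr)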

There are multiple schemes for forming the inner proxies, too, but the differences in results are minor.
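For concreteness, two inner-weighting schemes commonly described in the PLS literature (the scheme names are not on the slides), for a proxy A* connected to neighbours B* and C*:

import numpy as np

r_AB, r_AC = 0.4, -0.2                      # correlations of A* with its neighbours (illustrative values)
centroid = np.sign([r_AB, r_AC])            # centroid scheme: use only the signs
factorial = np.array([r_AB, r_AC])          # factorial scheme: use the correlations themselves
# Either set of inner weights is applied to B* and C* and the result re-standardized,
# which is part of why the schemes tend to produce very similar proxies.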

Loadings (zero-order correlations between the indicators and each structural variable) and the structural path coefficients are by-products, not model parameters.

Significance: standard errors are estimated via bootstrapping, to minimize distributional assumptions (a small sketch follows below).

History and rationale: launched by econometrician Herman Wold in the 1970s at Wharton, inspired by his student Karl Jöreskog's factor-based innovations in the 1960s at Uppsala.
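The sketch promised above: a minimal bootstrap of a standard error. For brevity the statistic is just the correlation between two fixed-weight composites built on hypothetical data; a real PLS run would re-estimate all of the weights within every bootstrap sample.

import numpy as np

rng = np.random.default_rng(4)
n = 150
XA = rng.standard_normal((n, 3))
XB = 0.5 * XA[:, [0]] + rng.standard_normal((n, 3))      # two loosely related blocks (hypothetical)

def statistic(XA, XB):
    a = XA.sum(axis=1)                                    # fixed unit-weight composites
    b = XB.sum(axis=1)
    return np.corrcoef(a, b)[0, 1]

estimate = statistic(XA, XB)
boot = np.empty(2000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)                      # resample cases with replacement
    boot[i] = statistic(XA[idx], XB[idx])

se = boot.std(ddof=1)                                     # bootstrap standard error
print(estimate, se)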

THEN: why use PLS path modeling? To approximate ML factor analysis, without ML's sample-size, distributional, and computing-capacity requirements, sacrificing ML's accuracy to get a more interactive modeling experience, with easy real-world application.

Guide & Ketokivi (2015):

the Journal of Operations Management is desk-rejecting essentially all submissions that use PLS path modeling.

Today: bad arguments for using PLS: non-normal data; exploratory research; low sample size.

"My data are non-normal." Multinormality is an assumption of ML estimation, not of factor-based SEM generally. ML is robust against modest deviations, and other estimators accommodate non-normality.

"This is an exploratory study." All studies are, to some degree. Formal model, hypotheses, instrument. Contribution? Defies validation.

"My sample size is low." Get more data. Equal weights outperform at low n (Dana & Dawes 2004; Becker et al. 2013).

Dana & Dawes (2004)

Invalid arguments against using PLS path modeling: biased parameter estimates; no overall fit test; not a latent-variable method; doesn't deal with measurement error.

PLS yields biased estimates of factor-model parameters: (outer) loadings are over-estimated; (inner) path coefficients are under-estimated.

PLS yields consistent estimates of composite-model parameters (Becker et al. 2013).
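A population-level check of both claims, using illustrative numbers that are not from the slides (two factors correlated .50, three indicators per factor, all loadings .70) and equally weighted composites standing in for PLS composites:

import numpy as np

lam, phi = 0.7, 0.5
Lambda = np.zeros((6, 2))
Lambda[:3, 0] = lam
Lambda[3:, 1] = lam
Phi = np.array([[1.0, phi], [phi, 1.0]])
Sigma = Lambda @ Phi @ Lambda.T + np.diag(np.full(6, 1 - lam**2))   # implied covariance, standardized indicators

W = np.zeros((6, 2))                                                # unit weights defining composites A*, B*
W[:3, 0] = 1.0
W[3:, 1] = 1.0
comp_cov = W.T @ Sigma @ W
comp_sd = np.sqrt(np.diag(comp_cov))

outer_loading = (Sigma @ W)[0, 0] / comp_sd[0]            # corr(a1, A*): about .81, above the true loading .70
inner_path = comp_cov[0, 1] / (comp_sd[0] * comp_sd[1])   # corr(A*, B*): about .37, below the true factor correlation .50
print(outer_loading, inner_path)

Against the factor model these numbers are biased in exactly the directions listed above; against the composite model they are simply the parameters being estimated.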

Becker et al. (2013)

PLS can't test / falsify models in the same way as factor methods: PLS path modeling lacks an overall fit statistic like factor-based SEM's χ².

That is true, but . . .

what exactly does χ² tell us?

"If a value of χ² is obtained, which is large compared to the number of degrees of freedom, this is an indication that more information can be extracted from the data."

Jöreskog (1969, p. 201)

Does it matter if the observed variables contain additional information? (Framework diagram repeated.)

"PLS is not a latent variable method." This depends on what "latent variable" means. If it means common factor, that's right, but so what? If it means an unobserved variable with causal influence, that's not right.

A framework. (Framework diagram repeated.)

"PLS does not account for measurement error." Neither does factor analysis. Isn't this measurement error?

(Framework diagram repeated.)

Factor model residuals and factor indeterminacy.

Rank of a covariance matrix: with p observed variables, a covariance matrix has rank at most p, so the data contain at most p distinct dimensions of information (Mulaik 2010).

Rank of the covariance matrix of F and E: the factor model specifies Y = ΛF + E, with every common factor in F independent of every residual in E. The joint covariance matrix of F and E will generally have rank p + k.
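A small numpy check of the two rank claims, with illustrative numbers (k = 2 factors, p = 6 indicators, loadings .70, factor correlation .50) that are not from the slides:

import numpy as np

k, p, lam, phi = 2, 6, 0.7, 0.5
Lambda = np.zeros((p, k))
Lambda[:3, 0] = lam
Lambda[3:, 1] = lam
Phi = np.array([[1.0, phi], [phi, 1.0]])
Theta = np.diag(np.full(p, 1 - lam**2))                  # unique variances

Sigma = Lambda @ Phi @ Lambda.T + Theta                  # implied covariance of the p observed variables
joint = np.block([[Phi, np.zeros((k, p))],               # joint covariance of F and E,
                  [np.zeros((p, k)), Theta]])            # with factors independent of uniquenesses

print(np.linalg.matrix_rank(Sigma))    # 6 = p: the observed data carry at most p dimensions
print(np.linalg.matrix_rank(joint))    # 8 = p + k: the model posits more dimensions than the data contain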

Factor indeterminacy: solve for F in the factor model (one standard form of the solution is sketched below). P is a weight matrix, closely tied to R²(F,Y), the R² for the factors as predicted by all the observed variables in the model. S is a set of arbitrary vectors, one per factor.

Guttman (1955); Schönemann and Steiger (1976)
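One standard way of writing the decomposition the slide refers to, in my notation rather than the slide's (following Guttman 1955 and Schönemann & Steiger 1976), for observed variables Y with implied covariance Σ, loading matrix Λ, and factor covariance Φ:

F = \Phi \Lambda' \Sigma^{-1} Y + \left( \Phi - \Phi \Lambda' \Sigma^{-1} \Lambda \Phi \right)^{1/2} S

The first term is the weight matrix P applied to Y (the regression factor scores); the second term carries the arbitrary components S.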

S: arbitrary, not random. It has the same variance as its associated factor F, is orthogonal to every other variable within the model, and may be correlated (+ or -) with any variable outside the model.

Guttman (1955); Schönemann and Steiger (1976)

The conceptual variable is outside the statistical model. (Framework diagram repeated.)

Factor indeterminacy blurs the correlation between a common factor and any outside variable.

Clarity comes only from reducing factor indeterminacy (Steiger 1996).

Determinacy index: Guttman's ρ_min, the minimum correlation between different, equally correct realizations of the same common factor in the same model.
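A short numpy sketch that reproduces values like those in the table on the next slide. The determinacy formula (ρ² as the diagonal of ΦΛ'Σ⁻¹ΛΦ) and Guttman's ρ_min = 2ρ² - 1 are standard results, but the code and the specific parameterization are my reconstruction, not material from the slides.

import numpy as np

def rho_min(n_indicators, loading, n_factors=4, factor_corr=0.7):
    k, m, lam = n_factors, n_indicators, loading
    p = k * m
    Lambda = np.kron(np.eye(k), np.full((m, 1), lam))                 # parallel indicators, block structure
    Phi = np.full((k, k), factor_corr) + (1 - factor_corr) * np.eye(k)
    Sigma = Lambda @ Phi @ Lambda.T + (1 - lam**2) * np.eye(p)
    rho_sq = np.diag(Phi @ Lambda.T @ np.linalg.solve(Sigma, Lambda) @ Phi)
    return 2 * rho_sq[0] - 1                                          # Guttman's minimum correlation

print(round(rho_min(2, 0.5), 2))   # about .12, matching the first cell of the table
print(round(rho_min(8, 0.9), 2))   # about .95, matching the last cell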

Values of Guttman's ρ_min: four common factors correlated 0.7, congeneric (actually, parallel) indicators.

Loading    Indicators per factor
              2      4      6      8
  .5        .12    .38    .50    .58
  .6        .32    .54    .64    .70
  .7        .49    .67    .75    .80
  .8        .65    .79    .85    .88
  .9        .82    .90    .93    .95

Observed-variable residuals in factor-based SEM are repackaged as factor indeterminacy, and they continue to threaten the validity of inferences about conceptual variables.

Summing up . . .

Both approaches, factor-based and composite-based, can be used to model and learn about relations between conceptual variables . . . by building empirical proxies formed out of data. (Diagrams: the structural model A, B, C, D with paths g1, g2, g3, and the proxy model A*, B*, C*, D* with indicator blocks a1-a3, b1-b3, c1-c3, d1-d3.)

A framework: conceptual variable, proxy, and observed variables, connected by mathematical operations, now with (un)reliability and (in)validity marked on the diagram.

Thank you.

References

Becker, J.-M., Rai, A., & Rigdon, E.E. (2013). Predictive validity and formative measurement in structural equation modeling: Embracing practical relevance. Thirty-Fourth International Conference on Information Systems.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research, 18, 115-126.
Dana, J., & Dawes, R.M. (2004). The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics, 29(3), 317-331.
De Leeuw, J. (1985). Reviews. Psychometrika, 50(3), 371-375.
Guide, V.D.R., & Ketokivi, M. (2015). Notes from the editors: Redefining some methodological criteria for the journal. Journal of Operations Management, 37, v-viii.
Guttman, L. (1955). The determinacy of factor score matrices with implications for five other basic problems of common-factor theory. British Journal of Statistical Psychology, 8(2), 65-81.
Jöreskog, K.G. (1969). A general approach to maximum likelihood confirmatory factor analysis. Psychometrika, 34(2), 183-202.
Mulaik, S.A. (2010). Foundations of Factor Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall / CRC.
Rigdon, E.E. (2012). Rethinking partial least squares path modeling: In praise of simple methods. Long Range Planning, 45(5-6), 341-358.
Schönemann, P.H., & Steiger, J.H. (1976). Regression component analysis. British Journal of Mathematical and Statistical Psychology, 29(2), 175-189.
Steiger, J.H. (1996). The relationship between external variables and common factors. Psychometrika, 44(1), 93-97.
Waller, N., & Jones, J. (2010). Correlation weights in multiple regression. Psychometrika, 75(1), 58-69.