Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Rubab G. ARIM, MA
University of British Columbia
December 2006
[email protected]

Page 1: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression
Rubab G. ARIM, MA
University of British Columbia
December 2006
[email protected] (@interchange.ubc.ca)

Page 2: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Topics
Multivariate Analysis of Variance (MANOVA)
Factor Analysis
– Principal Component Analysis
Logistic Regression

Page 3: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

MANOVA
Extension of ANOVA
More than one dependent variable (DV)
– Conceptual reason
– Statistically related
Compares the groups and tells whether there are group mean differences on the combination of the DVs

Page 4: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Why not just conduct a series of ANOVAs?
Risk of an inflated Type 1 error:
The more analyses you run, the more likely you are to find a significant result, even if in reality there are no differences between groups.
If you choose to do so:
Bonferroni adjustment: divide your alpha value (.05) by the number of tests that you are intending to perform
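
The inflation described on this slide can be made concrete with a short calculation. The sketch below assumes the tests are independent, which is a simplification the slide does not state; the function names are illustrative only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha after a Bonferroni adjustment (alpha divided by the number of tests)."""
    return alpha / n_tests


if __name__ == "__main__":
    for k in (1, 3, 5, 10):
        # Columns: number of tests, familywise error rate, adjusted per-test alpha
        print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_alpha(0.05, k))
```

With five independent tests at alpha = .05, the familywise rate is roughly .23, while the Bonferroni-adjusted per-test alpha is .01.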

Page 5: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

MANOVA: Pros and Cons
MANOVA prevents the inflation of Type 1 error
Controls for correlation among a set of DVs by combining them
However,
A complex set of procedures
Additional assumptions required

Page 6: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Example
Research Question:
Do adolescent boys and girls differ in their problem behaviors?
What do you need?
– One categorical IV (i.e., gender)
– Two or more continuous DVs (e.g., depression, aggression, etc.)

Page 7: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Example (cont’)
What MANOVA does
– Tests the null hypothesis that the population means on a set of DVs do not vary across different levels of a grouping variable
Assumptions
– Sample size, normality, outliers, linearity, multicollinearity, homogeneity of variance-covariance matrices

Page 8: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Interpretation of the output
Descriptive Statistics
– Check N values (more subjects in each cell than the number of DVs)
Box’s Test
– Checking the assumption of homogeneity of variance-covariance matrices
Levene’s Test
– Checking the assumption of equality of variance

Page 9: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Interpretation (cont’)
Multivariate tests
– Wilks’ Lambda (most commonly used)
– Pillai’s Trace (most robust)
(see Tabachnick & Fidell, 2007)
Tests of between-subjects effects
– Use a Bonferroni Adjustment
– Check Sig. column
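
To illustrate what Wilks’ Lambda measures, here is a minimal sketch for the special case of two DVs, using the standard definition Lambda = det(W) / det(T), where W is the within-group and T the total sums-of-squares-and-cross-products (SSCP) matrix. The scores and group labels are made up for illustration and are not from the presentation; SPSS additionally converts Lambda to an F statistic, which is not shown here.

```python
from statistics import fmean


def centroid(rows):
    """Mean of each of the two DVs."""
    return (fmean(x for x, _ in rows), fmean(y for _, y in rows))


def sscp(rows, center):
    """2x2 sums-of-squares-and-cross-products matrix of rows about center."""
    sxx = sum((x - center[0]) ** 2 for x, _ in rows)
    syy = sum((y - center[1]) ** 2 for _, y in rows)
    sxy = sum((x - center[0]) * (y - center[1]) for x, y in rows)
    return [[sxx, sxy], [sxy, syy]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """groups: dict mapping a group label to a list of (dv1, dv2) scores."""
    all_rows = [row for rows in groups.values() for row in rows]
    total = sscp(all_rows, centroid(all_rows))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups.values():
        g = sscp(rows, centroid(rows))  # each group centered on its own means
        for i in range(2):
            for j in range(2):
                within[i][j] += g[i][j]
    return det2(within) / det2(total)


# Hypothetical (depression, aggression) scores for illustration only.
scores = {
    "boys": [(2, 5), (3, 6), (4, 7), (3, 5)],
    "girls": [(6, 3), (7, 4), (8, 4), (7, 2)],
}
lam = wilks_lambda(scores)
print(round(lam, 4))
```

Lambda lies between 0 and 1; values near 0 indicate that group membership accounts for most of the multivariate variability, and a value of 1 means the group centroids coincide.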

Page 10: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Interpretation (cont’)
Effect size
– Partial Eta Squared: the proportion of the variance in the DV that can be explained by the IV (see Cohen, 1988)
Comparing group means
– Estimated marginal means
Follow-up analyses
(see Hair et al., 1998; Weinfurt, 1995)
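
Partial eta squared is usually computed as SS_effect / (SS_effect + SS_error). The sketch below uses that standard formula; the sums of squares are hypothetical numbers, not output from the presentation.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Proportion of effect-plus-error variance in a DV attributable to the IV."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares read off a tests-of-between-subjects-effects table.
print(partial_eta_squared(ss_effect=20.0, ss_error=80.0))
```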

Page 11: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Factor Analysis (FA)
Not designed to test hypotheses
Data reduction technique
– Whether the data may be reduced to a smaller set of components or factors
Used in the development and evaluation of tests and scales

Page 12: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Two main approaches in FA
Exploratory factor analysis (EFA)
– Explore the interrelationships among a set of variables
Confirmatory factor analysis (CFA)
– Confirm specific hypotheses or theories concerning the structure underlying a set of variables

Page 13: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Principal Component Analysis (PCA)
A technique similar to Factor Analysis in the sense that PCA also produces a smaller number of variables that accounts for most of the variability in the pattern of correlations
However,
Factor Analysis
– Mathematical model: only the shared variance in the variables is analyzed
Principal Component Analysis
– All the variance in the variables is used

Page 14: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

PCA or FA?
If you are interested in a theoretical solution, use FA
If you want an empirical summary of your data set, use PCA
(see Tabachnick & Fidell, 2001)

Page 15: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Steps involved in PCA
Assessment of the suitability of the data
– Sample size (see Stevens, 1996)
– Strength of the relationship among the items
  An inspection of the correlation matrix: r > .30
– Bartlett’s test of sphericity (p < .05)
– Kaiser-Meyer-Olkin (KMO)
  This index ranges from 0 to 1, with .6 suggested as the minimum value
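
As a rough illustration of Bartlett’s test of sphericity, the sketch below computes the usual test statistic, chi-square = -[(n - 1) - (2p + 5) / 6] * ln|R| with p(p - 1) / 2 degrees of freedom, where R is the correlation matrix of p items and n is the number of cases. The correlation matrix and sample size are invented for illustration; the p-value lookup that SPSS reports is left out because it needs a chi-square distribution function.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_cases):
    """Return (chi_square, degrees_of_freedom) for a correlation matrix."""
    p = len(corr)
    chi_square = -((n_cases - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical correlation matrix for three items and a hypothetical N of 200.
corr = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
chi_square, df = bartlett_sphericity(corr, n_cases=200)
print(round(chi_square, 2), df)
```

A statistic well above the chi-square critical value for the given degrees of freedom (7.815 for df = 3 at the .05 level) suggests the correlation matrix is not an identity matrix, so the items share enough variance to be factorable.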

Page 16: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Steps involved in PCA (cont’)
Factor Extraction
– Determine the smallest number of factors that best represent the interrelations among the set of items
– Various techniques (e.g., principal factor analysis, maximum likelihood factoring)
– Determine the number of factors
  Kaiser’s criterion (eigenvalue > 1)
  Scree test (plot each eigenvalue and find the point where the shape becomes horizontal)
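
Kaiser’s criterion and the variance-explained figures can be sketched in a few lines once the eigenvalues are known. The eigenvalues below are hypothetical (six items, so they sum to 6, as eigenvalues of a correlation matrix do); obtaining them from real data requires an eigen-decomposition that is not shown.

```python
def kaiser_retained(eigenvalues):
    """Eigenvalues that pass Kaiser's criterion (greater than 1)."""
    return [ev for ev in eigenvalues if ev > 1.0]


def percent_variance(eigenvalues):
    """Percentage of total variance accounted for by each component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


# Hypothetical eigenvalues for a six-item scale, largest first.
eigenvalues = [3.2, 1.4, 0.6, 0.4, 0.25, 0.15]

retained = kaiser_retained(eigenvalues)
explained = percent_variance(eigenvalues)
cumulative = sum(explained[: len(retained)])
print(len(retained), [round(p, 1) for p in explained], round(cumulative, 1))
```

Plotting the same eigenvalues against their component number gives the scree plot; the retained components are those before the curve levels off.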

Page 17: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Steps involved in PCA (cont’)
Factor rotation and interpretation
– Orthogonal (uncorrelated) factor solutions
  Varimax is the most common technique
– Oblique (correlated) factor solutions
  Direct Oblimin is the most common technique
– Simple structure (Thurstone, 1947): each factor is represented by a number of strongly loading items

Page 18: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Example
Research Question:
– What is the underlying factor structure of the Subjective Age Identity (SAI) scale?
What you need
– A set of correlated continuous variables (i.e., items of the SAI scale)
What PCA does
– Attempts to identify a small set of factors that represents the underlying relationships among a group of related variables (i.e., SAI items)

Page 19: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Example (cont’)
Assumptions
– Sample size: N of 150 or more and a ratio of at least five cases for each of the items
– Factorability of the correlation matrix
  r = .3 or greater; KMO ≥ .6; Bartlett (p < .05)
– Linearity
– Outliers among cases

Page 20: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Interpretation of the output
Is PCA appropriate?
– Check Correlation Matrix
– Check KMO and Bartlett’s test
How many factors? Eigenvalue > 1
– Check the Total Variance Explained
– Look at the Scree Plot

Page 21: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Interpretation (cont’)
How many components are extracted?
– Component Matrix
– Rotated Component Matrix
Look for the highest loading items on each of the components; this can be used to identify the nature of the underlying latent variable represented by each component

Page 22: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Logistic Regression
Three types of regression
– Bivariate
– Multiple
– Logistic*
Relationships among variables (NOT mean differences)
One DV + 2 or more predictors or explanatory variables
*The DV is dichotomous
*Core concept: Odds Ratio (OR)

Page 23: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Logistic Regression

          Program A   Program B
Male         200         100
Female        50         150

For males, the odds of watching Program A are: 200/100 (or 2 to 1).
For females, the odds of watching Program A are: 50/150 (or 1 to 3).
To obtain the ratio of the odds for gender relative to Program A:
This OR = (2/1) / (1/3) = 6
> The odds of watching Program A are six times higher for males than for females.
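
The arithmetic on this slide can be reproduced directly from the four cell counts. This is a minimal sketch; the function names are illustrative.

```python
def odds(n_yes: int, n_no: int) -> float:
    """Odds of the event: cases with the event divided by cases without it."""
    return n_yes / n_no


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table: (a / b) for group 1 over (c / d) for group 2."""
    return odds(a, b) / odds(c, d)


# Counts from the slide: males 200 (Program A) vs 100 (Program B),
# females 50 (Program A) vs 150 (Program B).
male_odds = odds(200, 100)
female_odds = odds(50, 150)
or_gender = odds_ratio(200, 100, 50, 150)
print(male_odds, round(female_odds, 3), round(or_gender, 3))
```

Note that an odds ratio of 6 compares odds, not probabilities: the proportion of males watching Program A is 200/300 and the proportion of females is 50/200, a ratio of about 2.7.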

Page 24: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Example
Research Question:
Are adolescent girls more likely to have anxiety/depression?
What do you need?
– One categorical IV (i.e., gender)
– One dichotomous DV (non-depressed = 0 and depressed = 1)

Page 25: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Interpretation of the output
Nagelkerke R²
Is the model significant?
Wald’s Test
– At the parameter level of inference, is the gender variable significant?
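
With a single dichotomous predictor, the logistic regression coefficient equals the natural log of the odds ratio from the 2x2 table, and its standard error is the square root of the summed reciprocal cell counts. The sketch below uses those standard results to form a Wald statistic, (B / SE) squared. Because the presentation gives no data for the depression example, the counts are borrowed from the earlier Program A / Program B table purely for illustration.

```python
import math


def logit_from_2x2(a: int, b: int, c: int, d: int):
    """Coefficient, standard error and Wald statistic for one binary predictor.

    a, b: event / no event in the group coded 1
    c, d: event / no event in the reference group (coded 0)
    """
    coef = math.log((a / b) / (c / d))          # B = ln(odds ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    wald = (coef / se) ** 2                     # compared to chi-square with df = 1
    return coef, se, wald


# Illustrative counts only (reused from the Program A / Program B table).
coef, se, wald = logit_from_2x2(200, 100, 50, 150)
odds_ratio = math.exp(coef)  # Exp(B) in SPSS output
print(round(coef, 3), round(se, 3), round(wald, 2), round(odds_ratio, 2))
```

A Wald statistic above 3.841 (the .05 critical value of chi-square with one degree of freedom) would indicate that the predictor is significant at that level.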

Page 26: Introduction to Multivariate Analysis of Variance, Factor Analysis, and Logistic Regression

Selected References

Pallant, J. (2004). SPSS survival manual: A step by step guide to data analysis using SPSS (2nd ed.). Maidenhead: Open University Press.

Pett, M. A., Lackey, N. R., & Sullivan, J. J. (2003). Making sense of factor analysis: The use of factor analysis for instrument development in health care research. Thousand Oaks, CA: Sage.

Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.