
Multivariate Analysis: Overview

Introduction

Multivariate thinking
◦ A body of thought processes that illuminate the interrelatedness between and within sets of variables

The essence of multivariate thinking is to expose the inherent structure and meaning revealed within these sets of variables through the application and interpretation of various statistical methods

Why the multivariate approach?

Big idea: multiple response outcomes

With univariate analyses we have just one dependent variable (DV) of interest

Although any analysis of data involving more than one variable could be seen as ‘multivariate’, we typically reserve the term for analyses with multiple dependent variables

So MV analysis is an extension of UV analysis or, conversely, many of the UV analyses are special cases of MV ones

Why MV over the univariate approach?

Complexity
◦ The subject/data studied may be more complex than what univariate methods can offer in terms of analysis

Reality
◦ In some cases it would be inappropriate to conduct a univariate analysis because the data/research demand a multivariate analysis

Why MV over the univariate approach?

Experimental data
◦ Although experimental research can be and often is multivariate, typically subjects are assigned to groups and the manipulations are examined for corresponding changes in a single outcome
    ◦ E.g. different doses of caffeine and test performance
    ◦ Causality is more easily deduced

Non-experimental data
◦ Likewise, survey/inventory data might be analyzed in univariate fashion, but typically it will require the multivariate approach to answer the questions stemming from it
    ◦ Correlational

Why not MV?

In the past the computations were overwhelming even with smaller datasets, and so MV analyses were typically avoided

Computation is no longer a problem, but there are still reasons not to do an MV analysis

Why not MV?

Ambiguity
◦ MV analysis may result in a less clear understanding of the data
    ◦ E.g. group differences on a linear combination of DVs (MANOVA)
    ◦ Differences are easily interpreted in a univariate sense
◦ Ambiguity that comes from ignorance of the technique is not a valid reason, however

Unnecessary complexity
◦ Just because SEM looks neat or is popular doesn’t mean you have to do one, or that it is the best way to answer your research question

No free lunch
◦ MV analyses come with their own rules and assumptions that may make the analysis difficult or its conclusions less strong

Multivariate Pros and Cons: Summary

Advantages of using a multivariate statistic
◦ Richer, more realistic design
◦ Looks at phenomena in an overarching way (provides multiple levels of analysis)
◦ Each method differs in the number or type of Independent Variables (IVs) and DVs
◦ Can help control for Type I error

Disadvantages
◦ Larger Ns are often required
◦ More difficult to interpret
◦ Less is known about how robust the methods are to violations of their assumptions

Primary purposes of MV analysis

Prediction and explanation
Determining structure

Prediction

The goal in most research situations is to be able to predict outcomes based on prior information
◦ E.g. given a person’s gender and region, what will their attitude be on some social issue?
◦ Given a number of variables, how well can we predict group membership?

Explanation
◦ Which variables are most important in the prediction of some outcome?
◦ In many cases this is the end goal of an analysis, though a very problematic one

A caveat regarding ‘explanation’

Determining variable importance can be a suspect endeavor

A variable deemed statistically significant might not make the cut if the study were conducted again

Depending on a number of factors, results may be sample specific
◦ i.e. you may not see the same ordering of variables next time

Structure

A different goal in MV analysis is to determine the structure of the data
◦ Is there an underlying dimension that can describe the data in a simpler fashion?

Methods involve classification and/or data reduction

Latent variables (constructs)
◦ Example: the observed variables Giddiness, Silliness, Irrationality, Possessiveness and Misunderstanding reduced to the underlying construct of ‘Love’

Interest may be in reducing variables (factor analysis), in group membership (cluster analysis), in stimulus structure (multidimensional scaling, MDS), etc.

Prediction and Structure

Both prediction and structure may be the goal of an analysis
◦ SEM and path analysis
    ◦ How well does the model fit the data?

Multivariate Themes

Things to consider

Initial variable choice comes down to:
◦ Familiarity with previous research
◦ Instrument used
◦ Expertise with field of study
◦ Common sense

Much of the ‘hard work’ consists of developing a plan of attack and deciding how to study the problem

Initial Examination of Data

Preliminary analysis
◦ A thorough initial examination of the data is necessary for a full understanding of any research
◦ Such initial analyses provide a better grasp of what is happening in the data and may inform the MV analysis to a certain extent

However, if the actual goal is interpretation of the UV analyses (as one often sees with MANOVA), the MV analysis is unwarranted

More to consider

An introduction now; more details as we discuss each method

Assumptions: important for inferences beyond the sample

Normality: a basic assumption of the General Linear Model; concerned with an elliptical pattern of residuals for the data
◦ Skewness: the distribution of scores is tilted (asymmetrical)
    ◦ Direction is established by the tail
    ◦ Greater skewness = less normality
◦ Kurtosis: degree of peakedness of the data
    ◦ 3 types: leptokurtic (thin); mesokurtic (normal); platykurtic (flattened)
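As a rough illustration of the skewness and kurtosis ideas above, the sketch below computes moment-based versions of both. The function names and the toy data are hypothetical, and the formulas are the simple population (divide by n) forms; statistical packages such as SPSS typically report bias-adjusted variants that differ slightly.

```python
from statistics import fmean


def central_moment(xs, k):
    """k-th central moment, dividing by n (population form)."""
    mean = fmean(xs)
    return fmean([(x - mean) ** k for x in xs])


def skewness(xs):
    """Moment-based skewness: 0 for a symmetric distribution, > 0 for a right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal (mesokurtic) shape scores about 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]          # evenly spread, flat (platykurtic) toy data
right_tailed = [1, 1, 2, 2, 3, 12]   # one large value pulls the tail to the right

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```

Negative excess kurtosis corresponds to the platykurtic (flattened) case and positive excess kurtosis to the leptokurtic (thin) case.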

More to consider

Linearity
◦ The data form a relatively straight, elongated oval when plotted

Homoscedasticity
◦ The variance of one variable is equal at all levels of the other variables
    ◦ Understood through standard deviations across variables and scatter plots
◦ Referred to as homogeneity of variance in ANOVA methods

Homogeneity of regression
◦ Regression slopes between covariate and DV are equal across groups of the IV
◦ We do not want the F statistic testing this to be significant; if it is, the assumption for (M)ANCOVA is violated

More to consider

Multicollinearity
◦ The correlation coefficient (r) between predictors is noticeably large
◦ Causes instability in the statistical procedure
◦ Can’t differentiate which variables are contributing to the outcome
◦ Singularity
    ◦ Redundant variables bring the determinant of the matrix to zero

Orthogonality
◦ No association among variables
◦ Not realistic in real world data
◦ May allow greater interpretability versus data that are too related
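The singularity point above can be checked numerically. The sketch below, which assumes NumPy is available and uses made-up toy variables, builds a third predictor that is an exact sum of the other two and shows that the determinant of the correlation matrix collapses to (numerically) zero.

```python
import numpy as np

# Hypothetical toy predictors; x3 is fully redundant with x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x3 = x1 + x2

# Correlation matrices without and with the redundant variable
# (rowvar=False treats columns as variables).
R_two = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)
R_three = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)

det_two = np.linalg.det(R_two)      # clearly above zero: correlated but not redundant
det_three = np.linalg.det(R_three)  # zero up to floating-point error: singular

print(det_two, det_three)
```

With perfectly orthogonal variables the correlation matrix is the identity and its determinant is 1, so the determinant gives a crude sense of where a set of predictors falls between orthogonality and singularity.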

More to consider

Outliers
◦ Affect the mean (inflate/deflate), disguising the true relationship
◦ Distort the data: they create noise (error), which reduces power
◦ Transformations (log or square root) may be helpful with outliers
    ◦ Reshape the distribution, creating a more normal distribution
    ◦ However, you now have a scale with which you are unfamiliar, and results on it cannot be directly generalized back to the original scale
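A small sketch of the transformation point, using hypothetical values that span several orders of magnitude. It shows the log transform removing the skew, and also why the new scale is awkward: back-transforming the mean of the logs gives the geometric mean, not the original arithmetic mean.

```python
from math import log10
from statistics import fmean


def skewness(xs):
    """Moment-based skewness (population form): m3 / m2**1.5."""
    mean = fmean(xs)
    m2 = fmean([(x - mean) ** 2 for x in xs])
    m3 = fmean([(x - mean) ** 3 for x in xs])
    return m3 / m2 ** 1.5


raw = [1, 10, 100, 1000, 10000]     # made-up, strongly right-skewed scores
logged = [log10(x) for x in raw]    # 0, 1, 2, 3, 4: symmetric after the transform

raw_skew = skewness(raw)
log_skew = skewness(logged)

# Back-transforming the mean of the logs does not recover the raw mean.
raw_mean = fmean(raw)
back_transformed = 10 ** fmean(logged)  # geometric mean of the raw scores

print(raw_skew, log_skew)
print(raw_mean, back_transformed)
```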

Some distinctions

Types of data
◦ Nominal/Categorical
◦ Ordinal
◦ Continuous
    ◦ Interval or ratio

The types of variables involved will say much about what analyses are going to be appropriate and/or how one might proceed with a particular analysis

Types of data

One thing to keep in mind is that these distinctions are largely arbitrary

One can dichotomize a continuous measure into categories
◦ A bad idea most of the time

An ordinal measure (e.g. a Likert item) often reflects a construct that actually falls along a continuum

How the data are to be treated is largely left to the researcher
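One way to see why dichotomizing a continuous measure is usually a bad idea is a small simulation. The sketch below uses simulated (not real) data with an assumed true correlation of about 0.5, applies a median split to the predictor, and compares the two correlations; the seed, sample size and variable names are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import fmean, median


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(1)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
# y is built so that its population correlation with x is 0.5.
y = [0.5 * xi + sqrt(0.75) * random.gauss(0, 1) for xi in x]

# Median split: the continuous predictor becomes a high/low category.
cut = median(x)
x_split = [1.0 if xi > cut else 0.0 for xi in x]

r_full = pearson_r(x, y)
r_split = pearson_r(x_split, y)

print(r_full, r_split)
```

The split version carries less information about the predictor, so its correlation with the outcome comes out weaker than the correlation based on the continuous scores.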

Sample vs. Population

In typical research we are rarely dealing with a population

The goal in research is not simply to describe our data but to generalize to the real world

For a variety of (not good) reasons, many analyses and data collection efforts are sample-specific and of little use to the scientific community

Take care in the initial phase of research planning to help guard against such a situation

The linear combination of variables

Whether of IVs or DVs, a linear combination of variables is often necessary to interpret the data
◦ This idea is essential to thinking multivariately

Multiple regression
◦ Finding the linear combination of IVs that best predicts the DV

MANOVA
◦ What linear combination of DVs maximizes the distinction between groups?
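For the multiple regression case, "the linear combination of IVs that best predicts the DV" can be made concrete with ordinary least squares. This is a minimal sketch assuming NumPy and made-up data; "best" here means smallest sum of squared errors, which is the usual criterion but is stated as an assumption since the slides do not spell it out.

```python
import numpy as np

# Hypothetical toy data: two IVs and one DV.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.0, 7.0, 2.0, 9.0, 4.0, 11.0])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares weights: the linear combination of IVs closest to the DV.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ weights
sse_best = float(np.sum((y - fitted) ** 2))

# Any other set of weights predicts worse in the squared-error sense.
other_weights = weights + np.array([0.0, 0.1, -0.1])
sse_other = float(np.sum((y - X @ other_weights) ** 2))

# Multiple R is the correlation between the DV and the fitted combination.
multiple_R = float(np.corrcoef(y, fitted)[0, 1])

print(weights, sse_best, sse_other, multiple_R)
```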

How many variables

Considerations
◦ Cost
◦ Availability
◦ Meaningfulness
◦ Theory

For ease of understanding and efficiency we typically want the fewest variables that will explain the most
◦ Ockham’s razor

Statistical power and effect size

A problem that has plagued the social sciences is the lack of power to find subtle effects

Some multivariate procedures will require relatively large amounts of data (e.g. SEM)

Power and sample size are a required consideration before any attempt at research, multivariate or otherwise

After the fact, emphasis should be placed on effect size and model fit, rather than p-values

More later…
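To give a feel for the sample-size consideration above, the sketch below uses a common normal-approximation formula for comparing two group means, n per group ≈ 2(z_{1-α/2} + z_{power})² / d², where d is the standardized mean difference. The function name and defaults are illustrative assumptions, and exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means.

    Normal approximation only; d is the standardized mean difference.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Subtle effects need far more cases than large ones.
for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))
```

Under these assumptions a medium effect (d = 0.5) needs roughly 63 cases per group, while a small effect (d = 0.2) needs nearly 400.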

The matrices of interest

Data matrix
◦ What you see in SPSS or whatever program you’re using
◦ Includes the cases and their corresponding values for the variables of interest

Correlation matrix - R
◦ Contains information about the linear relationship between variables
    ◦ Standardized covariance
◦ Symmetrical
◦ Square
◦ Typically only the bottom portion is shown, as the top portion is its mirror image and the diagonal contains all ones (each variable is perfectly correlated with itself)

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y}$$

The matrices of interest

Variance/Covariance matrix - Σ
◦ Square and symmetrical
◦ Variance of each variable is on the diagonal, covariances with other variables on the off-diagonals

In some cases you will have the option to use correlations or covariances as the unit of analysis, with some debate about which is better under what circumstances

The matrices of interest

Sum of squares and cross-products matrix - S

Precursor to the Variance/Covariance matrix (the values before division by N-1)

On the diagonal is a variable’s sum of the squared deviations from its mean

Off-diagonal elements are the sum of the products of the deviation scores for the two variables
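The chain from data matrix to S, Σ and R can be sketched in a few lines. This assumes NumPy and a made-up 4 x 2 data matrix; the variable names simply mirror the symbols used above.

```python
import numpy as np

# Hypothetical data matrix: 4 cases (rows) by 2 variables (columns).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0],
              [8.0, 8.0]])
n = X.shape[0]

# Deviation scores: each value minus its variable's mean.
D = X - X.mean(axis=0)

# S: sums of squares on the diagonal, cross-products off the diagonal.
S = D.T @ D

# Sigma: S divided by N-1 gives variances and covariances.
Sigma = S / (n - 1)

# R: each covariance divided by the product of the two standard deviations.
sd = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(sd, sd)

print(S)
print(Sigma)
print(R)
```

The results agree with NumPy's built-in `np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)`, which is a quick way to check the hand computation.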

Methods of analysis

A host of methods are available to the researcher

The kind of question asked will help guide one in choosing the appropriate analysis; however, the data may be amenable to multiple methods, and almost always are

Degree of relationship

Bivariate r
◦ The degree of linear relationship between two variables
◦ Partial and semi-partial

Multiple R
◦ The relationship of a set of variables to another (dependent) variable

Canonical R
◦ The granddaddy
◦ Relationship between sets of variables

Methods are also available to assess the relationship among non-continuous variables
◦ E.g. chi-square, multiway frequency analysis

Group Differences

A very popular research question in the social sciences (too popular really)

Is group A different from B?
◦ The answer is always yes, and with a large enough sample, statistically significantly so

ANOVA and related methods
MANOVA, the multivariate counterpart
Repeated measures
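The claim that a large enough sample makes any group difference statistically significant can be illustrated with a simple z test. The sketch below assumes two equal-sized groups, a known common standard deviation, and a tiny hypothetical standardized difference of 0.05; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_group_p_value(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d (z test, equal n)."""
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


tiny_effect = 0.05  # a trivially small difference between groups A and B
for n in (100, 1000, 10000):
    print(n, two_group_p_value(tiny_effect, n))
```

The effect is the same size in every row; only the sample size changes, and by 10,000 cases per group the p-value falls well below .05.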

Predicting group membership

Turning the group difference question the other way around

Discriminant function analysis
Logistic regression

Structure

Data reduction and classification

Cluster analysis
◦ Seeks to identify homogeneous subgroups of cases or variables based on some measure of ‘distance’
◦ Identifies a set of groups in which within-group variation is minimized and between-group variation is maximized

Principal components and factor analysis
◦ Reduce a large number of variables to a smaller number
◦ Often used in psychology for the development of inventories

Structural equation modeling
◦ Where factor analysis and regression meet
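The within-group versus between-group criterion mentioned for cluster analysis can be shown with a toy comparison of two candidate groupings. This is not a clustering algorithm, only a sketch of the quantity such algorithms try to optimize, using made-up one-dimensional scores and hypothetical function names.

```python
from statistics import fmean


def within_between(groups):
    """Return (within-group, between-group) sums of squares for a grouping."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    within = sum(sum((v - fmean(g)) ** 2 for v in g) for g in groups)
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    return within, between


# Two ways of splitting the same six scores into two groups.
tight = [[1, 2, 3], [10, 11, 12]]   # similar cases grouped together
mixed = [[1, 2, 10], [3, 11, 12]]   # dissimilar cases grouped together

w_tight, b_tight = within_between(tight)
w_mixed, b_mixed = within_between(mixed)

print(w_tight, b_tight)
print(w_mixed, b_mixed)
```

The total variation is the same for both groupings, so lowering within-group variation necessarily raises between-group variation, which is why the two criteria are stated together.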

Time course of events

How long is it before some event occurs?

How does a DV change over the course of time?

The former question can be answered with survival/failure analysis
◦ Survival rates for disease
◦ Time before failure for a particular electronic part

The latter is often examined with time-series analysis
◦ Many time periods are available for analysis
    ◦ E.g. monthly stock prices over the past five years
◦ Popular in the economics realm

Decision tree

[Figure: decision tree for choosing an analysis]

Although such guides may be useful, as mentioned before, multiple analyses may be appropriate for the data under consideration

The best plan of attack is to have a well-defined research question, and collect data appropriate to the analysis that will best answer that question

Multivariate Methods: Quick Glance

Organizational chart based on type of research focus (group differences or correlational)

Research Focus      IVs: Number & Scale            DVs: Number & Scale   Multivariate Method
Group Differences   1+ categorical & continuous    1 continuous          ANCOVA
                    1+ categorical                 2+ continuous         MANOVA
                    2+ continuous                  1+ categorical        DFA
                    1+ categorical or continuous   1 categorical         LR
Correlational       2+ continuous                  1 continuous          MR
                    2+ continuous                  2+ continuous         CC
                    -                              2+ continuous         PCA & FA

Note: Scale and number of independent (IV) and dependent (DV) categorical or continuous variables. + indicates "or more" (e.g. 1+ = 1 or more); ANCOVA = Analysis of Covariance; MANOVA = Multivariate Analysis of Variance; DFA = Discriminant Function Analysis; LR = Logistic Regression; MR = Multiple Regression; CC = Canonical Correlation; PCA/FA = Principal Components/Factor Analysis

Summary of Methods

The multivariate methods we will look at are a set of tools for analyzing multiple variables in an integrated and powerful way.

They allow the examination of richer and perhaps more realistic designs than can be assessed with traditional univariate methods that only analyze one outcome variable and usually just one or two independent variables (IVs)

Compared to univariate methods, multivariate methods allow us to analyze a complex array of variables, providing greater assurance that we can come to some synthesizing conclusions with less error and more validity than if we were to analyze variables in isolation.
