Multivariate Analysis of Variance


Page 1: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

بسم الله الرحمن الرحیم (In the name of Allah, the Most Gracious, the Most Merciful)

Page 2: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Multivariate Analysis of Variance

Page 3: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Definition

– MANOVA is the multivariate extension of ANOVA, in which there is more than one DV.

– independent variables: categorical

– dependent variables: continuous

– MANOVA is also considered a valid alternative to the repeated-measures ANOVA when sphericity is violated.

Page 4: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Example research questions

A medical researcher is interested in determining whether the treatment method (experimental drugs vs. traditional drugs vs. control) and gender have any effect on blood pressure, cholesterol, tension, and stress levels.

independent variables (factors):
1. treatment method (experimental drugs vs. traditional drugs vs. control)
2. gender

dependent variables:
1. blood pressure
2. cholesterol
3. tension
4. stress levels
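
As a rough illustration of how such a design could be set up in software (not part of the original slides), here is a sketch using Python's statsmodels; the data file and column names (treatment, gender, bp, chol, tension, stress) are assumptions made for illustration.

```python
# Hypothetical MANOVA for the example above: 2 factors, 4 DVs.
# The file name and column names are illustrative assumptions.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("patients.csv")  # assumed columns: treatment, gender, bp, chol, tension, stress

fit = MANOVA.from_formula("bp + chol + tension + stress ~ treatment + gender", data=df)
print(fit.mv_test())  # multivariate tests (Wilks, Pillai, ...) for each effect
```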

Page 5: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

The dependent variables in MANOVA need to conform to the parametric assumptions.

Generally, it is better not to place highly correlated dependent variables in the same model for two main reasons. First, it does not make scientific sense to place into a model two or three dependent variables which the researcher knows measure the same aspect of outcome.

The second reason for trying to avoid including highly correlated dependent variables is that the correlation between them can reduce the power of the tests. If MANOVA is being used to reduce multiple testing, this loss in power needs to be considered as a trade-off for the reduction in the chance of a Type I error occurring.

Page 6: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

When we extended one-way ANOVA to Factorial ANOVA, we added one or more independent variables to a design.

When we extend a one-way ANOVA to a Multivariate ANOVA (MANOVA), we add one or more dependent variables to a design.

A one-factor MANOVA consists of one independent variable (treatment) and tests two or more dependent variables (measurements).

A multi-factor MANOVA tests two or more independent variables against two or more dependent variables (i.e., combines factorial and multivariate designs).

Page 7: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance
Page 8: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

MANOVA Model

– The MANOVA model looks at a set of normally distributed DVs as they are influenced by one or more IVs. The basic model is:

Y1 + Y2 + Y3 + … + Yn = X1 + X2 + X3 + … + Xn

– MANOVA builds a function (a new variable, or axis) on the DV side that maximally separates the groups (maximize SSb/SSw = lambda, the characteristic roots).
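
As a sketch of that idea (mine, not the presentation's), the characteristic roots can be obtained from the between- and within-group sum-of-squares-and-cross-products (SSCP) matrices; the toy data and all names below are illustrative.

```python
# Sketch: the "new axis" MANOVA builds is the eigenvector of W^-1 B with the
# largest eigenvalue, where B and W are the between- and within-group SSCP
# matrices. Toy data: 3 treatment groups, 3 DVs.
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1.0, size=(20, 3))          # 20 cases, 3 DVs per group
          for m in ([0, 0, 0], [1, 0.5, 0], [0.5, 1, 1])]     # 3 treatment groups

grand_mean = np.vstack(groups).mean(axis=0)
B = np.zeros((3, 3))   # between-group SSCP
W = np.zeros((3, 3))   # within-group SSCP
for g in groups:
    d = (g.mean(axis=0) - grand_mean)[:, None]
    B += g.shape[0] * d @ d.T
    W += (g - g.mean(axis=0)).T @ (g - g.mean(axis=0))

# Eigenvalues of W^-1 B are the characteristic roots; the leading eigenvector
# holds the DV weights of the discriminant function that best separates groups.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
roots = np.sort(eigvals.real)[::-1]
print("characteristic roots:", roots)
print("Wilks' lambda:", np.prod(1 / (1 + roots)))
```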

Page 9: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Description

MANOVA creates a linear combination of the dependent variables and then tests for differences in the new variable using methods similar to ANOVA. The independent variable used to group the cases is categorical. MANOVA tests whether the categorical variable explains a significant amount of variability in the new dependent variable.

Page 10: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

How the method works

A new variable is created that combines all the dependent variables on the left-hand side of the equation such that the differences between group means are maximized (the F-statistic from ANOVA is maximized, that is, the ratio of explained variance to error variance). The simplest significance test treats the first new variable just like a single dependent variable in ANOVA and uses the same tests as ANOVA. Additional multivariate tests can also be computed that involve multiple new variables derived from the initial set of dependent variables.

Page 11: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

WHY DO MANOVA?

Advantages

1. tests the effects of several independent variables and several outcome (dependent) variables within a single analysis (multiple DVs are analyzed simultaneously)
2. the Type I error rate is lower than performing multiple ANOVAs (see the sketch below)
3. it may reveal differences not shown in separate ANOVAs
4. interpretive advantages over a series of univariate ANOVAs
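
A quick back-of-the-envelope check of advantage 2, assuming four DVs each tested at alpha = .05:

```python
# Familywise Type I error when running one ANOVA per DV instead of one MANOVA.
alpha, n_dvs = 0.05, 4
familywise = 1 - (1 - alpha) ** n_dvs
print(f"{n_dvs} separate ANOVAs at alpha={alpha}: familywise error ~ {familywise:.3f}")  # ~0.185
```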

Page 12: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Limitations

1. like discriminant analysis, the assumptions on which it is based are numerous and difficult to assess and meet

2. the number of cases in the smallest cell should be larger than the total number of DVs

3. sensitive to outliers for small N

4. it assumes a linear relationship (some sort of correlation) between the DVs

5. by adding one more DV, you lose 1 df (power decreases), so if DVs are correlated, the second DV does not add any unique variance

Page 13: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Assumptions

1. sample size - rule of thumb: the sample in each cell must be greater than the number of dependent variables

2. univariate and multivariate normality (i.e., any linear combination of the dependent variables must follow a normal distribution); when cell size > 30 this is less important. Multivariate normality implies that the sampling distributions of means of the various DVs in each cell, and all linear combinations of them, are normally distributed. These requirements are rarely if ever tested in practice; MANOVA is assumed to be a robust test that can stand up to departures from multivariate normality in terms of Type I error rate.

Page 14: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Univariate F is robust to modest violations of normality as long as there are at least 20 degrees of freedom for error in a univariate ANOVA and the violations are not due to outliers.

Even with unequal n and only a few DVs, a sample size of about 20 in the smallest cell should ensure robustness.

In Monte Carlo studies, Seo, Kanda, and Fujikoshi (1995) have shown robustness to nonnormality in MANOVA with overall N = 40 (n = 10 per group).

Page 15: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

3. linearity - linear relationships among all pairs of dependent variables (check pairwise relationships among the DVs for nonlinear relationships using scatter plots)

4. homogeneity of regression - covariates must have a homogeneity of regression effect (must have equal effects on the dependent variable across the groups)

5. homogeneity of the variance-covariance matrices (Box's M) - in ANOVA we talked about the need for the variances of the dependent variable to be equal across levels of the independent variable; in MANOVA, the variance-covariance matrices must be equal for all treatment groups.

Page 16: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

The amount of variance within each group needs to be comparable so that it can be assumed that the groups have been drawn from a similar population.

Furthermore it is assumed that these results can be pooled to produce an error value which is representative of all the groups in the analysis.

If there is a large difference in the amount of error within each group the estimated error measure for the model will be misleading.

This homogeneity assumption is tested with a test that is similar to Levene's test in the ANOVA case. It is called Box's M, and it works in much the same way: it tests the null hypothesis that the covariance matrices of the dependent variables are equal across levels of the independent variable.
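
For reference, Box's M can be computed directly from the group covariance matrices. The sketch below uses the standard chi-square approximation; the helper function and the toy data are illustrative, not from the slides.

```python
# Sketch of Box's M test for equality of covariance matrices, computed by hand
# with the usual chi-square approximation. Toy data: 3 groups, 4 DVs.
import numpy as np
from scipy import stats

def box_m(groups):
    """groups: list of (n_i x p) arrays of DV scores, one array per group."""
    k = len(groups)
    p = groups[0].shape[1]
    n = np.array([g.shape[0] for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]                      # S_i
    pooled = sum((ni - 1) * S for ni, S in zip(n, covs)) / (n.sum() - k)  # S_pooled
    M = (n.sum() - k) * np.log(np.linalg.det(pooled)) \
        - sum((ni - 1) * np.log(np.linalg.det(S)) for ni, S in zip(n, covs))
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))) * \
        (np.sum(1 / (n - 1)) - 1 / (n.sum() - k))
    chi2 = (1 - c) * M
    df = p * (p + 1) * (k - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)

rng = np.random.default_rng(1)
groups = [rng.normal(size=(30, 4)) for _ in range(3)]
chi2, df, pval = box_m(groups)
print(f"Box's M chi2 = {chi2:.2f}, df = {df:.0f}, p = {pval:.3f}")
```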

Page 17: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

– the F test from Box's M statistic should be interpreted cautiously, in that a significant result may be due to violation of the multivariate normality assumption, and a nonsignificant result may be due to small sample size and lack of power

– fairly robust if sample sizes are equal

– If Box's M is significant, it means you have violated an assumption of MANOVA. This is not much of a problem if you have equal cell sizes and large N; it is a much bigger issue with small sample sizes and/or unequal cell sizes. (In factorial ANOVA with unequal cell sizes, the sums of squares for the three sources, the two main effects and the interaction effect, won't add up to the total SS.)

Page 18: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

6. The observations must be independent (i.e., responses among groups of respondents should not be correlated).

Subjects' scores on the dependent measures should not be influenced by or related to the scores of other subjects in the same condition or level. This can be tested with an intraclass correlation coefficient if lack of independence of observations is suspected.

7. multicollinearity and singularity - when there is high correlation among the dependent variables, one dependent variable becomes a near-linear combination of the others. Under such circumstances it is statistically redundant, and suspect, to include both.

8. outliers - MANOVA is very sensitive to the effect of outliers because they impact the Type I error rate (use Mahalanobis distance to check for multivariate outliers; see the sketch below).
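
A sketch of the Mahalanobis-distance check mentioned in point 8, using made-up data and the common p < .001 chi-square cutoff:

```python
# Squared Mahalanobis distances of each case from the centroid are compared
# against a chi-square critical value with p (number of DVs) degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))          # 50 cases, 4 DVs (one group shown for brevity)

diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)   # squared Mahalanobis distances

cutoff = stats.chi2.ppf(1 - 0.001, df=X.shape[1])    # conventional .001 criterion
print("potential multivariate outliers:", np.where(d2 > cutoff)[0])
```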

Page 19: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Testing for significance across groups

Page 20: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Testing for significance across groups

1. Roy's greatest characteristic root

2. Wilks' lambda (U)

3. Hotelling's trace

4. Pillai's criterion

these criteria assess differences across "dimensions" of the dependent variables

Page 21: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

1. Roy's greatest characteristic root

- tests for differences on only the first discriminant function

- most appropriate when dependent measures are strongly interrelated on a single dimension

- highly sensitive to violation of assumptions

- most powerful when all assumptions are met

Page 22: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

2. Wilks' lambda

- most commonly used statistic for overall significance
- considers differences over all the characteristic roots
- the smaller the value of Wilks' lambda, the larger the between-groups dispersion
- use if assumptions appear to be met

Page 23: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

3. Hotelling's trace

- considers differences over all the characteristic roots
- safely ignored in most cases

Page 24: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

4. Pillai's criterion

- considers differences over all the characteristic roots
- more robust than Wilks'; should be used when sample size decreases, cell sizes are unequal, or homogeneity of covariances is violated
- most robust when assumptions are not met
- has adequate power to detect true differences under different conditions
- highly recommended test
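
All four criteria are simple functions of the characteristic roots (the eigenvalues of W^-1 B, as in the earlier sketch); a minimal illustration with made-up roots:

```python
# The four multivariate test statistics expressed in terms of the roots.
import numpy as np

roots = np.array([0.9, 0.2])              # illustrative eigenvalues of W^-1 B

wilks     = np.prod(1 / (1 + roots))      # Wilks' lambda
pillai    = np.sum(roots / (1 + roots))   # Pillai's criterion (trace)
hotelling = np.sum(roots)                 # Hotelling's trace
roy       = roots.max()                   # Roy's greatest root (sometimes reported as max/(1+max))
print(wilks, pillai, hotelling, roy)
```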

Page 25: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

evaluating strength of the effects - effect size

univariate ANOVA: eta-squared gives the proportion of variance in the dependent variable that is attributed to the different levels of the significant independent variable.

multivariate ANOVA: multivariate eta-squared = 1 – Wilks' lambda. Wilks' lambda reflects the ratio of within-group variance across all discriminant functions to the total variance across all discriminant functions.

Similar to R2 in MRC (multiple regression/correlation): the percentage of variance of Y accounted for by the IVs.

Page 26: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Multivariate analysis of covariance (MANCOVA) is the multivariate extension of ANCOVA. MANCOVA asks whether there are statistically significant mean differences among groups after adjusting the newly created DV for differences on one or more covariates. When a covariate is incorporated into a MANOVA it is usually referred to as a MANCOVA model.

The 'best' covariate for inclusion in a model should be highly correlated with the dependent variables but not related to the independent variables. The dependent variables included in a MANCOVA are adjusted for their association with the covariate. Some experimenters include baseline data as a covariate to control for individual differences in scores, since even randomisation to different experimental conditions does not completely control for individual differences.

Example: suppose that before treatment subjects are pretested on test anxiety, minor stress anxiety, and free-floating anxiety. When the pretest scores are used as covariates, MANCOVA asks if mean anxiety on the composite score differs in the three treatment groups, after adjusting for pre-existing differences in the three types of anxiety.
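
A minimal MANCOVA sketch for the anxiety example, again with statsmodels: one common way to fit it is simply to add the covariates to the right-hand side of the formula. The column names (anx_test, anx_stress, anx_free and their pre_* pretest counterparts) are assumptions.

```python
# Hypothetical MANCOVA: the pretest scores enter as covariates alongside treatment.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("anxiety.csv")  # hypothetical data set with the columns below

fit = MANOVA.from_formula(
    "anx_test + anx_stress + anx_free ~ treatment + pre_test + pre_stress + pre_free",
    data=df,
)
print(fit.mv_test())  # the treatment effect is now adjusted for the pretest covariates
```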

Page 27: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Sum of Squares

The sum of squares measure found in a MANOVA, like that reported in an ANOVA, is the measure of the squared deviations from the mean, both within and between the groups defined by the independent variable.

In MANOVA, the sums of squares are adjusted for the covariance among the dependent variables.

Page 28: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

There are six different methods of calculating the sum of squares.

Type I, hierarchical or sequential sums of squares, is appropriate when the groups in the MANOVA are of equal sizes.

Type I sum of squares provides a breakdown of the sums of squares for the whole model used in the MANOVA but it is particularly sensitive to the order in which the independent variables are placed in the model. If a variable is entered first, it is not adjusted for any of the other variables; if it is entered second, it is adjusted for one other variable (the first one entered); if it is placed third, it will be adjusted for the two other variables already entered.
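
The order sensitivity of Type I sums of squares is easy to see with statsmodels, illustrated here with a univariate model and made-up, unbalanced data (the same idea carries over to the multivariate case):

```python
# Type I (sequential) SS depend on entry order; Type III do not.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "a": rng.choice(["a1", "a2"], size=60, p=[0.7, 0.3]),   # unbalanced factor
    "b": rng.choice(["b1", "b2", "b3"], size=60),
})
df["y"] = rng.normal(size=60) + (df["a"] == "a2") * 1.0

fit_ab = smf.ols("y ~ C(a) + C(b)", data=df).fit()
fit_ba = smf.ols("y ~ C(b) + C(a)", data=df).fit()
print(sm.stats.anova_lm(fit_ab, typ=1))   # a unadjusted, b adjusted for a
print(sm.stats.anova_lm(fit_ba, typ=1))   # b unadjusted, a adjusted for b

fit_sum = smf.ols("y ~ C(a, Sum) + C(b, Sum)", data=df).fit()
print(sm.stats.anova_lm(fit_sum, typ=3))  # Type III: each term adjusted for the others
```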

Page 29: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Type II, the partially sequential sum of squares, has the advantage over Type I in that it is not affected by the order in which the variables are entered. It displays the sum of squares after controlling for the effects of the other main effects and interactions, but is only robust where there are equal numbers of participants in each group.

Page 30: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Type III sum of squares can be used in models where there are uneven group sizes, although there needs to be at least one participant in each cell. It calculates the sum of squares after the independent variables have all been adjusted for the inclusion of all other independent variables in the model.

Page 31: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Type IV sum of squares can be used when there are empty cells in the model but it is generally thought more suitable to use Type III sum of squares under these conditions since Type IV is not thought to be good at testing lower order effects.

Page 32: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Type V has been developed for use where there are cells with missing data. It has been designed to examine the effects according to the degrees of freedom which are available and if the degrees of freedom fall below a given level these effects are not taken into account. The cells which remain in the model have at least the degrees of freedom the full model would have without any cells being excluded. For those cells which remain in the model the Type III sum of squares are calculated. However, the Type V sum of squares are sensitive to the order in which the independent variables are placed in the model and the order in which they are entered will determine which cells are excluded.

Page 33: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Type VI sum of squares is used for testing hypotheses where the independent variables are coded using negative and positive signs e.g. +1 = male, -1 = female.

Type III sum of squares is the most frequently used as it has the advantages of Types IV, V and VI without the corresponding restrictions.

Page 34: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Principal components analysis

Principal components analysis is a technique for forming new variables which are linear composites of the original variables

The objective of principal components analysis is to reduce the number of variables to a few components such that each component forms a new variable and the number of retained components explains the maximum amount of variance in the data.

Principal components analysis can be viewed as a dimension-reduction technique.
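
A minimal principal components sketch with scikit-learn; the data and the choice of two retained components are illustrative assumptions:

```python
# PCA: form new composite variables (components) that retain maximum variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))            # 100 cases, 6 original variables

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)                # the new composite variables
print(pca.explained_variance_ratio_)     # share of total variance each component keeps
```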

Page 35: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

factor analysis

Factor analysis can also be viewed as a dimension-reduction technique.

The objective of factor analysis is to search for or identify the underlying factor(s) or latent constructs that can explain the intercorrelations among the variables:
- find statistically independent variables
- reduce the dimensionality of the data

There are two major differences. First, principal components analysis places emphasis on explaining the variance in the data; the objective of factor analysis is to explain the correlation among the indicators. Second, in principal components analysis the variables form an index; in factor analysis, the variables or indicators reflect the presence of unobservable construct(s) or factor(s).
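
A matching factor-analysis sketch with scikit-learn; unlike PCA it models the shared (common) variance among the indicators. Data and the number of factors are again illustrative:

```python
# Factor analysis: estimate loadings of latent factors on observed indicators.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))                 # 100 cases, 6 observed indicators

fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_)                         # loadings: 2 factors x 6 indicators
```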

Page 36: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Discriminant Function Analysis (DFA)

Description: DFA uses a set of independent variables (IV's) to separate cases based on groups you define; the grouping variable is the dependent variable (DV) and it is categorical. DFA creates new variables based on linear combinations of the independent set that you provided. These new variables are defined so that they separate the groups as far apart as possible.

How well the model performed is usually reported in terms of the classification efficiency, that is, how many cases would be correctly assigned to their groups using the new variables from DFA. The new variables can also be used to classify a new set of cases.

Page 37: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

How the method works

DFA creates a new variable from the independent variables. This new variable defines a line onto which the group centers would plot as far apart as possible from each other. In other words, this new variable is defined such that it provides the maximum separation between the groups of cases. This process repeats with successive new variables that further separate the group centers.
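
A sketch of this with scikit-learn's linear discriminant analysis; the iris data set is used purely as a convenient stand-in for "a grouping variable plus a set of metric predictors":

```python
# DFA/LDA: derive discriminant functions and report classification accuracy.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                       # 4 predictors, 3 groups
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)

scores = lda.transform(X)                               # cases projected onto the 2 discriminant functions
print("projected shape:", scores.shape)
print("classification accuracy:", lda.score(X, y))      # classification efficiency
```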

Page 38: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Differences Between MANOVA and Discriminant Analysis

-In statistical testing MANOVA employs a discriminant function, which is the variate of dependent variables that maximizes the difference between groups

- Discriminant analysis employs a single nonmetric variable as the dependent variable. The categories of the dependent variable are assumed as given, and the independent variables are used to form variates that maximally differ between the groups formed by the dependent variable categories.

-MANOVA uses the set of metric variables as dependent variables, and the objective becomes finding groups of respondents that exhibit differences on the set of dependent variables.

Page 39: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

-The dependent variables in MANOVA (a set of metric variables) are the independent variables in discriminant analysis, and the single nonmetric dependent variable of discriminant analysis becomes an independent variable in MANOVA. The groups of respondents are not prespecified; instead, the researcher uses one or more independent variables (nonmetric variables) to form groups. MANOVA, even while forming these groups, still retains the ability to assess the impact of each nonmetric variable separately.

Moreover, both use the same methods in forming the variates and assessing the statistical significance between groups.

The differences, however, center around the objectives of the analyses and the role of the nonmetric variable(s).

Page 40: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

MULTIVARIATE LINEAR REGRESSION MODELS

Regression analysis is the statistical methodology for predicting values of one or more response (dependent) variables from a collection of predictor (independent) variable values. It can also be used for assessing the effects of the predictor variables on the responses.
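
A minimal sketch of a multivariate linear regression (several responses regressed on the same predictors) with plain least squares and made-up data:

```python
# Multivariate regression: one column of coefficients per response variable.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 3))                            # 80 cases, 3 predictors
B_true = rng.normal(size=(3, 2))
Y = X @ B_true + rng.normal(scale=0.1, size=(80, 2))    # 2 response variables

X1 = np.column_stack([np.ones(80), X])                  # add an intercept column
B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)
print(B_hat.shape)                                      # (4 coefficients) x (2 responses)
```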

Page 41: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Canonical correlation

- Canonical correlation is the appropriate technique for identifying relationships between two sets of variables. If it is known that one set of variables is the predictor or independent set and another set of variables is the criterion or dependent set, then the objective of canonical correlation analysis is to determine whether the predictor set of variables affects the criterion set of variables.

- However, it is not necessary to designate the two sets of variables as the dependent and independent sets. In such cases the objective is simply to ascertain the relationship between the two sets of variables.

- Canonical correlation analysis is also a data reduction technique.

- An additional objective of canonical correlation is to determine the minimum number of canonical correlations needed to adequately represent the association between the two sets of variables.
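
A minimal canonical correlation sketch with scikit-learn; the two variable sets and the number of canonical pairs retained are illustrative assumptions:

```python
# CCA: find pairs of variates, one per set, with maximal correlation.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))                               # predictor set
Y = 0.5 * X[:, :3] + rng.normal(scale=1.0, size=(100, 3))   # criterion set, related to X

cca = CCA(n_components=2).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)                              # canonical variates for each set
print([np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1] for i in range(2)])  # canonical correlations
```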

Page 42: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Most of the dependence methods are special cases of canonical correlation analysis.

MANOVA and multiple-group discriminant analysis are also special cases of canonical correlation analysis.

When the criterion variables are dummy variables representing multiple groups, canonical correlation analysis reduces to multiple-group discriminant analysis. When the predictor variables are dummy variables representing the groups formed by the various factors, canonical correlation analysis reduces to MANOVA. In fact, SPSS does not have a separate procedure for canonical correlation analysis; rather, one has to use MANOVA for canonical correlation analysis.

Page 43: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

logistic regression

logistic regression is normally recommended when the independent variables do not satisfy the multivariate normality assumption.

Discriminant analysis assumes that the data come from a multivariate normal distribution, whereas logistic regression analysis makes no such distributional assumptions.

Since the multivariate normality assumption will clearly be violated for a mixture of categorical and continuous variables, we suggest that in such cases one should use logistic regression analysis.

In the case where there are no categorical variables, logistic regression should be used when the multivariate normality assumption is violated.

Discriminant analysis should be used when the multivariate normality assumption is not violated, because discriminant analysis is computationally more efficient.

Page 44: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

how to interpret MANOVA results

If the treatments result in statistically significant differences in the vector of dependent variable means, the researcher then examines the results to understand how each treatment impacts the dependent measures.

Three steps are involved: (1) interpreting the effects of covariates, if included;

(2) assessing which dependent variable(s) exhibited differences across the groups of each treatment;

(3) identifying if the groups differ on a single dependent variable or the entire dependent variate.

Page 45: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

When a significant effect is found, we say that there is a main effect, meaning that there are significant differences between the dependent variables of the two or more groups defined by the treatment. With two levels of the treatment, a significant main effect ensures that the two groups are significantly different. With three or more levels, however, a significant main effect does not guarantee that all three groups are significantly different, instead just that there is at least one significant difference between a pair of groups.

If there is more than one treatment in the analysis, the researcher must examine the interaction terms to see whether they are significant and, if so, whether they allow for an interpretation of the main effects.

If there are more than two levels for a treatment, then the researcher must perform a series of additional tests between the groups to see which pairs of groups are significantly different.

Page 46: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Describe the purpose of post hoc tests in MANOVA

Although the multivariate tests of MANOVA enable us to reject the null hypothesis that the groups' means are all equal, they do not pinpoint where the significant differences lie if there are more than two groups. Multiple t tests without any form of adjustment are not appropriate for testing the significance of differences between the means of paired groups, because the probability of a Type I error increases with the number of intergroup comparisons made (similar to the problem of using multiple univariate ANOVAs versus MANOVA). If the researcher wants to systematically examine group differences across specific pairs of groups for one or more dependent measures, two types of statistical tests should be used: post hoc and a priori. Post hoc tests examine the dependent variables between all possible pairs of group differences that are tested after the data patterns are established.

Page 47: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Interpret interaction results when more than one independent variable is used in MANOVA

The interaction term represents the joint effect of two or more treatments. Any time a research design has two or more treatments, the researcher must first examine the interactions before any statement can be made about the main effects. Interaction effects are evaluated with the same criteria as main effects. If the statistical tests indicate that the interaction is nonsignificant, this denotes that the effects of the treatments are independent. Independence in factorial designs means that the effect of one treatment (i.e., group differences) is the same for each level of the other treatment(s) and that the main effects can be interpreted directly. If the interactions are deemed statistically significant, it is critical that the researcher identify the type of interaction (ordinal versus disordinal), because this has a direct bearing on the conclusions that can be drawn from the results.

Page 48: بسم الله الرحمن الرحیم.. Multivariate Analysis of Variance

Ordinal interaction occurs when the effects of a treatment are not equal across all levels of another treatment, but the group difference(s) is always in the same direction. Disordinal interaction occurs when the differences between levels "switch" depending on how they are combined with levels from another treatment; here the effects of one treatment are positive for some levels and negative for other levels of the other treatment.