

Multivariate Analysis of Variance (MANOVA)

Description: MANOVA creates a linear combination of the dependent variables and then tests for differences in this new variable using methods similar to ANOVA. The independent variable used to group the cases is categorical. MANOVA tests whether the categorical variable explains a significant amount of the variability in the new dependent variable.

How the method works: A new variable is created that combines all the dependent variables on the left-hand side of the equation so that the differences between group means are maximized (that is, the F-statistic from ANOVA, the ratio of explained variance to error variance, is maximized). The simplest significance test treats this first new variable just like a single dependent variable in ANOVA and uses the same tests. Additional multivariate tests can also be computed that involve multiple new variables derived from the initial set of dependent variables.

Assumptions/limitations: MANOVA can require rather large sample sizes for complicated models because the number of cases in each category must be larger than the number of dependent variables. MANOVA also prefers groups of roughly equal size. In addition, MANOVA expects that the variances of the dependent variables, and the correlations between them, are similar within groups.
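As an illustration (a minimal sketch, not part of the original notes; the toy data and variable names are invented), a one-way MANOVA with two dependent variables and a three-level grouping factor can be run with statsmodels:

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 60  # cases per group; MANOVA needs more cases per group than dependent variables

# Toy data: two dependent variables, three groups with shifted means.
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], n),
    "y1": np.concatenate([rng.normal(m, 1.0, n) for m in (0.0, 0.5, 1.0)]),
    "y2": np.concatenate([rng.normal(m, 1.0, n) for m in (0.0, 0.3, 0.8)]),
})

# Both dependent variables go on the left-hand side; the categorical factor on the right.
fit = MANOVA.from_formula("y1 + y2 ~ group", data=df)
print(fit.mv_test())  # Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, Roy's root
```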


Discriminant Function Analysis (DFA)

Description: DFA uses a set of independent variables (IVs) to separate cases into groups that you define; the grouping variable is the dependent variable (DV) and it is categorical. DFA creates new variables based on linear combinations of the independent variables you provide. These new variables are defined so that they separate the groups as far apart as possible. How well the model performs is usually reported in terms of classification efficiency, that is, how many cases would be correctly assigned to their groups using the new variables from DFA. The new variables can also be used to classify a new set of cases.

How the method works: DFA creates a new variable from the independent variables. This new variable defines a line onto which the group centers would plot as far apart from each other as possible. In other words, the new variable is defined such that it provides the maximum separation between the groups of cases. The process repeats with successive new variables that further separate the group centers.

Assumptions/limitations: Like most multivariate techniques, DFA is sensitive to outliers and assumes multivariate normality. It also assumes that the within-group variability of the independent variables is similar across groups. Classification efficiency is often tested by plugging in the same data used to fit the model; using the same data to both fit and test the model guarantees an overestimate of the classification skill of the DFA model. One way to avoid this problem is to jackknife the sample: leave out one case at a time, fit the model on the remaining cases, and use the result to classify the left-out case.
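A sketch of this workflow (not from the original notes), using scikit-learn's LinearDiscriminantAnalysis as the DFA and leave-one-out cross-validation as the jackknife; the iris data set stands in for real data:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)   # 4 independent variables, 3 groups

lda = LinearDiscriminantAnalysis().fit(X, y)

# Discriminant scores: new variables that maximize the separation between group centers.
scores = lda.transform(X)
print("number of discriminant functions:", scores.shape[1])

# Resubstitution accuracy (same cases used to fit and test) overestimates skill...
print("resubstitution accuracy:", lda.score(X, y))

# ...so jackknife it: leave out one case, refit, and classify the left-out case.
loo = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", loo.mean())
```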


Independent Component Analysis (ICA) and Mutual Information (MI)

Helsinki University of Technology, www.cis.hut.fi/projects/ica/
Hyvärinen, A. and E. Oja, 1999. Independent Component Analysis: A Tutorial. http://www.cis.hut.fi/aapo/papers/IJCNN99_tutorialweb/

Description: Independent Component Analysis (ICA) is a statistical technique for decomposing a complex dataset into independent sub-parts. It defines a generative model for the observed multivariate data: the data variables are assumed to be linear mixtures of some unknown variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA.

Purpose:
- Find statistically independent variables.
- Reduce the dimensionality of the data.

The FastICA package: http://www.cis.hut.fi/projects/ica/fastica/
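A small sketch of the ICA model (observed signals assumed to be linear mixtures of non-Gaussian, independent sources), here using scikit-learn's FastICA implementation rather than the Helsinki package; the sources and mixing matrix below are invented for the example:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two non-Gaussian, mutually independent sources: a sine wave and a square wave.
S = np.column_stack([np.sin(3 * t), np.sign(np.sin(5 * t))])

# Unknown linear mixing (assumed here for the toy example).
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T                       # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)      # estimated independent components (sources)
print("estimated mixing matrix:\n", ica.mixing_)
```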


Principal Oscillation Patterns (POP)

von Storch, H., et al., 1995. Principal Oscillation Patterns: A Review. Journal of Climate, 8, 377-400.

Description: A linear technique used to empirically infer the characteristics of the space-time variations of a system in a high-dimensional space.

Purpose: Reduce the dimensionality of the data.

Assumptions/limitations: It is not useful if the time series is strongly nonlinear (e.g., precipitation).
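A rough sketch of the usual POP computation (POPs taken as the eigenvectors of the lag-1 "system" matrix A = C1 C0^-1 fitted to the data); the toy time series and all names below are illustrative, not from the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nx = 2000, 4

# Toy multivariate time series (e.g., the leading PCs of a field), generated
# from a known linear dynamics matrix M plus noise.
M = 0.9 * np.eye(nx) + 0.05 * rng.standard_normal((nx, nx))
X = np.zeros((nt, nx))
for t in range(1, nt):
    X[t] = X[t - 1] @ M.T + rng.standard_normal(nx)
X -= X.mean(axis=0)

# Lag-0 and lag-1 covariance matrices.
C0 = X[:-1].T @ X[:-1] / (nt - 1)
C1 = X[1:].T @ X[:-1] / (nt - 1)

# System matrix of the fitted linear model x(t+1) ~ A x(t).
A = C1 @ np.linalg.inv(C0)

# POPs are the (generally complex) eigenvectors of A; complex-conjugate eigenvalue
# pairs describe damped oscillations (damping time from |lambda|, period from arg(lambda)).
eigvals, pops = np.linalg.eig(A)
print("eigenvalues:", np.round(eigvals, 3))
```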


Nonlinear Multivariate Analysis by Neural Networks

Hsieh, W.W. Nonlinear multivariate and time series analysis by neural network methods. Reviews of Geophysics (accepted 2002/11/10). http://www.ocgy.ubc.ca/~william/

Nonlinear PCA (NLPCA)

Hsieh, 2001. Tellus, 53A, 599-615.

Given a data set X(nt, nx), PCA/EOF analysis finds

U = EX,   (1)

so that mean((X(t) - EU(t))^2) is minimized. NLPCA uses a nonlinear mapping (a neural network) from X to U, while PCA uses a linear one. NLPCA takes as input the most significant modes from the linear PCA.
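For reference, a minimal sketch of the linear PCA/EOF step in (1), using an SVD; the toy data and variable names are not from the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))      # nt x nx data matrix (toy data)
X -= X.mean(axis=0)                     # remove the time mean

# EOFs are the right singular vectors of the centered data matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
E = Vt[:2]                              # leading 2 EOFs (2 x nx)

U = X @ E.T                             # principal components, U = EX as in (1)
X_rec = U @ E                           # linear reconstruction from U
print("reconstruction MSE:", np.mean((X - X_rec) ** 2))
```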


The encoding and decoding layers of the NLPCA network are

h_k^(x) = tanh[(W^(x) X + b^(x))_k],
U = w^(x) · h^(x) + b̄^(x),
h_k^(U) = tanh[(w^(U) U + b^(U))_k],
X' = W^(U) h^(U) + b̄^(U).

Unlike linear PCA, NLPCA produces one spatial pattern for each value of U taken from the bottleneck of the neural network. The second mode is obtained by applying the same system to the residuals from the first mode, and so on.

Example of NLPCA: Wu et al., 2002. JGR, 107, D21. Data: grid points of surface air temperature over Canada, interpolated from observations.
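A minimal numerical sketch of the NLPCA idea above: a one-node-bottleneck autoencoder whose weights are found by minimizing the mean squared reconstruction error. This is not Hsieh's NeuMATSA code; the layer sizes, optimizer, and toy data are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
nh = 4  # hidden neurons in the encoding and decoding layers (assumed)

def shapes(nx):
    """Parameter shapes: W^(x), b^(x), w^(x), bbar^(x), w^(U), b^(U), W^(U), bbar^(U)."""
    return [(nh, nx), (nh,), (nh,), (1,), (nh,), (nh,), (nx, nh), (nx,)]

def unpack(theta, nx):
    out, i = [], 0
    for s in shapes(nx):
        n = int(np.prod(s))
        out.append(theta[i:i + n].reshape(s))
        i += n
    return out

def forward(theta, X):
    Wx, bx, wx, bbx, wu, bu, Wu, bbu = unpack(theta, X.shape[1])
    hx = np.tanh(X @ Wx.T + bx)           # encoding hidden layer h^(x)
    U = hx @ wx + bbx                     # one-dimensional nonlinear PC (bottleneck)
    hu = np.tanh(np.outer(U, wu) + bu)    # decoding hidden layer h^(U)
    Xr = hu @ Wu.T + bbu                  # linear output layer X'
    return U, Xr

def cost(theta, X):
    return np.mean((X - forward(theta, X)[1]) ** 2)

# Toy data lying near a curved one-dimensional structure.
t = rng.uniform(-1, 1, 300)
X = np.column_stack([t, t ** 2]) + 0.05 * rng.standard_normal((300, 2))
X -= X.mean(axis=0)

n_par = sum(int(np.prod(s)) for s in shapes(X.shape[1]))
res = minimize(cost, 0.1 * rng.standard_normal(n_par), args=(X,), method="L-BFGS-B")
U, Xr = forward(res.x, X)
print("reconstruction MSE:", cost(res.x, X))
```

As the notes describe, a second mode could then be fitted the same way to the residuals X - X'.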


[Figure: EOF spatial patterns (modes 1-3) for winter, spring, summer, and fall.]


[Figures: NLPCA time series; panels for winter, spring, summer, and fall.]


Nonlinear CCA (NLCCA)

Hsieh, 2001. J. Climate, 14, 2528-2539.

Given two data sets Y(nt, ny) and Z(nt, nz), linear CCA looks for transformations of Y and Z,

U = YR,   V = ZQ,   (2)

so that the correlation between U and V is maximized. NLCCA follows the same idea, except that the linear procedures in (2) are replaced by a nonlinear procedure that uses neural networks.

- PCA is applied to Y and Z beforehand, and the leading modes are normalized by subtracting the mean and dividing by the standard deviation.
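For comparison, the linear step (2) can be sketched with scikit-learn's CCA (the toy data set and variable names are invented for the example):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
nt = 300
s = rng.standard_normal(nt)      # signal shared by the two data sets

# Two data sets that both contain the shared signal plus independent noise.
Y = np.column_stack([s, rng.standard_normal(nt)]) + 0.3 * rng.standard_normal((nt, 2))
Z = np.column_stack([rng.standard_normal(nt), s]) + 0.3 * rng.standard_normal((nt, 2))

cca = CCA(n_components=1)
U, V = cca.fit_transform(Y, Z)   # canonical variates U = YR and V = ZQ
print("canonical correlation:", np.corrcoef(U[:, 0], V[:, 0])[0, 1])
```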


For each pair U and V, the NLCCA network maps the inputs through a hidden layer to the canonical variates, and back from the canonical variates to the reconstructed fields:

h_k^(y) = tanh[(W^(y) Y + b^(y))_k],   U = w^(y) · h^(y) + b̄^(y),
h_n^(z) = tanh[(W^(z) Z + b^(z))_n],   V = w^(z) · h^(z) + b̄^(z),

h_k^(U) = tanh[(w^(U) U + b^(U))_k],   Y' = W^(U) h^(U) + b̄^(U),
h_n^(V) = tanh[(w^(V) V + b^(V))_n],   Z' = W^(V) h^(V) + b̄^(V).

cor(U, V) is maximized when the optimal values of W^(y), W^(z), b^(y), b^(z), w^(y), w^(z), b̄^(y), and b̄^(z) are found. Ensembles of 30 runs are used to find U and V, and then to find Y' and Z'. The second mode is found by applying the same system to the residuals (Y - Y') and (Z - Z'), and so on.
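A bare-bones sketch of the correlation-maximizing half of NLCCA: two small tanh networks map Y and Z to scalar U and V, and the negative correlation is minimized numerically. Real NLCCA also trains the decoding networks for Y' and Z', adds weight penalties, and uses an ensemble of runs; everything below (layer size, optimizer, toy data) is an assumption for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
nh = 3  # hidden neurons per mapping (assumed)

# Toy data: Y and Z share a common nonlinear signal s(t).
nt = 500
s = np.sin(np.linspace(0, 20, nt))
Y = np.column_stack([s, s ** 2]) + 0.1 * rng.standard_normal((nt, 2))
Z = np.column_stack([np.tanh(2 * s), s]) + 0.1 * rng.standard_normal((nt, 2))
Y -= Y.mean(axis=0)
Z -= Z.mean(axis=0)

def n_par(nx):
    return nh * nx + nh + nh + 1          # W, b, w, bbar for one mapping

def encode(p, X):
    """h = tanh(W x + b); canonical variate u = w . h + bbar."""
    nx = X.shape[1]
    W = p[:nh * nx].reshape(nh, nx)
    b = p[nh * nx:nh * nx + nh]
    w = p[nh * nx + nh:nh * nx + 2 * nh]
    bbar = p[-1]
    return np.tanh(X @ W.T + b) @ w + bbar

def neg_cor(theta):
    u = encode(theta[:n_par(Y.shape[1])], Y)
    v = encode(theta[n_par(Y.shape[1]):], Z)
    return -np.corrcoef(u, v)[0, 1]

theta0 = 0.1 * rng.standard_normal(n_par(Y.shape[1]) + n_par(Z.shape[1]))
res = minimize(neg_cor, theta0, method="L-BFGS-B")
print("nonlinear canonical correlation:", -res.fun)
```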


Example: NLCCA (nonlinear NN analysis) applied to SLP and SST over the Pacific Ocean.

[Figure: SLP time series.]


[Figure: SST time series, with La Niña (LN) and El Niño (EN) events marked.]


Matlab code for developing NLPCA, NLCCA, and related methods: Neuralnets for Multivariate And Time Series Analysis (NeuMATSA), William Hsieh. http://www.ocgy.ubc.ca/projects/clim.pred/download.html