TRANSCRIPT
Analysis of Variance (ANOVA)

Hypothesis
$H_0: \mu_i = \mu_G$ for every group $i$
$H_1: \mu_i \neq \mu_G$ for at least one group $i$

Logic
$s^2_{within}$ = error variability
$s^2_{between}$ = error variability + treatment effect
where $\bar{x}$ = grand mean, $\bar{x}_j$ = group mean, $k$ = number of groups, $n_j$ = number of participants in group $j$, $N$ = total number of participants.
$$F_{obs} = \frac{s^2_{between}}{s^2_{within}} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 / (k-1)}{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 / (N-k)}$$
Classic ANOVA Table

Total variability = between variability + within variability
where $i$ = the group, $p$ = the participant, $\bar{x}$ = the grand mean, $\bar{x}_i$ = the mean of a particular group.
Sums of squares
$$SS_{Total} = \sum_{i=1}^{k}\sum_{p=1}^{n_i} (x_{pi} - \bar{x})^2$$
$$SS_{Within} = \sum_{i=1}^{k}\sum_{p=1}^{n_i} (x_{pi} - \bar{x}_i)^2$$
$$SS_{Between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$
$$SS_{Total} = SS_{Between} + SS_{Within}$$
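The sum-of-squares decomposition can be checked numerically. The sketch below uses made-up data for three unequal groups (purely illustrative values) and verifies that the total, between, and within sums of squares computed from the definitions satisfy $SS_{Total} = SS_{Between} + SS_{Within}$:

```python
import numpy as np

# Hypothetical data: k = 3 groups with unequal sizes (illustration only)
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0, 10.0]),
          np.array([2.0, 3.0, 4.0])]

all_x = np.concatenate(groups)
grand_mean = all_x.mean()

# SS_Total: squared deviations of every observation from the grand mean
ss_total = np.sum((all_x - grand_mean) ** 2)

# SS_Within: squared deviations from each group's own mean
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# SS_Between: n_i times the squared deviation of each group mean from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_total, ss_between + ss_within)
```

The two printed numbers agree, which is exactly the partition the table above expresses.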
Computation

Degrees of freedom
$df_{total} = n - 1$
$df_{within} = n - k$
$df_{between} = k - 1$
$df_{total} = df_{within} + df_{between}$
Mean squares
$$MS_{within} = \frac{SS_{within}}{df_{within}} = s^2_{within}$$
$$MS_{between} = \frac{SS_{between}}{df_{between}} = s^2_{between}$$

F
$$F = \frac{MS_{between}}{MS_{within}}$$
ANOVA

Source of variation | SS | df | MS | F
Between | $\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$ | $k-1$ | $SS_{Between}/(k-1)$ | $MS_{Between}/MS_{Within}$
Within (error) | $\sum_{i=1}^{k}\sum_{p=1}^{n_i}(x_{pi} - \bar{x}_i)^2$ | $n-k$ | $SS_{Within}/(n-k)$ |
Total | $\sum_{i=1}^{k}\sum_{p=1}^{n_i}(x_{pi} - \bar{x})^2$ | $n-1$ | |
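The full table can be assembled in a few lines and cross-checked against SciPy's built-in one-way ANOVA. The three groups below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of four participants (illustration only)
g1 = np.array([4.0, 5.0, 6.0, 5.0])
g2 = np.array([7.0, 8.0, 9.0, 8.0])
g3 = np.array([2.0, 3.0, 4.0, 3.0])
groups = [g1, g2, g3]

n = sum(len(g) for g in groups)   # total number of participants
k = len(groups)                   # number of groups
grand = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_between = ss_between / (k - 1)   # df_between = k - 1
ms_within = ss_within / (n - k)     # df_within  = n - k
F = ms_between / ms_within

# Cross-check against SciPy's one-way ANOVA
F_ref, p_ref = stats.f_oneway(g1, g2, g3)
print(F, F_ref)
```

The hand-built F ratio matches `scipy.stats.f_oneway`, confirming the table's formulas.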
Analysis of Variance (ANOVA)

If the independent variables are continuous and the dependent variable is also continuous, we perform a multiple regression. If the independent variables are discrete and the dependent variable is still continuous, we perform an ANOVA.

From ANOVA: $y_{ij} = \mu + \tau_j + e_{ij}$
To GLM: $y = Xb + e$, where $b = [\mu, \tau_1, \tau_2, \ldots, \tau_k]^T$
where $\mu$ = grand mean, $\tau$ = treatment effect, $e$ = error.
Analysis of Variance (ANOVA)

Using the GLM approach through a coding matrix.
Logic: if, for example, there are 3 groups and we know that participant number 12 is part of neither the first nor the second group, then we know that this participant is necessarily part of group 3.

     X1  X2  X3  X4
G1    1   0   0   0
G2    0   1   0   0
G3    0   0   1   0
G4    0   0   0   1
Analysis of Variance (ANOVA)

Performing ANOVA using the GLM through a coding matrix.
Logic: in other words, there are only $k-1$ degrees of freedom in group assignment (2 degrees of freedom for 3 groups). Therefore, the last group's column is eliminated. A value of 1 is assigned to the participants of group $i$ and a value of 0 to the other groups, whereas the last group receives a value of -1 in every column (to balance things).

     X1  X2  X3
G1    1   0   0
G2    0   1   0
G3    0   0   1
G4   -1  -1  -1
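This effect (deviation) coding is mechanical to build. A minimal sketch, with `effect_coding` as a hypothetical helper name:

```python
import numpy as np

def effect_coding(group_ids, k):
    """Design matrix with effect (deviation) coding: k-1 columns;
    group i < k-1 gets a 1 in column i, the last group gets -1 everywhere."""
    X = np.zeros((len(group_ids), k - 1))
    for row, g in enumerate(group_ids):
        if g < k - 1:
            X[row, g] = 1.0
        else:
            X[row, :] = -1.0   # last group balances the columns
    return X

# One participant per group, 4 groups, reproducing the matrix above
X = effect_coding([0, 1, 2, 3], k=4)
print(X)
```

With several participants per group, the same function is called with the full list of group memberships, one entry per participant.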
Analysis of Variance (ANOVA)

Then, for each subject we associate its corresponding group coding. These codes form the independent variables (X).
Analysis of Variance (ANOVA)

The data matrix $A$ concatenates the predictor columns and the criterion:
$$A = [x_1 : x_2 : \ldots : x_p : y]$$
The sums-of-squares-and-cross-products matrix is
$$SSCP = A^T A - \frac{1}{n}(A^T \mathbf{1})(\mathbf{1}^T A)$$
partitioned into predictor (p) and criterion (c) blocks:
$$SSCP = \begin{bmatrix} S_{pp} & S_{pc} \\ S_{cp} & S_{cc} \end{bmatrix}$$
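The SSCP formula is equivalent to centering the columns of $A$ and computing $A_c^T A_c$. A sketch on synthetic data (the predictors and criterion are randomly generated, purely for illustration):

```python
import numpy as np

# Hypothetical augmented data matrix A = [X : y]: 3 predictors, 1 criterion
rng = np.random.default_rng(0)
X = rng.normal(size=(36, 3))
y = (X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=36)).reshape(-1, 1)
A = np.hstack([X, y])

n = A.shape[0]
ones = np.ones((n, 1))

# SSCP = A'A - (1/n)(A'1)(1'A): cross-products about the column means
SSCP = A.T @ A - (A.T @ ones) @ (ones.T @ A) / n

# Equivalent route: centre the columns first, then A_c' A_c
A_c = A - A.mean(axis=0)
print(np.allclose(SSCP, A_c.T @ A_c))
```

The top-left 3x3 block of `SSCP` is $S_{pp}$, the last row/column holds $S_{cp}$, $S_{pc}$, and $S_{cc}$.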
R-Square

$R^2$ is obtained from the SSCP partition:
$$R^2 = s_{cp}\, s_{pp}^{-1}\, s_{pc}\, s_{cc}^{-1} = 0.354945$$
and the adjusted value, with $n$ = number of participants and $p$ = number of predictors (independent variables):
$$R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1} = 1 - (1 - 0.354945)\frac{36-1}{36-3-1} = 0.294471$$
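The adjustment arithmetic is easy to verify with the numbers from the slide ($R^2 = 0.354945$, $n = 36$, $p = 3$):

```python
# Adjusted R^2 shrinks R^2 to compensate for the number of predictors
R2 = 0.354945
n, p = 36, 3

R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)
print(R2_adj)  # ~0.294471, as on the slide
```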
ANOVA Table

The null hypothesis is that the R-Square between the predictors and the criterion is null. Since $F(3, 32) = 5.86938$, $p < 0.05$, we reject $H_0$ and accept $H_1$: there is at least one group that differs from the others.
Source | SS | df | MS | F
Regression | $R^2 S_{cc}$ | $p$ | $R^2 S_{cc}/p$ | $MS_{reg}/MS_{error}$
Error | $(1-R^2) S_{cc}$ | $n-p-1$ | $(1-R^2)S_{cc}/(n-p-1)$ |
Total | $S_{cc}$ | $n-1$ | |

Equivalently, $S_{cc}$ cancels and
$$F = \frac{R^2/p}{(1-R^2)/(n-p-1)} = 5.87$$
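Plugging the slide's values into the $R^2$ form of the F ratio reproduces the reported statistic:

```python
# F from R^2 alone: F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
R2 = 0.354945
n, p = 36, 3

F = (R2 / p) / ((1 - R2) / (n - p - 1))
print(F)  # ~5.869, matching F(3, 32) reported above
```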
ANOVA Terminology

Coefficient of determination (proportion of explained variation):
$$R^2 = \frac{SS_{Between}}{SS_{Total}} = 1 - \frac{SS_{Within}}{SS_{Total}} = 0.355$$
$$R^2_{adj} = 0.295$$
$$\omega^2 = \frac{SS_{Between} - (k-1)MS_{Within}}{SS_{Total} + MS_{Within}} = 0.289$$
ANOVA

Now you know it! ANOVA is a special case of regression. The same logic can be applied to the t-test, factorial ANOVA, ANCOVA, and simple effects (Tukey, Bonferroni, LSD, etc.).
PCA

Why? To discover or to reduce the dimensionality of the data set, and to identify new meaningful underlying variables.

Assumptions
Sample size: about 300 (in general)
Normality
Linearity
Absence of outliers among cases
PCA

Eigenvalues and eigenvectors
Let's define a random vector as $v^{(0)} = [1, 1]^T$. Now, if we compute the inner product between the correlation matrix $M$ and $v^{(0)}$ and re-multiply the result by $M$, again, again, and again, what will the result be after $k$ iterations?
$$v^{(1)} = M v^{(0)}$$
$$v^{(2)} = M v^{(1)}$$
$$v^{(3)} = M v^{(2)}$$
$$v^{(4)} = M v^{(3)}$$
$$\vdots$$
$$v^{(k)} = M v^{(k-1)}$$
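This repeated multiplication is the power-iteration method. A minimal sketch on a small, made-up 2x2 correlation matrix (renormalizing at each step so the vector does not blow up):

```python
import numpy as np

def power_iteration(M, iters=200):
    """Repeatedly multiply by M: the vector converges to the leading
    eigenvector; the lengthening factor converges to the leading eigenvalue."""
    v = np.ones(M.shape[0])              # v(0) = [1, 1, ...]^T
    eigenvalue = 0.0
    for _ in range(iters):
        w = M @ v                        # v(k) = M v(k-1)
        eigenvalue = np.linalg.norm(w) / np.linalg.norm(v)
        v = w / np.linalg.norm(w)        # renormalise to avoid overflow
    return eigenvalue, v

# Illustrative 2x2 correlation matrix
M = np.array([[1.0, 0.6],
              [0.6, 1.0]])
lam, vec = power_iteration(M)
print(lam, vec)
```

For this matrix the stable direction is $[1, 1]^T/\sqrt{2}$ with lengthening factor 1.6, the leading eigenvalue.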
PCA

Eigenvalues and eigenvectors
After convergence:
1. The direction of the stable vector = eigenvector ($\Lambda$)
2. The stable vector's lengthening factor = eigenvalue ($\lambda$)
PCA

Eigenvalues and eigenvectors
Once the first eigenvector (and associated eigenvalue) has been identified, we remove it from the matrix, and we repeat the process until all the eigenvectors and eigenvalues have been extracted:
$$M^{(1)} = M^{(0)} - \lambda_1 \Lambda_1 \Lambda_1^T$$
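Combining power iteration with this deflation step extracts every eigenpair in turn. A sketch on an invented 3x3 correlation matrix, cross-checked against NumPy's symmetric eigensolver:

```python
import numpy as np

def deflate_all(M, iters=500):
    """Extract eigenpairs one at a time: power iteration finds the leading
    eigenvector, then M <- M - lambda * v v^T removes it; repeat."""
    M = M.copy()
    eigvals, eigvecs = [], []
    for _ in range(M.shape[0]):
        v = np.ones(M.shape[0])
        for _ in range(iters):
            w = M @ v
            if np.linalg.norm(w) < 1e-12:
                break                     # matrix fully deflated
            v = w / np.linalg.norm(w)
        lam = v @ M @ v                   # Rayleigh quotient
        M = M - lam * np.outer(v, v)      # deflation step
        eigvals.append(lam)
        eigvecs.append(v)
    return np.array(eigvals), np.array(eigvecs).T

M = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
vals, vecs = deflate_all(M)
print(np.sort(vals)[::-1], np.sort(np.linalg.eigvalsh(M))[::-1])
```

In practice one would call `numpy.linalg.eigh` directly; the deflation loop just mirrors the procedure on the slide.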
PCA

Eigenvalues and eigenvectors
There will be as many eigenvectors/eigenvalues as there are variables. Each eigenvector will be orthogonal to the others.
$$M = \lambda_1 \Lambda_1 \Lambda_1^T + \lambda_2 \Lambda_2 \Lambda_2^T + \lambda_3 \Lambda_3 \Lambda_3^T + \lambda_4 \Lambda_4 \Lambda_4^T$$
PCA

Eigenvalues and eigenvectors
How many are important? Plot the eigenvalues:
Method 1: if the points on the graph tend to level out (show an "elbow"), the eigenvalues past the elbow are usually close enough to zero that they can be ignored.
Method 2: keep components up to a limit on the variance accounted for (e.g. 90%).
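Method 2 is easy to automate. A sketch with invented eigenvalues for a four-variable correlation matrix (the elbow of Method 1 is judged from the same plot of these values):

```python
import numpy as np

# Hypothetical eigenvalues of a 4x4 correlation matrix, largest first
eigvals = np.array([2.9, 0.6, 0.3, 0.2])

# Cumulative proportion of variance accounted for by the first j components
explained = np.cumsum(eigvals) / eigvals.sum()

# Smallest number of components reaching the 90% threshold
n_keep = int(np.searchsorted(explained, 0.90) + 1)
print(explained, n_keep)
```

Here the cumulative proportions are 0.725, 0.875, 0.95, 1.0, so three components are retained under a 90% criterion.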
PCA

Eigenvalues and eigenvectors
[Figure: illustration of the data (variables x1, x2, x3, x4) and the selected eigenvectors Λ1 and Λ2.]
PCA

VARIMAX Rotation
Why? To improve readability. The VARIMAX rotation aims at finding a solution where an original variable loads highly on one particular factor and loads as low as possible on the other factors.

Rotation matrix:
$$T = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
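Because $T$ is orthogonal, rotating the loadings $U \to UT$ changes how the variance is spread across factors without changing the length of any row, so the communalities are preserved. A quick check:

```python
import numpy as np

# 2-D rotation matrix T for an illustrative angle theta
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Orthogonality: T'T = I, so U -> U T preserves row lengths (communalities)
print(T.T @ T)
```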
PCA

VARIMAX Rotation
The algorithm maximizes the VARIMAX index, $V$, the sum of the variances of the squared, communality-normalized component loadings:
$$V = \sum_{j=1}^{m}\left[\frac{1}{k}\sum_{i=1}^{k}\left(\frac{U_{ji}^2}{h_j^2}\right)^2 - \left(\frac{1}{k}\sum_{i=1}^{k}\frac{U_{ji}^2}{h_j^2}\right)^2\right]$$
where $U = FT$ is the rotated loading matrix and $h_j^2 = \sum_i U_{ji}^2$ is the communality of variable $j$.

$V$ is then a long equation that contains the rotation angle $\theta$ as the only unknown. An optimization technique is used to find the value of $\theta$ that maximizes $V$.
PCA

VARIMAX Rotation
[Figure: the rotated loading matrix U and the resulting VARIMAX index V for the example.]