TRANSCRIPT
Analysis of Variance (ANOVA)

Hypothesis
$H_0: \mu_i = \mu_G$ for every group $i$
$H_1: \mu_i \neq \mu_G$ for at least one group $i$

Logic
$s^2_{within}$ = error variability
$s^2_{between}$ = error variability + treatment effect
where $\bar{x}$ = grand mean, $\bar{x}_j$ = group mean, $k$ = number of groups, $n_j$ = number of participants in group $j$, $N$ = total number of participants.
$$F_{obs} = \frac{s^2_{between}}{s^2_{within}} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 / (k-1)}{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 / (N-k)}$$
Classic ANOVA Table

Total variability = between variability + within variability
where $i$ = the group, $p$ = the participant, $\bar{x}$ = the grand mean, $\bar{x}_i$ = the mean of a particular group.
Sums of squares
$$SS_{Total} = \sum_{i=1}^{k}\sum_{p=1}^{n_i} (x_{pi} - \bar{x})^2$$
$$SS_{Within} = \sum_{i=1}^{k}\sum_{p=1}^{n_i} (x_{pi} - \bar{x}_i)^2$$
$$SS_{Between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$
$$SS_{Total} = SS_{Between} + SS_{Within}$$
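The sum-of-squares decomposition can be checked numerically. The sketch below uses made-up data for three unequal groups (purely illustrative values) and verifies that the total, between, and within sums of squares computed from the definitions satisfy $SS_{Total} = SS_{Between} + SS_{Within}$:

```python
import numpy as np

# Hypothetical data: k = 3 groups with unequal sizes (illustration only)
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0, 10.0]),
          np.array([2.0, 3.0, 4.0])]

all_x = np.concatenate(groups)
grand_mean = all_x.mean()

# SS_Total: squared deviations of every observation from the grand mean
ss_total = np.sum((all_x - grand_mean) ** 2)

# SS_Within: squared deviations from each group's own mean
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# SS_Between: n_i times the squared deviation of each group mean from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_total, ss_between + ss_within)
```

The two printed numbers agree, which is exactly the partition the table above expresses.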
Computation

Degrees of freedom
$df_{total} = n - 1$
$df_{within} = n - k$
$df_{between} = k - 1$
$df_{total} = df_{within} + df_{between}$
Mean squares
$$MS_{within} = \frac{SS_{within}}{df_{within}} = s^2_{within}$$
$$MS_{between} = \frac{SS_{between}}{df_{between}} = s^2_{between}$$

F
$$F = \frac{MS_{between}}{MS_{within}}$$
ANOVA

Source of variation | SS | df | MS | F
Between | $\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$ | $k-1$ | $SS_{Between}/(k-1)$ | $MS_{Between}/MS_{Within}$
Within (error) | $\sum_{i=1}^{k}\sum_{p=1}^{n_i}(x_{pi} - \bar{x}_i)^2$ | $n-k$ | $SS_{Within}/(n-k)$ |
Total | $\sum_{i=1}^{k}\sum_{p=1}^{n_i}(x_{pi} - \bar{x})^2$ | $n-1$ | |
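The full table can be assembled in a few lines and cross-checked against SciPy's built-in one-way ANOVA. The three groups below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of four participants (illustration only)
g1 = np.array([4.0, 5.0, 6.0, 5.0])
g2 = np.array([7.0, 8.0, 9.0, 8.0])
g3 = np.array([2.0, 3.0, 4.0, 3.0])
groups = [g1, g2, g3]

n = sum(len(g) for g in groups)   # total number of participants
k = len(groups)                   # number of groups
grand = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_between = ss_between / (k - 1)   # df_between = k - 1
ms_within = ss_within / (n - k)     # df_within  = n - k
F = ms_between / ms_within

# Cross-check against SciPy's one-way ANOVA
F_ref, p_ref = stats.f_oneway(g1, g2, g3)
print(F, F_ref)
```

The hand-built F ratio matches `scipy.stats.f_oneway`, confirming the table's formulas.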
Analysis of Variance (ANOVA)

If the independent variables are continuous and the dependent variable is also continuous, we perform a multiple regression. If the independent variables are discrete and the dependent variable is still continuous, we perform an ANOVA.

From ANOVA: $y_{ij} = \mu + \tau_j + e_{ij}$
To GLM: $y = Xb + e$, where $b = [\mu, \tau_1, \tau_2, \ldots, \tau_k]^T$
where $\mu$ = grand mean, $\tau$ = treatment effect, $e$ = error.
Analysis of Variance (ANOVA)

Using the GLM approach through a coding matrix.
Logic: if, for example, there are 3 groups and we know that participant number 12 is part of neither the first nor the second group, then we know that this participant is necessarily part of group 3.

     X1  X2  X3  X4
G1    1   0   0   0
G2    0   1   0   0
G3    0   0   1   0
G4    0   0   0   1
Analysis of Variance (ANOVA)

Performing ANOVA using the GLM through a coding matrix.
Logic: in other words, there are only $k-1$ degrees of freedom in group assignment (2 degrees of freedom for 3 groups). Therefore, the last group's column is eliminated. A value of 1 is assigned to the participants of group $i$ and a value of 0 to the other groups, whereas the last group receives a value of -1 in every column (to balance things).

     X1  X2  X3
G1    1   0   0
G2    0   1   0
G3    0   0   1
G4   -1  -1  -1
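This effect (deviation) coding is mechanical to build. A minimal sketch, with `effect_coding` as a hypothetical helper name:

```python
import numpy as np

def effect_coding(group_ids, k):
    """Design matrix with effect (deviation) coding: k-1 columns;
    group i < k-1 gets a 1 in column i, the last group gets -1 everywhere."""
    X = np.zeros((len(group_ids), k - 1))
    for row, g in enumerate(group_ids):
        if g < k - 1:
            X[row, g] = 1.0
        else:
            X[row, :] = -1.0   # last group balances the columns
    return X

# One participant per group, 4 groups, reproducing the matrix above
X = effect_coding([0, 1, 2, 3], k=4)
print(X)
```

With several participants per group, the same function is called with the full list of group memberships, one entry per participant.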
Analysis of Variance (ANOVA)

Then, for each subject we associate its corresponding group coding. These codes form the independent variables (X).
Analysis of Variance (ANOVA)

The data matrix $A$ concatenates the predictor columns and the criterion:
$$A = [x_1 : x_2 : \ldots : x_p : y]$$
The sums-of-squares-and-cross-products matrix is
$$SSCP = A^T A - \frac{1}{n}(A^T \mathbf{1})(\mathbf{1}^T A)$$
partitioned into predictor (p) and criterion (c) blocks:
$$SSCP = \begin{bmatrix} S_{pp} & S_{pc} \\ S_{cp} & S_{cc} \end{bmatrix}$$
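The SSCP formula is equivalent to centering the columns of $A$ and computing $A_c^T A_c$. A sketch on synthetic data (the predictors and criterion are randomly generated, purely for illustration):

```python
import numpy as np

# Hypothetical augmented data matrix A = [X : y]: 3 predictors, 1 criterion
rng = np.random.default_rng(0)
X = rng.normal(size=(36, 3))
y = (X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=36)).reshape(-1, 1)
A = np.hstack([X, y])

n = A.shape[0]
ones = np.ones((n, 1))

# SSCP = A'A - (1/n)(A'1)(1'A): cross-products about the column means
SSCP = A.T @ A - (A.T @ ones) @ (ones.T @ A) / n

# Equivalent route: centre the columns first, then A_c' A_c
A_c = A - A.mean(axis=0)
print(np.allclose(SSCP, A_c.T @ A_c))
```

The top-left 3x3 block of `SSCP` is $S_{pp}$, the last row/column holds $S_{cp}$, $S_{pc}$, and $S_{cc}$.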
R-Square

$R^2$ is obtained from the SSCP partition:
$$R^2 = s_{cp}\, s_{pp}^{-1}\, s_{pc}\, s_{cc}^{-1} = 0.354945$$
and the adjusted value, with $n$ = number of participants and $p$ = number of predictors (independent variables):
$$R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1} = 1 - (1 - 0.354945)\frac{36-1}{36-3-1} = 0.294471$$
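The adjustment arithmetic is easy to verify with the numbers from the slide ($R^2 = 0.354945$, $n = 36$, $p = 3$):

```python
# Adjusted R^2 shrinks R^2 to compensate for the number of predictors
R2 = 0.354945
n, p = 36, 3

R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)
print(R2_adj)  # ~0.294471, as on the slide
```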
ANOVA Table

The null hypothesis is that the R-Square between the predictors and the criterion is null. Since $F(3, 32) = 5.86938$, $p < 0.05$, we reject $H_0$ and accept $H_1$: there is at least one group that differs from the others.
Source | SS | df | MS | F
Regression | $R^2 S_{cc}$ | $p$ | $R^2 S_{cc}/p$ | $MS_{reg}/MS_{error}$
Error | $(1-R^2) S_{cc}$ | $n-p-1$ | $(1-R^2)S_{cc}/(n-p-1)$ |
Total | $S_{cc}$ | $n-1$ | |

Equivalently, $S_{cc}$ cancels and
$$F = \frac{R^2/p}{(1-R^2)/(n-p-1)} = 5.87$$
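Plugging the slide's values into the $R^2$ form of the F ratio reproduces the reported statistic:

```python
# F from R^2 alone: F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
R2 = 0.354945
n, p = 36, 3

F = (R2 / p) / ((1 - R2) / (n - p - 1))
print(F)  # ~5.869, matching F(3, 32) reported above
```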
ANOVA Terminology

Coefficient of determination (proportion of explained variation):
$$R^2 = \frac{SS_{Between}}{SS_{Total}} = 1 - \frac{SS_{Within}}{SS_{Total}} = 0.355$$
$$R^2_{adj} = 0.295$$
$$\omega^2 = \frac{SS_{Between} - (k-1)MS_{Within}}{SS_{Total} + MS_{Within}} = 0.289$$
ANOVA

Now you know it! ANOVA is a special case of regression. The same logic can be applied to the t-test, factorial ANOVA, ANCOVA, and simple effects (Tukey, Bonferroni, LSD, etc.).
PCA

Why? To discover or to reduce the dimensionality of the data set, and to identify new meaningful underlying variables.

Assumptions
Sample size: about 300 (in general)
Normality
Linearity
Absence of outliers among cases
PCA

Eigenvalues and eigenvectors
Let's define a random vector as $v^{(0)} = [1, 1]^T$. Now, if we compute the inner product between the correlation matrix $M$ and $v^{(0)}$ and re-multiply the result by $M$, again, again, and again, what will the result be after $k$ iterations?
$$v^{(1)} = M v^{(0)}$$
$$v^{(2)} = M v^{(1)}$$
$$v^{(3)} = M v^{(2)}$$
$$v^{(4)} = M v^{(3)}$$
$$\vdots$$
$$v^{(k)} = M v^{(k-1)}$$
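This repeated multiplication is the power-iteration method. A minimal sketch on a small, made-up 2x2 correlation matrix (renormalizing at each step so the vector does not blow up):

```python
import numpy as np

def power_iteration(M, iters=200):
    """Repeatedly multiply by M: the vector converges to the leading
    eigenvector; the lengthening factor converges to the leading eigenvalue."""
    v = np.ones(M.shape[0])              # v(0) = [1, 1, ...]^T
    eigenvalue = 0.0
    for _ in range(iters):
        w = M @ v                        # v(k) = M v(k-1)
        eigenvalue = np.linalg.norm(w) / np.linalg.norm(v)
        v = w / np.linalg.norm(w)        # renormalise to avoid overflow
    return eigenvalue, v

# Illustrative 2x2 correlation matrix
M = np.array([[1.0, 0.6],
              [0.6, 1.0]])
lam, vec = power_iteration(M)
print(lam, vec)
```

For this matrix the stable direction is $[1, 1]^T/\sqrt{2}$ with lengthening factor 1.6, the leading eigenvalue.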
PCA

Eigenvalues and eigenvectors
After convergence:
1. The direction of the stable vector = eigenvector ($\Lambda$)
2. The stable vector's lengthening factor = eigenvalue ($\lambda$)
PCA

Eigenvalues and eigenvectors
Once the first eigenvector (and associated eigenvalue) has been identified, we remove it from the matrix, and we repeat the process until all the eigenvectors and eigenvalues have been extracted:
$$M^{(1)} = M^{(0)} - \lambda_1 \Lambda_1 \Lambda_1^T$$
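Combining power iteration with this deflation step extracts every eigenpair in turn. A sketch on an invented 3x3 correlation matrix, cross-checked against NumPy's symmetric eigensolver:

```python
import numpy as np

def deflate_all(M, iters=500):
    """Extract eigenpairs one at a time: power iteration finds the leading
    eigenvector, then M <- M - lambda * v v^T removes it; repeat."""
    M = M.copy()
    eigvals, eigvecs = [], []
    for _ in range(M.shape[0]):
        v = np.ones(M.shape[0])
        for _ in range(iters):
            w = M @ v
            if np.linalg.norm(w) < 1e-12:
                break                     # matrix fully deflated
            v = w / np.linalg.norm(w)
        lam = v @ M @ v                   # Rayleigh quotient
        M = M - lam * np.outer(v, v)      # deflation step
        eigvals.append(lam)
        eigvecs.append(v)
    return np.array(eigvals), np.array(eigvecs).T

M = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
vals, vecs = deflate_all(M)
print(np.sort(vals)[::-1], np.sort(np.linalg.eigvalsh(M))[::-1])
```

In practice one would call `numpy.linalg.eigh` directly; the deflation loop just mirrors the procedure on the slide.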
PCA

Eigenvalues and eigenvectors
There will be as many eigenvectors/eigenvalues as there are variables. Each eigenvector will be orthogonal to the others.
$$M = \lambda_1 \Lambda_1 \Lambda_1^T + \lambda_2 \Lambda_2 \Lambda_2^T + \lambda_3 \Lambda_3 \Lambda_3^T + \lambda_4 \Lambda_4 \Lambda_4^T$$
PCA

Eigenvalues and eigenvectors
How many are important? Plot the eigenvalues:
Method 1: if the points on the graph tend to level out (show an "elbow"), the eigenvalues past the elbow are usually close enough to zero that they can be ignored.
Method 2: keep components up to a limit on the variance accounted for (e.g. 90%).
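Method 2 is easy to automate. A sketch with invented eigenvalues for a four-variable correlation matrix (the elbow of Method 1 is judged from the same plot of these values):

```python
import numpy as np

# Hypothetical eigenvalues of a 4x4 correlation matrix, largest first
eigvals = np.array([2.9, 0.6, 0.3, 0.2])

# Cumulative proportion of variance accounted for by the first j components
explained = np.cumsum(eigvals) / eigvals.sum()

# Smallest number of components reaching the 90% threshold
n_keep = int(np.searchsorted(explained, 0.90) + 1)
print(explained, n_keep)
```

Here the cumulative proportions are 0.725, 0.875, 0.95, 1.0, so three components are retained under a 90% criterion.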
PCA

Eigenvalues and eigenvectors
[Figure: illustration of the data (variables x1, x2, x3, x4) and the selected eigenvectors Λ1 and Λ2.]
PCA

VARIMAX Rotation
Why? To improve readability. The VARIMAX rotation aims at finding a solution where an original variable loads highly on one particular factor and loads as low as possible on the other factors.

Rotation matrix:
$$T = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
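Because $T$ is orthogonal, rotating the loadings $U \to UT$ changes how the variance is spread across factors without changing the length of any row, so the communalities are preserved. A quick check:

```python
import numpy as np

# 2-D rotation matrix T for an illustrative angle theta
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Orthogonality: T'T = I, so U -> U T preserves row lengths (communalities)
print(T.T @ T)
```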
PCA

VARIMAX Rotation
The algorithm maximizes the VARIMAX index, $V$, the sum of the variances of the squared, communality-normalized component loadings:
$$V = \sum_{j=1}^{m}\left[\frac{1}{k}\sum_{i=1}^{k}\left(\frac{U_{ji}^2}{h_j^2}\right)^2 - \left(\frac{1}{k}\sum_{i=1}^{k}\frac{U_{ji}^2}{h_j^2}\right)^2\right]$$
where $U = FT$ is the rotated loading matrix and $h_j^2 = \sum_i U_{ji}^2$ is the communality of variable $j$.

$V$ is then a long equation that contains the rotation angle $\theta$ as the only unknown. An optimization technique is used to find the value of $\theta$ that maximizes $V$.
PCA

VARIMAX Rotation
[Figure: the rotated loading matrix U and the resulting VARIMAX index V for the example.]