anova (statistics)

ANOVA: Analysis of Variance
Ibrahim bin Abdullah
www.facebook.com/ibrahim.abdullah

Upload: ibrahim-abdullah

Post on 10-May-2015



Page 2: Anova (Statistics)

Types of samples and appropriate tests:

• 1 sample: use the 1-sample t-test
• 2 samples: use the 2-sample t-test
• 3 or more samples: use ANOVA

ANOVA is also called the 2-samples-and-more test (rather than the 3-samples-and-more test) because it can also be applied to 2 samples.

Page 3: Anova (Statistics)

ANOVA can be:

• 1-way: 1 independent variable (must be categorical, i.e. nominal/ordinal)
• 2-way: 2 independent variables (must be categorical)
• 3-way, 4-way, etc.: 3, 4, etc. independent variables (must be categorical)

Page 4: Anova (Statistics)

In ANOVA, whatever the type, there is always only 1 dependent variable, and it must be continuous (numerical/scale).

ANOVA is UNIVARIATE (1 dependent variable). If there is more than 1 dependent variable, use MANOVA.

Page 5: Anova (Statistics)

It can be further classified:

1-WAY ANOVA
• INDEPENDENT ANOVA
• REPEATED MEASURE ANOVA

Page 6: Anova (Statistics)

2-WAY ANOVA
• INDEPENDENT ANOVA
• REPEATED MEASURE ANOVA
• MIXED ANOVA

So they are called 2-way independent ANOVA, 2-way mixed ANOVA, etc.

Page 7: Anova (Statistics)

We are testing the effect of blueberry on eyesight.

Students taking blueberry
• Specky
• Non-specky

Students NOT taking blueberry
• Specky
• Non-specky

We could run the t-test more than once to compare the samples. However, every extra test inflates the overall α (the Type I error rate, i.e. the chance of rejecting Ho when Ho should not be rejected). Instead of doing the t-test repeatedly, we should do ANOVA.
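The inflation of α can be made concrete with a small calculation. The sketch below assumes the repeated tests are independent, which is a simplification (pairwise t-tests on shared groups are not fully independent), so the numbers are an approximation rather than an exact error rate. The function name and the group labels are illustrative only.

```python
from itertools import combinations


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false rejection across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The four groups of the blueberry example (labels are illustrative).
groups = [
    "blueberry / specky",
    "blueberry / non-specky",
    "no blueberry / specky",
    "no blueberry / non-specky",
]

# Every pair of groups would need its own t-test.
pairs = list(combinations(groups, 2))

for k in (1, 2, len(pairs)):
    print(f"{k} t-test(s): overall alpha = {familywise_error(0.05, k):.4f}")
```

With all six pairwise comparisons, the overall chance of a false rejection is roughly five times the nominal 0.05, which is the motivation for running one ANOVA instead.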

Page 8: Anova (Statistics)

One-way Independent ANOVA

• 1 categorical independent variable
• 1 continuous dependent variable
• 3 or more groups (samples)

1-WAY ANOVA
• INDEPENDENT ANOVA
• REPEATED MEASURE ANOVA

The first part of this chapter deals with 1-way independent ANOVA. Later we will look at 1-way repeated measure ANOVA.

Page 9: Anova (Statistics)

One-way Independent ANOVA

Assumptions that MUST be fulfilled:

1. Normality (any one of three):
   • S-W (Shapiro-Wilk) or K-S (Kolmogorov-Smirnov) test: p ≥ 0.05
   • Skewness test: skewness within ± 2 SE
   • Coefficient of variation: (s / x̄) × 100 < 30%

   If not normal, use a non-parametric test such as Mann-Whitney (2 groups) or Kruskal-Wallis (3 or more groups; but the latter does not have a Post Hoc test).

2. Homogeneity of variance: Levene’s test (p ≥ 0.05)

   If homogeneous, read the Tukey test. If not, read the Dunnett’s T3 test.

MENU (normality test)
1. Analyze
2. Descriptive Statistics
3. Explore
4. Plots
5. Normality plots (S-W)
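The skewness and coefficient-of-variation rules can be checked by hand. The sketch below is one possible implementation using only the standard library; the helper names are made up for illustration, and the skewness uses the bias-adjusted form, which is assumed here because it reproduces the value SPSS reports. The sample is the Year 3 group from the worked example later in this transcript.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(x):
    """Coefficient of variation in percent: (s / mean) * 100."""
    return stdev(x) / mean(x) * 100


def sample_skewness(x):
    """Bias-adjusted sample skewness."""
    n, m = len(x), mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return (m3 / m2 ** 1.5) * sqrt(n * (n - 1)) / (n - 2)


def skewness_se(n):
    """Standard error of the skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


year3 = [60, 60, 60, 60, 70, 70, 70, 70, 75, 70]

skew = sample_skewness(year3)
se = skewness_se(len(year3))
cv = coefficient_of_variation(year3)

print(f"skewness = {skew:.3f}, SE = {se:.3f}, within 2 SE: {abs(skew) <= 2 * se}")
print(f"CV = {cv:.1f}%, below 30%: {cv < 30}")
```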

Page 10: Anova (Statistics)

One-way Independent ANOVA

Hypotheses:

1. Ho: μ1 = μ2 = μ3 = … = μi
2. HA: At least one pair of means is not equal (e.g. μ1 ≠ μ2 = μ3)

Page 11: Anova (Statistics)

One-way Independent ANOVA

If p < 0.05 (significant, i.e. Ho is rejected), then we must do a Post Hoc test (multiple pairwise comparison test). If, on the other hand, the result is not significant, the test stops.

Post Hoc tests:
• Tukey test: if homogeneous; no control group
• Dunnett test: if not homogeneous; has a control group
• Bonferroni test: for repeated measure

Page 12: Anova (Statistics)

One-way Independent ANOVA

A study is carried out to determine whether knowledge of the Vision and Mission of the university differs among first-year, second-year and third-year students of the Management and Science University (MSU).

Knowledge Score on Vision and Mission of MSU

Student Yr1   60 55 45 50 55 60 70 45 35 35
Student Yr2   65 60 70 75 70 78 79 80 81 82 85
Student Yr3   60 60 60 60 70 70 70 70 75 70

Hypotheses:
Ho: μ1 = μ2 = μ3
HA: At least one pair of means is not equal (e.g. μ1 ≠ μ2 = μ3)

Transfer the data into PASW. Remember, since this is an independent test, all samples are recorded in the same column.

Page 13: Anova (Statistics)

One-way Independent ANOVA

Variable view

Page 14: Anova (Statistics)

One-way Independent ANOVA

Enter the data from the study in “Data View”.

Since this is an independent test, the scores of all samples go in the same column (“score”), and a second column (in this example labeled “year”) identifies the group of each score.

In a repeated measure test we use a different column for every variable.
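The two layouts can be sketched outside PASW as plain records. This is only an illustration of the data shape: the column names `score` and `year` follow the menu steps in this transcript, while `sunday`, `monday` and `friday` are hypothetical names for the repeated-measure columns. The numbers are the two datasets used in this transcript.

```python
# Independent design: one score column plus one grouping column (long format).
scores_by_year = {
    1: [60, 55, 45, 50, 55, 60, 70, 45, 35, 35],
    2: [65, 60, 70, 75, 70, 78, 79, 80, 81, 82, 85],
    3: [60, 60, 60, 60, 70, 70, 70, 70, 75, 70],
}
long_rows = [
    {"year": year, "score": score}
    for year, scores in scores_by_year.items()
    for score in scores
]

# Repeated measure design: one row per student, one column per day (wide format).
sunday = [60, 55, 45, 50, 55, 60, 70, 45, 35, 35, 65]
monday = [60, 55, 45, 50, 55, 82, 85, 60, 60, 60, 60]
friday = [85, 60, 60, 60, 60, 70, 70, 70, 70, 75, 70]
wide_rows = [
    {"sunday": s, "monday": m, "friday": f}
    for s, m, f in zip(sunday, monday, friday)
]

print(len(long_rows), "rows in the independent layout; first row:", long_rows[0])
print(len(wide_rows), "rows in the repeated measure layout; first row:", wide_rows[0])
```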

Page 15: Anova (Statistics)

ANALYSIS 1

Normality test

Page 16: Anova (Statistics)

One-way Independent ANOVA

MENU
1. Analyze
2. Descriptive Statistics
3. Explore
4. Dependent = score
5. Factor = year
6. Plots
7. Normality plots

Case Processing Summary (Knowledge, by MSU Year)

              Valid             Missing           Total
MSU Year      N     Percent     N     Percent     N     Percent
Year 1        10    100.0%      0     .0%         10    100.0%
Year 2        11    100.0%      0     .0%         11    100.0%
Year 3        10    100.0%      0     .0%         10    100.0%

Descriptives (Knowledge, by MSU Year)

Statistic                           Year 1          Year 2          Year 3
Mean                                51.000          75.000          66.500
Std. Error of Mean                  3.5590          2.3549          1.8333
95% CI for Mean, Lower Bound        42.949          69.753          62.353
95% CI for Mean, Upper Bound        59.051          80.247          70.647
5% Trimmed Mean                     50.833          75.278          66.389
Median                              52.500          78.000          70.000
Variance                            126.667         61.000          33.611
Std. Deviation                      11.2546         7.8102          5.7975
Minimum                             35.0            60.0            60.0
Maximum                             70.0            85.0            75.0
Range                               35.0            25.0            15.0
Interquartile Range                 17.5            11.0            10.0
Skewness (Std. Error)               -.018 (.687)    -.731 (.661)    -.192 (.687)
Kurtosis (Std. Error)               -.563 (1.334)   -.396 (1.279)   -1.806 (1.334)

Tests of Normality (Knowledge, by MSU Year)

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
MSU Year      Statistic   df    Sig.        Statistic   df    Sig.
Year 1        .139        10    .200*       .952        10    .695
Year 2        .195        11    .200*       .931        11    .424
Year 3        .327        10    .003        .770        10    .006

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

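The S-W test showed that Year 1 and Year 2 were normal (p ≥ 0.05) but Year 3 was not (p = .006).

So, check the Year 3 skewness: skewness = -.192, SE = .687.

The skewness lies within ± 2 SE (± 1.374), so Year 3 can be treated as normal and we can use ANOVA.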
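The same Shapiro-Wilk decision can be reproduced outside PASW. The sketch below assumes SciPy is available and uses `scipy.stats.shapiro`; the W and p values it prints may differ from the PASW table in the last decimal place, so only the normal / not normal decision is relied on here.

```python
from scipy import stats

groups = {
    "Year 1": [60, 55, 45, 50, 55, 60, 70, 45, 35, 35],
    "Year 2": [65, 60, 70, 75, 70, 78, 79, 80, 81, 82, 85],
    "Year 3": [60, 60, 60, 60, 70, 70, 70, 70, 75, 70],
}

normal = {}
for name, scores in groups.items():
    w, p = stats.shapiro(scores)
    # Rule from the assumptions slide: p >= 0.05 is treated as normal.
    normal[name] = bool(p >= 0.05)
    print(f"{name}: W = {w:.3f}, p = {p:.3f}, normal = {normal[name]}")
```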

Page 17: Anova (Statistics)

ANALYSIS 2

The ANOVA test

Page 18: Anova (Statistics)

One-way Independent ANOVA

MENU
1. Analyze
2. Compare Means
3. One-Way ANOVA
4. Dependent = score
5. Factor = year
6. Post Hoc
7. Tukey
8. Dunnett’s T3
9. Options
10. Descriptive
11. Homogeneity of variance test

If we have a control group, under Post Hoc choose Dunnett only.

Look at Levene’s test: if p > 0.05, use Tukey; if p < 0.05, use Dunnett’s T3.

Remember, we look at Post Hoc only if we reject Ho (i.e. at least one pair of means is not equal).

Test of Homogeneity of Variances (Knowledge)

Levene Statistic    df1    df2    Sig.
2.022               2      28     .151

p > 0.05, so homogeneity of variance is assumed.

Homogeneous Subsets (Knowledge), Tukey HSD(a,b)

                      Subset for alpha = 0.05
MSU Year      N       1           2
Year 1        10      51.000
Year 3        10                  66.500
Year 2        11                  75.000
Sig.                  1.000       .079

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 10.313.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.

The table shows two homogeneous subsets: subset 1 (Year 1) and subset 2 (Year 3 and Year 2).

So the difference between Year 2 and Year 3 is not significant (p = .079), but both differ significantly from Year 1.

μ1 ≠ μ2 = μ3 (i.e. at least one pair of means is not equal)
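As a cross-check outside PASW, the three steps (Levene, ANOVA, Tukey) can be sketched with SciPy. This assumes a SciPy version that provides `scipy.stats.tukey_hsd` (1.8 or later). Two details are assumptions of this sketch rather than part of the slides: Levene's test is centred on the group means (`center="mean"`) to match the PASW statistic, and with unequal group sizes SciPy applies the Tukey-Kramer variant, so its pairwise p-values need not equal the Sig. values in the homogeneous-subsets table.

```python
from scipy import stats

year1 = [60, 55, 45, 50, 55, 60, 70, 45, 35, 35]
year2 = [65, 60, 70, 75, 70, 78, 79, 80, 81, 82, 85]
year3 = [60, 60, 60, 60, 70, 70, 70, 70, 75, 70]

# Step 1: homogeneity of variance (SciPy's default centre is the median).
lev_stat, lev_p = stats.levene(year1, year2, year3, center="mean")
print(f"Levene: statistic = {lev_stat:.3f}, p = {lev_p:.3f}")

# Step 2: one-way independent ANOVA.
f_stat, f_p = stats.f_oneway(year1, year2, year3)
print(f"ANOVA: F = {f_stat:.3f}, p = {f_p:.6f}")

# Step 3: post hoc comparisons, read only because Ho was rejected.
tukey = stats.tukey_hsd(year1, year2, year3)
p_12 = tukey.pvalue[0, 1]  # Year 1 vs Year 2
p_13 = tukey.pvalue[0, 2]  # Year 1 vs Year 3
p_23 = tukey.pvalue[1, 2]  # Year 2 vs Year 3
print(f"Tukey p-values: 1 vs 2 = {p_12:.4f}, 1 vs 3 = {p_13:.4f}, 2 vs 3 = {p_23:.4f}")
```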

Page 19: Anova (Statistics)

ANALYSIS 3

GLM (General Linear Model) test

The general linear model incorporates a number of different statistical models: ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, t-test and F-test. GLM is therefore a more general concept, compared to ANOVA.

Page 20: Anova (Statistics)

One-way Independent ANOVA

MENU
1. Analyze
2. General Linear Model
3. Univariate

Plots: move year to Horizontal Axis first, then click Add.

Page 21: Anova (Statistics)

One-way Independent ANOVA

Dialog buttons used: Plots, Post Hoc, Save, Options.

Options (Observed power): If p is high (not significant, i.e. Ho is not rejected), look at the Observed Power (1 − β). If the power is high (0.8, i.e. 80%, or more), the decision not to reject Ho is confirmed. If the power is low, the non-significant result probably reflects the small sample size used in the test, so a real difference may have been missed (refer to Type II error).

Options (Estimates of effect size): This returns “Partial Eta Squared”. A value of 0.14 or more means a large effect. Effect size is NOT influenced by sample size (as opposed to the p value, which can be influenced by sample size).

Save (Cook’s distance): Cook’s distance shows the outliers. The value should be less than 1. A value of more than 1 means an outlier (that can be removed). See Cook’s distance in DATA VIEW under COO_1.

Page 22: Anova (Statistics)

One-way Independent ANOVA

Estimated Marginal Means: MSU Year
Dependent Variable: Knowledge

                                        95% Confidence Interval
MSU Year      Mean      Std. Error      Lower Bound     Upper Bound
Year 1        51.000    2.707           45.454          56.546
Year 2        75.000    2.581           69.712          80.288
Year 3        66.500    2.707           60.954          72.046

Tests of Between-Subjects Effects
Dependent Variable: Knowledge

Source             Type III Sum of Squares   df    Mean Square    F           Sig.    Partial Eta Squared   Noncent. Parameter   Observed Power(b)
Corrected Model    3075.242(a)               2     1537.621       20.976      .000    .600                  41.952               1.000
Intercept          127380.859                1     127380.859     1737.717    .000    .984                  1737.717             1.000
year               3075.242                  2     1537.621       20.976      .000    .600                  41.952               1.000
Error              2052.500                  28    73.304
Total              134160.000                31
Corrected Total    5127.742                  30

a. R Squared = .600 (Adjusted R Squared = .571)
b. Computed using alpha = .05
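The sums of squares, F and Partial Eta Squared for the year effect can be recomputed by hand from the raw scores. The sketch below uses only the standard library; the variable names are illustrative. For a one-way design, Partial Eta Squared is the between-groups sum of squares divided by the sum of the between-groups and error sums of squares, which here equals R Squared.

```python
from statistics import mean

year1 = [60, 55, 45, 50, 55, 60, 70, 45, 35, 35]
year2 = [65, 60, 70, 75, 70, 78, 79, 80, 81, 82, 85]
year3 = [60, 60, 60, 60, 70, 70, 70, 70, 75, 70]
groups = [year1, year2, year3]

all_scores = [s for g in groups for s in g]
grand_mean = mean(all_scores)

# Between-groups ("year") and within-groups ("Error") sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
partial_eta_sq = ss_between / (ss_between + ss_within)

print(f"SS year = {ss_between:.3f} (df {df_between}), SS error = {ss_within:.3f} (df {df_within})")
print(f"F = {f_stat:.3f}, partial eta squared = {partial_eta_sq:.3f}")
```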

The Tests of Between-Subjects Effects table includes the Estimates of Effect Size (Partial Eta Squared) and the Observed Power.

Estimated marginal means are unweighted means. This is important when comparing the means of groups with unequal sample sizes (as in this ANOVA); weighted means, by contrast, take each mean into consideration in proportion to its sample size. Unequal sample sizes can occur, e.g., due to drop-out of participants, which can destroy the random assignment of subjects to conditions, a critical feature of the experimental design.

Page 23: Anova (Statistics)

One-way Independent ANOVA

Cook’s Distance

Cook’s distance should be less than 1. If not, the case can be excluded from the data.

SPSS 17 (left) and PASW 18 (right) show different results. See row 14.

The actual reading for COO_1 should be 0.00, not 2.53. It seems that there is a bug in PASW 18.
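The expected value for row 14 can be checked by hand. The sketch below computes Cook's distance for the one-way model with the standard formula D = e² / (p × MSE) × h / (1 − h)², where e is the residual from the group mean, p is the number of model parameters (3 group means) and h = 1 / (group size) is the leverage. It assumes the rows are entered in the order Year 1, Year 2, Year 3, as listed in the data slide, so that row 14 is the fourth Year 2 score.

```python
from statistics import mean

year1 = [60, 55, 45, 50, 55, 60, 70, 45, 35, 35]
year2 = [65, 60, 70, 75, 70, 78, 79, 80, 81, 82, 85]
year3 = [60, 60, 60, 60, 70, 70, 70, 70, 75, 70]
groups = [year1, year2, year3]

n_params = len(groups)                      # one mean per group
n_total = sum(len(g) for g in groups)
sse = sum((s - mean(g)) ** 2 for g in groups for s in g)
mse = sse / (n_total - n_params)

cooks = []
for g in groups:
    leverage = 1 / len(g)                   # same for every case in a group
    group_mean = mean(g)
    for score in g:
        residual = score - group_mean
        cooks.append(residual ** 2 / (n_params * mse) * leverage / (1 - leverage) ** 2)

print(f"row 14: Cook's distance = {cooks[13]:.2f}")
print(f"largest Cook's distance = {max(cooks):.3f}")
```

Row 14 holds a score of 75, which equals the Year 2 mean, so its residual and its Cook's distance are zero; this agrees with the 0.00 reading described above.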

Page 24: Anova (Statistics)

One-way Independent ANOVA

The profile plot can be included in the thesis result

Profile Plot

Page 25: Anova (Statistics)

One-way Repeat Measure ANOVA

1-WAY ANOVA
• INDEPENDENT ANOVA
• REPEATED MEASURE ANOVA

We have looked at 1-way independent ANOVA. Now we look at 1-way repeated measure ANOVA.

Page 26: Anova (Statistics)

One-way Repeat Measure ANOVA

In Repeat Measure, we repeat the test on the SAME sample but at DIFFERENT times.

The data for each time or day must be put in a DIFFERENT COLUMN (a separate variable in PASW Variable View).

In this test, we are not concerned about homogeneity of variance. Rather, we are concerned about sphericity (Mauchly’s Test of Sphericity). Sphericity is assumed when the significance (Sig.) of Mauchly’s test is > 0.05.

For pairwise comparison (Post Hoc), we do not use Tukey or Dunnett but the Bonferroni test.

[If Mauchly’s Sig. > 0.05, read the Sphericity Assumed row. If Sig. < 0.05, read the Greenhouse-Geisser row.]

Page 27: Anova (Statistics)

One-way Repeat Measure ANOVA

A study is carried out to determine whether knowledge of the Vision and Mission of the university differs on different days among first-year students of the Management and Science University (MSU).

Knowledge Score on Vision and Mission of MSU

Sunday   60 55 45 50 55 60 70 45 35 35 65
Monday   60 55 45 50 55 82 85 60 60 60 60
Friday   85 60 60 60 60 70 70 70 70 75 70

Hypotheses:
Ho: μ1 = μ2 = μ3
HA: At least one pair of means is not equal (e.g. μ1 ≠ μ2 = μ3)

Transfer the data into PASW. Remember, since this is a repeated measure test, the scores for each day are recorded in a different column.

Page 28: Anova (Statistics)

One-way Repeat Measure ANOVA

Variable view

Page 29: Anova (Statistics)

One-way Repeat Measure ANOVA

Enter the data from the study in “Data View”. In a repeated measure test we use a different column for every variable.

Page 30: Anova (Statistics)

One-way Repeat Measure ANOVA

MENU
1. Analyze
2. General Linear Model
3. Repeated Measures
4. Factor
5. Number of levels = 3
6. Define
7. (Move all knowledge variables to the right)
8. Options
9. Compare main effects
10. (Select Bonferroni)
11. Descriptive
12. Estimates
13. Observed power
14. Save
15. Cook’s distance
16. Plots
17. Move factor1 to Horizontal Axis
18. Add
19. Continue

Page 31: Anova (Statistics)

One-way Repeat Measure ANOVA

Mauchly's Test of Sphericity(b)
Measure: MEASURE_1

                                                                           Epsilon(a)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.      Greenhouse-Geisser   Huynh-Feldt   Lower-bound
factor1                  .961          .359                 2    .835      .962                 1.000         .500

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept. Within Subjects Design: factor1

[If Mauchly's Sig. > 0.05, read the Sphericity Assumed row. If Sig. < 0.05, read the Greenhouse-Geisser row.]

Tests of Within-Subjects Effects
Measure: MEASURE_1

Source           Correction           Type III Sum of Squares   df        Mean Square   F       Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
factor1          Sphericity Assumed   1397.515                  2         698.758       9.347   .001   .483                  18.694               .957
                 Greenhouse-Geisser   1397.515                  1.925     726.116       9.347   .002   .483                  17.990               .951
                 Huynh-Feldt          1397.515                  2.000     698.758       9.347   .001   .483                  18.694               .957
                 Lower-bound          1397.515                  1.000     1397.515      9.347   .012   .483                  9.347                .787
Error(factor1)   Sphericity Assumed   1495.152                  20        74.758
                 Greenhouse-Geisser   1495.152                  19.246    77.685
                 Huynh-Feldt          1495.152                  20.000    74.758
                 Lower-bound          1495.152                  10.000    149.515

a. Computed using alpha = .05

Look at Mauchly’s test first. In this example, W = 0.961 with Sig. = .835. Since Sig. > 0.05, sphericity is assumed, so we read the Sphericity Assumed row, not the Greenhouse-Geisser row: F = 9.347, p = .001.

See that the Observed Power is high (.957).
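The Sphericity Assumed row can be recomputed by hand from the raw scores. The sketch below uses only the standard library and partitions the total variation into the day effect (factor1), the differences between students, and the remaining error; the variable names are illustrative. It does not compute Mauchly's test or the epsilon corrections.

```python
from statistics import mean

sunday = [60, 55, 45, 50, 55, 60, 70, 45, 35, 35, 65]
monday = [60, 55, 45, 50, 55, 82, 85, 60, 60, 60, 60]
friday = [85, 60, 60, 60, 60, 70, 70, 70, 70, 75, 70]
days = [sunday, monday, friday]

subjects = list(zip(*days))          # one tuple of three scores per student
n, k = len(subjects), len(days)
grand_mean = mean([x for d in days for x in d])

ss_total = sum((x - grand_mean) ** 2 for d in days for x in d)
ss_factor = n * sum((mean(d) - grand_mean) ** 2 for d in days)
ss_subjects = k * sum((mean(s) - grand_mean) ** 2 for s in subjects)
ss_error = ss_total - ss_factor - ss_subjects

df_factor = k - 1
df_error = (k - 1) * (n - 1)
f_stat = (ss_factor / df_factor) / (ss_error / df_error)
partial_eta_sq = ss_factor / (ss_factor + ss_error)

print(f"SS factor1 = {ss_factor:.3f} (df {df_factor}), SS error = {ss_error:.3f} (df {df_error})")
print(f"F = {f_stat:.3f}, partial eta squared = {partial_eta_sq:.3f}")
```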

Page 32: Anova (Statistics)

One-way Repeat Measure ANOVA

The Pairwise Comparisons here use the Bonferroni test.

• 1 and 2 are not significantly different (p = 0.092)
• 1 and 3 are significantly different (p = 0.008)
• 2 and 3 are not significantly different (p = 0.209)

So we reject Ho because at least one pair of means is not equal.

Pairwise Comparisons
Measure: MEASURE_1

                                                                              95% Confidence Interval for Difference(a)
(I) factor1   (J) factor1   Mean Difference (I-J)   Std. Error   Sig.(a)      Lower Bound     Upper Bound
1             2             -8.818                  3.508        .092         -18.886         1.250
              3             -15.909*                4.035        .008         -27.490         -4.328
2             1             8.818                   3.508        .092         -1.250          18.886
              3             -7.091                  3.491        .209         -17.112         2.930
3             1             15.909*                 4.035        .008         4.328           27.490
              2             7.091                   3.491        .209         -2.930          17.112

Based on estimated marginal means
a. Adjustment for multiple comparisons: Bonferroni.
*. The mean difference is significant at the .05 level.
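The Bonferroni-adjusted Sig. values can be approximated outside PASW with paired t-tests. The sketch below assumes SciPy is available and uses `scipy.stats.ttest_rel`; it applies the Bonferroni adjustment as a plain multiplication of each p-value by the number of comparisons (capped at 1), which is an assumption about how the table was produced, although it reproduces the values above.

```python
from itertools import combinations

from scipy import stats

days = {
    "Sunday": [60, 55, 45, 50, 55, 60, 70, 45, 35, 35, 65],
    "Monday": [60, 55, 45, 50, 55, 82, 85, 60, 60, 60, 60],
    "Friday": [85, 60, 60, 60, 60, 70, 70, 70, 70, 75, 70],
}

pairs = list(combinations(days, 2))

adjusted = {}
for first, second in pairs:
    t_stat, p_value = stats.ttest_rel(days[first], days[second])
    # Bonferroni: multiply by the number of comparisons, cap at 1.
    adjusted[(first, second)] = min(1.0, float(p_value) * len(pairs))
    print(f"{first} vs {second}: t = {t_stat:.3f}, adjusted p = {adjusted[(first, second)]:.3f}")
```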