Discover… The Centre for Academic Success
http://library.bcu.ac.uk/learner/
Statistical Analysis of
Independent Groups in
SPSS
Based on materials provided by Coventry University and
Loughborough University under a National HE STEM
Programme Practice Transfer Adopters grant
Peter Samuels
30th October 2015
Overview
Lab session teaching you how to analyse differences in the means/medians of two or more independent samples of a single scale variable
Common student activity
Self-contained: only a finite number of possibilities
Workshop outline
Two groups:
Descriptives
Assumption checking (for parametric tests)
Independent samples t-test
Mann-Whitney U test
Several groups:
Descriptives
Assumption checking (for parametric tests)
One-way ANOVA
Kruskal-Wallis test
Post hoc testing
The data analysis process
for 2 independent groups
Descriptive statistics → Assumption checking
Pass: Parametric testing (t-test)
Fail: Nonparametric testing (Mann-Whitney U test)
Example 1: 2 stool designs
A research project involving two different designs of stool
Tested by 40 people
Each person was assigned to assess one product, providing an overall performance score out of 100
20 people per stool
Create an error bar chart
Open the file TwoStools.sav
Graphs > Legacy Dialogs > Error Bar…
Click on Define
Put PerformanceScore as the Variable and Design as the Category Axis
Click OK
Go to the output window
Interpretation:
Confidence intervals of the means of the performance scores
The circles mark the sample means
We are 95% confident that the population means lie between the whiskers
As the intervals overlap, we should suspect the test will come back negative (an informal indication, not failsafe!)
Also observe that the intervals are roughly equal in width
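The 95% confidence intervals that the error bar chart draws can be reproduced outside SPSS. A minimal sketch in Python using scipy (not part of the workshop; the scores below are made up and merely stand in for one design's 20 values):

```python
# 95% confidence interval for a group mean, as drawn by the error bar
# chart. The scores below are hypothetical, not the workshop data.
import numpy as np
from scipy import stats

scores = np.array([62, 71, 55, 68, 74, 60, 66, 59, 72, 65,
                   58, 70, 63, 67, 61, 69, 64, 56, 73, 57])

mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean (uses n - 1)
# 95% CI based on the t distribution with n - 1 degrees of freedom
lo, hi = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

The whiskers in the chart correspond to `lo` and `hi` for each group.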
Robustness
Parameter-based statistical tests make certain assumptions in their underlying models
However, they often work well in other situations where these assumptions are violated
This is known as robustness
Robustness conditions depend upon the test being used
There are different opinions on robustness conditions
Assumption checking
Parametric tests are more sensitive than nonparametric tests but require certain assumptions to hold to be used
Thus we need to check these assumptions first
Not required with this test for equal group sizes ≥ 25 due to robustness exceptions (Sawilowsky and Blair, 1992)
Here our groups were equal but only of size 20 so we need to test for normality
For small sample sizes the best test is Shapiro-Wilk
Reference: Sawilowsky, S. S. and Blair, R. C. (1992) A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin, 111(2), pp. 352–360.
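The same Shapiro-Wilk check can be run outside SPSS. A sketch in Python using scipy, with a simulated sample standing in for one design's 20 scores (an illustration, not the workshop data):

```python
# Shapiro-Wilk normality check, as SPSS performs in Explore.
# The sample is simulated, standing in for one design's 20 scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group = rng.normal(loc=65, scale=10, size=20)  # hypothetical scores

w_stat, p_value = stats.shapiro(group)
# p > 0.05: retain H0 (no evidence against normality), so a
# parametric test such as the t-test is reasonable for this group
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")
```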
Assumption checking in SPSS
Analyze > Descriptive Statistics > Explore
Put PerformanceScore in the Dependent List and Design in the Factor List and select Plots…
Remove Stem-and-leaf, select Histogram and Normality plots with tests
Add a fitted normal curve to
the histograms
Double click on the first histogram in the output window – this opens the Chart Editor window
Select this button
Close the Properties window and the Chart Editor window
Repeat with the other histogram
Design 1 histogram appears to be approximately normally distributed
Design 2 histogram appears to be a bit skewed to the right. However, its skewness is less than twice its standard error.
The null and alternative
hypotheses
Statistical testing is about making a decision about the significance of a data feature or summary statistic
We usually assume that this was just a random event and then seek to measure how unlikely such an event was
The statement of this position is known as the null hypothesis and is written H0
In statistical testing we make a decision about whether to retain or reject the null hypothesis based on the probability (or ‘P-’) value of the test statistic
The logical opposite of the null hypothesis is known as the alternative hypothesis
Standard significance levels
and the null hypothesis (H0)
P-value of test statistic   Significant?    Formal action                    Informal interpretation             Example
> 0.1                       No              Retain H0                        No evidence to reject H0            Chris Froome
< 0.1 and > 0.05            No              Retain H0                        Weak evidence to reject H0          Plebgate libel trial
< 0.05 and > 0.01           Yes: at 95%     Reject H0 at 95% confidence      Evidence to reject H0               Climate change
< 0.01 and > 0.001          Yes: at 99%     Reject H0 at 99% confidence      Strong evidence to reject H0        Plebgate police trial
< 0.001                     Yes: at 99.9%   Reject H0 at 99.9% confidence    Very strong evidence to reject H0   Higgs boson
The Shapiro-Wilk test is negative for both designs as the “Sig.” (or probability) values are both > 0.05
Therefore we can use the appropriate parametric test (the independent samples t-test)
Both these tests are not very sensitive with small sample sizes and over sensitive with larger samples (e.g. > 100)
For large samples the probability values should be interpreted alongside the histograms with fitted normal curves and Q-Q plots – see normality checking sheet
The independent samples t-test
Applies to different (independent) groups of subjects, each with a single scale-based data value
Tests the difference between the means of the two samples
The samples can be different sizes
Here: Product scores for Designs 1 and 2
Assumes normality
Null hypothesis (H0): The means of the performance scores for the two designs are equal
Two variants: Depends upon whether the variances of the two designs can be assumed to be equal (use Levene’s test first, H0: Variances are equal) – all done together in SPSS
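The Levene's test followed by the appropriate t-test variant can be sketched outside SPSS in Python with scipy. The two groups below are hypothetical scores, not the workshop data:

```python
# Levene's test followed by the independent samples t-test, mirroring
# how SPSS runs both together. The groups are hypothetical scores.
import numpy as np
from scipy import stats

design1 = np.array([55, 58, 60, 62, 57, 59, 61, 63, 56, 64,
                    58, 60, 62, 55, 59, 61, 57, 63, 60, 58])
design2 = design1 + 10  # same spread, shifted mean

# Levene's test: H0 = the two variances are equal
lev_stat, lev_p = stats.levene(design1, design2)
equal_var = lev_p > 0.05

# Student's t-test if variances look equal, otherwise Welch's variant
t_stat, t_p = stats.ttest_ind(design1, design2, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}, t-test p = {t_p:.4f}")
```

Here Levene's test is non-significant (identical spreads), so the equal-variances row of the output is the one to read, and the t-test detects the shifted mean.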
Analyze > Compare Means > Independent Samples T-Test…
Add PerformanceScore as the Test Variable and Design as the Grouping Variable
Select Define Groups… and add 1 for Group 1 and 2 for Group 2
Automatically computes Levene’s test and outputs both versions:
Not significant at 95% (so we retain H0)
So equal variances can be assumed (now look at this row)
t-test significant at 95% (between 0.05 and 0.01)
Interpretation: There is evidence that the mean performance scores for the stool designs are different (this is different from our informal interpretation of the error bar chart)
Nonparametric testing
A type of statistical inference which does not assume that the data come from a particular distribution
Often applies to category-based data (nominal and ordinal) but can also apply to scale-based data if test assumptions are not met
Advantage: no need to check assumptions
Disadvantages:
Results are generally less sensitive (higher p-values)
Cannot handle more complex data structures (such as two-way ANOVA)
Appropriate test here is the Mann-Whitney U test
Mann-Whitney U Test
A non-parametric test of two independent samples of ordinal or scale-based data
Tests whether values in one sample tend to be higher or lower than values in the other
Need at least about 10 data categories for ordinal variables, otherwise use the Chi-squared test
Alternative to an independent samples t-test for scale-based data if the assumptions are not met (not the case here – just shown for illustration purposes)
Samples can be different sizes
Null hypothesis: Design 2 performance scores are equally likely to be higher or lower than Design 1 performance scores
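The same test is available in scipy; a sketch with hypothetical two-group data (not the workshop scores):

```python
# Mann-Whitney U test in scipy. The groups are hypothetical scores
# standing in for the two designs.
import numpy as np
from scipy import stats

design1 = np.array([55, 58, 60, 62, 57, 59, 61, 63, 56, 64,
                    58, 60, 62, 55, 59, 61, 57, 63, 60, 58])
design2 = design1 + 10  # every Design 2 score exceeds every Design 1 score

u_stat, p_value = stats.mannwhitneyu(design1, design2,
                                     alternative="two-sided")
# Small p: Design 2 scores tend to be higher than Design 1 scores,
# so we reject the null hypothesis of no ordering between the samples
print(f"U = {u_stat}, p = {p_value:.4f}")
```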
Running the Mann-Whitney U
test
Select: Analyze – Nonparametric Tests – Independent Samples…
On the Fields tab, add PerformanceScore in the Test Fields list and Design in the Groups list
Select Run
The correct test has been run
The “Sig.” value is about the same as that for the independent samples t-test (we expected it to be higher)
Helpfully states the null hypothesis decision
Unhelpfully states the default significance level used (can be misleading)
The data analysis process for
several independent groups
Descriptive statistics → Assumption checking
Pass: Parametric testing
Fail: Nonparametric testing
If significant differences are found: Post hoc testing
Example 2: 3 stool designs
A research project involving three different designs of a new product
Tested by 60 people
Each person was assigned to assess one product, providing an overall performance score out of 100
20 people per product
Open the file ThreeStools.sav
Create descriptive statistics and an error bar chart as before
What is one-way ANOVA?
An extension of t-tests to several groups
Usually independent measures
Accounts for variations both within and between groups
95% confidence intervals for 3 groups of measurements
These confidence intervals do not overlap, but does this mean we can conclude they are not all from the same population?
Initial observations
There appear to be differences between the sample means, i.e. variation between groups
But there is also variation within groups
Can we conclude that there are differences between groups (i.e. that they come from populations with different means)?
We need a systematic objective approach – this is known as ANOVA
Called ANOVA from ANalysis Of VAriance
(The name is a bit confusing because it sounds like a variance test, not a means test)
Introduction to ANOVA
Better than doing lots of two sample tests, e.g. 6 groups would require 15 two sample tests
For every test, there is a 0.05 probability that we reject H0 when it should be retained (assuming H0 is true)
Doing several tests increases the probability of making a wrong inference of significance (Type I error)
E.g. the probability of a Type I error with 6 groups, assuming they are all equally randomly distributed, is 1 − 0.95^15 = 1 − 0.463 = 0.537, i.e. more than 1 in 2
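The familywise error figure above can be checked in a couple of lines of Python (an illustration, not part of the workshop):

```python
# Familywise error calculation: with 6 groups there are C(6, 2) = 15
# pairwise tests, each with a 0.05 false positive rate under H0.
from math import comb

k_groups = 6
n_tests = comb(k_groups, 2)          # 15 two-sample tests
p_familywise = 1 - 0.95 ** n_tests   # chance of at least one Type I error
print(n_tests, round(p_familywise, 3))  # 15 0.537
```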
The ANOVA model
y_ij denotes the performance score for the jth measurement of the ith design
The parameter m_i denotes how the performance score for design i differs from the overall mean μ
e_ij denotes the error (or residual) for the jth measurement of the ith design
The ANOVA model assumes that all these errors are normally distributed with zero mean and equal variances
y_ij = μ + m_i + e_ij
Test hypothesis
In our example, we need to test the hypothesis:
H0: m_1 = m_2 = m_3 = 0
Or, more simply, that the product score population means are the same.
Intuitively, this is done by looking at the difference between means relative to the difference between observations, i.e. is the mean-to-mean variation greater than what you would expect by chance?
Assumptions
(Similar to the independent samples t-test assumptions)
1. The measurements for each group are normally distributed. However, if there are many groups there is a danger of Type I errors.
2. The errors for the whole data set are normally distributed (this theoretically follows from Assumption 1, but it is worth testing separately with small samples). To calculate these errors we first need to estimate the group means.
3. The variances of each group are equal (we can still use a version of ANOVA even if this one fails)
Assumption 1: Check
normality of each group
No evidence that individual groups are not
normally distributed
Assumption 2: Testing errors for normality
First create the residuals
Select Analyze > General Linear Model > Univariate…
Add the variables as shown
Select Save…
Choose Unstandardised residuals
Based on estimates of m_i
Select Analyze > Descriptive Statistics > Explore
Add the residual variable as shown but with no factor
Select Plots… and Histogram and Normality plots with tests as before
Then add a normal curve to the histogram as before
Evidence that the residuals are not normally distributed from the Shapiro-Wilk test (p < 0.05) even though the degrees of freedom have been reduced slightly. The Kolmogorov-Smirnov test is even more significant.
Kurtosis (peakedness) looks a bit high
Formally we should compare the absolute value of the kurtosis with twice its standard error – this is significant as it is higher
Assumption 3: Equal variances
Analyze > Compare Means > One-Way ANOVA…
Add PerformanceScore to the Dependent List and Design as the factor
Select Options… and Homogeneity of variance test
Carries out a Levene’s test for homogeneity of variance (similar to the t-test)
Null hypothesis: The variances are equal
Significant at 95% (p-value < 0.05) so we have evidence to reject the assumption of equality of variances
Robustness of ANOVA
ANOVA is quite robust to changes in skewness but not to changes in kurtosis. Thus, it should not be used when:
Kurtosis > 2 × Standard Error of Kurtosis
for any group or the errors.
Otherwise, provided the group sizes are equal and there are at least 20 degrees of freedom, ANOVA is quite robust to violations of its assumptions
However, the variances must still be equal
Source:
Field, A. (2013) Discovering Statistics using SPSS. 4th edn.
London: SAGE, pp. 444-445.
Robustness calculation
Group      Kurtosis   SE of Kurtosis   Condition met
Design 1   0.493      0.992            Yes
Design 2   0.435      0.992            Yes
Design 3   0.115      0.992            Yes
Errors     1.553      0.608            No
Group sizes are equal
Total degrees of freedom = 20 + 20 + 20 – 1 = 59 > 20
Also standard ANOVA cannot be used because the variances are not equal
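The standard errors of kurtosis in the table depend only on the group size n. A sketch in Python of the usual formula (to our understanding the one SPSS reports; treat this as an assumption):

```python
# Standard error of kurtosis as a function of sample size n,
# using the formula commonly attributed to SPSS.
import math

def se_kurtosis(n: int) -> float:
    return math.sqrt(24 * n * (n - 1) ** 2 /
                     ((n - 3) * (n - 2) * (n + 3) * (n + 5)))

# n = 20 per design, n = 60 for the pooled errors
print(round(se_kurtosis(20), 3))  # 0.992, matching the table
print(round(se_kurtosis(60), 3))  # 0.608
# Errors: kurtosis 1.553 > 2 * 0.608, so the robustness condition fails
print(1.553 > 2 * se_kurtosis(60))  # True
```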
Summary of findings: ANOVA
assumptions
Assumption                 Finding
1. Normality of groups     No evidence of non-normality
2. Normality of errors     Evidence of non-normality
3. Equality of variances   Evidence of non-equality
Robustness                 Kurtosis of errors too high
One-way ANOVA
If all 3 assumptions (or the robustness exceptions to non-normality) are OK then use standard one-way ANOVA
Analyze > Compare Means > One-Way ANOVA
Under Options… select Descriptive
Shown for illustration purposes
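Standard one-way ANOVA is also available in scipy; a sketch with three hypothetical groups of 20 scores (not the workshop data):

```python
# Standard one-way ANOVA via scipy, with three hypothetical groups.
import numpy as np
from scipy import stats

base = np.array([55, 58, 60, 62, 57, 59, 61, 63, 56, 64,
                 58, 60, 62, 55, 59, 61, 57, 63, 60, 58])
design1, design2, design3 = base, base + 8, base + 16  # shifted means

f_stat, p_value = stats.f_oneway(design1, design2, design3)
# Small p: evidence that not all three population means are equal
print(f"F = {f_stat:.1f}, p = {p_value:.2e}")
```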
Significance level < 0.001
So there is very strong evidence of differences in performance score between the three designs
What if these assumptions
are in doubt?
If normality assumptions (or their robustness exceptions) are in doubt:
Use a nonparametric test: Kruskal-Wallis or the median test if there is no trend in the groups, or Jonckheere-Terpstra if you are looking for a trend (e.g. mean of group 1 < mean of group 2 < mean of group 3, etc.)
Available under Analyze – Nonparametric Tests – Independent Samples…
If equality of variances assumption in doubt:
Use the Brown-Forsythe or Welch test
Select ANOVA and click on Options… button and select the Brown-Forsythe and Welch options
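The Kruskal-Wallis alternative mentioned above can be sketched in scipy, again with hypothetical three-group data:

```python
# Kruskal-Wallis test in scipy: a nonparametric alternative to one-way
# ANOVA. The groups are hypothetical, not the workshop data.
import numpy as np
from scipy import stats

base = np.array([55, 58, 60, 62, 57, 59, 61, 63, 56, 64,
                 58, 60, 62, 55, 59, 61, 57, 63, 60, 58])
groups = [base, base + 8, base + 16]  # three clearly shifted groups

h_stat, p_value = stats.kruskal(*groups)
# Small p: evidence of differences between the groups, without
# assuming normality or equal variances
print(f"H = {h_stat:.1f}, p = {p_value:.2e}")
```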
Nonparametric one-way ANOVA
We should use the Kruskal-Wallis or median tests as there is no trend to observe between these designs
The median test is cruder than Kruskal-Wallis and should only be preferred when ranges of extreme values have been summarised together, which was not the case here (see http://tinyurl.com/median-KW)
Select Analyze – Nonparametric Tests – Independent Samples…
Add PerformanceScore as the Test Field and Design as the Groups variable on the Fields tab
Select the Settings tab, then Customize tests and the Kruskal-Wallis test
Then select Run
Returns a significance value < 0.001 (ignore the note below the result, as before)
Very strong evidence that there are differences between the groups (as before)
ANOVA with unequal variances
Our data set violated the normality of errors assumption but there were also differences in variances
The Brown-Forsythe and Welch tests should only be used with unequal variances if the data and errors are normally distributed (shown for illustration purposes here)
Under Options… select Brown-Forsythe and Welch tests
Both tests are again significant at 99.9%
Very strong evidence that the means are not equal
Generally the Welch test is slightly better unless there is one group with an extreme mean and a large variance (which was not the case here, so the Welch test should be preferred) – see (Field, 2013: 443)
Multiple comparisons
What if we conclude there are differences between the groups?
We don’t know which pairs are different
We can do post-hoc tests to compare each pair of groups
Similar to 2-sample tests but adjusted significance levels for the multiple testing issue
Note: You should only run post hoc tests if you obtain a positive result from the ANOVA (or equivalent) test
Which post hoc test?
For equal group sizes and similar variances, use Tukey (HSD) or REGWQ, or for guaranteed control over Type I errors (more conservative), use Bonferroni
For slightly different group sizes, use Gabriel
For very different group sizes, use Hochberg’s GT2
For unequal variances, use Games-Howell (also recommended as a backup in other circumstances)
Source: (Field, 2013: 459)
Our data set violated the normality of errors assumption but there were also significant differences in variances
Try using the Games-Howell post hoc test (shown for illustration purposes only)
Run the One-Way ANOVA as before
Select Post Hoc… and Games-Howell
Very strong evidence of differences between groups 1 and 3
Evidence of differences between groups 1 and 2
Weak evidence of differences between groups 2 and 3
Nonparametric post hoc testing
This is the correct post hoc testing for our data set
Double click on this output box in the output window:
This opens the Model Viewer window
Change the View to Pairwise Comparisons
The output should then look like this. Concentrate on the Adjusted Sig. values:
Weak evidence of a difference between Design 1 and Design 2
Very strong evidence of a difference between Design 1 and Design 3
No evidence of a difference between Design 2 and Design 3
Note
SPSS version 22 does not use the Mann-Whitney U test in its Kruskal-Wallis post hoc testing but a variant called the Dunn-Bonferroni test
The Sig. values given by the pairwise comparisons in the Model Viewer are higher than those for the Mann-Whitney U test (e.g. the value for Designs 1 and 2 was found earlier to be 0.013; note: you need one more decimal place to calculate the correction)
However, we can still use their relative sizes to decide which pairs to test individually with the Mann-Whitney U test
For our dataset, we do not need to run Designs 2 and 3 because we know it will be non-significant even after the correction, but we should run Designs 1 and 3
To obtain the adjusted Sig. values, multiply the Sig. value by the number of pairs
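This adjustment is a one-liner; a sketch in Python (the 0.013 value is the Mann-Whitney result for Designs 1 and 2 quoted above, with 3 pairs):

```python
# Bonferroni adjustment: multiply each pairwise Sig. value by the
# number of pairs, capping the result at 1.
def bonferroni_adjust(p: float, n_pairs: int) -> float:
    return min(p * n_pairs, 1.0)

# e.g. the Mann-Whitney p of 0.013 for Designs 1 and 2, with 3 pairs
print(round(bonferroni_adjust(0.013, 3), 3))  # 0.039
print(bonferroni_adjust(0.5, 3))              # 1.0 (capped)
```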
Legacy Mann-Whitney U test
The Mann-Whitney U test will not work in the new dialog with three groups
Use the legacy dialog instead: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples…
Choose groups 1 and 3
Mann-Whitney U is the default test
Returns the value 0.000
Double click on the Exact Sig. output to check it to one more decimal place
Example 2: Summary of results
Pair      Games-Howell   Kruskal-Wallis (Dunn-Bonferroni)   Pairwise Mann-Whitney U with Bonferroni adjustment
1 and 2   0.035          0.066                              0.040
1 and 3   < 0.001        < 0.001                            < 0.001
2 and 3   0.086          0.287                              Not tested
According to the preferred (second and third) tests there is:
Very strong evidence of a difference between Designs 1 and 3
(Weak) evidence of a difference between Designs 1 and 2
No evidence of a difference between Designs 2 and 3
Recap: We have considered:
Two groups:
Descriptives
Assumption checking (for parametric tests)
Independent samples t-test
Mann-Whitney U test
Several groups:
Descriptives
Assumption checking (for parametric tests)
One-way ANOVA
Kruskal-Wallis test
Post hoc testing
statstutor resources
www.statstutor.ac.uk
Normality checking (draft – electronic copy provided)
Normality checking solutions (draft – electronic copy provided)
Independent samples t-test (paper copy provided)
Mann-Whitney U test (available from statstutor website)
One way ANOVA (paper copy provided)
One way ANOVA additional material (available from statstutor website)
Kruskal-Wallis test (draft – electronic copy provided)
References
IBM (2014) Post hoc comparisons for the Kruskal-Wallis test. http://www-01.ibm.com/support/docview.wss?uid=swg21477370.
IBM developerWorks (2015) Bonferroni with Mann-Whitney? https://www.ibm.com/developerworks/community/forums/html/topic?id=51942182-1ad0-4f26-9a49-56849775ac4f.
Field, A. (2013) Discovering Statistics using SPSS: (And sex and drugs and rock 'n' roll). 4th edn. London: SAGE.
Sawilowsky, S. S. and Blair, R. C. (1992) A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin, 111(2), pp. 352–360.
Statistica (n.d.) Statistica Help: Nonparametric Statistics Notes – Kruskal-Wallis ANOVA by Ranks and Median Test. http://tinyurl.com/median-KW.