exploring group differences
DESCRIPTION
Exploring Group Differences. Before Break:. 1) Descriptive Statistics: Measures of central tendency Measures of variability Z-scores 2) Understanding statistical significance Hypothesis testing Alpha and p-values 3) Testing for relationships/associations between variables: - PowerPoint PPT PresentationTRANSCRIPT
EXPLORING GROUP DIFFERENCES
Before Break:
1) Descriptive Statistics: Measures of central tendency Measures of variability Z-scores
2) Understanding statistical significance Hypothesis testing Alpha and p-values
3) Testing for relationships/associations between variables: Correlation (Pearson’s r) Simple regression Multiple regression
After Break:
1) Testing for Group differences T-tests ANOVA
2) Understanding statistical significance Effect Size Power
3) Nonparametric statistics and ‘other’ common tests Chi-square Logistic regression
If you have a firm grip on pre-break material – the second half of the
course becomes much easier (in my opinion)
Quick review I have a dataset that contains information on fitness
and academic performance in middle-school children I want to know if fitness is related to academic success I’m going to use PACER laps to quantify fitness and ISAT
science scores to quantify academic success
I can answer this question in various ways – let’s start with measures of association (correlations)
What would be my null and alternative hypothesis using a correlation?
Results
What is the relationship between aerobic fitness and science ISAT results?
p = 0.009, what does this mean? Low chance of random sampling error We would only see a correlation this strong, or stronger, 9
times out of 1000 due to random sampling error (due to chance)
Association
Association (and prediction) statistics like correlation and regression are useful, but can be limited
The other “half” of statistical testing is centered around determining group differences
For example, we could ask our fitness/academics question a different way and use a different set of statistics Also useful in experiments (treatment vs control),
comparing genders (males vs females), etc…
Example
Imagine I use PACER laps to split kids into two different groups High Fitness (high number of laps) Low Fitness (low number of laps)
NOTE: I took a continuous variable and made it into a categorical variable (nominal/ordinal)
Now I can ask the question a different way What are my null and alternative hypotheses? Remember, I believe that fitness is related to
academic success
Example cont
HO: There is no difference in science scores between the high fitness and low fitness group Notice, no difference would mean fitness has no effect
HA: There is a difference in science scores between the high fitness and low fitness group A difference would indicate that fitness has some
effect
This is simple enough – we know how to calculate and compare means in SPSS…
High vs Low Fitness: Mean
Conclusion…? Should I reject the null hypothesis?
Wait – could this difference be due to random sampling error?
Need for new statistical test
Is this difference due to random sampling error?
Due to the effect of random sampling, the two groups will NEVER have the exact same science scores I need a way to determine if this difference is
REAL or due to RSE I need to use a statistical test that can
determine group differences and provide me with a p-value
T-test
A t-test is a family of statistical tests designed to determine if differences exist between two groups (and ONLY two groups) Based on t-scores (which are very similar to z-
scores), should tip you off they are based on mean and SD
They test for ‘equality of means’ If the two group means are equal – then there is no
difference 3 major types of t-tests
One sample t-test, independent samples t-test, paired-samples t-test
T-tests One-sample t-test =
Compares mean of a single sample to known population mean i.e., group of 100 people took IQ test, are they different from the
population average? Do they have above average IQ? Independent samples t-test =
Compares the means scores of two different groups of subjects i.e., are science scores different between high fitness and low fitness
Paired-samples t-test = Compares the mean scores for the same group of subjects on
two different occasions i.e., is the group different before and after a treatment?
Also called a dependent t-test or a repeated measures t-test
In all cases TWO group means are being compared
Independent Samples T-Test
Let’s start here, since we need to use this test for our fitness/science question
Independent Samples T-tests: Used with a two-level, categorical, independent
variable (High/Low Fitness) – ONLY two groups… and with one continuous dependent variable
(science ISAT scores) Statistical assumptions – 1) data are normally
distributed, 2) samples represent the population, 3) the variance of the two groups are similar (homoscedasticity of variance)
NOTE: Same as correlation/regression – except we no longer have to worry about a ‘linear’ relationship since one of our variables is categorical (high/low fitness)
SPSS: Data format
In SPSS, the science scores are my continuous, dependent variable
I created the ‘high’ and ‘low’ fitness groups based on how many PACER laps each child completed When I created them, I coded ‘high fitness’
as 0 and ‘low fitness’ as 1 You need to recognize how your data are coded
for a t-test
SPSS – T-test
Move dependent variable to “Test Variable”
Move your independent variable to “Grouping Variable” Notice, it now has 2
question marks SPSS needs to know
which groups to compare
“Define Groups”
SPSS – T-test
Recall, ‘high fitness’ was 0, ‘low fitness’ was 1 Manually enter these values into the box When done, hit “continue’, then ‘ok’
T-test results
The first box will contain what you’ve already seen – the mean of the two groups:
Notice, n, mean, standard deviation (ignore SE) for each group
The next box is too big for one screen, so I’ve split it into two pieces…
SPSS results - Output
Recall, both groups need to have equal variance (homogeneity of variance, or homoscedasticity)
SPSS tests for this using “Levene’s Test” Null hypothesis = There is equal variance This means you do NOT want a p-value <
0.05
SPSS results - Output
If this Levene’s Test p-value is > 0.05 Equal variances exist, use the top line of
the table If this Leven’s Test p-value is < or =
0.05 Equal variance does not exist, use the
bottom line Becomes harder to find a statistically
significant result
df – Degrees of Freedom The table also shows df, or ‘degrees of freedom’ df is used to calculate the t-score and p-value
for the t-test df = n – 1
For each group you have, subtract 1 from the sample
We have two groups: High Fitness, n = 176, so 176 – 1 = 175 Low Fitness, n = 98, so 98 – 1 = 97 Total n = 176 + 98 = 274, we have 2 groups so… Total df = 274 – 2 = 272
Degrees of Freedom
df is important to understand if you are calculating the p-values by hand – we are NOT
All you need to know now is that: Larger sample size = ↑ df More groups = ↓ df
You want large df because it reduces your chance of random sampling error (a large sample) and increases the chance you’ll find statistically significant results
This becomes more important beyond t-tests, since we can have several groups (not just 2)
df in our example
Notice, the df in our example is 272 (274 subjects minus our two groups (high and low fitness)
If you do not have equal variances, SPSS ‘downgrades’ your df, making it more difficult to find statistically significant results
Before we move on… Questions about equality of variance test? df? Remember, we’re trying to determine if the
difference between the two groups is real – or due to RSE
What we know so far:
And, the two groups do have equal variance
More results
Here is the important stuff (remember, using top line): Our two groups (high/low fitness) had a mean
difference of 12.2 on the science ISAT 239.1 – 226.9 = 12.2
This difference is statistically significant, p = 0.001
Decisions
HO: There is no difference in science scores between the high fitness and low fitness group
HA: There is a difference in science scores between the high fitness and low fitness group
Decision? Results: The high fitness group scored higher than
the low fitness group on their science ISAT test by 12.2 points. This difference was statistically significant, t (272) 3.262, p = 0.001.
Usually report the t value of the test and the degrees of freedom in the paper (from table)
Questions about t-test results?
One more thing…
Notice in the t-test table that we also were provided with a 95% confidence interval:
95% confidence intervals are a statistic available from most tests, and are related to p-values. Lower Bound = 4.8, upper bound = 19.5
95% confidence intervals
Confidence intervals are similar to p-values Remember, p-values indicate probability of random
sampling error We want low p-values, which indicate a low
probability of random sampling error We most often use a p-value cutoff of 0.05,
meaning we like to be 0.95 (or 95% confident) that this was NOT due to random sampling error
Confidence intervals give you a similar type of information, but in a more practical sense Many people prefer confidence intervals over p-
values
95% confidence intervals
Remember, in statistics we are using samples to try and figure out information about the population When we calculate a mean for a sample, we are really
trying to understand what the REAL population mean is But, due to random sampling error, we always know
that our sample mean is different from the real population mean
Example – mean IQ score for all 7 billion humans is 100 Sample 1 of 100 humans = 102.1, Sample 2 = 105.3,
Sample 3 = 98.2, etc… Random sampling error
IQ
10085 11570 130
X = 100SD = 15
55 145
Distribution of IQ scores from the entire population
Sample 1 Mean = 102.1Sample 2 Mean = 105.3Sample 3 Mean = 98.2
1
23
Pretend we keep on drawing more and more samples until we got 100 different samples and 100 different lines on this chartIf we did that, would there be a pattern to where the lines were
drawn?
Would ALL the lines be so close to the population mean of 100?
10085 11570 130
X = 100SD = 15
55 145
Distribution of IQ scores from the entire population
Not all samples will be close to 100, just due to random sampling error
However, a 95% confidence interval would tell you where 95% of the 100 lines fell
95% Confidence Interval
10085 11570 13055 145
Distribution of IQ scores from the entire population
Could also make a 99% confidence interval if we wanted to
But notice, the ‘more confident’ we want to be, the wider the gap gets
99% Confidence Interval
Usually, people stick with a 95% confidence interval (since we usually use a p-value of 0.05)
10085 11570 130
X = 100SD = 15
55 145
Distribution of IQ scores from the entire population
In this example, a 95% confidence interval indicates that we are 95% certain that the REAL population mean falls between these two values
95% Confidence Interval
We can use a 95% confidence interval for virtually any population parameter we want to – such as a correlation coefficient, a regression slope, or a mean difference between two groups (like with our t-test)
Back to our t-test
Our 95% confidence interval:
Notice is says, “Interval of the Difference” We are 95% certain that the real difference is AT
LEAST as big as 4.8 and might be up to 19.5 points between our High and Low Fitness Groups
Our confidence level can never be 100%, so there is always a chance the real population difference is outside of this range (just like p can never be 0)
Confidence Intervals and p-values These two values are connected because:
Both related to RSE Both calculated using n (and df)
A low p-value (low chance of random sampling error) will result in a smaller (more narrow) confidence interval – we can be more confident
A larger p-value will result in a wider confidence interval – we are less confident
Questions on confidence intervals?
One more example t-test
Instead of using Aerobic fitness, let’s use flexibility
I split my sample into High Flexibility and Low Flexibility groups (based on sit and reach test)
Now, I’ll run a t-test to see if the High Flexibility kids score higher on their science tests than the Low Flexibility kids
What are my hypotheses?
HO: There is no difference in science scores between the high flexibility and low flexibility group
HA: There is a difference in science scores between the high flexibility and low flexibility group
T-test results: Flexibility
We can see that the high flexibility group has a higher Science ISAT score by about 5, but is this difference statistically significant???
T-test results: Flexibility
Levene’s Test p = 0.521 What does this mean?
df = 285 What is our sample size?
T-test results
Notice the mean difference (difference between High/Low groups) and the 95% confidence interval I have intentionally removed the p-value for this t-test
Is there a statistically significant difference between the two groups? Is the p-value < or = 0.05?
T-test results
Remember, to reject the null hypothesis we have to be reasonably certain that the two groups are different If this was the case, the difference between the
two groups could NOT be 0 If the mean difference is 0, the two groups are
identical
When the 95% confidence interval INCLUDES 0, we can’t be 95% certain that there is a group difference – and therefore, p is > 0.05
T-test results
Our 95% confidence interval includes 0 (one number is negative and the other is positive)
Therefore, we can’t be 95% certain the real group difference is NOT 0
p = 0.311, we can’t be sure this is not due to RSE
95% CI and p
If your 95% CI includes 0, your p-value will NOT be less than or equal to 0.05 Because both statistics are evaluating the
chance of RSE
If your 95% CI does not include 0 (both numbers are positive or both are negative), then we can be confident that the two groups are not the same
This means that p < or = 0.05Questions…?
Upcoming…
In-class activity
Homework: Cronk re-read 6.1, complete 6.3 (skip 6.2 for
now) Holcomb Exercises 35 and 37, 38, 39
More t-tests next week Single sample t-test Paired samples t-test (repeated measures t-test)
Example In-Class, 10 minutes Go to Blackboard and open the SPSS dataset
‘Fitness and Academics Reduced’ (week 7) Run two different independent samples t-
tests Determine if kids who are aerobically fit (using
PACER) score higher in reading or math than kids who have low fitness
Write down your results in this format (x2): T = XX, df = XX, p = XX Mean difference = XX, 95%CI (XX to XX)
Results of two t-tests
Results of two t-tests
Reading (equal variances NOT assumed): t = 4.856, df = 411.2, p < 0.0005 Mean difference = 9.8, 95%CI (5.9 to 13.8)
Math (equal variances assumed): t = 4.021, df = 837, p < 0.0005 Mean difference = 8.8, 95%CI (4.5 to 13.0)