comparing means: t-tests wednesday 22 february 2012/ thursday 23 february 2012

Comparing Means: t-tests

Wednesday 22 February 2012/ Thursday 23 February 2012

Judging whether differences occur by chance…

How do we judge whether it is plausible that two population means are the same and that any difference between sample means simply reflects sampling error?

Example: Household size of minority ethnic groups (HOH = Head of household; data adapted by Richard Lampard from 1991 Census)

1. The size of the difference between the two sample means

                   Mean
Indian HOH         3.0
Bangladeshi HOH    5.0

                   Mean
Indian HOH         3.0
Pakistani HOH      4.0

The first difference is more ‘convincing’

2. The sample sizes of the two samples

                   Household sizes        Mean
Pakistani HOH      3 4 5                  4.0
Bangladeshi HOH    4 5 6                  5.0

                   Household sizes        Mean
Pakistani HOH      2 2 3 4 4 4 5 5 5 6    4.0
Bangladeshi HOH    2 3 4 4 5 5 6 6 7 8    5.0

The second difference is more ‘convincing’
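The effect of sample size can be checked numerically. The following is a minimal Python sketch, assuming the pooled-variance t-score that these notes introduce later; the function name `pooled_t` is illustrative and not part of the original material. It applies the same calculation to the small and the large samples listed above.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t, df) for a two-sample t-test using a pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    # Standard error of the difference between the two sample means.
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, df

# Household sizes from the two comparisons above.
small_pakistani, small_bangladeshi = [3, 4, 5], [4, 5, 6]
large_pakistani = [2, 2, 3, 4, 4, 4, 5, 5, 5, 6]
large_bangladeshi = [2, 3, 4, 4, 5, 5, 6, 6, 7, 8]

t_small, df_small = pooled_t(small_pakistani, small_bangladeshi)
t_large, df_large = pooled_t(large_pakistani, large_bangladeshi)
print(f"n = 3 per group:  t = {t_small:.2f}, df = {df_small}")
print(f"n = 10 per group: t = {t_large:.2f}, df = {df_large}")
```

Both comparisons have the same difference in means (1.0), but the larger samples give a smaller standard error, a t-score further from zero and more degrees of freedom.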

Judging whether differences occur by chance…

3. The amount of variation in each of the two groups (samples)



The second difference is more ‘convincing’.

Judging whether differences occur by chance…

Example: the impact of variability on the difference of means.
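The impact of variability can also be sketched in a few lines of Python. The data below are hypothetical (they do not come from the Census example) and the helper name `t_score` is an assumption made for illustration. Both comparisons have the same group means and the same sample sizes; only the spread within each group differs.

```python
from math import sqrt
from statistics import mean, stdev

def t_score(group1, group2):
    """Pooled-variance t-score for the difference between two group means."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2
                  + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical data: in both comparisons the means are 4.0 and 5.0, n = 5 per group.
wide_a, wide_b = [1, 2, 4, 6, 7], [2, 3, 5, 7, 8]        # large spread
tight_a = [3.8, 3.9, 4.0, 4.1, 4.2]                      # small spread
tight_b = [4.8, 4.9, 5.0, 5.1, 5.2]

t_wide = t_score(wide_a, wide_b)
t_tight = t_score(tight_a, tight_b)
print(f"large spread (SD about {stdev(wide_a):.2f}): t = {t_wide:.2f}")
print(f"small spread (SD about {stdev(tight_a):.2f}): t = {t_tight:.2f}")
```

With less variation inside each group, the same difference of 1.0 between the means produces a t-score much further from zero.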

The logic of a statistical test…

The two statistical tests that we have so far looked at:

Testing the plausibility of a suggested population mean (via a t-test [or z-test]): Is the sample mean sufficiently different from the suggested population mean that it is implausible that the suggested population mean is correct?

Chi-square test: Are the observed frequencies in a table sufficiently different from those one would have expected to see if there were no relationship in the population that the idea of no relationship in the population is implausible?

Both have asked whether the difference between the actual (observed) data and what one would have expected to have seen given a particular hypothesis is sufficiently large that the hypothesis is implausible.

…the same logic applies to comparing sample means.

If the two samples came from populations with identical population means, then one would expect the difference between the sample means to be (close to) zero. Thus,

• The larger the difference between the two sample means, the more implausible is the idea that the two population means are identical.

This is also affected by:
• Sample size
• The extent to which there is variation within each group (or sample).

The logic of a statistical test…

t-tests

• Test the null hypothesis:

$H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$

• The alternative hypothesis is:

$H_1: \mu_1 \neq \mu_2$ or $H_1: \mu_1 - \mu_2 \neq 0$

What does a t-test measure?

Note: T = treatment group and C = control group (from experimental research). In most discussions these will just be shown as groups 1 and 2, indicating different groups.
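As a rough sketch of what the t-score measures, one common way of writing it (not necessarily the exact form used in the original slide) is as the difference between the two group means divided by the standard error of that difference. The symbols $\bar{x}_T$, $\bar{x}_C$ and $SE$ are notation assumed here for illustration:

```latex
t = \frac{\text{difference between group means}}{\text{standard error of the difference}}
  = \frac{\bar{x}_T - \bar{x}_C}{SE(\bar{x}_T - \bar{x}_C)}
```

A large difference between the means relative to the standard error gives a t-score far from zero.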

Example
• We want to compare the average amount of television watched by Australian and by British children.

• We have a sample of Australian and a sample of British children. We could say that what we have is something like this:

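[Diagram: the sample of Australian children is used to make an inference about the population of Australian children, and the sample of British children to make an inference about the population of British children. It is the two populations that we want to compare.]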

Example contd.
• Here the dependent variable is hours of TV
• And the independent variable is nationality.
• When we are comparing means SPSS calls the independent variable the grouping variable and the dependent variable the test variable.

Example contd.
• If the null hypothesis, of no difference between the two groups, is correct (and children watch the same amount of television in Australia and Britain) we would assume that if we took repeated samples from the two groups the difference in means between them would generally be small or zero.

• However it is likely that the difference between any two particular samples will be greater than zero.

• Therefore we build up a sampling distribution of the difference between the two sample means.

• We use this distribution to determine the probability of getting an observed difference between two sample means from populations with no difference.
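The idea of building up a sampling distribution can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: both populations are taken to be normal with the same hypothetical mean (180 minutes) and standard deviation (30 minutes), values chosen only for illustration, and the names used are not from the original material.

```python
import random
from statistics import mean, stdev

random.seed(2012)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 180, 30   # hypothetical: both populations share these values
N_PER_SAMPLE = 20            # sample size per group, as in the example
N_REPEATS = 5000             # number of repeated pairs of samples

def sample_mean(n):
    """Mean of n draws from the shared (hypothetical) normal population."""
    return mean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# Difference between the two sample means for each repeated pair of samples.
differences = [sample_mean(N_PER_SAMPLE) - sample_mean(N_PER_SAMPLE)
               for _ in range(N_REPEATS)]

print(f"mean of differences: {mean(differences):.2f}")
print(f"spread (standard error) of differences: {stdev(differences):.2f}")
```

Individual differences are rarely exactly zero, but across many repeated samples they centre on zero, and their spread is what the standard error describes.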

If we take a large number of random samples and calculate the difference between each pair of sample means, we will end up with a sampling distribution that has the following properties:

It will be a t-distribution

(Under the null hypothesis) the mean of the difference between sample means will be zero

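Mean of $(M_1 - M_2) = 0$

The spread of scores around this mean of zero (the standard error) will be defined by the formula:

$$SD_M = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}} \times \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

where $s_1$ and $s_2$ are the two sample standard deviations and $N_1$ and $N_2$ are the two sample sizes. This is called the pooled variance estimate.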

Back to example…

Descriptive statistic    Australian sample    British sample
Mean                     166 minutes          187 minutes
Standard deviation       29 minutes           30 minutes
Sample size              20                   20

When we are choosing the test of significance it is important that:

1. We are making an inference from TWO samples (of Australian and of British children). Therefore we need a two-sample test.

2. The two samples are being compared in terms of an interval-ratio variable – hours of TV watched. Therefore the relevant descriptive statistic is the mean.

These facts lead us to select the two-sample t-test for the equality of means as the relevant test of significance.

Table 1. Descriptive statistics for the samples

Descriptive statistic    Australian sample    British sample
Mean                     166 minutes          187 minutes
Standard deviation       29 minutes           30 minutes
Sample size              20                   20

Calculating the t-score

$$SD_M = \sqrt{\frac{(20 - 1)29^2 + (20 - 1)30^2}{20 + 20 - 2}} \times \sqrt{\frac{20 + 20}{20 \times 20}} = 9.3$$

$$t_{sample} = \frac{166 - 187}{9.3} = -2.3$$
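The same arithmetic can be reproduced from the summary statistics in Table 1. This is a minimal Python sketch; the function name `pooled_t_from_summary` is an assumption made for illustration.

```python
from math import sqrt

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (standard error, t, df) from group means, SDs and sample sizes."""
    df = n1 + n2 - 2
    # Pooled standard deviation: sample variances weighted by degrees of freedom.
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    # Standard error of the difference between the two sample means.
    se = pooled_sd * sqrt((n1 + n2) / (n1 * n2))
    return se, (mean1 - mean2) / se, df

# Australian sample first, British sample second (values from Table 1).
se, t, df = pooled_t_from_summary(166, 29, 20, 187, 30, 20)
print(f"SE = {se:.1f}, t = {t:.2f}, df = {df}")  # prints: SE = 9.3, t = -2.25, df = 38
```

To two decimal places the t-score is -2.25, which rounds to the -2.3 reported in these notes.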

Obtaining a p-value for a t-score
• To obtain the p-value for this t-score we need to consult the table for critical values for the t-distribution (see Appendix A.2. in Field)

• The number of degrees of freedom we refer to in the table is the combined sample size minus two:

$df = N_1 + N_2 - 2$

• Here that is 20 + 20 – 2 = 38
• The table doesn’t have a row of probabilities for 38. In that case we refer to the row for the nearest reported number of degrees of freedom below the desired number. Here that is 35.

• On a two-tail test using the row for 35 degrees of freedom, the absolute value of $t_{sample}$ (2.3) falls between the two stated t-scores of 2.03 (the critical value at the 0.05 level) and 2.72 (the critical value at the 0.01 level).

• The p-value, which falls between the significance levels for these scores, is therefore between 0.01 and 0.05

• Therefore the difference is statistically significant at a 0.05 level but not at a 0.01 level.
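The table lookup gives a range for the p-value. As a cross-check, the p-value can be approximated directly. The sketch below is not the table method described in these notes: it integrates the density of the t-distribution numerically with Simpson's rule. The function names are illustrative assumptions, and -2.25 is the t-score before rounding to one decimal place.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_const) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: area beyond +/-|t|, using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The distribution is symmetric, so the central area is twice this value.
    return 1 - 2 * area_zero_to_t

p = two_tailed_p(-2.25, 38)
print(f"p = {p:.3f}")
```

The result should lie between 0.01 and 0.05, in line with the range read from the table.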

Reporting the results

We can say that:

The mean number of minutes of TV watched by the sample of 20 British children is 187 minutes, which is 21 minutes higher than the mean for the sample of 20 Australian children, and this difference is statistically significant at the 0.05 level (t(38) = -2.3, p = 0.03, two-tail). Based on these results we can reject the hypothesis that British and Australian children watch the same average amount of television every night.

Calculating the effect size

• To discover whether the effect is substantive we want to know the size of it.

• You can convert t-values into an r-value (a PRE statistic) with the following equation:

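$$r = \sqrt{\frac{t^2}{t^2 + df}} = \sqrt{\frac{(-2.25)^2}{(-2.25)^2 + 38}} = 0.34$$

(The t-score is used here before rounding to one decimal place: -2.25 rather than -2.3.)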

This is a medium sized effect.
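The conversion from t to r is easy to script. This is a minimal Python sketch; the function name `effect_size_r` is an assumption, and the benchmarks in the comment are the widely used conventional guidelines (about 0.1 small, 0.3 medium, 0.5 large) rather than something stated in these notes.

```python
from math import sqrt

def effect_size_r(t, df):
    """Convert a t-score and its degrees of freedom into an effect size r."""
    return sqrt(t**2 / (t**2 + df))

# t-score before rounding to one decimal place, with df = 20 + 20 - 2.
r = effect_size_r(-2.25, 38)
print(f"r = {r:.2f}")  # prints: r = 0.34

# Conventional benchmarks (assumed here): about 0.1 small, 0.3 medium, 0.5 large.
```

Because t is squared, the sign of the t-score does not affect r.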
