hypothesis tests - hwjphillips/das/topic05.pdf · so 95% of the sample means lie between 146.28 and...



Topic 5

Hypothesis Tests

Contents

5.1 Introduction to Tests of Hypothesis
5.1.1 Type 1 and 2 Errors
5.1.2 One-tailed and two-tailed tests
5.1.3 Different Significance Levels
5.2 Single mean - large samples
5.3 Single proportion - large samples
5.4 Difference of two means - large samples
5.5 Difference of two proportions - large samples
5.6 Small Samples
5.6.1 Single mean
5.6.2 Confidence Intervals with Small Samples
5.6.3 Difference of 2 Means from Small Samples
5.6.4 Paired t test
5.7 The Chi-Squared Distribution
5.7.1 Checking for Association - Hair and Eye Colour
5.7.2 Limitations of Chi-squared test
5.7.3 Goodness of Fit Tests
5.8 Coursework 1
5.9 Summary and assessment

Learning Objectives

• identify situations in experimentation where a hypothesis test will produce a useful result
• appreciate the ideas of null and alternative hypotheses
• use the standardised Normal distribution in hypothesis tests involving large samples
• use Student's t distribution in hypothesis tests involving small samples
• explain Type 1 and Type 2 Errors
• use the formulae for standard error and test statistic in the cases of
a) single mean - large samples
b) single proportion - large samples
c) difference between two means - large samples
d) difference between two proportions - large samples
e) single mean - small samples
f) difference between two means - small samples
• decide when to use a One or Two Tailed Test
• appreciate the concept of degrees of freedom
• calculate a confidence interval for the population mean based on the sample mean from a small sample
• use a paired t test

© HERIOT-WATT UNIVERSITY 2003


5.1 Introduction to Tests of Hypothesis

In the last Topic it was seen that a sample could be used to infer a confidence interval for the mean of the population that it was taken from. A very useful fact is that this method can be turned on its head: instead of being used to estimate a property of the population, a sample can be used to judge whether it is plausible that the population has a particular mean value (or proportion). In this chapter most of the worked examples will start by stating a hypothesis (an assumption about the population) and then either finding that the sample gives evidence against it, in which case it is rejected, or finding no such evidence, in which case it is retained.

Imagine that a company states in its sales pitch that a particular model of its mobile phones lasts for 150 hours between charges. If you were thinking of buying one, you might like some evidence that this claim is true. One way of checking is to take a sample of phones, measure the time between charges for each, and then calculate the mean number of hours. It would be impossible to do this for every phone produced since the population is so large, so the best that can be done is to calculate a sample mean.

Suppose that a sample of 40 phones was taken and this produced a mean value of 147.4 hours. Does this mean that the manufacturer's claim has been disproved? Clearly 147.4 is less than 150, so it looks as if the manufacturer is overestimating the time between charges. However, this was just one sample; it was shown in the last topic that another sample might give a rather different result (for example, 152.3 hours, in which case the phones would be doing better than the manufacturer's claim!).

The method of hypothesis testing starts by making an assertion about the population, usually that the mean is equal to a stated value. In this case it is hypothesised that the population mean, μ, for the mobile phones is 150 hours. The Central Limit Theorem will be used next, and for this a value for the standard deviation is required. Assume that in this case the population standard deviation is σ = 12 hours.

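From the last topic, 95% of all sample means lie between $\mu - 1.96\frac{\sigma}{\sqrt{n}}$ and $\mu + 1.96\frac{\sigma}{\sqrt{n}}$. The term $\frac{\sigma}{\sqrt{n}}$ (which is the standard deviation of the sample means) is often called the Standard Error (S.E.). In this case it is equal to $\frac{12}{\sqrt{40}} \approx 1.90$.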

So the lower and upper bounds are 150 ± 1.96 × 1.90, i.e. 150 ± 3.72.

So 95% of the sample means lie between 146.28 and 153.72.

All the calculations so far have been based on the population; only now does the sample mean need to be used. Recall that it was calculated as 147.4 hours. This lies within the range in which 95% of sample means are expected to fall, so it has not been possible to disprove the hypothesis that the mean is 150. The 5% is the chance, if the population mean really is 150, that a sample mean would fall outside this range by chance alone; it is not the probability that the population mean differs from 150. This is known as a significance test at the 0.05 (or 5%) level.

There is no evidence to dispute the manufacturer’s claim at the 5% level.
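The arithmetic of the mobile phone example can be reproduced with a short Python sketch using only the standard library. It assumes the figures given in the text (μ = 150, σ = 12, n = 40, sample mean 147.4) and the two-tailed 5% value 1.96; the function name `standard_error` is just an illustrative choice.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample means, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu0, sigma, n = 150.0, 12.0, 40      # hypothesised mean, population s.d., sample size
sample_mean = 147.4

se = standard_error(sigma, n)
lower, upper = mu0 - 1.96 * se, mu0 + 1.96 * se   # 95% of sample means lie in here

print(f"S.E. = {se:.2f}")
print(f"95% of sample means lie between {lower:.2f} and {upper:.2f}")
print("accept H0" if lower <= sample_mean <= upper else "reject H0")
```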

The supposition that the population mean is equal to 150 can be written as

H0: μ = 150

This is called the Null Hypothesis.


To decide whether or not this assertion is true, it is necessary to compare it with an alternative hypothesis (so that one or the other will be true). This is written as

H1: μ ≠ 150

It is usual to then draw a Normal distribution curve and shade in the appropriate significance level (here 5%, with 2.5% in each tail).

The whole calculation can then be expressed more briefly in a diagram as:

Since the sample mean, 147.4, is not in the shaded region, H0 is accepted. There is no evidence at the 0.05 level of significance that the population mean is not 150.

5.1.1 Type 1 and 2 Errors

Since probabilities are used in hypothesis tests, there is always a chance that the conclusion reached is wrong. In the mobile phone example, a significance level of 5% means that, if the population mean really is 150 hours, there is a 5% chance of obtaining a sample mean in the shaded region and so rejecting H0. If H0 is actually true but the sample mean results in it being rejected, a Type 1 Error has been made. Conversely, if the population mean is in fact not 150 hours but the hypothesis test results in accepting H0, a Type 2 Error has occurred. This can be summarised in the table below.


Decision \ State of Nature | H0 is true | H0 is false
Accept H0 | correct decision | Type 2 Error
Reject H0 | Type 1 Error | correct decision
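The link between the significance level and the Type 1 Error can be illustrated by simulation. This minimal sketch assumes H0 is true, taking the population to be Normal with μ = 150 and σ = 12 as in the mobile phone example, repeatedly draws samples of 40 and counts how often a two-tailed 5% test wrongly rejects H0. The number of trials and the random seed are arbitrary choices; the estimated rate should come out close to 0.05.

```python
import math
import random

def type1_error_rate(mu0=150.0, sigma=12.0, n=40, z_crit=1.96,
                     trials=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed z test by simulation."""
    rng = random.Random(seed)            # fixed seed so the result is repeatable
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample of size n from the population that H0 describes
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / se
        if abs(z) > z_crit:              # sample mean lands in a shaded tail
            rejections += 1
    return rejections / trials

rate = type1_error_rate()
print(f"Estimated Type 1 error rate: {rate:.3f}")
```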

5.1.2 One-tailed and two-tailed tests

In the mobile phone example, recall that the diagram of the Normal distribution curve had a 5% area shaded and this was split between both "tails". This will always be the case when the alternative hypothesis has a "not equal to" sign, and the test is called a two-tailed test (for obvious reasons!).

In some hypothesis tests, the alternative hypothesis is given as "μ is less than" or "μ is greater than" some value. In cases like this, only one side of the Normal distribution curve is shaded and, not surprisingly, the test is called a one-tailed test.

The example will now be re-worked as a one-tailed test. A competing mobile phone manufacturer wishes to prove that the time between charges for its rival's phone is less than 150 hours. The hypotheses (plural of hypothesis) now become

H0 : μ ≥ 150

H1 : μ < 150

The Normal distribution curve in this case has only one side (the left-hand tail, with an area of 0.05) shaded.

To calculate the "cut-off" point, this time it is not $\mu \pm 1.96 \times \text{S.E.}$ that is used, but $\mu - 1.64 \times \text{S.E.}$ (from tables, 1.64 cuts off an area of approximately 0.05 in one tail of the standardised Normal distribution, whereas 1.96 cut off 0.025 in each tail).

This means that the lower bound is $150 - 1.64 \times 1.90 \approx 146.88$.


Using the same sample as before: the sample mean, 147.4, is above the cut-off point and so is not in the shaded region, so again the null hypothesis is accepted. There is no evidence at the 0.05 level of significance that the population mean is less than 150.

Notice that for one-tailed tests with "<" in the alternative hypothesis it is the left-hand side of the Normal distribution curve that is shaded, whilst if it is ">" in the alternative hypothesis the right-hand side is shaded.

5.1.3 Different Significance Levels

The significance level of 5% (or 0.05 as a decimal) has been used in the mobile phone example. This is a very common value to use, but it is not the only one that can be employed. It means that there is a 5% chance of making a Type 1 Error, i.e. of rejecting H0 when it is in fact true. If that risk needs to be smaller (in medical matters, say), the level can be reduced to 1% or even 0.1% (or, indeed, any other value). Changing the significance level changes the "cut-off" points. For example, in the mobile phone example, for a two-tailed test and a significance level of 1%, the lower and upper bounds would be calculated as

150 ± 2.58 × S.E., i.e. from 145.10 to 154.90
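The effect of the significance level and of the number of tails on the cut-off points can be sketched with `statistics.NormalDist` from Python's standard library, which takes the place of the table look-up. The figures assume the mobile phone example (μ = 150 and the rounded S.E. of 1.90 used in the text); the function name `cut_offs` is hypothetical.

```python
from statistics import NormalDist

def cut_offs(mu0, se, alpha, tails=2):
    """Return the (lower, upper) cut-off points for H0: mu = mu0.

    tails=2 splits alpha equally between both tails; tails=1 puts it all in
    the lower tail (alternative hypothesis mu < mu0), so the upper bound is None.
    """
    if tails == 2:
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return mu0 - z * se, mu0 + z * se
    z = NormalDist().inv_cdf(1 - alpha)
    return mu0 - z * se, None

print(cut_offs(150, 1.90, 0.05))            # two-tailed, 5%
print(cut_offs(150, 1.90, 0.01))            # two-tailed, 1%
print(cut_offs(150, 1.90, 0.05, tails=1))   # one-tailed (mu < 150), 5%
```

Small differences in the second decimal place compared with the text arise because the text rounds the z values to 1.96, 2.58 and 1.64.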

The lower the significance level, the more difficult it is to prove the alternative hypothesis (which is often what you hope to do). If an alternative hypothesis is proved at the 5% level the result is said to be significant; at the 1% level it is termed highly significant, and at the 0.1% level it is deemed very highly significant.

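To help in calculations at different significance levels, for the general result $\mu \pm z\frac{\sigma}{\sqrt{n}}$, the appropriate z values are given in the table below. Each z value cuts off an area α in the upper tail of the standardised Normal curve, so for a one-tailed test take α equal to the significance level, and for a two-tailed test take α equal to half of it (e.g. α = 0.025 gives 1.96 for a 5% two-tailed test).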


α | z | α | z | α | z
.50 | 0.0000 | .050 | 1.6449 | .030 | 1.8808
.45 | 0.1257 | .048 | 1.6646 | .029 | 1.8957
.40 | 0.2533 | .046 | 1.6849 | .028 | 1.9110
.35 | 0.3853 | .044 | 1.7060 | .027 | 1.9268
.30 | 0.5244 | .042 | 1.7279 | .026 | 1.9431
.25 | 0.6745 | .040 | 1.7507 | .025 | 1.9600
.20 | 0.8416 | .038 | 1.7744 | .024 | 1.9774
.15 | 1.0364 | .036 | 1.7991 | .023 | 1.9954
.10 | 1.2816 | .034 | 1.8250 | .022 | 2.0141
.05 | 1.6449 | .032 | 1.8522 | .021 | 2.0335

α | z | α | z | α | z
.020 | 2.0537 | .010 | 2.3263 | .050 | 1.6449
.019 | 2.0749 | .009 | 2.3656 | .010 | 2.3263
.018 | 2.0969 | .008 | 2.4089 | .001 | 3.0902
.017 | 2.1201 | .007 | 2.4573 | .0001 | 3.7190
.016 | 2.1444 | .006 | 2.5121 | .00001 | 4.2649
.015 | 2.1701 | .005 | 2.5758 | .025 | 1.9600
.014 | 2.1973 | .004 | 2.6521 | .005 | 2.5758
.013 | 2.2262 | .003 | 2.7478 | .0005 | 3.2905
.012 | 2.2571 | .002 | 2.8782 | .00005 | 3.8906
.011 | 2.2904 | .001 | 3.0902 | .000005 | 4.4172

α : area in the upper tail (equal to the significance level for a one-tailed test)
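Each entry in the table is the z value with an upper-tail area α, so the table can be regenerated from the inverse cumulative distribution function of the standard Normal distribution. A minimal sketch using `statistics.NormalDist` (available from Python 3.8); the helper name `z_upper` is an illustrative choice.

```python
from statistics import NormalDist

def z_upper(alpha: float) -> float:
    """z value cutting off an area alpha in the upper tail of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.025, 0.01, 0.005, 0.001):
    print(f"{alpha:<6} {z_upper(alpha):.4f}")
```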

5.2 Single mean - large samples

Hypothesis tests can be carried out on many different types of experimental data, but the method of implementation is always the same. The analysis should always begin by stating the null and alternative hypotheses; an appropriate measure of standard error should then be calculated; and finally the sample value should be plotted on the appropriate distribution curve. Depending on where it lies, the null or the alternative hypothesis will be accepted.

Comparisons with the Normal distribution curve are only valid if the sample size is greater than 30; when this is the case the sample is categorised as large. Small samples will be considered later.

The formula for the Standard Error in problems involving one large sample comes straight from the Central Limit Theorem given earlier:

S.E. = $\frac{\sigma}{\sqrt{n}}$


Examples

1. The time between server failures in an organisation is recorded for a sample of 32 failures, and the mean time calculates as 992 hours. The organisation works on the assumption that the mean time between server failures is 1000 hours, with a standard deviation of 20 hours. Is it justified to use this figure of 1000 hours? Use a significance level of 0.05.

H0 : μ = 1000

H1 : μ ≠ 1000

S.E. = $\frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{32}} = 3.536$

This is a two-tailed test with 2.5% shaded on each side of the Normal distribution, so the cut-off points are given by 1000 ± 1.96 × 3.536, i.e. 993.070 and 1006.930.

This is shown on the diagram below, together with the sample mean of 992.

Since the sample mean is in the shaded area, the null hypothesis is rejected and so the alternative hypothesis is accepted. This means that there is evidence at the 5% level that the population mean is not 1000, so the organisation might like to review its specification for the server which, in fact, appears to be performing worse than assumed (the sample mean time between failures, 992 hours, is below 1000).

It is often the case when performing hypothesis tests that a test statistic is calculated from the sample value and this is compared with the standardised Normal curve. This is exactly the same process as that shown in Chapter 2 for converting Normal distributions into a form that can be compared with the tables.

In this case, the test statistic is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{992 - 1000}{3.536}$

This gives z = -2.26
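The test statistic route for the server example can be sketched as follows. The figures are those in the text; `z_statistic` is an illustrative helper name, and the two-tailed 5% cut-off is taken from `statistics.NormalDist` rather than the table.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic z = (sample mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z = z_statistic(992, 1000, 20, 32)
z_crit = NormalDist().inv_cdf(0.975)      # two-tailed test at the 5% level
print(f"z = {z:.2f}, cut-offs = ±{z_crit:.2f}")
print("reject H0" if abs(z) > z_crit else "accept H0")
```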

This is now compared with the standardised Normal curve


It is clearly seen that the test statistic falls in the shaded area (-2.26 < -1.96), so H1 is accepted as before.

The two previous diagrams show that the two methods are identical but simply involve different scales.

2. It is suspected that in a particular experiment the method used gives an underestimate of the boiling point of a liquid. 50 determinations of the boiling point of water were made in an experiment in which the standard deviation was known to be 0.9 degrees C. The mean value is calculated to be 99.6 degrees C. The correct boiling point of water is 100 degrees C. Use a significance level of 0.01.

Since it would be desirable to prove that the population mean is less than 100, it is sensible to use a one-tailed test with alternative hypothesis μ < 100. So the hypotheses become

H0 : μ ≥ 100

H1 : μ < 100

The standard error is $\frac{\sigma}{\sqrt{n}} = \frac{0.9}{\sqrt{50}} = 0.127$

Test statistic = (99.6 - 100)/0.127 = -3.15

The standardised Normal distribution curve gives a z value of -2.33 for a lower-tail area of 0.01. Thus the diagram, with the test statistic marked in, has the appearance:


Since the test statistic is in the shaded region, H1 is accepted. There is evidence at the 1% level that the population mean is less than 100. It is logical to assume, then, that the method of the experiment is underestimating the boiling point.
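A minimal sketch of the boiling point test, using the figures from the text. Note the brackets: the test statistic is (99.6 - 100) divided by the S.E. The lower-tail cut-off for a one-tailed 1% test comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

sample_mean, mu0, sigma, n = 99.6, 100.0, 0.9, 50
se = sigma / math.sqrt(n)                 # about 0.127
z = (sample_mean - mu0) / se              # (99.6 - 100) / S.E.
z_crit = NormalDist().inv_cdf(0.01)       # lower-tail cut-off, one-tailed 1% test
print(f"S.E. = {se:.3f}, z = {z:.2f}, cut-off = {z_crit:.2f}")
print("reject H0" if z < z_crit else "accept H0")
```

Working with the unrounded S.E. gives z of about -3.14 rather than the -3.15 obtained from the rounded value 0.127; the conclusion is unchanged.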

In the above examples the population standard deviation was known. Often this will not be the case, so as long as the samples are large (n > 30) it is acceptable to estimate this value by using the sample standard deviation (as was done in Topic 3 with the confidence intervals).
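When σ is unknown, the large-sample test replaces it with the sample standard deviation s. The sketch below assumes the data are supplied as a list of numbers; `large_sample_z` is a hypothetical helper name, and the data in the usage line are hypothetical values chosen only to exercise the function.

```python
import math
import statistics

def large_sample_z(data, mu0):
    """z statistic for H0: mu = mu0, estimating sigma by the sample s.d.

    Only appropriate for large samples (n > 30), as described in the text.
    """
    n = len(data)
    if n <= 30:
        raise ValueError("sample too small for the Normal approximation")
    s = statistics.stdev(data)            # sample standard deviation (n - 1 divisor)
    return (statistics.mean(data) - mu0) / (s / math.sqrt(n))

# Hypothetical illustration only: 40 invented observations, testing H0: mu = 9.5
example = [9.0, 11.0] * 20
print(f"z = {large_sample_z(example, 9.5):.2f}")
```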

Hypothesis testing

Q1:

A particular questionnaire is designed so that it can be completed in 2 minutes. Over a number of days a researcher measures the time taken by everyone who fills in the form. The results are given in the table below. Take a random sample and carry out a hypothesis test to check whether the 2-minute expected completion time is valid. Times are given in minutes.

2.44 2.20 1.49 2.39 2.59 2.63 2.20 2.66
2.71 2.48 2.99 1.92 2.59 3.21 1.92 1.73
2.46 2.14 2.95 2.19 1.67 2.31 2.38 1.52
2.53 2.71 2.33 2.12 2.12 2.08 1.73 2.41
1.76 1.89 2.54 2.62 1.86 2.05 1.03 2.49
2.52 2.97 1.37 0.91 2.99 2.87 2.22 2.58
2.60 2.32 2.01 1.25 1.79 1.84 2.03 2.10
2.26 1.46 2.04 1.53 1.78 1.87 1.98 1.72
1.87 2.16 2.21 2.19 2.08 2.24 2.40 1.73
1.80 2.16 1.93 2.29 2.49 1.26 2.34 2.12

2.12 2.16 1.73 1.81 1.83 2.29 1.51
2.23 1.60 1.99 2.80 2.25 1.97 1.91
1.88 1.90 2.19 2.22 2.56 1.50 1.44
1.75 2.04 1.06 1.95 2.35 2.06 2.59
1.81 1.77 1.77 1.50 2.67 2.23 2.58
2.41 2.02 1.66 1.83 2.57 1.95 1.88
2.21 2.19 2.54 2.38 2.32 2.07 2.57
2.03 1.49 1.69 1.97 2.58 2.42 2.19
2.96 1.94 2.26 1.99 2.23 2.39 2.04
2.12 2.29 1.13 2.22 2.12 1.99 2.04

5.3 Single proportion - large samples

It was shown in Topic 3 that sample proportions also follow the Central Limit Theorem. The standard deviation of the proportions (which will now be referred to as the Standard Error) was given by the formula $\sqrt{\frac{\pi(1-\pi)}{n}}$, where π is the population proportion and n is the sample size (again required to be greater than 30). Hypothesis tests can be carried out in much the same way as before.

Examples

1.

A survey of the first beverage that residents of the UK take when they wake up in the morning has shown that 17% have a cup of tea. It is thought that this figure might be higher in the county of Yorkshire, so a random sample of 550 Yorkshire residents is questioned, and 115 of them said they had tea first thing. Using a significance level of 0.05, test the idea that the tea figure is higher in Yorkshire.

The population proportion is thought to be 17% (or 0.17 as a decimal), so this is the figure that must be used in the hypotheses (as in the "mean" case, where it was always the population mean that was mentioned in the null hypothesis). Since the aim is to show that the Yorkshire figure is higher than average, the alternative hypothesis must have the form π > 0.17.

The hypotheses are therefore:

H0: π ≤ 0.17

H1: π > 0.17

Now calculate the Standard Error (S.E.)

In this case, S.E. = $\sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.17 \times 0.83}{550}} = 0.0160$

The test statistic is comparable to the one for means:

$z = \frac{P - \pi}{\text{S.E.}}$, where P is the sample proportion, in this case 115/550 or 0.209

$z = \frac{0.209 - 0.17}{0.0160} \approx 2.44$

The standardised Normal curve can be drawn as before. Here the test is carried out at the stricter 1% level, so the value 2.33 is used as the cut-off point (at the 5% level suggested in the question the cut-off would be 1.64, and the conclusion would be the same).


Since the test statistic for the sample proportion is in the shaded area, there is enough evidence to reject the null hypothesis and accept the alternative one. In other words, a higher proportion of Yorkshire residents have tea first thing in the morning than the national figure (using a significance level of 1%). Note that it is harder to prove a fact using a significance level of 1% than it is for 5%, so this is a highly significant result.
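A minimal sketch of the single-proportion test for the Yorkshire example, checking the test statistic against both the 5% and 1% one-tailed cut-offs. The helper name `one_proportion_z` is illustrative; the data are those in the text.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes: int, n: int, pi0: float) -> float:
    """z = (P - pi0) / sqrt(pi0 (1 - pi0) / n), with P the sample proportion."""
    p = successes / n
    se = math.sqrt(pi0 * (1 - pi0) / n)
    return (p - pi0) / se

z = one_proportion_z(115, 550, 0.17)          # Yorkshire sample, national figure 17%
for alpha in (0.05, 0.01):
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail cut-off, one-tailed test
    verdict = "reject H0" if z > z_crit else "accept H0"
    print(f"alpha = {alpha}: z = {z:.2f}, cut-off = {z_crit:.3f} -> {verdict}")
```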

2.

In an ESP test, a subject has to identify which of five shapes appears on a card. In a test consisting of 100 cards, would you be fairly convinced that a subject does better than just guessing if he gets 30 correct? Test at the 1% and 0.1% significance levels.

If he just guesses, the proportion of times he would get the answer right is 1/5 = 0.2. So the aim is to show that the sample corresponds to a population proportion greater than 0.2. The hypotheses are therefore:

H0: π ≤ 0.2

H1: π > 0.2

Now calculate the Standard Error (S.E.)

In this case, S.E. = $\sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.2 \times 0.8}{100}} = 0.04$

The test statistic is $z = \frac{P - \pi}{\text{S.E.}}$, where P = 30/100 = 0.3

So $z = \frac{0.3 - 0.2}{0.04} = 2.5$

The standardised Normal curve with appropriately shaded regions is shown below.


The test statistic is in the shaded region for the 1% significance test (it exceeds the cut-off of 2.33), so H1 is accepted here. There is evidence at the 1% level that the subject displays powers of ESP.

However, at the 0.1% level of significance the cut-off point is 3.09, so the test statistic is not in the shaded region. Therefore the null hypothesis has to be accepted in this case.

This shows that there is highly significant evidence of the subject displaying ESP, but not a very highly significant result.
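The ESP example at the two significance levels can be sketched in the same way. Guessing among five shapes gives π = 0.2 under H0; the helper name `one_proportion_z` is illustrative, and both cut-offs are upper-tail values from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes: int, n: int, pi0: float) -> float:
    """z = (P - pi0) / sqrt(pi0 (1 - pi0) / n), with P the sample proportion."""
    p = successes / n
    return (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

z = one_proportion_z(30, 100, 0.2)              # 30 correct out of 100 cards
for alpha in (0.01, 0.001):
    z_crit = NormalDist().inv_cdf(1 - alpha)    # one-tailed (upper) cut-off
    verdict = "reject H0" if z > z_crit else "accept H0"
    print(f"alpha = {alpha}: z = {z:.2f}, cut-off = {z_crit:.2f} -> {verdict}")
```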

Note that in both examples nπ and n(1 - π) are greater than 5, a condition required for the Normal approximation given by the Central Limit Theorem to be reliable.

5.4 Difference of two means - large samples

So far in this chapter the hypothesis tests have been used to compare one sample mean or proportion with a known value. However, it is very often the case that comparisons are required between two samples in order to decide which is the better of the two for a certain purpose. For example, if a new piece of software is introduced into an office and workers think that their job is now taking longer on the new system, it would be useful to have a statistical test to check out their claims.

The Central Limit Theorem provides useful information about the distribution of sample means. However, it can be extended to also give information about the distribution of the difference of two sample means. In fact, it can be proved that this difference is approximately Normally distributed with mean μ1 - μ2, which is 0 when the two population means are equal. This is a very useful and interesting result, and it highlights once again why the Normal distribution is so important in statistics! As in the single mean case, the original populations do not have to be Normally distributed as long as each sample size is greater than 30.

The standard deviation of the difference of two sample means, referred to again in this section as the Standard Error (S.E.), is given by the formula:

S.E. = $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

where the subscripts 1 and 2 refer to population 1 and population 2 respectively.

As in previous examples for large samples, if the population standard deviations are unknown it is fine to use the sample standard deviations (usually referred to as s1 and s2).

The hypothesis tests usually start off by assuming that there is no difference between the population means (μ1 - μ2 = 0), and then either retain this assumption or show it to be wrong.

Example

The response times of two hard drives are measured and the results are given in the table below (times are measured in seconds).

| Disk 1 | Disk 2
Sample size | n1 = 35 | n2 = 38
Sample mean | $\bar{x}_1$ = … | $\bar{x}_2$ = …
Standard deviation | s1 = 5 | s2 = 4

Is there a significant difference between response times?

Start off by assuming that there is no difference between the populations that the two samples come from.

H0: μ1 - μ2 = 0

The question does not ask whether one particular disk is faster or slower than the other, so a two-tailed test is a reasonable choice. Therefore the alternative hypothesis is:

H1: μ1 - μ2 ≠ 0

The method of the test follows the same pattern as the previous examples in this chapter. The next step is to calculate the standard error and use it in the test statistic.

S.E. = $\sqrt{\frac{5^2}{35} + \frac{4^2}{38}} = 1.066$

Since it is now the difference of means that is being considered, the test statistic takes the form:

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\text{S.E.}}$

Since it is being assumed in the null hypothesis that μ1 = μ2, the second bracketed term in the numerator is equal to zero.

Thus $z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{1.066}$ = …

Now make a sketch of the standardised Normal distribution curve and choose a significance level of 0.05.
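The sketch below computes the standard error from the summary figures in the table (s1 = 5, n1 = 35, s2 = 4, n2 = 38) and the size of difference in sample means that a two-tailed 5% test would need in order to reject H0. The helper names are hypothetical; `two_mean_z` gives the test statistic for any pair of sample means, and the means used in the check are hypothetical values.

```python
import math
from statistics import NormalDist

def two_mean_se(s1, n1, s2, n2):
    """Standard error of the difference of two sample means (large samples)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def two_mean_z(mean1, mean2, se, diff0=0.0):
    """z = ((mean1 - mean2) - diff0) / S.E., where diff0 is mu1 - mu2 under H0."""
    return ((mean1 - mean2) - diff0) / se

se = two_mean_se(5, 35, 4, 38)
z_crit = NormalDist().inv_cdf(0.975)      # two-tailed test at the 5% level
print(f"S.E. = {se:.3f}")
# H0 is rejected at the 5% level when |mean1 - mean2| exceeds z_crit * S.E.
print(f"reject H0 when |mean1 - mean2| > {z_crit * se:.2f}")
```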


It can be seen that the test statistic is in the shaded region (|z| > 1.96), so the null hypothesis is rejected and the alternative hypothesis accepted. It has therefore been shown that there is a significant difference between the response times.

It would, of course, have been possible to carry out a one-tailed test if required in the example. For instance, to test whether disk 1 has a longer mean response time than disk 2, the hypotheses would change to:

H0: μ1 - μ2 ≤ 0

H1: μ1 - μ2 > 0

5.5 Difference of two proportions - large samples

In the same way as the theory relating to one sample mean was extended to the comparison of two sample means, exactly the same thing can be done for sample proportions.

The Central Limit Theorem provides useful information about the distribution of sample proportions. However, it can be extended to also give information about the distribution of the difference of two sample proportions. In fact, it can be proved that this difference is approximately Normally distributed with mean π1 - π2, which is 0 when the two population proportions are equal. This is a very useful and interesting result, and it highlights once again why the Normal distribution is so important in statistics! As in the single proportion case, the original populations do not have to be Normally distributed as long as each sample size is greater than 30. It is also required that nπ and n(1 - π) are greater than 5 for each population.

The standard deviation of the difference of two sample proportions, referred to again in this section as the Standard Error (S.E.), is given by the formula:

S.E. = $\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$

where the subscripts 1 and 2 refer to population 1 and population 2 respectively.

Usually, however, the population proportions are unknown, and in any case the null hypothesis assumes that they are the same. For these reasons a pooled value of the sample proportions, referred to as $\hat{p}$, is used in the formula instead of π1 and π2. It is found by combining the two samples: $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$, where x1 and x2 are the numbers of "successes" in each sample. Thus the formula for the standard error becomes:

S.E. = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}$

The hypothesis tests usually start off by assuming that there is no difference between the population proportions (π1 - π2 = 0), and then either retain this assumption or show it to be wrong.

Example

It is desired to investigate the proportion of people who attend church regularly in Scotland and in England, so two random samples are taken and the results are given below.

| Scotland | England
Attend regularly | 47 | 31
Do not attend regularly | 136 | 106
Total | 183 | 137

Is there any evidence that more people in Scotland attend church than in England?

This is a problem dealing with two proportions, so the method of solution is to use the formulae for the difference of two proportions.

Since it is desired to prove that the Scottish proportion is higher than the English proportion, a one-tailed test has to be used. If Scotland is referred to with subscript "1" and England with subscript "2", the alternative hypothesis will have to be of the form

H1: π1 - π2 > 0

So the null hypothesis will be

H0: π1 - π2 ≤ 0

The problem is solved using exactly the same procedures as all the previous ones.

1.

Hypotheses

H0: π1 - π2 ≤ 0

H1: π1 - π2 > 0

2.

Calculation of Standard Error

First calculate $\hat{p}$

This is calculated as $\hat{p} = \frac{47 + 31}{183 + 137} = \frac{78}{320} = 0.244$

Now, S.E. = $\sqrt{\frac{0.244(1 - 0.244)}{183} + \frac{0.244(1 - 0.244)}{137}} = 0.0485$

3.

Calculate test statistic

In this case $z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\text{S.E.}}$

In the null hypothesis we have π1 - π2 ≤ 0, so take the extreme case that π1 - π2 = 0.

Now, P1 = 47/183 = 0.257 and P2 = 31/137 = 0.226

This gives $z = \frac{(0.257 - 0.226) - 0}{0.0485} = 0.639$.

4.

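Compare the test statistic with the standardised Normal distribution curve (use a 5% significance level). As this is a one-tailed test, the shaded (rejection) region is the upper 5% tail, to the right of z = 1.645.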

5.

Offer a conclusion.

Since the test statistic is less than 1.645, it is not in the shaded area and the null hypothesis is accepted. There is no evidence, at the 5% level, that a higher proportion of the Scottish population attend church than does the English population.

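Notice that in this example $n_1P_1$, $n_1(1-P_1)$, $n_2P_2$ and $n_2(1-P_2)$ (that is, 47, 136, 31 and 106) are all greater than 5, so the large-sample Normal approximation is reasonable for both samples.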
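A minimal Python sketch of the calculation above, using only the standard library. The function names `pooled_two_proportion_z` and `upper_tail_normal` are hypothetical helpers introduced for illustration, and 1.645 is the usual one-tailed 5% cut-off. Without rounding the intermediate proportions, z comes out at about 0.63, slightly below the hand-calculated value, which does not change the conclusion.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (z, standard error) for H0: pi1 - pi2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # the pooled estimate p-hat
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, se


def upper_tail_normal(z):
    """P(Z > z) for a standard Normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Church attendance: Scotland (sample 1) vs England (sample 2)
z, se = pooled_two_proportion_z(47, 183, 31, 137)
p_value = upper_tail_normal(z)   # one-tailed, because H1 is pi1 - pi2 > 0
reject = z > 1.645               # 5% one-tailed critical value
print(f"SE = {se:.4f}, z = {z:.3f}, one-tailed p = {p_value:.3f}, reject H0: {reject}")
```

Computing the p-value as well as comparing with 1.645 is a design choice: both lead to the same decision, but the p-value shows how far the result is from significance.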


5.6 Small Samples

In the large sample ($n \ge 30$) problems discussed earlier in this Topic it was acceptable to estimate the population standard deviation by using the sample standard deviation. With small samples, where more chance variation must be allowed for, there is more uncertainty in estimating this value and hence also the standard error. Some modification of the procedure of using the test statistic is needed, and the technique to use is the t test. Its foundations were laid by W.S. Gosset [1876-1937], who wrote under the pseudonym "Student", so that it is sometimes known as Student's t test. The procedure does not differ greatly from the one used for large samples, and in the one sample case the test statistic looks very like the one used earlier, namely

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

This t value is no longer compared with the standardised Normal distribution curve. Instead, if the underlying distribution is Normal then this random variable follows a Student t distribution with parameter $\nu = n - 1$.

This is similar to the Normal distribution in the sense that it is a symmetrical "bell-shaped" curve, but it is slightly flatter, with more area in the tails, and hence wider (the total area under it, of course, still equals 1). Note, though, that as n gets larger, the curve becomes indistinguishable from the Normal distribution.

Unlike the Normal distribution, however, where the same values for "cut-off" points were used whatever the sample size (e.g. 1.96 for an area of 0.025), this is not the case in the t distribution. These values change depending on what the sample size is. They can be obtained from statistical tables (or computer packages) and are categorised in terms of a quantity called the degrees of freedom ($\nu$). To grasp the concept of degrees of freedom, imagine you have been asked to select 5 numbers whose mean is 30; the sum of these numbers will therefore be 150. If the first four numbers selected were 25, 26, 29 and 33 there is no choice for the fifth one other than 37. In other words there are only 4 degrees of freedom. In general, if you have n numbers and the mean is specified, then you have n - 1 degrees of freedom.

For example, for the t distribution with 10 degrees of freedom ($\nu = 10$), a shaded area of 2.5% in each tail lies beyond $\pm 2.228$, compared with $\pm 1.96$ for the standardised Normal distribution.


Part of the t tables is shown below; each entry is the value of t with an area α to its right. For example, the entry for an area of 5% and 6 degrees of freedom is 1.943.

α =     0.10    0.05    0.025    0.01     0.005    0.001    0.0005
ν = 1   3.078   6.314   12.706   31.821   63.657   318.31   636.62
ν = 2   1.886   2.920    4.303    6.965    9.925   22.326   31.598
ν = 3   1.638   2.353    3.182    4.541    5.841   10.213   12.924
ν = 4   1.533   2.132    2.776    3.747    4.604    7.173    8.610
ν = 5   1.476   2.015    2.571    3.365    4.032    5.893    6.869
ν = 6   1.440   1.943    2.447    3.143    3.707    5.208    5.959
ν = 7   1.415   1.895    2.365    2.998    3.499    4.785    5.408
ν = 8   1.397   1.860    2.306    2.896    3.355    4.501    5.041
ν = 9   1.383   1.833    2.262    2.821    3.250    4.297    4.781
ν = 10  1.372   1.812    2.228    2.764    3.169    4.144    4.587

Notice that the numbers are all positive, so if the shading is on the left-hand side of the curve, a negative sign is placed in front of the appropriate number.
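The table values can also be reproduced numerically. The sketch below is an illustration only (the helper names `t_upper_tail` and `t_critical` are hypothetical, not a library API): it integrates the Student t density with Simpson's rule and uses bisection to find the value of t with a given area α to its right. It is intended for the moderate tail areas used in these examples; in practice a statistics package would be used.

```python
import math


def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0, computed as 0.5 minus the area from 0 to t (Simpson's rule)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        # Student t density with df degrees of freedom
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps is even, as Simpson's rule requires
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Value of t with upper-tail area alpha, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # too much area beyond mid, so the cut-off lies further right
        else:
            hi = mid
    return (lo + hi) / 2


for df in (6, 10):
    print(df, [round(t_critical(a, df), 3) for a in (0.10, 0.05, 0.025)])
```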


The shape of the t-distribution changes as $\nu$ (and hence the sample size) increases.

To summarise, the properties of the t-distribution are:

1. The t-distribution is "bell-shaped" and symmetric

2. The t-distribution is actually a family of curves, each determined by a parameter called the degrees of freedom ($\nu$), with $\nu = n - 1$

3. The total area under a t-curve is 1

4. The mean, median, and mode of the t-distribution are equal to zero

5. As the degrees of freedom increase, the t-distribution approaches the standard Normal z-distribution

5.6.1 Single mean

The method of the t test is best illustrated by an example.

Example

A paint manufacturer claims that on average one litre of paint will cover 14 square metres. A firm buying the paint suspects that this is an exaggeration, so they take a random sample of 12 litres and measure the area covered by each. The data are:

13.6 13.9 13.2 14.5 12.6 12.6 13.2 13.8 13.4 12.4 14.3 13.2

The population standard deviation is unknown so it has to be estimated from the sample. Since the sample size is less than 30, the z test statistic cannot be used, but the t test can be employed (as long as the original data follow a Normal distribution).

It is a straightforward process to show that the sample mean is $\bar{x} = 13.392$ and the sample standard deviation is $s = 0.665$.


Now the hypotheses are set up as before. Since it is suspected that the area of paint coverage is less than 14 square metres, a one-tailed test is used. The alternative hypothesis should therefore be of the form $\mu < 14$.

The hypotheses are summarised as:

H0: $\mu \ge 14$

H1: $\mu < 14$

Now the standard error has to be calculated. In the case of small samples with a single mean the formula is simply $SE = \frac{s}{\sqrt{n}}$. In this case, $SE = \frac{0.665}{\sqrt{12}} = 0.192$. The t statistic is calculated by the formula $t = \frac{\bar{x} - \mu}{SE}$. So

$$t = \frac{13.392 - 14}{0.192} = -3.17$$

Now the t distribution curve is drawn with a 0.05 significance level shaded in the left-hand tail. Note that the "cut-off" value is obtained from tables using 11 degrees of freedom ($\nu = 11$) and reading down the 0.05 column: it is 1.796, so with the shading on the left the cut-off point is $-1.796$.

Since the test statistic is in the shaded region the null hypothesis is rejected and the alternative accepted. There is evidence at the 5% level that the paint coverage is less than 14 square metres and so the manufacturer is exaggerating.
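The paint example can be checked with a short Python sketch. `statistics.stdev` uses the n - 1 divisor, matching the sample standard deviation used above. The cut-off -1.796 (lower 5% tail, 11 degrees of freedom) is taken from full t tables, since the extract shown earlier stops at ν = 10.

```python
import math
import statistics

areas = [13.6, 13.9, 13.2, 14.5, 12.6, 12.6, 13.2, 13.8, 13.4, 12.4, 14.3, 13.2]
mu0 = 14.0  # claimed mean coverage, square metres per litre

n = len(areas)
mean = statistics.mean(areas)
s = statistics.stdev(areas)   # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)
t = (mean - mu0) / se

t_crit = -1.796               # lower-tail 5% cut-off for 11 df, from t tables
print(f"mean = {mean:.3f}, s = {s:.3f}, SE = {se:.3f}, t = {t:.2f}")
print("reject H0" if t < t_crit else "do not reject H0")
```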


5.6.2 Confidence Intervals with Small Samples

In this section a short diversion from hypothesis tests is taken to fill in the gap of estimating a population mean from a small sample. As is the case with large samples, a point estimate of the population mean is given by the sample mean. However, in the small sample case the multiplier used in the confidence interval comes from the t distribution, so it depends on the sample size (through the degrees of freedom $\nu = n - 1$). The formula is given by

$$\bar{x} - t_{\alpha,\nu}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha,\nu}\frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size and $t_{\alpha,\nu}$ is available from tables.

Example

The lengths in cm of a random sample of 7 components taken from the output of a manufacturing process are:

3.1 3.4 3.4 3.3 3.2 3.3 3.0

Give a 95% confidence interval for the population mean.

By calculation, $\bar{x} = 3.243$ and $s = 0.151$. A 95% confidence interval results in a shaded area of 2.5% in each tail of the t curve. From tables then, using the 0.025 column and $\nu = 6$, the value 2.447 is obtained. Substituting the values in the appropriate formula gives:

$$3.243 - 2.447 \times \frac{0.151}{\sqrt{7}} < \mu < 3.243 + 2.447 \times \frac{0.151}{\sqrt{7}}$$

$$3.103 < \mu < 3.383$$

In other words, it can be stated with 95% confidence that the population mean lies between 3.103 and 3.383.
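A sketch of the same interval in Python. The value 2.447 is read from the t table (0.025 column, ν = 6) rather than computed.

```python
import math
import statistics

lengths = [3.1, 3.4, 3.4, 3.3, 3.2, 3.3, 3.0]  # cm
t_value = 2.447                                 # t tables: area 0.025 in each tail, 6 df

n = len(lengths)
mean = statistics.mean(lengths)
s = statistics.stdev(lengths)                   # sample standard deviation (divisor n - 1)
margin = t_value * s / math.sqrt(n)
lower, upper = mean - margin, mean + margin
print(f"{lower:.3f} < mu < {upper:.3f}")
```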

5.6.3 Difference of 2 Means from Small Samples

The difference between two means of small samples has a sampling distribution with mean $\mu_1 - \mu_2$ and a standard error given by the rather complicated looking formula below; when standardised, it follows a t distribution (assuming both populations are Normal with equal variances):

$$SE = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

with $\hat{\sigma}$ being estimated from the two sample standard deviations as:

$$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

The subscripts 1 and 2 refer to sample and population 1 and 2 respectively.

The format of the hypothesis test follows exactly that of the large sample case, but clearly uses a different formula for the standard error, and the comparisons are made with t distribution curves rather than the standardised Normal.

The degrees of freedom ($\nu$) for problems of this type are calculated as $n_1 + n_2 - 2$ (one degree of freedom is lost for each of the two sample means).
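The two formulas above can be combined into a single helper. This is a sketch under the usual assumptions for this test (both populations Normal with equal variances); `pooled_t` is a hypothetical name, and it returns the standard pooled-variance degrees of freedom, n1 + n2 - 2, alongside t. The samples in the usage line are made-up numbers chosen so the result can be checked by hand, not data from the survey below.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0, and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    # pooled estimate sigma-hat of the common population standard deviation
    sigma_hat = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sigma_hat * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, n1 + n2 - 2


# Made-up samples: means 2 and 5, both standard deviations 1, so t = -3 / sqrt(2/3)
t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(f"t = {t:.3f} on {df} degrees of freedom")
```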


Example

A survey was carried out to investigate the number of hours worked per week by people in various countries and two specific countries were highlighted, Japan and Russia. It had always been believed previously that Russians worked the highest number of hours per week, but the data do not seem to support this. The results are given below.

Russians (hours worked) Japanese (hours worked)÷ùøûúýüNþ�ÿ��� ø úÓü ÿ����� ø ú�� ü

÷ ú�� �Yÿ��� ú���ÿ�þ��� ú����

Is there a significant difference between the number of hours worked per week by Russians and the Japanese? Test at the 5% level.

1.

Set up the hypotheses:

H0: $\mu_1 - \mu_2 = 0$

H1: $\mu_1 - \mu_2 \ne 0$

(Subscripts 1 refer to Russia and 2 Japan).

2.

Calculate the standard error:

$$SE = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{with} \quad \hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Substituting the sample sizes and standard deviations from the table above gives $\hat{\sigma}$. Therefore,

� ÿ��Eÿ?ú<���� ø� �=� ø� ! úÓü ÿ�:Yü;� � øø41 � øø.9 ú>�Yÿ��+:Yü3.

Use the standard error in the test statistic:

In the case of difference of two means for small samples this is given by $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE}$, and by the null hypothesis, $\mu_1 - \mu_2 = 0$.

So,? ú 1B762 - # 96ø$2 Fø$2 9$GH1 ú>IJ�Yÿ��!þ�K

4.

Compare with the t distribution curve with $\nu = n_1 + n_2 - 2$ degrees of freedom, with 2.5% shaded in each tail for this two-tailed 5% test:


5.

Make a conclusion:

Since the test statistic is not in the shaded region the null hypothesis is accepted. There is no evidence of a significant difference, at the 5% level, between the number of hours worked per week by Japanese and Russian people. To summarise, the data do not show that Russians still work the longest hours per week (as had previously been thought); equally, although the Japanese figures initially seemed higher, the difference between the two samples is not statistically significant.

Play length

Q2:

A music producer is interested in investigating whether there is a difference in the average play length of Country and Pop CD singles. Random samples are taken from each category and the results are shown here.


Country (duration in minutes) Pop (duration in minutes)

3.80

3.30

3.43

3.30

3.03

4.18

3.18

3.83

3.22

3.38

3.88

4.13

4.11

3.98

3.98

3.93

3.92

3.98

4.67

Assuming that the duration times of both types of music come from Normal distributions, carry out a hypothesis test to investigate for a significant difference in duration.

5.6.4 Paired t test

In the last section, hypotheses were tested about the difference in two population means when the samples were independent. A method is presented here to analyse situations when this is not the case: for example, if some quantity was measured before and after a specific treatment, clearly one set of results would depend on the other.

The test statistic in situations like this is given by a much simpler formula than that of Section 5.6.3, namely

$$t = \frac{\bar{d} - D}{s_d/\sqrt{n}}, \quad \text{with } \nu = n - 1$$

Note that:

n = number of pairs (so both samples must be the same size)

d = sample difference within each pair

D = mean population difference

$s_d$ = standard deviation of the sample differences

$\bar{d}$ = mean sample difference.

Example

Five keyboard operators were asked to perform the same task on two types of machine. Test if there is any significant difference in the time taken to do the task. Test at the 5% level. Times are in minutes.


Operator   Machine A   Machine B
1             9.6         7.2
2             8.4         7.1
3             7.7         6.8
4            10.1         9.2
5             8.3         7.1

The null hypothesis assumes there is no difference in the population means, so D = 0. Therefore:

H0: D = 0

H1: $D \ne 0$

Now calculate the differences, d.

These are 2.4, 1.3, 0.9, 0.9 and 1.2 (note that they are all positive here, but it would be perfectly feasible to have both negative and positive results).

Now the mean and standard deviation of d are calculated in the usual way: $\bar{d} = 1.34$ and $s_d = 0.619$. The test statistic is given by

$$t = \frac{\bar{d} - D}{s_d/\sqrt{n}} = \frac{1.34 - 0}{0.619/\sqrt{5}} = 4.84$$

The t distribution curve with $\nu = 5 - 1 = 4$ and a significance level of 0.05 (0.025 each side) has its shaded regions beyond the cut-off points $\pm 2.776$.

Since the test statistic is in the shaded region the null hypothesis is rejected and the alternative accepted. There is evidence at the 5% level of a difference in times taken to perform the task on the two machines. Note that if it was desired to show that machine B takes longer to do a task, a one-tailed test can be employed in the usual way. Since the differences are taken as d = (Machine A time) - (Machine B time), the hypotheses would then become:

H0: $D \ge 0$

H1: $D < 0$
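A sketch of the two-tailed paired calculation for the keyboard data. The differences are taken as Machine A minus Machine B, as in the worked example, and 2.776 is the two-tailed 5% cut-off (0.025 in each tail) for 4 degrees of freedom from the t table.

```python
import math
import statistics

machine_a = [9.6, 8.4, 7.7, 10.1, 8.3]  # minutes, operators 1-5
machine_b = [7.2, 7.1, 6.8, 9.2, 7.1]

d = [a - b for a, b in zip(machine_a, machine_b)]  # paired differences A - B
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)                          # standard deviation of the differences
t = (d_bar - 0) / (s_d / math.sqrt(n))             # H0: D = 0

t_crit = 2.776                                     # t tables: 0.025 per tail, 4 df
print(f"d-bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.2f}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

Pairing matters here: each operator's two times are reduced to one difference, so operator-to-operator variation does not inflate the standard error.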

5.7 The Chi-Squared Distribution

So far the distributions discussed in the examples have all had graphs with a very similar shape. Apart from differences in their spread (and hence in their cut-off points), both the standardised Normal and the t distributions have bell-shaped, symmetric curves.

However, do not be misled into thinking that every statistical distribution looks like this. The first non-parametric test now considered deals with comparison with the chi-squared distribution, which takes only non-negative values and has a graph that is skewed to the right. (Chi is pronounced "kye" and is the Greek letter χ.)

The chi-squared distribution, like the t distribution, depends on the degrees of freedom and so is actually a family of curves. As the degrees of freedom increase, the curve becomes less skewed and its peak moves to the right.


Note that the curve is NOT symmetrical.

There are two main uses of the chi-squared distribution. The first is to test whether there is a significant association between two variables (like hair colour and a person's sex) and the second is what is called a "goodness of fit" test: a check as to whether observed data follow a particular expected distribution.

5.7.1 Checking for Association - Hair and Eye Colour

The contingency table below was obtained from an experiment designed to examine whether there is a relationship between hair and eye colour in humans.

         Blond   Brown   Black   Red   Total
Blue       60      40      60     40     200
Grey       20      50      20     10     100
Hazel      10      50      10     30     100
Brown      10     160      10     20     200
Total     100     300     100    100     600

The first thing to do when analysing problems of this type is none other than the old familiar process of setting up hypotheses. In testing for association there is only one possibility for what they should be, so there is no need to worry about whether a one- or two-tailed test is required. The general form of the hypothesis test is:

H0: The two criteria of classification are independent

H1: The two criteria of classification are not independent

In this particular case, then, the hypotheses will be

H0: There is no relationship between hair and eye colour

H1: There is a relationship between hair and eye colour

There is no standard error to calculate in this kind of test, but it is still necessary to calculate a test statistic. In examples checking for association, this test statistic will follow the chi-squared distribution with an appropriate number of degrees of freedom. In order to calculate its value, the contingency table has to be redrawn with expected values in each cell.

These expected values are calculated by assuming that both of the classifications are independent, so that probabilities can be multiplied using the equation

p(A and B) = p(A) × p(B)

There are 16 numbers to be calculated here, so only two will be carried out in full.

• Blue eyes/blond hair

$p(\text{blue}) = \frac{200}{600} = \frac{1}{3}$

$p(\text{blond}) = \frac{100}{600} = \frac{1}{6}$

$p(\text{blue and blond}) = \frac{1}{3} \times \frac{1}{6} = \frac{1}{18}$

Out of 600 people, then, it would be expected that 1/18 of them would have blue eyes and blond hair. This calculates as 33.3 to one decimal place (expected values are often not whole numbers and should be given to an appropriate degree of accuracy in problems).

• Grey eyes/brown hair

$p(\text{grey}) = \frac{100}{600} = \frac{1}{6}$

$p(\text{brown}) = \frac{300}{600} = \frac{1}{2}$

$p(\text{grey and brown}) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}$

Out of 600 people, then, it would be expected that 1/12 of them would have grey eyes and brown hair. This calculates as 50.

The contingency table can be redrawn now to show expected values.

         Blond   Brown   Black   Red    Total
Blue      33.3    100     33.3   33.4     200
Grey      16.7     50     16.7   16.6     100
Hazel     16.7     50     16.7   16.6     100
Brown     33.3    100     33.3   33.4     200
Total      100    300      100    100     600

Notice that the results in the "red" column were all rounded so that the "total" column was the same for both the expected results and the original observed values. This should always be done (and, in fact, saves some calculations of probabilities, since all that is then required at that last stage is a subtraction).

Now the test statistic needs to be defined, as it is this that follows the chi-squared distribution. The easiest way to do this is to let O represent each original "observed" value and E represent each "expected" value in turn and then calculate:

$$\text{Test statistic} = \sum \frac{(O - E)^2}{E}$$


This is often referred to as $\chi^2_{calc}$. The degrees of freedom, $\nu$, for contingency table problems is calculated by (number of rows - 1) × (number of columns - 1). Note that the "total" row and column are not counted.

So in this case, $\nu = (4 - 1) \times (4 - 1) = 9$

Also in this problem then, the test statistic is given by

$$\chi^2_{calc} = \frac{(60 - 33.3)^2}{33.3} + \frac{(40 - 100)^2}{100} + \cdots + \frac{(20 - 33.4)^2}{33.4} = 174.2$$

(exactly 174 if the unrounded expected values are used).

Just as there are statistical tables for the standardised Normal and t distributions, so there are tables for the chi-squared distribution. Part of a set of tables is shown here; each entry is the value with an area α to its right. Since the curve is not symmetrical, separate "cut-off" points need to be given for the left and right hand sides.

α =      .99       .975      .95      .90     .50     .20      .10
ν = 1    .000157   .000982   .00393   .0158   .455    1.642    2.706
ν = 2    .0201     .0506     .103     .211    1.386   3.219    4.605
ν = 3    .115      .216      .352     .584    2.366   4.642    6.251
ν = 4    .297      .484      .711     1.064   3.357   5.989    7.779
ν = 5    .554      .831      1.145    1.610   4.351   7.289    9.236
ν = 6    .872      1.237     1.635    2.204   5.348   8.558    10.645
ν = 7    1.239     1.690     2.167    2.833   6.346   9.803    12.017
ν = 8    1.646     2.180     2.733    3.490   7.344   11.030   13.362
ν = 9    2.088     2.700     3.325    4.168   8.343   12.242   14.684
ν = 10   2.558     3.247     3.940    4.865   9.342   13.442   15.987

α =      .05      .025     .02      .01      .005     .001
ν = 1    3.841    5.024    5.412    6.635    7.879    10.827
ν = 2    5.991    7.378    7.824    9.210    10.597   13.815
ν = 3    7.815    9.348    9.837    11.345   12.838   16.268
ν = 4    9.488    11.143   11.668   13.277   14.860   18.465
ν = 5    11.070   12.832   13.388   15.086   16.750   20.517
ν = 6    12.592   14.449   15.033   16.812   18.548   22.457
ν = 7    14.067   16.013   16.622   18.475   20.278   24.322
ν = 8    15.507   17.535   18.168   20.090   21.955   26.125
ν = 9    16.919   19.023   19.679   21.666   23.589   27.877
ν = 10   18.307   20.483   21.161   23.209   25.188   29.588

These tables are taken from Murdoch and Barnes, Statistical Tables.

The tables reveal that for 9 degrees of freedom, the "cut-off" points for 5%, 1% and 0.1%are 16.919, 21.666 and 27.877 respectively.

At a significance level of 0.001, the shaded (rejection) region is the area to the right of 27.877.


Since the test statistic is in the shaded region, the null hypothesis is rejected. There is evidence at the 0.1% level, therefore, that there is an association between hair and eye colour. In other words there is a very highly significant relationship between hair and eye colour.
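The expected counts can be written directly as (row total × column total) / grand total, which is the same as p(row) × p(column) × 600 used above. A Python sketch of the whole calculation follows; it keeps the expected values unrounded, so its total is exactly 174, well beyond the 0.1% cut-off of 27.877 for 9 degrees of freedom.

```python
# Observed counts; columns are Blond, Brown, Black, Red hair
observed = {
    "Blue":  [60, 40, 60, 40],
    "Grey":  [20, 50, 20, 10],
    "Hazel": [10, 50, 10, 30],
    "Brown": [10, 160, 10, 20],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand = sum(row_totals)

chi2 = 0.0
for r, row in enumerate(rows):
    for c, o in enumerate(row):
        e = row_totals[r] * col_totals[c] / grand  # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(rows) - 1) * (len(col_totals) - 1)       # (rows - 1) x (columns - 1)
print(f"chi-squared = {chi2:.2f} on {df} df; reject at 0.1% level: {chi2 > 27.877}")
```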

5.7.2 Limitations of Chi-squared test

Chi squared is a mathematical distribution and has been used so far without any proof given as to why it is useful in measuring whether there is an association between criteria. The mathematical details are not required in this course so are not provided here (although at the end of this topic the distribution will be re-visited and put in a different context which may shed some light on how it comes about). However, account must be taken of some limitations so that it can be used validly for statistical tests.

The first problem occurs if there is only one degree of freedom. This happens more often than you might think, since if the contingency table only has 2 rows and 2 columns, the degrees of freedom will be (2 - 1) × (2 - 1) = 1. In cases like this, Yates' continuity correction must be made. Similar corrections occur in other areas of probability where discrete distributions are approximated by continuous ones. Basically, 0.5 is subtracted from the size of each calculated value of "O - E", ignoring the sign (plus or minus). In other words, an "O - E" value of +5 becomes +4.5, and an "O - E" value of -5 becomes -4.5. That number is then squared and divided by E. In terms of a formula, the test statistic is now given by:

$$\chi^2_{calc} = \sum \frac{(|O - E| - 0.5)^2}{E}$$

The second limitation in the use of the chi-squared distribution is that, again to satisfy the underlying mathematical assumptions, the expected values should be relatively large. The following simple rules are applied:

1. No expected category should be less than 1 (it does not matter what the observedvalues are)

2. AND no more than one-fifth of expected categories should be less than 5.


If data do not meet these criteria then either larger samples have to be taken, or the data for the smaller "expected" categories can be combined until their combined expected value is 5 or more. This should only be done, however, if the combinations are sensible.

Example

The example from Section 5.5, where differences between two sample proportions were considered, will now be re-worked using a chi-squared test instead of the method used previously of calculating a z value and comparing it with the standardised Normal distribution.

The problem examined church attendance in two countries, Scotland and England, and asked if there was a significant difference between the church visiting patterns of the Scots and the English. These were the results:

                          Scotland   England   Total
Attend regularly               47        31       78
Do not attend regularly       136       106      242
Total                         183       137      320

The hypotheses are:

H0: There is no relationship between church attendance and Country

H1: There is a relationship between church attendance and Country

A table of expected values is now calculated in the same way as before, assuming the null hypothesis to be true. These expected values are listed below. (This can be done very quickly by noticing that, in fact, only one probability calculation is required; the others are obtained by subtractions.)

                          Scotland   England   Total
Attend regularly             44.6      33.4       78
Do not attend regularly     138.4     103.6      242
Total                         183       137      320

$$\chi^2_{calc} = \sum \frac{(|O - E| - 0.5)^2}{E} = \frac{(|47 - 44.6| - 0.5)^2}{44.6} + \frac{(|31 - 33.4| - 0.5)^2}{33.4} + \frac{(|136 - 138.4| - 0.5)^2}{138.4} + \frac{(|106 - 103.6| - 0.5)^2}{103.6}$$

$$= 0.081 + 0.108 + 0.026 + 0.035 = 0.25$$

Now compare with a chi-squared curve with one degree of freedom. The "cut-off" pointfor 5% is 3.841.


Since the test statistic is not in the shaded region, the null hypothesis is accepted. There is no evidence, at the 5% level, of a relationship between whether people attend church regularly and whether they live in Scotland or England. This supports the conclusion reached in Section 5.5.
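The same 2 × 2 calculation with Yates' correction, as a Python sketch. It uses unrounded expected values, which gives about 0.248 rather than 0.25. The cut-off 3.841 is the 5% value for one degree of freedom from the tables.

```python
observed = [[47, 31],     # attend regularly: Scotland, England
            [136, 106]]   # do not attend regularly

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

chi2_yates = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        chi2_yates += (abs(o - e) - 0.5) ** 2 / e  # Yates' continuity correction

print(f"Yates chi-squared = {chi2_yates:.3f}; reject at 5%: {chi2_yates > 3.841}")
```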

Note: No worked examples have been given in this section which show the limitations of the test when small expected values are calculated, but the reader should be aware of these limitations and address them appropriately if they are encountered in calculations.

Charter airlines

A consumer association has done some research on customer views on the reliability of charter airlines. The results are tabulated below:

Airline        Good   Average   Poor
High Life        50       40      30
Sky Coaxing      40       50      80
Up and Away      35       55      50

Carry out an appropriate hypothesis test to determine if there is an association between the airline and reliability.

5.7.3 Goodness of Fit Tests

The chi-squared test can also be used in other situations where observed and expected values are being compared. The test statistic will again be:

$$\chi^2_{calc} = \sum \frac{(O - E)^2}{E}$$

The degrees of freedom, $\nu$, will depend on the particular problem, but in general

ν = (number of classes) - (number of parameters estimated) - 1


Examples

1. Unfair die

A gambler suspects the die being used for a game is loaded and producing unfair results. A survey of 120 throws gave the following results:

Throw        1    2    3    4    5    6
Frequency   17   16   19   23   22   23

These are clearly the observed values. The expected values are simply 20 for each throw (120/6) if the die is fair.

The hypothesis test takes the form:

H0: The expected distribution is true (in this case "the die is fair")

H1: The expected distribution is false (in this case "the die is loaded")

The test will be carried out using a significance level of 0.01.

The following table shows how the calculations are carried out.

O     E     (O - E)   (O - E)²   (O - E)²/E
17    20      -3          9         0.45
16    20      -4         16         0.80
19    20      -1          1         0.05
23    20       3          9         0.45
22    20       2          4         0.20
23    20       3          9         0.45
                                Total: 2.40

No parameters have been estimated in this problem, so $\nu = 6 - 1 = 5$.

Tables give a $\chi^2$ value of 15.086 (1% level).

At the 1% level, the shaded (rejection) region is the area to the right of 15.086.


Since the test statistic (2.40) is less than 15.086, it is not in the shaded region and the null hypothesis cannot be rejected. There is no evidence, at the 1% level, that the die is loaded.
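The die calculation in Python. The expected counts assume a fair die (probability 1/6 for each face), and 15.086 is the 1% cut-off for 5 degrees of freedom from the tables.

```python
observed = [17, 16, 19, 23, 22, 23]    # throws showing 1 to 6
expected = [sum(observed) / 6] * 6     # 20 each if the die is fair

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1             # no parameters estimated from the data
critical_1pct = 15.086                 # chi-squared tables, 5 df, 1% level
print(f"chi-squared = {chi2:.2f} on {df} df; reject H0: {chi2 > critical_1pct}")
```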

2. Company feelings

In this example a test will be carried out to check whether a set of data could reasonably have come from a Normal distribution.

An attitude survey is taken by employees to see how they feel about their company. Answers from a questionnaire could potentially produce scores from 0 to 50 and the actual results are shown below:

Class interval     Frequency (f)
10 - under 15           11
15 - under 20           14
20 - under 25           24
25 - under 30           28
30 - under 35           13
35 - under 40           10
                    Σf = 100

Test at the 5% level whether this is a Normal distribution.

Solution

There are two parameters to be estimated here, the mean and standard deviation. Using the usual formulae (and approximating each class interval by its mid-point) these are calculated as

$$\bar{x} = \frac{(12.5 \times 11) + (17.5 \times 14) + (22.5 \times 24) + (27.5 \times 28) + (32.5 \times 13) + (37.5 \times 10)}{100} = 24.9$$

Similarly, the standard deviation, s, is calculated as 7.194.

Now various probabilities must be calculated using the Normal curve and statistical tables. As an example, consider the class 30 - under 35, which corresponds to the area under the Normal curve between 30 and 35.


The area to the right of 35 is given by a z value of $\frac{35 - 24.9}{7.194} = 1.40$

Tables give the area to the right of 1.40 as 0.0808

The area to the right of 30 is given by a z value of $\frac{30 - 24.9}{7.194} = 0.71$

Tables give the area to the right of 0.71 as 0.2389

This means the probability of obtaining a score between 30 and 35 is 0.2389 - 0.0808 =0.1581.

Multiplying this by 100 gives the expected number in this category, namely 15.81.

The other expected values are calculated in the same way and the results are as follows:

Class interval     Expected frequency
Under 10                 1.92
10 - under 15            6.46
15 - under 20           16.45
20 - under 25           25.57
25 - under 30           25.71
30 - under 35           15.81
35 - under 40            6.29
40 and over              1.79

Notice that these add to 100 (that is why the extra categories at the start and end hadto be added).

Now, though, since these "extra" categories give values less than 5 (one of the limitations of the chi-squared test), it makes sense to combine them with the adjacent categories. The revised table is as follows:

Class interval     Expected frequency
Under 15                 8.38
15 - under 20           16.45
20 - under 25           25.57
25 - under 30           25.71
30 - under 35           15.81
35 and over              8.08

Compare this now with the observed values.

All that remains now is to set up the hypotheses and calculate the test statistic (notice that the most time-consuming part of this problem was the mundane calculations!).

H0: The data follow a Normal distribution with mean 24.9 and standard deviation 7.194

H1: The data do not follow a Normal distribution with mean 24.9 and standard deviation 7.194

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(11 - 8.38)^2}{8.38} + \frac{(14 - 16.45)^2}{16.45} + \cdots + \frac{(10 - 8.08)^2}{8.08} = 2.44$$

The degrees of freedom, $\nu$, are given by $\nu$ = (number of classes) - (number of parameters estimated) - 1,


so $\nu$ = 6 - 2 (mean and standard deviation) - 1 = 3.

The critical value from chi-squared tables at the 5% significance level with 3 degrees of freedom is 7.815.

The test statistic (2.44) is less than the critical value (7.815), so it does not lie in the rejection region and the null hypothesis cannot be rejected. At the 5% level there is no evidence that these data depart from a Normal distribution with mean 24.9 and standard deviation 7.194.
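The steps from merging the end classes to comparing with the critical value can be sketched as follows. The observed frequencies 11, 14, 24, 28, 13, 10 are an assumption: they are the values consistent with the mean 24.9, the standard deviation 7.194 and the statistic 2.44 quoted above. The critical value 7.815 is taken from the text rather than computed, and the helper name `chi_squared` is hypothetical.

```python
# Expected frequencies from the first table (8 classes, including the
# open-ended "under 10" and "40 and over" classes)
expected8 = [1.92, 6.46, 16.45, 25.57, 25.71, 15.81, 6.29, 1.79]

# Merge each end class (expected value below 5) into its neighbour
expected = [expected8[0] + expected8[1]] + expected8[2:6] + [expected8[6] + expected8[7]]

# Assumed observed frequencies for 10 - under 15, ..., 35 - under 40
observed = [11, 14, 24, 28, 13, 10]


def chi_squared(obs, exp):
    """Pearson statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


stat = chi_squared(observed, expected)
params_estimated = 2   # mean and standard deviation
dof = len(expected) - params_estimated - 1
critical = 7.815       # 5% point of chi-squared with 3 d.f., from tables

print(f"chi-squared = {stat:.2f}, d.f. = {dof}")  # prints: chi-squared = 2.44, d.f. = 3
print("reject H0" if stat > critical else "cannot reject H0")
```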

5.8 Coursework 1

This is the first of two coursework exercises; the second is at the end of Topic 10.

This work should be submitted to your tutor at a date to be notified. For the exercise it is expected that you will have access to an appropriate computer package (such as Microsoft Excel or Minitab) to help analyse the data. You are not required to perform calculations manually.

Task 1

An insurance company wishes to investigate whether there is a difference between the claims received by its Aberdeen and Dumfries offices. One week of the year is randomly selected and all the claims to each office during that week are recorded. The results are given in Table 5.1 and Table 5.2.


Table 5.1:

Aberdeen claims

339 268 292 280 171 220 283 278 253 256
297 259 412 134 219 268 363 328 160 349
392 222 349 292 323 272 105 1381 371 318
345 332 223 119 307 408 241 334 364 198
342 353 186 1293 292 285 456 277 245 398
335 342 205 374 403 349 59 400 382 196
335 160 350 378 281 135 247 173 476 191
201 447 267 320 246 221 344 198 351 224
284 191 197 270 270

Table 5.2:

Dumfries claims

193 128 174 319 560 451 247 310 190 301
164 445 265 189 1303 420 458 171 420 361
486 400 275 422 255 51 385 1249 208 344
331 445 257 313 370 60 137 383 188 275
319 372 355 307 333 201 273 206 290 394
208 51 230 224 408 300 343 299 418 363
506 374 79 168 234 334 413 365 224 231
371 256 325

a) Summarise each data set.

Obtain mean, median, standard deviation and interquartile range.

Produce relevant graphs that will show any patterns in the data.

b) Use a hypothesis test to investigate whether there is a significant difference between the claims received by the two offices.

Set out your hypotheses clearly and show all your working.

c) Produce 95% confidence intervals that estimate the average amount of all the claims received by each office.
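Task 1 is intended to be done in a package such as Excel or Minitab. For readers using Python instead, here is a minimal sketch of the calculations that parts a) to c) need. The helper names are hypothetical. The z statistic is the large-sample difference-of-means test from section 5.4, and the interval uses $\bar{x} \pm 1.96\,s/\sqrt{n}$. Packages differ in how they define quartiles, so the IQR from `statistics.quantiles` (default method "exclusive") may differ slightly from Excel's.

```python
import math
import statistics


def summarise(xs):
    """Mean, median, sample standard deviation and interquartile range."""
    # statistics.quantiles uses the 'exclusive' method by default
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "sd": statistics.stdev(xs),  # divisor n - 1
        "iqr": q3 - q1,
    }


def two_means_z(a, b):
    """Large-sample z statistic for H0: mu_a - mu_b = 0."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def ci95(xs):
    """Large-sample 95% confidence interval for the population mean."""
    m = statistics.mean(xs)
    half_width = 1.96 * statistics.stdev(xs) / math.sqrt(len(xs))
    return m - half_width, m + half_width
```

For part b), the two tables would be passed as two lists. For a two-tailed test at the 5% level, compare |z| with 1.96.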

Task 2

A warehouse ships out 500 cartons of strawberries one day, each of which contains 20 strawberries. It is desired to analyse the distribution of rotten strawberries. Tests are carried out and the results are shown in the frequency table, Table 5.3.


Table 5.3:

Number of rotten strawberries    Observed count
0                                  3
1                                 10
2                                 34
3                                 63
4                                100
5                                100
6                                 82
7                                 56
8                                 34
9                                 13
10                                 0
11                                 2
12                                 2
more than 12                       1

Perform an appropriate test, showing all the details, to check whether or not the distribution of rotten strawberries can be represented by a Binomial distribution with n = 20. (You will need to calculate p.)
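For Task 2, the expected counts under a Binomial distribution with n = 20 could be produced with a sketch like the one below. It requires Python 3.8+ for `math.comb`, and the function names are hypothetical. Here p is estimated as the mean number of rotten strawberries per carton divided by 20. You must choose, and state, how to treat the single carton in the "more than 12" class when computing that mean; counting it as 13 is one possible assumption. After merging classes with expected counts below 5, the degrees of freedom follow the formula used in section 5.7, with one estimated parameter (p).

```python
import math


def binomial_expected(n, p, total):
    """Expected counts for 0, 1, ..., n successes under Binomial(n, p)."""
    return [total * math.comb(n, k) * p ** k * (1 - p) ** (n - k)
            for k in range(n + 1)]


def estimate_p(counts, n):
    """Estimate p as (mean number of successes per carton) / n.

    counts[k] is the number of cartons with k rotten strawberries.
    """
    total = sum(counts)
    mean = sum(k * c for k, c in enumerate(counts)) / total
    return mean / n
```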

5.9 Summary and assessment

At this stage you should be able to:

• identify situations in experimentation where a hypothesis test will produce a useful result
• appreciate the ideas of null and alternative hypotheses
• use the standardised Normal distribution in hypothesis tests involving large samples
• use Student's t distribution in hypothesis tests involving small samples
• explain Type 1 and Type 2 Errors
• use the formulae for standard error and test statistic in the cases of
  a) single mean - large samples
  b) single proportion - large samples
  c) difference between two means - large samples
  d) difference between two proportions - large samples
  e) single mean - small samples
  f) difference between two means - small samples
• decide when to use a one-tailed or two-tailed test
• appreciate the concept of degrees of freedom
• calculate a confidence interval for a population mean based on the mean of a small sample
• use a paired t test


Answers to questions and activities

5 Hypothesis Tests

Hypothesis testing (page 9)

Q1:

Using a systematic sampling technique of taking every fifth number, a sample is obtained.

2.59 2.16 1.51
2.59 1.60 1.91
1.67 1.90 1.44
2.12 2.04 2.59
1.86 1.77 2.58
2.99 2.02 1.88
1.79 2.19 2.57
1.78 1.49 2.19
2.08 1.94 2.04
2.49 2.29 2.04

Now use the statistical functions on a calculator, or Excel, Minitab or another statistical package, to find the mean and standard deviation of the sample: $\bar{x} = 2.070$ and $s = 0.384$. From the sample results it looks as if the time taken may be more than 2 minutes, so set up a hypothesis test in the form:

H0: $\mu = 2.000$

H1: $\mu > 2.000$

This is a large sample, so the sample standard deviation can be used as an estimate of the population value. The standard error is then $\text{s.e.} = \frac{s}{\sqrt{n}} = \frac{0.384}{\sqrt{30}} = 0.0701$. The test statistic is $z = \frac{\bar{x} - \mu}{\text{s.e.}} = \frac{2.070 - 2.000}{0.0701} = 1.00$. Compare this with the standardised Normal distribution, using a one-tailed test at the 5% significance level (critical value 1.645). The test statistic is not in the rejection region.


Therefore the null hypothesis cannot be rejected. At the 5% significance level there is no evidence against the claim that the questionnaire takes 2 minutes to complete.

Note: Your numbers will be different if you took a different sample.
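The calculation for Q1 can be checked with Python's `statistics` module. In this minimal sketch, the 30 times are those of the systematic sample listed above, and 1.645 is the usual one-tailed 5% point of the standardised Normal distribution.

```python
import math
import statistics

# Systematic sample of 30 completion times (minutes), from the table above
times = [2.59, 2.16, 1.51, 2.59, 1.60, 1.91, 1.67, 1.90, 1.44, 2.12,
         2.04, 2.59, 1.86, 1.77, 2.58, 2.99, 2.02, 1.88, 1.79, 2.19,
         2.57, 1.78, 1.49, 2.19, 2.08, 1.94, 2.04, 2.49, 2.29, 2.04]

mu0 = 2.0                     # H0: mu = 2.000
xbar = statistics.mean(times)
s = statistics.stdev(times)   # sample s.d. (divisor n - 1) estimates sigma
se = s / math.sqrt(len(times))
z = (xbar - mu0) / se

# One-tailed test (H1: mu > 2.000) at the 5% level
print(f"mean = {xbar:.4f}, s = {s:.4f}, z = {z:.2f}")  # prints: mean = 2.0703, s = 0.3837, z = 1.00
print("reject H0" if z > 1.645 else "cannot reject H0")
```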

Play length (page 23)

Q2:

1. Set up the hypotheses:

H0: $\mu_1 - \mu_2 = 0$

H1: $\mu_1 - \mu_2 \neq 0$

(Subscript 1 refers to Country and 2 to Pop.)

2. Calculate the sample means and sample standard deviations: $\bar{x}_1$ and $s_1$ for the $n_1$ Country singles, and $\bar{x}_2$ and $s_2$ for the $n_2$ Pop singles.

3. Calculate the standard error:

$$\text{s.e.} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad \text{with} \quad \hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

where $\hat{\sigma}$ is the pooled estimate of the common population standard deviation.


4. Use the standard error in the test statistic. For the difference of two means from small samples this is given by

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\text{s.e.}}$$

and by the null hypothesis, $\mu_1 - \mu_2 = 0$, so $t = \frac{\bar{x}_1 - \bar{x}_2}{\text{s.e.}}$.

5. Compare with the t distribution with 17 degrees of freedom (recall that the degrees of freedom are calculated as $n_1 + n_2 - 2$). Use a significance level of 0.05 (0.025 in each tail). The critical values from t tables are then ±2.110.

6. Make a conclusion: the test statistic lies in the rejection region (|t| > 2.110), so reject the null hypothesis and accept the alternative one. There is evidence at the 5% level that the durations of Country and Pop CD singles are different.
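The sample durations for this question are not reproduced in this section. The sketch below is therefore a generic helper implementing steps 3 and 4 with the pooled standard deviation; the names `pooled_t` and `reject_h0` are hypothetical. The decision uses 2.110, the 5% two-tailed critical value for 17 degrees of freedom from t tables.

```python
import math
import statistics


def pooled_t(a, b):
    """t statistic for H0: mu1 - mu2 = 0 using a pooled standard deviation.

    Returns (t, degrees of freedom), where degrees of freedom = n1 + n2 - 2.
    """
    n1, n2 = len(a), len(b)
    # Pooled estimate of the common population standard deviation
    sigma_hat = math.sqrt(((n1 - 1) * statistics.variance(a)
                           + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2))
    se = sigma_hat * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2


def reject_h0(t, critical=2.110):
    """Two-tailed decision; 2.110 is the 5% value for 17 d.f. from t tables."""
    return abs(t) > critical
```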

Charter airlines (page 32)

Step 1: Add totals to the table and calculate probabilities

Airline        Good   Average   Poor   Total
High Life        50        40     30     120
Sky Coaxing      40        50     80     170
Up and Away      35        55     50     140
Total           125       145    160     430


p(H) = 120/430 = 0.279; p(S) = 0.395; p(U) = 0.326;

p(G) = 125/430 = 0.291; p(A) = 0.337; p(P) = 0.372.

Step 2: Hypothesis Test and Expected Values (EV)

H0: There is no relationship between airline and reliability

H1: There is a relationship between airline and reliability

p(H and G) = 0.279 × 0.291 = 0.0812

EV(H and G) = 0.0812 × 430 = 34.9

EV(H and A) = 0.279 × 0.337 × 430 = 40.4

By subtraction, EV(H and P) = 120 - 40.4 - 34.9 = 44.7

EV(S and G) = 0.395 × 0.291 × 430 = 49.3

EV(S and A) = 0.395 × 0.337 × 430 = 57.2

EV(S and P) = 170 - 49.3 - 57.2 = 63.5

The last row can be found by subtraction, so that each column adds up to its total.

Expected Values

Airline        Good   Average   Poor   Total
High Life      34.9      40.4   44.7     120
Sky Coaxing    49.3      57.2   63.5     170
Up and Away    40.8      47.4   51.8     140
Total           125       145    160     430

Step 3: Calculate Test Statistic

Test statistic = $\chi^2 = \sum \frac{(O - E)^2}{E}$

The degrees of freedom, $\nu$, for contingency table problems are calculated as (number of rows - 1) × (number of columns - 1).

So in this case, $\nu = (3 - 1) \times (3 - 1) = 4$.

In this problem, then, the test statistic is given by:

$$\chi^2 = \frac{(50 - 34.9)^2}{34.9} + \frac{(40 - 40.4)^2}{40.4} + \cdots + \frac{(50 - 51.8)^2}{51.8} = 20.425$$

Step 4: Compare with chi-squared tables.

With 4 degrees of freedom, the "cut-off" point for a 5% significance test is 9.488. The test statistic (20.425) is greater than this, so it lies in the rejection region. There is evidence at the 0.05 level that there is a relationship between airline and reliability.
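Here is a sketch of the whole test of association, assuming Python 3.7+ (where dictionaries keep insertion order); the helper names are hypothetical. It computes each expected value directly as row total × column total / grand total. This is equivalent to the probability products above but avoids rounding. The statistic then comes out at about 20.69 rather than 20.425. The conclusion is the same, because both values exceed 9.488, the 5% point of chi-squared with 4 degrees of freedom taken from tables.

```python
# Observed ratings: rows = airlines, columns = Good, Average, Poor
observed = {
    "High Life":   [50, 40, 30],
    "Sky Coaxing": [40, 50, 80],
    "Up and Away": [35, 55, 50],
}


def expected_counts(table):
    """E = (row total x column total) / grand total, for each cell."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(col_totals)
    return [[sum(row) * c / grand for c in col_totals] for row in rows]


def chi_squared_association(table):
    """Return (statistic, degrees of freedom) for a test of association."""
    rows = list(table.values())
    exp = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(rows, exp)
               for o, e in zip(obs_row, exp_row))
    dof = (len(rows) - 1) * (len(rows[0]) - 1)
    return stat, dof


stat, dof = chi_squared_association(observed)
critical = 9.488  # 5% point of chi-squared with 4 d.f., from tables
print(f"chi-squared = {stat:.2f}, d.f. = {dof}")  # prints: chi-squared = 20.69, d.f. = 4
print("evidence of association" if stat > critical else "no evidence of association")
```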
