
Upload: alondra-gabriel

Post on 01-Apr-2015



Faculty of Social Sciences Induction Block: Maths & Statistics Lecture 6:

Sample size, SPSS and Hypothesis Testing

Dr Gwilym Pryce


Plan

1. Summary of L5
2. Statistical Significance
3. Type I and Type II errors
4. Four steps of Hypothesis Testing
5. Overview of the Course


1. Summary of L5:

Social research is usually based on samples.

We usually want to use our sample to say something about the population
– i.e. we want to be able to generalise.

How precisely we can estimate the population mean or proportion depends on our sample size and the variation within the sample.

Using the CLT, statistical inference offers a systematic way of establishing:
– the range of values in which the population mean or proportion is likely to lie (‘a confidence interval’);
– whether a hypothesis about a mean or a proportion is likely to hold in the population.


2. Statistical Significance

“Significance” does not refer to “importance”
– but to “real differences in fact” between our observed sample mean and our assumption about the population mean.

P = significance level = the chance of a sample mean at least as extreme as the one we observed occurring, given that our assumption about the population (denoted by “H0”) is true.
– So if we find that this probability is small, it might lead us to question our assumption about the population mean.


I.e. if our sample mean is a long way from our assumed population mean then:
– either it is a freak sample
– or our assumption about the population mean is wrong.

If we conclude that it is our assumption that is wrong and reject H0, then we have to bear in mind that there is a chance that H0 was in fact true.
– I.e. if H0 is true and we reject it whenever P ≤ 0.05, then on average we will wrongly reject it in one test out of every twenty.
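The "one in twenty" idea can be checked by simulation. The sketch below is illustrative only: it assumes a normal population in which H0 really is true (mean 30, standard deviation 20, treated as known for simplicity), draws many samples, and counts how often a two-tailed z test at the 5% level rejects H0 anyway. The function name, sample size, number of trials and random seed are all arbitrary choices, not part of the lecture.

```python
import random
from statistics import NormalDist


def type_i_error_rate(trials=2000, n=50, mu0=30.0, sigma=20.0, alpha=0.05, seed=1):
    """Proportion of samples that wrongly reject a TRUE H0 (two-tailed z test)."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    std_normal = NormalDist()      # standard normal: mean 0, sd 1
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mu = mu0) holds.
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        p = 2 * (1 - std_normal.cdf(abs(z)))   # two-tailed P value
        if p <= alpha:
            rejections += 1                    # a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Proportion of true H0 rejected: {rate:.3f}")
```

The printed proportion should land close to 0.05, i.e. roughly one wrong rejection in every twenty tests.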


Obviously, as the sample mean moves further away from our assumption (H0) about the population mean, we have stronger evidence that H0 is false.

If P is very small, say 0.001, then there is only 1 chance in a thousand of a sample mean at least as extreme as ours occurring if H0 is true.
– This also means that if H0 is true, a rule of rejecting it only when P ≤ 0.001 would lead us into a mistake (i.e. a “Type I error”) only one time in a thousand.


There is a tradition (initiated by the English scientist R. A. Fisher, 1890-1962) of rejecting H0 if the probability of incorrectly rejecting it is no more than 0.05.
– If P ≤ 0.05 then we say that H0 can be rejected at the 5% significance level.
– If P > 0.05, then, argued Fisher, the chances of incorrectly rejecting H0 are too high to allow us to do so.

Sig level = P = the probability of a sample mean at least as extreme as our observed value occurring, given our assumption about the population mean.


3. Type I and Type II errors:

P = significance level = the chance of incorrectly rejecting H0 when it is in fact true.
– This is called a “Type I error”.

If we accept H0 when in fact the alternative hypothesis is true
– this is called a “Type II error”.

On this course we shall be concerned only with Type I errors.


4. The four steps of hypothesis testing

Last lecture we looked at confidence intervals:
– establish the range of values of the population mean for a given level of confidence
  • e.g. we are 90% confident that the population mean age of HoHs in repossessed dwellings in the Great Depression lay between 32.17 and 36.83 years (s = 20).
  • Based on a sample of 200 with mean = 34.5 yrs.
– But what if we want to use our sample to test a specific hypothesis we may have about the population mean?
  • E.g. does μ = 30 years?
    – If μ does = 30 years, then how likely are we to select a sample with a mean as extreme as 34.5 years?
      » I.e. 4.5 years more or 4.5 years less than the pop mean?
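The 90% interval quoted above can be reproduced from the summary figures alone (mean 34.5, s = 20, n = 200). This is a minimal sketch, assuming the large-sample z interval from the previous lecture; the function name `large_sample_ci` is made up for illustration and is not one of the course macros.

```python
from statistics import NormalDist


def large_sample_ci(xbar, s, n, confidence=0.90):
    """Large-sample z confidence interval for one mean, from summary statistics."""
    tail_area = (1 - confidence) / 2                 # 0.05 in each tail for 90%
    z_star = NormalDist().inv_cdf(1 - tail_area)     # about 1.645
    margin = z_star * s / n ** 0.5                   # z* times the standard error
    return xbar - margin, xbar + margin


lower, upper = large_sample_ci(xbar=34.5, s=20, n=200)
print(f"90% CI: {lower:.2f} to {upper:.2f}")  # 90% CI: 32.17 to 36.83
```

Note that the hypothesised value of 30 lies outside this interval, which already hints at the result of the test that follows.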


One tailed test: P = how likely are we to select a sample with mean age at least as great as 34.5?


Finding the value of P:

Because all sampling distributions for the mean (assuming large n) are normal, we can convert points on them to the standard normal curve
– e.g. for 34.5: $z = (34.5 - 30)/(20/\sqrt{200}) = 4.5/1.41 \approx 3.18$.
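The conversion to a z score, and the upper-tail probability that goes with it, can be sketched as follows. The variable names are illustrative; the numbers are the ones used in the lecture example (sample mean 34.5, hypothesised mean 30, s = 20, n = 200).

```python
from statistics import NormalDist

xbar, mu0, s, n = 34.5, 30.0, 20.0, 200

standard_error = s / n ** 0.5          # 20 / sqrt(200), about 1.41
z = (xbar - mu0) / standard_error      # distance from H0 in standard errors
p_upper = 1 - NormalDist().cdf(z)      # Prob(Z > z) on the standard normal curve

print(f"SE = {standard_error:.2f}, z = {z:.2f}, upper-tail P = {p_upper:.4f}")
```

The upper-tail probability comes out well below 0.001, so a sample mean this far above 30 would be very unusual if H0 were true.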


Upper tailed test:


Two tailed test:


4 Steps to Hypothesis tests:

1. Specify null and alternative hypotheses
2. Specify threshold significance level (α) and appropriate test statistic formula
3. Specify decision rule (reject H0 if P < α)
4. Compute P and state conclusion.


P values for one and two tailed tests:

Upper Tail Test:
H1: μ > μ0 then P = Prob(z > zi)

Lower Tail Test:
H1: μ < μ0 then P = Prob(z < zi)

Two Tail Test:
H1: μ ≠ μ0 then P = 2 × Prob(z > |zi|)
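The three rules above translate directly into a small helper. This is only a sketch: the function name `p_value` and the string labels for the tails are invented here for illustration.

```python
from statistics import NormalDist


def p_value(z, tail):
    """P value for a z statistic under an upper, lower or two tailed alternative."""
    std_normal = NormalDist()
    if tail == "upper":                       # H1: mu > mu0
        return 1 - std_normal.cdf(z)
    if tail == "lower":                       # H1: mu < mu0
        return std_normal.cdf(z)
    if tail == "two":                         # H1: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


for tail in ("upper", "lower", "two"):
    print(tail, round(p_value(3.18, tail), 4))
```

For the same z, the two tailed P value is twice the smaller of the one tailed values, which is why a two tailed test is the more cautious choice when the direction of the difference was not predicted in advance.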


Confidence Interval

Find the 90% confidence interval of the population mean age.

1. Choose the appropriate test statistic:
   $z_i = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, which rearranges to $\mu = \bar{x} \pm z^* \frac{s}{\sqrt{n}}$
2. Establish the value of z*:
   Prob(-z* < z < z*) = 0.9
   Area of tails = (0.1)/2 = 0.05
   z* = 1.65
3. Calculate the confidence interval:
   $34.5 \pm 1.65 \times \frac{20}{\sqrt{200}} = 34.5 \pm 2.33$

Hypothesis Tests

Test the hypothesis that the population mean age = 30 using a significance level of 0.1.

1. Specify null and alternative hypothesis:
   H0: μ = 30
   H1: μ ≠ 30
2. Specify the level of significance and the test statistic:
   Significance level: α = likelihood of Type I error that you are prepared to tolerate = Prob(Reject H0 when it is true) = 0.1
   Test statistic: n > 30, therefore we can use z:
   $z_c = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 30}{s/\sqrt{n}}$
   i.e. we write the $z_c$ formula assuming that H0 is correct.
3. Specify the decision rule:
   Reject H0 iff P (the calculated level of Type I error) is no greater than the tolerated level,
   i.e. reject H0 iff P ≤ α (the smaller P is, the less risk is involved in rejecting H0).
4. Compute P and state your conclusion:
   $z_c = 3.18$; P = 2 × Prob(z > 3.18)
   Since P < α, it is safe to reject H0.
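The hypothesis test side of this worked example can be sketched as a single function that follows the four steps in order. The function name `z_test_one_mean` is hypothetical and the code is not the course's H_L1M macro; it simply works from the same summary information (mean, s, n). The last few lines check the link between the two columns: a two tailed test at α = 0.1 rejects H0 exactly when the 90% confidence interval excludes the hypothesised mean.

```python
from statistics import NormalDist


def z_test_one_mean(xbar, s, n, mu0, alpha):
    """Two tailed large-sample z test on one mean, from summary statistics."""
    # Step 1: H0: mu = mu0 against H1: mu != mu0.
    # Step 2: alpha is the tolerated Type I error; n is large, so use z.
    z = (xbar - mu0) / (s / n ** 0.5)
    # Step 3: decision rule is to reject H0 iff P <= alpha.
    # Step 4: compute P and state the conclusion.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, p <= alpha


z, p, reject = z_test_one_mean(xbar=34.5, s=20, n=200, mu0=30, alpha=0.1)
print(f"z = {z:.2f}, P = {p:.4f}, reject H0: {reject}")

# Cross-check against the 90% confidence interval for the same data.
z_star = NormalDist().inv_cdf(1 - 0.1 / 2)
margin = z_star * 20 / 200 ** 0.5
ci_excludes_30 = not (34.5 - margin <= 30 <= 34.5 + margin)
print(f"90% CI excludes 30: {ci_excludes_30}")
```

Both checks agree: P is far below 0.1 and 30 lies outside the interval, so H0 is rejected.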


5. Overview of the Course:

L1: Density Functions & CLT
L2: Calculating z-scores
L3: Introduction to Confidence Intervals
L4: Confidence Intervals for All Occasions
L5: Introduction to Hypothesis Tests
L6: Hypothesis Tests for All Occasions
L7: Relationships between Categorical Variables
L8: Regression

Quants I

24/09/2005 - v23


Nature of the Course:

This is a course in applied statistics
– Applied: we do not teach theoretical proofs
  • you can prove anything with maths (e.g. Teletubbies are evil)
  • What counts is understanding the concepts
– Statistics: we also teach you SPSS,
  • But there are lots of different stats packages out there
    – You are likely to use different ones over the course of your research career
    – But statistical concepts remain unchanged

Enable you to critique other people’s work

Also part of a wider research methods training programme:
– Broader remit is to teach you good practice in research techniques
  • Essential to learn syntax…


Why learn syntax?

Most texts & courses avoid it!

A succinct and secure record
Transparency and reproducibility
Efficiency
Paste and Learn
Avoiding obsolescence
– SPSS point-n-click routines change with each new version of SPSS, which changes once a year
– Syntax has remained virtually unchanged for 15 years
Accessing Extra Resources & Expanding SPSS


Why the macros? 4 reasons:

(a) Get the statistical procedure right, then choose the program/calculator
– SPSS doesn’t know what sort of data you have
– SPSS canned routine may not be the right one for your data
– You could compute the procedure by hand, & indeed it is important to know how to do this.
– but this can be long-winded in repeated applications & easy to make mistakes
– Macro commands speed the process & are a useful way to check your calculations.


(b) Critiquing/Analysing Published Work
– SPSS routines can only be used if you have the original data
– Not much use if you want to critique or analyse someone else’s published research
  • E.g. Newspaper examples in M&S tutorial
  • E.g. United Nations crime survey
  • E.g. MPPI paper by Pryce & Keoghan
– If all you can do is the point-n-click stuff in SPSS you are going to be severely hampered in what you can do.
– The Macro commands written specifically for the course only need summary info (n, xbar, sd, prop.)
  • Publicly available via the downloads page of www.geebeejey.co.uk


(c) Working with standard texts
– The exercises and examples in standard statistical texts (such as Moore and McCabe) usually only provide summary information, not the original data.
– Can’t use SPSS to do these examples or to check your results


(d) Encourages awareness & development of Macros
– SPSS’s greatest strength:
  • Customisability/expandability
– Actually don’t need to be good at statistics to use macros
  • You can use macros to do anything:
    – Manipulate data,
    – Automate repetitive tasks
    – Formalise and automate complex calculations
– Writing SPSS macros is actually a good way to acquire basic programming skills
– In real-life applied research, most of your time is taken up with non-statistical manipulation of data
  • Learning how to write your own macros or use other people’s will greatly increase your productivity & employability!


SPSS macros

Confidence Intervals (CI)

Macro command: Definition
CI_L1M: Large sample CI for one mean
CI_S1M: Small sample CI for one mean
CI_S2MP: Small independent samples CI for difference between 2 means (pooled variance)
CI_S2MD: Small independent samples CI for difference between 2 means (different variances)
CI_L1P: Large sample CI for one proportion (presents output for both Traditional and Wilson methods of calculation)
CI_L2P: Large sample CI for comparing two proportions (presents output for both Traditional and Wilson methods of calculation)

Hypothesis tests

Macro command: Definition
H_L1M: Large sample significance test on one mean
H_S1M: Small sample significance test on one mean
H_S2MP: Small independent samples significance test for equality of 2 means (pooled variance)
H_S2MD: Small independent samples significance test for equality of 2 means (different variances)
H_L1P: Large sample significance test on one proportion
H_L2P: Large samples significance test on two proportions
H_S2VF: Simple small sample F-test on equality of two variances (see also Levene’s test in the SPSS help menu for a more sophisticated test of homogeneous variances).

N_L1M: Sample size for desired margin of error for the mean
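To give a feel for what a sample size calculation of the N_L1M kind involves, here is a rough Python sketch. It is not the macro's actual code, which is not shown in these slides. It assumes the standard large-sample result that the margin of error for a mean is z* × s/√n, so the required n is (z* × s / margin)², rounded up. The function name and the target margin of 2 years are invented for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(s, margin, confidence=0.90):
    """Smallest n giving a margin of error no larger than `margin` (large-sample z)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve margin = z* * s / sqrt(n) for n, then round up to a whole person.
    return math.ceil((z_star * s / margin) ** 2)


# Hypothetical target: estimate mean age to within 2 years with 90% confidence, s = 20.
n_needed = sample_size_for_mean(s=20, margin=2)
print(f"Required sample size: {n_needed}")  # Required sample size: 271
```

This fits with the earlier example, where a sample of 200 gave a margin of about 2.33 years: a tighter margin of 2 years needs a larger sample.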


Guide to Reading:

Essential reading (recommended for purchase):
– Pryce, G. Inference and Statistics in SPSS
  • Lab exercises drawn from this book.

Courses usually recommend a book on statistics & a book on SPSS:
– E.g. Moore & McCabe (£40) -- stats
– E.g. Field (£25+) -- SPSS
– M&M and Field = 2 great books but 4 major problems:


2 great books but 4 major problems:
– 1. Cost (to buy both comes to approx £65)
  • many students have tried to make do without one or the other & struggled.
– 2. Length
  • 600 pages (M&M) + 832 pages (Field)
– 3. Content: neither geared to business & soc. sci.
  • Field: too shallow/applied:
    – Covers huge spectrum of topics (useful for Quants II)
    – does not cover some of the basic material we need to do
      » tends to cover what can be achieved in SPSS
      » Does not use macros
      » Does not teach syntax
  • M&M: too deep/theoretical
    – The Rolls Royce of introductory texts but does not teach SPSS
    – But would take 2 semesters to cover material in this depth & learn SPSS
– 4. Integration
  • Leaves you the student with the task of combining the two


Advantages of Pryce I&S:
– 1. Cost
  • Pryce = £22 + P&P (special price of £20 this week)
    – M&M + Field = £65
– 2. Length
  • Pryce = 200 pages + supplement with further reading
    – 600 pages (M&M) + 832 pages (Field)
– 3. Content:
  • Pryce:
    – tries to strike the right balance between theory & application
    – Based in SPSS
    – Teaches syntax
    – Uses the macros
    – Geared to business and social science
    – Based on worked examples & exercises
– 4. Integration
  • Pryce tries to integrate learning inference with learning SPSS
  • But macros will also allow you to do the Moore & McCabe type of exercise should you want to get more practice


Disadvantages of Pryce I&S:

1. First edition:
– A few glitches here & there…
– But, rare edition because only a small print run
  • valuable as a collector’s item if you keep it for 20 years.
  • Glitches add value: ask a stamp collector
  • Even more valuable if I sign it.
  • Makes a great Xmas gift for friends & family.

2. Wire comb binding
– But actually better for working next to PC

3. I’m biased in my recommendation!
– But correct, of course.


Feedback forms…