Inferential Statistics 2 Maarten Buis January 11, 2006

Page 1: Inferential Statistics 2

Inferential Statistics 2

Maarten Buis

January 11, 2006

Page 2: Inferential Statistics 2

outline

• Student Recap

• Sampling distribution

• Hypotheses

• Type I and II errors and power

• testing means

• testing correlations

Page 3: Inferential Statistics 2

Sampling distribution

• PrdV example from last lecture.

• If H0 is true, then the population consists of 16 million persons, of which 41% (= 6.56 million persons) support the PrdV.

• I have drawn 100,000 random samples of 2,598 persons each and computed the average support in each sample.

Page 4: Inferential Statistics 2

Sampling distribution

• 5%, or 5,000 samples, have a mean of 39% or less.

• So if we reject H0 when we find a support of 39% or less, then we will have a 5% chance of making an error.

• Notice: We assume that the only reason we would make an error is random sampling error.
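
The simulation described on these slides can be sketched in a few lines of Python. This is only an illustration, not the code used for the lecture: it draws 1,000 samples instead of 100,000 to keep the run time short, it samples with replacement (a reasonable simplification for a population of 16 million), and the function name `simulate_cutoff` and the seed are assumptions made for the example.

```python
import random

def simulate_cutoff(p0=0.41, n=2598, n_samples=1000, alpha=0.05, seed=1):
    """Simulate sample proportions under H0 and return the alpha-quantile."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_samples):
        # One sample: count how many of n randomly drawn persons support the party.
        supporters = sum(rng.random() < p0 for _ in range(n))
        proportions.append(supporters / n)
    proportions.sort()
    # The value below which roughly a share alpha of the simulated samples lie.
    return proportions[int(alpha * n_samples)]

cutoff = simulate_cutoff()
print(f"reject H0 when the sample support is below about {cutoff:.3f}")
```

With more samples the cutoff settles closer to the value obtained from the theoretical sampling distribution.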

Page 5: Inferential Statistics 2

[Figure: histogram of the sampling distribution of support for PrdV. Horizontal axis: % support for PrdV (.36 to .46); vertical axis: frequency (0 to 10,000).]

Page 6: Inferential Statistics 2

More precise approach

• We want to know the score below which only 5% of the samples lie.

• Drawing lots of random samples is a rather rough approach. An alternative is to use the theoretical sampling distribution.

• The proportion is a mean, and the sampling distribution of a mean is the normal distribution with a mean equal to the H0 value and a standard deviation (called the standard error) of $\sigma / \sqrt{N}$

Page 7: Inferential Statistics 2

More precise approach

• For a standard normal distribution we know the z-score below which 5% of the samples lie (Appendix 2, table A): -1.645

• So if we compute a z-score for the observed value (.31) and it is below -1.645, we can reject H0, and we will do so wrongly in only 5% of the cases

• $z = \frac{\bar{x} - \mu}{se}$

Page 8: Inferential Statistics 2

More precise approach

• $\mu$ is the mean of the sampling distribution, so .41 (H0)

• se is $\sigma / \sqrt{N}$; $\sigma$ of a proportion is $\sqrt{p(1 - p)}$

• so the se is $\sqrt{\frac{.41 (1 - .41)}{2598}} = .0096$

• so the z-score is $z = \frac{.31 - .41}{.0096} = -10.4$

• -10.4 is less than -1.645, so we reject H0
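
The same calculation can be checked with Python's standard library. This is a minimal sketch: the variable names are chosen for the example, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.41        # support under H0
observed = 0.31  # observed support in the sample
n = 2598         # sample size
alpha = 0.05     # chosen type I error rate

se = sqrt(p0 * (1 - p0) / n)              # standard error of a proportion under H0
z = (observed - p0) / se                  # z-score of the observed support
critical_z = NormalDist().inv_cdf(alpha)  # z-score below which 5% of samples lie
reject_h0 = z < critical_z

print(f"se = {se:.4f}, z = {z:.1f}, critical z = {critical_z:.3f}, reject H0: {reject_h0}")
```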

Page 9: Inferential Statistics 2

Null Hypothesis

• A sampling distribution requires you to imagine what the population would look like if H0 is true.

• This is possible if H0 is one value (41%)

• This is impossible if H0 is a range (<41%)

• So H0 should always contain an equal sign (either = or ≤ or ≥)

Page 10: Inferential Statistics 2

Null hypothesis

• In practice the H0 is almost always 0, e.g.:

– difference between two means is 0

– correlation between two variables is 0

– regression coefficient is 0

• This is so common that SPSS always assumes that this is the H0.

Page 11: Inferential Statistics 2

Undirected Alternative Hypotheses

• Often we have an undirected alternative hypothesis, e.g.:

– the difference between two means is not zero (could be either positive or negative)

– the correlation between two variables is not zero (could be either positive or negative)

– the regression coefficient is not zero (could be either positive or negative)

Page 12: Inferential Statistics 2

Directed alternative hypothesis

• In the PrdV example we had a directed alternative hypothesis: Support for PrdV is less than 41%, since PrdV would have still participated if his support were more than 41%.

Page 13: Inferential Statistics 2

Type I and Type II errors

                    actual situation
decision            H0 is true                H0 is false
reject H0           Type I error              correct decision
                    probability = α           probability = 1 - β (power)
do not reject H0    correct decision          Type II error
                    probability = 1 - α       probability = β

Page 14: Inferential Statistics 2

Type I error rate

• You choose the type I error rate (α)

• It is independent of sample size, type of alternative hypothesis, or model assumptions.

Page 15: Inferential Statistics 2

Type I versus type II error rate

• A low probability of rejecting H0 when H0 is true (type I error) is obtained by rejecting H0 less often.

• This also means a higher probability of not rejecting H0 when H0 is false (type II error).

• In other words: a lower probability of finding a significant result when you should (power).

Page 16: Inferential Statistics 2

How to increase your power:

• Accept a higher type I error rate (a lower type I error rate reduces power)

• Larger sample size

• Use directed instead of undirected alternative hypothesis

• Use more assumptions in your model (non-parametric tests make fewer assumptions, but also have less power)
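
To see how power responds to the type I error rate, the sample size, and the choice between a directed and an undirected alternative, here is a rough sketch for a z-test on a proportion. The true support of 39%, the sample sizes, and the function name `power_lower_tail` are hypothetical values picked for illustration; they do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def power_lower_tail(p0, p_true, n, alpha, two_sided=False):
    """Approximate power of a z-test of H0: p = p0 when the true proportion p_true < p0."""
    std_normal = NormalDist()
    se0 = sqrt(p0 * (1 - p0) / n)              # standard error if H0 is true
    se_true = sqrt(p_true * (1 - p_true) / n)  # standard error under the true proportion
    # An undirected test splits alpha over both tails, so the lower cutoff is more extreme.
    tail = alpha / 2 if two_sided else alpha
    cutoff = p0 + std_normal.inv_cdf(tail) * se0  # reject H0 below this sample proportion
    # Chance that a sample proportion falls below the cutoff when p_true holds.
    # (For the undirected test this ignores the negligible chance of rejecting in the upper tail.)
    return std_normal.cdf((cutoff - p_true) / se_true)

baseline = power_lower_tail(0.41, 0.39, n=500, alpha=0.05)
higher_alpha = power_lower_tail(0.41, 0.39, n=500, alpha=0.10)
larger_n = power_lower_tail(0.41, 0.39, n=2000, alpha=0.05)
undirected = power_lower_tail(0.41, 0.39, n=500, alpha=0.05, two_sided=True)

print(f"baseline {baseline:.2f}, higher alpha {higher_alpha:.2f}, "
      f"larger N {larger_n:.2f}, undirected {undirected:.2f}")
```

In this sketch a higher α and a larger N both raise power, and the undirected test has less power than the directed one.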

Page 17: Inferential Statistics 2

Testing means

• What kind of hypotheses might we want to test:

– Average rent of a room in Amsterdam is 300 euros

– Average income of males is equal to the average income of females

Page 18: Inferential Statistics 2

Z versus t

• In the PrdV example we knew everything about the sampling distribution with only a hypothesis about the mean.

• In the rent example we don’t: we have to estimate the standard deviation.

• This adds uncertainty, which is why we use the t distribution instead of the normal.

• Uncertainty declines when sample size becomes larger.

• In large samples (N>30) we can use the normal.

Page 19: Inferential Statistics 2

t-distribution

• It has a mean and standard error like the normal distribution.

• It also has degrees of freedom, which depend on the sample size

• The larger the degrees of freedom the closer the t-distribution is to the normal distribution.

Page 20: Inferential Statistics 2

Data: rents of rooms

room      rent    room      rent
room 1    175     room 11   240
room 2    180     room 12   250
room 3    185     room 13   250
room 4    190     room 14   280
room 5    200     room 15   300
room 6    210     room 16   300
room 7    210     room 17   310
room 8    210     room 18   325
room 9    230     room 19   620
room 10   240

Page 21: Inferential Statistics 2

Rent example

• H0: μ = 300, HA: μ ≠ 300

• We choose α to be 5%

• N = 19, so df = 18

• We reject H0 if we find a t less than -2.101 or more than 2.101 (appendix B, table 2)

• We do not reject H0 if we find a t between -2.101 and 2.101.

Page 22: Inferential Statistics 2

Rent example

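• We use $s^2$ as an estimate of $\sigma^2$

• $t = \frac{\bar{x} - \mu}{se}$, $se = \frac{s}{\sqrt{N}}$

• $\bar{x} = 258$, $\mu = 300$, $s = 99$, $N = 19$

• So $t = \frac{258 - 300}{99 / \sqrt{19}} = -1.85$

• -1.85 is between -2.101 and 2.101, so we do not reject H0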
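
The rent example can be reproduced from the raw data with Python's standard library. This is a sketch with variable names chosen for the example; the critical value 2.101 is copied from the slides rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]
mu0 = 300           # H0: the average rent is 300 euros
critical_t = 2.101  # two-sided 5% critical value for df = 18, taken from the slides

n = len(rents)
x_bar = mean(rents)
s = stdev(rents)        # sample standard deviation (divides by n - 1)
se = s / sqrt(n)        # estimated standard error of the mean
t = (x_bar - mu0) / se
reject_h0 = abs(t) > critical_t

print(f"mean = {x_bar:.1f}, s = {s:.1f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Because the code uses the unrounded mean and standard deviation, it gives a t of about -1.84 rather than the -1.85 obtained from the rounded values; the conclusion is the same.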

Page 23: Inferential Statistics 2

Compare means in SPSS

Independent Samples Test (incmid: household income in guilders)

Levene's Test for Equality of Variances: F = 19,012, Sig. = ,000

t-test for Equality of Means
                              t        df         Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       10,365   2250       ,000              633,95876         61,16368                514,01564      753,90189
Equal variances not assumed   10,370   2225,825   ,000              633,95876         61,13297                514,07516      753,84236

(95% CI: 95% Confidence Interval of the Difference)

Group Statistics (incmid: household income in guilders)

sex respondent   N      Mean        Std. Deviation   Std. Error Mean
1 male           1131   2833,2228   1530,70376       45,51556
2 female         1121   2199,2640   1366,42170       40,81144

Page 24: Inferential Statistics 2

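$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{se_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \bar{x}_2}{se_{\bar{x}_1 - \bar{x}_2}}$$

$$se_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_{pool}^2}{N_1} + \frac{s_{pool}^2}{N_2}}, \qquad s_{pool}^2 = \frac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2}$$

$$se_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2} \left( \frac{1}{N_1} + \frac{1}{N_2} \right)}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2} \left( \frac{1}{N_1} + \frac{1}{N_2} \right)}}$$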
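
As a check, the pooled-variance t statistic can be computed from the group statistics in the SPSS output. This is a sketch that assumes the standard pooled-variance formula (equal variances assumed); the function name `pooled_t` is made up for the example.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two means.
    se_diff = sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean1 - mean2) / se_diff, se_diff, df

# Group statistics for household income in guilders: males first, then females.
t, se_diff, df = pooled_t(2833.2228, 1530.70376, 1131,
                          2199.2640, 1366.42170, 1121)
print(f"t = {t:.3f}, se of the difference = {se_diff:.3f}, df = {df}")
```

The result should agree with the "Equal variances assumed" row of the SPSS table up to rounding.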

Page 25: Inferential Statistics 2

Do before Monday

• Read Chapters 9 and 10

• Do the “For solving Problems”