
Page 1:

Lecture 6

Hypothesis Tests Applied to Means I

Page 2:

Dog Colors

Observed (Judge 1 in columns, Judge 2 in rows)

Judge 2    Green  Red  Blue  Total
Green        10    1     3     14
Red           2    5     2      9
Blue          0    1     9     10
Total        12    7    14     33

Expected (diagonal cells computed as row total x column total / grand total)

Judge 2    Green     Red       Blue      Total
Green      5.090909  1         3         14
Red        2         1.909091  2          9
Blue       0         1         4.242424  10
Total      12        7         14        33

Sum(Agree) = 24
Sum(Expected) = 11.24242
k = 0.586351
% Agree = 0.727273
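For reference, a minimal R sketch that reproduces the kappa arithmetic above from the observed table; the matrix layout (Judge 2 in rows, Judge 1 in columns) follows the slide.

# Observed agreement table, as on this slide
obs <- matrix(c(10, 1, 3,
                 2, 5, 2,
                 0, 1, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(Judge2 = c("Green", "Red", "Blue"),
                              Judge1 = c("Green", "Red", "Blue")))

n        <- sum(obs)                           # 33 ratings in total
expected <- rowSums(obs) %o% colSums(obs) / n  # expected counts under chance agreement
sum_agree    <- sum(diag(obs))                 # 24
sum_expected <- sum(diag(expected))            # 11.24242
kappa     <- (sum_agree - sum_expected) / (n - sum_expected)   # 0.586351
pct_agree <- sum_agree / n                     # 0.727273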

Page 3:

Hypothesis tests applied to the means

Recall what you learned about sampling distributions in Chapter 4:

Sampling distribution: the distribution of the values of a particular statistic, over a very large number of repeated samplings with equal sample size, each taken from the same population.

Sample statistics: describe characteristics of a sample.

Standard error: The standard deviation of a sampling distribution.

Example: Descriptive Statistics (SPSS output)

READING STANDARDIZED SCORE: N = 271, Mean = 51.82690 (Std. Error = .575315), Std. Deviation = 9.470882, Variance = 89.698
Valid N (listwise) = 271

Page 4:

Test statistics: describe differences or similarities between samples and allow us to make inferences about their respective populations.

*As an observed statistic’s value falls farther and farther from the center of this distribution, you become less and less likely to believe that the sample could have come from the hypothetical population that this sampling distribution represents. This constitutes the conceptual framework for hypothesis testing.

Page 5:

Recall the steps in the hypothesis testing process:

1. Generate a research hypothesis—a theory-based prediction.

2. State a null hypothesis (H0): one that, based on our theory, we believe to be incorrect. That is, pretend that the data were drawn from a population with known and uninteresting characteristics. The alternative hypothesis (HA) is the logical converse of the null hypothesis.

3. Obtain the sampling distribution of the statistic assuming that the null hypothesis is true.

4. Gather data.

5. Calculate the probability of obtaining a statistic as or more extreme than the one observed based on the sampling distribution.

6. Decide whether the observed probability is too remote to support our null hypothesis. If it is, then reject the null and support your theory.

7. Substantively interpret your results.

Page 6:

Also recall that the decision can have several potential outcomes:

And recall that a p-value indicates the probability of obtaining a statistic value as extreme as, or more extreme than, the observed one, assuming that the null hypothesis is true (as opposed to alpha (α), which dictates the size of the rejection region based on the researcher’s judgment).

Decision       Truth: H0 True               Truth: H0 False
Reject H0      Type I error (α)             Power (1 - β)
Retain H0      Correct decision (1 - α)     Type II error (β)

Page 7:

Sampling Distribution of the Mean

One of the most interesting sampling distributions is the sampling distribution of the mean—the distribution of sample means created by repeatedly randomly sampling a population and creating equal-sized samples. The characteristics of this distribution are summarized in the central limit theorem:

Given a population with mean μ and variance σ², the sampling distribution of the mean (the distribution of sample means) will have a mean equal to μ (i.e., μ_X̄ = μ), a variance (σ²_X̄) equal to σ²/N, and a standard deviation (σ_X̄) equal to σ/√N. The distribution will approach the normal distribution as N, the sample size, increases.

Page 8:

In English…..

Suppose you have a population, and you know the mean (μ) and variance (σ²) of that population (recall that we almost never know these parameters).

Now suppose that you collect a very large number of random samples from that population, each of size N, and compute the means of those samples. Now you have a distribution of sample means—the sampling distribution of the mean. Note that you’d have a slightly different sampling distribution if you selected a different N.

Page 9:

The mean of the sampling distribution of the mean (μ_X̄) equals the parameter that you are estimating (μ). In addition, the standard deviation of the sampling distribution of the mean (σ_X̄, a.k.a. the standard error of the mean) equals the population standard deviation divided by the square root of the sample size (σ_X̄ = σ/√N).

Finally, the sampling distribution will be approximately normally distributed when the sample size is large.

Page 10:

In R….

To demonstrate the concepts of the central limit theorem, let’s take some random draws from a normal population with different sample sizes, as in the sketch below.
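The slide’s R demonstration is not reproduced in this transcript; the following sketch shows one way to do it, with an arbitrary Normal(50, 10) population and a few illustrative sample sizes.

# Draw many samples of each size from a Normal(mu = 50, sigma = 10) population
# and look at the resulting sampling distributions of the mean.
set.seed(1)
mu <- 50; sigma <- 10
for (n in c(5, 30, 100)) {
  sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
  cat(sprintf("N = %3d: mean of means = %.3f, SD of means = %.3f (theory: %.3f)\n",
              n, mean(sample_means), sd(sample_means), sigma / sqrt(n)))
  hist(sample_means, breaks = 50,
       main = paste("Sampling distribution of the mean, N =", n),
       xlab = "Sample mean")
}

As N grows, the means of the sampling distributions stay at μ, their standard deviations shrink toward σ/√N, and the histograms look increasingly normal.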

Page 11:

Think about the following questions:

• What is the value of the mean of a sample of N = the entire population?

• What is the shape of the sampling distribution of the mean when N = the entire population?

• What is the standard deviation of the sampling distribution of the mean when N = the entire population?

Page 12:

Revisiting the Z Test

Recall that the z-test is an inferential test that allows us to perform a hypothesis test in situations in which we would like to determine whether the mean of an observed sample could have come from a population with a known population mean (μ) and standard deviation (σ).

Recall that you can standardize scores via:

    z = (X - μ) / σ

Also, recall the following about the sampling distribution of the mean:

• It has a mean equal to μ, the population mean.
• It has a standard deviation (the standard error of the mean) equal to σ_X̄ = σ/√N.
• It is normally distributed when the sample size, N, is large.

Page 13:

We can use this information within the hypothesis testing framework in the following way:

1. Determine which test statistic is required for your problem and data. *The z-test is relevant when you want to compare the observed mean of a quantitative variable to a hypothetical population mean (theory-based) and you know the variance of the population.

2. State your research hypothesis: that the observed mean does not come from the population described by your theory.

3. State the alternative hypothesis: that the observed mean is not equal to the hypothetical mean (i.e., HA: μ ≠ μ0, or the appropriate one-tailed alternative, like μ > μ0).

4. State the null hypothesis: that the observed mean equals the hypothetical mean (i.e., H0: μ = μ0, or the appropriate one-tailed alternative, like μ ≤ μ0).

5. Determine the critical value for your test based on your desired α level.

Page 14:

6. Compute your observed z-test statistic.

First, identify the location and dispersion of the relevant sampling distribution of the mean. The location is dictated by the hypothetical population mean (μ0). The dispersion equals the known population standard deviation divided by the square root of the sample size:

    σ_X̄ = σ / √N

Second, turn the observed sample mean into a z-score from the sampling distribution of the mean:

    z = (X̄ - μ0) / σ_X̄    or    z = (X̄ - μ0) / (σ / √N)

7. Compare the observed z-test statistic value to your critical value and make a decision to reject or retain your null hypothesis.

8. Make a substantive interpretation of your test results.

Page 15:

[Example]

Suppose we want to compare the mean GRE score of graduate students at Loyola University Chicago to the GRE test-taking population.

We know the mean and standard deviation of that population—500 and 100, respectively. Suppose the mean GRE score of our school is 565, based on 300 graduate students last year.

Of course, we’d like to believe that our graduate students are more academically able than the average graduate student—our research hypothesis. That means that we’ll use a one-tailed test, so that H0: μ ≤ 500 and HA: μ > 500.

If we adopt α = .05, then our one-tailed critical value (the value to exceed) equals 1.65 (from the z-table).

We compute our observed z-statistic by plugging our known values into the equation:

    z = (X̄ - μ0) / (σ / √N) = (565 - 500) / (100 / √300) = 65 / 5.77 = 11.27

Page 16:

The z-test statistic (11.27) is clearly larger than the critical value (1.65). It is clear that the observed difference between the sample mean and the population mean is much larger than would be expected due to sampling error. In fact, the p-value for the observed statistic is less than .0001.

We would interpret this substantively with a paragraph something like this:

The mean GRE score for graduate students at Loyola University Chicago (565) is considerably larger than the mean for the GRE testing population (500). This difference is statistically significant (z = 11.27, p < .0001).
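As a check on this arithmetic, here is a short R sketch of the same one-tailed z test (the small difference from 11.27 comes from the slide rounding the standard error to 5.77).

# One-sample z test for the GRE example: Xbar = 565, mu0 = 500, sigma = 100, N = 300
xbar <- 565; mu0 <- 500; sigma <- 100; N <- 300

se   <- sigma / sqrt(N)              # standard error of the mean, about 5.77
z    <- (xbar - mu0) / se            # about 11.26 (11.27 on the slide, which rounds the SE)
p    <- pnorm(z, lower.tail = FALSE) # one-tailed p-value, effectively 0
crit <- qnorm(0.95)                  # one-tailed critical value, about 1.645
c(z = z, critical = crit, p = p)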

Page 17:

Graphically, here’s what we did:

[Figure: the sampling distribution of the mean under H0, centered at μ0 = 500 with σ_X̄ = 5.77. The α = .05 one-tailed critical value z_CV = 1.65 corresponds to a GRE cutoff of about 509.52; the observed mean X̄ = 565 lies far into the rejection region (z = 11.27, p < .0001).]

Page 18:

One-Sample t Test

The z-test is only useful in somewhat contrived situations--we hardly ever know the value of the population standard deviation, so we can’t compute the standard error of the mean.

We need a different statistic to apply to most real-world situations. The appropriate statistical test is the one-sample t-test.

Recall the formula for the z-test:

    z = (X̄ - μ0) / σ_X̄ = (X̄ - μ0) / (σ / √N)

It relies on the sampling distribution of the mean. We can create a parallel statistic using the sample variance rather than the population variance:

    t = (X̄ - μ0) / s_X̄ = (X̄ - μ0) / (s / √N)

Page 19:

We use a slightly different probability density function for the t-test than we do for the z-test, because we now use the sample variance as an estimate of the population variance. Specifically, we rely on the Student’s t distribution for the t-test.

The feature that differentiates the various t distributions is the degrees of freedom associated with the test statistic.

The degrees of freedom for a t-test relate to the number of data points that are free to vary when calculating the variance. Hence, the degrees of freedom for the one-sample t-test equal N – 1, and there is a separate probability distribution for each number of degrees of freedom (recall that there was a single probability distribution for the z-test). The lost degree of freedom reflects the fact that the variance is based on the sum of the squared deviations of observations from the mean of the distribution. Because the deviations must sum to zero, one of the data values is not free to vary, so one degree of freedom is lost.
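A quick R illustration of this point: one-tailed (α = .05) critical values from the t distribution shrink toward the normal (z) critical value as the degrees of freedom grow. The df values below are arbitrary examples.

# Student's t critical values for several df versus the normal z critical value.
df <- c(2, 5, 10, 30, 100, 299)
round(rbind(t_crit = qt(0.95, df = df),
            z_crit = qnorm(0.95)), 3)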

Page 20:

Let’s apply the one-sample t-test to the GRE data. We’d still like to believe that our graduate students are more academically able than the average graduate student (i.e., H0: μ ≤ 500), as stated on p.28, but in this case we don’t know the value of the population variance.

The COE’s mean GRE score is 565, the standard deviation equals 75, and there are 300 students in the COE.

Our degrees of freedom equal 299 for this test (df = 300 - 1). Looking at the column of the t table on p.682 where the “Level of Significance for One-Tailed Test” equals 0.05, with df = ∞ (since 299 is much larger than the largest tabled value, 100), the critical value for this test at the α = 0.05 level is 1.645. Our observed t statistic is:

    t = (X̄ - μ0) / (s / √N) = (565 - 500) / (75 / √300) = 65 / 4.33 = 15.01

Since our test statistic (t = 15.01) is larger than the critical t value (t_CV = 1.645), our decision and interpretation are the same as they were when we knew the population variance.
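Here is the same calculation in R, working from the summary statistics (mean, standard deviation, N) rather than raw scores.

# One-sample t test for the GRE example from summary statistics:
# xbar = 565, s = 75, N = 300, mu0 = 500 (one-tailed, alpha = .05).
xbar <- 565; s <- 75; N <- 300; mu0 <- 500

se     <- s / sqrt(N)                             # about 4.33
t_stat <- (xbar - mu0) / se                       # about 15.01
df     <- N - 1                                   # 299
t_crit <- qt(0.95, df = df)                       # about 1.650
p      <- pt(t_stat, df = df, lower.tail = FALSE) # one-tailed p, effectively 0
c(t = t_stat, df = df, critical = t_crit, p = p)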

Page 21:

SPSS Example: Go to “Analyze” → “Compare Means” → “One-Sample T Test”.

H0: μ_reading = 50

One-Sample Statistics
READING STANDARDIZED SCORE: N = 271, Mean = 51.82690, Std. Deviation = 9.470882, Std. Error Mean = .575315

One-Sample Test (Test Value = 50)
READING STANDARDIZED SCORE: t = 3.175, df = 270, Sig. (2-tailed) = .002, Mean Difference = 1.826904, 95% Confidence Interval of the Difference = (.69423, 2.95958)
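In R, the equivalent of this SPSS run would be a single call to t.test(), assuming the 271 reading standardized scores are stored in a numeric vector; the name `reading` below is illustrative, not part of the original data set.

# R analogue of the SPSS one-sample t test above.
# `reading` is assumed to hold the 271 reading standardized scores.
t.test(reading, mu = 50)   # two-tailed test of H0: mu = 50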

Page 22:

Two Matched-Samples t Test

A more common comparison in research is one in which two samples are compared to determine whether there is a larger-than-sampling-error difference between the means of the groups.

There are two common strategies for constructing groups in experimental research. One involves assigning individuals to groups via randomization (although randomization is not required). The other involves matching individuals and assigning one member of each matched pair to each group. Because the matched-samples t-test (a.k.a. the dependent-samples t-test) is a less complex extension of the one-sample t-test, we’ll discuss it first.

Page 23:

But first, recall the reasons why we might do matching and the two most common methods of matching.

We typically match cases because there are extraneous variables that are strongly related to the outcome variable and we want to make sure that observed differences between the groups on the dependent variable cannot be attributed to group differences with respect to these extraneous variables.

For example, we may want to ensure that groups are equivalent on SES. We may match samples by pairing individuals on levels of the extraneous variable (matched samples), or we may expose the same individual to multiple conditions (repeated measures).

Page 24:

The groups are compared by examining the difference between the two members of each pair of individuals. The relevant statistic is the average difference score (Note that N is the number of pairs.).

Although our null (theory-based) value for the magnitude of this difference can be any value, we typically are interested in determining whether the difference is non-zero. Hence, we state our null hypothesis to be that the mean difference in the population equals zero (i.e., H0: μ_D = 0).

Formulating the null hypothesis in this way allows us to use a variation of the one-sample t-test to make the comparison.

    D̄ = Σ D_i / N = Σ (X_1i - X_2i) / N        (summed over the N pairs)

    H0: μ_D = μ1 - μ2 = 0

    t = (D̄ - 0) / s_D̄ = D̄ / (s_D / √N),        df = N - 1

Page 25:

As an example, consider data from a study of the reading interests of 18 pairs of college-educated husbands and wives. Each individual in the sample was interviewed and asked how many books he/she had completed in the past year. The research question is: do males and females who come from similar environments engage in similar levels of reading? This implies a two-tailed null hypothesis (H0: μ_D = 0) and the corresponding alternative hypothesis (HA: μ_D ≠ 0).

Our degrees of freedom equal 17, so our two-tailed critical value using α = .05 is 2.11. The mean and standard deviation of the differences in the sample were 1.16 and 2.88, respectively. So, our t-statistic is:

    t = (D̄ - 0) / (s_D / √N) = (1.16 - 0) / (2.88 / √18) = 1.16 / 0.679 = 1.71

Because the observed t-statistic is not more extreme than the critical value, we retain the null hypothesis. That is, we do not have evidence that men and women read different amounts. Incidentally, the p-value for the observed t statistic equals .11.
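As a check, the same test in R from the summary statistics reported on this slide:

# Husbands/wives reading example: mean difference = 1.16, SD of differences = 2.88,
# N = 18 pairs, two-tailed alpha = .05.
d_bar <- 1.16; s_d <- 2.88; N <- 18

se_d   <- s_d / sqrt(N)                                    # about 0.679
t_stat <- d_bar / se_d                                     # about 1.71
df     <- N - 1                                            # 17
t_crit <- qt(0.975, df = df)                               # about 2.110
p      <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE) # about .11
c(t = t_stat, critical = t_crit, p = p)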

Page 26:

SPSS Example: Go to “Analyze” → “Compare Means” → “Paired-Samples T Test”. Difference: (Reading Score - Math Score). H0: μ_D = 0.

Paired Samples Statistics (Pair 1)
READING STANDARDIZED SCORE: Mean = 51.87816, N = 270, Std. Deviation = 9.450733, Std. Error Mean = .575153
MATHEMATICS STANDARDIZED SCORE: Mean = 51.71431, N = 270, Std. Deviation = 10.083413, Std. Error Mean = .613657

Paired Samples Correlations (Pair 1)
READING STANDARDIZED SCORE & MATHEMATICS STANDARDIZED SCORE: N = 270, Correlation = .714, Sig. = .000

Page 27:

SPSS Example (cont’d): H0: μ_D = 0.

Another way to perform the same analysis: we can calculate the differences between pairs and form a new variable (I called it “diff”), using “Transform” → “Compute” to calculate this new variable.

Paired Samples Test (Pair 1: READING STANDARDIZED SCORE - MATHEMATICS STANDARDIZED SCORE)
Paired Differences: Mean = .163848, Std. Deviation = 7.406301, Std. Error Mean = .450733, 95% Confidence Interval of the Difference = (-.723565, 1.051261); t = .364, df = 269, Sig. (2-tailed) = .717

Page 28:

The outputs for this analysis.

Compare them to the previous one.

One-Sample Statistics
diff: N = 270, Mean = .1638, Std. Deviation = 7.40630, Std. Error Mean = .45073

One-Sample Test (Test Value = 0)
diff: t = .364, df = 269, Sig. (2-tailed) = .717, Mean Difference = .16385, 95% Confidence Interval of the Difference = (-.7236, 1.0513)

Paired Samples Test (Pair 1: READING STANDARDIZED SCORE - MATHEMATICS STANDARDIZED SCORE)
Paired Differences: Mean = .163848, Std. Deviation = 7.406301, Std. Error Mean = .450733, 95% Confidence Interval of the Difference = (-.723565, 1.051261); t = .364, df = 269, Sig. (2-tailed) = .717
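For comparison, a rough R sketch of both SPSS approaches, assuming the reading and math standardized scores are in vectors named `reading` and `math` (names are illustrative). The paired t.test() and the one-sample t.test() on the computed differences give identical results.

# Paired-samples t test (equivalent to SPSS "Paired-Samples T Test")
t.test(reading, math, paired = TRUE)

# Same analysis via a computed difference variable, as on the previous slide
diff <- reading - math        # form the difference score for each pair
t.test(diff, mu = 0)          # one-sample t test on the differences: identical result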

Page 29:

Try the following questions in our text.

P. 206: 7.6, 7.7, 7.10, 7.13

P. 207: 7.16, 7.17, 7.18