chapter 11: applications of chi-square. count or frequency data many problems for which the data is...

Upload: bernadette-walker

Post on 12-Jan-2016

Chapter 11: Applications of Chi-Square

Count or Frequency Data

• Many problems involve data that are categorized, with the results reported as counts.

• Results are often displayed on a chart showing the number of observations for each possible category.

Is Your Die Fair?

• Suppose you want to test whether or not a die is “fair,” i.e., are the probabilities for each outcome the same?

• You toss the die 60 times and record the results. You expect to get 10 of each number, but due to random variation you probably won’t.

• The question is, are the frequencies far enough off to convince you the die is not fair?

• Suppose you get 10, 9, 11, 12, 8, 10.

• Suppose you get 5, 10, 15, 14, 9, 7.

• How can we evaluate whether this is likely to be due to random chance or an unbalanced die?

Background:

1. Suppose there are n observations.

2. Each observation falls into a cell (or class).

3. Observed frequencies in each cell: O1, O2, O3, … , Ok.

Sum of the observed frequencies is n.

4. Expected, or theoretical, frequencies: E1, E2, E3, . . . , Ek.

Summary of notation:

$$O_1 + O_2 + O_3 + \cdots + O_k = n$$

$$E_1 + E_2 + E_3 + \cdots + E_k = n$$

                              k Categories
                      1st    2nd    3rd    ...    kth    Total
Observed Frequency    O1     O2     O3     ...    Ok     n
Expected Frequency    E1     E2     E3     ...    Ek     n

Goal:

1. Compare the observed frequencies with the expected frequencies.

2. Decide whether the observed frequencies seem to agree or seem to disagree with the expected frequencies.

Methodology: Use a chi-square statistic:

$$\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

This statistic is a measure of variation. Note its similarity to the formula for sums of squares (variance).

Small values of χ²: Observed frequencies are close to expected frequencies, because the variation is small.

Large values of χ²: Observed frequencies do not agree with expected frequencies, because the variation is large.
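As a minimal sketch, the statistic can be computed directly for the two sets of die tosses given earlier. The function name `chi_square_statistic` is an illustrative choice, not something from the slides; the expected count of 10 per face comes from 60 tosses of a fair die.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 60 tosses of a fair die: 10 expected for each of the 6 faces.
expected = [10] * 6
first_result = chi_square_statistic([10, 9, 11, 12, 8, 10], expected)
second_result = chi_square_statistic([5, 10, 15, 14, 9, 7], expected)

print(f"first set of tosses:  {first_result:.2f}")   # 1.00
print(f"second set of tosses: {second_result:.2f}")  # 7.60
```

The second set of counts gives the larger value, reflecting its greater departure from the expected 10 per face.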

Sampling Distribution of χ²*:

When n is large and all expected frequencies are greater than or equal to 5, then χ²* has approximately a χ² (chi-square) distribution.

Recall:

Properties of the Chi-Square Distribution:

1. χ² is nonnegative in value; it is zero or positively valued.

2. χ² is not symmetrical; it is skewed to the right.

3. χ² is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom.

Various Chi-Square Distributions:

[Figure: chi-square density curves for 1, 4, 10, and 20 degrees of freedom, plotted over χ² from 0 to 25.]
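The family of curves can be explored numerically. The sketch below evaluates the chi-square density for the four degrees of freedom shown and estimates each mean by a simple midpoint sum. The function names are illustrative, and the fact that the mean equals the degrees of freedom is a standard property that the slides do not state.

```python
import math


def chi_square_pdf(x, df):
    """Chi-square density with df degrees of freedom, evaluated for x > 0."""
    if x <= 0:
        return 0.0  # simplification: the density is only evaluated on x > 0 here
    half = df / 2.0
    log_density = ((half - 1.0) * math.log(x) - x / 2.0
                   - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_density)


def approximate_mean(df, upper=200.0, steps=200_000):
    """Midpoint-rule estimate of the mean: integral of x * pdf(x) over (0, upper)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += x * chi_square_pdf(x, df) * width
    return total


for df in (1, 4, 10, 20):
    print(f"df = {df:2d}: density at 5 = {chi_square_pdf(5.0, df):.4f}, "
          f"mean is about {approximate_mean(df):.2f}")
```

As the degrees of freedom increase, the bulk of the distribution moves to the right and the curve becomes less skewed, which matches the shapes in the figure.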

Critical values for chi-square:

1. Table 8, Appendix B.

2. Identified by degrees of freedom (df) and the area under the curve to the right of the critical value.

3. χ²(df, α): critical value of a chi-square distribution with df degrees of freedom and area α to the right.

4. Chi-square distribution is not symmetrical: critical values associated with right and left tails are given separately.

[Figure: chi-square curve with χ²(df, α) marked on the χ² axis.]

Example: Find χ²(16, 0.05).

[Figure: chi-square curve with area 0.05 to the right of χ²(16, 0.05).]

Portion of Table 8:

        Area to the right
df      0.05
16      26.3

χ²(16, 0.05) = 26.3

Example: Find χ²(10, 0.99).

[Figure: chi-square curve with χ²(10, 0.99) marked on the χ² axis.]

Portion of Table 8:

        Area to the right
df      0.99
10      2.56

χ²(10, 0.99) = 2.56
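The table lookups above can be checked numerically. The following is a sketch using only the Python standard library, not the method behind Table 8: it computes the right-tail area from a series for the incomplete gamma function and then finds the critical value by bisection. The function names are illustrative.

```python
import math


def chi_square_right_tail(x, df):
    """Area under the chi-square(df) curve to the right of x."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower_area = total * math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    return 1.0 - lower_area


def chi_square_critical(df, alpha):
    """Value with area alpha to its right, located by bisection."""
    low, high = 0.0, 1.0
    while chi_square_right_tail(high, df) > alpha:
        high *= 2.0  # widen the bracket until the tail area drops below alpha
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi_square_right_tail(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


print(f"{chi_square_critical(16, 0.05):.1f}")  # Table 8 lookup above: 26.3
print(f"{chi_square_critical(10, 0.99):.2f}")  # Table 8 lookup above: 2.56
```

Bisection works here because the right-tail area decreases steadily as x increases, so there is exactly one value with area α to its right.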

Multinomial Experiment:

An experiment with the following characteristics:

1. It consists of n identical independent trials.

2. The outcome of each trial fits into exactly one of k possible cells.

3. There is a probability associated with each particular cell, and these individual probabilities remain constant during the experiment. (It must be true that p1 + p2 + . . . + pk = 1.)

4. The experiment will result in a set of observed frequencies, O1, O2, . . . , Ok, where each Oi is the number of times a trial outcome falls into that particular cell. (It must be the case that O1 + O2 + . . . + Ok = n.)
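A multinomial experiment is easy to simulate, which can help make the four characteristics concrete. The sketch below runs n independent trials with fixed cell probabilities and tallies the observed frequencies. The function name and the seed value are arbitrary illustrative choices.

```python
import random
from collections import Counter


def simulate_multinomial(n, probabilities, seed=None):
    """Run n independent trials; return the observed count for each of the k cells."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    cells = range(len(probabilities))
    # Each trial lands in exactly one cell, with the same probabilities every time.
    outcomes = rng.choices(cells, weights=probabilities, k=n)
    tally = Counter(outcomes)
    return [tally[cell] for cell in cells]


# 60 tosses of a fair die: k = 6 cells, each with probability 1/6.
observed = simulate_multinomial(60, [1 / 6] * 6, seed=11)
print(observed, "sum =", sum(observed))
```

Whatever the seed, the observed frequencies add up to n, while the individual counts vary around the expected 10 per face.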

Testing Procedure:

1. H0: The probabilities p1, p2, . . . , pk are correct.

Ha: Not all of the given probabilities are correct.

2. Test statistic:

$$\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

3. Use a one-tailed critical region; the right-hand tail.

4. Degrees of freedom: df = k − 1.

5. Expected frequencies:

$$E_i = n \cdot p_i$$

6. To ensure a good approximation to the chi-square distribution: each expected frequency should be at least 5 ($E_i \ge 5$).

Example: A market research firm conducted a consumer-preference experiment to determine which of 5 new breakfast cereals was the most appealing to adults. A sample of 100 consumers tried each cereal and indicated the cereal he or she preferred. The results are given in the following table:

Cereal       A     B     C     D     E     Total
Frequency    25    17    15    22    21    100

Is there any evidence to suggest the consumers had a preference for one cereal, or did they indicate each cereal was equally likely to be selected? Use α = 0.05.

Solution: If no preference was shown, we expect the 100 consumers to be equally distributed among the 5 cereals. Thus, if no preference is given, we expect (100)(0.2) = 20 consumers in each class.

1. The null and alternative hypotheses:

H0: There was no preference shown (equally distributed).

Ha: There was a preference shown (not equally distributed).

2. The type of test (distribution):

A multinomial experiment with specified probabilities. Use χ²* with df = k − 1 = 5 − 1 = 4.

3. Rejection Region:

Reject H0 if χ²* > χ²(4, 0.05) = 9.49.

4. Calculations:

O      E      O − E    (O − E)²/E
25     20      5       1.25
17     20     -3       0.45
15     20     -5       1.25
22     20      2       0.20
21     20      1       0.05
100    100     0       3.20

χ²* = 3.2

5. Conclusion: Fail to reject H0.

At the 0.05 level of significance, there is not sufficient evidence to suggest the consumers showed a preference for any one cereal.
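The full testing procedure for the cereal example can be sketched as one small function. The name `goodness_of_fit` and its return format are illustrative assumptions; the critical value 9.49 is the one quoted in step 3 rather than something the code computes.

```python
def goodness_of_fit(observed, probabilities, critical_value):
    """Multinomial chi-square test: returns (statistic, df, reject_h0)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]          # E_i = n * p_i
    if min(expected) < 5:
        raise ValueError("every expected frequency should be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                             # df = k - 1
    return statistic, df, statistic > critical_value   # right-tail rejection region


cereal_counts = [25, 17, 15, 22, 21]                   # cereals A through E
statistic, df, reject = goodness_of_fit(cereal_counts, [0.2] * 5, critical_value=9.49)
print(f"chi-square* = {statistic:.2f}, df = {df}, reject H0: {reject}")
```

The statistic of 3.20 is well below 9.49, so the function reports that H0 is not rejected, in agreement with the conclusion above.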

Example: A sample of 200 individuals was tested for blood type, and the results are used to test the hypothesized distribution of blood types:

Blood Type                 A       B       O       AB
Hypothesized Proportion    0.41    0.09    0.46    0.04
Observed Frequency         74      25      86      15

At the 0.05 level of significance, is there any evidence to suggest the stated distribution is incorrect?

Solution:

1. The null and alternative hypotheses:

H0: Blood type proportions are 0.41, 0.09, 0.46, 0.04.

Ha: Blood type proportions are not 0.41, 0.09, 0.46, 0.04.

2. The type of test (distribution):

A multinomial experiment with specified probabilities. Use χ²* with df = k − 1 = 4 − 1 = 3.

3. Rejection Region:

Reject H0 if χ²* > χ²(3, 0.05) = 7.82.

4. Calculate the value of the test statistic:

O      E      O − E    (O − E)²/E
74     82     -8       0.78
25     18      7       2.72
86     92     -6       0.39
15     8       7       6.13
200    200     0       10.02

χ²* = 10.02

5. Conclusion: Reject H0.

The evidence suggests that the hypothesized proportions for blood types are incorrect.
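The calculation table for the blood type example can be reproduced cell by cell, which also shows where most of the disagreement comes from. This is a sketch with illustrative variable names; the critical value 7.82 is taken from step 3.

```python
blood_types = ["A", "B", "O", "AB"]
hypothesized = [0.41, 0.09, 0.46, 0.04]
observed = [74, 25, 86, 15]

n = sum(observed)
expected = [n * p for p in hypothesized]               # E_i = n * p_i
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

for label, o, e, c in zip(blood_types, observed, expected, contributions):
    print(f"{label:>2}: O = {o:3d}, E = {e:5.1f}, (O - E)^2 / E = {c:.2f}")
print(f"chi-square* = {statistic:.2f}; reject H0 at 0.05: {statistic > 7.82}")
```

The AB cell contributes 6.13 of the total 10.02, so the small expected count of 8 for that blood type accounts for most of the statistic.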

Contingency Tables

• Contingency table: an arrangement of data into a two-way classification.

• Data is sorted into cells, and the observed frequency in each cell is reported.

• Contingency table involves two factors, or variables

• Usual question: are the two variables independent or dependent?

r × c Contingency Table:

1. r: number of rows; c: number of columns.

2. Used to test the independence of the row factor and the column factor.

3. Degrees of freedom:

$$\text{df} = (r - 1)(c - 1)$$

4. n = grand total.

5. Expected frequency in the ith row and the jth column:

$$E_{i,j} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} = \frac{R_i \times C_j}{n}$$

Each Ei,j should be at least 5.

6. R1, R2, . . . , Rr and C1, C2, . . . , Cc: marginal totals.

Expected Frequencies for an r × c Contingency Table:

                                Columns
Rows       1         2         ...   jth Column   ...   c     Total
1          R1C1/n    R1C2/n    ...   R1Cj/n       ...         R1
2          R2C1/n                                             R2
...
ith Row    RiC1/n              ...   RiCj/n       ...         Ri
...
r
Total      C1        C2        ...   Cj           ...         n

Example: A random sample of registered voters was selected and each was asked his or her opinion on Proposal 129, a property tax reform bill. The distribution of responses is given in the table below.

                     Political Party
Tax Reform    Democrat    Republican    Independent
Yes           34          11            12
No            17          12            18
Unsure        10          16            15

Test the hypothesis “political party is independent of opinion on Proposal 129.” Use α = 0.01.

Solution:

1. The null and the alternative hypotheses:

H0: Opinion on property tax reform is independent of political party.

Ha: Opinion on property tax reform is not independent of political party.

2. The type of test (distribution):

A chi-square test of independence with df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4.

3. Rejection region:

Reject H0 if χ²* > χ²(4, 0.01) = 13.3.

4. Calculations using contingency table (expected frequencies in parentheses):

                     Political Party
Tax Reform    Democrat      Republican    Independent    Total
Yes           34 (23.98)    11 (15.33)    12 (17.69)     57
No            17 (19.77)    12 (12.64)    18 (14.59)     47
Unsure        10 (17.25)    16 (11.03)    15 (12.72)     41
Total         61            39            45             145

$$\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 14.16$$

5. Conclusion:

Reject H0, since χ²* = 14.16 exceeds the critical value 13.3.

At the 0.01 level of significance, there is evidence to suggest that opinion on tax reform and political party are not independent.
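The expected frequencies and the test statistic for the voter table can be reproduced with a short sketch. The function names are illustrative, and the comparison value 13.3 is the critical value quoted in step 3.

```python
def expected_counts(table):
    """Expected frequency for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def independence_statistic(table):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    expected = expected_counts(table)
    statistic = sum((o - e) ** 2 / e
                    for observed_row, expected_row in zip(table, expected)
                    for o, e in zip(observed_row, expected_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: Yes, No, Unsure. Columns: Democrat, Republican, Independent.
opinions = [[34, 11, 12],
            [17, 12, 18],
            [10, 16, 15]]
statistic, df = independence_statistic(opinions)
print(f"chi-square* = {statistic:.2f}, df = {df}, reject H0 at 0.01: {statistic > 13.3}")
```

Because the expected counts depend only on the marginal totals, the same two functions apply to any r × c table of observed frequencies.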

Test for Homogeneity:

1. Another type of contingency table problem.

2. Used when one of the two variables is controlled by the experimenter so that the row (or column) totals are predetermined.

3. Hypothesis test: the distribution of proportions within rows (or columns) is the same for all rows (or columns).

4. May be thought of as a comparison of several multinomial experiments.

5. Test procedure for independence and homogeneity with contingency tables is the same.

Example: A pharmaceutical company conducted an experiment to determine the effectiveness of three new cough suppressants. Each cough syrup was given to 100 random subjects.

                     Cough Suppressant
                A      B      C      Total
No relief       23     29     20     72
Some relief     60     56     50     166
Total relief    17     15     30     62
Total           100    100    100    300

Is there any evidence to suggest the syrups act differently to suppress coughs? Use α = 0.05.

Solution:

1. The null and alternative hypotheses:

H0: The proportion of individuals who receive various forms of relief is the same for all three cough syrups.

Ha: The proportion of individuals who receive various forms of relief is not the same for all three cough syrups. (In at least one group the proportions are different from the others.)

2. Type of test (distribution):

A chi-square test of homogeneity with df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4.

3. Rejection Region:

Reject H0 if χ²* > χ²(4, 0.05) = 9.49.

4. Calculations (done by Minitab):

              A        B        C    Total
    1        23       29       20       72
          24.00    24.00    24.00

    2        60       56       50      166
          55.33    55.33    55.33

    3        17       15       30       62
          20.67    20.67    20.67

Total       100      100      100      300

Chi-Sq = 0.042 + 1.042 + 0.667 +
         0.394 + 0.008 + 0.514 +
         0.651 + 1.554 + 4.215 = 9.085

DF = 4, P-Value = 0.059

5. Conclusion: Fail to reject H0.

At the 0.05 level of significance, there is not sufficient evidence to suggest the three remedies act differently to suppress coughs.
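The Minitab figures for the cough suppressant example can be checked with a few lines of standard-library Python. This is a sketch, not a reproduction of Minitab's internals: the P-value line uses a closed-form right-tail area that holds only for 4 degrees of freedom, and the variable names are illustrative.

```python
import math

# Rows: no relief, some relief, total relief. Columns: suppressants A, B, C.
relief = [[23, 29, 20],
          [60, 56, 50],
          [17, 15, 30]]

row_totals = [sum(row) for row in relief]
column_totals = [sum(column) for column in zip(*relief)]
n = sum(row_totals)

statistic = 0.0
for i, row in enumerate(relief):
    for j, observed in enumerate(row):
        expected = row_totals[i] * column_totals[j] / n   # E = R_i * C_j / n
        statistic += (observed - expected) ** 2 / expected

# Right-tail area for df = 4 only: P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2.0) * (1.0 + statistic / 2.0)
print(f"Chi-Sq = {statistic:.3f}, P-Value = {p_value:.3f}")
```

The P-value of 0.059 is only slightly above 0.05, and the statistic 9.085 is only slightly below the critical value 9.49, so the decision not to reject H0 is a close one.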