© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions

Upload: joy-park

Post on 29-Dec-2015

The Chi-Square Test

• Chi-Square is a test for a relationship between two nominal variables

• Calculations are made using a cross-tabulation (or “crosstab”) table, which reports frequencies of joint occurrences of attributes

Crosstab Tables

• Cross-tabulation or “crosstab” tables are designed to compare the frequencies of two nominal/ordinal variables at once

Sample Crosstab Table

• Spent night on streets in last 2 weeks by gender among homeless persons

On streets    Male    Female    Total
Yes             28        10       38
No              79        44      123
Total          107        54      161

Reading a Crosstab Table

• The number in a cell is the frequency of joint occurrences, where a joint occurrence is the combination of categories of the two variables for a single individual

• From the cell, look up, then look to the left

• E.g., in the table above, the joint occurrence of “male and on-street” is 28, the number in the sample who are both male and spent a night on the streets

Reading a Crosstab Table (cont.)

• The numbers in the margins on the right side and the bottom present marginal totals, the total number of subjects in a category

• The grand total (n, the sample size) is presented in the bottom right-hand corner
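
As a minimal sketch (Python is not part of the original slides, and the variable names are illustrative assumptions), the marginal totals and the grand total of the sample table can be recomputed from its four cell frequencies:

```python
# Observed joint frequencies from the sample crosstab table.
# Outer keys are the row categories (spent a night on the streets),
# inner keys are the column categories (gender).
observed = {
    "Yes": {"Male": 28, "Female": 10},
    "No": {"Male": 79, "Female": 44},
}
columns = ["Male", "Female"]

# Row marginal totals: add across each row.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Column marginal totals: add down each column.
col_totals = {col: sum(observed[row][col] for row in observed) for col in columns}

# Grand total (n, the sample size).
n = sum(row_totals.values())

print(row_totals)   # {'Yes': 38, 'No': 123}
print(col_totals)   # {'Male': 107, 'Female': 54}
print(n)            # 161
```

The printed totals match the margins and the bottom right-hand corner of the sample table.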

Crosstab Tables and the Chi-Square Test

• For the chi-square test, the categories of the independent variable (X) go in the columns of the table, and those of the dependent variable (Y), in the rows

• E.g.: Is gender a good predictor of who among homeless persons is likely to spend a night on the streets?

Calculating Expected Frequencies

• In addition to the observed joint frequencies, the chi-square test involves calculating the expected frequency of each table cell

• The expected frequency of a cell is equal to the column marginal total for the cell (look down) times the row marginal total for the cell (look to the right), divided by the grand total
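
The rule above can be sketched in a few lines of Python, applied to the homeless-persons table from the earlier slide. The list layout (rows Yes/No, columns Male/Female) and the variable names are assumptions made for illustration:

```python
# Observed frequencies: rows are Yes/No (on streets), columns are Male/Female.
observed = [
    [28, 10],
    [79, 44],
]

row_totals = [sum(row) for row in observed]        # [38, 123]
col_totals = [sum(col) for col in zip(*observed)]  # [107, 54]
n = sum(row_totals)                                # 161

# Expected frequency of a cell =
#   (column marginal total * row marginal total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Yes", "No"], expected):
    print(label, [round(e, 2) for e in row])
```

For the "Yes, Male" cell this gives 107 × 38 / 161, or about 25.25, compared with an observed frequency of 28. Each row of expected frequencies still adds up to its row marginal total.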

Using Expected Frequencies to Test the Hypothesis

• The expected frequencies are those that would occur if there is no relationship between the two nominal/ordinal variables

• The chi-square statistic measures the gap between expected and observed frequencies

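• If the observed frequencies exactly match the expected frequencies, chi-square computes to zero; when there is no relationship in the population, the two differ only by sampling error and chi-square stays close to zero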

The Chi-Square Statistic

• The sampling distribution is generated using the chi-square equation:

$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$

where O is the observed frequency of a cell, and E is the expected frequency

• Chi-square tells us whether the summed squared differences between the observed and expected cell frequencies are so great that they are not simply the result of sampling error
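
A minimal sketch of the chi-square equation in Python, assuming the crosstab is passed as a list of rows. The function name `chi_square` is hypothetical and not taken from the text:

```python
def chi_square(observed):
    """Sum (O - E)^2 / E over every cell of a crosstab table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency if the two variables are unrelated.
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total


# Sample table: rows Yes/No (on streets), columns Male/Female.
streets = [[28, 10], [79, 44]]
print(round(chi_square(streets), 3))

# A table whose cells already equal their expected frequencies gives zero.
print(chi_square([[10, 20], [20, 40]]))
```

For the sample table the statistic comes to about 1.16. Whether that gap is larger than sampling error alone would produce is settled by the hypothesis test, not by the statistic on its own.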

When to Use the Chi-Square Statistic

1) There is one population with a representative sample from it

2) There are two variables, both of a nominal/ordinal level of measurement

3) The expected frequency of each cell in the crosstab table is at least five

Features of the Chi-Square Hypothesis Test

• Step 1. The H0 states that there is no relationship between the two variables. When this is the case, chi-square calculates to a value of zero, give or take some sampling error

• This null hypothesis asserts no difference in observed and expected frequencies

Features of the Chi-Square Hypothesis Test (cont.)

• Step 2. The sampling distribution is the chi-square distribution. It describes all possible outcomes of the chi-square statistic with repeated sampling when there is no relationship between X and Y

• Degrees of freedom are determined by the number of columns and rows in the crosstab table: $df = (r - 1)(c - 1)$

Features of the Chi-Square Hypothesis Test (cont.)

• Step 4. The test effects are the differences between expected and observed frequencies

• The test statistic is the chi-square statistic

• The p-value is obtained by comparing the calculated chi-square value to the critical values of the chi-square distribution in Statistical Table G of Appendix B
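
The slides obtain the p-value from a printed table of critical values. As an alternative sketch, and only for the special case df = 1 (any 2 × 2 table), the p-value can be computed directly, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The function names below are hypothetical, and the shortcut does not apply to larger tables:

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for a crosstab with r rows and c columns."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value when df = 1 ONLY.

    With one degree of freedom, P(chi-square >= x) equals
    P(|Z| >= sqrt(x)) for a standard normal Z, which is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


df = degrees_of_freedom(2, 2)   # a 2 x 2 table has df = 1
print(df)

# 3.841 is the familiar .05 critical value for df = 1.
print(round(p_value_df1(3.841), 3))
```

A calculated chi-square below 3.841 in a 2 × 2 table therefore has a p-value above .05, and the H0 of no relationship is not rejected at that level.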

The Existence of a Relationship for the Chi-Square Test

• Existence: Test the H0 that $\chi^2 = 0$; that is, there is no relationship between X and Y

• If the H0 is rejected, a relationship exists

Direction and Strength of a Relationship for Chi-Square

• Direction: Not applicable (because the variables are nominal level)

• Strength: Measures of strength exist for chi-square but are seldom reported because they are prone to misinterpretation

Nature of a Relationship for the Chi-Square Test

• Nature: Report the differences between the observed and expected cell frequencies for a couple of outstanding cells

• Calculate column percentages for selected cells

Column and Row Percentages

• A column percentage is a cell’s frequency as a percentage of the column marginal total

• A row percentage is a cell’s frequency as a percentage of the row marginal total
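
Both definitions can be sketched in Python using the homeless-persons table (rows Yes/No for a night on the streets, columns Male/Female). The variable names are illustrative assumptions:

```python
observed = [
    [28, 10],   # Yes: Male, Female
    [79, 44],   # No:  Male, Female
]

row_totals = [sum(row) for row in observed]        # [38, 123]
col_totals = [sum(col) for col in zip(*observed)]  # [107, 54]

# Column percentage: cell frequency as a percentage of its column total.
col_pct = [
    [100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
    for row in observed
]

# Row percentage: cell frequency as a percentage of its row total.
row_pct = [
    [100.0 * cell / row_totals[i] for cell in row]
    for i, row in enumerate(observed)
]

print([round(p, 1) for p in col_pct[0]])   # "Yes" row, by gender
print([round(p, 1) for p in row_pct[0]])   # "Yes" row, share of each gender
```

With gender in the columns, the column percentages answer the substantive question: about 26.2% of the men (28 of 107) and about 18.5% of the women (10 of 54) spent a night on the streets.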

Chi-Square as a Difference of Proportions Test

• The chi-square test is frequently used to compare proportions of categories of a nominal/ordinal variable for two or more groups of a second nominal/ordinal variable

• Thus, it may be viewed as a difference of proportions test as illustrated in Figure 13-2 in the text

The Binomial Distribution

• The binomial distribution test is a small single-sample proportions test. Contrast it to the large single-sample proportions test of Chapter 10

• The test hinges on mathematically expanding the binomial distribution equation, $(P + Q)^n$

When to Use the Binomial Distribution

1) There is only one nominal variable and it is dichotomous, with P = p [of success] and Q = p [of failure]

2) There is a single, representative sample from one population

3) Sample size is such that $(p_{smaller})(n) < 5$, where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$

4) There is a target value of the variable to which we may compare the sample proportion
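
Criterion 3 is the only one of these conditions that reduces to arithmetic. A small sketch of that check follows; the function name is hypothetical, and the other three criteria are judgments about the study design that code cannot verify:

```python
def small_sample_rule_met(p_u, n):
    """Check criterion 3: (p_smaller)(n) < 5.

    p_u is the hypothesized proportion of successes, and Q_u = 1 - P_u.
    """
    q_u = 1.0 - p_u
    p_smaller = min(p_u, q_u)
    return p_smaller * n < 5


print(small_sample_rule_met(0.5, 4))     # 0.5 * 4 = 2, so True
print(small_sample_rule_met(0.5, 100))   # 0.5 * 100 = 50, so False
```

When the check fails, the sample is large enough for the large single-sample proportions test instead.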

Expansion of the Binomial Distribution Equation

• Expansion of the binomial distribution equation, $(P + Q)^n$, provides the sampling distribution for dichotomous events. That is, the equation describes all possible sampling outcomes and the probability of each, where there are only two possible categories of a nominal variable

An Example of an Expanded Binomial Equation

• The equation reveals, for example, the possible outcomes of the tossing of 4 coins

• P = p [heads] = .5; Q = p [tails] = .5; n = 4 coins

• $(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$

• Add the coefficients to get the total number of possible outcomes = 16

• The probability of 3 heads and 1 tails is the coefficient of $P^3Q^1$ over the sum of coefficients = 4 over 16 = .25
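
The four-coin expansion can be reproduced with a short Python sketch. The function name `binomial_terms` is hypothetical; `math.comb` is the standard-library binomial coefficient (Python 3.8 or later):

```python
from math import comb


def binomial_terms(n, p):
    """List (successes, coefficient, probability) for each term of (P + Q)^n."""
    q = 1.0 - p
    terms = []
    for successes in range(n, -1, -1):
        failures = n - successes
        coefficient = comb(n, successes)
        probability = coefficient * p**successes * q**failures
        terms.append((successes, coefficient, probability))
    return terms


terms = binomial_terms(4, 0.5)           # 4 coins, P = Q = .5
coefficients = [c for _, c, _ in terms]
total_outcomes = sum(coefficients)
p_three_heads = {s: prob for s, _, prob in terms}[3]

print(coefficients)     # [1, 4, 6, 4, 1]
print(total_outcomes)   # 16
print(p_three_heads)    # 0.25
```

The "coefficient over the sum of coefficients" shortcut works here because P = Q = .5. The `probability` field uses the full term, coefficient times P and Q raised to their powers, so the same function also handles unequal P and Q.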

Pascal’s Triangle

• Pascal’s Triangle provides a shortcut method for expanding the binomial equation

• It provides the coefficients for small samples and allows a quick computation of the probabilities of all possible outcomes when P and Q are equal to .5

• See Table 13-7 in the text
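
A sketch of how one row of Pascal's Triangle can be built, each entry being the sum of the two entries above it. The function name `pascal_row` is an assumption for illustration and does not reproduce the table in the text:

```python
def pascal_row(n):
    """Return row n of Pascal's Triangle, the coefficients of (P + Q)^n."""
    row = [1]
    for _ in range(n):
        # Each new entry is the sum of the two entries above it;
        # padding with zeros handles the edges of the triangle.
        row = [left + right for left, right in zip([0] + row, row + [0])]
    return row


for n in range(5):
    print(n, pascal_row(n))

# When P = Q = .5, each probability is the coefficient over the row sum.
row = pascal_row(4)
probabilities = [c / sum(row) for c in row]
print(probabilities)    # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

Row 4 gives the coefficients 1, 4, 6, 4, 1 used for the toss of 4 coins.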

Features of the Binomial Distribution Test

• Step 1. H0: $P_u$ = a target value

• Step 2. The sampling distribution is an expanded binomial equation for the given sample size

Features of the Binomial Distribution Test (cont.)

• Step 4. The effect is the observed combination of successes and failures, which corresponds to a term in the equation (e.g., 3 heads and 1 tails is represented by the term $4P^3Q^1$)

• The test statistic is the expanded binomial equation

• The p-value is taken directly from the equation (not from a statistical table)
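
A hedged sketch of taking a p-value from the expanded equation. It assumes a one-tailed test in the upper direction, where the p-value adds the probability of the observed term to the probabilities of all more extreme terms. That summing rule and the function name are assumptions for illustration; the slides do not spell out the procedure:

```python
from math import comb


def upper_tail_p_value(successes, n, p_target):
    """P(observed number of successes or more) when H0: P_u = p_target."""
    q = 1.0 - p_target
    return sum(
        comb(n, k) * p_target**k * q ** (n - k)
        for k in range(successes, n + 1)
    )


# 3 heads in 4 tosses with H0: P_u = .5
# Terms used: 4P^3Q^1 and P^4, i.e. 4/16 + 1/16.
print(upper_tail_p_value(3, 4, 0.5))   # 0.3125
```

With only 4 tosses, even the most extreme outcome (4 heads) has a probability of 1/16 = .0625 under this H0, so a sample this small cannot reach the .05 level in a one-tailed test.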

Statistical Follies: Statistical Power and Sample Size

• For a given level of significance, statistical power is a test statistic’s probability of not incurring a Type II error (i.e., unknowingly making the incorrect decision of failing to reject a false null hypothesis)

• Low statistical power can result from having too small a sample size