Cross Tabulation Statistical Analysis of Categorical Variables

Upload: griffith-wyatt

Post on 30-Dec-2015


Page 1: Cross Tabulation

Cross Tabulation

Statistical Analysis of Categorical Variables

Page 2: Cross Tabulation

To date….

• We have examined statistical tests for differences of means, proportions, regression coefficients and correlation coefficients.

• These statistics are all measured at the interval level.

Page 3: Cross Tabulation

New Test…

• Now we wish to examine statistical tests for questions involving nominal and ordinal variables. To do so we introduce the Chi Square Test.

Page 4: Cross Tabulation

Cross Tabulation

• We are interested in counting the number of cases in each category of one variable within each category of a second variable, and….

• Implicitly, we are asking if there are differences in the patterns of the counts….

Page 5: Cross Tabulation

Cross Tabulation and Chi Square Test

A cross tabulation cross classifies one variable by another variable. Below is a cross classification of occupational groups and wards for the Simon data for 1905.

Frequencies
OCC$ (rows) by WARD (columns)
                 14     18     20     22    Total
           +-----------------------------+
profcler   |      9     55     30     57 |    151
prop       |     27     33     54     45 |    159
skilled    |     90     16    149    114 |    369
skillpart  |     13     12     40     26 |     91
unskilled  |    175     12     71     46 |    304
           +-----------------------------+
Total           314    128    344    288     1074

Page 6: Cross Tabulation

Cross Tabulation and Chi Square Test

We count the number of cases in each occupational category for each ward. At the edges of the table we total the rows and columns.

Frequencies
OCC$ (rows) by WARD (columns)
                 14     18     20     22    Total
           +-----------------------------+
profcler   |      9     55     30     57 |    151
prop       |     27     33     54     45 |    159
skilled    |     90     16    149    114 |    369
skillpart  |     13     12     40     26 |     91
unskilled  |    175     12     71     46 |    304
           +-----------------------------+
Total           314    128    344    288     1074
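The totals at the edges of the table can be reproduced from the cell counts alone. The following is a minimal Python sketch, not part of the original slides; the variable names (`counts`, `row_totals`, `col_totals`) are illustrative assumptions.

```python
# Observed counts from the Simon data: occupation (rows) by ward (columns).
wards = [14, 18, 20, 22]
counts = {
    "profcler":  [9, 55, 30, 57],
    "prop":      [27, 33, 54, 45],
    "skilled":   [90, 16, 149, 114],
    "skillpart": [13, 12, 40, 26],
    "unskilled": [175, 12, 71, 46],
}

# Row marginals: total number of cases in each occupational group.
row_totals = {occ: sum(row) for occ, row in counts.items()}

# Column marginals: total number of cases in each ward.
col_totals = [sum(col) for col in zip(*counts.values())]

# Grand total: every case is counted exactly once.
grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(wards, col_totals)))
print(grand_total)
```

The row totals and the column totals each add up to the same grand total, which is a quick check that the table was entered correctly.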

Page 7: Cross Tabulation

Graphic Illustration of the Counts of Occupational Groups by Ward

Page 8: Cross Tabulation

The Simplest Example….a 2 by 2 Table

• Do the opinions of men and women differ on the War in Iraq?

• Do the opinions of men and women differ on the importance of capturing Osama Bin Laden?

• Data: September 2006 ABC News Poll on the War on Terror. A Sample of about 1000 respondents.

Page 9: Cross Tabulation

Basic Frequencies

Q12 War worth fighting NET
                                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Worth fighting NET              430      42.9            43.8                 43.8
          Not worth fighting NET          552      55.0            56.2                100.0
          Total                           982      97.9           100.0
Missing   System Missing                    1        .1
          DK/No opinion                    20       2.0
          Total                            21       2.1
Total                                    1003     100.0

Page 10: Cross Tabulation

Basic Frequencies Broken Down by Gender

Page 11: Cross Tabulation

Another Graphical Illustration: 4 Bins of Counts

[Bar chart: bars show counts]

Page 12: Cross Tabulation

Tabular Data

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
Count
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET                213      217      430
Not worth fighting NET            245      307      552
Total                             458      524      982

Page 13: Cross Tabulation

Some Terms and Assumptions

• Cell frequency: the count in a cell in the body of the table

• Marginal total: the total of a row or a column

• Row percent: the cell count as a percentage of its row total

• Column percent: the cell count as a percentage of its column total

• Expected frequency: the number of cases expected in a cell based upon the marginal proportions

• Deviation (residual): the actual frequency minus the expected frequency
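Row and column percents follow directly from the cell frequencies and the marginal totals. A minimal Python sketch, using the counts from the war-opinion by gender table; the variable names are assumptions made for illustration.

```python
# 2 x 2 counts from the Q12 (war worth fighting) by gender table.
observed = [
    [213, 217],  # Worth fighting: male, female
    [245, 307],  # Not worth fighting: male, female
]

# Marginal totals.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Row percent: each cell as a percentage of its row total.
row_pct = [[100 * cell / row_totals[i] for cell in row]
           for i, row in enumerate(observed)]

# Column percent: each cell as a percentage of its column total.
col_pct = [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
           for row in observed]

for row in row_pct:
    print([round(p, 1) for p in row])
for row in col_pct:
    print([round(p, 1) for p in row])
```

Row percents sum to 100 across each row; column percents sum to 100 down each column.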

Page 14: Cross Tabulation

Tabular Data

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
Count
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET                213      217      430
Not worth fighting NET            245      307      552
Total                             458      524      982

Cell Frequency: 213, 217, 245, 307

Marginals: 430, 552 (row totals); 458, 524 (column totals)

Page 15: Cross Tabulation

Table Counts and Graph

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
Count
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET                213      217      430
Not worth fighting NET            245      307      552
Total                             458      524      982

[Bar graph of the four cell counts: 213, 245, 217, 307]

Page 16: Cross Tabulation

Row Percents

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
% within Q12 War worth fighting NET
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET              49.5%    50.5%   100.0%
Not worth fighting NET          44.4%    55.6%   100.0%
Total                           46.6%    53.4%   100.0%

Page 17: Cross Tabulation

Column Percents

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
% within Q921 GENDER
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET              46.5%    41.4%    43.8%
Not worth fighting NET          53.5%    58.6%    56.2%
Total                          100.0%   100.0%   100.0%

Page 18: Cross Tabulation

Frequencies, Row and Column Percents

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
Count
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET                213      217      430
Not worth fighting NET            245      307      552
Total                             458      524      982

% within Q12 War worth fighting NET
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET              49.5%    50.5%   100.0%
Not worth fighting NET          44.4%    55.6%   100.0%
Total                           46.6%    53.4%   100.0%

% within Q921 GENDER
                                  Q921 GENDER
Q12 War worth fighting NET       Male   Female    Total
Worth fighting NET              46.5%    41.4%    43.8%
Not worth fighting NET          53.5%    58.6%    56.2%
Total                          100.0%   100.0%   100.0%

Page 19: Cross Tabulation

New Concept: Expected Frequencies

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
                                                  Q921 GENDER
Q12 War worth fighting NET                       Male   Female    Total
Worth fighting NET         Count                  213      217      430
                           Expected Count       200.5    229.5    430.0
Not worth fighting NET     Count                  245      307      552
                           Expected Count       257.5    294.5    552.0
Total                      Count                  458      524      982
                           Expected Count       458.0    524.0    982.0

• What would the counts in the cells be if there was no impact of gender on attitudes towards the Iraq War?

• The marginal proportions would define the cell counts.

Page 20: Cross Tabulation

Expected Frequencies

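• (Row Total * Column Total) / Grand Total

• Or…

• Row Proportion * Column Total, where Row Proportion = Row Total / Grand Total

• Or…

• Column Proportion * Row Total, where Column Proportion = Column Total / Grand Total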
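The three formulas above are algebraically the same calculation. A minimal Python sketch that applies the first form to the war-opinion by gender counts; the variable names are illustrative assumptions, and the printed values are rounded to one decimal place.

```python
# Observed counts: Q12 (war worth fighting) by gender.
observed = [
    [213, 217],  # Worth fighting: male, female
    [245, 307],  # Not worth fighting: male, female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# The same cell via the "row proportion * column total" form.
row_proportion = row_totals[0] / grand_total
alt_expected_00 = row_proportion * col_totals[0]

for row in expected:
    print([round(e, 1) for e in row])
```

The expected counts keep the same row and column totals as the observed table; only the distribution of cases inside the table changes.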

Page 21: Cross Tabulation

Another Example: The Importance of Capturing Osama Bin Laden

Page 22: Cross Tabulation
Page 23: Cross Tabulation

Frequencies by Gender

Q.28 Do you think (the United States has to capture or kill Osama bin Laden for the war on terrorism to be a success), or do you think (the war on terrorism can be a success without Osama bin Laden being killed or captured)? * Q921 GENDER Crosstabulation
Count
                                                                  Q921 GENDER
Q.28                                                             Male   Female    Total
U.S. must capture/kill Osama bin Laden                            173      246      419
War can be a success without capturing/killing bin Laden          277      255      532
Total                                                             450      501      951

Page 24: Cross Tabulation

Frequencies By Gender

Page 25: Cross Tabulation

Row Percents

Q.28 Do you think (the United States has to capture or kill Osama bin Laden for the war on terrorism to be a success), or do you think (the war on terrorism can be a success without Osama bin Laden being killed or captured)? * Q921 GENDER Crosstabulation
% within Q.28
                                                                  Q921 GENDER
Q.28                                                             Male   Female    Total
U.S. must capture/kill Osama bin Laden                          41.3%    58.7%   100.0%
War can be a success without capturing/killing bin Laden        52.1%    47.9%   100.0%
Total                                                           47.3%    52.7%   100.0%

Page 26: Cross Tabulation

Column Percents

Q.28 Do you think (the United States has to capture or kill Osama bin Laden for the war on terrorism to be a success), or do you think (the war on terrorism can be a success without Osama bin Laden being killed or captured)? * Q921 GENDER Crosstabulation
% within Q921 GENDER
                                                                  Q921 GENDER
Q.28                                                             Male   Female    Total
U.S. must capture/kill Osama bin Laden                          38.4%    49.1%    44.1%
War can be a success without capturing/killing bin Laden        61.6%    50.9%    55.9%
Total                                                          100.0%   100.0%   100.0%

Page 27: Cross Tabulation

Q.28 Do you think (the United States has to capture or kill Osama bin Laden for the war on terrorism to be a success), or do you think (the war on terrorism can be a success without Osama bin Laden being killed or captured)? * Q921 GENDER Crosstabulation

% within Q.28
                                                                  Q921 GENDER
Q.28                                                             Male   Female    Total
U.S. must capture/kill Osama bin Laden                          41.3%    58.7%   100.0%
War can be a success without capturing/killing bin Laden        52.1%    47.9%   100.0%
Total                                                           47.3%    52.7%   100.0%

% within Q921 GENDER
                                                                  Q921 GENDER
Q.28                                                             Male   Female    Total
U.S. must capture/kill Osama bin Laden                          38.4%    49.1%    44.1%
War can be a success without capturing/killing bin Laden        61.6%    50.9%    55.9%
Total                                                          100.0%   100.0%   100.0%

Page 28: Cross Tabulation

Expected Frequencies

Q.28 Do you think (the United States has to capture or kill Osama bin Laden for the war on terrorism to be a success), or do you think (the war on terrorism can be a success without Osama bin Laden being killed or captured)? * Q921 GENDER Crosstabulation
                            Q921 GENDER
                           Male   Female    Total
U.S. must capture/kill Osama bin Laden
    Count                   173      246      419
    Expected Count        198.3    220.7    419.0
War can be a success without capturing/killing bin Laden
    Count                   277      255      532
    Expected Count        251.7    280.3    532.0
Total
    Count                   450      501      951
    Expected Count        450.0    501.0    951.0

Page 29: Cross Tabulation

Actual Frequencies, Expected Frequencies, and Deviations (Residual)

Q.28 Do you think (the United States has to capture or kill Osama bin Laden for the war on terrorism to be a success), or do you think (the war on terrorism can be a success without Osama bin Laden being killed or captured)? * Q921 GENDER Crosstabulation
                            Q921 GENDER
                           Male   Female    Total
U.S. must capture/kill Osama bin Laden
    Count                   173      246      419
    Expected Count        198.3    220.7    419.0
    Residual              -25.3     25.3
War can be a success without capturing/killing bin Laden
    Count                   277      255      532
    Expected Count        251.7    280.3    532.0
    Residual               25.3    -25.3
Total
    Count                   450      501      951
    Expected Count        450.0    501.0    951.0

Page 31: Cross Tabulation

Examples of Chi Square Distribution

Page 32: Cross Tabulation

Degrees of Freedom for Chi Square

• Degrees of Freedom = (r-1)*(c-1), where r is the number of rows and c is the number of columns

• So a 2 by 2 table has 1 degree of freedom

• A 3 by 2 table has (3-1)(2-1) = 2 degrees of freedom
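The rule above is a one-line function. A minimal Python sketch; the function name `degrees_of_freedom` is an illustrative assumption.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-square test on an r-by-c table."""
    return (n_rows - 1) * (n_cols - 1)

print(degrees_of_freedom(2, 2))  # gender by opinion tables
print(degrees_of_freedom(3, 2))
print(degrees_of_freedom(5, 4))  # five occupational groups by four wards
```

Only the body of the table counts: the total row and total column are not included in r and c.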

Page 33: Cross Tabulation

Calculations: Catching Osama bin Laden by Gender

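• Each term is (Observed - Expected)^2 / Expected; every residual is 25.3 in size, and 25.3^2 = 640.09

• 640.09/198.3 = 3.23

• 640.09/220.7 = 2.90

• 640.09/251.7 = 2.54

• 640.09/280.3 = 2.28

• Chi Square (SUM) = 10.95

• (statistically significant at the .05 level; with 1 degree of freedom the critical value is 3.84)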
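The hand calculation above sums (observed - expected)^2 / expected over the four cells, using expected counts and residuals rounded to one decimal place. A minimal Python sketch of the same sum computed from the raw counts without intermediate rounding; the function name `chi_square` is an illustrative assumption. Because nothing is rounded along the way, the result comes out slightly lower than the hand-computed sum, at about 10.92.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under no association between rows and columns.
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat

# Q.28 (capture/kill bin Laden) by gender: male, female.
osama = [
    [173, 246],  # U.S. must capture/kill
    [277, 255],  # War can be a success without
]

osama_stat = chi_square(osama)
print(round(osama_stat, 2))
```

The function makes no assumption about table size, so the same code works for tables with more rows or columns.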

Page 34: Cross Tabulation

Attitudes toward Iraq War by Gender

Q12 War worth fighting NET * Q921 GENDER Crosstabulation
                                                  Q921 GENDER
Q12 War worth fighting NET                       Male   Female    Total
Worth fighting NET         Count                  213      217      430
                           Expected Count       200.5    229.5    430.0
                           Residual              12.5    -12.5
Not worth fighting NET     Count                  245      307      552
                           Expected Count       257.5    294.5    552.0
                           Residual             -12.5     12.5
Total                      Count                  458      524      982
                           Expected Count       458.0    524.0    982.0

Page 35: Cross Tabulation

Calculations: Attitudes toward Iraq War by Gender

• 156.25/200.5 = .78

• 156.25/229.5 = .68

• 156.25/257.5 = .61

• 156.25/294.5 = .53

• Chi Square (SUM) = 2.60

• (not statistically significant at the .05 level)
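The significance call can be checked without a printed table. For 1 degree of freedom, the chi-square statistic is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function. A minimal Python sketch under that assumption; the function name `p_value_1df` is illustrative, and the shortcut applies only to 2 by 2 tables.

```python
import math

def p_value_1df(chi_square_stat: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square_stat / 2))

# Statistic from the Iraq War by gender calculation.
iraq_p = p_value_1df(2.60)
print(iraq_p > 0.05)  # True means: not significant at the .05 level

# Commonly tabulated critical value for 1 degree of freedom at the .05 level.
print(round(p_value_1df(3.841), 3))
```

Any 2 by 2 statistic below roughly 3.84 gives a p-value above .05, which is why 2.60 does not reach significance.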

Page 36: Cross Tabulation

Chi Square Test

• For a larger table, calculation is the same, but the number of terms increases. The number of terms is equal to the number of cells.

Frequencies
OCC$ (rows) by WARD (columns)
                 14     18     20     22    Total
           +-----------------------------+
profcler   |      9     55     30     57 |    151
prop       |     27     33     54     45 |    159
skilled    |     90     16    149    114 |    369
skillpart  |     13     12     40     26 |     91
unskilled  |    175     12     71     46 |    304
           +-----------------------------+
Total           314    128    344    288     1074

Page 37: Cross Tabulation

Concentration of Occupational Groups by Ward

Page 38: Cross Tabulation

Cross Tabulation

• Are the occupational patterns different in the four wards?

• Or….are the patterns a result of chance? (null hypothesis)

• How would we decide?

Page 39: Cross Tabulation

Illustration: Frequencies and Marginals

Frequencies
OCC$ (rows) by WARD (columns)
                 14     18     20     22    Total
           +-----------------------------+
profcler   |      9     55     30     57 |    151
prop       |     27     33     54     45 |    159
skilled    |     90     16    149    114 |    369   (Marginals)
skillpart  |     13     12     40     26 |     91
unskilled  |    175     12     71     46 |    304
           +-----------------------------+
Total           314    128    344    288     1074
                    (Marginals)

Page 40: Cross Tabulation

Row and Column Percents

Row percents
OCC$ (rows) by WARD (columns)
                   14       18       20       22      Total      N
           +-------------------------------------+
profcler   |    5.960   36.424   19.868   37.748 |  100.000    151
prop       |   16.981   20.755   33.962   28.302 |  100.000    159
skilled    |   24.390    4.336   40.379   30.894 |  100.000    369
skillpart  |   14.286   13.187   43.956   28.571 |  100.000     91
unskilled  |   57.566    3.947   23.355   15.132 |  100.000    304
           +-------------------------------------+
Total          29.236   11.918   32.030   26.816    100.000
N                 314      128      344      288               1074

Column percents
OCC$ (rows) by WARD (columns)
                   14       18       20       22      Total      N
           +-------------------------------------+
profcler   |    2.866   42.969    8.721   19.792 |   14.060    151
prop       |    8.599   25.781   15.698   15.625 |   14.804    159
skilled    |   28.662   12.500   43.314   39.583 |   34.358    369
skillpart  |    4.140    9.375   11.628    9.028 |    8.473     91
unskilled  |   55.732    9.375   20.640   15.972 |   28.305    304
           +-------------------------------------+
Total         100.000  100.000  100.000  100.000    100.000
N                 314      128      344      288               1074

Page 41: Cross Tabulation

Expected and Actual Frequencies

Frequencies
OCC$ (rows) by WARD (columns)
                 14     18     20     22    Total
           +-----------------------------+
profcler   |      9     55     30     57 |    151
prop       |     27     33     54     45 |    159
skilled    |     90     16    149    114 |    369
skillpart  |     13     12     40     26 |     91
unskilled  |    175     12     71     46 |    304
           +-----------------------------+
Total           314    128    344    288     1074

Expected values
OCC$ (rows) by WARD (columns)
                 14     18     20     22
           +-----------------------------+
profcler   |  44.15  18.00  48.36  40.49 |
prop       |  46.49  18.95  50.93  42.64 |
skilled    | 107.88  43.98 118.19  98.95 |
skillpart  |  26.61  10.85  29.15  24.40 |
unskilled  |  88.88  36.23  97.37  81.52 |
           +-----------------------------+

Page 42: Cross Tabulation

Deviates: (Observed-Expected)

OCC$ (rows) by WARD (columns)

14 18 20 22

+-----------------------------------------+

profcler | -35.147 37.004 -18.365 16.508 |

prop | -19.486 14.050 3.073 2.363 |

skilled | -17.883 -27.978 30.810 15.050 |

skillpart | -13.605 1.155 10.853 1.598 |

unskilled | 86.121 -24.231 -26.371 -35.520 |

+-----------------------------------------+

Deviates

Page 43: Cross Tabulation

Case number OCC$ WARD FREQUENCY EXPECTED RESIDUAL CHITERM

1 profcler 14.000 9.000 44.147 -35.147 27.982

2 profcler 18.000 55.000 17.996 37.004 76.087

3 profcler 20.000 30.000 48.365 -18.365 6.973

4 profcler 22.000 57.000 40.492 16.508 6.730

5 prop 14.000 27.000 46.486 -19.486 8.168

6 prop 18.000 33.000 18.950 14.050 10.418

7 prop 20.000 54.000 50.927 3.073 0.185

8 prop 22.000 45.000 42.637 2.363 0.131

9 skilled 14.000 90.000 107.883 -17.883 2.964

10 skilled 18.000 16.000 43.978 -27.978 17.799

11 skilled 20.000 149.000 118.190 30.810 8.032

12 skilled 22.000 114.000 98.950 15.050 2.289

13 skillpart 14.000 13.000 26.605 -13.605 6.957

14 skillpart 18.000 12.000 10.845 1.155 0.123

15 skillpart 20.000 40.000 29.147 10.853 4.041

16 skillpart 22.000 26.000 24.402 1.598 0.105

17 unskilled 14.000 175.000 88.879 86.121 83.449

18 unskilled 18.000 12.000 36.231 -24.231 16.205

19 unskilled 20.000 71.000 97.371 -26.371 7.142

20 unskilled 22.000 46.000 81.520 -35.520 15.477

Calculations
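The CHITERM column holds one (observed - expected)^2 / expected term per cell, and the chi-square statistic is their sum. A minimal Python sketch that recomputes the 20 terms from the frequency table and adds them; the names are illustrative assumptions, and the .05 critical value for 12 degrees of freedom (21.026) comes from a standard chi-square table, not from the slides.

```python
# Occupation (rows) by ward (columns) counts from the Simon data.
observed = [
    [9, 55, 30, 57],     # profcler
    [27, 33, 54, 45],    # prop
    [90, 16, 149, 114],  # skilled
    [13, 12, 40, 26],    # skillpart
    [175, 12, 71, 46],   # unskilled
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# One chi-square term per cell, in the same order as the case listing.
chi_terms = []
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_terms.append((obs - exp) ** 2 / exp)

chi_square_stat = sum(chi_terms)
df = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_05_DF12 = 21.026  # assumed from a standard chi-square table
print(round(chi_square_stat, 2), df, chi_square_stat > CRITICAL_05_DF12)
```

The sum of the terms far exceeds the critical value, so the null hypothesis that the occupational pattern is the same across the four wards would be rejected at the .05 level.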

Page 44: Cross Tabulation

Review: Terms and Assumptions

• Cell frequency: the count in a cell in the body of the table

• Marginal total: the total of a row or a column

• Row percent: the cell count as a percentage of its row total

• Column percent: the cell count as a percentage of its column total

• Expected frequency: the number of cases expected in a cell based upon the marginal proportions

• Deviation (residual): the actual frequency minus the expected frequency

Page 45: Cross Tabulation

Strength of Relationships

• Phi: Square root of (Chi Square/n)

• Cramer’s V: Square root of (Chi Square/(n*min(r-1, c-1)))

• Contingency Coefficient: Square root of (Chi Square/(Chi Square+n))
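The three measures rescale the chi-square statistic by the sample size so that tables of different sizes can be compared. A minimal Python sketch, assuming n is the grand total of the table and r and c are its numbers of rows and columns; the function names are illustrative. The example uses the Iraq War by gender figures (chi square = 2.60, n = 982).

```python
import math

def phi(chi_square_stat: float, n: int) -> float:
    """Phi: square root of chi square divided by n."""
    return math.sqrt(chi_square_stat / n)

def cramers_v(chi_square_stat: float, n: int, r: int, c: int) -> float:
    """Cramer's V: square root of chi square over n times min(r-1, c-1)."""
    return math.sqrt(chi_square_stat / (n * min(r - 1, c - 1)))

def contingency_coefficient(chi_square_stat: float, n: int) -> float:
    """Contingency coefficient: square root of chi square over (chi square + n)."""
    return math.sqrt(chi_square_stat / (chi_square_stat + n))

iraq_phi = phi(2.60, 982)
iraq_v = cramers_v(2.60, 982, 2, 2)
iraq_c = contingency_coefficient(2.60, 982)
print(round(iraq_phi, 3), round(iraq_v, 3), round(iraq_c, 3))
```

For a 2 by 2 table min(r-1, c-1) is 1, so Cramer's V equals Phi; the two differ only for larger tables.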