crosstabs. when to use crosstabs as a bivariate data analysis technique for examining the...

23
Crosstabs

Upload: gyles-simon

Post on 23-Dec-2015

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Crosstabs

Page 2: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

When to Use Crosstabs as a Bivariate Data Analysis Technique

For examining the relationship of two CATEGORIC variables

For example, do men and women (variable is GENDER) have different distributions for the variable SUPER BOWL WATCHING — often, sometimes, never?

Ask class for more examples!

Both variables must be categoric — nominal, ordinal, or dichotomous.

Page 3: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

What is a Crosstab?

A cross-tab is a table that displays a joint distribution — a distribution of cases for two variables.

It is also called a “contingency table.”

Each cell displays the observed count of the cases that fall into a specific category of the IV AND a specific category of the DV.

Page 4: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

How to Set Up a Crosstab

Decide which variable is the IV and which is the DV. (If you can’t decide, make two tables, one for each option.)

Place the IV in the columns. Each COLUMN represents a CATEGORY OF THE IV.

The DV goes in the rows. Each ROW represents a CATEGORY OF THE DV.

Page 5: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Cells and Marginals [1]

After the observed counts are placed in the cells, a new final column is created at the far right side of the table to display the row totals (or “row marginals”), adding the cell counts ACROSS.

And a new row is created at the bottom of the table to display the column totals (or “column marginals”), adding the counts down the columns.

Page 6: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Example: Super Bowl Watching by Gender

SUPER BOWL WATCHING BY GENDER

Watching Men Women Row Total

Often 20 30 50

Sometimes 15 20 35

Never 10 30 40

Column Total 45 80 125

Page 7: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Crosstabs Features

Here the IV (gender) is in the columns — two columns are for the two categories of the variable (men and women) and the last column shows the total in each row.

Three rows are for the three “Super Bowl watching” categories: often, sometimes, and never. The bottom row shows the totals in each column.

The number at the far lower right of the table shows the total number of cases.

Page 8: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Cells and Marginals [2]

The marginals — the row and column totals —are “fixed” — they must correspond to the frequency distributions of the variables that exist regardless of the relationship.

The row totals are the frequency distribution of the DV.

The column totals are the frequency distribution of the IV.

More than one distribution is possible for the cell frequencies. Do they show a relationship?

Page 9: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Next?

The next step is to compute percentages for the table, computing within categories of the IV (i.e., in the direction of the IV — GENDER, in this case).

What % of men watch often? What % of men watch sometimes? What % of men watch never? And then what are the corresponding percentages for women?

Each column’s percentages must add up to 100%.

Are these two % distributions (one for men and one for women) similar to each other or not?

Page 10: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

ONLY COMPUTE PERCENTAGES DOWN THE COLUMNS!

We could percentage the table across rows, down columns, or with each cell count as a proportion of the total — or all of these displayed in the same table.

RESIST THE TEMPTATION TO PERCENTAGE EVERYTHING! COMPUTE PERCENTAGES ONLY DOWN THE COLUMNS, IN THE DIRECTION OF THE IV.

Page 11: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

WHY?

If percentages are computed down the columns, we can see at a glance if the IV categories have similar DV distributions.

If we do it any other way, the percentages will be confusing and hard to read.

Page 12: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

What Do We Do Now?

Look carefully at the DV percentage distributions — do they look similar for the categories of the IV?

Compute chi-square to see if the differences are significant.

Compute measures of association to see if the relationship is strong.

Page 13: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Significance

The chi-square test of significance helps to confirm what we should be able to see by looking at the percentages: Do the IV categories have different proportional (%) distributions into the DV categories?

The null hypothesis is that their DV distributions are all the same.

Page 14: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Significance of the Relationship

The test of significance for a crosstab is called chi-square. (That’s a “K” sound in chi!)

The chi-square test statistic measures the discrepancy between the observed distribution and a hypothetical “null hypothesis” distribution.

The null hypothesis is that all categories of the IV have an identical distribution into the DV categories. (Men and women have identical distributions into the super bowl watching categories.)

Page 15: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

How to Calculate Chi-SquareStep 1: The Expected Counts [1]

Compute expected counts for each cell, under the hypothesis of identical proportional DV distributions for all IV categories.

In each cell, the expected count is calculated as the row total for the row the cell is in multiplied by the column total for the column the cell is in, and divided by the total sample size.

E = (row total)(column total) / N

Page 16: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Expected Counts [2]

E = CR/N

The logic is that the row total divided by the grand total (sample size) represents the proportion of cases in the column that we would expect to find in this particular cell if there were no differences among the IV categories in their DV distributions (if men and women watched the Super Bowl in the same proportions).

To get the expected count of cases in the cell, we apply that proportion — row total / N — to the column total by multiplying:

(column total)(row total) / N = expected count in the cell

Page 17: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Now Compute Chi-Square

After we have computed all of the expected counts, we compute the overall discrepancy between these expected counts and the observed counts we found in our data. That is the value of chi-square:

χ2 = ∑ [(O – E)2 / E]

In each cell, we subtract the expected count from the observed count and square the difference. We divide by the expected count for the cell. Then we repeat this calculation for all the cells of the table and sum the results.

Page 18: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Chi-Square and Degrees of Freedom

If there are big discrepancies between the expected (null hypothesis) counts and the observed counts, chi-square will be relatively large.

We have to calculate the degrees of freedom for our chi-square: df = (r – 1)(c – 1) where r is the number of row (DV) categories and c is the number of column (IV) categories.

Page 19: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

OK, Look It Up!

Once the df is calculated, we can look up our computed chi-square in the table of its distribution and see if it exceeds the critical value for the degrees of freedom.

If it does, we can say that we have found a statistically significant difference among the IV categories for their DV distributions (“men and women report significantly different levels of Super Bowl watching”).

Page 20: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Monkey Business with Chi-Square

The chi-square test statistic is very sensitive to sample size.

A large sample is quite likely to produce a significant chi-square result even when the differences in the distributions are pretty small.

Page 21: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Measures of Association

A “measure of association” is a measure of the strength of the relationship between the variables in the crosstab.

Some of them are more appropriate for ordinal data, some for nominal data.

They have different ranges and require care in interpretation.

Check the “How To” section of the text for more details.

Page 22: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Elaboration: The Third Variable

Researchers often introduce a third variable into their crosstab analysis.

Like the variables in the original bivariate analysis, the third variable in a crosstab needs to be categoric. It should not have too many categories (two or three)!

It is introduced as a “layer variable,” and software will produce a separate crosstab and test of significance of the initial bivariate analysis FOR EACH CATEGORY OF THE LAYER VARIABLE.

Page 23: Crosstabs. When to Use Crosstabs as a Bivariate Data Analysis Technique For examining the relationship of two CATEGORIC variables  For example, do men

Third Variable Example

For example, in our GENDER/SUPER BOWL example, we could introduce a variable called AGE as the third variable in an elaboration. It could be coded as “young” or “old” for its two categories.

We might find that the initial GENDER/SUPER BOWL relationship is different for older people and younger people. For example, among younger people, there might be very little difference between men and women in their Super Bowl watching, while there is a gender difference in viewing habits among the older respondents.