Data analysis: cross-tabulation
GAP Toolkit 5 Training in basic drug abuse data management and analysis
Training session 11
Objectives
• To introduce cross-tabulation as a method of investigating the relationship between two categorical variables
• To describe the SPSS facilities for cross-tabulation
• To discuss a range of simple statistics to describe the relationship between two categorical variables
• To reinforce the range of SPSS skills learnt to date
Bivariate analysis
• The relationship between two variables
• A two-way table:
– Rows: categories of one variable
– Columns: categories of the second variable
Gender
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Male          1251      79.6            79.9                 79.9
          Female         314      20.0            20.1                100.0
          Total         1565      99.6           100.0
Missing   System           6        .4
Total                   1571     100.0
Mode of ingestion Drug 1
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Swallow        794      50.5            51.0                 51.0
          Smoke          634      40.4            40.7                 91.7
          Snort           62       3.9             4.0                 95.6
          Inject          30       1.9             1.9                 97.6
          12.00            2        .1              .1                 97.7
          15.00            1        .1              .1                 97.8
          23.00           10        .6              .6                 98.4
          24.00           11        .7              .7                 99.1
          25.00            5        .3              .3                 99.4
          34.00            4        .3              .3                 99.7
          234.00           5        .3              .3                100.0
          Total         1558      99.2           100.0
Missing   System          13        .8
Total                   1571     100.0
Out-of-range values: 12.00 to 234.00 (note that none of the digits is greater than 5)
Cleaning Mode1
• Save a copy of the original
• Recode the out-of-range values into a new value (for example, 12, 15, 23, 24, 25, 34 and 234 into the value 8)
• Set the new value as a user-defined missing value (for example, 8 is declared a missing value and given the label “Out-of-range”)
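The slides carry out this cleaning in SPSS. As a rough illustration of the same logic outside SPSS, the following Python sketch recodes the out-of-range values to 8 and then treats 8 as missing. The numeric codes 1 to 4 for Swallow, Smoke, Snort and Inject are an assumption (the slides show only the labels), and `mode1_raw` is a small hypothetical sample, not the training data set.

```python
from collections import Counter

# Assumed coding: the slides show only the value labels, not the numeric codes.
LABELS = {1: "Swallow", 2: "Smoke", 3: "Snort", 4: "Inject"}
OUT_OF_RANGE = {12, 15, 23, 24, 25, 34, 234}  # values listed on the slide
MISSING_CODE = 8                              # user-defined missing value

def clean_mode(values):
    """Return a recoded copy; the original list is left untouched."""
    return [MISSING_CODE if v in OUT_OF_RANGE else v for v in values]

def valid_percent(values):
    """Percentages based on valid cases only (None stands for system missing)."""
    valid = [v for v in values if v is not None and v != MISSING_CODE]
    counts = Counter(valid)
    return {LABELS.get(code, code): round(100 * n / len(valid), 1)
            for code, n in sorted(counts.items())}

# Hypothetical sample, not the training data set.
mode1_raw = [1, 1, 2, 23, 3, None, 2, 234, 4, 1]
mode1_clean = clean_mode(mode1_raw)
print(mode1_clean)
print(valid_percent(mode1_clean))
```

Working on a copy mirrors the first bullet: the raw values stay available if the recoding rule needs to be revisited.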
Mode of ingestion Drug 1
                         Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Swallow              794      50.5            52.2                 52.2
          Smoke                634      40.4            41.7                 93.9
          Snort                 62       3.9             4.1                 98.0
          Inject                30       1.9             2.0                100.0
          Total               1520      96.8           100.0
Missing   Out-of-range          38       2.4
          System                13        .8
          Total                 51       3.2
Total                         1571     100.0
Mode of ingestion Drug1 * Gender cross-tabulation
Count
                                   Gender
                                     Male   Female   Total
Mode of ingestion Drug1   Swallow     600      194     794
                          Smoke       553       77     630
                          Snort        44       17      61
                          Inject       20       10      30
Total                                1217      298    1515
Annotations: the body cells are the joint frequencies; the Total column holds the row totals; the Total row holds the column totals; 1515 is the grand total.
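To make the vocabulary concrete, here is a minimal Python sketch (the slides themselves use SPSS) that starts from the joint frequencies shown on the slide and derives the row totals, column totals and grand total. The dictionary layout and the function name `margins` are illustrative choices, not part of the training material.

```python
from collections import Counter

# Joint frequencies from the slide: (mode, gender) -> count.
joint = {
    ("Swallow", "Male"): 600, ("Swallow", "Female"): 194,
    ("Smoke", "Male"): 553,   ("Smoke", "Female"): 77,
    ("Snort", "Male"): 44,    ("Snort", "Female"): 17,
    ("Inject", "Male"): 20,   ("Inject", "Female"): 10,
}

def margins(joint):
    """Return (row_totals, column_totals, grand_total) of a two-way table."""
    row_totals, col_totals = Counter(), Counter()
    for (row, col), n in joint.items():
        row_totals[row] += n
        col_totals[col] += n
    return row_totals, col_totals, sum(joint.values())

row_totals, col_totals, grand_total = margins(joint)
print(dict(row_totals))
print(dict(col_totals))
print(grand_total)
```

Both sets of marginal totals must add up to the grand total, which is a quick check on any table copied by hand.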
Percentages
• The difference in sample size for men and women makes comparison of raw numbers difficult
• Percentages facilitate comparison by standardizing the scale
• There are three options for the denominator of the percentage:
– Grand total
– Row total
– Column total
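The three denominators can be compared side by side. The sketch below is an illustration in Python rather than SPSS: it takes the counts from the Mode1 by Gender table and computes each cell as a percentage of the grand total, of its row total, or of its column total. The function name `percentages` and the `base` argument are hypothetical.

```python
# Counts from the slide; rows are Mode1, columns are Gender (Male, Female).
counts = {
    "Swallow": (600, 194),
    "Smoke": (553, 77),
    "Snort": (44, 17),
    "Inject": (20, 10),
}

def percentages(counts, base):
    """Cell percentages using base = 'total', 'row' or 'column'."""
    grand = sum(sum(cells) for cells in counts.values())
    col_sums = [sum(col) for col in zip(*counts.values())]
    result = {}
    for label, cells in counts.items():
        if base == "total":
            denoms = [grand] * len(cells)
        elif base == "row":
            denoms = [sum(cells)] * len(cells)
        elif base == "column":
            denoms = col_sums
        else:
            raise ValueError("base must be 'total', 'row' or 'column'")
        result[label] = tuple(round(100 * n / d, 1)
                              for n, d in zip(cells, denoms))
    return result

for base in ("total", "row", "column"):
    print(base, percentages(counts, base)["Swallow"])
```

The same counts give three different pictures, so a table should always state which denominator was used.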
Mode of ingestion Drug1 * Gender cross-tabulation
                                                 Gender
                                                  Male   Female    Total
Mode of ingestion Drug1   Swallow  Count           600      194      794
                                   % of Total    39.6%    12.8%    52.4%
                          Smoke    Count           553       77      630
                                   % of Total    36.5%     5.1%    41.6%
                          Snort    Count            44       17       61
                                   % of Total     2.9%     1.1%     4.0%
                          Inject   Count            20       10       30
                                   % of Total     1.3%      .7%     2.0%
Total                              Count          1217      298     1515
                                   % of Total    80.3%    19.7%   100.0%
Annotations: the body cells give the joint distribution of Mode1 and Gender; the Total column gives the marginal distribution of Mode1; the Total row gives the marginal distribution of Gender.
Mode of ingestion Drug1 * Gender cross-tabulation
                                                                      Gender
                                                                       Male   Female    Total
Mode of ingestion Drug1   Swallow  Count                                600      194      794
                                   % within Mode of ingestion Drug1   75.6%    24.4%   100.0%
                          Smoke    Count                                553       77      630
                                   % within Mode of ingestion Drug1   87.8%    12.2%   100.0%
                          Snort    Count                                 44       17       61
                                   % within Mode of ingestion Drug1   72.1%    27.9%   100.0%
                          Inject   Count                                 20       10       30
                                   % within Mode of ingestion Drug1   66.7%    33.3%   100.0%
Total                              Count                               1217      298     1515
                                   % within Mode of ingestion Drug1   80.3%    19.7%   100.0%
Annotation: each row shows the distribution of Gender conditional on Mode1.
Mode of ingestion Drug1 * Gender cross-tabulation
                                                      Gender
                                                       Male   Female    Total
Mode of ingestion Drug1   Swallow  Count                600      194      794
                                   % within Gender    49.3%    65.1%    52.4%
                          Smoke    Count                553       77      630
                                   % within Gender    45.4%    25.8%    41.6%
                          Snort    Count                 44       17       61
                                   % within Gender     3.6%     5.7%     4.0%
                          Inject   Count                 20       10       30
                                   % within Gender     1.6%     3.4%     2.0%
Total                              Count               1217      298     1515
                                   % within Gender   100.0%   100.0%   100.0%
Annotation: each column shows the distribution of Mode1 conditional on Gender.
Choosing percentages
• “Construct the proportions so that they sum to one within the categories of the explanatory variable.”
Source: (C. Marsh, Exploring Data: An Introduction to Data Analysis for Social Scientists (Cambridge, Polity Press, 1988), p. 143.)
[Chart: cell sizes labelled n=600, n=553, n=44, n=20 and n=194, n=77, n=17, n=10]
Dimensions
Definitions of vertical and horizontal variables
Two-by-two tables
• Tables with two rows and two columns
• A range of simple descriptive statistics can be applied to two-by-two tables
• It is possible to collapse larger tables to these dimensions
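As an illustration of collapsing, the Python sketch below reduces the four-row Mode1 by Gender table to two rows by keeping one category and pooling the rest. The choice of Swallow versus everything else is only an example, and the function name `collapse` is hypothetical; the slides do not prescribe a particular grouping.

```python
# Counts from the Mode1 * Gender slide: mode -> (Male, Female).
counts = {
    "Swallow": (600, 194),
    "Smoke": (553, 77),
    "Snort": (44, 17),
    "Inject": (20, 10),
}

def collapse(counts, keep):
    """Collapse a table to two rows: `keep` versus all other rows combined."""
    other = [0] * len(counts[keep])
    for label, cells in counts.items():
        if label != keep:
            other = [a + b for a, b in zip(other, cells)]
    return {keep: tuple(counts[keep]), "Not " + keep: tuple(other)}

two_by_two = collapse(counts, "Swallow")
print(two_by_two)
# {'Swallow': (600, 194), 'Not Swallow': (617, 104)}
```

Collapsing discards detail, so the grouping should reflect a distinction that matters for the question being asked.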
Gender * White pipe cross-tabulation
                                      White pipe
                                       Yes       No    Total
Gender   Male     Count                290      961     1251
                  % within Gender    23.2%    76.8%   100.0%
         Female   Count                 22      292      314
                  % within Gender     7.0%    93.0%   100.0%
Total             Count                312     1253     1565
                  % within Gender    19.9%    80.1%   100.0%
                    White pipe
                    Yes       No
Gender   Male       0.2318    0.7682
         Female     0.0701    0.9299
Relative risk
• Divide the probabilities for “success”
• For example:
– P(Whitpipe=Yes|Gender=Male) = 0.2318
– P(Whitpipe=Yes|Gender=Female) = 0.0701
– Relative risk is 0.2318/0.0701 = 3.309 (calculated from the unrounded proportions)
• The proportion of males using white pipe was over three times that of females
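The calculation can be checked directly from the counts in the Gender by White pipe table. This is a minimal Python sketch, not SPSS output; the function names `risk` and `relative_risk` are illustrative.

```python
# Counts from the Gender * White pipe slide: (Yes, No).
male = (290, 961)
female = (22, 292)

def risk(yes, no):
    """Probability of 'success' (here: reporting white pipe use)."""
    return yes / (yes + no)

def relative_risk(group_a, group_b):
    """Risk in group_a divided by risk in group_b."""
    return risk(*group_a) / risk(*group_b)

print(round(risk(*male), 4))                  # 0.2318
print(round(risk(*female), 4))                # 0.0701
print(round(relative_risk(male, female), 3))  # 3.309
```

Dividing the four-decimal probabilities by hand gives about 3.307; the 3.309 comes from using the unrounded proportions.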
Odds
• The odds of “success” are the ratio of the probability of “success” to the probability of “failure”
• For example:
– For males the odds of “success” are 0.2318/0.7682 = 0.302
– For females the odds of “success” are 0.0701/0.9299 = 0.075
Odds ratio
• Divide the odds of success for males by the odds of success for females
• For example: 0.302/0.075 = 4.005 (calculated from the unrounded odds)
• The odds of taking white pipe as a male are four times those for a female
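The odds and the odds ratio can be reproduced from the same two-by-two counts. The following Python sketch is an illustration only; it relies on the fact that P(success)/P(failure) within a group reduces to the count of “Yes” divided by the count of “No”.

```python
# Counts from the Gender * White pipe slide: (Yes, No).
male = (290, 961)
female = (22, 292)

def odds(yes, no):
    """Odds of 'success': P(success) / P(failure), which equals yes / no."""
    return yes / no

def odds_ratio(group_a, group_b):
    """Odds in group_a divided by odds in group_b."""
    return odds(*group_a) / odds(*group_b)

print(round(odds(*male), 3))                # 0.302
print(round(odds(*female), 3))              # 0.075
print(round(odds_ratio(male, female), 3))   # 4.005
```

Dividing the three-decimal odds by hand gives about 4.03, so the unrounded counts are the safer starting point.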
Risk estimate
                                                    95% Confidence interval
                                          Value     Lower     Upper
Odds ratio for Gender (Male / Female)     4.005     2.547     6.299     (odds ratio M/F)
For cohort white pipe = Yes               3.309     2.184     5.012     (relative risk of “success”)
For cohort white pipe = No                 .826      .791      .862     (relative risk of “failure”)
N of valid cases                           1565
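The slides do not say how the confidence limits are obtained. A common large-sample approach builds the interval on the log scale and transforms back; the Python sketch below assumes that method and the usual 1.96 multiplier for a 95% interval. Under those assumptions it gives limits that agree closely with the risk estimate table, but it should be read as an illustration rather than a description of the SPSS procedure.

```python
import math

# Counts from the Gender * White pipe slide.
a, b = 290, 961   # Male: Yes, No
c, d = 22, 292    # Female: Yes, No
Z = 1.96          # assumed normal multiplier for a 95% interval

def log_interval(estimate, se_log):
    """Limits computed on the log scale, then transformed back."""
    return (estimate * math.exp(-Z * se_log),
            estimate * math.exp(Z * se_log))

def odds_ratio_ci(a, b, c, d):
    """Odds ratio (row 1 versus row 2) with its approximate interval."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, log_interval(estimate, se_log)

def relative_risk_ci(a, b, c, d):
    """Relative risk of the first column, row 1 versus row 2."""
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return estimate, log_interval(estimate, se_log)

print(odds_ratio_ci(a, b, c, d))
print(relative_risk_ci(a, b, c, d))   # cohort white pipe = Yes
print(relative_risk_ci(b, a, d, c))   # cohort white pipe = No
```

Swapping the columns in the last call turns “No” into the outcome of interest, which is how the relative risk of “failure” is obtained from the same function.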
Exercise 1: cross-tabulations
• Create and comment on the following cross-tabulations:
– Age vs Gender
– Race vs Gender
– Education vs Gender
– Primary drugs vs Mode of ingestion
• Suggest other cross-tabulations that would be useful
Exercise 2: cross-tabulation
• Construct a dichotomous variable for age: Up to 24 years and Above 24 years
• Construct a dichotomous variable for the primary drug of use: Alcohol and Not Alcohol
• Create a cross-tabulation of the two new variables and interpret
• Generate Relative Risks and Odds Ratios and interpret
Summary
• Cross-tabulations
• Joint frequencies
• Marginal frequencies
• Row/Column/Total percentages
• Relative risk
• Odds
• Odds ratios