Data analysis: cross-tabulation GAP Toolkit 5 Training in basic drug abuse data management and analysis Training session 11

Upload: rosamond-lawrence

Post on 18-Dec-2015


Page 1: Data analysis: cross-tabulation GAP Toolkit 5 Training in basic drug abuse data management and analysis Training session 11

Data analysis: cross-tabulation

GAP Toolkit 5 Training in basic drug abuse data management and analysis

Training session 11

Page 2:

Objectives

• To introduce cross-tabulation as a method of investigating the relationship between two categorical variables
• To describe the SPSS facilities for cross-tabulation
• To discuss a range of simple statistics to describe the relationship between two categorical variables
• To reinforce the range of SPSS skills learnt to date

Page 3:

Bivariate analysis

• The relationship between two variables
• A two-way table:
– Rows: categories of one variable
– Columns: categories of the second variable

Page 4:

Gender

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Male           1251      79.6            79.9                 79.9
         Female          314      20.0            20.1                100.0
         Total          1565      99.6           100.0
Missing  System            6        .4
Total                   1571     100.0

Page 5:

Mode of ingestion Drug 1

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Swallow         794      50.5            51.0                 51.0
         Smoke           634      40.4            40.7                 91.7
         Snort            62       3.9             4.0                 95.6
         Inject           30       1.9             1.9                 97.6
         12.00             2        .1              .1                 97.7
         15.00             1        .1              .1                 97.8
         23.00            10        .6              .6                 98.4
         24.00            11        .7              .7                 99.1
         25.00             5        .3              .3                 99.4
         34.00             4        .3              .3                 99.7
         234.00            5        .3              .3                100.0
         Total          1558      99.2           100.0
Missing  System           13        .8
Total                   1571     100.0

Out-of-range values: 12.00 to 234.00 (note that none of the digits is greater than 5)

Page 6:

Cleaning Mode1

• Save a copy of the original
• Recode the out-of-range values into a new value (for example, 12, 15, 23, 24, 25, 34, 234 into the value 8)
• Set the new value as a user-defined missing value (for example, 8 is declared a missing value and given the label “Out-of-range”)
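The cleaning steps above can be sketched in Python. This is an illustration of the logic only, not the SPSS procedure used in the session: the sample list `mode1_raw` and the helper `clean_mode1` are hypothetical, and the valid codes 1 to 4 (Swallow, Smoke, Snort, Inject) are an assumed coding that the slides do not state.

```python
# Assumed coding (not stated on the slides): 1=Swallow, 2=Smoke, 3=Snort, 4=Inject.
VALID_CODES = {1, 2, 3, 4}
OUT_OF_RANGE = 8   # the new value suggested on the slide, treated as user-missing


def clean_mode1(values):
    """Recode every value outside VALID_CODES to OUT_OF_RANGE.

    None stands in for an SPSS system-missing value and is left unchanged.
    """
    cleaned = []
    for value in values:
        if value is None or value in VALID_CODES:
            cleaned.append(value)
        else:
            cleaned.append(OUT_OF_RANGE)
    return cleaned


# Hypothetical raw values, including some out-of-range entries.
mode1_raw = [1, 2, 12, 3, None, 234, 4, 23]

mode1_original = list(mode1_raw)       # step 1: save a copy of the original
mode1_clean = clean_mode1(mode1_raw)   # step 2: recode out-of-range values to 8

# Step 3: exclude both kinds of missing value when counting valid cases.
valid_cases = [v for v in mode1_clean if v is not None and v != OUT_OF_RANGE]

print(mode1_clean)
print(len(valid_cases))
```

Keeping the out-of-range code separate from system-missing values preserves the distinction between "not recorded" and "recorded but unusable" when the frequencies are tabulated again.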

Page 7:

Mode of ingestion Drug 1

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Swallow              794      50.5            52.2                 52.2
         Smoke                634      40.4            41.7                 93.9
         Snort                 62       3.9             4.1                 98.0
         Inject                30       1.9             2.0                100.0
         Total               1520      96.8           100.0
Missing  Out-of-range          38       2.4
         System                13        .8
         Total                 51       3.2
Total                        1571     100.0

Page 8:
Page 9:

Mode of ingestion Drug1 * Gender cross-tabulation

Count
                                Gender
Mode of ingestion Drug1      Male   Female   Total
Swallow                       600      194     794
Smoke                         553       77     630
Snort                          44       17      61
Inject                         20       10      30
Total                        1217      298    1515

Annotations on the slide:
– Joint frequencies: the cell counts in the body of the table
– Row totals: the Total column
– Column totals: the Total row
– Grand total: 1515
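The parts of the two-way table can be reproduced with a short Python sketch. The counts are the joint frequencies from the slide; the dictionary layout and variable names are illustrative choices, not part of the SPSS output.

```python
# Joint frequencies from the slide, stored as counts[mode][gender].
counts = {
    "Swallow": {"Male": 600, "Female": 194},
    "Smoke":   {"Male": 553, "Female": 77},
    "Snort":   {"Male": 44,  "Female": 17},
    "Inject":  {"Male": 20,  "Female": 10},
}

# Row totals: sum across the gender columns for each mode of ingestion.
row_totals = {mode: sum(cells.values()) for mode, cells in counts.items()}

# Column totals: sum down the rows for each gender.
col_totals = {}
for cells in counts.values():
    for gender, n in cells.items():
        col_totals[gender] = col_totals.get(gender, 0) + n

# Grand total: every case that has a valid value on both variables.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals and the column totals must each add up to the same grand total, which is a quick consistency check on any hand-copied table.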

Page 10:

Percentages

• The difference in sample size for men and women makes comparison of raw numbers difficult

• Percentages facilitate comparison by standardizing the scale

• There are three options for the denominator of the percentage:
– Grand total
– Row total
– Column total
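As a sketch of how the three denominators differ, the following Python computes all three percentages for a single cell of the Mode of ingestion by Gender table. The helper name `cell_percentages` is hypothetical; the counts are the joint frequencies from that table.

```python
# Joint frequencies stored as counts[mode][gender].
counts = {
    "Swallow": {"Male": 600, "Female": 194},
    "Smoke":   {"Male": 553, "Female": 77},
    "Snort":   {"Male": 44,  "Female": 17},
    "Inject":  {"Male": 20,  "Female": 10},
}


def cell_percentages(table, row, col):
    """Return (% of grand total, % of row total, % of column total) for one cell."""
    n = table[row][col]
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return tuple(100 * n / denom for denom in (grand_total, row_total, col_total))


pct_total, pct_row, pct_col = cell_percentages(counts, "Swallow", "Male")
print(f"{pct_total:.1f}% of total")
print(f"{pct_row:.1f}% within mode of ingestion")
print(f"{pct_col:.1f}% within gender")
```

The same count of 600 male swallowers reads as three different percentages depending on the denominator, which is why the choice has to follow the question being asked.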

Page 11:

Mode of ingestion Drug1 * Gender cross-tabulation

                                          Gender
Mode of ingestion Drug1                 Male   Female    Total
Swallow    Count                         600      194      794
           % of Total                  39.6%    12.8%    52.4%
Smoke      Count                         553       77      630
           % of Total                  36.5%     5.1%    41.6%
Snort      Count                          44       17       61
           % of Total                   2.9%     1.1%     4.0%
Inject     Count                          20       10       30
           % of Total                   1.3%      .7%     2.0%
Total      Count                        1217      298     1515
           % of Total                  80.3%    19.7%   100.0%

Annotations on the slide:
– Joint distribution Mode1 & Gender: the % of Total values in the body of the table
– Marginal distribution Mode1: the Total column
– Marginal distribution Gender: the Total row

Page 12:

Mode of ingestion Drug1 * Gender cross-tabulation

                                                           Gender
Mode of ingestion Drug1                                  Male   Female    Total
Swallow    Count                                          600      194      794
           % within Mode of ingestion Drug1             75.6%    24.4%   100.0%
Smoke      Count                                          553       77      630
           % within Mode of ingestion Drug1             87.8%    12.2%   100.0%
Snort      Count                                           44       17       61
           % within Mode of ingestion Drug1             72.1%    27.9%   100.0%
Inject     Count                                           20       10       30
           % within Mode of ingestion Drug1             66.7%    33.3%   100.0%
Total      Count                                         1217      298     1515
           % within Mode of ingestion Drug1             80.3%    19.7%   100.0%

The distribution of Gender conditional on Mode1

Page 13:

Mode of ingestion Drug1 * Gender cross-tabulation

                                          Gender
Mode of ingestion Drug1                 Male   Female    Total
Swallow    Count                         600      194      794
           % within Gender             49.3%    65.1%    52.4%
Smoke      Count                         553       77      630
           % within Gender             45.4%    25.8%    41.6%
Snort      Count                          44       17       61
           % within Gender              3.6%     5.7%     4.0%
Inject     Count                          20       10       30
           % within Gender              1.6%     3.4%     2.0%
Total      Count                        1217      298     1515
           % within Gender            100.0%   100.0%   100.0%

The distribution of Mode1 conditional on Gender

Page 14:

Choosing percentages

• “Construct the proportions so that they sum to one within the categories of the explanatory variable.”

Source: C. Marsh, Exploring Data: An Introduction to Data Analysis for Social Scientists (Cambridge, Polity Press, 1988), p. 143.

Page 15:
Page 16:
Page 17:
Page 18:

[Chart; data labels: n=600, n=553, n=44, n=20, n=194, n=77, n=17, n=10]

Page 19: Data analysis: cross-tabulation GAP Toolkit 5 Training in basic drug abuse data management and analysis Training session 11

Dimensions

Definitions of vertical and horizontal variables

Page 20:

Two-by-two tables

• Tables with two rows and two columns
• A range of simple descriptive statistics can be applied to two-by-two tables
• It is possible to collapse larger tables to these dimensions

Page 21:

Gender * White pipe cross-tabulation

                                   White pipe
Gender                           Yes       No    Total
Male      Count                  290      961     1251
          % within Gender      23.2%    76.8%   100.0%
Female    Count                   22      292      314
          % within Gender       7.0%    93.0%   100.0%
Total     Count                  312     1253     1565
          % within Gender      19.9%    80.1%   100.0%

Page 22:

                     White pipe
Gender              Yes        No
Male             0.2318    0.7682
Female           0.0701    0.9299

Page 23:

Relative risk

• Divide the probabilities for “success”:
– For example:
P(Whitpipe=Yes|Gender=Male) = 0.2318
P(Whitpipe=Yes|Gender=Female) = 0.0701
Relative risk is 0.2318/0.0701 = 3.309 (using the unrounded proportions 290/1251 and 22/314; the rounded figures shown give about 3.307)

• The proportion of males using white pipe was over three times that of females
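The relative risk calculation can be checked from the cell counts in the Gender by White pipe table. This is a minimal Python sketch; the variable names are illustrative.

```python
# Counts from the Gender * White pipe cross-tabulation.
male_yes, male_total = 290, 1251
female_yes, female_total = 22, 314

# Conditional probabilities of "success" (white pipe = Yes) within each gender.
p_male = male_yes / male_total
p_female = female_yes / female_total

# Relative risk: the male probability divided by the female probability.
relative_risk = p_male / p_female

print(f"P(Yes | Male)   = {p_male:.4f}")
print(f"P(Yes | Female) = {p_female:.4f}")
print(f"Relative risk   = {relative_risk:.3f}")
```

Working from the counts rather than the rounded probabilities avoids small rounding differences in the third decimal place.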

Page 24:

Odds

• The odds of “success” are the ratio of the probability of “success” to the probability of “failure”

• For example:
– For males the odds of “success” are 0.2318/0.7682 = 0.302
– For females the odds of “success” are 0.0701/0.9299 = 0.075
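The definition of odds translates directly into a small function. In this Python sketch the helper name `odds` is hypothetical, and the probabilities are recomputed from the counts in the Gender by White pipe table.

```python
def odds(p_success):
    """Odds of success: P(success) divided by P(failure) = 1 - P(success)."""
    return p_success / (1 - p_success)


# Probabilities of "success" (white pipe = Yes) from the cross-tabulation counts.
p_male = 290 / 1251
p_female = 22 / 314

odds_male = odds(p_male)
odds_female = odds(p_female)

print(f"Odds for males:   {odds_male:.3f}")
print(f"Odds for females: {odds_female:.3f}")
```

Because the denominators cancel, the same odds can be read straight from the counts as Yes divided by No, that is 290/961 for males and 22/292 for females.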

Page 25:

Odds ratio

• Divide the odds of success for males by the odds of success for females
• For example: 0.302/0.075 = 4.005 (using the unrounded odds; the rounded figures shown give about 4.03)
• The odds of taking white pipe as a male are four times those for a female
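For a two-by-two table the odds ratio can also be computed as a cross-product of the four cell counts, which gives the same result as dividing the two odds. A minimal Python sketch, with illustrative variable names:

```python
# Cell counts from the Gender * White pipe cross-tabulation.
male_yes, male_no = 290, 961
female_yes, female_no = 22, 292

# Ratio of the two odds.
odds_male = male_yes / male_no
odds_female = female_yes / female_no
odds_ratio = odds_male / odds_female

# Equivalent cross-product form: (a * d) / (b * c).
cross_product = (male_yes * female_no) / (male_no * female_yes)

print(f"Odds ratio (M/F): {odds_ratio:.3f}")
print(f"Cross-product:    {cross_product:.3f}")
```

An odds ratio of 1 would mean the odds are the same for both groups; values above 1 mean higher odds for the group in the numerator, here males.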

Page 26:
Page 27:

Risk estimate

                                                    95% Confidence interval
                                           Value       Lower       Upper
Odds ratio for Gender (Male / Female)      4.005       2.547       6.299
For cohort white pipe = Yes                3.309       2.184       5.012
For cohort white pipe = No                  .826        .791        .862
N of valid cases                            1565

Annotations on the slide:
– Odds ratio for Gender (Male / Female): odds ratio M/F
– For cohort white pipe = Yes: relative risk of “success”
– For cohort white pipe = No: relative risk of “failure”
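The slides do not say how the confidence limits are obtained. One common large-sample approach works on the log scale, and the Python sketch below applies it to the counts from the Gender by White pipe table. The function names and the choice of z = 1.96 are assumptions made for illustration; the method is presented as a plausible reconstruction, not as the documented SPSS algorithm.

```python
import math

Z_95 = 1.96  # approximate normal quantile for a 95% interval (assumed)


def log_scale_interval(estimate, se_log, z=Z_95):
    """Return (estimate, lower, upper) from a standard error on the log scale."""
    log_est = math.log(estimate)
    return estimate, math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)


def odds_ratio_ci(a, b, c, d):
    """a, b: yes/no counts in group 1; c, d: yes/no counts in group 2."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_scale_interval(estimate, se_log)


def relative_risk_ci(a, n1, c, n2):
    """a of n1 cases in group 1 and c of n2 cases in group 2 have the outcome."""
    estimate = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return log_scale_interval(estimate, se_log)


# Counts: Male yes=290, no=961 (n=1251); Female yes=22, no=292 (n=314).
or_est, or_low, or_high = odds_ratio_ci(290, 961, 22, 292)
rr_yes = relative_risk_ci(290, 1251, 22, 314)   # cohort white pipe = Yes
rr_no = relative_risk_ci(961, 1251, 292, 314)   # cohort white pipe = No

print(f"Odds ratio:  {or_est:.3f} ({or_low:.3f}, {or_high:.3f})")
print("RR for Yes:  {:.3f} ({:.3f}, {:.3f})".format(*rr_yes))
print("RR for No:   {:.3f} ({:.3f}, {:.3f})".format(*rr_no))
```

An interval that excludes 1 indicates that the difference between males and females is unlikely to be due to sampling variation alone.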

Page 28:

Exercise 1: cross-tabulations

• Create and comment on the following cross-tabulations:
– Age vs Gender
– Race vs Gender
– Education vs Gender
– Primary drugs vs Mode of ingestion

• Suggest other cross-tabulations that would be useful

Page 29:

Exercise 2: cross-tabulation

• Construct a dichotomous variable for age: Up to 24 years and Above 24 years

• Construct a dichotomous variable for the primary drug of use: Alcohol and Not Alcohol

• Create a cross-tabulation of the two new variables and interpret

• Generate Relative Risks and Odds Ratios and interpret
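The first three steps of this exercise can be sketched in Python before carrying them out in SPSS. The records below are hypothetical and are not taken from the training data set; the helper names `age_group` and `drug_group` are likewise illustrative.

```python
from collections import Counter


def age_group(age):
    """Dichotomize age at 24 years."""
    return "Up to 24 years" if age <= 24 else "Above 24 years"


def drug_group(primary_drug):
    """Dichotomize the primary drug of use."""
    return "Alcohol" if primary_drug == "Alcohol" else "Not Alcohol"


# Hypothetical (age, primary drug) records, for illustration only.
records = [
    (19, "Alcohol"), (31, "Cannabis"), (24, "Cannabis"),
    (45, "Alcohol"), (22, "Heroin"), (38, "Alcohol"),
]

# Two-by-two cross-tabulation: count each (age group, drug group) combination.
crosstab = Counter((age_group(age), drug_group(drug)) for age, drug in records)

for (age_label, drug_label), n in sorted(crosstab.items()):
    print(f"{age_label:15s} {drug_label:12s} {n}")
```

The four counts form a two-by-two table, so relative risks and odds ratios can be computed from them in the same way as for the Gender by White pipe table.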

Page 30:

Summary

• Cross-tabulations
• Joint frequencies
• Marginal frequencies
• Row/Column/Total percentages
• Relative risk
• Odds
• Odds ratios