Introduction to Data Analysis


Upload: kayo

Post on 24-Feb-2016


Page 1: Introduction to Data Analysis

Introduction to Data Analysis

Why do we analyze data?
• To make sense of the data we have collected

Basic steps in preliminary data analysis
• Editing
• Coding
• Tabulating

Page 2: Introduction to Data Analysis

Introduction to Data Analysis

Editing of data
• Imposes minimal quality standards on the raw data

Field edit: a preliminary edit used to detect glaring omissions and inaccuracies (often involves respondent follow-up). It checks for:
• Completeness
• Legibility
• Comprehensibility
• Consistency
• Uniformity

Page 3: Introduction to Data Analysis

Introduction to Data Analysis

Central office edit
• A more complete and exacting edit
• Best performed by a number of editors, each looking at one part of the data
• Decisions need to be made on how to handle item non-response and other omissions
  • List-wise deletion (drop the case for all analyses) vs. case-wise deletion (drop the case only for the present analysis)
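The two deletion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`cars`, `income`), and the helper names `listwise` and `casewise` are hypothetical and do not come from the slides.

```python
# Hypothetical survey records; None marks item non-response.
records = [
    {"id": 1, "cars": 1, "income": "low"},
    {"id": 2, "cars": None, "income": "high"},
    {"id": 3, "cars": 2, "income": None},
    {"id": 4, "cars": 1, "income": "high"},
]


def listwise(rows):
    """Drop any case with a missing value, so it is excluded from all analyses."""
    return [r for r in rows if all(v is not None for v in r.values())]


def casewise(rows, fields):
    """Drop a case only if it lacks a field used in the present analysis."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


complete_cases = listwise(records)        # keeps ids 1 and 4
cars_cases = casewise(records, ["cars"])  # keeps ids 1, 3 and 4
```

In this sketch the second rule retains more cases, at the cost of different analyses running on different subsets of respondents.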

Page 4: Introduction to Data Analysis

Introduction to Data Analysis

Coding: transforming raw data into symbols (usually numbers) for tabulating, counting, and analyzing
• Must determine categories, which should be
  • Completely exhaustive
  • Mutually exclusive
• Assign numbers to categories
• Make sure to code an ID number for each completed instrument
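As a rough sketch of the coding step, the snippet below maps raw answers to numeric codes and attaches an ID number to each instrument. The housing question, the `CODEBOOK` values, and the function name are assumptions made for illustration; only the use of 9 as a missing-data code mirrors the tabulation example later in the deck.

```python
# Hypothetical codebook. The "other" bucket keeps the categories exhaustive,
# and each answer maps to exactly one code, so they are mutually exclusive.
CODEBOOK = {"own": 1, "rent": 2, "other": 3}
MISSING = 9  # code for item non-response


def code_responses(raw_answers):
    """Turn raw text answers into numeric codes, one record per instrument."""
    coded = []
    for id_number, answer in enumerate(raw_answers, start=1):
        if answer is None:
            code = MISSING
        else:
            code = CODEBOOK.get(answer.strip().lower(), CODEBOOK["other"])
        coded.append({"id": id_number, "housing": code})
    return coded


coded = code_responses(["Own", "rent ", None, "Houseboat"])
```

Normalizing case and whitespace before the lookup is one simple way to keep inconsistent data entry from creating spurious categories.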

Page 5: Introduction to Data Analysis

Introduction to Data Analysis

Tabulation: counting the number of cases that fall into each category
• Initial tabulations should be performed for each item
• One-way tabulations
  • Determine the degree of item non-response
  • Locate errors
  • Locate outliers
  • Determine the data distribution

Page 6: Introduction to Data Analysis

Preliminary Data Analysis

Tabulation: simple counts
• For example
  • 75 families in the study own 1 car
  • 2 families own 3
• Missing data (code 9)
  • 1 family did not report
  • Not useful for further analysis

Number of Cars    Number of Families
1                  75
2                  23
3                   2
9                   1
Total             101
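A one-way tabulation like this one can be reproduced with Python's `collections.Counter`. The sketch assumes the 101 coded responses are available as a flat list; here the list is rebuilt from the counts on this slide (75, 23, 2, and 1 missing), since the raw data are not part of the deck.

```python
from collections import Counter

# Rebuild the coded responses from the slide's counts; 9 is the missing-data code.
responses = [1] * 75 + [2] * 23 + [3] * 2 + [9] * 1

counts = Counter(responses)
total = sum(counts.values())

# Print one line per category, then the total.
for value in sorted(counts):
    print(value, counts[value])
print("Total", total)
```

A category with an unexpected code or an implausibly large value would show up in this listing, which is how a one-way tabulation helps locate errors and outliers.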

Page 7: Introduction to Data Analysis

Preliminary Data Analysis

Tabulation: compute percentages
• Eliminate non-responses
• Note: report without missing data

Number of Cars    Number of Families
1                  75%
2                  23%
3                   2%
Total             100
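The percentage step can be sketched as follows, assuming the counts from the previous slide and the missing-data code 9. The variable names are illustrative only.

```python
# Counts by number of cars; 9 is the missing-data code.
counts = {1: 75, 2: 23, 3: 2, 9: 1}
MISSING = 9

# Eliminate non-responses before computing percentages.
valid = {value: n for value, n in counts.items() if value != MISSING}
n_valid = sum(valid.values())

percentages = {value: round(100 * n / n_valid) for value, n in valid.items()}
```

Because the one missing case is dropped first, the base is 100 families rather than 101.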

Page 8: Introduction to Data Analysis

Preliminary Data Analysis

Cross tabulation
• Simultaneous count of two or more items
• Note that the marginal totals are equal to the frequency totals
• Allows the researcher to determine whether a relationship exists between two variables
• Used as the final analysis step in the majority of real-world applications
• Investigates the relationship between two ordinal-scaled variables

Number of Cars    Lower Income    Higher Income    Total
1                  48              27               75
2 or More           6              19               25
Total              54              46              100
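A minimal sketch of how such a cross tabulation could be built from case-level data is shown below. The list of (cars, income) pairs is reconstructed from the cell counts on this slide, and the labels `"1"`, `"2+"`, `"lower"`, and `"higher"` are shorthand chosen for the example.

```python
from collections import Counter

# One (cars, income) pair per family, rebuilt from the slide's cell counts.
families = (
    [("1", "lower")] * 48
    + [("1", "higher")] * 27
    + [("2+", "lower")] * 6
    + [("2+", "higher")] * 19
)

cells = Counter(families)                               # joint counts
row_totals = Counter(cars for cars, _ in families)      # marginals by cars
col_totals = Counter(income for _, income in families)  # marginals by income
```

The row and column marginals each sum to the same 100 families, which is the check the slide refers to when it says marginal totals equal the frequency totals.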

Page 9: Introduction to Data Analysis

Preliminary Data Analysis

Cross tabulation: to analyze the data
• Calculate percentages in the direction of the “causal variable”
• Does number of cars “cause” income level?

Number of Cars    Lower Income    Higher Income    Total
1                  64%             36%             100%
2 or More          24%             76%             100%
Total              54%             46%             100%

Page 10: Introduction to Data Analysis

Preliminary Data Analysis

Cross tabulation: to analyze the data
• Does income level “cause” number of cars?
• Note: “direction” means the totals for the “causal” variable should be 100%

Number of Cars    Lower Income    Higher Income    Total
1                  89%             59%              75%
2 or More          11%             41%              25%
Total             100%            100%             100%
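The two directions of percentaging can be compared side by side with a short sketch. It assumes the observed counts from the cross tabulation (48, 27, 6, 19); the function names are hypothetical.

```python
# Observed counts: rows are number of cars, columns are income level.
observed = {
    "1": {"lower": 48, "higher": 27},
    "2+": {"lower": 6, "higher": 19},
}


def row_percentages(table):
    """Each row sums to 100%: treats number of cars as the causal variable."""
    return {
        row: {col: round(100 * n / sum(cols.values())) for col, n in cols.items()}
        for row, cols in table.items()
    }


def column_percentages(table):
    """Each column sums to 100%: treats income level as the causal variable."""
    col_totals = {}
    for cols in table.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: round(100 * n / col_totals[col]) for col, n in cols.items()}
        for row, cols in table.items()
    }


by_cars = row_percentages(observed)
by_income = column_percentages(observed)
```

The same four counts give 64% or 89% for the top-left cell depending on the base, which is why the direction has to be chosen before the table is read.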

Page 11: Introduction to Data Analysis

Preliminary Data Analysis

Cross tabulation allows the development of hypotheses
• Develop them by comparing percentages across the income groups
  • Lower-income families are more likely to have one car (89%) than the higher-income group (59%)
  • Higher-income families are more likely to have multiple cars (41%) than the lower-income group (11%)
• Are the results statistically significant?
  • To test, we must employ chi-square analysis

Page 12: Introduction to Data Analysis

Preliminary Data Analysis

Chi-square analysis
• Allows statistical testing of the independence of two or more nominally-scaled variables
• The null hypothesis (H0) is that the variables are independent (i.e., no relationship exists)
• The alternative hypothesis (HA) is that a statistical relationship exists among the variables
• Present example
  • H0: Income level will have no effect on the number of cars that a family owns
  • HA: Income level will affect the number of cars that a family owns

Page 13: Introduction to Data Analysis

Preliminary Data Analysis

Chi-square analysis: general approach
• Based on the “marginal totals,” compute the expected value for each cell
• Compare the expected values to the actual values to compute the chi-square value ($\chi^2$)
• Compare the computed $\chi^2$ to the critical $\chi^2$
  • Table 4 on p. 442 in text

Number of Cars    Lower Income    Higher Income    Total
1                                                   75
2 or More                                           25
Total              54              46              100

Page 14: Introduction to Data Analysis

Preliminary Data Analysis

Chi-square analysis: compute expected values
• E1 = (75 * 54)/100 = 40.5
• E2 = (75 * 46)/100 = 34.5
• Note that E1 + E2 = 75
• E3 = ?
• E4 = ?

Number of Cars    Lower Income    Higher Income    Total
1                  E1              E2               75
2 or More          E3              E4               25
Total              54              46              100
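The expected-value rule (row total times column total, divided by the grand total) can be applied to all four cells at once. This is a small sketch using the marginal totals from the table; the dictionary keys are shorthand chosen for the example.

```python
# Marginal totals from the cross tabulation.
row_totals = {"1": 75, "2+": 25}
col_totals = {"lower": 54, "higher": 46}
n = 100

# Expected count per cell under independence: row total * column total / n.
expected = {
    (row, col): row_total * col_total / n
    for row, row_total in row_totals.items()
    for col, col_total in col_totals.items()
}

e1 = expected[("1", "lower")]    # 40.5
e2 = expected[("1", "higher")]   # 34.5
e3 = expected[("2+", "lower")]   # 13.5
e4 = expected[("2+", "higher")]  # 11.5
```

The expected values in each row and column add back up to the corresponding marginal total, which is a quick way to check the arithmetic.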

Page 15: Introduction to Data Analysis

Preliminary Data Analysis

Compute the $\chi^2$ value
• $\chi^2 = \sum_i (O_i - E_i)^2 / E_i = 12.08$
• df = (rows - 1) × (cols. - 1) = 1 × 1 = 1
• $\alpha = .05$
• Critical $\chi^2$ = 3.84
• 12.08 > 3.84: reject the null hypothesis

Cell    O_i    E_i     O_i - E_i    (O_i - E_i)^2    (O_i - E_i)^2 / E_i
E1      48     40.5     7.5          56.25            1.39
E2      27     34.5    -7.5          56.25            1.63
E3       6     13.5    -7.5          56.25            4.17
E4      19     11.5     7.5          56.25            4.89
$\chi^2$                                             12.08
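The whole calculation can be checked with a short, dependency-free sketch. It assumes the observed 2 × 2 counts from the cross tabulation and uses (rows - 1) × (cols - 1) degrees of freedom, which is 1 for a 2 × 2 table; the critical value 3.841 is the standard table entry for α = .05 with 1 degree of freedom. The function name is illustrative.

```python
# Observed counts: rows are 1 car / 2 or more cars, columns are lower / higher income.
observed = [[48, 27], [6, 19]]


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for the cell
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square(observed)
CRITICAL_05_DF1 = 3.841  # table value for alpha = .05, df = 1
reject_null = statistic > CRITICAL_05_DF1
```

Rounded to two decimals the statistic matches the 12.08 in the table, so the null hypothesis of independence is rejected.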

Page 16: Introduction to Data Analysis

Preliminary Data Analysis

Conclusion
• Income has an influence on the number of cars in a family

BUT: does family size matter?
• Do a 3-way cross-tabulation
• Is income more important than family size?

Page 17: Introduction to Data Analysis

Preliminary Data Analysis

Total Data

Income Level    1 Car or Less    2 or More Cars    Total
Low              48                6                54
High             27               19                46
Total            75               25               100

Page 18: Introduction to Data Analysis

Preliminary Data Analysis

Families with 4 Members or Less (Actual Data)

Income Level    1 Car or Less    2 or More Cars    Total
Low              44                2                46
High             26                6                32
Total            70                8                78

Page 19: Introduction to Data Analysis

Preliminary Data Analysis

Families with 5 Members or More (Actual Data)

Income Level    1 Car or Less    2 or More Cars    Total
Low               4                4                 8
High              1               13                14
Total             5               17                22

Page 20: Introduction to Data Analysis

Preliminary Data Analysis

Families with 4 Members or Less

Income Level    1 Car or Less    2 or More Cars    Total
Low              96%               4%              100%
High             81%              19%              100%

Families with 5 Members or More

Income Level    1 Car or Less    2 or More Cars    Total
Low              50%              50%              100%
High              7%              93%              100%

• Still calculate percentages in the direction of the causal variable

Page 21: Introduction to Data Analysis

Preliminary Data Analysis

Create a new table: look at those families with 2 or more cars by family size

Families with 2 or More Cars by Income and Size

Income Level    4 or Less    5 or More    Total
Low               4%          50%          11%
High             19%          93%          41%

Certainly, both family size and income level contribute to the number of cars that a family owns. Family size plays a major role.
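The three-way table can be rebuilt from the two "actual data" tables with a short sketch. The counts are taken from the slides; the dictionary layout and the helper name `pct_multi_car` are assumptions made for the example.

```python
# (income, family size) -> (families with 1 car or less, families with 2 or more cars)
counts = {
    ("low", "4 or less"): (44, 2),
    ("high", "4 or less"): (26, 6),
    ("low", "5 or more"): (4, 4),
    ("high", "5 or more"): (1, 13),
}


def pct_multi_car(one_or_less, two_or_more):
    """Percentage of families in a group that own 2 or more cars."""
    return round(100 * two_or_more / (one_or_less + two_or_more))


by_income_and_size = {group: pct_multi_car(*cell) for group, cell in counts.items()}

# Collapse over family size to recover the Total column for each income level.
by_income = {}
for income in ("low", "high"):
    one = sum(cell[0] for (inc, _), cell in counts.items() if inc == income)
    two = sum(cell[1] for (inc, _), cell in counts.items() if inc == income)
    by_income[income] = pct_multi_car(one, two)
```

Within each income level, moving from smaller to larger families raises the multi-car share far more than moving from low to high income does within a family-size group, which is the pattern behind the slide's closing remark.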