SADC Course in Statistics Assessing data critically Module B1 Session 17

Upload: aiden-stack

Post on 28-Mar-2015


Page 1: SADC Course in Statistics Assessing data critically Module B1 Session 17

SADC Course in Statistics

Assessing data critically

Module B1 Session 17


Objectives

At the end of this session the students will be able to:

• Apply basic techniques for error detection

• Ask relevant questions that allow for the explanation or correction of discrepancies


Detecting errors in primary data

Checks to detect errors in primary data should be made at various stages:

• Immediately after data collection (and during data entry)

• After data computerisation

• During exploratory data analysis


Checking for errors after data collection

• Have all questions been answered? If not, are the reasons for non-response clear?

• Are recorded values within their expected range?

• Do all questions or items have meaningful entries? Are they internally consistent?

• Are any zero entries genuinely zeros?

• Are IDs unique?
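The questions above can be turned into simple automated checks. The following is a minimal sketch only: the records, the field names (id, age, hh_size, income) and the expected ranges are hypothetical and would need to be replaced by those of the actual questionnaire.

```python
# Hypothetical questionnaire records; None marks an unanswered question.
records = [
    {"id": 101, "age": 34, "hh_size": 5, "income": 1200},
    {"id": 102, "age": 210, "hh_size": 4, "income": 0},
    {"id": 102, "age": 27, "hh_size": None, "income": 800},
]

# Assumed plausible ranges for the numeric items.
expected_range = {"age": (0, 110), "hh_size": (1, 30)}


def check_records(records, expected_range):
    """Return a list of (record position, problem description) tuples."""
    problems = []
    seen_ids = set()
    for pos, rec in enumerate(records):
        # Are IDs unique?
        if rec["id"] in seen_ids:
            problems.append((pos, f"duplicate id {rec['id']}"))
        seen_ids.add(rec["id"])
        for field, value in rec.items():
            # Have all questions been answered?
            if value is None:
                problems.append((pos, f"{field} not answered"))
                continue
            # Are recorded values within their expected range?
            if field in expected_range:
                low, high = expected_range[field]
                if not low <= value <= high:
                    problems.append((pos, f"{field}={value} outside {low}-{high}"))
            # Zero entries need confirming: a genuine zero or a blank?
            if field != "id" and value == 0:
                problems.append((pos, f"{field} is zero: confirm it is genuine"))
    return problems


problems = check_records(records, expected_range)
for pos, message in problems:
    print(pos, message)
```

A check like this only lists candidates for follow-up. Whether a flagged zero or an out-of-range value is really an error still has to be settled by going back to the questionnaire or the enumerator.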


Checking for errors after data entry

Compute new (temporary) variables to check if:

• Rates recorded per 1000 of population are less than 1000

• Percentages expected to be less than 100% are indeed so

• There is internal consistency amongst variables, and between tables, for example:

  – the date of interviewing should be earlier than the date when the supervisor checked the questionnaire

  – totals are consistent across different tables, and sub-totals add to overall totals

• Codes for missing values have been identified correctly according to their reason for missing and have been set as missing in the database to be used for analysis
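As an illustration, temporary check variables of this kind could be computed as below. This is a sketch under stated assumptions: the variable names, the example rows and the missing-value codes (-9, -8) are all hypothetical, and the date check assumes that a questionnaire may be checked on the day of the interview.

```python
from datetime import date

# Assumed missing-value codes and their reasons (hypothetical).
MISSING_CODES = {-9: "refused", -8: "not applicable"}

rows = [
    {"id": 1, "births_per_1000": 38.2, "pct_literate": 71.0,
     "interviewed": date(2007, 3, 5), "checked": date(2007, 3, 9),
     "males": 3, "females": 2, "hh_total": 5, "income": 450},
    {"id": 2, "births_per_1000": 1250.0, "pct_literate": 104.0,
     "interviewed": date(2007, 3, 12), "checked": date(2007, 3, 10),
     "males": 2, "females": 2, "hh_total": 5, "income": -9},
]


def entry_checks(row):
    """Compute temporary True/False check variables for one data row."""
    return {
        "rate_ok": row["births_per_1000"] < 1000,
        "pct_ok": 0 <= row["pct_literate"] <= 100,
        # Interview must not come after the supervisor's check.
        "dates_ok": row["interviewed"] <= row["checked"],
        # Sub-totals must add to the overall total.
        "total_ok": row["males"] + row["females"] == row["hh_total"],
    }


def recode_missing(value):
    """Set a coded missing value to None so it is excluded from analysis."""
    return None if value in MISSING_CODES else value


failed = {
    row["id"]: [name for name, ok in entry_checks(row).items() if not ok]
    for row in rows
}
incomes = [recode_missing(row["income"]) for row in rows]
print(failed)
print(incomes)
```

Keeping the check variables separate from the original variables means the raw data stay untouched until each discrepancy has been explained or corrected.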


Tips for error detection

• Look for counts or categories that do not make sense

• If you have a series of data in chronological order, look for jumps in the data. They may be errors.

• Always check your totals

  – Make sure they add to the expected total (e.g. 100%).

  – When looking at multiple tables in a single study, the sample size should be consistent in all tables

• What is expected to tally should tally!

• Don’t just look at the numbers, look at the definitions that the numbers represent
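The tip about jumps in chronological data can be sketched in code. The series and the threshold (a change more than five times the typical change) are assumptions chosen for illustration, not a recommended rule.

```python
from statistics import median


def find_jumps(series, factor=5.0):
    """Return positions whose change from the previous value is unusually large."""
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    typical = median(changes)
    return [
        i + 1
        for i, change in enumerate(changes)
        if typical > 0 and change > factor * typical
    ]


# Hypothetical monthly counts; position 4 looks like an extra digit was keyed.
monthly_cases = [120, 125, 131, 128, 1340, 137, 142]
jumps = find_jumps(monthly_cases)
print(jumps)
```

Here a single suspect value produces two flagged positions, one for the jump up and one for the jump back down. A flagged jump is a prompt to ask questions, since a real change in the series would look the same.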


Checks during Exploratory Data Analysis

Simple one-way or two-way tables can help identify errors.

(a) Results are from a socio-economic survey in Uganda. Are these results reasonable?

Average number of meals taken by HH in past week    Frequency
0                                                           6
1                                                         699
2                                                        5547
3                                                        3285
4                                                         113
5                                                           1
7                                                           1
Total                                                    9652
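A one-way table like (a) can be checked mechanically in two ways: does the total tally, and are there categories so rare that they deserve a second look? The sketch below uses the frequencies from the table; the 0.1% cut-off for "rare" is an arbitrary assumption.

```python
# Frequencies copied from table (a): number of meals -> number of households.
meals_freq = {0: 6, 1: 699, 2: 5547, 3: 3285, 4: 113, 5: 1, 7: 1}
reported_total = 9652


def rare_categories(freq, min_share=0.001):
    """Return the categories holding less than min_share of all observations."""
    total = sum(freq.values())
    return [value for value, count in sorted(freq.items())
            if count / total < min_share]


total_tallies = sum(meals_freq.values()) == reported_total
rare = rare_categories(meals_freq)
print(total_tallies, rare)
```

The rare categories are only candidates for querying. Whether households reporting 0, 5 or 7 meals are errors depends on how the question was defined and asked.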


Checks during Exploratory Data Analysis

(b) A second example from the British Crime Survey, 2000

Number of times something was stolen from respondent’s hands, pockets, bag or case since 1 Jan 99

Number of times    Frequency
0                        413
1                         39
2                          4
3                          2
5                          1
10                         1
15                         1
36                         1
97                         1
Total                    463

Can the last figure be correct?
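One way to judge how much a single doubtful value matters is to see how it affects a summary such as the mean. The sketch below uses the frequencies from table (b); the choice of the mean as the summary is an assumption made for illustration.

```python
# Frequencies copied from table (b): number of thefts -> number of respondents.
theft_freq = {0: 413, 1: 39, 2: 4, 3: 2, 5: 1, 10: 1, 15: 1, 36: 1, 97: 1}

respondents = sum(theft_freq.values())
incidents = sum(value * count for value, count in theft_freq.items())
largest = max(theft_freq)

# Mean with all respondents, and with one respondent at the largest value removed.
mean_all = incidents / respondents
mean_without = (incidents - largest) / (respondents - 1)

# Share of all reported incidents that comes from that one respondent.
share_of_largest = largest / incidents

print(respondents, incidents)
print(round(mean_all, 3), round(mean_without, 3), round(share_of_largest, 3))
```

If one respondent accounts for a large share of all reported incidents, any conclusion drawn from the mean depends heavily on whether that one figure is correct.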


Checks during Exploratory Data Analysis

(c) Detection rate of property crimes in one police force. (Data are fictitious)

Property Crime        Jan   Feb   Mar
Vandalism              10    13    14
Burglary               14    19    16
Vehicle thefts         15    81    17
Bicycle thefts          4     3     3
Thefts from person      3     2     5
Other thefts            7     9    11
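In a two-way table like (c), each cell can be compared with the other values in its row. The sketch below flags cells that differ from the row median by more than a factor of three; that factor is an assumption chosen for illustration.

```python
from statistics import median

months = ["Jan", "Feb", "Mar"]

# Values copied from table (c) (fictitious data).
detections = {
    "Vandalism": [10, 13, 14],
    "Burglary": [14, 19, 16],
    "Vehicle thefts": [15, 81, 17],
    "Bicycle thefts": [4, 3, 3],
    "Thefts from person": [3, 2, 5],
    "Other thefts": [7, 9, 11],
}


def unusual_cells(table, months, ratio=3.0):
    """Return (row, month, value) for cells far from their row median."""
    flagged = []
    for crime, values in table.items():
        typical = median(values)
        for month, value in zip(months, values):
            too_high = value > ratio * typical
            too_low = value < typical / ratio
            if typical > 0 and (too_high or too_low):
                flagged.append((crime, month, value))
    return flagged


flagged = unusual_cells(detections, months)
print(flagged)
```

The flagged cell is a reason to go back to the source. One question worth asking is whether two digits were transposed at data entry, but only the original records can confirm that.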


Checks during Exploratory Data Analysis

Consistency checks across related variables

The following examples show:

(i) Current number of cars at household versus whether respondent was worried about having car stolen.

(ii) Current number of cars at household versus whether respondent was worried about having things stolen from car.

(iii) Distance to reach any type of formal court versus distance from nearest Magistrate’s Court.
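Consistency checks of this kind can be sketched as a cross-tabulation plus a filter for combinations that should not occur. Everything in the example is hypothetical: the records, the field names and the answer categories are invented, and the rule for (iii) assumes that a Magistrate's Court counts as a formal court, so that the distance to any formal court should not exceed the distance to the nearest Magistrate's Court.

```python
from collections import Counter

# Hypothetical survey records (distances in km).
respondents = [
    {"id": 1, "cars": 0, "worry_car_stolen": "not applicable",
     "dist_any_court": 12, "dist_magistrate": 12},
    {"id": 2, "cars": 0, "worry_car_stolen": "very worried",
     "dist_any_court": 8, "dist_magistrate": 20},
    {"id": 3, "cars": 2, "worry_car_stolen": "not very worried",
     "dist_any_court": 30, "dist_magistrate": 18},
]

# Cross-tabulation: (number of cars, worry category) -> number of respondents.
crosstab = Counter((r["cars"], r["worry_car_stolen"]) for r in respondents)

# Households with no car should not report a level of worry about car theft.
no_car_but_worried = [
    r["id"] for r in respondents
    if r["cars"] == 0 and r["worry_car_stolen"] != "not applicable"
]

# Distance to any formal court should not exceed distance to a Magistrate's Court.
court_mismatch = [
    r["id"] for r in respondents
    if r["dist_any_court"] > r["dist_magistrate"]
]

print(dict(crosstab))
print(no_car_but_worried, court_mismatch)
```

Records picked out this way may still have an explanation, for example a respondent who regularly uses a car not kept at the household, so each one needs a follow-up question before it is changed.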


Use of cross-tabulations

• Table 1. Cross-tabulation of current number of cars at household versus extent to which respondent is worried about having car stolen (Source: BCS, 2000)


Use of cross-tabulations

• Table 2. Cross-tabulation of current number of cars at household versus extent to which respondent is worried about having things stolen from the car (Source: BCS, 2000)


Detecting errors in secondary data

Procedures similar to the above can be undertaken, but in addition:

• Ask questions about the source from which the data arose, e.g. to assess competence, adequacy of funding, motivation for the study, etc.

• Ask about the data collection procedure and associated documentation. In particular, seek answers to what, who, why, when, where, and how.

• It is important to follow the whole data chain.
