Posted on 30-Sep-2020


Data Management

Pawin Numthavaj M.D.

Section for Clinical Epidemiology and Biostatistics

Ramathibodi Hospital, Mahidol University

E-mail: pawin.num@mahidol.ac.th


Objectives of Data Management

•To minimize errors at all stages of data collection

•To prepare data of the highest possible quality in a suitable form for statistical analysis


Data Management Process

1. Design and create the case report form (CRF)

2. Collect data using the CRF

3. Design and create the database

4. Specify data quality control

5. Enter data into the database

6. Clean and check the data

Design & Create CRF

Definition of CRF

•A case report/record form (CRF) is the document used to record the data on which the eventual analysis and reporting of the clinical trial will be based. A CRF may be:

- Paper-based

- Electronic

•The design of the CRF must reflect both:

- Data collection

- Data extraction

Who will use the CRF?

Role | A good CRF should

Investigator | Be clear, unambiguous, easy to follow, and complete; provide comprehensive instructions and guidance; enable the investigator to ascertain the subject's eligibility to continue in the trial at any point

Monitor | Allow the completed CRF to be reviewed against the protocol; minimize uncertainties and facilitate entry verification

Data manager | Serve as the basis for designing the database and as the source of the data in it; elicit clear and unambiguous responses, minimizing the amount of free text

An ideal CRF should:

•Request the precise information, and only the information, required by the protocol

•Be simple, quick, unambiguous, and straightforward

•Order questions in a logical sequence

•Have been accepted by all members of the study team

Principles of CRF design

1. Understand the basic questions of the current research

• What are the questions/objectives of the research?

• What is the type of study design?

• What variables will be involved?

• How will the variables be collected?

• How often will the variables be collected?

•Example: in a retrospective cohort study of kidney transplantation in Thailand, researchers would like to study the association between type of donor and risk of graft rejection.

What are the objectives of this research?

•To study the association between type of donor and risk of graft rejection


What is the type of study design?

•Retrospective cohort study

What variables will be involved?

•Type of donor

•Graft status


How will the variables be collected?

•Type of donor was classified as:

- Cadaveric donor (CDKT)

- Living-related donor (LRKT)

•Graft status was classified as:

- Graft rejection

- Graft non-rejection

How often will the variables be collected?

•Type of donor was collected at enrollment.

•Graft status was collected every 6 months during the follow-up period.

Principles of CRF design

2. Consider the timing of data collection

•Decide how many different CRFs should be created to collect the data

•Decide which data should be collected on which form

Example

Data required | Timing of data collection

Characteristics of recipients | Enrollment

Characteristics of donors | Enrollment

Details of kidney transplantation | Enrollment

Graft status after kidney transplantation | Follow-up (FU) every 6 months

1. Enrollment form: ID number

Part I Recipient: [data fields]

Part II Donor: [data fields]

Part III Transplantation: [data fields]

2. Follow-up form: ID number, date of visit

[data fields]

Principles of CRF design

3. Consider the sources of data collection

•Decide how many different CRFs should be created to collect the data.

•Decide which data should be collected on which form.

Example

Data required | Source of data collection

Characteristics of recipients | Recipients

Characteristics of donors | Donors

Details of kidney transplantation | Operating room

Graft status after kidney transplantation | Outpatient clinic

1. Recipient form: ID number, [data fields]

2. Donor form: ID number, [data fields]

3. Transplantation form: ID number, [data fields]

4. Follow-up form: ID number, date of visit, [data fields]

Recommendations

• It is not always best to minimize the number of forms by trying to fit as much as possible onto one page.

• It may be better to have more forms, each with a small amount of data.


Principles of CRF design

4. Specify an identifying (ID) number

•An identifying number is a unique value for each case and should be present on every CRF.

•Hospital number (HN): beware of revealing the patient's identity if it is used as the ID.

•The ID links all data on the different forms together.
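Since the ID is what ties the forms together, a short sketch can show what that linkage looks like once the data are entered. This is a hypothetical Python illustration, not part of the original slides: the form layouts, the field names `donor_type`, `visit_date`, and `graft_status`, and their codes are assumptions. It groups follow-up visits under the matching enrollment record and lists IDs that appear on a follow-up form without an enrollment form, a typical sign of an ID recording error.

```python
from collections import defaultdict

# Hypothetical enrollment form data, keyed by study ID (not the hospital number).
# donor_type codes are assumed: 1 = cadaveric (CDKT), 2 = living-related (LRKT).
enrollment = {
    1: {"donor_type": 1},
    2: {"donor_type": 2},
}

# Hypothetical follow-up form data: one row per visit, each carrying the study ID.
# graft_status codes are placeholders.
follow_up = [
    {"id": 1, "visit_date": "15/06/2019", "graft_status": 2},
    {"id": 1, "visit_date": "15/12/2019", "graft_status": 2},
    {"id": 3, "visit_date": "10/01/2020", "graft_status": 1},  # ID 3 was never enrolled
]


def link_forms(enrollment, follow_up):
    """Attach follow-up visits to enrollment records that share the same ID.

    Returns (linked, orphans): linked maps each enrolled ID to its enrollment
    data plus a "visits" list; orphans lists follow-up IDs with no enrollment form.
    """
    visits = defaultdict(list)
    for row in follow_up:
        visits[row["id"]].append(row)
    linked = {pid: {**data, "visits": visits.get(pid, [])} for pid, data in enrollment.items()}
    orphans = sorted(pid for pid in visits if pid not in enrollment)
    return linked, orphans


linked, orphans = link_forms(enrollment, follow_up)
print(len(linked[1]["visits"]), orphans)  # 2 [3]
```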

Identification and ensuring integrity

•Each page of the CRF should have:

- Patient identification (subject no., CRF no., subject initials)

- Identification of the trial (e.g., code name or number)

- A number or code identifying the center at which the subject was recruited

- Visit number (if applicable)

- Name of the sponsor

- Page number (page n of nn)

Principles of CRF design

5. Structure the sequence of questions

•Related questions should be kept together. In this example, the patient characteristics (sex, height) come first, followed by the two treatment questions:

1. ID _ _ _

2. Sex

1) Male 2) Female 9) Missing

3. Height _ _ _._ _ cm

4. Type of treatment

1) RT 2) Chemo 9) Missing

5. Date of treatment _ _/_ _/_ _ _ _

•Incorrect (X): in the order below, height is placed between the two treatment questions, separating related items:

1. ID _ _ _

2. Sex

1) Male 2) Female 9) Missing

3. Type of treatment

1) RT 2) Chemo 9) Missing

4. Height _ _ _._ _ cm

5. Date of treatment _ _/_ _/_ _ _ _

Question formats

•Questions should be written simply.

•Avoid double-negative questions:

- Avoid: "Is the patient unable to swallow tablets?" (a "No" answer becomes a double negative)

- Prefer: "Does the patient have difficulty swallowing tablets?"

•Use coded tick boxes instead of free writing where possible:

- 0 = No, 1 = Yes

- Use the same coding throughout the rest of the CRF

•Yes/No answer boxes should be aligned in one column to prevent ticking the wrong box.

•State clearly if more than one box can be checked.

Layout

•Easy to read and understand

•Arranged in an orderly and logical fashion

•Looks "good" and "attractive" to encourage careful and accurate completion

Multiple assessments

•Use the same format and sequence for each visit

•This helps the investigator develop a "visit routine"

•It also simplifies database building and data entry

Investigator comments

•Discourage note-writing on the CRF

•A separate "comment page" can be provided instead

Fonts and layout

•Use serif fonts (e.g., Times New Roman)

•Text size around 10-12 point

- 10 point for minor instructions, e.g. (dd/mm/yy)

•Rotate text if needed

Text entries

•Block capitals are easier to read than script

•Provide adequate space for writing

•Using a particular style (e.g., bold) for answers that should all be the same (e.g., Yes) can be useful, for example in inclusion/exclusion questions:

- All "Yes" for inclusion criteria

- All "No" for exclusion criteria

Sections that are completed by the subject

•Text should be at least 10 point

•No medical jargon

•Give examples of how to make entries (e.g., how to write the time format)

•Attractive and easy to use

•Minimize text entry

Principles of CRF design

6. Collecting continuous data

•The correct number of boxes for the answer should be provided.

•Any required decimal points, commas, or other punctuation should be preprinted.

Example. Format for collecting continuous data:

1. ID _ _ _

2. Weight _ _ _ . _ _

3. Height _ _ _ . _ _

4. SBP _ _ _

5. DBP _ _ _

•The units to be used in recording the data should be specified.

Example. Include units of measurement on the form:

1. ID _ _ _

2. Weight _ _ _ . _ _ kg

3. Height _ _ _ . _ _ cm

4. SBP _ _ _ mmHg

5. DBP _ _ _ mmHg

•The investigator should be in no doubt about the units of measurement (e.g., cm, m, ft, or in).

Principles of CRF design

•Avoid grouping continuous data at the time of data collection.

3a. Age at enrollment (grouped: avoid)

1) 15-24

2) 25-35

3) 36-45

4) > 45

3b. Age at enrollment _ _ years (preferred: record the exact value)
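One reason to record the exact value: grouped categories can always be derived from it later, but the exact age cannot be recovered from a group. The sketch below is a hypothetical Python helper that reproduces the cut-offs of option 3a from exact ages in whole years; the function name and the choice to return `None` for ages below 15 are assumptions.

```python
def age_group(age):
    """Return the option-3a category (1-4) for an exact age in whole years.

    Ages below 15 have no category in 3a, so None is returned for them
    (an assumption; the form itself does not say).
    """
    if age < 15:
        return None
    if age <= 24:
        return 1  # 15-24
    if age <= 35:
        return 2  # 25-35
    if age <= 45:
        return 3  # 36-45
    return 4      # > 45


exact_ages = [37, 22, 47, 24, 29]
print([age_group(a) for a in exact_ages])  # [3, 1, 4, 1, 2]
```

A different grouping (for example, 10-year bands) needs only a different function; the reverse, recovering 37 from category 3, is impossible.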

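Principles of CRF design

•Do not make any calculations before data entry. Why?

•Manual calculations can introduce errors and take more time.

•Derived values can be calculated later in a statistical program.

Example: record weight and height only; BMI should not be calculated on the form.

Weight (kg) _ _ . _ _

Height (cm) _ _ _

BMI (kg/m2) _ _ . _ _ (do not calculate on the form)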
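Following the same principle, BMI can be derived after entry from the recorded weight and height. A minimal Python sketch, assuming weight in kg and height in cm as on the form above (the function name `bmi` is illustrative):

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2, computed from raw weight and height."""
    height_m = height_cm / 100  # the form records height in cm
    return weight_kg / height_m ** 2


print(round(bmi(56, 167), 2))  # 20.08
```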

Principles of CRF design

7. Collecting categorical data

•All possible categories of categorical variables should be displayed on the form.

Please circle the correct answer

What is your sex?

Male

Female

Principles of CRF design

7. Collecting categorical data

•Numerical codes should be assigned to all possible categories.

What is your sex?

Male………………………1

Female…………………..2

Principles of CRF design

•Coding conventions should be consistent for all data items.

•For example, 1 = yes, 2 = no for all yes/no answers.

Example:

Underlying disease

• DM 1. yes 2. no

• HT 1. yes 2. no

• Stroke 1. yes 2. no

• CVD 1. yes 2. no
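A single shared code list makes this convention easy to enforce once the data are in a computer. This hypothetical Python codebook applies the same 1 = yes, 2 = no coding to all four underlying-disease items and rejects any other code; the lowercase variable names are assumptions, and the missing code (9) is left out only because this example form does not include it.

```python
# One shared code list reused for every yes/no item.
YES_NO = {1: "yes", 2: "no"}

# Hypothetical variable names for the underlying-disease items.
CODEBOOK = {"dm": YES_NO, "ht": YES_NO, "stroke": YES_NO, "cvd": YES_NO}


def label_record(record):
    """Translate coded values into labels using the codebook.

    Raises ValueError for a code that is not in the variable's code list
    (and KeyError for a variable that is not in the codebook at all).
    """
    labelled = {}
    for var, code in record.items():
        labels = CODEBOOK[var]
        if code not in labels:
            raise ValueError(f"{var}: illegal code {code!r}")
        labelled[var] = labels[code]
    return labelled


print(label_record({"dm": 1, "ht": 2, "stroke": 2, "cvd": 1}))
# {'dm': 'yes', 'ht': 'no', 'stroke': 'no', 'cvd': 'yes'}
```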

Principles of CRF design

8. Code for missing data

•It is bad practice to leave a data collection field blank on the CRF, because it can lead to confusion at data entry time (a blank does not show whether the value is truly missing or was simply overlooked).

•Special codes should be assigned to missing values at the time of data collection.

Principles of CRF design

8. Code for missing data

•Missing data codes should not be possible valid values.

•It is common practice to use 9, 99, 999, and so on to denote missing data.

Example:

Age _ _ _ years (missing = 999)

Height _ _ _ . _ _ cm (missing = 999.99)

Sex

1. male 2. female 9. missing

Stage of cancer

1. I 2. II 3. III 9. missing
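Before analysis, missing-data codes have to be converted back into true missing values; otherwise a code such as 999 would be averaged as if it were a real age. A sketch in Python, assuming the codes from the example above and hypothetical variable names:

```python
# Missing-data codes from the example form above; variable names are hypothetical.
MISSING_CODES = {"age": 999, "height": 999.99, "sex": 9, "stage": 9}


def recode_missing(record):
    """Return a copy of the record with each missing-data code replaced by None."""
    return {
        var: None if var in MISSING_CODES and value == MISSING_CODES[var] else value
        for var, value in record.items()
    }


record = {"age": 999, "height": 165.5, "sex": 9, "stage": 2}
print(recode_missing(record))
# {'age': None, 'height': 165.5, 'sex': None, 'stage': 2}
```

Because the codes can never be valid values, this recoding cannot touch real data, which is exactly why the codes must lie outside the valid range.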

Principles of CRF design

9. Collecting dates

•It is important to clearly identify the date format to be used, for example:

- Day, Month, Year (dd/mm/yyyy)

- Month, Day, Year (mm/dd/yyyy)
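The same written date can mean two different days, which is why the format must be stated on the form. A short Python illustration using the standard `datetime.strptime` function (the helper name `parse_date` is hypothetical):

```python
from datetime import datetime


def parse_date(text, fmt):
    """Parse a date string using an explicitly stated format."""
    return datetime.strptime(text, fmt).date()


entry = "03/08/1963"
print(parse_date(entry, "%d/%m/%Y"))  # 1963-08-03 (3 August 1963)
print(parse_date(entry, "%m/%d/%Y"))  # 1963-03-08 (8 March 1963)

# An impossible date is rejected rather than silently accepted.
try:
    parse_date("31/02/2020", "%d/%m/%Y")
except ValueError as err:
    print("invalid:", err)
```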

Principles of CRF design

9. Collecting dates

•It is important to clearly identify the year format to be used, for example:

- Western (dd/mm/20yy)

- Buddhist (dd/mm/25yy)
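Thai Buddhist Era (BE) years are 543 years ahead of Western (Gregorian) years, so a date written as dd/mm/25yy has to be converted before it is combined with Western dates. A minimal Python sketch; the helper name and the sanity threshold of 2400 for recognizing a BE year are assumptions:

```python
from datetime import date

BE_OFFSET = 543  # Thai Buddhist Era year = Western (Gregorian) year + 543


def parse_be_date(text):
    """Parse a 'dd/mm/yyyy' string whose year is in the Buddhist Era.

    The 2400 threshold is a hypothetical sanity check that catches a
    Western year (e.g. 2019) entered by mistake.
    """
    day, month, year = (int(part) for part in text.split("/"))
    if year < 2400:
        raise ValueError(f"{year} does not look like a Buddhist Era year")
    return date(year - BE_OFFSET, month, day)


print(parse_be_date("15/05/2562"))  # 2019-05-15
```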

Example of weak CRF design

1. Have you ever been diagnosed with DM?

1. Yes 2. No 9. Missing

For female: if yes, answer the following questions

2. Did you have DM before pregnancy?

1. Yes 2. No 9. Missing

3. Did you have DM during pregnancy?

1. Yes 2. No 9. Missing

4. Have you ever taken medication for DM?

1. Yes 2. No 9. Missing

Example of strong CRF design

1. Have you ever been diagnosed with DM?

1. Yes 2. No 9. Missing

If yes, answer question 2.

2. Have you ever taken medication for DM?

1. Yes 2. No 9. Missing

If you are female and have been pregnant, answer questions 3 and 4; otherwise, go to question 5.

3. Did you have DM before pregnancy?

1. Yes 2. No 9. Missing

4. Did you have DM during pregnancy?

1. Yes 2. No 9. Missing

Example of weak CRF design

Have you ever taken medications for osteoporosis?

Calcium □ Start date _ _/_ _/_ _ _ _

Vitamin D □ Start date _ _/_ _/_ _ _ _

Calcitonin □ Start date _ _/_ _/_ _ _ _

Hormone □ Start date _ _/_ _/_ _ _ _


Example of strong CRF design

Have you ever taken medications for osteoporosis?

Calcium 1. Yes 2. No 9. Missing

If yes, specify start date _ _/_ _/25 _ _

Vitamin D 1. Yes 2. No 9. Missing

If yes, specify start date _ _/_ _/25 _ _

Calcitonin 1. Yes 2. No 9. Missing

If yes, specify start date _ _/_ _/25 _ _

Hormone 1. Yes 2. No 9. Missing

If yes, specify start date _ _/_ _/25 _ _

Recommendations

•The quality of the data recorded decreases when the amount of data required increases.

• It is important to take time over the design and development of the forms because the design of CRF has a direct impact on the quality of data.


Recommendation

•Collecting data without CRFs is likely to result in incomplete and invalid data.

Database Design & Testing

Pawin Numthavaj M.D.

Section for Clinical Epidemiology and Biostatistics

Faculty of Medicine Ramathibodi Hospital


Definition

•A database consists of an organized collection of data for one or more purposes, typically in digital form.


Database File

[Figure: a database file drawn as a stack of case records (Id: 1 to Id: 5); each case holds the same variables (date of birth, age, sex, weight, height). Labels: Variables, Case, File.]

Data set for database file

Id | Date of birth | Age | Sex | Weight (kg) | Height (cm)

1 | 12/12/1973 | 37 | M | 56 | 167

2 | 10/11/1988 | 22 | M | 78 | 178

3 | 03/08/1963 | 47 | F | 45 | 158

4 | 14/09/1986 | 24 | M | 67 | 169

5 | 23/10/1981 | 29 | F | 41 | 155

Database Management System (DBMS)

•A DBMS is a set of computer programs that performs a wide range of operations, such as:

- Creating new files

- Entering new records

- Sorting, searching, and editing

- And so on

DBMS software packages

•There are many different DBMS software packages, for example:

- Microsoft Access

- dBase

- Paradox

- EpiData

- And so on

Reasons for using EpiData

•Specially written for use in research studies.

•Easy to use

•Free

•Small program

•Can export data to Stata/SPSS formats

Where to get EpiData

•http://www.Epidata.dk/download.php#ee


Overview of EpiData

•The EpiData screen has a standard Windows layout with one menu line and two toolbars.

[Figure: EpiData main window, labeled: menu line, work process toolbar, editor toolbar.]

Work process toolbar

1. Define Data

2. Make Data File

3. Checks

4. Enter Data

5. Document

6. Export Data


Process of creating a database file with EpiData

Step | File created

Define data | QuEStionnaire file (.qes)

Make data file | RECord file (.rec)

Add/revise checks | CHecK file (.chk)

Figure 1. Flowchart for creating a database file in EpiData: Define data (.QES file) → Make data file (.REC file) → Add checks (.CHK file)

1. Define data: QES files

[Figure: a QES file, with labels for variable name, variable label, and variable type.]

Variable names

•Must not exceed 8 characters.

•Must not contain spaces or punctuation.

•Must begin with a letter, not a number.

•Can contain any sequence of letters and digits.

•Can be upper or lower case.

Examples of illegal variable names

Variable name | Reason

1date | Begins with a number

Last name | Contains a space

countryoforigin | Longer than 8 characters
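The naming rules above are mechanical enough to check automatically. The following Python sketch encodes them exactly as stated on the slide (EpiData's own limits may differ between versions) and reports the first rule a name breaks; the function name is hypothetical.

```python
import re


def check_variable_name(name):
    """Return None if the name follows the slide's rules, else the first rule broken."""
    if not re.match(r"[A-Za-z]", name):
        return "does not begin with a letter"
    if not re.fullmatch(r"[A-Za-z0-9]+", name):
        return "contains a space or punctuation"
    if len(name) > 8:
        return "longer than 8 characters"
    return None


for name in ["1date", "Last name", "countryoforigin", "dateb"]:
    print(f"{name!r}: {check_variable_name(name)}")
# '1date': does not begin with a letter
# 'Last name': contains a space or punctuation
# 'countryoforigin': longer than 8 characters
# 'dateb': None
```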

Variable labels

•A "note" that explains the variable name

•Makes the data easier for others to understand

•For example:

- Variable name: dateb

- Variable label: date of birth

Variable types

•The variable type indicates the kind of data a variable holds, such as:

- Text

- Numeric

- Date

- And so on

Variable types: Text

•Text variables are used for holding data consisting of letters and/or numbers

•You can enter numbers into text variables, but you cannot perform any calculations with them

Variable types: Numeric

•Hold numerical information

•Can be used for continuous or categorical data

•Can hold integers or real numbers

Variable types: Date

•Date variables are used for holding dates.

•You can perform simple arithmetic with them, such as addition or subtraction (for example, subtracting one date variable from another).

Variable types: Date

•The advantage of using date variables is that EpiData will only allow you to enter valid dates.

•EpiData also has a special type of date variable that is updated each time a record is changed.
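Date arithmetic is what lets age at enrollment be derived from the date of birth instead of being calculated by hand. A Python sketch using the standard `datetime.date` type; the enrollment date is hypothetical, and counting age in completed years is an assumption about how age is defined:

```python
from datetime import date


def age_in_years(birth, on):
    """Age in completed years on a given date."""
    years = on.year - birth.year
    # One year less if the birthday has not yet been reached in that year.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


dob = date(1973, 12, 12)
enrolled = date(2011, 6, 1)         # hypothetical enrollment date
print((enrolled - dob).days)        # subtracting two dates gives a difference in days
print(age_in_years(dob, enrolled))  # 37
```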

Examples of variable types

Variable | Type

ID | Numeric

Date of birth | Date

Age at enrollment | Numeric

Sex | Text

Do you have any underlying diseases? | Numeric

Specify medications | Numeric

Variable length

•The length of a variable defines how much data it can hold.

•A text variable with length 10 can hold up to ten letters or numbers.

•A numeric variable with length 3 can hold numbers between -99 and 999 (a minus sign also occupies one position).

•The length of a variable must correspond to the maximum anticipated number of letters and/or digits.

Specify variable type and length

Type | EpiData definition

Text | ________ (one underscore per character)

Numeric | ### or ###.##

Date | <dd/mm/yyyy>, <mm/dd/yyyy>

Today's date | <today-dmy>, <today-mdy>
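The rule that a length-3 numeric field holds -99 to 999 can be generalized to the `###` and `###.##` templates in the table above. The Python sketch below is that generalization, not EpiData's documented algorithm: it assumes a minus sign takes one of the integer positions and that the decimal part adds the largest fraction its digits allow. The function name is hypothetical.

```python
def numeric_range(template):
    """Return (smallest, largest) value a numeric field template can hold.

    Assumption generalized from the length-3 example (-99 to 999):
    a minus sign occupies one of the integer positions.
    """
    int_part, _, dec_part = template.partition(".")
    n_int, n_dec = len(int_part), len(dec_part)
    if n_int == 0 or set(int_part + dec_part) != {"#"}:
        raise ValueError(f"unsupported template: {template!r}")
    if n_dec == 0:
        return -(10 ** (n_int - 1) - 1), 10 ** n_int - 1
    fraction = (10 ** n_dec - 1) / 10 ** n_dec  # e.g. 0.99 for two decimals
    largest = 10 ** n_int - 1 + fraction
    smallest = -(10 ** (n_int - 1) - 1 + fraction)
    return round(smallest, n_dec), round(largest, n_dec)


print(numeric_range("###"))     # (-99, 999)
print(numeric_range("###.##"))  # (-99.99, 999.99)
```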


2. Make data file

•The second step is to create the database file based on the database structure.

•The Make Data File function is used to create a record (.REC) file from the questionnaire (.QES) file.

Summary

•At the end of this step, you can enter the data set into the database file.


Interactive checking

•Interactive checking means checking for errors during data entry.

•Interactive checking is useful for picking up typing errors.

•This step can be done using EpiData's check functions.

Data validation

•This involves the data being entered twice, into separate files, by different people.

•The resulting files are then compared with each other to see whether they are the same.
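The comparison step can be sketched in Python for two entry files loaded as lists of records. The field names and values below are hypothetical, and the sketch assumes IDs are unique within each file:

```python
def compare_entries(first, second, id_field="id"):
    """Compare two independently entered files record by record.

    Returns (discrepancies, unmatched): discrepancies is a list of
    (id, variable, value_in_first, value_in_second); unmatched lists IDs
    found in only one of the two files.
    """
    a = {rec[id_field]: rec for rec in first}
    b = {rec[id_field]: rec for rec in second}
    unmatched = sorted(a.keys() ^ b.keys())
    discrepancies = []
    for rid in sorted(a.keys() & b.keys()):
        for var in sorted(a[rid].keys() | b[rid].keys()):
            v1, v2 = a[rid].get(var), b[rid].get(var)
            if v1 != v2:
                discrepancies.append((rid, var, v1, v2))
    return discrepancies, unmatched


entry_1 = [
    {"id": 1, "sex": 1, "weight": 56},
    {"id": 2, "sex": 1, "weight": 78},
]
entry_2 = [
    {"id": 1, "sex": 1, "weight": 65},  # transposed digits
    {"id": 2, "sex": 1, "weight": 78},
    {"id": 3, "sex": 2, "weight": 45},  # entered only by the second person
]
print(compare_entries(entry_1, entry_2))
# ([(1, 'weight', 56, 65)], [3])
```

Each reported discrepancy is then resolved by going back to the paper CRF, not by choosing one of the two entries.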


Interactive checking functions

•EpiData provides functions for interactive checking, such as:

- Must-enter variables

- Range and legal values

- Attaching value labels to variables

- Repeated variables

- Conditional jumps

- Programmed checks

These range from basic checks (such as must-enter variables) to advanced checks (such as programmed checks).
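To show what these checks do during entry, here is a hypothetical Python check specification covering must-enter fields, ranges, legal values, an accepted missing code, and a conditional jump. It is not EpiData's .CHK syntax; the field names, the age range of 15 to 120, and the rule that `dmdrug` is only asked when `dm` = 1 are assumptions modeled on the DM example earlier in this document.

```python
# Hypothetical check specification (not EpiData .CHK syntax).
CHECKS = {
    "age": {"must_enter": True, "range": (15, 120), "missing": 999},
    "sex": {"must_enter": True, "legal": {1, 2, 9}},
    "dm": {"must_enter": True, "legal": {1, 2, 9}},
    "dmdrug": {"legal": {1, 2, 9}},  # only asked when dm == 1 (assumed jump)
}


def check_record(rec):
    """Return a list of error messages for one record; an empty list means it passes."""
    errors = []
    for field, rules in CHECKS.items():
        value = rec.get(field)
        if value is None:
            if rules.get("must_enter"):
                errors.append(f"{field}: must be entered")
            continue
        if value == rules.get("missing"):
            continue  # the declared missing-data code is always accepted
        low_high = rules.get("range")
        if low_high and not low_high[0] <= value <= low_high[1]:
            errors.append(f"{field}: {value} outside range {low_high}")
        if "legal" in rules and value not in rules["legal"]:
            errors.append(f"{field}: {value} is not a legal value")
    # Conditional jump: if dm is not 1 (yes), entry should skip past dmdrug.
    if rec.get("dm") != 1 and rec.get("dmdrug") is not None:
        errors.append("dmdrug: should have been skipped because dm is not 1")
    return errors


print(check_record({"age": 37, "sex": 1, "dm": 1, "dmdrug": 2}))  # []
print(check_record({"age": 150, "sex": 3, "dm": 2, "dmdrug": 1}))
```

In EpiData the equivalent checks run while each field is typed, so errors are caught with the CRF still at hand; this sketch only mimics the logic on a finished record.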

Something to consider if you do not want to use database software

•You could use spreadsheet software such as Excel to enter data.

•However, please consider the following restrictions when preparing data:

1. Prepare data in a table format in which each row corresponds to one individual.

2. Variable names should be in English and should not contain special characters such as %, &, +, !, or spaces. You can use the underscore (_).

3. Do not enter text that is not data, such as comments, directly into the table. Use Excel's comment function instead (or put it somewhere else).

4. Do not use cell color to code data. Statistical programs cannot see the difference between color-coded rows.

5. Where possible, record data as categories and use numbers to label the categories (e.g., 1/2 instead of Male/Female).

6. If no data were collected, do not type anything; leave the cell blank.
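Some of these rules can be checked automatically after exporting the sheet to CSV. A Python sketch using the standard `csv` module: it checks rule 2 (variable names) and, assuming the first column holds the individual's ID, rule 1 (one row per individual). Requiring names to start with a letter is an additional assumption. Blank cells are simply read as empty strings, consistent with rule 6.

```python
import csv
import io
import re

# Assumption: a name starts with an English letter, followed by English
# letters, digits, or underscores (no spaces, %, &, +, ! and so on).
NAME_OK = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_sheet(csv_text):
    """Check spreadsheet data exported as CSV; return a list of problems.

    Assumes the first column holds the individual's ID (one row per person).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["empty sheet"]
    header, data = rows[0], rows[1:]
    problems = [f"bad variable name: {name!r}" for name in header if not NAME_OK.fullmatch(name)]
    seen = set()
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: {len(row)} cells, expected {len(header)}")
            continue
        if row[0] in seen:
            problems.append(f"line {line_no}: duplicate ID {row[0]}")
        seen.add(row[0])
    return problems


sheet = "id,sex,weight_kg,bp%\n1,1,56,\n2,2,,120\n2,1,67,118\n"
for problem in check_sheet(sheet):
    print(problem)
# bad variable name: 'bp%'
# line 4: duplicate ID 2
```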

Thank you
