Post on 11-Jan-2016

1

Data Management (1)

"Application of Information and Communication Technology to Production and Dissemination of Official Statistics"
10 May – 11 July 2006

M Q Hasan
Lecturer/Statistician
UN Statistical Institute for Asia and the Pacific
Chiba, Japan
Email: hasan@unsiap.or.jp

2

Overview

Data management
Data management planning
Data management procedures
Data management software
Hands-on experience
References

3

Data management and the NSO

Data management during production
– Individual case
Data management after production
– Individual case
Data management
– All cases, long term

4

Data management

Management of data files
Management of files during analysis
Management of files afterwards

5

Data management

Management of data files
– Labeling data files
– Documentation

6

Data management

Management of files during analysis
– Version management
– Subset data
– Arrange files in different folders
– Index files

7

Data management

Management of files afterwards
– Pass them to the system administrator for future reference

8

DATA MANAGEMENT

POSSIBLE DIRECTORY STRUCTURE

[Diagram: folder tree rooted at MY_FILES, with subject folders HEALTH and EDUCATION and subfolders Data and TABLES]

9

These will lead to …

Production of credible data
Design of a robust, efficient, and flexible storage and access system
Efficient procedures for sharing data with others

10

Data management before and during data processing

11

During DP planning:

Define the relevant aspects of a dataset.
Formulate a data preservation strategy.
Design an access procedure.

12

Defining the relevant aspects of a dataset

File format and file structure
Naming files
Creation and naming of variables
Variable labels

13

Defining the relevant aspects of a dataset

Choose the file structure according to the available computing resources and the experience of the data processors.

14

Defining the relevant aspects of a dataset

Documentation
– Assign responsibility for logging all processing activities
– Problems encountered
– How problems are to be solved
– Major decisions taken

15

DP: Documentation

Can be time consuming.
Should contain all information about the data, such as survey method, sample information, time of collection, information about variables, missing values, etc.
Should start well before actual data processing.
Follow standards.
Preferably one file with references to other files.

16

DP: Documentation

Title: Child labour in Portugal: Social characterization of school-age children and their families, 1998.
Subtitle: Child labour in Portugal, 1998.
Alternative title: SIMPOC Portugal survey, 1998.
Parallel title: Trabalho Infantil em Portugal: Caracterização social dos menores em idade escolar e suas famílias, 1998 files.

17

DP: Documentation

Keywords. National survey, child, economic activity, child labour, household, household chores, etc.
Abstract. Purpose, nature, and scope of the child labour data collection. Special characteristics of the contents, etc.
Time period covered. If the data were collected in 1999 and one question was "Did you work last year?", the time period should be 1998-99.

18

DP: Documentation

Date of collection. Date(s) when the data were collected.
Country. Name of the country where the survey was conducted.
Geographic coverage. Total geographic scope of the data.
Geographic unit. Lowest level of geographic aggregation covered by the data, for example province, state, or district.
Unit of analysis. For most child labour surveys, the basic unit of analysis or observation is the individual person.

19

DP: Documentation

Time method. Panel, cross-sectional, trend, time-series, etc.
Data collector. The body responsible for administering the questionnaire or interview, or for compiling the data, e.g. the NSO.
Frequency of data collection. For example, first-time.
Sampling procedure. Reference to sampling documents.

20

DP: Documentation

Mode of data collection. CAPI, CATI, etc.
Type of research instrument. Structured, semi-structured, open-ended questions, etc.
Actions to minimize losses. E.g. follow-up visits, supervisory checks, historical matching, etc.
Control operations. Methods used to facilitate data control.

21

DP: Documentation

Weighting. Reference to the appropriate document.
Cleaning operation. E.g. consistency checking, wild code checking, etc.
Response rate. Percentage of sample members who provided information.
Estimates of sampling error. Indication of how precisely one can estimate a population value from a given sample.

22

DP: Documentation

Location. Say where the data are currently stored (e.g. a national statistics office).
Availability status. Provide a statement of data availability.
Extent of data. Number of physical files that exist in a dataset.
Completeness of dataset. Describe whether any items of collected information were not included in the data file.

23

DP: Documentation

Access authority. Contact person or organization that controls access to the data collection.
Data use statement. Reference to the terms of use for the data collection, if any.
Citation requirement. Specify any text that should be cited in publications based on analysis of the data.

24

DP: Documentation

File contents. Short description of the file(s).
File structure. E.g. hierarchical, rectangular, relational, etc.
Record or record group. Describe the record groupings for hierarchical or relational files.
Label (of record). Detailed information for each record group.
Dimensions (of record). Physical characteristics of the record, such as number of variables per record, number of cases, etc.

25

DP: Documentation

Overall case count. Number of cases or observations.
Overall variable count. Number of variables.
Data format. Delimited format, free format, software dependent, etc.
Missing data. Provide information such as that missing data have been standardized across the collection, that missing data are the result of merging, etc.
Software. Identify the software used to create the file, including the software version number.
Version statement. Version statement for the data file.

26

DP: Documentation

List of variables, with the following for each:
– whether the variable is a weight and, if not, the reference weight variable for it;
– question ID for the variable;
– which format has been used (e.g. SAS, SPSS);
– the number of decimal places in the variable;
– whether the options are discrete or continuous;
– which record type the variable belongs to.
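The variable-level attributes listed on this slide could be kept as one structured entry per variable. The following is a minimal sketch, not a standard codebook format: the class name `VariableEntry`, its field names, and the sample variable are all hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VariableEntry:
    """One row of a variable list; all field names are illustrative assumptions."""
    name: str
    question_id: str                       # question ID for the variable
    record_type: str                       # record type the variable belongs to
    storage_format: str                    # package format used, e.g. "SPSS" or "SAS"
    decimals: int                          # number of decimal places
    is_discrete: bool                      # True = discrete options, False = continuous
    is_weight: bool = False                # True if the variable is itself a weight
    weight_variable: Optional[str] = None  # reference weight when is_weight is False

    def problems(self) -> List[str]:
        """Return human-readable gaps in this entry (empty list if none)."""
        found = []
        if self.decimals < 0:
            found.append(f"{self.name}: decimals cannot be negative")
        if not self.is_weight and self.weight_variable is None:
            found.append(f"{self.name}: no reference weight variable given")
        return found


# Hypothetical example entry for an "age" variable weighted by "hh_weight".
age = VariableEntry("age", "Q3", "person", "SPSS", 0, True,
                    weight_variable="hh_weight")
```

Calling `problems()` on every entry before release gives a quick check that no variable is documented without its weight reference.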

27

DP

Conversion of data files to other formats as required

Data files are usually generated in a package-specific format.
Convert data into other formats, if possible.
Convert data into ASCII and generate a codebook.
Reload the ASCII data using the same codebook.
Recheck the data.
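The convert, reload, and recheck steps on this slide can be sketched as a round trip. This is only an illustration under stated assumptions: the data are already in memory as a list of dictionaries, the codebook is a simple JSON list of variable names and types, and the function names (`export_ascii`, `reload_ascii`, `recheck`) are hypothetical. Reading a real package-specific file would need that package's own export tools.

```python
import csv
import json
from pathlib import Path


def export_ascii(rows, codebook, data_path, codebook_path):
    """Write rows as delimited ASCII text and save the codebook alongside."""
    names = [var["name"] for var in codebook]
    # encoding="ascii" raises an error if any value is not plain ASCII.
    with open(data_path, "w", newline="", encoding="ascii") as f:
        writer = csv.DictWriter(f, fieldnames=names)
        writer.writeheader()
        writer.writerows(rows)
    Path(codebook_path).write_text(json.dumps(codebook, indent=2), encoding="ascii")


def reload_ascii(data_path, codebook_path):
    """Reload the ASCII file, using the saved codebook to restore types."""
    codebook = json.loads(Path(codebook_path).read_text(encoding="ascii"))
    casts = {"int": int, "float": float, "str": str}
    with open(data_path, newline="", encoding="ascii") as f:
        return [
            {var["name"]: casts[var["type"]](raw[var["name"]]) for var in codebook}
            for raw in csv.DictReader(f)
        ]


def recheck(original, reloaded):
    """Return the indexes of cases whose values changed during conversion."""
    if len(original) != len(reloaded):
        raise ValueError("case count changed during conversion")
    return [i for i, (a, b) in enumerate(zip(original, reloaded)) if a != b]
```

An empty list from `recheck` means every case survived the conversion unchanged; any index it returns points at a case to inspect by hand.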

28

DATA MANAGEMENT

Storage of all files

Possible list/type of files
– Data in a package-specific format
– Data in ASCII with the necessary data dictionary
– Public use data
– Public use data in ASCII with the necessary data dictionary
– Final documentation
– Questionnaire

29

DATA MANAGEMENT

Storage of all files

Possible list/type of files (contd.)
– Logical rules for consistency checks
– Computer program files
– Interviewer and/or supervisor's instruction manual
– Coding file(s)
– Sampling and weight files

30

DATA MANAGEMENT

Storage of all files

Group them by version, type, etc.
Create an index file for each sub-directory.
In the index file, add a short description of each file according to its contents.
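One way to keep an index file in each sub-directory is to generate it from a table of descriptions. The sketch below is an assumption-laden illustration: the file name `INDEX.txt`, the tab-separated layout, and the function names are hypothetical, and the descriptions themselves still have to be written by the data processors.

```python
from pathlib import Path

INDEX_NAME = "INDEX.txt"  # hypothetical index file name


def write_index(directory, descriptions):
    """Write an index file listing every file in `directory` with a short description."""
    directory = Path(directory)
    files = sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and p.name != INDEX_NAME
    )
    # Files without a description are flagged rather than silently skipped.
    lines = [f"{name}\t{descriptions.get(name, 'NO DESCRIPTION YET')}" for name in files]
    (directory / INDEX_NAME).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def index_tree(root, descriptions_by_dir):
    """Create one index file in `root` and in each sub-directory below it."""
    root = Path(root)
    for directory in [root, *sorted(p for p in root.rglob("*") if p.is_dir())]:
        key = directory.relative_to(root).as_posix()  # "." for the root itself
        write_index(directory, descriptions_by_dir.get(key, {}))
```

Rerunning `index_tree` after files are added refreshes every index and makes undocumented files visible through the placeholder text.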

31

DATA MANAGEMENT

Formulating a data preservation strategy

Hardware
Automation software
Directory structure

32

DATA MANAGEMENT

POSSIBLE DIRECTORY STRUCTURE

CLS
    INTERNAL
        Ver_1
        Ver_2
    EXTERNAL

33

DATA MANAGEMENT

POSSIBLE DIRECTORY STRUCTURE (contd.)

Ver_1
    Index file
    Data
        Index file
        Data file 1
        Data file 2
        Data file 3
        Codebook
    Document
        Index file
        Metadata file
        Program file
        Questionnaire
        Etc.
    Report
        Index file
        Country profile
        Country report
        Other reports

34

DATA MANAGEMENT

POSSIBLE DIRECTORY STRUCTURE (contd.)

EXTERNAL
    Index file
    Data
        Index file
        Data file 1
        Data file 2
        Codebook
    Document
        Index file
        Metadata file
        Manual file
        Questionnaire
    Report
        Index file
        Country profile
        Country report
        Other reports
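A folder layout like the one on these slides can be created in one step so that every survey starts from the same skeleton. This is a rough sketch only: the folder names come from the slide labels, but the nesting (versions under INTERNAL, and Data, Document, and Report under each version and under EXTERNAL) is an assumed reading of the diagrams, and `build_tree` is a hypothetical helper name.

```python
from pathlib import Path

# Folder names follow the slide labels; the nesting is an assumed reading of the diagram.
SUBFOLDERS = ("Data", "Document", "Report")


def build_tree(root, versions=("Ver_1", "Ver_2")):
    """Create the folder skeleton under `root` and return the relative paths made."""
    root = Path(root)
    created = []
    targets = [root / "INTERNAL" / v for v in versions] + [root / "EXTERNAL"]
    for parent in targets:
        for sub in SUBFOLDERS:
            folder = parent / sub
            # exist_ok=True makes the call safe to repeat on an existing tree.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder.relative_to(root).as_posix())
    return created
```

For example, `build_tree("CLS")` would create `CLS/INTERNAL/Ver_1/Data` through `CLS/EXTERNAL/Report`; passing a longer `versions` tuple adds further version folders without touching existing ones.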

35

DATA MANAGEMENT

Designing an access procedure

Access policy
– Safekeeping person: system administrator
– Contact person: supervisor
– Content modifying authority: supervisor
– Finalize access conditions for each file

36

DATA DISSEMINATION

Data type

Micro data
Aggregate tables
Executive summary
Reports

37

DATA DISSEMINATION

Methods

Online: direct access through the internet in real time
Offline: available on request

38

DATA MANAGEMENT

Designing an access procedure

Backup policy
– During data processing: data processors' responsibility
– After finalization of data and documentation: system administrator's responsibility

39

END
