Introduction to data management, terminologies and use of data management platforms


Uploaded by international-institute-of-tropical-agriculture on 16-Apr-2017


www.iita.org | A member of CGIAR consortium

Introduction to data management, terminologies and use of data management platforms

Workshop on Management and Analyses of ISFM Data

Monday, May 25, 2015


Data management

"Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets."

(DAMA, Data Management Association International)


Data management

Objective:
• To maximize the potential of data while integrating them into business processes

Topics:
• Data quality
• Data security
• Data organization


Data management principles: Data quality

• Data are correct
• Data are consistent (uniform in content, content structure, notation, units, methods used, meaning and language)
• Data are complete
• Data are up to date
• Data are relevant
• Data are precise enough
• Datasets are free of redundancies
• Data are reliable and comprehensible
• Data are understandable by all involved users and processable by machines
• Data are unambiguous/explicit
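Several of these quality principles can be checked automatically before data are shared. The sketch below is a minimal Python illustration, assuming records arrive as a list of dictionaries; the field names (`plot_id`, `yield`, `unit`) and the specific checks are hypothetical choices, not taken from the workshop material. It tests completeness, freedom from redundancies and consistency of units.

```python
from collections import Counter

# Hypothetical required fields for an illustrative field-trial dataset.
REQUIRED_FIELDS = ("plot_id", "yield", "unit")


def check_quality(records):
    """Return a list of problems found in `records` (a list of dicts).

    Covers three of the principles: completeness, freedom from
    redundancies and consistency of units.
    """
    problems = []

    # Completeness: every required field must be present and non-empty.
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")

    # Redundancy: identical records should appear only once.
    counts = Counter(tuple(sorted(rec.items())) for rec in records)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    if duplicates:
        problems.append(f"{duplicates} duplicate record(s)")

    # Consistency: all records should use the same unit.
    units = {rec.get("unit") for rec in records if rec.get("unit")}
    if len(units) > 1:
        problems.append(f"inconsistent units: {sorted(units)}")

    return problems


if __name__ == "__main__":
    sample = [
        {"plot_id": "P1", "yield": 2.4, "unit": "t/ha"},
        {"plot_id": "P2", "yield": 2100, "unit": "kg/ha"},
        {"plot_id": "P2", "yield": 2100, "unit": "kg/ha"},
        {"plot_id": "P3", "yield": None, "unit": "t/ha"},
    ]
    for problem in check_quality(sample):
        print(problem)
```

Other principles (correctness, precision, being up to date) usually need domain knowledge and would be added as project-specific rules.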


Data management principles: Data security

• All data are backed up frequently
• No data without access permission control
• The treatment of data with different ownership (e.g. private data) is clarified
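The rule "no data without access permission control" is often implemented as default-deny: nothing is accessible unless a permission was explicitly granted. A minimal Python sketch, assuming a hypothetical in-memory access-control list keyed by dataset and user; the dataset and user names are illustrative, and a real platform such as SharePoint keeps this information in its own permission settings.

```python
# Hypothetical access-control list: dataset -> user -> granted actions.
ACL = {
    "soil_samples_2015": {
        "alice": {"read", "write"},
        "bob": {"read"},
    },
}


def is_allowed(acl, user, dataset, action):
    """Default deny: an action is allowed only if it was explicitly granted."""
    return action in acl.get(dataset, {}).get(user, set())


def grant(acl, user, dataset, action):
    """Record an explicit permission (e.g. on behalf of the data owner)."""
    acl.setdefault(dataset, {}).setdefault(user, set()).add(action)
```

Unknown users and unknown datasets fall through to an empty permission set, so the safe answer ("no access") is the default.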


Data management principles: Data organization

• There is no data without a person responsible for it (clear roles and responsibilities)
• There is no data without a single, clearly defined, easy-to-find and communicated location for it
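Both organization principles can be enforced by a simple registry that refuses a dataset without a responsible person and refuses a second, different location for the same dataset. This is a hypothetical Python sketch; `DataRegistry`, `DatasetEntry` and the example paths are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    responsible: str  # person accountable for the data
    location: str     # the single, agreed storage location


class DataRegistry:
    """Minimal registry enforcing the two data organization principles."""

    def __init__(self):
        self._entries = {}

    def register(self, name, responsible, location):
        if not responsible:
            raise ValueError(f"{name}: no responsible person given")
        if not location:
            raise ValueError(f"{name}: no location given")
        existing = self._entries.get(name)
        if existing is not None and existing.location != location:
            # One dataset, one location: a second copy elsewhere is refused.
            raise ValueError(f"{name}: already stored at {existing.location}")
        self._entries[name] = DatasetEntry(name, responsible, location)
        return self._entries[name]

    def locate(self, name):
        """Tell anyone where the dataset lives."""
        return self._entries[name].location
```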


Main roles in data management

• Data Editor: the person who validates, creates and edits the data
• Data Steward: the person who holds the data and usually takes care of it, ensuring that data consumers obtain exactly the data approved by the data owner
• Data Owner: the person who approves data before they are published for the eventual audience
• Data Consumer: a person who uses the data without editing, correcting or modifying them
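The interplay of the four roles can be sketched as a small approval workflow. In this hypothetical Python class (not the API of any platform), editors change a draft, only the owner can approve it, and consumers only ever receive the last approved version, which is what the steward is meant to guarantee.

```python
class ManagedDataset:
    """Hypothetical dataset illustrating editor, owner, steward and consumer."""

    def __init__(self, owner, editors):
        self.owner = owner
        self.editors = set(editors)
        self._draft = None
        self._approved = None

    def edit(self, user, rows):
        # Only data editors may create or change the working draft.
        if user not in self.editors:
            raise PermissionError(f"{user} is not a data editor")
        self._draft = list(rows)

    def approve(self, user):
        # Only the data owner may release the draft.
        if user != self.owner:
            raise PermissionError(f"{user} is not the data owner")
        if self._draft is None:
            raise ValueError("there is no draft to approve")
        self._approved = list(self._draft)

    def consume(self):
        # Consumers receive a tuple copy of the approved rows only;
        # later unapproved edits never reach them.
        return None if self._approved is None else tuple(self._approved)
```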


Operational levels

• Individual (execution of data activities, self-organization)

• Project/working group (plans and deliveries, rules and responsibilities, workflow and steering, communication, access/permission control, data organization (content management, file order, file naming strategies, templates, project data, …))

• Organization (policies, infrastructure and repositories, resources, …)

• Global (metadata standards, data exchange protocols, vocabularies/ontologies, legal issues, Open Access, …)


[Figure: operational levels, from Global through Organization and Project to Individual]


Data lifecycle

[Figure: data lifecycle with the stages plan, collect, assure, describe, preserve, discover, integrate, analyze and present]

Activities shown around the cycle:
• Interpret data; derive data (apply statistical and analytical methods); produce research outputs; author publications
• Create metadata and documentation
• Identify (tracking); categorize
• Migrate data to a suitable medium
• Back up and store data
• Archive data
• Collect data (experiment, observe, measure, simulate)
• Design research; plan data management (formats, storage, etc.); plan consent for sharing
• Locate existing data
• Enter data; digitize, transcribe, translate
• Check, validate and clean data; anonymize data where necessary; describe data
• Migrate data to the best format
• Locate, explore and understand data; scrutinize findings
• Distribute data; share data; control access; establish copyright; promote data
• Follow-up research; undertake research reviews
• Teach and learn
• Expose metadata through a searchable interface

Source: Boston University Libraries


Data intervention areas

• Data capturing and preprocessing
• Data transfer
• Data flow/content management
• Data storage
• Data analytics
• Data delivery


From capture to delivery


Plan data management

Find answers that ensure all data management principles are respected
• in and across all intervention areas
• at all operational levels

Start planning from the desired outcomes!

[Figure: planning frame spanning data management principles, operational levels and data lifecycle/intervention areas]


Data presentation/publication
• Who are the end users of which data?
• Mode of presentation per information product
• Ease of extracting the right data in the right format for the right (authorized) people
• Automated? Real-time data? Personalized data?
• Consumers' conditions (file formats? communication tools?)
• Ability to search and browse (metadata, tags)
• Presentation mode and conditions (including visualization)
• Licensing


Data transfer
• Transfer format and requirements (is data transformation needed?)
• Transfer initiative (receiver or sender?)
• Transfer mode and instructions
• Transfer compression needs (zip, tar, …); limited internet availability?
• Transfer channels (email, phone, Skype, RSS, etc.)
• Transfer check (e.g. by email)
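Two of the points above, compression and a transfer check, can be combined: the sender compresses the files and sends a checksum along with them, and the receiver recomputes it. A minimal Python sketch using only the standard library (`zipfile`, `hashlib`); the function names are hypothetical, and SHA-256 is one reasonable checksum choice rather than a requirement from the slide.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_for_transfer(files, archive_path):
    """Sender side: compress `files` into a zip archive, return its checksum.

    The checksum travels separately (e.g. in the email body) so the
    receiver can confirm that the archive arrived intact.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=Path(f).name)
    return sha256_of(archive_path)


def verify_transfer(archive_path, expected_checksum):
    """Receiver side: compare checksums, then test every archive member."""
    if sha256_of(archive_path) != expected_checksum:
        return False
    with zipfile.ZipFile(archive_path) as zf:
        # testzip() returns None when all members pass their CRC check.
        return zf.testzip() is None
```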


Data transfer
• Transfer security
• Platform openness
• Authorization controls (user credentials)
• Encryption standards (SSL, S/MIME, etc.)
• Transfer scheduling
• Use of APIs?


Data storage
• Suitable end repository (server folder, SharePoint, MySQL database, cloud-based solution, PC, external repository)
• Suitable data infrastructure hardware (servers, network(s), bandwidth, databases, security facilities, PCs, external hard drives, USB sticks, smartphones/tablets, scanners, field or laboratory sensors with digital data capture, etc.)
• Data categorization, file order and filing criteria
• Data deletion policy and archiving for evidence/documentation purposes
• Data disposal/sharing/access control and administration


Data analytics and data search
• Goal and mode of analysis
• Frequency of data analysis
• Participating units and data integration (business intelligence)
• Storage and backup of analysis results
• Speed of search
• Eventual transition or termination of the data?


Data backup
• Risk assessment: loss/theft/damage/overload/hacker attack…
• Backup mode and regulations
• Backup frequency/scheduling and discipline
• Suitable backup repository (server folder, SharePoint, MySQL database, cloud, PC, external repository, external hard drive, USB stick, etc.)
• Backup tools/software/opportunities to automate
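Backup discipline is easier to keep when it is automated. The following Python sketch, assuming a local or mounted backup folder, makes a timestamped copy, verifies it by checksum and keeps only the most recent copies; the function name, the naming pattern and the default of five retained copies are hypothetical choices. In practice it would be triggered by a scheduler such as cron or Windows Task Scheduler.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def _sha256(path):
    # Reads the whole file; fine for a sketch, chunk it for very large files.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def backup(source, backup_dir, keep=5, now=None):
    """Copy `source` to `backup_dir` under a timestamped name.

    The copy is verified against the original by checksum, and only the
    `keep` most recent copies (keep >= 1) of this file are retained.
    Assumes no unrelated file in `backup_dir` matches the same name pattern.
    """
    source, backup_dir = Path(source), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # ISO-like timestamp without colons, so the name is valid on Windows too.
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)

    if _sha256(source) != _sha256(target):
        target.unlink()
        raise OSError(f"backup of {source} failed verification")

    # Zero-padded timestamps sort chronologically, oldest first.
    copies = sorted(backup_dir.glob(f"{source.stem}_*{source.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```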


Data capturing and preprocessing
• Capturing location and its conditions
• Capturing mode (manual typing, crowdsourcing, data mining, etc.)
• Capturing tools/hardware (PCs, smartphones, tablets, GPS, mobile phones, scanners, etc.)
• Capturing software and requirements (field data capturing tools, scanning and OCR software, etc.)
• Capturing instructions (metadata, data protocols, additional data descriptions, methodological correctness)
• Data validation rules and data checks: ensuring data quality
• Referencing captured data in time and space
• Data structure at capture
• Intermediate storage of captured data
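Validation rules are most effective when applied at the moment of capture, and the reference in time and space can be checked at the same point. A hypothetical Python sketch for one field observation, assuming ISO 8601 timestamps, decimal-degree coordinates and a soil pH reading; the field names and the pH rule are illustrative assumptions, not rules from the workshop.

```python
from datetime import datetime


def validate_observation(obs):
    """Apply simple capture-time rules to one field observation (a dict).

    Expected keys: `timestamp` (ISO 8601 text), `lat` and `lon` (decimal
    degrees) and `ph` (soil pH). Returns a list of rule violations; an
    empty list means the record passes.
    """
    errors = []

    # Referencing in time: the timestamp must parse and not lie in the future.
    try:
        taken = datetime.fromisoformat(obs["timestamp"])
        # Compare like with like: aware timestamps against an aware "now".
        if taken > datetime.now(taken.tzinfo):
            errors.append("timestamp is in the future")
    except (KeyError, TypeError, ValueError):
        errors.append("missing or malformed timestamp")

    # Referencing in space: coordinates must be valid decimal degrees.
    lat, lon = obs.get("lat"), obs.get("lon")
    if not (isinstance(lat, (int, float)) and -90 <= lat <= 90):
        errors.append("latitude missing or out of range")
    if not (isinstance(lon, (int, float)) and -180 <= lon <= 180):
        errors.append("longitude missing or out of range")

    # Plausibility rule: soil pH lies on the 0-14 scale.
    ph = obs.get("ph")
    if not (isinstance(ph, (int, float)) and 0 <= ph <= 14):
        errors.append("pH missing or outside 0-14")

    return errors
```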


Platforms
• MS SharePoint
• CKAN
• aWhere
• Collaboration tools
• File sharing services (Google Drive, Dropbox, FTP server, etc.)


Data mgt. platforms (1)

MS SharePoint
• Fits into an existing Microsoft environment (MS Office, especially Outlook, Excel, Access, Visio and Project; Microsoft server databases; Exchange Server; Skype)
• With proper permission settings, allows creating as many pages, apps or subsites as necessary
• Useful features for data management (metadata tagging, version control, templates (MS Office only), validation rules, linking data lists, workflows (approvals, etc.); many predefined apps come with customizable metadata sets)
• Weak: issues linking to open repositories


Data mgt. platforms (2)

CKAN - "Meta-repository"
• Functional emphasis: de facto standard software for publishing open data (started as a catalogue for harvesting published data to spread knowledge)
• Python-based (DKAN is a PHP counterpart)
• Strength: customizable; data organization; harvesting from multiple repositories
• Weak: no workflow or bulk operations, so processing needs to be done before cataloguing; no collaboration tools; no uploading of multiple resources at a time or batch editing of metadata
• Example: http://data.ilri.org/portal/
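CKAN exposes its catalogue through the CKAN Action API, so datasets can be searched from scripts as well as through the web interface. The Python sketch below builds a `package_search` request and reads the dataset titles from the JSON response. It assumes that a portal installed under a sub-path (such as the ILRI example) serves the API beneath that path; this, and the helper names, are assumptions to check against the portal's own documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL.

    Assumption: the API lives at <base_url>/api/3/action/ for this portal.
    """
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles (falling back to names) from a response body."""
    payload = json.loads(response_text)
    # CKAN marks failed calls with "success": false; they usually also carry
    # an HTTP error status, which urlopen raises as HTTPError.
    if not payload.get("success"):
        raise RuntimeError(f"CKAN error: {payload.get('error')}")
    return [pkg.get("title") or pkg["name"] for pkg in payload["result"]["results"]]


def search(base_url, query, rows=10):
    """Query a live CKAN portal (requires network access)."""
    with urlopen(package_search_url(base_url, query, rows), timeout=30) as resp:
        return dataset_titles(resp.read().decode("utf-8"))
```

Separating URL building and response parsing from the network call keeps both parts testable offline.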


Data mgt. platforms (3a)

ILRI dataset portal based on CKAN


Data mgt. platforms (3b)

ILRI dataset portal based on CKAN


Data mgt. platforms (4)

aWhere
• Functional emphasis: (geo)data exploration
• Strength: easy-to-use platform for exploring data from xls or ODK as tables, diagrams or maps, in connection with data from other users, the library and the weather module
• Weak: xls only; collaboration functionality
• More by Hannah and Courtney


Data mgt. platforms (5)

Collaboration tools - Basecamp
• Functional emphasis: collaboration with many different partners in projects
• Strength: easy-to-use platform with typical collaboration tools (file sharing and tagging, calendar, wiki, task tracking)
• Weak: not customizable; no data linkage to databases


Data mgt. platforms (6)

File sharing services - Google Drive
• Functional emphasis: synchronized work on office apps in the cloud
• Strength: data sharing and synchronization; widely known; easy to use
• Weak: not customizable; no data linkage to databases; Google account necessary; adverts


Thank you!


File naming strategies


Order by date:

2013-04-12_interview-recording_THD.mp3

2013-04-12_interview-transcript_THD.docx

2012-12-15_interview-recording_MBD.mp3

2012-12-15_interview-transcript_MBD.docx

Order by subject:

MBD_interview-recording_2012-12-15.mp3

MBD_interview-transcript_2012-12-15.docx

THD_interview-recording_2013-04-12.mp3

THD_interview-transcript_2013-04-12.docx

Order by type:

Interview-recording_MBD_2012-12-15.mp3

Interview-recording_THD_2013-04-12.mp3

Interview-transcript_MBD_2012-12-15.docx

Interview-transcript_THD_2013-04-12.docx

Forced order with numbering:

01_THD_interview-recording_2013-04-12.mp3

02_THD_interview-transcript_2013-04-12.docx

03_MBD_interview-recording_2012-12-15.mp3

04_MBD_interview-transcript_2012-12-15.docx
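The naming patterns above can be generated consistently by a small helper instead of being typed by hand. A hypothetical Python sketch; `build_name` and its parameters are illustrative. It uses ISO 8601 dates (YYYY-MM-DD) as on the slide, which is why ordering by date works with a plain alphabetical sort.

```python
from datetime import date


def build_name(order, when, subject, doc_type, ext, seq=None):
    """Compose a file name following one of the orderings shown above.

    order: "date", "subject" or "type". An optional `seq` number is
    prefixed (zero-padded to two digits) to force a manual order.
    """
    stamp = when.isoformat()  # YYYY-MM-DD sorts chronologically as text
    parts = {
        "date": [stamp, doc_type, subject],
        "subject": [subject, doc_type, stamp],
        "type": [doc_type, subject, stamp],
    }[order]
    if seq is not None:
        parts.insert(0, f"{seq:02d}")
    return "_".join(parts) + "." + ext


if __name__ == "__main__":
    names = [
        build_name("date", date(2013, 4, 12), "THD", "interview-recording", "mp3"),
        build_name("date", date(2012, 12, 15), "MBD", "interview-recording", "mp3"),
    ]
    # Alphabetical order equals chronological order for date-first names.
    print(sorted(names))
```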


Supporting documentation (1)


Supporting documentation is information in separate files that accompanies the data in order to provide context, explanation, or instructions on confidentiality and on data use or reuse.

Source: Dublin UCD Library


Supporting documentation (2)


Examples of supporting documentation include:

• Information about the project and data creators
• Working papers or laboratory notebooks
• Questionnaires or interview guides
• Codebooks
• Details on how the data were created, analysed, anonymised, etc.
• Final project reports and publications

Source: Dublin UCD Library


Metadata


There are three broad categories of metadata:

Descriptive - common fields such as title, author, abstract and keywords, which help users discover online sources through searching and browsing.

Administrative - preservation, rights management, and technical metadata about formats.

Structural - how different components of a set of associated data relate to one another, such as a schema describing relations between tables in a database.

Source: Dublin UCD Library
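The three categories can be kept together in one machine-readable record. A minimal Python sketch using dataclasses and JSON; the class and field names are hypothetical and do not follow any particular standard such as Dublin Core or DataCite, which a real repository would normally use.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Descriptive:
    # Helps users discover the data through searching and browsing.
    title: str
    author: str
    abstract: str = ""
    keywords: list = field(default_factory=list)


@dataclass
class Administrative:
    # Preservation, rights management and technical format information.
    license: str
    file_format: str
    retention: str = ""


@dataclass
class Structural:
    # How the components relate, e.g. tables and the links between them.
    tables: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)


@dataclass
class MetadataRecord:
    descriptive: Descriptive
    administrative: Administrative
    structural: Structural

    def to_json(self):
        # asdict() converts the nested dataclasses recursively.
        return json.dumps(asdict(self), indent=2)
```

Keeping the categories as separate objects makes it easy to expose only the descriptive part through a searchable interface while the administrative part stays internal.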