Data Sharing Principles and the Development of African Data Strategies

Dr Simon Hodson

Executive Director, CODATA

www.codata.org

GEOSS Data Sharing Africa

ISRSE Conference, Earth Observation for Development and Adaptation to a Changing World

Tshwane, South Africa

10 March 2017



CODATA Prospectus: https://doi.org/10.5281/zenodo.165830

Principles, Policies and Practice

Capacity Building

Frontiers of Data Science

Data Science Journal

CODATA 2017, Saint Petersburg 8-13 Oct 2017

http://codata2017.gcras.ru/

Why Open Science / FAIR Data?

• Good scientific practice depends on communicating the evidence.

• Open research data are essential for reproducibility, self-correction.

• Academic publishing has not kept up with age of digital data.

• Danger of a replication / evidence / credibility gap.

• Boulton: to fail to communicate the data that supports scientific assertions is malpractice

• Open data practices have transformed certain areas of research.

• Genomics and related biomedical sciences; crystallography; astronomy; areas of earth systems science; various disciplines using remote sensing data…

• Research data often have considerable potential for reuse, reinterpretation, use in different studies.

• Open data foster innovation and accelerate scientific discovery through reuse of data within and outside the academic system.

• Research data produced by publicly funded research are a public asset.

80% of ecology data irretrievable after 20 years (516 studies)

Vines TH et al. (2013) Current Biology DOI: 10.1016/j.cub.2013.11.014

The Value of Open Data Sharing

Report by CODATA for GEO, the Group on Earth Observations.

Provides a concise, accessible, high level synthesis of key arguments and evidence of the benefits and value of open data sharing.

Particular, but not exclusive, reference to Earth Observation data.

Benefits in the areas of:

Economic Benefits

Social Welfare Benefits

Research and Innovation Opportunities

Education

Governance

Available at http://dx.doi.org/10.5281/zenodo.33830

ANDS: http://www.ands.org.au/__data/assets/pdf_file/0005/741740/Data-impact-eBook.pdf

ODE: http://www.alliancepermanentaccess.org/wp-content/uploads/sites/7/downloads/2011/10/7836_ODE_brochure_final.pdf

Emerging Policy Consensus

• FAIR Data: Findable, Accessible, Interoperable, Reusable.

• FAIR Data now at the heart of H2020 policy, European Open Science Cloud etc.

• Under the revised version of the 2017 work programme, the Open Research Data pilot has been extended to cover all the thematic areas of Horizon 2020.

• See Guidance at http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf and http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf

• Horizon 2020 Expert Group on FAIR research data: the Open Science agenda contains the ambition to make FAIR data sharing the default for scientific research by 2020; Expert Group to facilitate this goal.

• As Open as Possible, as Closed as Necessary

• Need to state reasons for closing data.

• There are legitimate reasons for keeping data closed.

• Data protection requires good data stewardship.

Emerging Policy Consensus: FAIR Data

• FAIR Data

• Findable: have sufficiently rich metadata and a unique and persistent identifier.

• Accessible: retrievable by humans and machines through a standard protocol; open and free; authentication and authorization where necessary.

• Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable language for knowledge representation’.

• Reusable: metadata provide rich and accurate information; clear usage license; detailed provenance.

• FAIR Guiding Principles for scientific data management and stewardship, http://dx.doi.org/10.1038/sdata.2016.18

• Guiding Principles for FAIR Data: https://www.force11.org/node/6062

• European Commission Expert Group, chaired by Simon Hodson, CODATA, is producing implementation guidelines for FAIR Data for EC Funded Programmes

FAIR Guiding Principles (1)

• To be Findable:

• F1. (meta)data are assigned a globally unique and persistent identifier

• F2. data are described with rich metadata (defined by R1 below)

• F3. metadata clearly and explicitly include the identifier of the data it describes

• F4. (meta)data are registered or indexed in a searchable resource

• To be Accessible:

• A1. (meta)data are retrievable by their identifier using a standardized communications protocol

• A1.1 the protocol is open, free, and universally implementable

• A1.2 the protocol allows for an authentication and authorization procedure, where necessary

• A2. metadata are accessible, even when the data are no longer available

(Wilkinson, M. D., et al. (2016), The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
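Principles F1 and A1 can be illustrated with the DOIs cited throughout this talk: the DOI is the persistent identifier, and HTTPS via the public doi.org resolver is the standardized, open protocol used to retrieve the record. The following is a minimal sketch, not a reference implementation; the function name `doi_to_url` and its validation rule are assumptions made for illustration only.

```python
from urllib.parse import quote

# Public DOI resolver; the DOIs on these slides resolve through it.
DOI_RESOLVER = "https://doi.org/"

# Prefixes that people commonly put in front of a DOI (illustrative list).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalise a bare DOI, a 'doi:' string or a resolver URL to one HTTPS URL."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Very light sanity check (an assumption, not the full DOI syntax).
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' readable; percent-encode anything unsafe in a URL path.
    return DOI_RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("http://dx.doi.org/10.1038/sdata.2016.18"))
print(doi_to_url("doi:10.5281/zenodo.33830"))
```

Because the identifier and the protocol are independent of the hosting repository, the same call keeps working if the landing page moves, which is the practical point of F1 and A1.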

FAIR Guiding Principles (2)

• To be Interoperable:

• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

• I2. (meta)data use vocabularies that follow FAIR principles

• I3. (meta)data include qualified references to other (meta)data

• To be Reusable:

• R1. (meta)data are richly described with a plurality of accurate and relevant attributes

• R1.1. (meta)data are released with a clear and accessible data usage license

• R1.2. (meta)data are associated with detailed provenance

• R1.3. (meta)data meet domain-relevant community standards

(Wilkinson, M. D., et al. (2016), The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
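As a rough illustration of how the principles translate into a metadata record, the sketch below checks a record for a few of the elements named above (F1, F2, F4, R1.1, R1.2). The record layout, field names and the function `fair_gaps` are hypothetical and chosen only for this example; the check tests for presence, not for quality, and is no substitute for a community metadata standard (R1.3).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record; field names are illustrative."""
    identifier: str = ""                                   # F1: persistent identifier
    title: str = ""                                        # F2: descriptive metadata
    description: str = ""                                  # F2: descriptive metadata
    indexed_in: List[str] = field(default_factory=list)    # F4: searchable resources
    licence: str = ""                                      # R1.1: usage licence
    provenance: str = ""                                   # R1.2: provenance


def fair_gaps(record: DatasetRecord) -> List[str]:
    """List the checked principles for which the record has no content."""
    gaps = []
    if not record.identifier:
        gaps.append("F1: no persistent identifier")
    if not (record.title and record.description):
        gaps.append("F2: title or description missing")
    if not record.indexed_in:
        gaps.append("F4: not registered in a searchable resource")
    if not record.licence:
        gaps.append("R1.1: no usage licence")
    if not record.provenance:
        gaps.append("R1.2: no provenance")
    return gaps


# A record that has an identifier and a title but nothing else.
draft = DatasetRecord(
    identifier="https://doi.org/10.5281/zenodo.33830",
    title="The Value of Open Data Sharing",
)
for gap in fair_gaps(draft):
    print(gap)
```

A check of this kind could run at deposit time, so that gaps are reported to the depositor before a record is published.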

Open Data in a Big Data World

• Science International Accord on Open Data in a Big Data World: http://www.science-international.org/

• Supported by four major international science organisations.

• Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved.

The Open Data Iceberg

[Figure: the Open Data Iceberg, A National Infrastructure. Challenge labels: the Technical Challenge; the Ecosystem Challenge; the Funding Challenge; the Support Challenge; the Skills Challenge; the Incentives Challenge; the Mindset Challenge. Grouping labels: Technology; Processes & Organisation; People.]

Geoffrey Boulton (CODATA), developed from an idea by Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing.

Opportunities for Research Institutions

Open and FAIR Research Data Presents Major Opportunities for Universities

Research intensive universities will be data intensive universities.

Supporting researchers’ use of data is a key strategic mission and enabler: world class research environment includes support for data stewardship.

A university’s reputation is increasingly built on all research outputs and wider societal and economic impact: data is core to this.

Development of significant data collections of research intensive universities. Leading departments / research groups will be characterised by excellence in data, by Open FAIR data collections.

The contribution to research of both the individual researcher and the institution will increasingly be measured on the basis of data outputs as well as research articles.

Policies less and less ambiguous – data stewardship, RDM is necessary for grant funding success.

Avoid reputational damage through data loss.

Challenges for Research Institutions

Policy development: unpicking Open and FAIR data

Supporting data through the lifecycle.

Culture and incentives: what’s in it for us?

Skills gaps: training and support.

Technical systems and infrastructure.

Developing culture of conscious data stewardship: what to keep and what to discard.

Supporting the long term stewardship of research data: finding a niche in the data ecosystem, clarifying the division of responsibility between institutional, national and international repositories.

Sustainability and finance.


African Open Science Platform

• Approach outlined in the Accord on Open Data in a Big Data World: Open Science is fundamental to 21st Century discovery and needs to be supported.

• Development of coordinating ‘Platform’ approach to support development of Open Science in Africa.

• Comprises a ‘policy platform’ to establish national data fora and collaboration/coordination mechanisms at institutional, national and regional level.

• Pilot and roadmap for technical platform to assist data stewardship and sharing.

• Pilot project a direct result of the Accord. Funding from DST / NRF for African-wide initiative.

• Three year pilot starting in Dec 2016 with scoping workshop.

African Open Science Platform

• ASSAf playing the key implementation role; directed by CODATA

• High level advisory council and technical advisory board.

• Engagement workshop with RDA and the AAU.

• Meetings of National Data Fora in Botswana (June) and Madagascar (September).

African Open Science Platform: Pilot Project Workpackages

Establish African Open Data Forum / Platform

Funded Research Data Infrastructure Initiatives

Funded, co-designed transdisciplinary research projects

Co-design African Open Data Policies

Develop Incentives Frameworks

Develop Research Data Science Training

African Research Data Infrastructure Roadmap

Activities require low funding for coordination, secondment, contributions in kind and evaluation.

Activities require higher investment for coordination, co-design, implementation and evaluation.

African Open Science Platform: Botswana

There is already very strong engagement with Botswana. This provides one example of how the pilot may develop.

Meetings arranged by Joint Minds Consult with key Botswana research institutes.

Workshops and discussions on research priorities, the data revolution and the benefits of open science.

Expertise in data, development of open data platforms can benefit strategically important areas of research: vision of centres of excellence with strong emphasis on open data and open science.

With input from colleagues, Joint Minds have developed a White Paper to establish a National Data Forum and a process for Data Strategy and Data Policy.

Paper has been presented to the Ministry of Tertiary Education, Research, Science and Technology.

National Data Forum in June.

National and Institutional Strategies

National Open Research Data Strategy.

Open data policies and guidance at national and institutional level.

Clarify the boundaries of open (particularly privacy, IPR).

Collaborative infrastructures for certain research disciplines.

Development of institutional infrastructure for research collaboration and data stewardship/RDM.

Mechanisms (infrastructure and policy) to ensure concurrent publication of data as research output.

Promotion of data skills (researchers and data stewards).

Data ‘publication’ included in assessment of research contribution.

Challenges and developments JKUAT / Kenya

1. Lack of national legal/policy framework for open data: e.g. FOI Act … still a Bill

• JKUAT enacted JORD Policy…as part of implementation of CODATA strategy

• Undertaking research in various domains

• Utilizing collaborations, e.g. CODATA, CAS etc.

2. Big Data / Open Data still a new concept … data reuse and sharing minimal

• Developed courses for PhD IT- Business Analytics and reviewed undergraduate courses

• Supply (motivation) and demand balancing

3. Still building and Integrating infrastructure

• Building iCEOD Open Data Platform to support data reuse, preservation, innovation

4. IP issues … not enough legal and institutional instruments to encourage more open approaches

• Advocate CODATA-RDA Principles and Guidelines on Legal Interoperability

5. Cultural practices … data private by default

• Explore rewards for making data available, including data citation.

6. Capacity building ….

• Organizing short courses & new curriculum for Data Science

Resources: Current Best Practice for Research Data Management Policies

Expert report commissioned by De-IC and DEFF

Provides comprehensive summary of best practice in funder data policies.

Identifies key elements to be addressed:

1. Summary of policy drivers

2. Intelligent openness

3. Limits of openness

4. Definition of research data

5. Define data in scope

6. Criteria for selection

7. Summary of responsibilities

8. Infrastructure and costs

9. DMP requirements

10. Enabling discovery and reuse

11. Recognition and reward

12. Reporting requirements, compliance monitoring

Zenodo: http://dx.doi.org/10.5281/zenodo.27872

Supporting the Research Data Lifecycle

[Figure: research data lifecycle. Stage labels: Plan; Create; Use; Appraise; Publish; Discover; Reuse; Store; Annotate; Select; Discard; Describe; Identify; Hand Over?; Access.]

[Figure: components of RDM support services. Lifecycle support: Data Management Planning; Managing Active Data; Processes for selection and retention; Deposit / Handover; Data Repositories / Catalogues. Supporting components: RDM Policy and Roadmap; Business Plan and Sustainability; Guidance, Training and Support; Research Data Registry / Infrastructure.]

Institutional Research Data Management Policies: http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies/uk-institutional-data-policies

CODATA-RDA School of Research Data Science

• Contemporary research – particularly when addressing the most significant, transdisciplinary research challenges – increasingly depends on a range of skills relating to data. These skills include the principles and practice of Open Science and research data management and curation, the development of a range of data platforms and infrastructures, the techniques of large scale analysis, statistics, visualisation and modelling, software development and data annotation. The ensemble of these skills, relating to data in research, can usefully be called ‘Research Data Science’.

Foundational Curriculum

Seven components: open science; data management and curation; software carpentry; data carpentry; data infrastructures; statistics and machine learning; visualisation.

Builds on many existing courses to create something more than the sum of its parts:

Open Science – reflection on ethos and requirements of sharing/openness

Open Research Data – Basics of data management, DMPs, RDM life-cycle, data publishing, metadata and annotation

Author Carpentry – Improving research efficiency with command line and OS tools.

Software Carpentry – Introduction to the Unix shell and Git (sharing software and data)

Data Carpentry – Introduction to programming in R, and to SQL databases

Visualisation – Tools, Critical Analysis of Visualisation

Analysis – Statistics and Machine Learning (clustering, supervised and unsupervised learning)

Computational Infrastructures – Introduction to cloud computing, launching a Virtual Machine on an IaaS cloud

Building international network of short courses http://bit.ly/first_data_school_trieste

Programme and materials: http://bit.ly/School_of_Research_Data_Science-Programme ; http://bit.ly/first_data_school_materials

CODATA 2017

http://codata2017.gcras.ru/

http://conference.codata.org/2017/

CODATA 2017: Global Challenges and Data Driven Science

Major conference themes:

1. Achievements in Data Driven Science, in all research disciplines

2. Earth Observations Data and the Earth’s System

3. Data and Disaster Risk Research

4. Data Driven and Sustainable Cities

5. Big Data in Scientific and Commercial Sectors

6. Data Analysis, Event Recognition and Applications

7. National and International Data Services

8. Research Data Services in Universities

9. Coordination of Data Standards and Interoperability

10. FAIR Data and the Limits of Open Data

11. Metrology, Reference Data and Monitoring Data

Call for Sessions and Papers

• Deadline is 30 June 2017

• Participants may submit session proposals, with proposed papers, or papers against conference themes.

• Encouraged to submit papers for a special collection in the Data Science Journal

Simon Hodson

Executive Director, CODATA

www.codata.org

http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org

Email: [email protected]

Twitter: @simonhodson99

Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59

CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris, FRANCE

Thank you for your attention!

Boundaries of Open

For data created with public funds or where there is a strong demonstrable public interest, Open should be the default.

As Open as Possible, as Closed as Necessary.

Proportionate exceptions for:

Legitimate commercial interests (sectoral variation)

Privacy (‘safe data’ vs Open data – the anonymisation problem)

Public interest (e.g. endangered species, archaeological sites)

Safety, security and dual use (impacts contentious)

All these boundaries are fuzzy and need to be understood better!

There is a need to evolve policies, practices and ethics around closed, shared, and open data.
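The "open by default, with stated reasons for closing" rule can be sketched as a small decision function. This is a deliberate simplification for illustration: the names `ClosureReason` and `access_status` are hypothetical, and real decisions involve the fuzzy boundaries and proportionality judgements described above, which code of this kind cannot capture.

```python
from enum import Enum
from typing import Optional


class ClosureReason(Enum):
    """The four exception categories listed on the slide."""
    COMMERCIAL = "legitimate commercial interests"
    PRIVACY = "privacy"
    PUBLIC_INTEREST = "public interest"
    SAFETY_SECURITY = "safety, security and dual use"


def access_status(reason: Optional[ClosureReason] = None,
                  justification: str = "") -> str:
    """Return 'open' unless a recognised exception is claimed and justified."""
    if reason is None:
        return "open"  # open is the default
    if not justification.strip():
        # Reasons for closing data need to be stated, not merely claimed.
        raise ValueError(
            f"closing data for {reason.value} needs a stated justification"
        )
    return "restricted"


print(access_status())
print(access_status(ClosureReason.PUBLIC_INTEREST,
                    "locations of endangered species"))
```

The design choice worth noting is that an exception without a justification is treated as an error rather than silently accepted, mirroring the requirement to state reasons for closing data.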