Open Data Infrastructure · Linking people to products can help evaluate scientists. Both people...

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement n° 654237. Open Data Infrastructure. Bruce Becker, Coordinator, Africa-Arabia Regional Operations Centre, CSIR Meraka Institute, South Africa. Sci-GaIA Open Science Workshop, Maputo, 2015


Post on 09-Aug-2020


Page 1: Open Data Infrastructure · Linking people to products can help evaluate scientists. Both people and products need to have persistent, unique identifiers. http://orcid.org DataCite

This project has received funding from the European Union’s Horizon 2020

research and innovation programme under grant agreement n° 654237

Open Data Infrastructure

Bruce Becker, Coordinator, Africa-Arabia Regional Operations Centre

CSIR Meraka Institute, South Africa

Sci-GaIA Open Science Workshop, Maputo, 2015

Page 2

Outline

● The fate of data

● Open Data

● Open scientific data

● Services needed to improve the fate of data

● How does Sci-GaIA fit in ?

● The role of the NREN

Page 3

The Fate of Data

● Pre-digital scientists often published their data together with the article

● With the growth in digital science, this could no longer be sustained – large datasets could not be directly included in scholarly articles

● Progress is made via publishing the results of the analysis and interpretation of data, rather than the data itself.

● Data itself is often seen as baggage and discarded as soon as the article is published

● The article may be preserved for a long time, but the data which helped write it – on which it is based – is lost.

Page 4

Open Data – what ?

● When we refer to data as « open » what do we mean ?

● « Open Data » refers to various aspects of the data, not just whether it's available online

● The Open Knowledge Foundation defines Open Works as those which :

● Have an open license

● Are accessible

● Are produced in an open format

http://opendefinition.org/

Page 5

Open Data in Africa – what ?

● Open Data in Africa

● OpenAFRICA - http://africaopendata.org/

● Supported by the World Bank, Google, Code4Africa, Open Knowledge Foundation, International Centre for Journalists.

● Powered by CKAN and OKFN

● Open Data for Africa - http://opendataforafrica.org/

● Supported by the African Development Bank

● Powered by Socrata

● Open Data needs Open Source

● Code for Africa http://www.codeforafrica.org/

● Funded by World Bank and the Open Knowledge Foundation

Page 6

Open Data Initiatives in Africa

● But these are civic data initiatives - What about scientific data ?

● Civic data and scientific data are open in different ways

● Scientific data often follows the FAIR principles : https://www.force11.org/group/fairgroup/fairprinciples

● Findable

● Accessible

● Interoperable

● Re-usable

Page 7

Open Scientific Data in Africa ? Why ?

● Discoverability

● Citability

● Reproducibility

● Re-usability

● Persistence

Value

Data is expensive – make it count

Page 8

What services are needed ?

● Findability, Discoverability – need a federation and harvesting service

● Citability – need a unique, persistent identifier framework

● Reproducibility – need an execution environment (grid/cloud)

● Data Persistence – need a way to ensure that the data will be available for the foreseeable future/forever (data repositories)

Infrastructure = Services = Standards
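To illustrate why citability depends on a persistent identifier, here is a minimal Python sketch that turns a dataset's metadata into a citation string. It assumes a layout along the lines of "Creator (Year): Title. Publisher. Identifier", similar to what DataCite has recommended. The `DatasetRecord` class, the `format_citation` function and the example record (including the DOI under the `10.5555` example prefix) are hypothetical and not part of any Sci-GaIA service.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: List[str]
    title: str
    publisher: str
    year: int
    doi: str  # the persistent, unique identifier


def format_citation(record: DatasetRecord) -> str:
    """Build a citation string: Creator (Year): Title. Publisher. Identifier."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}): {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Illustrative record only; every value here is made up.
example = DatasetRecord(
    creators=["Doe, Jane"],
    title="Example rainfall measurements",
    publisher="Example Repository",
    year=2015,
    doi="10.5555/example-dataset",
)

print(format_citation(example))
```

Without the identifier field, the citation would point at nothing resolvable, which is why the identifier framework is listed as a required service.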

Page 9

Where is all the data ?!

http://www.sci-gaia.eu/knowledge-base/data-repositories-map/

Page 10

DATA

Page 11

How does Sci-GaIA fit into this ?

...Sci-GaIA will make African ... research and researchers more “visible” worldwide and will contribute to the ... discoverability, reproducibility and extensibility of science products

Moreover, worldwide standards ... and widely accepted guidelines ... on Open Access and Data Preservation will be promoted in order to achieve a better interoperation and interoperability of e-Infrastructures, including especially Open Data Infrastructures.

● Goals :

● 1 Member of a Registration Agency to issue persistent unique identifiers (either DOIs or PIDs) to research products (papers, data, software, etc.);

● At least 30 new Open Access Document/Data Repositories in Africa;

● At least 100 Open Access Document/Data Repositories made compliant with the OpenAIRE Guidelines and included in the OpenDOAR and OpenAIRE lists of official providers.

Page 12

Sci-GaIA supporting open data infrastructure in Africa

● Sci-GaIA will conduct various activities to promote the Open Data repositories (or data FAIRports) in the region.

Task 3.1 - Support the creation of federated and interoperable Open Access Document and Data Repositories in Africa, compliant with EU and other international guidelines

Page 13

Sci-GaIA Activities in support of data infrastructure

● Identification of existing Open Access Repositories in the region and inclusion in web-based directories

● Promotion of the Open Access Initiative standards and of the OpenAIRE guidelines to make contents stored on the African repositories more discoverable, searchable and visible

● Federation, through the use of Linked Data standards and Semantic Web technologies, of African Open Access Document and Data Repositories, to make them accessible and searchable

● Feasibility study for the creation of a pilot service to issue Persistent Identifiers (PIDs)

● Provision of a ready-to-install-and-configure appliance to quickly build and populate Open Access Repositories compliant with OAI, OpenDOAR and OpenAIRE standards/guidelines.
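As a rough illustration of what OAI compliance makes possible, the Python sketch below builds a harvesting request in the style of OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) and extracts Dublin Core titles from a response. The slides do not name the protocol, so treating "OAI" as OAI-PMH is an assumption. The endpoint `https://repository.example.org/oai`, the sample XML and both function names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying Dublin Core metadata.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return the URL a harvester would request to list a repository's records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(xml_text):
    """Collect every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iterfind(".//dc:title", NAMESPACES)]


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(build_list_records_url("https://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A federation service would run requests like this against many repositories and merge the results into one searchable index, which is only feasible when every repository exposes the same interface.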

Page 14

Data Infrastructure and Data Repositories

● The high-lying infrastructure services are important, but the actual data

● Data, once generated, has to be curated if it is to be open.

● Two kinds of « Open Access »

● Gold Open Access – third-party repository

● Green Open Access – self-archiving repository

● Almost no Gold OA archives in Africa, very few Green OA

● So… where does data go ?

Page 15

Cultivating Standards-Compliant Data Repositories - Improving the fate of data

Data dumps do more harm than good

Data gains value when it is organised and exploited

Page 16

Cultivating Standards-Compliant Data Repositories

● Sci-GaIA has deployed a standards-compliant OADR - http://oar.sci-gaia.eu/ which can be used for dissemination and reference purposes, as well as integration work

● This repository is re-usable

● Virtual Appliance – off the shelf repository http://dx.doi.org/11623/sci-gaia:1439991515.53

● Continuous Delivery – Deploy your repo with Ansible and Docker https://github.com/brucellino/invenio-docker-role

● Identify, monitor, encourage and support migration of existing repositories to become standards-compliant

● Starting early 2016

Page 17

Persistence, Uniqueness and Value – linking infrastructures and data to people.

● ORCID : Data is important, but people are also important.

● Linking people to products can help evaluate scientists

● Both people and products need to have persistent, unique identifiers

● http://orcid.org

● DataCite : Data is important, but not by itself

● Metadata and federation services

● Citation metrics

● http://datacite.org

● THOR : Interoperability between various infrastructures

● http://project-thor.eu
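One concrete property of a persistent identifier for people is that it can be checked for transcription errors. The Python sketch below validates the final character of an ORCID iD using the ISO 7064 MOD 11-2 checksum, as described in ORCID's public documentation. The function names are hypothetical, and the check confirms only that an iD is well formed, not that it is registered to anyone.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for character in base_digits:
        total = (total + int(character)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact.isascii() or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the iD ORCID publishes for its fictitious
# sample researcher; changing the last digit makes the checksum fail.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A repository could run a check like this when a depositor types in their iD, before linking the person to the deposited product.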

Page 18

The role of the NREN

● Federate.

● Promote Open Data and FAIR data standards

● A statement to that effect on your website would go a long way !

● Act as a neutral third party to address issues of standards adoption and licensing

● Support the core services necessary for data infrastructure

● Operate certain key services (metadata harvesters, persistent identifier mints)

Page 19

The role of the NREN

● Identify the key players – research groups and institutional repositories

● Ensure that they have reasons to collaborate.

● Support each other

● Not every NREN needs to operate every service

● Take advantage of the opportunities that the scale of the Alliances offers

Page 20

Thank you!

Bruce Becker

[email protected]

@brusisceddu

@thesagrid

Africa-Arabia Regional Operations Centre

http://www.africa-grid.org

github.com/AAROC

www.sci-gaia.eu