Collaboratively creating a network of ideas, data and software

Upload: anita-de-waard

Post on 11-Apr-2017

Page 1: Collaboratively creating a network of ideas, data and software

Anita de Waard, VP Research Data Collaborations

Elsevier, Jericho, VT

Some Thoughts on Collectively Creating Networks of Ideas, Data and Software

Page 2

How do we unify the needs of the collective and the individual?

“Let us endeavor to build systems that allow a kid in Mali who wants to learn about proteomics to not be overwhelmed by the irrelevant and the untrue.”

- John Perry Barlow, iAnnotate 2014

Collectively create nimble and robust systems of knowledge management that interconnect ideas, data and software.

Page 3

Connecting Ideas: Big Mechanism

Automated caption/body text splitting & linking

Precision   Recall   F-score
56.3        76.0     64.7

Statement type
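As a quick check on the reported figures: the F-score is the harmonic mean of precision and recall. The sketch below assumes the balanced F1 variant, since the slide does not say which F-score is meant; the function name is illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values from the slide, in percent.
print(round(f_score(56.3, 76.0), 1))  # 64.7
```

Under that assumption the three numbers on the slide are mutually consistent.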

Page 4

Connecting Ideas: Towards an Elsevier Knowledge Graph

14M articles from ScienceDirect

3.3M triples

475M triples

49M triples

p x r matrix

p x k, k x r latent factor matrices

~102 triples

920K concepts from EMMeT

• Ongoing proof-of-concept work by Paul Groth, Sujit Pal and Ron Daniel of Elsevier Labs

• Unsupervised, scalable and built with off-the-shelf technologies

• Based on recent work at University College London and University of Massachusetts Amherst

Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. "Relation extraction with matrix factorization and universal schemas." (2013).
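To make the "p x r matrix into p x k and k x r latent factor matrices" idea concrete, here is a toy sketch in plain Python. It is not the Elsevier Labs pipeline and it simplifies the cited approach: rows stand for entity pairs, columns for relations, and a logistic factorization is fitted by SGD on explicitly labelled cells (the toy data, hyperparameters and function names are all assumptions made for illustration). One cell is held out to show how the factors can score an unseen triple.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def factorize(cells, p, r, k=2, epochs=300, lr=0.2, seed=0):
    """Fit a p x k and a k x r factor matrix to labelled cells by SGD.

    cells maps (row, col) -> 1.0 or 0.0; cells absent from the map are
    treated as unknown and are never trained on.
    """
    rng = random.Random(seed)
    A = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(p)]  # p x k
    B = [[rng.uniform(-0.1, 0.1) for _ in range(r)] for _ in range(k)]  # k x r
    items = sorted(cells.items())
    for _ in range(epochs):
        rng.shuffle(items)
        for (i, j), y in items:
            score = sum(A[i][f] * B[f][j] for f in range(k))
            err = y - sigmoid(score)  # gradient of the log-likelihood w.r.t. score
            for f in range(k):
                a, b = A[i][f], B[f][j]
                A[i][f] += lr * err * b
                B[f][j] += lr * err * a
    return A, B


def predict(A, B, i, j):
    """Probability that cell (i, j) holds, from the learned factors."""
    return sigmoid(sum(A[i][f] * B[f][j] for f in range(len(B))))


# Toy data: 6 entity pairs (rows) x 3 relations (columns).
# Relations 0 and 1 co-occur; relation 2 holds for a different group of pairs.
cells = {}
for i in (0, 1, 2):
    cells[(i, 0)], cells[(i, 1)], cells[(i, 2)] = 1.0, 1.0, 0.0
for i in (4, 5):
    cells[(i, 0)], cells[(i, 1)], cells[(i, 2)] = 0.0, 0.0, 1.0
cells[(3, 0)], cells[(3, 2)] = 1.0, 0.0  # cell (3, 1) is held out

A, B = factorize(cells, p=6, r=3)
print("held-out (3, 1):", predict(A, B, 3, 1))
print("known negative (3, 2):", predict(A, B, 3, 2))
```

Because relation 1 co-occurs with relation 0 in the other rows, the held-out cell should score well above the known negative, which is the kind of inference a factorized knowledge graph relies on.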

Page 5

Connecting Research Data:

https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data

Page 6

Linking Papers to Data, Phase 1

• Supplementary data at PANGAEA

• Bidirectional links between PANGAEA & ScienceDirect

• Data visualized next to the article

http://www.elsevier.com/databaselinking

Page 7

Linking Papers to Data, Phase 2

• ICSU/WDS/RDA Publishing Data Service Working group

• Currently creating linked-data model for exposing DOI to DOI links outside publisher’s firewall

• Merged with National Data Service pilot with the same goal

• Collaboration between CrossRef, DataCite, Europe PubMed Central, ANDS, Thomson Reuters, Elsevier

• About to deliver: http://dliservice.research-infrastructures.eu/#/api

Objective: move from a plethora of (mostly) bilateral arrangements between the different players… to a one-for-all cross-referencing service for articles and data
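One way to picture a shared DOI-to-DOI linking service is as a single index of link records that any party can query from either end. The sketch below is hypothetical: the `Link` fields, the `LinkIndex` class and the example DOIs are invented for illustration and do not describe the working group's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_doi: str  # e.g. the article
    target_doi: str  # e.g. the dataset
    relation: str    # hypothetical vocabulary, e.g. "references"
    provider: str    # who asserted the link


class LinkIndex:
    """Shared index that can be queried by either DOI of a link."""

    def __init__(self):
        self._by_doi = defaultdict(set)

    def add(self, link: Link) -> None:
        # DOI names are case-insensitive, so keys are normalised to lower case.
        self._by_doi[link.source_doi.lower()].add(link)
        self._by_doi[link.target_doi.lower()].add(link)

    def links_for(self, doi: str):
        found = self._by_doi.get(doi.lower(), ())
        return sorted(found, key=lambda l: (l.source_doi, l.target_doi, l.provider))


index = LinkIndex()
# The same article-dataset link asserted independently by two parties.
index.add(Link("10.1234/example.paper", "10.5678/example.dataset", "references", "publisher"))
index.add(Link("10.1234/example.paper", "10.5678/example.dataset", "references", "repository"))

print(len(index.links_for("10.5678/EXAMPLE.DATASET")))  # 2
```

Keeping the provider on each record lets the service aggregate bilateral assertions without losing track of who made them.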

Page 8

Current Systems for Linking Data

[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; step labels 1-4 as listed below]

1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes,) dataset gets posted to repository
4. Researcher reports (post-hoc) to Institution and Funder

Page 9

Issues with the Current Situation:

[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; step labels 1-4 as on the previous slide]

i. Too much work for researchers

ii. Data posting not mandatory

iii. No link between data and paper

iv. Funders/Institutions informed as an afterthought

Page 10

A Proposal To Address These Issues:

[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; step labels 1-4 as listed below]

1. Researcher creates datasets and posts to repository (under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
   - NB: this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting

i. Less Work!

ii. More Data Stored!

iii. Better Linking!

iv. Better Tracking!
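The proposed flow can be read as a small state machine around the dataset. The following is a minimal sketch under that reading, not a description of any existing system: the `Dataset` and `Workflow` classes, the message strings and the example DOIs are all hypothetical, and the `outbox` list stands in for real notifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    doi: str
    funder: str
    institution: str
    embargoed: bool = True
    paper_doi: Optional[str] = None


@dataclass
class Workflow:
    outbox: List[str] = field(default_factory=list)  # stand-in for notifications

    def deposit(self, ds: Dataset) -> None:
        # Steps 1-2: dataset is posted under embargo; funder is notified automatically.
        ds.embargoed = True
        self.outbox.append(f"to {ds.funder}: dataset {ds.doi} deposited (embargoed)")

    def publish_paper(self, ds: Dataset, paper_doi: str) -> None:
        # Step 3: publication lifts the embargo and links the data to the paper.
        ds.embargoed = False
        ds.paper_doi = paper_doi
        # Step 4: funder and institution both receive a report.
        for party in (ds.funder, ds.institution):
            self.outbox.append(
                f"to {party}: {paper_doi} published; embargo on {ds.doi} lifted"
            )


flow = Workflow()
data = Dataset(doi="10.5678/example.dataset", funder="Funder", institution="Institution")
flow.deposit(data)
flow.publish_paper(data, "10.1234/example.paper")
for message in flow.outbox:
    print(message)
```

The researcher only acts twice (deposit, publish); linking and reporting follow from those two events, which is where the "less work" and "better tracking" claims come from.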

Page 11

One piece of the puzzle: Mendeley Data

https://data.mendeley.com/datasets/xz6gv65m6d/6

Linked to published papers – or not

Linked to Github – or not

Versioning and provenance

Page 13

How Do We Evaluate Discoverability?

[Diagram of a data search pipeline; recoverable labels: Data (three sources, each labelled Federated, Poor API, Rich API, FTP & Index); Enrichment (Manual, Automated); (User) Intent; Ranking, Filtering (how to mix federated & indexed, rich & poor); Search; Rendering; Search all data; Faceted query/Results refinement; Store & Use results; General UI; Domain UI; Filtering; Feeding user signals back into Search ranking; Evaluation]

Birds of a Feather on Data Search: https://rd-alliance.org/bof-data-search.html

Page 14

How do we pay for all this?

RDA Cost Recovery WG

• Co-chair with Ingrid Dillo (DANS), Simon Hodson (CODATA)

• Goal: write a report regarding new potential funding models for data repositories, allow them to start sharing this knowledge

• Interviewed 24 repositories on their funding (current and future)

• Now summarising stories and trends – will present at RDAP7

Terms of funding for main income stream (in %)

Page 15

Software As A First-Class Knowledge Object:

Page 16

Working with Networks of Partners

Force11:
– Multi-stakeholder, member-driven organisation
– Unites scholars, tool developers, librarians, publishers, funding agencies etc.
– E.g. Software citation group, akin to Data Citation Group
– Will present at Force16 in Portland, OR, April 17-19, 2016

National Data Service:
– Multi-stakeholder group, based around supercomputing centres
– Aims to be a ‘connective tissue’ between data creation, curation, storage etc. projects
– Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need; can include software sharing
– E.g. Datasearch, Data Linking systems

RDA:
– Co-lead Data publishing, linking group
– Co-lead Cost Recovery group
– Active in Chemistry, Earth Science groups
– Starting BoF Data Search

Page 17

Anita de Waard, VP Research Data Collaborations

[email protected]

@anitadewaard

In summary:

Let’s collectively enable ‘an account of the present undertakings, studies and labours of the ingenious in many considerable parts of the world’, by connecting ideas, data, and software through interconnected partnerships!