Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways
Jo McEntyre, Europe PMC, EMBL-EBI, www.ebi.ac.uk

Upload: fairdom

Post on 12-Jan-2017


TRANSCRIPT

Page 1: Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways

Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways

Jo McEntyre, Europe PMC, EMBL-EBI, www.ebi.ac.uk

Page 2:

Life Science Data

Page 3:

Familiar Complexity!

Article ‘Package’

External Resources

“Recognized” data repos: file | structured record, Accession | DOI | API + Accession

Institutional repos: file | structured record, URL | DOI | API + Accession

Author database | ‘website’: file | structured record, URL | DOI | API + Accession

Supp info tables/data: file, URL|DOI

Cross-reference

Dataset list

Ref to external resource

Ref to external resource

Reference list

Fig Source data: file, URL|DOI

Fig (caption + graphic)

Cross-reference

Ref to external resource

Adapted from Thomas Lemberger, EMBO

Page 4:

Europe PMC literature database

Europe PMC

• Abstracts: 30 million
• Full-text articles: 3 million
• Article citation counts
• Grants
• ORCIDs
• Semantic annotation
• Data citations
• Data integration

Europe PMC is a member of the PMC International Collaboration.

Funded by 28 European funders of life science research

Page 5:

About EMBL-EBI

• Part of the European Molecular Biology Laboratory

• International, non-profit research institute

• Europe’s hub for biological data services and research

Page 6:

Making data discoverable

Labs around the world deposit data and we…

• Archive it
• Classify it
• Share it with other data providers
• Analyse, add value and integrate it
• …provide tools to help researchers use it

A collaborative enterprise

Page 7:

Journal Data Publishing

Page 8:

Data Citation in Europe PMC full text

Literature*

Added-Value

Submitted

*OMIM, Clinical trials, GO

Submission statements vs reuse?

260K

Page 9:

Data Citation Principles Engender Two Big Ideas

"sound, reproducible scholarship rests upon a foundation of robust, accessible data"

"data should be considered legitimate, citable products of research"

These slides are adapted from: http://www.slideshare.net/joanstarr/data-citation-a-joint-declaration-of-principles

Page 10:

Joint Declaration on Data Citation Principles

1. Importance
2. Credit and Attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and Verifiability
8. Interoperability and flexibility

Full Principles: https://www.force11.org/datacitation

Page 11:

Joint Declaration

Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

1. Importance

Page 12:

Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

2. Credit and Attribution

Joint Declaration

Page 13:

In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

3. Evidence

Joint Declaration

Page 14:

A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

4. Unique identification

etc.. !!!

Joint Declaration

Page 15:

Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.

5. Access

Joint Declaration

Page 16:

Unique identifiers, and metadata describing the data, and its disposition, should persist --  even beyond the lifespan of the data they describe.

6. Persistence

Joint Declaration

Page 17:

Data citations should facilitate identification of, access to, and verification of the specific data that support a claim.  Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.

7. Specificity and Verifiability

Joint Declaration
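One way to read the "fixity" requirement in Principle 7 is that a citation carries a checksum of the cited data, so a later download can be compared against it. The sketch below is a minimal illustration under that assumption; the choice of SHA-256, the function names, and the sample bytes are all hypothetical and not prescribed by the Joint Declaration.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def matches_cited_version(retrieved: bytes, cited_checksum: str) -> bool:
    """True if the retrieved bytes hash to the checksum recorded with the citation."""
    return sha256_of(retrieved) == cited_checksum.lower()


# Hypothetical data: in practice the checksum would travel with the citation metadata.
original = b"sample_id,value\nA,1\nB,2\n"
cited_checksum = sha256_of(original)

unchanged = matches_cited_version(original, cited_checksum)
modified = matches_cited_version(original + b"C,3\n", cited_checksum)
```

A checksum only covers fixity. Identifying the timeslice, version, or granular portion still depends on the citation metadata itself.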

Page 18:

Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

8. Interoperability and flexibility

Joint Declaration

Page 19:

Many organizational endorsements

Page 20:

An implementation example

Principle 2: Credit and Attribution

Principles 4, 5, 6: Unique ID, Access, Persistence

Principle 7: Specificity and Verifiability

Principle 8: Interoperability and flexibility

Creators, Year, Dataset Title, DOI, Data Repository, version

(Resolves to landing page with access to metadata, docs, and data)

Slide from Mercè Crosas, Ph.D., Harvard University
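The citation elements listed on this slide (Creators, Year, Dataset Title, DOI, Data Repository, version) can be assembled mechanically. The following is a rough sketch only: the `DataCitation` class, the punctuation of the rendered string, and all field values are illustrative assumptions, not the format used by any particular repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the six elements named on the slide."""
    creators: List[str]
    year: int
    title: str
    doi: str
    repository: str
    version: str

    def doi_url(self) -> str:
        # The DOI is expected to resolve to a landing page for the dataset.
        return f"https://doi.org/{self.doi}"

    def render(self) -> str:
        # Same element order as on the slide; separators are an assumption.
        creators = "; ".join(self.creators)
        return (
            f'{creators}, {self.year}, "{self.title}", '
            f"{self.doi_url()}, {self.repository}, {self.version}"
        )


# All values below are placeholders, not a real dataset.
citation = DataCitation(
    creators=["Doe J.", "Roe R."],
    year=2016,
    title="Example dataset",
    doi="10.1234/example",
    repository="Example Repository",
    version="V1",
)
rendered = citation.render()
```

Keeping the elements as separate fields rather than one string is what lets the same record be rendered in different journal styles, in line with Principle 8.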

Page 21:

http://europepmc.org/articles/PMC3089613

Large dataset:

Page 22:

http://europepmc.org/articles/PMC3535838

Page 23:

http://europepmc.org/articles/PMC3766260

Page 24:

http://europepmc.org/articles/PMC3704603

Page 25:

http://europepmc.org/articles/PMC3710810

Fig. 2

Page 26:

!! 2469 references !!

http://europepmc.org/articles/PMC2672098

Page 27:

Examples of Implementations of Data Citations in Reference Lists

Page 28:

http://europepmc.org/articles/PMC3661987

Occurrence in text:

Occurrence in reference list:

Tagged in reference list as: <mixed-citation publication-type="other">

Page 29:

http://europepmc.org/articles/PMC3646594

Occurrence in text:

Occurrence in reference list:

Tagged in reference list as: <mixed-citation publication-type="thesis">

Page 30:

http://europepmc.org/articles/PMC3722494

Occurrence in text:

Occurrence in reference list:

Tagged in reference list as: <mixed-citation publication-type="webpage">

Also in this reference list: a non-DOI data citation

Page 31:

http://europepmc.org/articles/PMC3626513

Occurrence in text:

Occurrence in reference list:

Tagged in reference list as: <mixed-citation publication-type="journal">

Cite data generated in the course of the work described?

Page 32:

JATS support for data citation

<mixed-citation publication-type='data'>
  <name><surname>Heinz</surname><given-names>D.W.</given-names></name>,
  <name><surname>Baase</surname><given-names>W.A.</given-names></name>,
  <etal>et. al.</etal>
  <data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>.
  <source>PDB Europe</source>, accession
  <pub-id pub-id-type='accession' assigning-authority='pdb' xlink:href='http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L'>102l</pub-id>.
  <pub-id pub-id-type='doi' xlink:href='http://dx.doi.org/10.2210/pdb102l/pdb'>10.2210/pdb102l/pdb</pub-id>
</mixed-citation>
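Tagging a reference as publication-type 'data' makes it machine-readable. As a rough illustration, the sketch below pulls the authors, title, source, and identifiers out of the citation on this slide using Python's standard `xml.etree.ElementTree`. The wrapping `<ref>` element that declares the `xlink` namespace and the `parse_data_citation` helper are assumptions added for the example; a real JATS article declares the namespace on its root element.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Citation from the slide, wrapped in a <ref> that declares the xlink prefix
# (the wrapper is an assumption made so the fragment parses on its own).
JATS = f"""<ref xmlns:xlink="{XLINK}">
<mixed-citation publication-type='data'>
<name><surname>Heinz</surname><given-names>D.W.</given-names></name>,
<name><surname>Baase</surname><given-names>W.A.</given-names></name>,
<etal>et. al.</etal>
<data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>.
<source>PDB Europe</source>, accession
<pub-id pub-id-type='accession' assigning-authority='pdb' xlink:href='http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L'>102l</pub-id>.
<pub-id pub-id-type='doi' xlink:href='http://dx.doi.org/10.2210/pdb102l/pdb'>10.2210/pdb102l/pdb</pub-id>
</mixed-citation></ref>"""


def parse_data_citation(xml_text: str) -> dict:
    """Extract the structured parts of a single <mixed-citation> element."""
    cite = ET.fromstring(xml_text).find("mixed-citation")
    authors = [
        f"{name.findtext('surname')} {name.findtext('given-names')}"
        for name in cite.findall("name")
    ]
    # Key each identifier by its pub-id-type; keep the value and its link.
    ids = {
        pub_id.get("pub-id-type"): {
            "value": pub_id.text,
            "href": pub_id.get(f"{{{XLINK}}}href"),
        }
        for pub_id in cite.findall("pub-id")
    }
    return {
        "type": cite.get("publication-type"),
        "authors": authors,
        "title": cite.findtext("data-title"),
        "source": cite.findtext("source"),
        "ids": ids,
    }


record = parse_data_citation(JATS)
```

A text-mining pipeline could use `record["type"] == "data"` to separate data citations from article citations, which is not possible when the same reference is tagged as "other", "thesis", "webpage", or "journal".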

Page 33:

Minimal, maximal & extensible citation

Resource name

ID

Resource name

Resolution ‘template’ ID

Author list

Resource name

Resolution ‘template’ ID

Time

? Author list

Resource name

Resolution ‘template’ ID

Time

?

For example: new data vs pre-existing data

For example: version

Thomas Lemberger, EMBO
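The "resource name + resolution template + ID" tier can be pictured as a lookup from a resource name to a URL pattern with a slot for the identifier. The sketch below assumes that reading. The two URL patterns are taken from the JATS example in this deck, while the `TEMPLATES` table and the `resolve` function are hypothetical names introduced for illustration.

```python
# Resolution templates keyed by resource name. The URL patterns come from the
# JATS example in this deck; the table itself is an illustrative assumption.
TEMPLATES = {
    "pdb": "http://www.ebi.ac.uk/pdbe/entry/search/index?text:{id}",
    "doi": "http://dx.doi.org/{id}",
}


def resolve(resource: str, identifier: str) -> str:
    """Turn a minimal citation (resource name, ID) into a resolvable URL."""
    template = TEMPLATES.get(resource.lower())
    if template is None:
        raise KeyError(f"no resolution template registered for {resource!r}")
    return template.format(id=identifier)


# A minimal citation is just the resource name and the ID.
minimal = ("pdb", "102L")
url = resolve(*minimal)
doi_url = resolve("doi", "10.2210/pdb102l/pdb")
```

Keeping the template separate from the ID means a resource can change its URL scheme without invalidating citations that only recorded the name and the accession.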

Page 34:

Integrated Research

Reused from: seier+seier, Flickr

Reused from: Images Money, Flickr

Articles

Data

People

Institutions

Funders

Page 35:

A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

4. Unique identification

etc..

Joint Declaration

Page 36:

1. Discoverability through accessibility

• Deposit in a public/open database
• Where possible, structured archive (e.g. PDB, ENA) >> unstructured archive (e.g. Zenodo, Figshare)
• Uniquely identify it: PID, Accession number, DOI, ROI
• Give it context: metadata (and more)
• All of the above = citable = Discoverable

Page 37:

2. Discoverability through structured data

Structured data is one of the true enablers of life science

- Discovery of homology between genes across species
- Predicting function based on protein folds

• Structured data can be cross-analysed, compared by algorithm, and encourages development of new products and tools

Page 38:

Structured data is good value for money

Annual cost of generating new protein structure data in labs around the world

Annual cost of maintaining it in a central database

Page 39:

Degrees of Data

Unstructured/semi-structured

Structured

Added Value

Metadata

A picture of a graph

A spreadsheet of my results

A record in a DNA sequence database

A graphical display of a genome

A narrative with citations, pictures and attachments

Article

Page 40:

Metadata – critical to discoverability

Generic: title, submitters, date, file format, version.

citation, basic search

Wagner F.F., 23-APR-2002, TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene, INSDC, 14-NOV-2006 (Rel. 89, Last updated, Version 7). BN000065

Specific: organism, tissue, assay, page number …

deep search, analysis, computation
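The split between generic and specific metadata can be illustrated with a toy index: generic fields are enough for a citation and a title search, while specific fields allow filtering by biology. This is a minimal sketch under assumptions. The `Record` class and both search functions are hypothetical, the second record is invented, and only the first record's accession, title, and submitter come from the example citation on this slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    accession: str
    title: str                 # generic metadata
    submitters: List[str]      # generic metadata
    specific: Dict[str, str] = field(default_factory=dict)  # e.g. organism, tissue, assay


def basic_search(records: List[Record], term: str) -> List[str]:
    """Case-insensitive title match, which generic metadata alone can support."""
    term = term.lower()
    return [r.accession for r in records if term in r.title.lower()]


def deep_search(records: List[Record], **criteria: str) -> List[str]:
    """Exact match on specific metadata fields such as organism or tissue."""
    return [
        r.accession
        for r in records
        if all(r.specific.get(key) == value for key, value in criteria.items())
    ]


records = [
    # Accession, title and submitter from the slide; organism read off the title.
    Record("BN000065", "TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene",
           ["Wagner F.F."], {"organism": "Homo sapiens"}),
    # Entirely hypothetical record, for contrast.
    Record("X000001", "Hypothetical mouse liver expression study",
           ["Doe J."], {"organism": "Mus musculus", "tissue": "liver"}),
]

title_hits = basic_search(records, "gene")
liver_hits = deep_search(records, organism="Mus musculus", tissue="liver")
```

The deep search works only because the specific fields were captured as structured values at deposition time. The same facts buried in a free-text title would not be reliably filterable.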

Page 41:

BioStudyEBI

BioStudy database for unstructured data

Study

Publications

Ontologies

Data files

Other DBs

Metadata

Other DBs

Page 42:

Elixir: An international distributed infrastructure for

• Data
• Standards
• Tools
• Compute
• Training
• Industry

Page 43:

THE END