Open Access Week - Oxford, 20-24 Oct 2014

Consultant, Honorary Academic Editor Associate Director, Principal Investigator Open access and open data at Nature Publishing Group: better data = better science Susanna-Assunta Sansone, PhD @biosharing @isatools @scientificdata Open Access Week at Oxford, 20-24 October, 2014 http://www.slideshare.net/SusannaSansone

Upload: susanna-assunta-sansone

Post on 24-Jun-2015


DESCRIPTION

Open access and open data at Nature Publishing Group: better data = better science

TRANSCRIPT

Page 1: Open Access Week - Oxford, 20-24 Oct 2014

Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

Open access and open data at Nature Publishing Group: better data = better science

Susanna-Assunta Sansone, PhD

@biosharing @isatools @scientificdata

Open Access Week at Oxford, 20-24 October, 2014

http://www.slideshare.net/SusannaSansone

Page 2: Open Access Week - Oxford, 20-24 Oct 2014

Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/

Page 3: Open Access Week - Oxford, 20-24 Oct 2014

A community mobilization

image by Greg Emmerich

http://discovery.urlibraries.org/

http://www.theguardian.com/higher-education-network/blog/2014/jun/26

https://okfn.org

Page 4: Open Access Week - Oxford, 20-24 Oct 2014

Open access is not enough on its own

http://www.theguardian.com/higher-education-network/blog/2014/jun/26

If your research has been funded by the taxpayer, there's a good chance you'll be encouraged to publish your results on an open access basis….. This final article makes publicly available the hypotheses, interpretations and conclusions of your research. But what about the data that led you to those results and conclusions?

Page 5: Open Access Week - Oxford, 20-24 Oct 2014

Also open data is not always enough

http://www.theguardian.com/higher-education-network/blog/2014/jun/26

So data that is in theory open and free to access:
•  may still be hard to get hold of
•  may not have been stored or cited in the appropriate manner
•  may not be interoperable with related data because it is not formatted appropriately; or
•  may not be reusable because it may not contain enough information for others to understand it

Page 6: Open Access Week - Oxford, 20-24 Oct 2014

Credit to: Iain Hrynaszkiewicz

Benefits and barriers to data sharing

Benefits
•  Reduction of error and fraud
•  Increased return on investment in research
•  Compliance with funder and journal mandates
•  Reduce duplication and bias
•  Reproduction/validation of research
•  Testing additional hypotheses
•  Use for teaching
•  Integration with other data sets
•  Increased citations

Barriers
•  Concerns over inappropriate reuse
•  Limited time/resources
•  Costs associated with data sharing
•  Human privacy concerns
•  Unclear ownership of data/authority to release data
•  Lack of academic incentives/recognition
•  Lack of repositories or lack of awareness of repositories
•  Protecting commercially sensitive information

Page 7: Open Access Week - Oxford, 20-24 Oct 2014

Movement for FAIR data in life and medical sciences

http://bd2k.nih.gov/workshops.html#ADDS

Page 8: Open Access Week - Oxford, 20-24 Oct 2014

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

To make any dataset ‘FAIR’, one must have standards, tools and best practices to:
•  report sufficient details
•  capture all salient features of the experimental workflow
•  make annotation explicit and discoverable
•  structure the descriptions for consistency
•  ensure/regulate access
•  deposit and publish
•  etc.

Page 9: Open Access Week - Oxford, 20-24 Oct 2014

…breadth and depth of the experimental context…

…is pivotal!

Page 10: Open Access Week - Oxford, 20-24 Oct 2014

sample characteristic(s)
experimental design
experimental variable(s)
technology(s)
measurement(s)
protocol(s)
data file(s)
...

Page 11: Open Access Week - Oxford, 20-24 Oct 2014

Notes in Lab Books (information for humans)

Spreadsheets and Tables (the compromise)

Facts as RDF statements (information for machines)

Notes and narrative
Spreadsheets and tables
Linked data and nanopublications

Increase the level of annotation at the source, tracking provenance and using community standards
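To make the contrast between notes for humans and facts for machines concrete, here is a minimal sketch (not taken from the talk) of one lab-book note restated as machine-readable statements. The `example.org` namespace, the predicate names and the sample values are all hypothetical; the output follows the N-Triples line pattern of subject, predicate, object, full stop.

```python
# Hypothetical namespace: none of these URIs come from the original slides.
EX = "http://example.org/"


def uri(local_name: str) -> str:
    """Wrap a local name in angle brackets as an absolute URI."""
    return f"<{EX}{local_name}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Lab-book note (invented): "Sample 42 is mouse liver, measured by mass spectrometry."
facts = [
    (uri("sample42"), uri("organism"), literal("Mus musculus")),
    (uri("sample42"), uri("tissue"), literal("liver")),
    (uri("sample42"), uri("measuredBy"), uri("massSpectrometry")),
]

ntriples = to_ntriples(facts)
print(ntriples)
```

In practice the predicates would come from a community ontology rather than an ad hoc namespace, which is the point of annotating at the source with community standards.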

Doing my fair share of work

Working with and for:

Page 12: Open Access Week - Oxford, 20-24 Oct 2014

Because, in all fairness, not much data is FAIR!

Page 14: Open Access Week - Oxford, 20-24 Oct 2014

Role of publishers as “agents of change”

•  Data has to become an integral part of scholarly communications

•  Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers

•  Publishers occupy a leverage point in this process

Page 15: Open Access Week - Oxford, 20-24 Oct 2014

•  Policies on access (to data, code, reagents etc.)
   o  Supporting funder & community needs
•  Format and amount of content
   o  Methodological details, supplementary info, data integration and links to repositories
•  Licensing for reuse
•  Incentives to share
   o  Data citations
   o  Data journals and articles
•  Quality assurance through peer review

Publishers and data/reproducibility

Credit to: Iain Hrynaszkiewicz

Page 16: Open Access Week - Oxford, 20-24 Oct 2014

Some important events

•  1996: Bermuda Principles
   o  prepublication of DNA sequence data
•  1998: Structural data
   o  accession codes required by Nature & Science
•  2002: MIAME community standards
   o  microarray data deposition in public repositories required
•  2007: Methods sections
   o  Limitations for the online version removed
•  2009: Ioannidis et al. Nat Gen 41, 2, 149

Data/reproducibility at NPG

Credit to: Veronique Kiermer

Page 17: Open Access Week - Oxford, 20-24 Oct 2014

Credit to: Iain Hrynaszkiewicz

2013

Page 18: Open Access Week - Oxford, 20-24 Oct 2014

Wang et al, Nature, 2013 doi:10.1038/nature12730

Data/reproducibility at NPG
Some important recent events 2013-2014

•  Figure source data
   o  putting data behind figures/graphs
   o  rolled out at Nature and progressively across all other Nature-branded titles

Page 19: Open Access Week - Oxford, 20-24 Oct 2014

•  Extended data
   o  expandable text and extra figures; rolled out at Nature

Page 20: Open Access Week - Oxford, 20-24 Oct 2014

•  Data citation
   o  tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
   o  to be rolled out across all Nature-branded titles and Scientific Data
•  Code reproducibility
   o  peer review, availability and reuse
•  NPG’s Linked Data release – CC0
•  A new data publication platform:

Page 21: Open Access Week - Oxford, 20-24 Oct 2014

Human Genome, 2001: 62 pages, 150 authors, 49 figures, 27 tables

ENCODE Project, 2012: 30 papers, 3 journals

Nature Publishing Group: the changing landscape

Page 22: Open Access Week - Oxford, 20-24 Oct 2014

•  Credit
•  Unpublished data
•  Peer review focus
•  Value of data vs. analysis
•  Discoverability
•  Reusability
•  Narrative/context
•  “Intelligently open data”

The role of data journals/articles

Credit to: Iain Hrynaszkiewicz

Page 23: Open Access Week - Oxford, 20-24 Oct 2014

Data journals everywhere?

Credit to: Iain Hrynaszkiewicz

Page 24: Open Access Week - Oxford, 20-24 Oct 2014

market research (2011)

•  Scope of survey
   o  How much data researchers produce, in what format and what they do with it
   o  Perceived availability of public repositories
   o  Perceptions of the Scientific Data concept
   o  Level/nature of data journal peer review
•  Respondent characteristics
   o  387 respondents (329 active researchers)
   o  Physics (24%), Earth and environmental science (21%), Biology (20%), Chemistry (19%), Others (16%)

Credit to: Iain Hrynaszkiewicz

Page 25: Open Access Week - Oxford, 20-24 Oct 2014

market research (2011)

•  Key survey data
   o  60% share their data with their colleagues
   o  50% look at other researchers’ datasets at least once a month
   o  very few respondents produce more than 1 TB of data per year; the majority produce less than 1 GB
   o  45% are unaware of a repository for some of their data
   o  90% reacted positively to the concept of Scientific Data
   o  80% believed Scientific Data would increase data deposition
   o  what do researchers want from a data publication?
      96% - increased visibility and discovery
      95% - increased usability of their research data
      93% - credit mechanism for deposit of data
      80% - peer review of content/datasets

Credit to: Iain Hrynaszkiewicz

Page 26: Open Access Week - Oxford, 20-24 Oct 2014

•  Get Credit for Sharing Your Data
   o  Publications will be listed in the major indexes and will be citeable
•  Focused on Data Reuse
   o  All the information others need to reuse the data; no interpretative analysis or hypothesis testing
•  Open-access
   o  Authors select from three Creative Commons licences for the main Data Descriptor. Each publication supported by curated CC0 metadata
•  Peer-reviewed
   o  Rigorous peer-review managed by our Editorial Board of academic researchers ensures data quality and standards
•  Promoting Community Data Repositories
   o  Data stored in community data repositories

Page 27: Open Access Week - Oxford, 20-24 Oct 2014

Supported by:

Advisory Panel including senior researchers, funders, librarians and curators
•  Michael Huerta, National Institutes of Health, USA
•  Mark Thorley, Natural Environment Research Council, UK
•  Patricia Cruse, University of California, USA
•  Susan Gregurick, Office of Biological and Environmental Research, Department of Energy, USA
•  Ioannis Xenarios, Swiss Institute of Bioinformatics, Switzerland
•  Chris Bowler, IBENS, France
•  Mark Forster, Syngenta, UK
•  Anthony Rowe, Johnson & Johnson, USA
•  Stephen Chanock, National Cancer Institute, USA
•  Weida Tong, National Center for Toxicological Research, FDA, USA
•  Albert J. R. Heck, Utrecht University, The Netherlands
•  Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
•  Simon Hodson, CODATA, France
•  Joseph R. Ecker, Howard Hughes Medical Institute & Salk Institute, USA
•  Stephen Friend, Sage Bionetworks, USA
•  Jessica Tenenbaum, Duke Translational Medicine Institute, USA
•  Anne-Claude Gavin, EMBL, Germany
•  David Carr, Wellcome Trust, UK
•  Wolfram Horstmann, University of Oxford, UK
•  Piero Carninci, RIKEN Omics Science Center, Japan
•  Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
•  Judith A. Blake, The Jackson Laboratory, USA
•  Richard H. Scheuermann, J. Craig Venter Institute, USA
•  Caroline Shamu, Harvard Medical School, USA

Susanna-Assunta Sansone Honorary Academic Editor (University of Oxford, UK)

Andrew L Hufton Managing Editor

Varsha Khodiyar Editorial Curator

Iain Hrynaszkiewicz Publisher

An open access, peer-reviewed publication for descriptions of scientifically valuable datasets

Launched May 2014

Page 28: Open Access Week - Oxford, 20-24 Oct 2014

Journal article (narrative): Synthesis, Analysis, Conclusions, Interpretation

Data Descriptor (facts), summarised by:
•  What is the sample?
•  What did I do to generate the data?
•  Where is the data?
•  How was the data processed?
•  Who did what when?

•  The data descriptor is only concerned with the facts behind the methodology of data generation/collection and processing

•  A data descriptor complements a journal article

Introducing a new content type: the Data Descriptor

Page 29: Open Access Week - Oxford, 20-24 Oct 2014

Article or narrative component (PDF and HTML)

Data Descriptor: narrative and structure

Experimental metadata or structured component (in-house curated, machine-readable formats)

Page 31: Open Access Week - Oxford, 20-24 Oct 2014

In traditional publications this information is not provided in a sufficiently detailed manner

However this information is essential for understanding, reusing, and reproducing datasets

Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses

Data Descriptor: narrative

Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations

Page 33: Open Access Week - Oxford, 20-24 Oct 2014

Data Descriptor: narrative

Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group

Page 34: Open Access Week - Oxford, 20-24 Oct 2014

Includes fields describing:
•  each study, linking to relevant sections of the Data Descriptor article
•  authors’ details, including ORCID
•  publications
•  funding sources and funders’ names, via FundRef
•  experimental factors
•  study design
•  assays
•  protocols

analysis, method, script

Data file or record in a database

Data Descriptor: structure - content

Page 35: Open Access Week - Oxford, 20-24 Oct 2014

In-house editorial curator:
•  assists users to submit the structured content via simple templates and an internal authoring tool
•  performs value-added semantic annotation of the experimental metadata

For advanced users/service providers willing to export ISA-Tab for direct submission, we will release a technical specification:

analysis, method, script

Data file or record in a database

Data Descriptor: structure - content
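As a rough illustration of what a tab-delimited, ISA-Tab-style study table might look like, the sketch below writes and re-reads a two-sample table with Python's standard `csv` module. The column headers follow the ISA-Tab naming style (`Source Name`, `Characteristics[...]`, `Protocol REF`, `Sample Name`, `Factor Value[...]`), but the rows are invented and the result is not a complete or validated ISA-Tab archive; it is not the journal's submission format.

```python
import csv
import io

# Headers in the ISA-Tab study-table naming style; the rows below are invented.
HEADERS = [
    "Source Name",
    "Characteristics[organism]",
    "Protocol REF",
    "Sample Name",
    "Factor Value[dose]",
]

ROWS = [
    ["subject1", "Mus musculus", "sample collection", "subject1.liver", "0 mg"],
    ["subject2", "Mus musculus", "sample collection", "subject2.liver", "10 mg"],
]


def write_study_table(headers, rows):
    """Return the table as tab-delimited text, one sample per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


def read_study_table(text):
    """Parse tab-delimited text back into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


table_text = write_study_table(HEADERS, ROWS)
samples = read_study_table(table_text)
print(table_text)
```

Keeping the experimental factor in its own named column is what lets a curator or a script pick out the study design without reading the narrative.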

Page 36: Open Access Week - Oxford, 20-24 Oct 2014

Scientific hypotheses:
•  Synthesis
•  Analysis
•  Conclusions

Methods and technical analyses supporting the quality of the measurements:
•  What did I do to generate the data?
•  How was the data processed?
•  Where is the data?
•  Who did what when?

Relation with traditional articles - content

Page 37: Open Access Week - Oxford, 20-24 Oct 2014

BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy)

AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)

AFTER: expand on your research articles, adding further information for reuse of the data

Relation with traditional articles - time

Page 38: Open Access Week - Oxford, 20-24 Oct 2014

Citations of and links to data files - databases

Page 39: Open Access Week - Oxford, 20-24 Oct 2014

Value added component integrated in a growing ecosystem

We currently recognize over 60 public data repositories!

Research papers

Data records

Data Descriptors

Page 40: Open Access Week - Oxford, 20-24 Oct 2014

A web-based, curated and searchable portal that works to ensure that standards and databases are registered, informative, discoverable and accessible, monitoring the development and evolution of standards, their use in databases and the adoption of both in data policies.

Over 500 Over 600

Page 41: Open Access Week - Oxford, 20-24 Oct 2014

Over 500 Over 600

Including minimum information reporting requirements, or checklists to report the same core, essential information

Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’

Including conceptual model, conceptual schema from which an exchange format is derived to allow data to flow from one system to another

Page 42: Open Access Week - Oxford, 20-24 Oct 2014

Mapping the landscape of community-developed standards, databases and data policies in the life sciences, broadly covering biological, natural and biomedical sciences

Over 500 Over 600

Page 43: Open Access Week - Oxford, 20-24 Oct 2014

Researchers, developers and curators lack support and guidance on how to best navigate and select content standards, understand their maturity, or find databases that implement them;

Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement

Page 44: Open Access Week - Oxford, 20-24 Oct 2014

Operational Team

Advisory Board and Working Group - core members and adopters

Page 45: Open Access Week - Oxford, 20-24 Oct 2014

[Chart: recognized repositories by category. Categories: DNA and protein sequence; Functional genomics; Genetic association and genome variation; Metagenomics; Molecular interactions; Organism- or disease-specific; Proteomics; Taxonomy and species diversity; Traces and sequencing reads. Values shown: 24, 3, 10, 4, 1, 4, 3, 4]

“Omics” is emphasized among basic life-sciences repositories

•  We currently recognize over 60 public data repositories, and provide advice on the best place for authors to archive their data

•  We have integrated systems with both:

Helping authors find the right place for the data

Page 46: Open Access Week - Oxford, 20-24 Oct 2014

Repositories criteria
1.  Broad support and recognition within their scientific community
2.  Ensure long-term persistence and preservation of datasets
3.  Provide expert curation
4.  Implement relevant, community-endorsed reporting requirements
    Progressively monitor this via
5.  Provide for confidential review of submitted datasets
6.  Provide stable identifiers for submitted datasets
7.  Allow public access to data without unnecessary restrictions
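The seven criteria above read naturally as a checklist. The sketch below is one hypothetical way to record a yes/no answer per criterion and list what a candidate repository still lacks; the class, field names and example values are illustrative assumptions, not the journal's actual assessment process.

```python
from dataclasses import dataclass, fields


@dataclass
class RepositoryAssessment:
    """One yes/no answer per criterion; field names are illustrative."""
    community_recognition: bool
    long_term_preservation: bool
    expert_curation: bool
    community_reporting_requirements: bool
    confidential_review: bool
    stable_identifiers: bool
    unrestricted_public_access: bool


def unmet_criteria(assessment):
    """Return the names of criteria answered 'no', in declaration order."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


# Hypothetical repository that does not yet offer confidential review.
example = RepositoryAssessment(
    community_recognition=True,
    long_term_preservation=True,
    expert_curation=True,
    community_reporting_requirements=True,
    confidential_review=False,
    stable_identifiers=True,
    unrestricted_public_access=True,
)

missing = unmet_criteria(example)
print(missing)
```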

Page 47: Open Access Week - Oxford, 20-24 Oct 2014

Data: the primary datasets reside in public repositories. Partnering with FigShare and Dryad, which are both CC0

Data Descriptor - structured component (ISA-Tab): as NPG has already done with its existing Linked Data Portal, the metadata about data descriptors in Scientific Data is CC0

Data Descriptor - narrative component: describing the methodology of data generation/collection and processing; licensed under either of the following, by author choice:

Open Access – APC supported

OA Article processing charges: $1,000 USD / £650 GBP / €750 for each accepted article

Page 48: Open Access Week - Oxford, 20-24 Oct 2014

Evaluation is not based on the perceived impact or novelty of the findings or size of the data

•  Experimental rigour and technical data quality
   o  Methodologically sound
   o  Technical validation experiments and statistical analyses
   o  Depth, coverage, size, and/or completeness of data sufficient for the types of applications
•  Completeness of the description
   o  Sufficient details to allow others to reproduce the results, reuse the data or integrate it with other data
   o  Compliance with relevant minimum information or reporting standards
•  Integrity of the data files and repository record
   o  Data files match the descriptions in the Data Descriptor
   o  Deposited in the most appropriate available data repository

Peer review process focused on quality and reuse

Page 49: Open Access Week - Oxford, 20-24 Oct 2014

•  Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology etc.
•  New and previously published individual datasets, curated aggregations and citizen science:
   o  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
   o  additional tutorial-like information for scientists interested in reusing or integrating the data with their own
•  Datasets in figshare, Dryad and domain specific databases
•  Code deposited in figshare and GitHub
•  First collection:

Current content is diverse - bimonthly releases

Page 50: Open Access Week - Oxford, 20-24 Oct 2014

Hanke: Neuroscience

Code in GitHub

New Dataset
Data in OpenfMRI
Source code in GitHub

Big Data

Page 51: Open Access Week - Oxford, 20-24 Oct 2014

Stefano: Stem Cells

Associated Nature Article
Data - figshare - NCBI GEO
Integrated figshare data viewer

Page 52: Open Access Week - Oxford, 20-24 Oct 2014

Hao: Environmental

New Dataset
Data in figshare
Code in figshare
Integrated figshare data viewer
Cited in Science

Page 53: Open Access Week - Oxford, 20-24 Oct 2014
Page 54: Open Access Week - Oxford, 20-24 Oct 2014

http://www.flickr.com/photos/12308429@N03/4957994485/

•  Make sure your research outputs make an impact!
•  Open your research outputs via the right channels to get cited and credited
•  Contribute to the reproducible research movement and to FAIR data

Page 55: Open Access Week - Oxford, 20-24 Oct 2014

•  Uniquely identify yourself via ORCID
•  Share identified generic research outputs, e.g. FigShare
•  Share and deposit code, e.g. GitHub, Bitbucket

http://www.flickr.com/photos/idiolector/289490834/

Page 56: Open Access Week - Oxford, 20-24 Oct 2014

•  Learn about open standards in your area, via e.g. BioSharing
•  Select tools that implement relevant standards, e.g. ISA
•  Publish not just in traditional journals, but think Scientific Data

http://www.flickr.com/photos/webhamster/2582189977/

Page 57: Open Access Week - Oxford, 20-24 Oct 2014

Acknowledgements

Advisory Boards and Collaborators

Philippe Rocca-Serra, PhD

Alejandra Gonzalez-Beltran, PhD

Milo Thurston, PhD

Visit nature.com/scientificdata

Email [email protected]

Tweet @ScientificData

Honorary Academic Editor: Susanna-Assunta Sansone, PhD
Managing Editor: Andrew L Hufton, PhD
Editorial Curator: Varsha Khodiyar
Publisher: Iain Hrynaszkiewicz

Eamonn Maguire, DPhil candidate

And we are hiring a software developer!