publishing source data - european research council bernd pulverer.pdf · july 28, 2014 re:...

Publishing Source Data

Finding and Accessing the Data Behind Figures

Bernd Pulverer

Head | Scientific Publications

Chief Editor | The EMBO Journal

Three principles of research

• Sharing

research progresses as a collaborative enterprise, often in small steps

• Self-governance

peer-review, committees, boards

• Ethics

discover → validate → revise → share

Data published in papers represents a small fraction of all useful data generated in labs: peer review; stable; citable; usually unstructured

Data deposited in databases represents a bigger fraction of the data: usually curation; (stable); citable; structured

Data deposited in repositories may capture a large fraction of the data: some curation; often unstructured

>> validation, citability, stability

>20k journals 1.5 million papers/year

>5% annual growth

papers

databases

structures

sequences

functional genomics

proteomics

genotype phenotype

Data deposition in databases/repositories

computational models

“orphan” data Papers

Authors’ website

Institutional repositories

Open Science! …but

not all data is useful

flawed data

unreproducible data

raw data

unstructured data

validation - curation

Data published in papers is validated by peer review ….but

negative, confirmatory and refuting data is largely ignored

not all published data is high quality

not all published data is reproducible

Amgen

Bayer Healthcare

Unreproducible data (?)

‘biologists fail to design experiments properly, and so submit underpowered studies that have an insufficient sample size and trumpet chance observations as biological effects…. Researchers …must agree on standards that will protect against avoidable errors.’

Error prone. Nature 487, 406 (2012) doi:10.1038/487406a

‘Scientists and journals must work together to ensure that eye-catching artefacts are not trumpeted as genomic insights’ ‘hunting for biological surprises without due caution can easily yield a rich crop of biases and experimental artefacts, and lead to high-impact papers built on nothing more than systematic experimental 'noise'.’

Methods: Face up to false positives. Daniel MacArthur. Nature 487, 427–428 (2012) doi:10.1038/487427a

Power failure: why small sample size undermines the reliability of neuroscience. Button, Ioannidis, Mokrysz, Nosek, Flint, Robinson & Munafò. Nature Reviews Neuroscience 14, 365-376 (2013) doi:10.1038/nrn3475

‘the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful.’
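
To make "underpowered" concrete, here is a minimal sketch of how power depends on sample size. It uses a normal approximation to a two-sided, two-sample test; the function name `approx_power`, the effect size of 0.5 and the group sizes are illustrative assumptions, not figures from the cited papers.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    This is a normal approximation, not an exact t-test calculation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of crossing the critical value in the direction of the
    # true effect; the tiny opposite-tail contribution is ignored.
    return 1 - std_normal.cdf(z_crit - shift)


if __name__ == "__main__":
    # Hypothetical scenario: a medium effect (d = 0.5) at three group sizes.
    for n in (10, 20, 64):
        print(f"n per group = {n:3d}  approximate power = {approx_power(0.5, n):.2f}")
```

Under these assumptions, small groups detect a medium-sized effect only a minority of the time, while the larger group size approaches the conventional 80% target.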

Many Journals have explicit materials and data sharing guidelines …which are often not policed effectively

It is not always easy to share all data: human subjects; pharma, biotech

Data (code) sharing may expose flaws; aids competitors

The scientific paper will remain a key mode of exchanging filtered scientific information • browsing • ‘academic currency’

Research papers rarely contain data that can be mined, extracted, reanalyzed and reused [not just a question of licensing]

The data in papers are not discoverable [not just a question of OA]

Scientific Journals are good at validation and in mandating standards & sharing [but need to get better]

Journals are not good at access and publishing of structured data • human vs. machine readable • interlinking papers, databases and data repositories

Quality assurance Reproducibility Accessibility Discoverability

EXPANDED VIEW

• Files that are not rendered in HTML are linked as downloads • Links to datasets • Data citations

Data Transparency

Published data should be accessible, reproducible and re-usable for research by others

‘The two vital components of the scientific endeavor – the idea and the evidence – are too frequently separated’

Science as an open enterprise, Royal Society, 2012

A paper

Graphs Gels Schemata Micrographs

What is a figure?

A scientific result converted into a collection of pixels

Add Source Data to Papers: Gels, Blots, micrographs & Graphs a lab book

Figure Source Data Raw Data

Raw vs. Source

• Archive

• Transparency

• Replicates

• Reanalysis

• Reuse

• Discoverability

• Discourage manipulation

o voluntary

o ~40% of papers

Source Data

Data or Schema?

Post Review Manuscript EMBO Molecular Medicine 2012

‘I’m a great believer in seeing all the data – this is a very important lever that we have for transparency’

Michael Farthing, founder COPE

Smart figures

• Access to source data

• Descriptive metadata

• Coherent experimental units: figure panels

• Enabling data-oriented searches

• Present data in the context of related data

>> Data oriented search

Paper 1 Paper 2

Navigating Research Findings via Figures

Data viewer

Figure = Data

Text = Narrative

(A) Primary early-passage MEFs were infected with MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus. GFP+ cells were then left untreated (−) or were treated (+) with 2 μM 4-HT ± Chx pretreatment (30 min) for 24 h and assessed for their expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by SYBR-green real-time PCR analysis. Levels of mRNA were standardized to Ub.

Entity tagging: machine readable metadata

Curation tool

(Level 0: metadata associated to individual panels.)

Level 1: list of chemical and biological components: small molecules, genes, proteins, sub-cellular structures, cell type, tissue, species.

Level 2: representation of the causality of the experimental design: “Measurement of Y as a function of A, B, C, using assay P in biological system S.”

Level 3: normalization to machine-readable standard identifiers.

[Diagram labels: measured component; perturbed component; experimental system; assayed property]

Structured metadata: ‘perturbation-observation-assay’
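
The ‘perturbation-observation-assay’ structure can be sketched as a small data model. This is an illustration only: the class and field names (`Entity`, `PanelAnnotation`, `identifier`) are hypothetical and do not describe the actual curation tool. The example values come from the panel (A) legend in this transcript; standard identifiers (Level 3) are left blank rather than guessed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    """A Level 1 component: a gene, protein, small molecule, cell type, etc."""
    name: str
    kind: str
    identifier: str = ""  # Level 3 standard identifier; blank when not assigned


@dataclass
class PanelAnnotation:
    """Level 2 annotation for one figure panel (hypothetical schema)."""
    panel: str
    measured: List[Entity]   # the observed components (Y)
    perturbed: List[Entity]  # the manipulated components (A, B, C)
    assay: str               # assay P
    system: str              # biological system S

    def summary(self) -> str:
        ys = ", ".join(e.name for e in self.measured)
        xs = ", ".join(e.name for e in self.perturbed)
        return (f"Measurement of {ys} as a function of {xs}, "
                f"using assay {self.assay} in biological system {self.system}.")


# Annotation of panel (A), filled in from the figure legend quoted in the transcript.
panel_a = PanelAnnotation(
    panel="A",
    measured=[Entity(n, "mRNA") for n in ("cks1", "skp2", "rcl", "cdc")],
    perturbed=[
        Entity("Myc-ER", "protein"),
        Entity("4-HT", "small molecule"),
        Entity("Chx", "small molecule"),
    ],
    assay="SYBR-green real-time PCR",
    system="primary early-passage MEFs",
)

if __name__ == "__main__":
    print(panel_a.summary())
```

Keeping measured and perturbed components in separate fields is what would allow a search such as "panels where skp2 is measured", instead of matching the word anywhere in the legend.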

Resulting hypothesis: test drug Z in disease D.

[Figure: findings linked across three papers. Paper 1: drug Z, kinase Y activity. Paper 2: kinase Y, P, protein X. Paper 3: gene x, tissue T, disease D.]

Data integration
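
How structured annotations from separate papers could be chained into the hypothesis "test drug Z in disease D" can be sketched as a graph search. Everything here is an assumption made for illustration: the direction of each link, the `NORMALIZE` table standing in for identifier normalization, and the function name `find_chain`.

```python
from collections import deque

# Hypothetical normalization step: different labels for the same entity
# are mapped to one shared key so that papers can be joined.
NORMALIZE = {
    "kinase Y activity": "kinase Y",
    "protein X": "X",
    "gene x": "X",
}

# (paper, perturbed component, measured component); directions are assumed.
FINDINGS = [
    ("Paper 1", "drug Z", "kinase Y activity"),
    ("Paper 2", "kinase Y", "protein X"),
    ("Paper 3", "gene x", "disease D"),
]


def norm(name):
    """Return the shared key for an entity label."""
    return NORMALIZE.get(name, name)


def find_chain(findings, start, goal):
    """Breadth-first search for a chain of findings linking start to goal."""
    graph = {}
    for paper, perturbed, measured in findings:
        graph.setdefault(norm(perturbed), []).append((norm(measured), paper))

    queue = deque([(norm(start), [])])
    seen = {norm(start)}
    while queue:
        node, path = queue.popleft()
        if node == norm(goal):
            return path
        for nxt, paper in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, paper)]))
    return None  # no chain connects the two entities


if __name__ == "__main__":
    for source, target, paper in find_chain(FINDINGS, "drug Z", "disease D"):
        print(f"{source} -> {target}  ({paper})")
```

Without the normalization step the three findings share no common label and no chain is found, which is the practical argument for machine-readable standard identifiers.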

Survey (n=487)

PubMed is the first choice for 72% of users (Google is the second choice).

Major issues: “Too many irrelevant results (lack of specificity)” and “Difficulty to formulate complex queries”

Microattribution & Data/Protocol links • Credit • Data Citation • Accountability • Reproducibility

Fig 1C • Source Data • Methods • Protocol • Data Citation • Authorship

Tim Elston, Univ. of North Carolina

Prepublication Ethics

Journal Author Checklists

EMM submission 26/8/2013

Fraud or Beautification?

Ban the eraser tool!

July 28, 2014

RE: EMM-2014-03890-V3

Dear Dr. Carret,

We retract our paper from further consideration at EMBO Molecular Medicine. …..

We thank the Journal’s Editorial Board for their rigorous evaluation of our study. Publishing a paper with an erroneous blot would call into question all of the good work that went into this study. This situation would have been a disaster for us ….. Thus, we are sincerely grateful that you enabled us to detect this mistake prior to publication.

Cordially,

Prepublication Image Analysis

‘A handful of television trucks with satellite dishes lined the street in front of the building in downtown Tokyo. Some 200 journalists packed the meeting room and were flanked by two dozen video cameras and crew at the front.’ Nature 2014

Retracted: Nature 505, 641–647 (2014); Nature 505, 676–680 (2014)

[Figure panels shown: Fig 1i, ED Fig 5, Fig 2g, Fig 1b]

Long term toxicity of a Roundup herbicide and a Roundup-tolerant genetically modified maize. Séralini G.E., et al. Food and Chemical Toxicology, 2012 (retracted)

Extraordinary claims require extraordinary proof and validation

“no definitive conclusions can be reached” A. Wallace Hayes, editor Food and Chemical Toxicology

Open Data: how to ensure reliability?

Gate of Hell - William Blake

Jacob’s Dream - William Blake

Opening the Gates…