Page 1: Publishing Source Data - European Research Council Bernd Pulverer.pdf · July 28, 2014 RE: EMM-2014-03890-V3 Dear Dr. Carret, We retract our paper from further consideration at EMBO

Publishing Source Data

Finding and Accessing the Data Behind Figures

Bernd Pulverer Head | Scientific Publications Chief Editor | The EMBO Journal


Three principles of research

• Sharing

research progresses as a collaborative enterprise - often in small steps

• Self-governance

peer-review, committees, boards

• Ethics










Data published in papers represents a small fraction of all useful data generated in labs: peer review; stable; citable; usually unstructured

Data deposited in databases represents a bigger fraction of the data: usually curation; (stable); citable; structured

Data deposited in repositories may capture a large fraction of the data: some curation; often unstructured >> validation, citability, stability

>20k journals 1.5 million papers/year

>5% annual growth






functional genomics


genotype phenotype

Data deposition in databases/repositories

computational models

“orphan” data Papers

Authors’ website

Institutional repositories


Open Science! …but

not all data is useful

flawed data

unreproducible data

raw data

unstructured data

validation - curation


Data published in papers is validated by peer review ….but

negative, confirmatory and refuting data is largely ignored

not all published data is high quality

not all published data is reproducible



Bayer Healthcare

Unreproducible data (?)


‘biologists fail to design experiments properly, and so submit underpowered studies that have an insufficient sample size and trumpet chance observations as biological effects…. Researchers …must agree on standards that will protect against avoidable errors.’

NATURE Error prone Nature 487, 406 (2012) doi:10.1038/487406a

‘Scientists and journals must work together to ensure that eye-catching artefacts are not trumpeted as genomic insights’ ‘hunting for biological surprises without due caution can easily yield a rich crop of biases and experimental artefacts, and lead to high-impact papers built on nothing more than systematic experimental 'noise'.’

NATURE Methods: Face up to false positives Daniel MacArthur Nature 487, 427–428 (2012) doi:10.1038/487427a

Nature Reviews Neuroscience: Power failure: why small sample size undermines the reliability of neuroscience. Button, Ioannidis, Mokrysz, Nosek, Flint, Robinson & Munafò. Nature Reviews Neuroscience 14, 365–376 (2013) doi:10.1038/nrn3475

‘the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful.’
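
To make the point about underpowered studies concrete, the following is a rough sketch of how power depends on sample size. It uses a normal approximation for a two-sided, two-sample comparison; the function name `approx_power`, the effect size of 0.5 and the group sizes are illustrative assumptions and are not taken from the cited papers.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (assumed, not measured).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic
    # Probability of rejecting the null hypothesis in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (10, 20, 64):
        print(f"n per group = {n:3d}  approximate power = {approx_power(0.5, n):.2f}")
```

Under these assumptions, a medium effect studied with 20 samples per group is detected well under half the time, while roughly 64 per group are needed to reach about 80% power.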


Many Journals have explicit materials and data sharing guidelines …which are often not policed effectively

It is not always easy to share all data: human subjects; pharma, biotech


Data (code) sharing: may expose flaws; aids competitors

Many Journals have explicit materials and data sharing guidelines …which are often not policed effectively


The scientific paper will remain a key mode of exchanging filtered scientific information

• browsing

• ‘academic currency’


Research papers rarely contain data that can be mined, extracted, reanalyzed and reused [not just a question of licensing]

The data in papers are not discoverable [not just a question of OA]


Scientific Journals are good at validation and in mandating standards & sharing [but need to get better]

Journals are not good at access and publishing of structured data

• human vs. machine readable

• interlinking papers, databases and data repositories


• Quality assurance

• Reproducibility

• Accessibility

• Discoverability


• Files that are not rendered in HTML are linked as downloads

• Links to datasets

• Data citations



Data Transparency

Published data should be accessible, reproducible and re-usable for research by others

‘The two vital components of the scientific endeavor – the idea and the evidence – are too frequently separated’

Science as an open enterprise, Royal Society, 2012


A paper

Graphs Gels Schemata Micrographs


What is a figure?

A scientific result converted into a collection of pixels


Add Source Data to Papers: Gels, Blots, micrographs & Graphs a lab book


Figure Source Data Raw Data

Raw vs. Source


• Archive

• Transparency

• Replicates

• Reanalysis

• Reuse

• Discoverability

• Discourage manipulation

o voluntary

o ~40% of papers

Source Data


Data or Schema?

Post Review Manuscript EMBO Molecular Medicine 2012


Data or Schema?


‘I’m a great believer in seeing all the data – this is a very important lever that we have for transparency’

Michael Farthing, founder COPE


Smart figures

• Access to source data

• Descriptive metadata

• Coherent experimental units: figure panels

• Enabling data-oriented searches

• Present data in the context of related data

>> Data oriented search


Paper 1 Paper 2

Navigating Research Findings via Figures

Data viewer


Figure = Data

Text = Narrative


(A) Primary early-passage MEFs were infected with MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus. GFP+ cells were then left untreated (−) or were treated (+) with 2 μM 4-HT ± Chx pretreatment (30 min) for 24 h and assessed for their expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by SYBR-green real-time PCR analysis. Levels of mRNA were standardized to Ub.

Entity tagging: machine readable metadata
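
As an illustration of what machine-readable entity tags on a figure legend could look like, here is a minimal sketch. The `Tag` class, the `LEXICON` dictionary and the dictionary-lookup approach are hypothetical and are not the curation tool's actual method; the caption is a shortened paraphrase of the legend above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Shortened paraphrase of the legend, used only for illustration.
CAPTION = (
    "Primary MEFs were treated with 4-HT and Chx and assessed "
    "for expression of cks1 and skp2 mRNAs."
)

# Hypothetical lexicon mapping surface terms to entity categories.
LEXICON: Dict[str, str] = {
    "MEFs": "cell type",
    "4-HT": "small molecule",
    "Chx": "small molecule",
    "cks1": "gene",
    "skp2": "gene",
}


@dataclass(frozen=True)
class Tag:
    text: str      # the tagged term as it appears in the caption
    category: str  # entity category, e.g. gene or cell type
    start: int     # character offset where the term begins
    end: int       # character offset just past the term


def tag_entities(caption: str, lexicon: Dict[str, str]) -> List[Tag]:
    """Return one Tag per occurrence of each lexicon term, in caption order."""
    tags: List[Tag] = []
    for term, category in lexicon.items():
        start = caption.find(term)
        while start != -1:
            tags.append(Tag(term, category, start, start + len(term)))
            start = caption.find(term, start + len(term))
    return sorted(tags, key=lambda t: t.start)


if __name__ == "__main__":
    for tag in tag_entities(CAPTION, LEXICON):
        print(f"{tag.start:3d}-{tag.end:3d}  {tag.category:14s}  {tag.text}")
```

Storing character offsets alongside the category keeps each tag tied to the exact legend text it came from, so a human curator can verify it.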


Curation tool


(Level 0: metadata associated to individual panels.)

Level 1: list of chemical and biological components: small molecules, genes, proteins, sub-cellular structures, cell type, tissue, species.

Level 2: representation of the causality of the experimental design: “Measurement of Y as a function of A, B, C, using assay P in biological system S.”

Level 3: normalization to machine-readable standard identifiers.





experimental system



Structured metadata: ‘perturbation-observation-assay’
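
The levels above can be sketched as a small data structure. This is a minimal sketch under assumptions: the class names `Entity` and `PanelRecord`, their fields and the panel label are hypothetical, and the `identifier` field is left empty rather than filled with invented database identifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entity:
    name: str                         # Level 1: a listed component
    category: str                     # e.g. small molecule, gene, protein
    identifier: Optional[str] = None  # Level 3: standard identifier, if normalized


@dataclass(frozen=True)
class PanelRecord:
    panel: str                         # Level 0: metadata attaches to one panel
    perturbations: Tuple[Entity, ...]  # Level 2: what was varied (A, B, C)
    observed: Entity                   # Level 2: what was measured (Y)
    assay: str                         # Level 2: how it was measured (P)
    system: Entity                     # Level 2: experimental system (S)

    def describe(self) -> str:
        """Render the record using the sentence template from the slide."""
        factors = ", ".join(e.name for e in self.perturbations)
        return (
            f"Measurement of {self.observed.name} as a function of {factors}, "
            f"using assay {self.assay} in biological system {self.system.name}."
        )


if __name__ == "__main__":
    record = PanelRecord(
        panel="Fig 1A",  # hypothetical panel label
        perturbations=(
            Entity("A", "small molecule"),
            Entity("B", "gene"),
            Entity("C", "protein"),
        ),
        observed=Entity("Y", "gene"),
        assay="P",
        system=Entity("S", "cell type"),
    )
    print(record.describe())
```

Keeping perturbations and the observed entity in separate fields is what lets a search ask for panels where Y was measured as a function of A, rather than panels that merely mention both.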


Resulting hypothesis: test drug Z in disease D.

[Diagram: three linked results. r1: drug Z, kinase Y activity. r2: kinase Y, P, protein X. r3: gene x, disease D, tissue T.]

Data integration
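
One way to picture this integration step is as path finding over structured results. The sketch below assumes each result has already been reduced to a (label, perturbed entity, observed entity) triple with normalized names, so that "gene x" and "protein X" resolve to one entity; the `RESULTS` list and `find_chain` function are hypothetical, and the direction of each link is an interpretation of the diagram.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical normalized triples: (result label, perturbed entity, observed entity).
RESULTS: List[Tuple[str, str, str]] = [
    ("r1", "drug Z", "kinase Y"),
    ("r2", "kinase Y", "protein X"),
    ("r3", "protein X", "disease D"),
]


def find_chain(
    results: List[Tuple[str, str, str]], source: str, target: str
) -> Optional[List[str]]:
    """Breadth-first search for a chain of results linking source to target."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for label, perturbed, observed in results:
        graph.setdefault(perturbed, []).append((label, observed))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for label, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [label]))
    return None  # no chain of published results connects the two entities


if __name__ == "__main__":
    chain = find_chain(RESULTS, "drug Z", "disease D")
    print("Supporting results:", chain)
```

A chain found this way is only a candidate hypothesis, such as testing drug Z in disease D; each link still comes from a different paper and experimental system.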


Survey (n=487)

PubMed is the first choice for 72% of users (Google is the second choice).

Major issues: “Too many irrelevant results (lack of specificity)” and “Difficulty to formulate complex queries”


Microattribution & Data/Protocol links

• Credit

• Data Citation

• Accountability

• Reproducibility

Fig 1C

• Source Data

• Methods

• Protocol

• Data Citation

• Authorship


Tim Elston, Univ. of North Carolina

Prepublication Ethics


Journal Author Checklists


EMM submission 26/8/2013

Fraud or Beautification?


Ban the eraser tool! EMM submission 26/8/2013

Fraud or Beautification?


July 28, 2014

RE: EMM-2014-03890-V3

Dear Dr. Carret,

We retract our paper from further consideration at EMBO Molecular Medicine. …..

We thank the Journal’s Editorial Board for their rigorous evaluation of our study. Publishing a paper with an erroneous blot would call into question all of the good work that went into this study. This situation would have been a disaster for us ….. Thus, we are sincerely grateful that you enabled us to detect this mistake prior to publication.


Prepublication Image Analysis


‘A handful of television trucks with satellite dishes lined the street in front of the building in downtown Tokyo. Some 200 journalists packed the meeting room and were flanked by two dozen video cameras and crew at the front.’ Nature 2014


Retracted: Nature 505, 641–647 (2014); Nature 505, 676–680 (2014)

[Figure panels shown: Fig 1i, ED Fig 5, Fig 2g, Fig 1b]

Prepublication Image Analysis


Long term toxicity of a Roundup herbicide and a Roundup-tolerant genetically modified maize Séralini G.E., et al. Food and Chemical Toxicology, 2012, retracted

Extraordinary claims require extraordinary proof and validation

“no definitive conclusions can be reached” A. Wallace Hayes, editor Food and Chemical Toxicology


Open Data: how to ensure reliability?

Gate of Hell - William Blake

Jacob’s Dream - William Blake

Opening the

