Make the World a Better Place through Reproducible Research

Roger D. Peng

Department of Biostatistics

Johns Hopkins Bloomberg School of Public Health

Wall of Wonder

2006-05-12

Trends in Scientific Research

• Signal-to-noise in many investigations is getting smaller

• Smaller relative risks (e.g. relative risk of mortality is 1.005 per 10 ppb of ozone)

• High-throughput measurement technologies

• Powerful computers

Trends in Computing: Then...

...And Now

The Result?

• Large databases for investigating subtle associations

• Interactive computing with advanced statistical algorithms

• Sophisticated searches across models and variables to identify important risks

• Bigger and better studies

Replication: The Standard

• Scientific evidence is strengthened when important findings are replicated by independent investigators, data, methods, laboratories, instruments, etc.

• Replication is often not possible because of time and funding constraints

• Policy decisions must often be made with evidence at hand

Reproducible Research: A Minimum Standard

Published research where the following are made available:

• Analytic data

• Computer code implementing methods

• Documentation about code/data

All are distributed using standard means

Benefits of Reproducible Research

• Published findings can be verified

• Alternative analyses conducted

• Challenge uninformed criticisms (“put up or shut up”)

• Expedite exchange of ideas among investigators

Challenges to RR

• “If I give away my data, others will publish results and scoop me”

• “I own my data and ideas, other people don’t necessarily have any rights to them”

Why should I just give away my intellectual property?

Ya see, it’s what I call the “ownership society”

Property

[Automatt] [JRodrigues]

[james.thompson]

[nervsappy]

“Intellectual Property”

• “the intangible value created by human creativity and invention” – from JHSPH Office of Technology Transfer

(emphasis added)

• How can something that is intangible be property?

There’s No Such Thing as “Intellectual Property”

• If I copy your book, you still have your book

• If I use your idea, you still have your idea

• If I copy your data, you still have your data

• If I use your statistical model, you still have your statistical model

• If I implement your algorithm, you still have your algorithm

• etc.

What are the Potential Gains and Losses from Sharing Data?

[Figure: research output with and without sharing. Labels: Data; Research done by you regardless of sharing; Research done by others; Research you would have done if you hadn’t shared data]

Share = Y(1)

Don’t share = Y(0)

D = Y(1) - Y(0)

(a) D = 0

(b) D < 0

(c) D > 0
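The comparison D = Y(1) - Y(0) can be sketched in a few lines of code. This is only an illustration: the slide does not say how Y is measured, so the sketch assumes Y is some numeric measure of research output, and the function name `sharing_effect` and the example values are hypothetical.

```python
def sharing_effect(y_share, y_no_share):
    """Return D = Y(1) - Y(0) and which of the cases (a), (b), (c) applies.

    y_share    -- Y(1), output if the data are shared (hypothetical units)
    y_no_share -- Y(0), output if the data are not shared (same units)
    """
    d = y_share - y_no_share
    if d > 0:
        case = "(c) D > 0"   # sharing yields a net gain
    elif d < 0:
        case = "(b) D < 0"   # sharing yields a net loss
    else:
        case = "(a) D = 0"   # sharing makes no difference
    return d, case


# Made-up values, purely for illustration.
for y1, y0 in [(10, 10), (8, 10), (14, 10)]:
    print(sharing_effect(y1, y0))
```

Only one of Y(1) and Y(0) is ever observed for a given dataset, so in practice D has to be argued for rather than computed directly.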

What is a Dataset?

• Represents already published findings and ideas

• Contains potential findings and ideas yet to be discovered and exploited

• Datasets do not fit well into the framework of copyrights and patents

What Do We Need?

• Infrastructure

– Tools for researchers, developers

– Repositories for datasets

– Rights framework for datasets

• Privacy preservation

• Handle computer language Babel

• Structured research modularity

“WWKD”

What Would Karl Do?

Models for Reproducibility

An Example

http://www.biostat.jhsph.edu/MCAPS/

Partial Rights for Data? A First Cut

• Full access: the data can be used for any purpose

• Attribution: the data can be used for any purpose with a specific citation

• Share-alike: the data can be used for any purpose but any “improvements” must be made available under the same license

• Reproduction-only: the data can only be used for reproducing published results and commenting via a letter to the editor

Thank you!
