izant openscience

Why we do it Jonathan Izant VP, Sage Bionetworks Open Science Summit 31 July 2010 www.sagebase.org

Upload: open-science-summit

Post on 31-Oct-2014

Page 1: Izant openscience

Why we do it

Jonathan Izant, VP, Sage Bionetworks

Open Science Summit, 31 July 2010
www.sagebase.org

Page 2: Izant openscience

denial

Page 3: Izant openscience

Genomics does not yet teach us much

Pharma drug development is broken

Standards of care are inadequate

Academics limit open access

Page 4: Izant openscience

Genetics Timeline

1800 1900 2000

Page 5: Izant openscience

Gene Regulation circa 1990

Page 6: Izant openscience

Gene Regulation circa 1996

Page 7: Izant openscience

Gene Regulation circa 2002

Page 8: Izant openscience

How is genomic data used to understand biology?

“Standard” GWAS Approaches: Identify causative DNA variation but provide NO mechanism

Profiling Approaches: Genome-scale profiling provides correlates of disease. Many examples, BUT what is cause and effect?

“Integrated” Genetics Approaches: Provide an unbiased view of molecular physiology as it relates to disease phenotypes; give insights on mechanism; provide causal relationships and allow predictions

[Figure labels: DNA Variation, Molecular Trait Variation, Complex Trait Variation; RNA amplification, microarray hybridization; axes: Gene Index, Tumors]

Page 9: Izant openscience

Merck Inc. Co.
5 Year Program
Based at Rosetta
Total Resources >$150M

The “Rosetta Integrative Genomics Experiment”: Generation, assembly, and integration of data to build models that predict clinical outcome

• Generate data needed to build bionetworks
• Assemble other available data useful for building networks
• Integrate and build models
• Test predictions
• Develop treatments
• Design Predictive Markers

Page 10: Izant openscience

Constructing Bayesian Networks

Page 11: Izant openscience

Extensive Publications now Substantiating Scientific Approach: Probabilistic Causal Bionetwork Models

• >60 publications from Rosetta Genetics Group (~30 scientists) over 5 years, including high-profile papers in PLoS, Nature and Nature Genetics

Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
… Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.

CVD
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
… Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome

Bone
"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)

Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
… Plus 3 additional papers in PLoS Genet., BMC Genet.

Page 12: Izant openscience

Opportunity

The stunning technologies coming will generate heaps of genomic data

Bionetworks using integrative genomic approaches can highlight the non-redundant components and can find drivers of the disease and of therapies

Need to develop ways to host massive amounts of data and the evolving representations of disease captured by these probabilistic causal disease models

Page 13: Izant openscience

Drivers

Recognition that bionetwork-based molecular models of diseases are powerful but require significant resources

Appreciation that it will require decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions

Realizing that the donation by Merck might seed a “commons”, allowing a potential long-term gain to the whole community from evolving models of disease built via a contributor network

Page 14: Izant openscience

Mission

Sage Bionetworks is a non-profit organization with a vision to create a “Commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease

Page 15: Izant openscience

Sage Bionetworks: a busy first year

Timeline, 2009 to 2010:

• 14 staff move into Sage offices at FHCRC
• First Board of Directors meeting
• First NIH grant payment
• Catalyst funding from Listwin, CHDI and Quintiles
• NIH New Institution Review
• Partnership with Pfizer
• Partnership with Merck
• $5m LSDF grant
• 1st Sage Commons Congress in SF
• 501(c)(3) determination
• $8m NCI grant for new CCSB

Page 16: Izant openscience

Sage Bionetworks Partners

Research

Platform

Training

Page 17: Izant openscience
Page 18: Izant openscience

Global Coherent Data Sets

A data set containing genome-wide DNA variation and intermediate trait data, as well as physiological phenotype data, across a population of individuals large enough to power association or linkage studies, typically 50 or more individuals. To be coherent, the data need to be matched with consistent identifiers. Intermediate traits are typically gene expression, but may also include proteomic, metabolomic, and other molecular data.

GCDs reflect the current state of knowledge and are subject to change as more information becomes available to Sage
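
As a rough illustration of the "coherent" requirement described on this slide, the sketch below checks whether three data layers share enough matched individual identifiers. It is a minimal sketch, not Sage's actual tooling: the function names, the dictionary layout keyed by individual ID, and the use of 50 as a hard threshold are all assumptions made for this example.

```python
from typing import Mapping, Set

# Assumption: the slide's "typically 50 or more individuals" used as a cutoff.
MIN_INDIVIDUALS = 50


def coherent_ids(*layers: Mapping[str, object]) -> Set[str]:
    """Return the individual identifiers that appear in every data layer."""
    if not layers:
        return set()
    shared = set(layers[0])
    for layer in layers[1:]:
        shared &= set(layer)  # keep only IDs also present in this layer
    return shared


def is_coherent(
    genotypes: Mapping[str, object],
    expression: Mapping[str, object],
    phenotypes: Mapping[str, object],
    min_individuals: int = MIN_INDIVIDUALS,
) -> bool:
    """True if enough individuals have matched records in all three layers."""
    shared = coherent_ids(genotypes, expression, phenotypes)
    return len(shared) >= min_individuals


if __name__ == "__main__":
    # Hypothetical toy data: each layer is keyed by the same style of ID.
    genotypes = {f"ID{i}": "AA" for i in range(60)}
    expression = {f"ID{i}": [0.1, 0.2] for i in range(5, 60)}
    phenotypes = {f"ID{i}": {"bmi": 25.0} for i in range(58)}
    print(len(coherent_ids(genotypes, expression, phenotypes)))
    print(is_coherent(genotypes, expression, phenotypes))
```

Only individuals present in every layer count toward the total, which is why consistent identifiers matter: a layer keyed by a different ID scheme would contribute nothing to the intersection.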

Page 19: Izant openscience

http://www.sagebase.org/research/tools.html

Page 20: Izant openscience

Sage Commons Challenges

Standards (data, annotation)

Tools (combining, analyzing)

Citation (recognition)

Internationalization

Public Engagement

Page 21: Izant openscience

Barriers:

consistent data format and metadata

building the critical mass of contributors

Data standardization, Data Quality

enormous curation effort needed to correct for incompatible study designs, incomplete data gathering

IRB and protection of human subjects

Data interoperability

legal/licensing framework

Tools and standards: allow the resource to grow and evolve, and capture metadata, quality measures and quality control in a standardized way

Visualization tools

platform independence

Designing a simple-to-use model for uploading and processing data

Ability to capture structured content

The Commons will need to resolve issues surrounding protection of human subjects data if the information is to be widely shared.

Page 22: Izant openscience

Problem: ‘Accessible’ data often isn’t

Survey question: One year after it is generated, where is most clinical/genomic data stored? (87 respondents, multiple choices permitted)

Response options (bar chart, axis 0% to 100%):

• The person/institution that was funded to generate the data
• The Journals where it was published
• The funding agency, regulated by agency rules
• Government agency (e.g. NCBI, EBI)
• Institutions who want to generate intellectual property
• The patients who were studied
• A non-profit public access organization
• Hospitals and healthcare organizations
• A commercial IT, biotechnology or pharmaceutical company
• Other (please specify)

Page 23: Izant openscience

Question: What percentage of the clinical/genomic data that has been published is currently readily accessible for researchers to use?

Question: What percentage of published clinical/genomic data is currently available in a format that is easily downloaded in a way that facilitates new analysis?

Answer categories: more than 90%; between 50% and 90%; about 50%; between 50% and 10%; less than 10%

[Bar chart; vertical axis 0% to 80%]

Page 24: Izant openscience
Page 25: Izant openscience

Collaborators

Page 26: Izant openscience

Biomedical research developed as a Cottage Industry

Page 27: Izant openscience
Page 28: Izant openscience

The need for multi-layer mega datasets and the vanishing ‘price’ for genes provide an incentive for a pre-competitive space for genomics

[Chart: Gene Licensing Deals ($US), 1980 to 2020; value axis from $100,000 to $1,000,000,000]

Page 29: Izant openscience

Incentives:

Researcher "Turf" /lack of experience sharing

Business case for contributing and sharing resources and information is unclear to many, while business case for hoarding them is well articulated and obvious.

Buy-in from tool developers, data producers and data users

Politics: competitive funding versus communal goal

Sociology and policy. Getting people to share and building trust.

Willingness by the community to share data and key ancillary information (e.g. pathology/clinical data for profiled samples)

Changing culture of individual recognition, publication, rewards, incentives

This is a social (political) experiment/enterprise as much as a scientific challenge. How to motivate individuals who are not community-inclined might be key.

The theory is great, the practice needs commitment from a wide variety of players

IMHO, the central challenge will be community adoption.

We need a team that will take the time to make sure we create a set of tools that can interoperate, rather than a set of tools that perform discrete independent tasks.

Page 30: Izant openscience

The Federation Experiment

Andrea Califano, Columbia U.

Eric Schadt, PacBio - UCSF

Atul Butte, Stanford Med

Trey Ideker, UCSD

Stephen Friend, Sage Bionetworks

Page 31: Izant openscience

Sage Bionetworks

Focused on improving treatment of disease

Working through extensive partnerships to enable research and drug development

Cultural challenges may eclipse technical and operational hurdles

www.sagebase.org