izant openscience
Why we do it
Jonathan Izant, VP, Sage Bionetworks
Open Science Summit, 31 July 2010
www.sagebase.org
denial
Genomics does not yet teach us much
Pharma drug development is broken
Standards of care are inadequate
Academics limit open access
Genetics Timeline (1800 to 2000)
Gene Regulation circa 1990
Gene Regulation circa 1996
Gene Regulation circa 2002
[Diagram labels: DNA Variation, Molecular Trait Variation, Complex Trait Variation]
“Standard” GWAS Approaches
Identify causative DNA variation but provide NO mechanism
Profiling Approaches
Genome-scale profiling provides correlates of disease
Many examples, BUT what is cause and effect?
“Integrated” Genetics Approaches
Provide an unbiased view of molecular physiology as it relates to disease phenotypes
Insights on mechanism
Provide causal relationships and allow predictions
RNA amplification, microarray hybridization
[Figure: axes labeled Gene Index and Tumors]
How is genomic data used to understand biology?
Merck & Co., Inc.
5-Year Program
Based at Rosetta
Total Resources >$150M
The “Rosetta Integrative Genomics Experiment”: Generation, assembly, and integration of data to build models that predict clinical outcome
• Generate data needed to build bionetworks
• Assemble other available data useful for building networks
• Integrate and build models
• Test predictions
• Develop treatments
• Design Predictive Markers
Constructing Bayesian Networks
Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.
CVD
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome
Bone
"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
Plus 3 additional papers in PLoS Genet., BMC Genet.
Extensive Publications now Substantiating Scientific Approach: Probabilistic Causal Bionetwork Models
• >60 publications from the Rosetta Genetics Group (~30 scientists) over 5 years, including high-profile papers in PLoS, Nature and Nature Genetics
Opportunity
The stunning technologies coming will generate heaps of genomic data
Bionetworks built with integrative genomic approaches can highlight the non-redundant components and can find drivers of the disease and of therapies
Need to develop ways to host massive amounts of data and the evolving representations of disease captured by these probabilistic causal disease models
Drivers
Recognition that the benefits of bionetwork based molecular models of diseases are powerful but that they require significant resources
Appreciation that it will require decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions
Realizing the donation by Merck might seed a “commons” allowing a potential long term gain to the whole community provided by evolving models of disease built via a contributor network
Mission
Sage Bionetworks is a non-profit organization with a vision to create a “Commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease
Sage Bionetworks: a busy first year
Timeline, 2009 to 2010:
14 staff move into Sage offices at FHCRC
First Board of Directors meeting
First NIH grant payment
Catalyst funding from Listwin, CHDI and Quintiles
NIH New Institution Review
Partnership with Pfizer
Partnership with Merck
$5m LSDF grant
1st Sage Commons Congress in SF
501(c)(3) determination
$8m NCI grant for new CCSB
Sage Bionetworks Partners
Research
Platform
Training
Global Coherent Data Sets
A data set containing genome-wide DNA variation and intermediate trait data, as well as physiological phenotype data, across a population of individuals large enough to power association or linkage studies, typically 50 or more individuals. To be coherent, the data need to be matched with consistent identifiers. Intermediate traits are typically gene expression, but may also include proteomic, metabolomic, and other molecular data.
GCDs reflect the current state of knowledge and are subject to change as more information becomes available to Sage
http://www.sagebase.org/research/tools.html
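The coherence criteria in the definition above (consistent identifiers across data layers, and typically 50 or more individuals) can be illustrated with a minimal sketch. This is not a Sage Bionetworks tool: the function name `check_coherence`, the `CoherenceReport` fields, the layer names, and the dict-based data layout are all assumptions made for illustration.

```python
from dataclasses import dataclass

# "typically 50 or more individuals" in the definition above; treated here as a default, not a rule
MIN_INDIVIDUALS = 50


@dataclass
class CoherenceReport:
    shared_ids: list   # individuals present in every data layer
    unmatched: dict    # layer name -> IDs in that layer that are missing from some other layer
    is_coherent: bool  # True when every layer has exactly the same identifiers
    is_powered: bool   # True when the matched individuals reach the minimum count


def check_coherence(layers, min_individuals=MIN_INDIVIDUALS):
    """Compare identifiers across layers (hypothetical layout: layer name -> {individual ID: data})."""
    if not layers:
        raise ValueError("at least one data layer is required")
    id_sets = {name: set(records) for name, records in layers.items()}
    shared = set.intersection(*id_sets.values())
    unmatched = {
        name: sorted(ids - shared) for name, ids in id_sets.items() if ids - shared
    }
    return CoherenceReport(
        shared_ids=sorted(shared),
        unmatched=unmatched,
        is_coherent=not unmatched,
        is_powered=len(shared) >= min_individuals,
    )


# Toy example: 60 individuals genotyped and profiled, but only 55 with phenotype data.
ids = [f"S{i:03d}" for i in range(60)]
layers = {
    "dna_variation": {i: "AA" for i in ids},
    "gene_expression": {i: 1.0 for i in ids},
    "phenotype": {i: 0 for i in ids[:55]},
}
report = check_coherence(layers)
print(len(report.shared_ids), report.is_coherent, report.is_powered)
```

In this toy case the 55 matched individuals clear the size threshold, but the five individuals lacking phenotype data would need to be dropped or completed before the set could be called coherent.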
Sage Commons Challenges
Standards (data, annotation)
Tools (combining, analyzing)
Citation (recognition)
Internationalization
Public Engagement
consistent data format and metadata
building the critical mass of contributors
Data standardization, data quality
enormous curation effort needed to correct for incompatible study designs, incomplete data gathering
IRB and protection of human subjects
Data interoperability
legal/licensing framework
Tools and standards: allow the resource to grow and evolve, capture metadata and quality measures in a standardized way, and support quality control
Visualization tools
platform independence
Designing a simple-to-use model for uploading and processing data
Ability to capture structured content
The Commons will need to resolve issues surrounding protection of human subjects data if the information is to be widely shared.
Barriers:
Question: One year after it is generated, where is most clinical/genomic data stored? (87 respondents, multiple choices permitted)
Response options:
The person/institution that was funded to generate the data
The journals where it was published
The funding agency, regulated by agency rules
Government agency (e.g. NCBI, EBI)
Institutions who want to generate intellectual property
The patients who were studied
A non-profit public access organization
Hospitals and healthcare organizations
A commercial IT, biotechnology or pharmaceutical company
Other (please specify)
Problem: ‘Accessible’ data often isn’t
Question: What percentage of the clinical/genomic data that has been published is currently readily accessible for researchers to use?
Question: What percentage of published clinical/genomic data is currently available in a format that is easily downloaded in a way that facilitates new analysis?
Response options: more than 90%; between 50% and 90%; about 50%; between 50% and 10%; less than 10%
Collaborators
Biomedical research developed as a Cottage Industry
The need for multi-layer mega datasets and the vanishing ‘price’ for genes provide an incentive for a pre-competitive space for genomics
[Chart: Gene Licensing Deals ($US), 1980 to 2020; vertical axis from $100,000 to $1,000,000,000]
Incentives:
Researcher "turf" / lack of experience sharing
Business case for contributing and sharing resources and information is unclear to many, while business case for hoarding them is well articulated and obvious.
Buy-in from tool developers, data producers and data users
Politics: competitive funding versus communal goal
Sociology and policy. Getting people to share and building trust.
Willingness by the community to share data and key ancillary information (e.g. pathology/clinical data for profiled samples)
Changing culture of individual recognition, publication, rewards, incentives
This is a social (political) experiment/enterprise as much as a scientific challenge. How to motivate individuals who are not community-inclined might be key.
The theory is great, the practice needs commitment from a wide variety of players
IMHO, the central challenge will be community adoption.
We need a team that will take the time to make sure we create a set of tools that can interoperate, rather than a set of tools that perform discrete independent tasks.
Andrea Califano, Columbia U.
Eric Schadt, PacBio - UCSF
Atul Butte, Stanford Med
Trey Ideker, UCSD
Stephen Friend, Sage Bionetworks
The Federation Experiment
Sage Bionetworks
Focused on improving treatment of disease
Working through extensive partnerships to enable research and drug development
Cultural challenges may eclipse technical and operational hurdles
www.sagebase.org