Preparing For Genomic Medicine
September 2016: CXO Forum
Chris Dwan
Director, IT Architecture and Strategy
@fdmts
Conclusions
• We are still in the early days of genomic medicine.
• Organizations that are effective at collaboration and integrative data analysis will lead the next decade of biomedical delivery*
• Privacy, security, and ensuring appropriate access to data will be necessary challenges
• Technology and technologists will be a key differentiator
* Increasingly, you must be competent at these things to even participate.
Coming soon, to a patient near you
This will be the new normal and public expectation within 5 years
The Broad Institute
• Non-profit biomedical research institute founded in 2004
• Fifty core faculty members, from MIT and Harvard, plus hundreds of associate members.
• ~1000 directly affiliated personnel
• ~2,400+ associated researchers
Programs and Initiatives: focused on specific disease or biology areas
• Cancer
• Genome Biology
• Cell Circuits
• Psychiatric Disease
• Metabolism
• Medical and Population Genetics
• Infectious Disease
• Epigenomics
Platforms: focused technological innovation and application
• Genomics
• Data Sciences
• Therapeutics
• Imaging
• Metabolite Profiling
• Proteomics
• Genetic Perturbation
“This generation has a historic opportunity and responsibility to transform medicine by using systematic approaches in the biological sciences to dramatically accelerate the understanding and cure of disease”
Genomic Data Production @ Broad
• ~140 Whole Genome Sequences / day
• ~1PB data / month raw (> 40PB total holdings)
• ~15k cores (hybrid cloud) dedicated to primary analysis
• 100Gb/sec link to Internet2
Genomic Data Production in Context
We have learned a vast amount in the last decade
The question is no longer “can we do this?” but “what shall we do?”
People @ Broad
The future is already here – it’s just not very well distributed
William Gibson
The right side of history
• Applications are containerized (Docker)
• Data is accessed RESTfully (S3 standard)
• Identity management is federated (OAuth2)
• Analytics are ubiquitous (HDFS / Spark)
• Public clouds (AWS, GCS, Azure) provide flexible commodity infrastructure and surge capacity
• Data flow operations adopt serverless architectures (Lambda)
• Technologists are embedded in project teams (DevOps)
This is a multi-year journey. Start today.
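As one illustration of the "data is accessed RESTfully" point above: a partial read from an S3-style object store is an ordinary HTTP GET with a Range header. The sketch below only builds such a request and does not send it. The bucket and object names are hypothetical, the URL form assumes a publicly readable object on AWS S3, and real use would normally add request signing or go through a client library.

```python
from urllib.parse import quote
from urllib.request import Request


def ranged_get(bucket: str, key: str, first_byte: int, last_byte: int) -> Request:
    """Build (but do not send) an HTTP GET for an inclusive byte range of an object."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("byte range must satisfy 0 <= first_byte <= last_byte")
    # Virtual-hosted-style URL; quote() escapes spaces but leaves "/" in the key.
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
    # The HTTP Range header is inclusive on both ends.
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return Request(url, headers=headers, method="GET")


# Hypothetical bucket and object names, for illustration only.
req = ranged_get("example-genomes", "cohort-a/sample 01.cram", 0, 65535)
print(req.full_url)
print(req.get_header("Range"))
```

Ranged reads like this are what let an analysis job pull one region of a large genomic file rather than the whole object.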
Transition to Public Clouds
Genomes on the Cloud (April 2016)
Testing the genome analysis pipeline
“Go-live”
3rd Party Companies Fill Cloud Feature Gaps
CloudHealth dashboard atop the billing API
Storage $$
Network $$
Governance remains critical
$$ !!
The new normal
You move towards and become like that which you think about.
The Big Data Healthcare Feeding Frenzy
• “If we sequence X new patients with condition Y every year, the sequencing data alone will take up ALL THE EXABYTES”*
• The data storage and analysis needs of precision / personalized / genomic medicine are not unreasonable by comparison with major, data-driven industries.
• We can compensate by being thoughtful about what data we store, how we store it, and how we share it.
* If you multiply a number by a sufficiently large number the product is a large number.
We’re starting to get a handle on the basics
• Reduced footprint for genomic data
– 30X WGS: 200GB ==> 40GB
• Increasingly standardized and well integrated variant calling and annotation tools
• Powerful public reference sets and tools
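To put the footprint numbers in proportion, here is a back-of-the-envelope sketch using only figures quoted in this talk: ~140 whole genomes per day, and 200GB versus 40GB per 30X genome. The 30-day month, decimal units, and the simplification that every genome is a 30X WGS of exactly the quoted size are assumptions made for illustration.

```python
# Decimal storage units (an assumption; the slides do not say decimal or binary).
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

GENOMES_PER_DAY = 140          # "~140 Whole Genome Sequences / day"
RAW_PER_GENOME = 200 * GB      # 30X WGS before footprint reduction
REDUCED_PER_GENOME = 40 * GB   # 30X WGS after footprint reduction


def monthly_bytes(per_genome: int, genomes_per_day: int = GENOMES_PER_DAY, days: int = 30) -> int:
    """Bytes produced in one (assumed 30-day) month at a steady daily rate."""
    return per_genome * genomes_per_day * days


raw_month = monthly_bytes(RAW_PER_GENOME)
reduced_month = monthly_bytes(REDUCED_PER_GENOME)
genomes_per_exabyte = EB // REDUCED_PER_GENOME

print(f"raw:     {raw_month / PB:.2f} PB / month")
print(f"reduced: {reduced_month / PB:.3f} PB / month")
print(f"genomes per exabyte at 40GB each: {genomes_per_exabyte:,}")
```

Under these assumptions the raw figure comes to 0.84 PB per month, in line with the "~1PB data / month raw" quoted earlier, and a single exabyte holds 25 million reduced genomes.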
… people who had nothing to do with the design and execution of the study …
... use another group’s data for their own ends …
… even use the data to try to disprove what the original investigators had posited…
… some researchers have characterized as “research parasites”
Fear, Uncertainty, and Doubt
The regulatory framework
• Under current law and practice, there is very limited organizational upside to sharing PHI and EMR.
• Data use policies: Financial risk
• Research participation: Risk to privacy
• De-identification (AKA data mutilation) is not a viable, long-term strategy in the age of analytics
• Also, the compliance process, even in lightweight versions, is killing our ability to innovate.
“To be without method is deplorable, but to depend entirely on method is worse.”
The Mustard Seed Garden Manual of Painting, 1679
Appropriate Usage: A new framework
Any person
Should have appropriate access to any and all data
Necessary to correctly answer appropriate questions
This looks almost nothing like our current regulatory framework.
What we need
• Incentive structures that reward making data accessible and useful
– All indicators except the benefit of the patient lead to suboptimal behavior
– This will require courage.
• National / global scale data repositories, standards, and toolkits
– Death to walled gardens, monolithic systems, and GUIs.
– Life to APIs built for a global community (cf. Amazon, 2002)
• Open, fearless conversation about data protection vs. appropriate use
– Genomic data is inherently personally identifiable and should be treated as such
– “Appropriate usage” goes well beyond legal conformity
Standards are needed for genomic data
“The mission of the Global Alliance for Genomics and Health is to accelerate progress in human health by helping to establish a common framework of harmonized approaches to enable effective and responsible sharing of genomic and clinical data, and by catalyzing data sharing projects that drive and demonstrate the value of data sharing.”
Regulatory Issues
Ethical Issues
Technical Issues
This stuff is important
We have an opportunity to change lives and health outcomes, and to realize the gains of genomic medicine, not in an indefinite future, but this year.
We also have an opportunity to waste vast amounts of money (very rapidly) and still not really help anybody.
I would like to work together with you to build a better future.
The right side of history
• Applications are containerized (Docker)
• Data is accessed RESTfully (S3 standard)
• Identity management is federated (OAuth2)
• Analytics are ubiquitous (HDFS / Spark)
• Public clouds (AWS, GCS, Azure) provide flexible commodity infrastructure and surge capacity
• Data flows and transforms adopt serverless architectures (Lambda)
• Technologists are embedded in project teams (DevOps)
This is a multi-year journey. Start today.
Thank You