systems genetics of cancer - big data and all that
Florian Markowetz
CRUK Cambridge Institute
www.markowetzlab.org
Systems genetics of cancer
Big data and all that
“With enough data and the ability to crunch it, virtually any challenge facing humanity today can be solved.”
Eric Schmidt et al, How Google Works, 2014
Prof Atul Butte (Genomic Medicine, Stanford) at TEDMED 2012
“Who needs the scientific method? Vast stores of available data (…) are simply waiting for the right questions.”
Chris Anderson, WIRED.com, 2008
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
‘Petabytes allow us to say: “Correlation is enough.” (…) We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.’
[The ENCODE] data enabled us to assign biochemical functions for 80% of the genome.
Function = showed up in the data
Graur et al, Genome Biol Evol (2013)
“This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved (…) is under 10%.”
Function = evolutionarily conserved
Always the same science.
Always the same questions.
Big Data is a technical challenge, not a conceptual one
Systems Genetics of Cancer
Genetic variation
• In people
• In tumours
• In clones
Phenotypic variation
• Tumour subtypes
• Aggressiveness
• Survival
Cancer genome
Evolution
Cancer tissue
Context
Cancer genome
Function
Ines, Wei, Edith, Geoff, Ke, Anne, Joe, Leon, Andy, Amanda
Intra-patient heterogeneity in HGSOC
Schwarz et al, PLoS Comp Bio 2014
Schwarz et al, PLoS Medicine, 2015
Mixture model
1. How many clones are there in the sample?
2. How are they related in a tree?
Data
Parameters
• Number of clones
• Size of clone
• Variability inside clone
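The first question, and the three parameters listed above, can be illustrated with a deliberately simplified sketch. This is not BitPhylogeny: it swaps in a plain one-dimensional Gaussian mixture fitted by EM, with the number of components chosen by BIC, and it ignores the tree entirely. The function names (`fit_mixture`, `choose_n_clones`), the synthetic `demo_data`, and the choice of BIC are all assumptions made for illustration. In this sketch the mixture weights play the role of clone sizes and the component variances the role of variability inside a clone.

```python
import math
import random


def fit_mixture(data, k, n_iter=200, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with k components by EM (illustrative only).

    Returns (weights, means, variances, log_likelihood). The log-likelihood
    is the one evaluated in the final E-step.
    """
    xs = sorted(data)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(var_all / k ** 2, var_floor)] * k
    weights = [1.0 / k] * k
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point,
        # computed in log space to avoid underflow.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [
                math.log(weights[j])
                - 0.5 * math.log(2 * math.pi * variances[j])
                - 0.5 * (x - means[j]) ** 2 / variances[j]
                for j in range(k)
            ]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update size (weight), location and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)
    return weights, means, variances, loglik


def choose_n_clones(data, max_k=4):
    """Pick the number of components with the lowest BIC (an assumed criterion)."""
    n = len(data)
    bics = {}
    for k in range(1, max_k + 1):
        loglik = fit_mixture(data, k)[3]
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bics[k] = -2.0 * loglik + n_params * math.log(n)
    best_k = min(bics, key=bics.get)
    return best_k, bics


# Synthetic data: three hypothetical clones with well-separated signal levels.
_rng = random.Random(0)
demo_data = [_rng.gauss(mu, 0.02) for mu in (0.1, 0.4, 0.8) for _ in range(60)]
```

Calling `choose_n_clones(demo_data)` returns the selected number of components together with the BIC for each candidate. A finite mixture like this needs the candidate range fixed in advance and says nothing about how the clones are related, which is the second question on the slide.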
Graphical Model behind BitPhylogeny
Phylogeny prior
Prior on local parameters
Likelihood
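Read together, the three components above suggest the usual Bayesian factorisation. As a sketch only, with symbols that are assumptions and do not appear on the slide ($X$ for the data, $T$ for the clone tree, $\theta$ for the local parameters of the clones, assumed here to be conditioned on the tree):

```latex
p(T, \theta \mid X) \;\propto\;
  \underbrace{p(X \mid \theta, T)}_{\text{likelihood}}
  \;\underbrace{p(\theta \mid T)}_{\text{prior on local parameters}}
  \;\underbrace{p(T)}_{\text{phylogeny prior}}
```

The exact form of each term in BitPhylogeny is not given on the slide; this only shows how the three named pieces would combine into a posterior over trees and parameters.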
The BitPhylogeny model
FUTURE
• ICGC pan-cancer analysis: 2500 genomes => 2500 trees
• Characterize the 2500 trees
• Correlate trees with clinical data
• Infer onco-genetic progression models across the 2500 trees
DNA RNA Protein ChIP
Van’t Veer et al (2002) http://ms.lbl.gov Ross-Innes et al (2012)
Tumors are complex tissues
FUTURE
• ICGC pan-cancer analysis: 2500 genomes
• Collect tissues for as many samples as possible.
• Correlate tissue architecture with clinical data
• Correlate tissue architecture with evolution.