Mechanistic Models of Cancer Progression in the Space of Pathways
Elena Edelman
[email protected]
Computational Biology and Bioinformatics Program
Institute of Genome Policy and Science
Duke University
Mechanistic Models of Cancer Progression, Elena Edelman presenting
Outline
I. Biological Background
– Problems with single gene analysis
– Advantages of pathway analysis
II. Gene Sets
– How they are derived
– Importance of understanding context
III. Modeling Cancer Progression
– Overview of multitask model
– Prostate cancer example
– Melanoma example
Disadvantages of single gene based methods
• Hundreds of differentially expressed genes
• Subtle signals
• Lack of consensus
Solutions
• Hundreds of differentially expressed genes – group together in a small number of pathways
• Subtle signals – brought to attention when seen as a group
• Lack of consensus – consensus in processes/pathways, not single genes
Disadvantage of single gene methods
13,023 genes
↓
1,149 mutated genes
↓
189 candidate cancer genes
↓
Each sample of a given tumor type had no more than six mutated CAN genes in common
Sjoblom 2006
Importance of pathway analysis
• Deregulation of specific processes is necessary for tumor formation. Each process has many potential member genes.
• Alterations in any of a number of different genes can therefore produce the same phenotypic result.
Rb pathway
• Several cancer genes control transitions from resting state (G0 or G1) to replicating phase (S) of cell cycle.
• Diverse protein products:
– cdk4 (kinase), oncogene
– cyclin D1 (activates cdk4), oncogene
– Rb (transcription factor), TSG
– p16 (inhibits cdk4), TSG
P53 TSG
• P53 is a transcription factor that inhibits cell growth and stimulates cell death
• Point mutation inactivates its capacity to bind specifically to its recognition sequence.
• Other ways to achieve the same effect:
– Amplification of MDM2
– Infection with DNA tumor viruses whose products bind to p53 and functionally inactivate it
Pathway Analysis
• Identify gene sets whose expression patterns characterize specific genetic or molecular perturbations.
• Early pathway analysis: Apply methods such as t-tests to determine which genes are differentially expressed between two classes. Use a database such as Gene Ontology to relate the individual genes in terms of general cellular function.
Pathway Analysis
• Next step in pathway analysis: Gene Set Enrichment Analysis (GSEA) and Analysis of Sample Set Enrichment Score (ASSESS)
– Start with biological information: gene sets
– Score enrichment of gene sets in an expression profile with samples from two classes
– GSEA outputs an enrichment score for each gene set in each phenotype
– ASSESS outputs an enrichment score for each gene set in each individual
Enrichment Analysis
[Figure: schematic with the labels Ranked Gene List, Gene Sets (G1, G2, G3), and Phenotype classes. Three panels plot the running enrichment score (RES, axis from -0.2 to 0.8) against the gene list index (0 to 20,000): gene set 1, xinact.u133a.grp (ES = 0.63, NES = 5.1); gene set 16, chr11p13 (ES = 0.209, NES = 0.946); gene set 171, chr3q21 (ES = 0.0838, NES = 0.705).]
Given a ranked gene list and a gene set of interest, find genes in the set that are “enriched” at the top or bottom of the list.
How could we conclude that G1 is enriched but G2 and G3 are not?
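One way to make the question above concrete is a running enrichment score: walk down the ranked list, step up at every gene that belongs to the set, step down otherwise, and report the largest deviation from zero. The sketch below is a minimal, unweighted Kolmogorov-Smirnov style version written for illustration; the function name and the toy gene names are assumptions, and it omits the weighting and permutation-based normalization (NES, p-values) that the GSEA software applies.

```python
def running_enrichment_score(ranked_genes, gene_set):
    """Return (ES, running scores) for an unweighted KS-style walk.

    ranked_genes: list of gene names, most to least associated with a class.
    gene_set: iterable of gene names forming the set of interest.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, score = [], 0.0
    for gene in ranked_genes:
        # Step up on a set member, down on a non-member; the walk ends at 0.
        score += 1.0 / n_hit if gene in members else -1.0 / n_miss
        running.append(score)

    # ES is the signed value with the largest absolute deviation from zero.
    return max(running, key=abs), running


# Toy ranked list (hypothetical gene names).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
es_top, _ = running_enrichment_score(ranked, {"g1", "g2", "g3"})
es_bottom, _ = running_enrichment_score(ranked, {"g6", "g7", "g8"})
es_spread, _ = running_enrichment_score(ranked, {"g2", "g5", "g7"})
```

A set concentrated at the top of the list gives a large positive score, one concentrated at the bottom gives a large negative score, and a set scattered through the list stays near zero, which is the pattern that separates G1 from G2 and G3 in the figure.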
Outline
I. Biological Background
– Problems with single gene analysis
– Advantages of pathway analysis
II. Gene Sets
– How they are derived
– Importance of understanding context
III. Modeling Cancer Progression
– Overview of multitask model
– Prostate cancer example
– Melanoma example
Gene Sets
• Defined functionally or structurally
• Defined by experimental methods or through literature.
– Experimental: knockouts, infections
– Literature: biochemical experiments, reported in databases such as BioCarta and GenMAPP
GSEA of male vs. female in lymphoblastoid cells
Gene Set                                       Source                         ES      NES     NOM p-val  FDR q-val
Enriched in Males
s1:chrY                                        Genome                         0.778   2.465   < 0.001    < 0.001
s1:chrYp11                                     Genome                         0.759   2.181   < 0.001    < 0.001
s1:chrYq11                                     Genome                         0.886   2.175   < 0.001    < 0.001
s1:Testis expressed genes                      Experimental GNF               0.656   2.018   < 0.001    0.009
Enriched in Females
s2:Genes that escape X inactivation            Disteche et al, Willard et al  -0.800  -2.295  < 0.001    < 0.001
s2:Female reproductive tissue expressed genes  Experimental GNF               -0.485  -1.892  0.013      0.045
ASSESS of male vs. female in lymphoblastoid cells
[Figure: ASSESS enrichment scores, with axes labeled Samples and Gene Sets.]
Gene Set Accuracy
• Analyses will depend on the accuracy of the gene sets. We ask:
– What is the accuracy of gene sets annotated according to known perturbations?
– How do gene sets defined by experimental studies compare with those defined by expert knowledge?
Hypoxia Gene Set
• Hypoxia: The cellular response to low oxygen conditions. Includes new blood vessel formation
• Seven hypoxia gene sets describing the cellular response to hypoxia
Gene Set Source
Hypoxia Down Manalo et al
Hypoxia Up Manalo et al
Hypoxia Fibro Up Kim et al
Hypoxia Reg Up Leonard et al
Hypoxia Review Harris
VEGF Pathway BioCarta
HIF Pathway BioCarta
Hypoxia gene set accuracy
• Expression data set with 6 hypoxic and 6 normal cells (Mense 2006)
• GSEA applied with database of 508 gene sets.
Rank  Gene Set          NES    P-val
Enriched in Hypoxic Cells
3     Hypoxia Up        -1.96  0.008
4     Hypoxia Review    -1.95  0
6     Hypoxia Fibro Up  -1.84  0.004
9     Hypoxia Reg Up    -1.73  0.02
10    HIF Pathway       -1.73  0.02
53    VEGF Pathway      -1.39  0.055
Enriched in Normal Cells
17    Hypoxia Down      1.48   0.167
RAS
• 3 Ras gene sets: K-Ras, H-Ras, and the Ras pathway from BioCarta.
• K-RAS and H-RAS are experimentally defined and context specific.
• BioCarta's Ras gene set is the most general, consisting of genes thought to interact biochemically with RAS and proteins associated with RAS.
RAS gene set accuracy
• Gene expression profile of 31 cells with tumors caused by K-RAS mutation and 19 normal cells.
• H-RAS does not capture K-RAS specificity.
• BioCarta's RAS gene set is appropriate to use regardless of the specific RAS mutation.
Gene Set           NES    P-val
Enriched in Tumor
RAS Up BioCarta    1.51   0
SRC Down           1.41   0.09
MYC Up             1.25   0.15
SRC Up             1.25   0.15
HRAS Up            1.12   0.26
E2F3 Up            1.12   0.25
BCAT Up            0.81   0.74
Enriched in Normal
RAS Down BioCarta  -1.51  0.12
E2F3 Down          -1.29  0.10
HRAS Down          -1.18  0.19
BCAT Down          -1.14  0.29
MYC Down           -0.99  0.55
RAS gene set accuracy
• Gene expression profile of 45 adenocarcinomas and 48 squamous lung cancer samples.
• Data set indirectly involves RAS perturbations.
• Enrichment scores from ASSESS were used to predict phenotype. Class prediction accuracy for the three sets:
– 69.9% for the H-RAS pathway gene set
– 75.3% for the K-RAS pathway gene set
– 79.6% for the BioCarta RAS pathway gene set
Outline
I. Biological Background
– Problems with single gene analysis
– Advantages of pathway analysis
II. Gene Sets
– How they are derived
– Importance of understanding context
III. Modeling Cancer Progression
– Overview of multitask model
– Prostate cancer example
– Melanoma example
Dynamics of Cancer Progression
• Long lists of genes implicated in various stages of cancer exist for many different cancer types. We want to learn how these genes interact through signaling pathways and functional relationships.
• The next step is a mechanistic understanding of cancer progression at the pathway level.
• There are only a few types of cancer for which we know which pathways acquire the mutations that initiate tumorigenesis.
– Eye: RB1
• Are other types of cancer initiated by the alteration of one pathway or of several?
• The alteration of one gene hardly ever suffices to give rise to full-blown cancer.
– Oncogenes, tumor suppressor genes (TSGs), and stability genes drive tumor progression.
– Mammalian cells have multiple safeguards. Several genes must be defective for invasive cancer to develop.
Objectives
• Identify pathways most relevant throughout progression and pathways most relevant to individual transitions.
• Build pathway networks: Estimate the interdependence of pathways relevant to each step of tumor progression.
• Refine relevant pathways and infer a gene network for those relevant gene sets.
Hierarchical Modeling
• Tumor progression
– FIXED EFFECTS: Stage in cancer progression. Individuals will show similar pathway deregulation as cancer progresses, depending on whether they have benign, primary, or metastatic lesions.
– RANDOM EFFECTS: Within a stage, individuals will differ based on how they specifically developed the disease.
Regularized Multitask Learning (RML)
• Current analyses of genomic data evaluate each stage in progression independently, missing relationships between the data.
• Integration of the data over all stages will provide a more complete picture of the processes underlying tumorigenesis.
• RML learns a problem together with other related problems at the same time. Learning the problems in parallel can help each problem be better learned by using a shared representation.
• Problems: Which pathways are relevant to transition 1? Transition 2? Which pathways are relevant throughout progression?
Stratifying Data
• States: normal (n), early (e), metastatic (m).
• Data: Gene expression for g genes in s samples. Stratify data into T datasets, one for each step in progression.
T = 2: the samples (n, e, m) are split into D1 = {n, e} and D2 = {e, m}.
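The stratification step can be sketched in a few lines. The example below is an illustrative assumption about the data layout (a list of sample id and stage pairs, with hypothetical sample ids); it builds one data set per transition and attaches the -1 / +1 labels that the model slides use for the less serious and more serious state.

```python
def stratify_by_transition(samples, stages=("n", "e", "m")):
    """Split staged samples into T = len(stages) - 1 two-class data sets.

    samples: list of (sample_id, stage) pairs.
    Returns a list of tasks; each task is a list of (sample_id, label) with
    label -1 for the less serious stage and +1 for the more serious stage.
    """
    tasks = []
    for less, more in zip(stages, stages[1:]):
        task = [
            (sample_id, -1 if stage == less else 1)
            for sample_id, stage in samples
            if stage in (less, more)
        ]
        tasks.append(task)
    return tasks


# Hypothetical sample ids, one stage per sample.
samples = [("s1", "n"), ("s2", "e"), ("s3", "m"), ("s4", "e")]
d1, d2 = stratify_by_transition(samples)
```

Note that the intermediate (early) samples appear in both data sets, labeled +1 in D1 and -1 in D2, which is what ties the two tasks together.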
Modeling tumor progression
• Model Summary: Find relevant pathways in the overall progression
{n→e→m}
And the relevant pathways at different stages
{n→e} and {e→m}
The task t corresponds to progression from less serious to more serious states
t=1: {n→e}, t=2: {e→m}
Transformation
• Transformation: Gene expression data is transformed using ASSESS
D: genes × samples → S: gene sets × samples
D1, D2 → S1, S2
[Figure: the expression matrices D1 (n, e) and D2 (e, m), each with genes 1 to 20,000 as rows, are mapped to matrices S1 and S2 with gene sets as rows.]
Multitask SVM
• Support vector machines (SVMs): a regularization method
– Input regression data
– Estimate a regression function f, a summary statistic of Y|X
• Multitask SVM: builds classification models jointly over all data sets, Y|S1, S2
– Provides a baseline model for gene sets relevant to predicting phenotype in both data sets, Y|S1, S2
– Provides gene sets relevant to only one data set, Y|S1 and Y|S2
– These regressions provide data-set-dependent corrections to the baseline model
The Model
• Input: x = S1, S2
• Class labels: y = {-1, 1}, where -1 = less serious and 1 = more serious.
• Build two regression models, $f_{t_1}(x)$ and $f_{t_2}(x)$, for the transition 1 data and the transition 2 data:
$f_{t_1}(x) = b(x) + r_{t_1}(x)$
$f_{t_2}(x) = b(x) + r_{t_2}(x)$
– $b(x)$ = baseline term over all tasks and $r_t(x)$ = task-specific corrections
• Discriminant functions:
$f_{t_1}(x) = w_0 \cdot x + v_{t_1} \cdot x + b$
$f_{t_2}(x) = w_0 \cdot x + v_{t_2} \cdot x + b$
– $w_0$ is a vector of baseline weights for the gene sets
– $v_{t_1}$ is the vector of correction terms for transition 1
– $v_{t_2}$ is the vector of correction terms for transition 2
– $b$ is a scalar offset
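As a small illustration of the discriminant functions, the sketch below evaluates a task's function as the baseline weights plus that task's corrections, applied to one sample's vector of gene set enrichment scores. The weights and scores are made-up numbers, and the function names are assumptions rather than part of any published implementation.

```python
def discriminant(x, w0, v_t, b):
    """f_t(x) = w0 . x + v_t . x + b for one sample x (gene set scores)."""
    return sum((w + v) * xi for w, v, xi in zip(w0, v_t, x)) + b


def predict(x, w0, v_t, b):
    """Return +1 (more serious) or -1 (less serious) from the sign of f_t(x)."""
    return 1 if discriminant(x, w0, v_t, b) >= 0 else -1


# Two gene sets, made-up weights.
w0 = [1.0, -0.5]     # baseline weights shared by both transitions
v_t1 = [0.0, 1.0]    # corrections for transition 1
v_t2 = [-2.0, 0.0]   # corrections for transition 2
b = 0.1              # scalar offset
x = [0.4, 0.2]       # enrichment scores of one sample

f1 = discriminant(x, w0, v_t1, b)
f2 = discriminant(x, w0, v_t2, b)
```

The same sample can fall on different sides of the two decision boundaries because the correction vectors shift the shared baseline separately for each transition.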
The Model
• Parameters are estimated by solving a minimization problem:
[Equation: regularized objective over the model parameters; not preserved in the transcript.]
where $V(f(x_{it}), y_{it})$ is a loss function. If the tasks are thought to be highly related, set the ratio $\lambda_2/\lambda_1$ to be large.
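The objective itself did not survive in the transcript, so the sketch below assumes one form that is consistent with the surrounding definitions: total loss over all tasks and samples, plus λ1 times the squared norm of the baseline weights, plus λ2 times the summed squared norms of the task corrections. Under that assumption a large λ2/λ1 shrinks the corrections toward zero and forces the tasks to share the baseline, which matches the remark about highly related tasks. The hinge loss is also an assumption (it is the usual SVM choice), and the exact scaling of the terms in the original may differ.

```python
def hinge(margin):
    """Standard SVM hinge loss on the margin y * f(x)."""
    return max(0.0, 1.0 - margin)


def multitask_objective(tasks, w0, v, b, lam1, lam2):
    """Assumed objective: sum of losses + lam1*||w0||^2 + lam2*sum_t ||v_t||^2.

    tasks: list over transitions; each entry is a list of (x, y) pairs,
           x a list of gene set scores and y in {-1, +1}.
    v:     list of correction vectors, one per transition.
    """
    loss = 0.0
    for task, v_t in zip(tasks, v):
        for x, y in task:
            f = sum((w + c) * xi for w, c, xi in zip(w0, v_t, x)) + b
            loss += hinge(y * f)
    penalty = lam1 * sum(w * w for w in w0)
    penalty += lam2 * sum(c * c for v_t in v for c in v_t)
    return loss + penalty


# Made-up data: two gene sets, two transitions.
tasks = [
    [([1.0, 0.0], 1), ([0.0, 1.0], -1)],
    [([1.0, 1.0], 1)],
]
corrections = [[0.0, 0.0], [0.5, 0.5]]
value_a = multitask_objective(tasks, [1.0, -1.0], corrections, 0.0, lam1=1.0, lam2=4.0)
value_b = multitask_objective(tasks, [0.5, -0.5], corrections, 0.0, lam1=1.0, lam2=4.0)
```

Evaluating the objective for candidate parameters like this is enough to compare settings by hand; an actual fit would minimize it with a quadratic programming or gradient-based solver.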
Model Interpretation
• Interpretation:
– $w_{j0}$: weight of the j-th gene set in the baseline model. Gene sets for which $|w_{j0}|$ is largest are relevant in {n→e→m}.
– $v_{jt}$: weight of the j-th gene set in state progression t. Gene sets for which $|v_{j1}|$ is large are relevant in {n→e}, and gene sets for which $|v_{j2}|$ is large are relevant in {e→m}.
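Reading off the relevant gene sets amounts to sorting by absolute weight. The sketch below shows this with made-up weights and placeholder gene set names; the helper name is hypothetical.

```python
def top_gene_sets(names, weights, k=3):
    """Return the k gene set names with the largest absolute weight."""
    ranked = sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Placeholder gene set names and made-up weights.
names = ["set_A", "set_B", "set_C", "set_D"]
w0 = [0.10, -0.90, 0.40, 0.05]   # baseline: relevant throughout progression
v1 = [0.70, 0.00, -0.10, 0.20]   # corrections for the first transition
v2 = [0.00, 0.10, -0.80, 0.30]   # corrections for the second transition

overall = top_gene_sets(names, w0, k=2)
early = top_gene_sets(names, v1, k=2)
late = top_gene_sets(names, v2, k=2)
```

The sign of each weight is kept out of the ranking but still matters for interpretation: with labels of +1 for the more serious state, a positive weight points toward higher enrichment in the more serious state.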
Prostate Cancer
• Gene expression profile of 22 benign epithelium samples (b), 32 primary prostate cancer samples (p), and 17 metastatic prostate cancer samples (m). Tomlins, 2007
• Progression {b→p→m}
[Figure: panels labeled w0, v1, and v2.]
Results
• Categorized results by the “Hallmarks of Cancer” (Hanahan and Weinberg, 2000):
– Self-sufficiency in growth signals
– Insensitivity to anti-growth signals
– Evasion of apoptosis
– Limitless replicative potential
– Angiogenesis
– Invasion and metastasis
Results
• Self-sufficiency in growth signals
– Cell cycle gene sets
– ErbB4, EGF, Sprouty, ERK
Results
• Evidence for insensitivity to anti-growth signals:
– PTEN down-regulation
– PTDINS up-regulation
• Evasion of apoptosis:
– IGF1R up-regulation
– ROS down-regulation
• Energy production:
– Glycolysis gene set up-regulation
– ATP synthesis gene set up-regulation
– Oxidative phosphorylation up-regulation
Novel Findings
• Took previous analysis a step further by identifying the specific pathways implicated in tumorigenesis.
– Previous work identified single genes that were relevant in progression and grouped them together to form important concepts.
• Currently little is known about ErbB4 deregulation in prostate cancer (PCA).
– EGF receptors have been implicated in several tumor types: stomach, brain, breast.
– ErbB2/HER2 has been shown to be overexpressed in prostate cancer.
Tomlins 2007
Objective 2: Pathway dependency structure
• Infer a pathway interaction network for each stage of progression using learning gradients and inverse regression.
• Provide knowledge on how certain pathways relate, interact, and influence one another with respect to phenotype.
Objective 2
• Standard regression methods show which gene sets are correlated with class labels but do not provide information on the co-variation of gene sets correlated with class labels.
• Estimate the covariance of the inverse regression, C = cov(X|Y):
– Input: matrix of enrichment scores (X) and class labels (Y)
– Output: covariance matrix C = cov(X|Y)
• The i-th diagonal element measures the relevance of the i-th gene set with respect to a change in label.
• The ij-th off-diagonal element measures the dependence between gene sets i and j.
• Relationships will be visualized in graphical models.
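The slides do not spell out the estimator, so the sketch below takes one simple reading of "covariance of the inverse regression": the covariance of the class-conditional means of X, weighted by class frequency, as in sliced inverse regression. This is an assumption made for illustration and is not the gradient-learning procedure named in the talk. Under this reading the diagonal grows with how much a gene set's mean enrichment shifts between labels, and off-diagonal entries are large when two gene sets shift together.

```python
def inverse_regression_covariance(X, y):
    """Estimate C = Cov(E[X | Y]) from rows of X (samples) and labels y.

    X: list of samples, each a list of enrichment scores (one per gene set).
    y: list of class labels, same length as X.
    """
    n, p = len(X), len(X[0])
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        weight = len(rows) / n
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        d = [mean[j] - overall[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                C[i][j] += weight * d[i] * d[j]
    return C


def rank_by_dependence(C, target):
    """Order the other gene sets by |C[target][j]|, strongest first."""
    others = [j for j in range(len(C)) if j != target]
    return sorted(others, key=lambda j: abs(C[target][j]), reverse=True)


# Made-up scores for three gene sets in four samples.
X = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [2.0, 4.0, 1.0], [2.0, 4.0, 1.0]]
y = [-1, -1, 1, 1]
C = inverse_regression_covariance(X, y)
neighbours_of_first = rank_by_dependence(C, 0)
```

With only two classes this estimate has rank one, so it separates sets that change with the label from sets that do not but cannot resolve finer structure; ranking one gene set's row, as in `rank_by_dependence`, mirrors statements of the form "ranks k-th based on covariance with" another gene set.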
Objective 2
• Analysis can identify pathways that are closely associated throughout progression:
– IGF1R and ERK are linked through their association with RAS. ERK ranks 9th out of 522 gene sets based on covariance with the IGF1R pathway.
– PTDINS ranks 15th based on covariance with the PTEN gene set.
– IGF1R ranks 32nd based on covariance with PTDINS.
Objective 2
• A: Dependency structure of the 10 gene sets most relevant in the benign to prostate cancer transition
• B: Extended dependency structure
Objective 3: Refinement
• Gene sets available are not always in the right context for a specific data set.
• The refinement procedure adapts the gene set to the context of the data set. It shows which genes depend on each other and whether there is substructure in the gene set.
• Cluster the genes in the gene set based on their covariance, C = cov(X|Y):
– X = gene expression values of the genes in the gene set
– Y = class labels
• A gene network modeling the interdependence of the genes in the refined gene set is inferred.
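One simple way to picture the refinement step is to threshold the gene-by-gene covariance matrix into a network and read clusters off as connected components. The sketch below does exactly that; the thresholding rule, the function names, and the toy matrix are all assumptions for illustration, since the slides do not state which clustering procedure was used.

```python
def threshold_network(genes, C, threshold):
    """Return edges (gene_a, gene_b) for pairs with |C[i][j]| >= threshold."""
    return [
        (genes[i], genes[j])
        for i in range(len(genes))
        for j in range(i + 1, len(genes))
        if abs(C[i][j]) >= threshold
    ]


def connected_components(genes, edges):
    """Group genes into clusters linked by at least one chain of edges."""
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for g in genes:
        if g in seen:
            continue
        seen.add(g)
        stack, component = [g], []
        while stack:
            current = stack.pop()
            component.append(current)
            for nb in sorted(neighbours[current]):
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return components


# Hypothetical genes and a made-up covariance matrix.
genes = ["geneA", "geneB", "geneC", "geneD"]
C = [
    [1.00, 0.80, 0.10, 0.00],
    [0.80, 1.00, 0.05, 0.00],
    [0.10, 0.05, 1.00, 0.60],
    [0.00, 0.00, 0.60, 1.00],
]
edges = threshold_network(genes, C, threshold=0.5)
clusters = connected_components(genes, edges)
```

Raising the threshold removes edges and breaks the set into smaller pieces, so different cutoffs give sparser or denser views of the same gene set.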
Gene Set Refinement
• The genes of BioCarta's ERK pathway
• Refine the pathway to those genes most relevant for this data set.
• A and B differ in threshold values
Melanoma Progression
• Gene expression profile of 4 normal skin samples (n), 4 primary melanoma samples (p), and 4 metastatic melanoma samples (m). Smith, 2005.
• Progression {n→p→m}
[Figure: panels labeled w0, v1, and v2.]
Melanoma Results
• Self-sufficiency of growth
– AKT up-regulation throughout progression
– PTDINS up-regulation throughout progression
• Escape from apoptosis
– IGF1R up-regulation in the late transition
– p53 down-regulation throughout progression
• Limitless replicative potential
– HTERT up-regulation in the early transition
• Angiogenesis
– HIF up-regulation throughout progression
– Angiogenesis gene set up-regulation in the early transition
• Invasion and metastasis
– CDC42RAC up-regulation throughout progression
– MTA3 down-regulation in the early transition
Validation
• Gene expression profile of 9 benign nevus samples, 6 primary melanoma samples, and 19 metastatic melanoma samples (Haqq 2005)
• Both analyses found:
– p53 gene set down-regulation
– D4-GDI pathway over-expression
– HTERT gene set over-expression
– CDC42RAC pathway over-expression
[Figure: panels labeled w0, v1, and v2.]
Pathway Dependencies
• A: Dependency structure of top 10 gene sets most relevant in the normal skin to primary melanoma transition
• B: Extended dependency structure
Sterol Biosynthesis
• Sterol biosynthesis gene set is highly connected
• Tumor cells often have sterol synthesis deficiencies
• One component of the sterol biosynthesis pathway is the mevalonate pathway.
• Many tumor cells cannot synthesize mevalonate, so they obtain it from the host.
Pathway Dependencies
• Interdependence with the sterol biosynthesis gene set, out of 523 gene sets:
– Fatty acid synthesis ranks 14th
– Cyanoamino acid metabolism ranks 19th
– Gamma hexachlorocyclohexane ranks 3rd
• All are closely tied to the inability of a tumor to synthesize certain metabolites and its increasing need for these metabolites as it grows and develops.
Colon Cancer Example
• Multitask learning can be applied to data sets with more than 3 classes (2 tasks).
• Colon cancer gene expression profile: 32 normal, 32 adenoma, 35 stage 1 carcinoma, 82 stage 2 carcinoma, 70 stage 3 carcinoma, and 43 stage 4 carcinoma.
Vogelstein, 1990
Future
• Expand analyses to data sets with more than 3 classes
– Prostate cancer: benign, PIN, PCA low, PCA high, metastatic
– Colon cancer: normal, adenoma, carcinoma stages 1-4
• Gene set expansion
– After refining the gene sets, find genes outside of the set with strong dependencies on the core genes in the gene set
Acknowledgements
• Sayan Mukherjee
• Phillip Febbo
• Joe Nevins
• Ashley Chi
• Justin Guinney