network biology approach to complex diseases...an integrated genomic analysis identifies clinically...
TRANSCRIPT
![Page 1: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/1.jpg)
Teresa Przytycka
NIH / NLM / NCBI
Network Biology Approach to
Complex Diseases
LECTURE 4.
Disease Heterogeneity
![Page 2: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/2.jpg)
Challenges in Modeling of disease
heterogeneity
• Capturing important similarities without loosing important
differences
• Examples – classification based on on treatment response helps
optimizing treatment options (applicable also in the absence of
understanding of molecular mechanism )
• Understanding molecular underpinnings of the differences
• Subtype specific drug design
• Need to zoom on putative causes of differences/similarities
2
![Page 3: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/3.jpg)
Modeling Disease Heterogeneity
3
• Supervised classification discrete labels are provided by assigning a
phenotype (e.g. metastatic/non-metastatic) and gene expression or other
molecular measurements are taken as classifying features
• Machine learning approaches such as random forest, SVM, etc. can be
applied and will not be discussed
• Network based classification (Chuang MSB 2007 and other)
• Non supervised classification
• clustering using particular feature (e.g. gene expression)
• Integrative/multi-feature clustering (example iCluster Shen et. al. PloS
One 2012)
• Network based mixture models
• Cho and Przytycka RECOMB 2012 / NAR 2013
![Page 4: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/4.jpg)
4
Revisiting Chuang et al. Network based classification
of breast cancer metastasis
• For each gene compute activity
score :
– Normalize gene expression
– Compute activity scores a kj by
averaging over genes in the
subnetwork; discretize this value
• Score candidate subnetwork Mk
using mutual information between
value of a and phenptypeS(Mk ) = S x value of a S phen yp(x,y)log[ p(x,y)/p(x)p(y)]
• Search for most discriminative
subnetworks (greedy search)
Hunag et al. MSB 2007
![Page 5: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/5.jpg)
Comments on extensions/ modifications
5
• For separating into two subclasses any method
that identifies dys-regulated pathways can be
used using by taking of the subtypes in place of
“normal”:
• jActive, DEGAS, module cover (already
discussed)• Chowdhury S.A., Koyutürk M. Identification of
coordinately dysregulated subnetworks in complex
phenotypes. Pacific Symposium on Biocomputing
2010:133-144.
• Finding discriminative subnetworks optimally Dao et al,
Bioinformatics (ISMB 2011)
Use color coding paradigm to find optimal subnetwokrs
efficiently
![Page 6: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/6.jpg)
Non-Supervised classification
6
• hierarchical clustering
• positive matrix factorization
• other clustering techniques
• Integrative Clustering (iCluster)Integrative Subtype Discovery in Glioblastoma Using iCluster
Shen et.at. PloSONE 2012
![Page 7: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/7.jpg)
Module Cover
TCGA Ovarian
Cancer
![Page 8: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/8.jpg)
Example – expression based
clustering
8
An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by
abnormalities in PDGFRA, IDH1, EGFR and NF1
Roel G.W. Verhaak, et al. Cancer Cell 2011
![Page 9: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/9.jpg)
Step 2: Identify important signatures
9
![Page 10: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/10.jpg)
Classification into subtypes is a reasonable but not
perfect approach for several reasons:
• Expectation of clearly defined subgroups might not be
realistic
• Difficulty in capturing underlying genotype-phenotype
relation
![Page 11: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/11.jpg)
Key features of our approach
1. Our model is a meta-model that summarizes the
results of 1,000 different models.
2. In each model we assume
1. k subtypes
2. each patient is a mixture of these subtypes
3. each subtype is defined by distribution of features
4. patients with similar phenotypes have similar
explanatory features
Generative Topic Model
![Page 12: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/12.jpg)
Phenotypic features: Explanatory features
Nodes – patients
Edges – phenotypic similarities
neighbors in patient network should have similar
explanatory features
– mutations, CNV, micro RNA level;
– Epigenetic factors,
– Sex, environment ….
Phenotypic and explanatory features
Survival time
Response to drugs,…..
Gene expression profile
Patient graph
Key idea
Topic model: Chang J, Blei DM: Hierarchical Relational Models for Document
Networks. Ann Appl Stat 2010, 4(1):124-150.
![Page 13: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/13.jpg)
Case study of GBM
(Glioblastoma Multiforme)
13
patient network for GMB
Mesenchymal
Classical
Proneural
Neural
IDH1 mutation
PDGFRA ampl.
NF1 mutation
Varhaak et al.
Classification
![Page 14: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/14.jpg)
Key features of our approach
1. Our model is a meta-model that summarizes the
results of 1,000 different models.
2. In each model we assume
1. k subtypes and
2. each patient is a mixture of subtypes
3. Each subtype is defined by distribution of features
4. Patients with similar phenotypes have similar
explanatory features
Generative Topic Model
![Page 15: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/15.jpg)
Step 2. 1 Assuming k subtypes, generate feature
distribution for k subtypes
15
![Page 16: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/16.jpg)
• All features discretized
• one random variable per each gene and per
each type of a genetic variation observed in this
gene (amplification and deletion having two
different variables, all mutations treated with one
variable).
• For microRNA under expression as two different
types of alterations where variable indicates if
the expression is more than 1 or 2 standard
deviations from the mean microRNA expression
• ith aberration in pth patient corresponds to a
discrete random variable gp,i
16
![Page 17: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/17.jpg)
Step 2. 1 Assuming k subtypes, generate feature
distribution for k subtypes
17Each disease subtype β k is defined as a distribution over the genomic
aberrations.
![Page 18: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/18.jpg)
18
Features:EGFR_A
NF1_M
CDKN2B_D
.
.
;
Step 2.2 .Based on patient’s features represent each
patient as mixture of the subtypes
First, for each patient p, draw subtype proportions θ p from the K-dimensional Dirichlet
distribution
![Page 19: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/19.jpg)
Step 2.3 Generate edges based on similarity of subtype
mixtures
Optimize parameters to maximize likelihood of
the patient -patient network
Chang J, Blei DM: Hierarchical Relational Models for Document Networks. Ann Appl
Stat 2010, 4(1):124-150.
Patient network is described by P2 binary random variables lp,p’ where lp,p’ is
set to 1 if there is a link between patients p and p’.
![Page 20: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/20.jpg)
20
The observed patient network is assumed to be
generated by the following hierarchical sampling
process.
• First, for each patient p, draw subtype proportions θ p
from the K-dimensional Dirichlet distribution.
• For each genomic factor gp,i, draw the latent subtype
assignment zp,i from the multinomial distribution
defined by θ p and randomly choose a genomic factor
from the corresponding multinomial distribution.
• for each pair of patient (p, p’) draw the binary link
variable lp,p’ from the distribution defined by the link
probability function ψ. This function is dependent on
the inner product of two vectors of subtype
assignments zp and zp’ that generated their genomic
aberrations.
![Page 21: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/21.jpg)
21
Copy number variationMutation
miRNA expression changes
Correlated expression
Link probability functions
EGFR_A
CDKN2A_D
NF1_M
miR-195_H
PDGFRA_A
Subtype I
EGFR_A0.45
NF1_M0.37
PTEN_M0.21
TP53_M0.11
…
Subtype II
PDGFRA_A0.51
IDH1_M0.29
TP53_M0.17
miR-9_H0.11
…
Subtype III
miR218_H0.35
CDK2_D0.22
SHC1_M0.14
…
Subtype IV
EGFR_A0.47
CDKN2B_D0.36
EGFR_M0.19
miR195_H0.05
…
ABC
D
EGFR_M
miR‐9_HEGFR_M
![Page 22: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/22.jpg)
22
Visualization
of a sample
individual
model
![Page 23: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/23.jpg)
Summarizing the results of 1,000
models with respect to three
aspects:
• Relation between patients
• Relation between features
• Relation between features and patients
23
![Page 24: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/24.jpg)
Patient-patient relationship
Observation: No separate Neural group
(setting larger k did not change it)
![Page 25: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/25.jpg)
25
Feature-feature and
patient-feature relationships
![Page 26: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/26.jpg)
Selected cancer related features
Probabilistic subtype assignment
![Page 27: Network Biology Approach to Complex Diseases...An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,](https://reader034.vdocument.in/reader034/viewer/2022042221/5ec78de30c642f4c5c32021e/html5/thumbnails/27.jpg)
Challenges in Modeling of disease
heterogeneity • Subtyping methods allow for capturing important
similarities without loosing important differences
• Mixture models – capturing overlapping subtypes
• A preferred approach should link causes to effects
i.e. capture genotype-phenotype relation - topic
model
27