IDENTIFYING CAUSAL GENES AND DYSREGULATED PATHWAYS IN COMPLEX DISEASES
Nov. 6th, 2010
YOO-AH KIM
NIH / NLM / NCBI
Complex Diseases
Associated with the effects of multiple genes, as opposed to single-gene diseases
The combination of genomic alterations may vary strongly among patients
These alterations often dysregulate the same components, leading to the same disease phenotype
Difficult to study and treat
Cancer, heart disease, diabetes, etc.
Copy Number Variations
Two copies of each gene are generally assumed to be present in a genome
Genomic regions may be deleted or duplicated causing CNV
Some CNVs are associated with susceptibility or resistance to diseases such as cancer
Copy Number Variations in 158 Glioblastoma patients
Identifying Genomic Causes in Complex Diseases
Identify genotypic causes in individual patients as well as dysregulated pathways
Systems biology approach
Genome-wide search
Graph-theoretic algorithms: circuit flow, set cover
158 Glioblastoma multiforme patients
Glioblastoma multiforme (GBM)
the most common and most aggressive type of primary brain tumor in humans
Expression as Quantitative Trait
Genotype: copy number variations
Phenotype: gene expression
eQTL (expression Quantitative Trait Loci) Analysis
While we assume that the genetic variation is the cause and the expression change is the effect, we do not know the molecular pathways behind the relationship
[Figure: putative target gene and putative causal gene/loci]
Method Outline
A. Target gene selection: gene expression
B. eQTL: find associations between expression and copy number
C. Circuit flow algorithm: molecular interactions, candidate causal genes
D. Causal gene selection: weighted multiset cover
[Figure: overview of steps A to D. Panels show a cases by target genes matrix (g1 to gm), a cases by tag loci matrix (s1 to sn), a circuit (+ -) between target gene gm and tag SNP sn through causal genes over TF-DNA, phosphorylation event, and protein-protein edges, and a cases by causal genes matrix]
Target Gene Selection
Select a representative set of disease genes
• Filter differentially expressed genes for each case
• Multi-set cover
[Figure: gene expression of Gene 1, Gene 2, Gene 3, ... in controls and disease cases]
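The per-case filtering step can be sketched as follows. This is only a minimal illustration: the talk does not say how differential expression is decided, so the z-score against controls, the cutoff of 2.0, and the function name `differentially_expressed` are all assumptions.

```python
from statistics import mean, stdev

def differentially_expressed(controls, cases, z_cutoff=2.0):
    """Return {gene: set of case indices} where the gene deviates from controls.

    controls, cases: dict gene -> list of expression values (one per sample).
    z_cutoff is an assumed threshold, not a value taken from the talk.
    """
    flagged = {}
    for gene, ctrl_values in controls.items():
        mu, sigma = mean(ctrl_values), stdev(ctrl_values)
        if sigma == 0:
            continue  # no spread in controls: z-score undefined, skip the gene
        hits = {i for i, x in enumerate(cases[gene])
                if abs(x - mu) / sigma > z_cutoff}
        if hits:
            flagged[gene] = hits
    return flagged

# Toy data (hypothetical): g1 is strongly shifted in case 1, g2 never is.
controls = {"g1": [1.0, 1.2, 0.8, 1.1], "g2": [5.0, 5.1, 4.9, 5.0]}
cases = {"g1": [1.0, 3.0, 0.9], "g2": [5.0, 5.05, 4.95]}
flagged = differentially_expressed(controls, cases)
```

The resulting gene-to-cases map is the kind of input a multi-set cover step would need in order to pick a representative gene set.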
eQTL
Associations between the expression of target genes and copy number variations of genomic loci
Linear regression for every pair of tag loci and target genes
[Figure: cases by target genes matrix and cases by tag loci matrix]
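A pairwise regression scan of this kind might look like the sketch below. It fits a simple least-squares line for each (tag locus, target gene) pair across cases. The R² cutoff and the names `fit_line` and `eqtl_associations` are assumptions for illustration; the talk does not state the significance criterion actually used.

```python
def fit_line(x, y):
    """Least-squares fit y ~ a + b*x; returns (slope b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # constant input: no association can be estimated
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def eqtl_associations(copy_number, expression, r2_cutoff=0.9):
    """Scan every (locus, gene) pair; keep pairs whose fit reaches r2_cutoff."""
    hits = []
    for locus, cn in copy_number.items():
        for gene, expr in expression.items():
            slope, r2 = fit_line(cn, expr)
            if r2 >= r2_cutoff:  # assumed criterion, see lead-in
                hits.append((locus, gene, slope, r2))
    return hits

# Toy data (hypothetical): one value per case.
copy_number = {"s1": [2, 3, 4, 2, 1], "s2": [2, 2, 1, 2, 3]}
expression = {"g1": [1.0, 2.1, 2.9, 1.1, 0.0], "g2": [3.0, 1.0, 2.0, 2.5, 1.5]}
hits = eqtl_associations(copy_number, expression)
```

In this toy input only the pair (s1, g1) passes the cutoff; a real analysis would normally use p-values with multiple-testing correction rather than a fixed R² threshold.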
Finding Candidate Causal Genes
[Figure: genotypic variations connected to target gene D through candidate genes C1 to C5 in the interaction network]
Interaction network:
• protein-protein interactions
• phosphorylation events
• transcription factor interactions
Current flow
The resistance of edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|) / 2, where D is the target gene
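The resistance rule can be written out directly. The sketch below computes the conductance (the reciprocal of resistance) of an edge as the average absolute Pearson correlation of its endpoints with the target gene D. The proportionality constant is assumed to be 1, and the names `pearson` and `edge_conductance` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def edge_conductance(expr, u, v, target):
    """Conductance of edge (u, v): mean absolute correlation with the target gene."""
    return (abs(pearson(expr[u], expr[target]))
            + abs(pearson(expr[v], expr[target]))) / 2

# Toy expression profiles (hypothetical), one value per case.
expr = {
    "D": [1.0, 2.0, 3.0, 4.0],
    "u": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with D
    "v": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with D
    "w": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with D
}
g_uv = edge_conductance(expr, "u", "v", "D")  # both endpoints track D
g_uw = edge_conductance(expr, "u", "w", "D")  # only one endpoint tracks D
resistance_uw = 1.0 / g_uw
```

Edges whose endpoints both follow the target gene's expression get low resistance, so current is steered through genes that behave like the target.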
Compute the amount of current entering each causal gene by solving a system of linear equations
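A minimal sketch of that linear system follows, using Kirchhoff's current law at every interior node. The boundary conditions are an assumption: the target gene is held at potential 1 and every candidate causal gene is grounded at 0. The helper names `solve_linear` and `current_into_sinks` and the toy network are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def current_into_sinks(edges, source, sinks):
    """edges: {(u, v): conductance}. Source held at potential 1, sinks at 0."""
    nodes = sorted({n for e in edges for n in e})
    fixed = {source: 1.0, **{s: 0.0 for s in sinks}}
    free = [n for n in nodes if n not in fixed]
    idx = {n: i for i, n in enumerate(free)}
    A = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    # Kirchhoff's law at each free node: sum_g * V_node - sum g * V_nbr = 0
    for (u, v), g in edges.items():
        for node, other in ((u, v), (v, u)):
            if node in idx:
                A[idx[node]][idx[node]] += g
                if other in idx:
                    A[idx[node]][idx[other]] -= g
                else:
                    b[idx[node]] += g * fixed[other]
    potential = dict(fixed)
    if free:
        potential.update(zip(free, solve_linear(A, b)))
    # Current entering a sink = sum over its edges of g * potential drop.
    inflow = {s: 0.0 for s in sinks}
    for (u, v), g in edges.items():
        for node, other in ((u, v), (v, u)):
            if node in inflow:
                inflow[node] += g * (potential[other] - potential[node])
    return potential, inflow

# Toy network (hypothetical): D - a - {C1, C2}
edges = {("D", "a"): 1.0, ("a", "C1"): 1.0, ("a", "C2"): 0.5}
potential, inflow = current_into_sinks(edges, "D", ["C1", "C2"])
```

In the toy network, C1 receives twice the current of C2 because its edge has twice the conductance; the two inflows add up to the current leaving D.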
Final Causal Gene Selection
A putative causal gene explains a disease case if
• its corresponding tag locus has a copy number alteration
• its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case
[Figure: cases by causal genes matrix]
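The two conditions above translate into a set intersection. The sketch below is one possible reading: it assumes a single differentially expressed target gene is enough to satisfy the second condition, which the slide does not specify. All names and the data layout are hypothetical.

```python
def explained_cases(causal_gene, tag_locus, targets, altered, diff_expr):
    """Cases that `causal_gene` explains.

    tag_locus: dict causal gene -> its tag locus
    targets:   dict causal gene -> target genes sending it significant current
    altered:   dict locus -> set of cases with a copy number alteration there
    diff_expr: dict target gene -> cases where it is differentially expressed
    """
    cases_with_alteration = altered.get(tag_locus[causal_gene], set())
    cases_with_target = set()
    for t in targets[causal_gene]:
        cases_with_target |= diff_expr.get(t, set())
    # Assumption: one differentially expressed target gene is sufficient.
    return cases_with_alteration & cases_with_target

# Toy input (hypothetical), cases identified by index.
tag_locus = {"C1": "s1"}
targets = {"C1": {"g1", "g2"}}
altered = {"s1": {0, 1, 3}}
diff_expr = {"g1": {1, 2}, "g2": {3}}
covered = explained_cases("C1", tag_locus, targets, altered, diff_expr)
```

Case 0 has the alteration but no affected target, and case 2 has an affected target but no alteration, so neither counts as explained.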
Find a smallest set of genes covering (almost) all cases at least k’ times: the minimum weighted multi-set cover problem
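Minimum weighted multi-set cover is usually approximated greedily. The sketch below uses the standard greedy heuristic, which may differ from the procedure used in the talk. The meaning of the weights (lower cost is preferred), the `max_uncovered` allowance for "almost all" cases, and the function name are assumptions.

```python
def greedy_multiset_cover(covers, weights, cases, k=2, max_uncovered=0):
    """Greedy approximation for weighted multi-set cover.

    covers:  dict gene -> set of cases it explains
    weights: dict gene -> positive cost (assumed: lower means preferred)
    Stops once at most `max_uncovered` cases are covered fewer than k times,
    or when no remaining gene adds coverage.
    """
    need = {c: k for c in cases}  # remaining coverage demand per case
    chosen, remaining = [], set(covers)

    def gain(gene):
        # Newly satisfied demand per unit of cost.
        return sum(1 for c in covers[gene] if need.get(c, 0) > 0) / weights[gene]

    while sum(1 for c in need if need[c] > 0) > max_uncovered:
        best = max(sorted(remaining), key=gain, default=None)
        if best is None or gain(best) == 0:
            break  # demand cannot be met with the remaining genes
        chosen.append(best)
        remaining.discard(best)
        for c in covers[best]:
            if need.get(c, 0) > 0:
                need[c] -= 1
    return chosen

# Toy instance (hypothetical): three cases, each to be covered twice.
covers = {"A": {0, 1, 2}, "B": {0, 1}, "C": {2}, "D": {0, 1, 2}}
weights = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 3.0}
chosen = greedy_multiset_cover(covers, weights, cases={0, 1, 2}, k=2)
```

Gene D covers every case but is never picked, because its cost makes it less efficient than the cheaper genes at each step.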
Dysregulated Pathways
Causal path between a target gene and a causal gene: a maximum current path
[Figure: maximum current path through the network between target gene D and candidate genes C1 to C5]
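One simple reading of "maximum current path" is a greedy walk that always follows the edge carrying the largest current. The slide does not define the path precisely, so the sketch below is an assumption, as are the function name and the toy potentials (which would normally come from solving the circuit).

```python
def max_current_path(edges, potential, start, goal):
    """Greedily follow the edge with the largest outgoing current from `start`.

    edges: {(u, v): conductance}; potential: dict node -> potential.
    Returns the node list, or None if the walk ends somewhere other than `goal`.
    """
    neighbors = {}
    for (u, v), g in edges.items():
        neighbors.setdefault(u, []).append((v, g))
        neighbors.setdefault(v, []).append((u, g))
    path, node = [start], start
    while node != goal:
        # Current on edge node -> n is conductance times potential drop.
        flows = [(g * (potential[node] - potential[n]), n)
                 for n, g in neighbors.get(node, [])]
        current, nxt = max(flows, default=(0.0, None))
        if current <= 0:
            return None  # no downhill edge left: the walk cannot reach goal
        path.append(nxt)
        node = nxt
    return path

# Toy network and potentials (hypothetical values).
edges = {("D", "a"): 1.0, ("a", "C1"): 1.0, ("a", "C2"): 0.5}
potential = {"D": 1.0, "a": 0.4, "C1": 0.0, "C2": 0.0}
path = max_current_path(edges, potential, "D", "C1")
```

Because current only flows from higher to lower potential, each step strictly lowers the potential and the walk cannot loop. It can, however, end at a different causal gene than the requested one, in which case this sketch returns None.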
Selected Causal Genes
Step                   Number of Genes   Overlap with GBM genes
Step B: eQTL           16056             0.56 (75)
Step C: Circuit flow   701               0.045 (10)
Step D: Set cover      128               4.7 × 10^-4 (6)
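An overlap with a known gene list is commonly scored with a hypergeometric tail probability. The talk does not say which test produced the values above, so the sketch below is an assumption, and the numbers in the example are toy values, not data from the study.

```python
from math import comb

def hypergeom_tail(population, known, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn at random from
    `population` genes, of which `known` are disease-related."""
    upper = min(known, selected)
    total = comb(population, selected)
    favorable = sum(
        comb(known, i) * comb(population - known, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total

# Toy numbers (hypothetical): 20 genes, 5 known, 4 selected, 2 overlapping.
p_value = hypergeom_tail(population=20, known=5, selected=4, overlap=2)
```

A smaller selected set can reach a much lower p-value with a smaller overlap count, which is the pattern visible across steps B to D.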
Results
128 causal genes from set cover (STEP D)
701 candidate causal genes from the circuit flow algorithm (STEP C)
Causal Genes
BSOSC Review, November 2008
Pathway                 P-value   Genes
Glioma                  0.008     PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
Cell cycle              0.028     MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
p53 signaling pathway   0.030     CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
Proteasome              0.026     PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4
Functional analysis using DAVID
The selected causal gene set includes many genes known to be implicated in cancer
PTEN as causal gene
[Figure: causal paths for PTEN. Nodes colored by fold change (-, 0, +); node types: kinase, TF, causal genes; edge types: TF-DNA, protein-protein]
EGFR as causal and target gene
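[Figure: causal paths for EGFR, marked both as causal EGFR and as target EGFR. Nodes colored by fold change (-, 0, +); node types: kinase, TF, causal genes; edge types: TF-DNA, protein-protein, phosphorylation]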
Conclusion
A novel computational method to simultaneously identify causal genes and dysregulated pathways
• Circuit flow algorithm
• Multi-set cover
Augmenting eQTL evidence with interaction information resulted in a powerful approach to uncover potential causal genes as well as intermediate nodes on molecular pathways
Our method can be applied to any disease system where genetic variations play a fundamental causal role
Acknowledgements
Teresa M. Przytycka
Stefan Wuchty
Other group members: Dong Yeon Cho, Yang Huang, Damian Wojtowicz, Jie Zheng
Our Method
Integrate several types of data: gene expression, copy number variations, molecular interactions
Methods and Results
Method
• Model the expression change of disease genes as a function of genomic alterations
• Translate the propagation of information from a potential causal gene to a disease gene into the flow of electric current through a network of molecular interactions
• Multi-set cover: select the most prominent genes
Results
• Validated our approach by testing the enrichment of the selected causal genes for known GBM/glioma-related genes