Bayesian Networks as framework for data integration
Jun Zhu, Ph. D.
Department of Genomics and Genetic Sciences
Icahn Institute of Genomics and Multiscale Biology
Icahn Medical School at Mount Sinai
New York, NY
@IcahnInstitute UCLA workshop, July, 2013---Jun Zhu, Ph. D.
What are Bayesian networks?
Association vs Causality
From Stephen Friend
A simple biological question: are there
causal/reactive relationships?
A Bayesian network approach:
[Figure: a network over nodes A, B, and C, with panels "Best models" and "Markov equivalent models" showing alternative structures over the same three nodes]
A Bayesian network ≠ a causal structure
Markov equivalent models
[Figure: three Markov equivalent structures over nodes A, B, and C]
$B \perp C \mid A$
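The equivalence on this slide can be checked numerically. The following is a minimal sketch, not the speaker's code: it assumes binary variables simulated from a hypothetical fork (A is a parent of both B and C) and scores each structure by its maximum-likelihood log-likelihood. The names `dag_loglik` and `noisy_copy` are illustrative.

```python
import math
import random
from collections import Counter

def dag_loglik(rows, dag):
    """Maximum-likelihood log-likelihood of discrete data under a DAG.

    rows: list of dicts mapping node name to observed value.
    dag:  dict mapping each node to a tuple of its parent nodes.
    """
    total = 0.0
    for node, parents in dag.items():
        # Counts of (parent configuration, node value) and of parent configuration alone.
        joint = Counter((tuple(r[p] for p in parents), r[node]) for r in rows)
        marg = Counter(tuple(r[p] for p in parents) for r in rows)
        for (pa, _value), n in joint.items():
            total += n * math.log(n / marg[pa])
    return total

def noisy_copy(x, rng, flip=0.2):
    """Return x with probability 1 - flip, otherwise the opposite bit."""
    return x if rng.random() > flip else 1 - x

# Simulated data (an assumption for illustration): A drives both B and C.
rng = random.Random(0)
rows = []
for _ in range(2000):
    a = rng.randint(0, 1)
    rows.append({"A": a, "B": noisy_copy(a, rng), "C": noisy_copy(a, rng)})

dags = {
    "B <- A -> C": {"A": (), "B": ("A",), "C": ("A",)},
    "B -> A -> C": {"B": (), "A": ("B",), "C": ("A",)},
    "C -> A -> B": {"C": (), "A": ("C",), "B": ("A",)},
    "B -> A <- C": {"B": (), "C": (), "A": ("B", "C")},  # v-structure, not equivalent
}
scores = {name: dag_loglik(rows, dag) for name, dag in dags.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The three structures that encode the same conditional independence between B and C given A receive the same score, so expression data alone cannot choose among them. The v-structure encodes a different independence and scores differently.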
Bayesian network: how to break Markov equivalence?
Animal model: mouse F2 intercrosses
[Figure: cross design. Diabetes-resistant and diabetes-susceptible F0 strains are crossed (X) to produce the F1 and then the F2 generation]
General data flow: genetic crosses
[Figure: data flow. Tissues profiled: liver, brain, muscle, white adipose. Steps: genotyping, constructing the genetic map, clinical traits and molecular profiling, scanning QTLs, network reconstruction]
Causal inference: genetics
Central Dogma of Biology
Perturbations with a causal anchor
-- Natural variation in a segregating population provides the same type of causal anchor
• Variation in DNA leads to variation in mRNA
• Variation in mRNA leads to variation in protein, which in turn can lead to disease
[Figure: two DNA variants of Gene X ("AA CA GT T" and "AA CG GT T"; label "DNA Supporting"). One variant: high expression, alt. splicing, codon change, etc. The other variant: low expression, no alt. splicing, no codon change, etc.]
Schadt et al. Nature Genetics (2005)
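The causal-anchor idea can be illustrated with a small model comparison. This is a minimal sketch under stated assumptions, not the procedure from the cited paper: the genotype L is treated as a root node because DNA variation is not changed by expression, traits A and B are simulated, and each candidate model is a linear Gaussian network scored by its maximum-likelihood log-likelihood. All names (`edge_loglik`, `score_models`) and the simulated effect sizes are hypothetical.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def edge_loglik(child, parent):
    """Gaussian log-likelihood of child regressed on one parent, at the MLE."""
    n = len(child)
    mean = sum(child) / n
    var = sum((v - mean) ** 2 for v in child) / n
    resid_var = var * (1.0 - corr(child, parent) ** 2)
    return -0.5 * n * (math.log(2 * math.pi * resid_var) + 1.0)

def score_models(L, A, B):
    """Score the three two-edge models that keep the genotype L as the root."""
    return {
        "causal (L -> A -> B)": edge_loglik(A, L) + edge_loglik(B, A),
        "reactive (L -> B -> A)": edge_loglik(B, L) + edge_loglik(A, B),
        "independent (A <- L -> B)": edge_loglik(A, L) + edge_loglik(B, L),
    }

# Simulated F2-like genotypes (allele counts in 1:2:1 ratio) and a causal chain.
rng = random.Random(1)
n = 300
L = rng.choices([0, 1, 2], weights=[1, 2, 1], k=n)
A = [1.0 * g + rng.gauss(0, 1) for g in L]
B = [0.8 * a + rng.gauss(0, 1) for a in A]

scores = score_models(L, A, B)
best = max(scores, key=scores.get)
print(best)
```

Without L, the models "A causes B" and "B causes A" are Markov equivalent. Fixing L as a root makes the three candidates imply different conditional independences, so their scores differ. All three have the same number of edges, so no complexity penalty is needed for this comparison.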
A Bayesian network approach:
Best models Markov Equivalent models
Structure priors based on causality
▶ Estimate confidence of causality
– Bootstrap resampling, repeated 200 times
– Fractions of causal, reactive, and independent calls
▶ The pair is independent
▶ The pair is causal/reactive
Zhu et al., PLoS CompBio, 2007
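The bootstrap step can be sketched as follows. This is an illustration under assumptions, not the implementation from the cited paper: the per-sample call uses a simplified stand-in rule (among the three two-edge linear Gaussian models rooted at the genotype, pick the one that omits the weakest pairwise correlation), and the data are simulated. How the resulting fractions enter the structure prior is not reproduced here. The names `call_relationship` and `bootstrap_fractions` are hypothetical.

```python
import math
import random
from collections import Counter

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def call_relationship(L, A, B):
    """Return the model whose omitted edge has the weakest correlation."""
    omitted_edge_strength = {
        "causal": abs(corr(L, B)),       # L -> A -> B has no direct L-B edge
        "reactive": abs(corr(L, A)),     # L -> B -> A has no direct L-A edge
        "independent": abs(corr(A, B)),  # A <- L -> B has no direct A-B edge
    }
    return min(omitted_edge_strength, key=omitted_edge_strength.get)

def bootstrap_fractions(L, A, B, n_boot=200, seed=0):
    """Resample individuals with replacement and tally the calls."""
    rng = random.Random(seed)
    n = len(L)
    calls = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        calls[call_relationship([L[i] for i in idx],
                                [A[i] for i in idx],
                                [B[i] for i in idx])] += 1
    return {k: calls[k] / n_boot for k in ("causal", "reactive", "independent")}

# Simulated example in which A is causal for B.
rng = random.Random(1)
n = 400
L = rng.choices([0, 1, 2], weights=[1, 2, 1], k=n)
A = [1.0 * g + rng.gauss(0, 1) for g in L]
B = [0.8 * a + rng.gauss(0, 1) for a in A]

fractions = bootstrap_fractions(L, A, B)
print(fractions)
```

A pair whose calls are split across the three categories would carry little directional confidence, while a pair that is called causal in nearly every resample would support a directed prior.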
Bayesian network: integrating genetic data
• Gives a sense of causality to the Bayesian network
• How much improvement is achieved by integrating genetic data?
Bayesian Network: a simulation study
Zhu et al., PLoS CompBio, 2007
Bayesian network: genetic information is critical
when sample size is small
Largest improvement in recall occurs
with smaller sample sizes
Zhu et al., PLoS CompBio, 2007
Bayesian network: integrating genetic data
[Figure: genetic loci L1, L2, ..., Ln-1, Ln and genes G1, G2, ..., Gn-1, Gn, with locus Lj and gene Gj highlighted. Edge labels: cis-regulation and trans-regulation (genetic locus to gene), transcriptional regulation (gene to gene)]
[Figure: precision and recall for weak and strong signals, each at 300 and 900 samples]
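Precision and recall for a reconstructed network can be computed by comparing directed edge sets. The snippet below is a minimal sketch with hypothetical gene names and edges; it assumes an edge counts as correct only if its direction matches the true network, which may differ from the scoring used in the simulation study.

```python
def precision_recall(predicted_edges, true_edges):
    """Precision and recall of predicted directed edges against the truth."""
    predicted = set(predicted_edges)
    truth = set(true_edges)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical simulated network and reconstruction.
true_edges = {("G1", "G2"), ("G2", "G3"), ("G1", "G4"), ("G4", "G5")}
predicted_edges = {("G1", "G2"), ("G3", "G2"), ("G1", "G4"), ("G2", "G5")}

precision, recall = precision_recall(predicted_edges, true_edges)
print(precision, recall)
```

Here the reversed edge G3 to G2 counts as a false positive and the true edge G2 to G3 as a miss, which is the kind of error that directional priors are meant to reduce.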
Bayesian network: integrating genetics
Experimental Hsd11b1 signature: mice treated with Hsd1 inhibitor
Prediction: Hsd1 signatures based on BxD data
Method                 Overlap of predicted signature with experimental one
Correlation to Hsd1    10%
BN without genetics    20%
BN with genetics       52%
Zhu J et al, Cytogenet Genome Res. (2004)
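The overlap percentages above compare a predicted signature with an experimental one. The sketch below shows one way such a comparison could be set up; it assumes, for illustration only, that the predicted signature is the set of nodes downstream of the perturbed gene in the directed network. The edges, gene names other than Hsd11b1, and the experimental set are hypothetical.

```python
from collections import deque

def downstream_nodes(edges, source):
    """All nodes reachable from source along directed (parent, child) edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen and child != source:
                seen.add(child)
                queue.append(child)
    return seen

def overlap_fraction(predicted, experimental):
    """Fraction of the predicted signature found in the experimental one."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(experimental)) / len(predicted)

# Hypothetical network around Hsd11b1 and a hypothetical experimental signature.
edges = [("Hsd11b1", "g1"), ("Hsd11b1", "g2"), ("g1", "g3"),
         ("g3", "g4"), ("g5", "Hsd11b1")]
experimental = {"g1", "g3", "g6"}

predicted = downstream_nodes(edges, "Hsd11b1")
fraction = overlap_fraction(predicted, experimental)
print(sorted(predicted), fraction)
```

A correlation-based signature has no direction, so it would also include upstream genes such as g5 here; a directed network restricts the prediction to downstream genes.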
A framework for data integration
[Figure: high-throughput data (microarray, proteomic, and metabolomic data; genomics; genetics) and knowledge (Medline, Biocarta/Biopathway, biologists) are combined through a database and probabilistic graphical models, with a GUI for hypothesis generation and testing]
Bayesian network: PPI
Zhu J et al, Nature Genetics, 2008
[Figure: PPI network containing 3-cliques and 4-cliques, which together form a clique community (partial clique)]
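A clique community can be found by merging k-cliques that share k-1 nodes, in the style of k-clique percolation. The code below is a small brute-force sketch on a hypothetical protein-protein interaction graph; it is an assumption that this matches the definition used on the slide, and the enumeration is only practical for small graphs.

```python
from itertools import combinations

def k_cliques(adj, k):
    """All k-node subsets in which every pair of nodes is connected."""
    nodes = sorted(adj)
    return [frozenset(c) for c in combinations(nodes, k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def clique_communities(edges, k):
    """Union k-cliques that share k-1 nodes into communities."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=lambda s: sorted(s))

# Hypothetical PPI graph: a 4-clique (a, b, c, d), a 3-clique (c, d, e)
# sharing an edge with it, and a separate 3-clique (x, y, z).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("c", "e"), ("d", "e"),
         ("x", "y"), ("y", "z"), ("x", "z")]

communities = clique_communities(edges, k=3)
print(communities)
```

The community a to e is not itself a full clique (a and e are not connected), which is what the "partial clique" label describes.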
Bayesian network: Transcription Factors
[Figure: a transcription factor (TF) node connected to genes B, C, D, and E]
Is the TF functional?
Are genes B, C, D, and E correlated?
Bayesian network: Transcription Factors
Introducing scale-free priors for TF or protein complex
$$p(T \to g) \propto \omega(T)$$
$$\omega(T) = \log\Big(\sum_{g_i \in R} \delta\big(r(T, g_i) \ge r_{\mathrm{cutoff}}\big)\Big)$$
Zhu J et al, Nature Genetics, 2008
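A weight of this kind can be computed directly from expression data. The sketch below rests on several assumptions that are not confirmed by the slide: the weight of a TF is taken to be the log of the number of genes in its candidate target set whose absolute Pearson correlation with the TF reaches a cutoff, a TF with no such gene is given weight 0.0 to avoid log(0), and the expression values are made up. The name `tf_weight` is hypothetical.

```python
import math

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tf_weight(tf_expr, target_exprs, r_cutoff=0.5):
    """Log of the number of candidate targets correlated with the TF.

    target_exprs: dict mapping gene name to its expression profile, limited to
    genes in the TF's candidate set (for example, genes with its binding site).
    """
    count = sum(1 for expr in target_exprs.values()
                if abs(corr(tf_expr, expr)) >= r_cutoff)
    # Assumption: weight 0.0 when no target passes the cutoff.
    return math.log(count) if count > 0 else 0.0

# Made-up expression profiles across five samples.
tf_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = {
    "g1": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated
    "g2": [5.0, 4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated
    "g3": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated with the TF
}

weight = tf_weight(tf_expr, targets)
print(weight)
```

Because the weight grows with the number of correlated targets, a TF that already explains many genes is favored as a parent for further genes, which gives the hub-like, scale-free tendency named in the slide title.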
Zhu J et al, Nature Genetics, 2008
[Figure: data sources for the Bayesian network. Yeast segregants grown in synthetic complete medium to logarithmic growth phase provide gene expression and genotypes; public databases provide protein-protein interactions, transcription factor binding sites, and protein-metabolite interactions]
Zhu J et al, Nature Genetics, 2008
Integration improves network qualities
BN                               KO data   GO terms   TF data
w/o any priors                   125       55         26
w/ genetics priors               139       59         34
w/ genetics, TF and PPI priors   152       66         52
Zhu J et al, Nature Genetics, 2008
Prospective validation is the gold standard
[Figure: network neighborhoods of LEU2 and ILV6, each including GCN4]
LEU2 KO gives rise to small expression signature
• LEU2 KO sig enriched (p~10E-18)
• GCN4 downregulated in LEU2 KO small signature
ILV6 KO gives rise to large expression signature
• ILV6 KO sig enriched (p~10E-52)
• GCN4 upregulated in ILV6 KO large signature
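Enrichment p-values like those quoted above are often computed with a hypergeometric tail test. The slide does not state which test was used, so the following is only a sketch under that assumption, with hypothetical counts and a hypothetical function name.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_signature, n_subnetwork, n_overlap):
    """P(overlap >= n_overlap) when n_subnetwork genes are drawn at random
    from n_universe genes, of which n_signature belong to the KO signature."""
    total = comb(n_universe, n_subnetwork)
    upper = min(n_signature, n_subnetwork)
    tail = sum(comb(n_signature, k) * comb(n_universe - n_signature, n_subnetwork - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical counts: 20 genes measured, 5 in the KO signature,
# 5 in the predicted subnetwork, and all 5 overlap.
p_full_overlap = hypergeom_enrichment_p(20, 5, 5, 5)
p_no_requirement = hypergeom_enrichment_p(20, 5, 5, 0)
print(p_full_overlap, p_no_requirement)
```

With genome-scale gene counts and large overlaps the same calculation yields very small p-values, which is the regime the slide reports.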
How does LEU2 affect LEU3 activity?
[Figure: LEU2 and LEU3 mRNA expression; expression of genes with LEU3 binding sites as a surrogate marker for Leu3p activity]
A framework for building causal networks
[Figure: high-throughput data (microarray, proteomic, and metabolomic data; genomics; genetics) and knowledge (Medline, Biocarta/Biopathway, biologists) are combined through a database and probabilistic graphical models, with a GUI for hypothesis generation and testing]
Zhu et al, PLoS Biology, 2012
[Figure: data sources for the Bayesian network. Yeast segregants grown in synthetic complete medium to logarithmic growth phase provide gene expression, metabolites, and genotypes; public databases provide protein-protein interactions, transcription factor binding sites, and protein-metabolite interactions]
Zhu et al, PLoS Biology, 2012
Metabolite abundance is under genetic control
KEGG biochemical pathways
$p(m \to e) \propto e^{-d(m, e)}$
Zhu et al, PLoS Biology, 2012
LEU2 mRNA is causal to 2-isopropylmalate
KEGG pathway
Zhu et al, PLoS Biology, 2012
LEU3 regulation
• The activity of Leu3p is positively regulated by alpha-isopropylmalate (IPM), the product of the first step in leucine biosynthesis
Sze JY, et al. (1992) In vitro transcriptional activation by a metabolic intermediate: activation by Leu3 depends on alpha-isopropylmalate. Science 258(5085):1143-5
• The degree of activation by Leu3p is Leu3p concentration dependent, and it has been shown that LEU3 gene expression is regulated by general amino acid control, which is mediated by the GCN4 transcription factor
Zhou K, et al. (1987) Structure of yeast regulatory gene LEU3 and evidence that LEU3 itself is under general amino acid control. Nucleic Acids Res 15(13):5261-73
2-isopropylmalate: mechanism of causal
regulator LEU2
LEU2 genotype → LEU2 activity → 2-isopropylmalate → LEU3 activity → transcriptional response for genes with LEU3 binding sites
Zhu et al, PLoS Biology, 2012
Consistent with KEGG pathway
What else can you learn from integrating
metabolomic data?
[Figure labels: metabolite QTLs, causal candidates, protein degradation, metabolite, signature size, KO]
Zhu et al, PLoS Biology, 2012
Is the transcriptional effect real?
Zhu et al, PLoS Biology, 2012
PHM7-ko affects many metabolites
Integration of CNV blocks into Bayesian networks
Network-based model selection
[Figure label: random gene]
Tran et al. BMC Sys. Biol. 2011
Acknowledgements
Sage Bionetworks
Stephen Friend et al.
Mount Sinai
Eric Schadt
Bin Zhang
Zhidong Tu
Decode
Valur Emilsson
U Washington
Roger Bumgarner
UCLA
Jake Lusis
Xia Yang, et al
Berkeley
Rachel Brem
Princeton
Leonid Kruglyak
Harvard
Jun Liu
Merck
Qiuwei Xu
Ethan Xu
Theretha Zhang
Fred Hutchinson
Paddison lab
MD Anderson
Hanash lab
U Wisconsin
Alan Attie
Mark Keller, et al
Mount Sinai
Powell lab
Oh Lab
Casaccia