![Page 1: Reverse Engineering of Genetic Networks (Final presentation) Ji Won Yoon (s0344084) supervised by Dr. Dirk Husmeier. MSc in Informatics at Edinburgh University,](https://reader037.vdocument.in/reader037/viewer/2022110210/56649e605503460f94b5a1c0/html5/thumbnails/1.jpg)
Reverse Engineering of Genetic Networks (Final presentation)
Ji Won Yoon (s0344084) supervised by Dr. Dirk Husmeier.
MSc in Informatics at Edinburgh University
Reverse Engineering
What is reverse engineering of a gene network?
- Relevance network
- My own method
- MCMC for Bayesian networks
(Figure labels: missing gene; “up” and “down” data from microarrays.)
Past works
- Comparison of existing approaches to the reverse engineering of genetic networks:
  - Mutual information relevance networks
  - My own method
  - Bayesian networks using the Markov chain Monte Carlo method
- Applying all methods to synthetic data generated from a gene network simulator.
- Applying the methods to biological data:
  - Diffuse large B cell lymphoma gene expression data
  - Arabidopsis gene expression data
Relevance Network (Butte, 2000)
Using mutual information:
$MI(A, B) = H(A) - H(A \mid B) = H(B) - H(B \mid A) = MI(B, A)$ -> symmetric
$MI(A, B) = H(A) + H(B) - H(A, B)$
Mutual information is zero if and only if the two genes are independent.
Pairwise relation
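The identities above can be checked numerically. This is a minimal Python sketch, assuming the expression values have already been discretised (here 1 = "up", 0 = "down") and that probabilities are estimated from simple frequency counts; the two gene profiles are invented for illustration.

```python
import math
from collections import Counter


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns,
    estimated from the empirical frequencies of the observed tuples."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(a, b):
    """H(A | B) = H(A, B) - H(B)."""
    return entropy(a, b) - entropy(b)


def mutual_information(a, b):
    """MI(A, B) = H(A) + H(B) - H(A, B)."""
    return entropy(a) + entropy(b) - entropy(a, b)


# Hypothetical discretised profiles of two genes over eight arrays.
gene_a = [1, 1, 0, 0, 1, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0, 0, 1]

mi_ab = mutual_information(gene_a, gene_b)
# MI(A, B) = H(A) - H(A|B) = H(B) - H(B|A) = MI(B, A)
assert math.isclose(mi_ab, entropy(gene_a) - conditional_entropy(gene_a, gene_b))
assert math.isclose(mi_ab, entropy(gene_b) - conditional_entropy(gene_b, gene_a))
assert math.isclose(mi_ab, mutual_information(gene_b, gene_a))
print(f"MI(A, B) = {mi_ab:.4f} bits")
```

Frequency-based estimates like this are biased upwards for small samples, which is one reason a relevance network needs a carefully chosen threshold.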
Relevance Network (cont.)
- Useful only for local relations, because it considers pairwise relations.
- Selecting a proper threshold is important for obtaining good relations.
  - Bootstrapping: compare the results on the real data with those on randomly permuted data.
- Difficult to identify a relation with two parents, because of this locality: $MI(A, [B, C, D]) > MI(A, B), MI(A, C), MI(A, D)$.
  (Figure: gene A shown with the parent pairs {C, D}, {B, D} and {B, C}.)
- Cannot detect an XOR operation: for $C = A \oplus B$ with every row of the truth table equally frequent, $MI(A, C) = MI(B, C) = 0$.
- No direction of edges, due to the symmetric property.
- Fast and light computation.
- Useful for a large number of genes.

XOR truth table:

| A | B | C |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
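The XOR limitation can be reproduced with a frequency-based estimator. A minimal Python sketch, assuming the four rows of the truth table are observed equally often: each parent alone carries no information about C, while the pair (A, B) determines C completely.

```python
import math
from collections import Counter
from itertools import product


def entropy(*columns):
    """Joint Shannon entropy (bits) from empirical frequencies."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mi(x, parents):
    """MI between column x and the joint variable formed by the parent columns."""
    return entropy(x) + entropy(*parents) - entropy(x, *parents)


rows = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)
A = [a for a, _ in rows]
B = [b for _, b in rows]
C = [a ^ b for a, b in rows]            # C = A XOR B

print(mi(C, [A]), mi(C, [B]))  # 0.0 0.0: pairwise MI misses both parents
print(mi(C, [A, B]))           # 1.0: the pair of parents fully determines C
```

A relevance network that thresholds pairwise MI would therefore leave C unconnected to both of its parents.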
My method (using mutual information)
Based on scale-free networks: crucial genes have more connections than other genes.
(Figure: hub gene A linked to B, C and D; then to B, C, D and E.)
When a new gene F is inserted, A has a higher chance of connecting to it than the other genes.
G = (N, E) is extended to G' = (N, E, L), where L is level information.
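The scale-free intuition (a hub such as A is the most likely gene to acquire the new gene F) is usually modelled by preferential attachment, where the chance of linking to an existing node is proportional to its degree. The Python sketch below illustrates only that generic rule; it is an assumption for illustration, not the insertion rule of my method, which is based on mutual information. The degrees follow the figure, with A linked to B, C, D and E, and the function names are hypothetical.

```python
import random


def attach_probabilities(degrees):
    """Probability that a new node links to each existing node,
    proportional to its current degree (preferential attachment)."""
    total = sum(degrees.values())
    return {node: d / total for node, d in degrees.items()}


def insert_node(degrees, new_node, rng):
    """Attach new_node to one existing node drawn by preferential attachment
    and update the degree table in place."""
    probs = attach_probabilities(degrees)
    nodes = list(probs)
    target = rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
    degrees[target] += 1
    degrees[new_node] = 1
    return target


# Hub A connected to B, C, D and E, as in the figure.
degrees = {"A": 4, "B": 1, "C": 1, "D": 1, "E": 1}
print(attach_probabilities(degrees)["A"])  # 0.5: half of all edge ends touch A
target = insert_node(degrees, "F", random.Random(0))
```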
My method (Insertion step)
Finding better parents and merging clusters
Example: new gene a, threshold = 0.3
MI(1, a) = 0.34, MI(4, a) = 0.28, MI(5, a) = 0.35, MI(6, a) = 0.31, MI(7, a) = 0.4
(Figure: gene a is inserted into a network of genes 1 to 12 grouped into clusters S1, S2, S3 and S4.)
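With the numbers in the example, a gene passes when its MI with the new gene a is at least 0.3. The Python sketch below is a hypothetical reading of the insertion step: `candidate_parents` keeps the passing genes ordered by MI, and `merge_clusters` joins every cluster that contains one of them. Both helpers are illustrative names, and the cluster membership used is invented, since the figure's assignment of genes to S1 to S4 is not recoverable.

```python
def candidate_parents(mi_to_new_gene, threshold):
    """Genes whose MI with the new gene reaches the threshold, strongest first."""
    passing = [(g, v) for g, v in mi_to_new_gene.items() if v >= threshold]
    return [g for g, _ in sorted(passing, key=lambda gv: gv[1], reverse=True)]


def merge_clusters(cluster_of, new_gene, parents):
    """Put new_gene and every cluster holding one of its parents into one cluster."""
    if not parents:
        cluster_of[new_gene] = f"S_{new_gene}"  # new gene starts its own cluster
        return cluster_of[new_gene]
    touched = {cluster_of[p] for p in parents}
    merged = min(touched)  # keep one existing label (deterministic choice)
    for gene, label in cluster_of.items():
        if label in touched:
            cluster_of[gene] = merged
    cluster_of[new_gene] = merged
    return merged


mi_a = {1: 0.34, 4: 0.28, 5: 0.35, 6: 0.31, 7: 0.4}
parents = candidate_parents(mi_a, threshold=0.3)
print(parents)  # [7, 5, 1, 6]: gene 4 (0.28) falls below the threshold

# Hypothetical cluster membership (not taken from the figure).
cluster_of = {1: "S1", 4: "S1", 5: "S2", 6: "S3", 7: "S3", 2: "S4"}
print(merge_clusters(cluster_of, "a", parents))  # S1
```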
My method (Deletion step)
Assumption: the network generated by the insertion step of my method is in a stationary state with respect to the marginal log-likelihood, except for the one edge that is being investigated.
Three cases for an edge e: X -> Y, X <- Y, and no edge between X and Y.
$P(D \mid M_i) = U \cdot P(X \mid pa_i(X)) \cdot P(Y \mid pa_i(Y)), \quad i = 1, 2, 3$
U, the contribution of all other nodes, is the same in the three cases, so only $P(X \mid pa_i(X)) \cdot P(Y \mid pa_i(Y))$ needs to be compared.
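Because U is shared, the three cases can be compared through the local terms of X and Y alone. The Python sketch below assumes a particular discrete Bayesian score, the K2 (Cooper-Herskovits) marginal likelihood with uniform Dirichlet priors, which may differ from the score the method actually uses. The data set, in which Y simply copies X, is invented, and the other parents (A, B for X and E, H for Y in the example graph) are left empty to keep it small.

```python
import math
from collections import Counter


def k2_local_score(data, child, parents, arity):
    """log P(child | parents) under the K2 metric (Dirichlet(1, ..., 1) priors).
    data is a list of dicts mapping variable name -> state in 0..arity-1."""
    r = arity[child]
    n_j = Counter(tuple(row[p] for p in parents) for row in data)
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for j, n in n_j.items():  # unobserved parent configurations contribute 0
        score += math.lgamma(r) - math.lgamma(n + r)
        score += sum(math.lgamma(n_jk[(j, k)] + 1) for k in range(r))
    return score


def compare_edge_cases(data, x, y, pa_x, pa_y, arity):
    """Score the three cases for edge e; U (all other nodes) is shared, so only
    the local terms of x and y are summed."""
    cases = {
        "X -> Y": (pa_x, pa_y + [x]),
        "X <- Y": (pa_x + [y], pa_y),
        "no edge": (pa_x, pa_y),
    }
    return {
        name: k2_local_score(data, x, px, arity) + k2_local_score(data, y, py, arity)
        for name, (px, py) in cases.items()
    }


# Hypothetical binary data in which Y copies X.
data = [{"X": i % 2, "Y": i % 2} for i in range(20)]
scores = compare_edge_cases(data, "X", "Y", [], [], {"X": 2, "Y": 2})
best = max(scores, key=scores.get)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Here X -> Y and X <- Y receive the same score because the toy data are symmetric in X and Y; the data alone cannot orient such an edge, only decide that it should be kept.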
My method (Deletion step)
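(Figure: graph G containing nodes A, B, C, D, E, F, H, I, X and Y; e is the edge between X and Y.)
Case 1 (X -> Y): $pa_1(X) = \{A, B\}$, $pa_1(Y) = \{E, H, X\}$
Case 2 (X <- Y): $pa_2(X) = \{A, B, Y\}$, $pa_2(Y) = \{E, H\}$
Case 3 (no edge): $pa_3(X) = \{A, B\}$, $pa_3(Y) = \{E, H\}$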
My method
Mainly two steps: the insertion step and the deletion step.
(Figure: genes a to h are added to the network by repeated insertion and deletion steps at thresholds t = 0.5, 0.4 and 0.3, continuing up to t = 0.…; panels labelled 5T, 6T, 7T, …, 10T.)
My method
Advantages
- Based on biological facts (scale-free networks)
- No need for thresholds
- Online approach
- Scalability
- Easy to explore sub-networks
- Fast computation
My method
Disadvantages
- Input-order dependency
- Risky when exploring parents in data with large noise values (it can overfit the training data)
- 61% of edges are less order-dependent (in part B)
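Bayesian network with MCMC: Bayesian network
Problem 1 / Problem 2
(Figure: DAG with edges A -> B, A -> C, B -> D and B -> E.)
(i) $P(A, B, C, D, E) = \prod_{X} P(X \mid pa(X)) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B)\,P(E \mid B)$
(ii) $P(M \mid D) = \frac{P(D \mid M)\,P(M)}{P(D)} = \frac{P(D \mid M)\,P(M)}{\sum_{M'} P(D \mid M')\,P(M')}$
(Figure: posterior over structures. Left: in a large data set. Right: in a small data set.)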
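The factorisation P(A, B, C, D, E) = P(A) P(B|A) P(C|A) P(D|B) P(E|B) can be checked with a few lines of Python. The conditional probabilities below are invented for illustration, with every gene binary (1 = "up"); the point is that five local tables define a valid joint distribution.

```python
from itertools import product

# Hypothetical tables: each entry is P(node = 1 | parent value).
p_a = 0.6
p_b_given_a = {0: 0.2, 1: 0.7}
p_c_given_a = {0: 0.5, 1: 0.9}
p_d_given_b = {0: 0.1, 1: 0.8}
p_e_given_b = {0: 0.3, 1: 0.6}


def bern(p_one, value):
    """P(value) for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B) P(E|B)."""
    return (bern(p_a, a) * bern(p_b_given_a[a], b) * bern(p_c_given_a[a], c)
            * bern(p_d_given_b[b], d) * bern(p_e_given_b[b], e))


total = sum(joint(*x) for x in product([0, 1], repeat=5))
print(round(total, 12))  # 1.0
```

The factorised model needs 1 + 4 × 2 = 9 parameters, compared with 2^5 - 1 = 31 for an unrestricted joint distribution over five binary genes.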
Bayesian network with MCMC
MCMC (Markov chain Monte Carlo)
- Inference rule for Bayesian networks
- Sample from the posterior distribution
- Proposal move: given M_old, propose a new network M_new with probability $Q(M_{new} \mid M_{old})$
- Acceptance and rejection:
$P_{accept} = \min\left\{1, \frac{P(D \mid M_{new})\,P(M_{new})}{P(D \mid M_{old})\,P(M_{old})} \cdot \frac{Q(M_{old} \mid M_{new})}{Q(M_{new} \mid M_{old})}\right\}$
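In practice the Metropolis-Hastings acceptance probability min{1, [P(D|M_new) P(M_new) Q(M_old|M_new)] / [P(D|M_old) P(M_old) Q(M_new|M_old)]} is computed in log space, because marginal likelihoods of whole networks are far too small to represent directly. A minimal Python sketch; the function names and log-likelihood values are illustrative assumptions.

```python
import math
import random


def acceptance_probability(log_lik_new, log_prior_new, log_lik_old, log_prior_old,
                           log_q_new_given_old, log_q_old_given_new):
    """min{1, P(D|M_new)P(M_new)Q(M_old|M_new) / (P(D|M_old)P(M_old)Q(M_new|M_old))},
    with every factor passed as a natural logarithm."""
    log_ratio = (log_lik_new + log_prior_new + log_q_old_given_new
                 - log_lik_old - log_prior_old - log_q_new_given_old)
    return math.exp(min(0.0, log_ratio))


def accept(p_accept, rng):
    """Accept the proposed network with probability p_accept."""
    return rng.random() < p_accept


# New network is worse by ln 2 in log-likelihood; flat prior, symmetric proposal.
p = acceptance_probability(-1000.0 - math.log(2), 0.0, -1000.0, 0.0, 0.0, 0.0)
print(round(p, 6))  # 0.5
print(accept(p, random.Random(1)))
```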
Bayesian network with MCMC
MCMC in Bayes Net toolbox
Hastings factor: the proposal probability is calculated from the number of neighbours of the model.
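In standard structure MCMC, a move picks one neighbour of the current DAG uniformly (add, delete or reverse one edge without creating a cycle), so Q(M_new | M_old) = 1 / |N(M_old)| and the Hastings factor is |N(M_old)| / |N(M_new)|. The Python sketch below implements that generic rule on edge sets; it assumes this neighbourhood definition and is not taken from the Bayes Net Toolbox source.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    stack = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while stack:
        node = stack.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    stack.append(child)
    return visited == len(nodes)


def neighbours(nodes, edges):
    """All DAGs reachable by adding, deleting or reversing a single edge."""
    edges = frozenset(edges)
    result = set()
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            result.add(edges - {(u, v)})                    # deletion
            reversed_edges = (edges - {(u, v)}) | {(v, u)}
            if is_acyclic(nodes, reversed_edges):
                result.add(reversed_edges)                  # reversal
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if is_acyclic(nodes, added):
                result.add(added)                           # addition
    return result


def hastings_factor(nodes, m_old, m_new):
    """Q(M_old|M_new) / Q(M_new|M_old) = |N(M_old)| / |N(M_new)|."""
    return len(neighbours(nodes, m_old)) / len(neighbours(nodes, m_new))


nodes = ["A", "B", "C"]
# A -> B has 6 neighbours, A -> B -> C has 5 (adding C -> A would close a cycle).
print(hastings_factor(nodes, {("A", "B")}, {("A", "B"), ("B", "C")}))  # 1.2
```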
Improvement of MCMCs
Fan-in
- With sparse data, the prior probability has a non-negligible influence on the posterior P(M|D).
- Limit the maximum number of edges converging on a node (the fan-in): if FI(M) > a, P(M) = 0; otherwise, P(M) = 1.
- This greatly reduces the time complexity.
Acceptable configurations of a child and its parents with fan-in 3:
(Figure: child A with zero to three parents taken from B, C and D; A with four parents B, C, D and E exceeds the fan-in.)
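The fan-in restriction acts as a hard prior. A minimal Python sketch, assuming networks are stored as sets of (parent, child) edges and a = 3 as on the slide; the helper names are illustrative. Returning log P(M) = -inf means that any proposal exceeding the fan-in gets acceptance probability 0.

```python
from collections import Counter
from math import comb


def fan_in(edges):
    """Largest number of parents of any node; edges are (parent, child) pairs."""
    parent_counts = Counter(child for _, child in edges)
    return max(parent_counts.values(), default=0)


def log_prior(edges, max_fan_in=3):
    """Fan-in prior: P(M) = 1 if FI(M) <= a, else 0, returned as log P(M).
    The constant 1 is unnormalised, which is fine because only ratios enter MCMC."""
    return 0.0 if fan_in(edges) <= max_fan_in else float("-inf")


def parent_set_count(n_genes, max_fan_in):
    """Number of parent sets a single node can have under the fan-in limit."""
    return sum(comb(n_genes - 1, k) for k in range(max_fan_in + 1))


ok = {("B", "A"), ("C", "A"), ("D", "A")}      # A has three parents: allowed
too_many = ok | {("E", "A")}                    # A has four parents: P(M) = 0
print(log_prior(ok), log_prior(too_many))       # 0.0 -inf
print(parent_set_count(27, 3), 2 ** 26)         # 2952 67108864
```

The last line shows where the speed-up comes from: with 27 genes, each node has 2952 admissible parent sets under fan-in 3 instead of 2^26.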
Improvement of MCMCs: DAG to CPDAG (DAG: directed acyclic graph; CPDAG: completed partially directed acyclic graph)
X -> Y and X <- Y are equivalent: $P(X, Y) = P(X)\,P(Y \mid X) = P(Y)\,P(X \mid Y)$
(Figure: set of all equivalent DAGs; conversion of a DAG to its CPDAG.)
The edge D-E is reversible; the others are compelled.
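Two DAGs belong to the same equivalence class, and hence share one CPDAG, exactly when they have the same skeleton and the same v-structures (a -> c <- b with a and b not adjacent). The Python sketch below checks that criterion on edge sets; it does not construct the CPDAG itself, and the representation is an assumption.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    result = set()
    for c, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                result.add((a, c, b))
    return result


def markov_equivalent(g1, g2):
    """Same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


print(markov_equivalent({("X", "Y")}, {("Y", "X")}))                          # True
print(markov_equivalent({("A", "C"), ("B", "C")}, {("C", "A"), ("B", "C")}))  # False
```

The second pair has the same skeleton, but only the first graph contains the v-structure A -> C <- B, so those two edges are compelled.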
Improvement of MCMCs: this CPDAG concept brings several advantages:
- The space of equivalence classes is smaller than the space of DAGs.
- When moving in DAG space, the chain is easily trapped in a local optimum.
Incorporating CPDAGs into MCMC
MCMCMC (Metropolis-coupled MCMC): trapping
A: global optimum; B: local optimum
The chain is easily trapped in the local optimum B. Multiple chains at different temperatures help it escape.
MCMCMC: trapping
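MCMCMC: a super chain S
$P(S_i \mid D) = \prod_j P(D \mid M_{ij})^{1/T_j}\,P(M_{ij})$, with a symmetric proposal $Q(S_{i+1} \mid S_i) = Q(S_i \mid S_{i+1})$
Acceptance ratio of a super chain, for exchanging the networks $M_k$ and $M_l$ of the chains at temperatures $T_k$ and $T_l$:
$P_{accept} = \min\left\{1, \frac{P(D \mid M_l)^{1/T_k}\,P(M_l)\;P(D \mid M_k)^{1/T_l}\,P(M_k)}{P(D \mid M_k)^{1/T_k}\,P(M_k)\;P(D \mid M_l)^{1/T_l}\,P(M_l)}\right\}$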
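The exchange move between two tempered chains can be computed in log space. This Python sketch assumes that chain j targets P(D|M)^(1/T_j) P(M), so the prior terms cancel and only the tempered likelihoods matter; the helper names, temperatures and log-likelihoods are invented for illustration.

```python
import math
import random


def swap_acceptance(log_lik_k, log_prior_k, t_k, log_lik_l, log_prior_l, t_l):
    """Probability of exchanging networks M_k and M_l between the chains at
    temperatures t_k and t_l (each chain targets P(D|M)^(1/T) P(M))."""
    log_num = (log_lik_l / t_k + log_prior_l) + (log_lik_k / t_l + log_prior_k)
    log_den = (log_lik_k / t_k + log_prior_k) + (log_lik_l / t_l + log_prior_l)
    return math.exp(min(0.0, log_num - log_den))


def maybe_swap(networks, log_liks, log_priors, temps, k, l, rng):
    """Swap the states of chains k and l in place if the move is accepted."""
    p = swap_acceptance(log_liks[k], log_priors[k], temps[k],
                        log_liks[l], log_priors[l], temps[l])
    if rng.random() < p:
        for seq in (networks, log_liks, log_priors):
            seq[k], seq[l] = seq[l], seq[k]
        return True
    return False


# Cold chain (T = 1) is stuck at log-likelihood -100; the hot chain (T = 2) found -98.
print(swap_acceptance(-100.0, 0.0, 1.0, -98.0, 0.0, 2.0))   # 1.0: always swap
print(swap_acceptance(-100.0, 0.0, 1.0, -104.0, 0.0, 2.0))  # exp(-2), about 0.135
```

A better network found by the hot chain is always handed to the cold chain, which is how the coupled chains help the cold chain leave a local optimum such as B.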
Importance Sampling.
Partition function
Proposal distribution
Acceptance probability
We only use the prior distribution for the acceptance probability. Importance sampling is also combined with MCMCMC:
MCMCMC with Importance Sampling
: Likelihood for configuration of a node n and its parents
Order MCMC: it samples over total orders, not over structures.
(Figure: network of nodes A, B and C.)
All total orders of A, B and C:
A B C, A C B, B C A, B A C, C A B, C B A
Order MCMC: it samples over total orders, not over structures.
- Proposal move: flip two nodes of the previous order.
- Computational limitations: use candidate sets, i.e. for each node, the sets of parents with the highest likelihood scores. This reduces the computation time.
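The order score sums over every parent set consistent with the order, and restricting each node to a candidate set keeps that sum small. The Python sketch below assumes a uniform prior over orders, at most two parents per node, and a made-up `toy_local_score` that favours A as the parent of B and C; in practice the local scores would be log marginal likelihoods, and all function names here are illustrative.

```python
import math
import random
from itertools import combinations


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def order_log_score(order, candidates, local_score, max_parents=2):
    """log P(D | order): for each node, log-sum-exp of the local scores of all
    parent sets drawn from its candidate set that precede it in the order."""
    position = {node: i for i, node in enumerate(order)}
    total = 0.0
    for node in order:
        allowed = [p for p in candidates[node] if position[p] < position[node]]
        scores = [local_score(node, frozenset(pa))
                  for k in range(min(max_parents, len(allowed)) + 1)
                  for pa in combinations(allowed, k)]
        total += log_sum_exp(scores)  # k = 0 (no parents) is always included
    return total


def flip_move(order, rng):
    """Proposal: swap two randomly chosen nodes (symmetric, no Hastings factor)."""
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return new


def order_mcmc(start, candidates, local_score, n_iter, rng):
    """Metropolis sampler over total orders with a uniform order prior."""
    current = list(start)
    current_score = order_log_score(current, candidates, local_score)
    samples = []
    for _ in range(n_iter):
        proposal = flip_move(current, rng)
        proposal_score = order_log_score(proposal, candidates, local_score)
        if rng.random() < math.exp(min(0.0, proposal_score - current_score)):
            current, current_score = proposal, proposal_score
        samples.append(tuple(current))
    return samples


def toy_local_score(node, parents):
    """Hypothetical log-scores: B and C fit well with parent A, all else poorly."""
    return 0.0 if node in ("B", "C") and "A" in parents else -5.0


candidates = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
samples = order_mcmc(["C", "B", "A"], candidates, toy_local_score, 2000, random.Random(0))
```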
Order MCMC
Order MCMC: selecting features
Edges can be extracted by approximating their posterior probabilities as averages under the stationary distribution,
where
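Once networks have been sampled, the posterior probability of an edge can be approximated by the fraction of samples that contain it. A minimal Python sketch, assuming each sample is a set of (parent, child) edges; the four samples are invented, and the 0.5 cut-off is an arbitrary illustration. In an order-MCMC setting the graphs would first be drawn given each sampled order; this sketch shows only the final averaging step.

```python
from collections import Counter


def edge_posterior(sampled_graphs):
    """Fraction of sampled graphs that contain each directed edge.
    Each graph is an iterable of (parent, child) pairs."""
    counts = Counter(edge for graph in sampled_graphs for edge in set(graph))
    n = len(sampled_graphs)
    return {edge: c / n for edge, c in counts.items()}


# Hypothetical samples taken after burn-in.
samples = [{("A", "B"), ("A", "C")}, {("A", "B")}, {("A", "B"), ("C", "B")}, {("A", "C")}]
posterior = edge_posterior(samples)
print(posterior[("A", "B")], posterior[("A", "C")])  # 0.75 0.5
selected = sorted(e for e, p in posterior.items() if p >= 0.5)
```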
Synthetic data
The 41st to 50th genes are not connected.
Synthetic data
- MCMCMC with Importance Sampling has the best performance.
- Order MCMC is second best.
- Order MCMC is much faster than MCMCMC with Importance Sampling.
Synthetic data
I changed one parameter at a time in the MCMC simulations:
1) Standard application (using the standard parameters)
2) Change the noise value (decrease it to 0.1)
3) Change the training data size (decrease it to 50)
4) Change the number of iterations (increase it to 50000)
Standard parameters (MCMC in Bayes Net Toolbox): training data size 200, noise value 0.3, number of iterations 5000 (5000 samples and 5000 burn-ins)
Synthetic data
Synthetic data
Convergence
1) MCMC in BNT
2) MCMCMC with Importance Sampling (IM)
3) MCMCMC with Importance Sampling (ID)
4) Order MCMC
(Panels 1 and 2 in the top row, 3 and 4 in the bottom row.)
Training set size: 200, noise: 0.3, 5000 iterations.
Synthetic data MCMCMC
(Number of burn-in iterations + number of samples) Left: 5000 + 5000; right: 100000 + 100000
Acceptance ratios: left, MCMC in BNT; middle, MCMCMC with Importance Sampling; right, Order MCMC
Diffuse large B cell lymphoma Data
Data discretisation: I used the K-means algorithm to discretise the expression levels of each gene separately into up, down and normal, since the stationary level of each gene can differ from those of the others.
Problem of this discretisation: if there is too much noise, the noise can cause fluctuations between levels. As a result, this method does not work well for gene 3.
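A per-gene discretisation of this kind can be sketched with one-dimensional K-means (k = 3), run separately on each gene's profile so that every gene gets its own down/normal/up boundaries. The Python below is a plain Lloyd's-algorithm sketch with an assumed initialisation (centres at the minimum, a middle value and the maximum of the sorted profile); the profile is invented.

```python
def assign(values, centres):
    """Index of the nearest centre for every value."""
    return [min(range(len(centres)), key=lambda c: abs(v - centres[c])) for v in values]


def kmeans_1d(values, k=3, n_iter=100):
    """Lloyd's algorithm in one dimension; returns labels 0..k-1 ordered from
    the lowest to the highest centre."""
    ordered = sorted(values)
    centres = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(n_iter):
        labels = assign(values, centres)
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centres[c])
        if new == centres:
            break
        centres = new
    labels = assign(values, centres)
    rank = {c: r for r, c in enumerate(sorted(range(k), key=lambda c: centres[c]))}
    return [rank[lab] for lab in labels]


def discretise_gene(profile):
    """Map one gene's expression profile to 'down' / 'normal' / 'up'."""
    names = ["down", "normal", "up"]
    return [names[c] for c in kmeans_1d(profile, k=3)]


# Hypothetical log-ratio profile of one gene over eight arrays.
profile = [0.1, 0.2, 0.15, 2.1, 2.3, -1.9, -2.0, 0.05]
print(discretise_gene(profile))
```

Applied gene by gene, for example `{g: discretise_gene(p) for g, p in expression.items()}` with a hypothetical `expression` dictionary, this gives each gene its own thresholds; with a noisy profile, values near a boundary can flip between levels, which is the problem noted above.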
Diffuse large B cell lymphoma Data
Comparison of convergence
Panels: MCMC in BNT; MCMCMC with Importance Sampling (ID); Order MCMC
Number of genes: 27, training data size: 105, iterations: 20000
Diffuse large B cell lymphoma Data
Comparison of Acceptance Ratios
Number of genes: 27, training data size: 105, iterations: 20000
Gene expression in susceptible Arabidopsis thaliana plants inoculated with viruses
Viruses:
1) Cucumber mosaic cucumovirus
2) Oil seed rape tobamovirus
3) Turnip vein clearing tobamovirus
4) Potato virus X potexvirus
5) Turnip mosaic potyvirus
(Figure: expression of gene a over time after inoculation, at 1, 2, 3, 4, 5 and 7 DAI, marking when symptoms occur. DAI = days after inoculation.)
Training data: 127 genes with 20 samples each (4 DAIs × 5 viruses)
Gene expression in susceptible Arabidopsis thaliana plants inoculated with viruses: only 20 genes (1 DAI and 2 DAI)
1000 samples from my method; 10000 samples from MCMCMC with Importance Sampling (ID)
Gene expression in susceptible Arabidopsis thaliana plants inoculated with viruses: 127 genes
Genes with higher connectivity
Average global connectivity = 1.5847
Gene expression in susceptible Arabidopsis thaliana plants inoculated with viruses: 127 genes
p-value check for the transcription function
- f is the number of genes with the j-th function among the 127 genes.
- m is the number of genes with the j-th function among the 14 genes.
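The p-value formula itself is not recoverable from the slide. One common choice for this kind of enrichment check, assumed here rather than taken from the slide, is the upper tail of the hypergeometric distribution: the probability that 14 genes drawn at random from the 127 contain at least m of the f genes with function j. A minimal Python sketch with hypothetical counts:

```python
from math import comb


def enrichment_p_value(total_genes, f, selected, m):
    """P(X >= m) for X ~ Hypergeometric(total_genes, f, selected): the chance that
    `selected` random genes include at least m of the f genes with function j."""
    upper = min(f, selected)
    hits = sum(comb(f, x) * comb(total_genes - f, selected - x)
               for x in range(m, upper + 1))
    return hits / comb(total_genes, selected)


# Hypothetical counts, for illustration only: f = 10 of the 127 genes have
# function j, and m = 4 of them are among the 14 genes.
p = enrichment_p_value(127, f=10, selected=14, m=4)
print(f"p = {p:.3g}")
```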
Gene expression in susceptible Arabidopsis thaliana plants inoculated with viruses: 127 genes from my method (100 samples)
Conclusion
We need to select a methodology depending on the characteristics of the training data.
To obtain the results closest to the real networks, MCMCMC with Importance Sampling and Order MCMC are suitable.
MCMCMC with Importance Sampling has the best performance, but it is slower than the other MCMCs. Order MCMC has the second-best performance, but it is four times faster than MCMCMC with Importance Sampling.
If we want to process large-scale data and do not have enough time to run MCMCs, the Relevance Network and my method are appropriate.
Also, the methods generate different networks, so combining them can give better results.
Conclusion
Biological meaning: transcription genes have higher connectivity than other genes (from my method). That is, genes with a transcription function may act as hubs in the network that responds to viruses in Arabidopsis thaliana plants.