simulation and application on learning gene causal relationships xin zhang
Post on 21-Dec-2015
Simulation and Application on learning gene causal relationships
Xin Zhang
Introduction
• High-throughput genetic technologies make it possible to study how genes interact with each other;
• We use simulation to evaluate how well the IC algorithm learns gene causal relationships;
• We present an algorithm (the mIC algorithm) for learning causal relationships using knowledge of the topological ordering;
• We apply the mIC algorithm to a melanoma dataset;
Steps for Simulation Study
• Construct a causal network N;
• Generate datasets based on the causal network;
• Learn from the simulated data using causal algorithms (e.g. the IC algorithm) to obtain a network N´;
• Compare the original network N with the obtained network N´ with respect to precision and recall;
Modeling and simulation of a causal Boolean network (BN)
• Boolean network: node C is determined by its causal parents A and B through a Boolean function f, C = f(A, B);
• Construct a causal structure;
• Assign parameters (proper functions) to each node with causal parents;
• Assign a probability distribution;
Constructing Boolean Network
1. Generate M BNs with up to 3 causal parents for each node;
2. For each BN, generate a random proper function for each node;
3. Assign random probabilities for the root gene(s);
4. Given one configuration, get probability distribution;
5. Collect 200 data points for each network;
6. Repeat above steps 3-5 for all M networks.
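The construction steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: node indices double as a topological order, each non-root node gets a random truth table that is resampled until it depends on all of its parents, and the names `random_network` and `sample` are hypothetical.

```python
import random

def is_proper(table, n):
    """True if the truth table (2**n bits) depends on every one of its n inputs."""
    for i in range(n):
        if all(table[r] == table[r ^ (1 << i)] for r in range(2 ** n)):
            return False  # flipping input i never changes the output
    return True

def random_network(num_genes, max_parents=3, rng=random):
    """Build a random causal Boolean network (assumed construction)."""
    parents, tables, root_prob = {}, {}, {}
    for g in range(num_genes):
        # Parents are drawn from lower-indexed genes, so the graph is a DAG.
        k = rng.randint(0, min(max_parents, g))
        parents[g] = sorted(rng.sample(range(g), k))
        if k == 0:
            root_prob[g] = rng.random()  # P(root gene = 1)
        else:
            while True:  # resample until the function is proper
                table = tuple(rng.randint(0, 1) for _ in range(2 ** k))
                if is_proper(table, k):
                    break
            tables[g] = table
    return parents, tables, root_prob

def sample(parents, tables, root_prob, n_points=200, rng=random):
    """Draw data points: roots are Bernoulli, other genes follow their function."""
    data = []
    for _ in range(n_points):
        state = {}
        for g in sorted(parents):  # index order is a topological order
            if not parents[g]:
                state[g] = int(rng.random() < root_prob[g])
            else:
                idx = sum(state[p] << i for i, p in enumerate(parents[g]))
                state[g] = tables[g][idx]
        data.append([state[g] for g in sorted(parents)])
    return data
```

Calling `sample(*random_network(6))` yields 200 rows of six binary values; repeating this for M networks mirrors step 6.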
Constructing Causal Structure
[Figure: example causal structure over nodes A, B, C, D and E]
Steps for constructing causal structure
Proper function (1)
Proper function: The function that reflects the influence of the operators.
Example:
After simplifying f, c is a function of a alone, with c = a.
b is a pseudo predictor of c and has no effect on c.
Therefore f is not a proper function.
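A pseudo predictor can be detected mechanically: an input is a true predictor only if flipping it changes the output for at least one assignment of the other inputs. The sketch below assumes that reading of "proper"; the function `f` is a hypothetical stand-in chosen so that it simplifies to c = a, not the formula from the slide.

```python
from itertools import product

def depends_on(f, n, i):
    """True if flipping input i changes f for at least one assignment."""
    for bits in product((0, 1), repeat=n):
        flipped = list(bits)
        flipped[i] ^= 1
        if f(*bits) != f(*flipped):
            return True
    return False

def is_proper(f, n):
    """A function is proper here if it depends on every one of its n inputs."""
    return all(depends_on(f, n, i) for i in range(n))

# Hypothetical example: (a AND b) OR (a AND NOT b) simplifies to a.
f = lambda a, b: (a & b) | (a & (1 - b))
# For contrast, a AND b depends on both inputs.
g = lambda a, b: a & b

print(is_proper(f, 2), is_proper(g, 2))
```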
Proper function (2)
• Definition:
With n predictors, the number of proper functions is given by:
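The formula itself did not survive in this transcript. As a hedged illustration, assuming a proper function is a Boolean function that depends on all n of its predictors, the count can be obtained by brute force and checked against the standard inclusion-exclusion sum over k of (-1)^(n-k) C(n,k) 2^(2^k). This may not be the form shown on the slide.

```python
from itertools import product
from math import comb

def count_proper_brute_force(n):
    """Enumerate all truth tables on n inputs and count those using every input."""
    rows = 2 ** n
    count = 0
    for table in product((0, 1), repeat=rows):
        uses_all = all(
            any(table[r] != table[r ^ (1 << i)] for r in range(rows))
            for i in range(n)
        )
        if uses_all:
            count += 1
    return count

def count_proper_formula(n):
    """Inclusion-exclusion over the subsets of inputs the function may depend on."""
    return sum((-1) ** (n - k) * comb(n, k) * 2 ** (2 ** k) for k in range(n + 1))

for n in (1, 2, 3):
    print(n, count_proper_brute_force(n), count_proper_formula(n))
```

Under this assumption there are 2, 10 and 218 such functions for n = 1, 2 and 3 predictors.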
Probability Distribution
Generating dataset
Steps of learning gene causal relationships
• Step 1: obtain the probability distribution and sample data;
• Step 2: apply algorithms to find causal relations;
• Step 3: compare the original and obtained networks based on the two notions of precision and recall;
• Step 4: repeat steps 1-3 for every random network;
Comparing two networks
[Figure: the original network and the obtained network, each over nodes A, B, C and D]
Precision and Recall
• Original graph is a DAG, while obtained graph has both directed and undirected edges;
[Figure: edges of the original graph and the obtained graph classified as TP, FN, FP and TN, with the additional categories PTP, PFN, PFP and PTN]
Recall = ATP/(AFN+ATP), Precision = ATP/(ATP + AFP)
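The transcript does not define ATP, AFP and AFN or how the partial categories (PTP, PFN, PFP, PTN) contribute to them, so the sketch below is a deliberate simplification: it compares directed edges only and uses plain true-positive, false-positive and false-negative counts. The edge lists are hypothetical.

```python
def precision_recall(original_edges, obtained_edges):
    """Directed-edge precision and recall (simplified: no partial credit)."""
    original = set(original_edges)
    obtained = set(obtained_edges)
    tp = len(original & obtained)   # edges recovered with the right direction
    fp = len(obtained - original)   # edges reported but not in the original
    fn = len(original - obtained)   # original edges that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical networks: one edge (B, C) is recovered with reversed direction.
original = [("A", "C"), ("B", "C"), ("C", "D")]
obtained = [("A", "C"), ("C", "B"), ("C", "D")]
print(precision_recall(original, obtained))
```

A scheme that also credits undirected edges would need the weighting rules from the original slides.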
Observational equivalence and Transitive Closure
• Two DAGs are said to be observationally equivalent (OE) if they have the same skeleton and the same set of v-structures;
[Figure: two observationally equivalent (OE) DAGs over nodes A, B, C and D]
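The OE criterion can be checked directly from two edge lists. This is a minimal sketch assuming each DAG is given as a list of (parent, child) pairs; the helper names are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All triples (a, b, c) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, b, child))
    return found

def observationally_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = [("A", "B"), ("B", "C")]
reverse_chain = [("C", "B"), ("B", "A")]
collider = [("A", "B"), ("C", "B")]
print(observationally_equivalent(chain, reverse_chain))
print(observationally_equivalent(chain, collider))
```

The two chains share a skeleton and have no v-structures, so they are equivalent; the collider has the same skeleton but introduces the v-structure A -> B <- C.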
Transitive closure (TC): A ->B -> C with A -> C
cc(x,y): true if there is a directed or an undirected edge from x to y;
pcc(x,y): true if there is a path from x to y consisting of properly directed and undirected edges;
pcc(x,y) := cc(x,y) ∨ ∃z [pcc(x,z) ∧ pcc(z,y)]
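The recursive definition of pcc is a transitive closure over cc, which can be computed with a Warshall-style pass. The sketch assumes the partially directed graph is given as a list of directed pairs plus a list of undirected pairs; `pcc_closure` is a hypothetical name.

```python
def pcc_closure(nodes, directed, undirected):
    """Return the set of pairs (x, y) for which pcc(x, y) holds."""
    cc = set(directed)
    for x, y in undirected:
        cc.add((x, y))  # an undirected edge can be traversed both ways
        cc.add((y, x))
    pcc = set(cc)
    # Warshall: allow z as an intermediate node, one node at a time.
    for z in nodes:
        for x in nodes:
            if (x, z) in pcc:
                for y in nodes:
                    if (z, y) in pcc:
                        pcc.add((x, y))
    return pcc

# Hypothetical graph: A -> B and an undirected edge B - C.
closure = pcc_closure(["A", "B", "C"], [("A", "B")], [("B", "C")])
print(("A", "C") in closure, ("C", "A") in closure)
```

Here A reaches C through B, but C cannot reach A because the edge A -> B would have to be traversed against its direction.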
Result for IC algorithm
How to improve IC algorithm
• The original IC algorithm did not perform well at learning gene causal relationships;
• A possible way to improve the performance is to incorporate extra information;
• Knowing the topological ordering of the regulatory network would help improve the learning result;
Gene topological ordering
• If a specific gene is the causal parent of another gene;
• In a pathway, if one gene appears before another gene;
• If one gene is at the beginning or at the end of the pathway;
IC algorithm + topological ordering information
mIC algorithm
• The mIC algorithm is based on IC, but combines topological ordering information with steady-state data to infer causality;
• 3 steps of the mIC algorithm:
– Find conditional independence:
For each pair of genes gi and gj in a dataset, test pairwise conditional independence. If they are dependent, search for a set
Sij = {gk | gi and gj are independent given gk, with i<k<j or j<k<i}.
Construct an undirected graph G such that gi and gj are connected with an edge if and only if they are pairwise dependent and no Sij can be found;
– Find v-structures:
For each pair of nonadjacent genes gi and gj with common neighbor gk, if gk ∉ Sij, and k>i, k>j, add arrowheads pointing at gk, i.e. gi -> gk <- gj;
– Orient more edges according to rules:
Orient the remaining undirected edges without creating new cycles or v-structures;
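The v-structure step can be illustrated as follows. This is a sketch under assumptions, not the authors' implementation: genes are identified by their position in the topological ordering, the undirected graph G from the first step is given as a symmetric adjacency dictionary, and the separating sets Sij are given as a dictionary keyed by unordered pairs. The name `orient_v_structures` is hypothetical.

```python
from itertools import combinations

def orient_v_structures(adjacency, sepset):
    """Return directed edges (parent, child) found by the v-structure rule.

    adjacency: dict mapping gene index -> set of neighbouring gene indices.
    sepset: dict mapping frozenset({i, j}) -> set of genes separating gi and gj.
    """
    directed = set()
    for k, neighbours in adjacency.items():
        for i, j in combinations(sorted(neighbours), 2):
            if j in adjacency.get(i, set()):
                continue  # gi and gj are adjacent, so no v-structure
            if k in sepset.get(frozenset((i, j)), set()):
                continue  # gk separates gi and gj
            if k > i and k > j:  # ordering constraint: gk comes after both
                directed.add((i, k))
                directed.add((j, k))
    return directed

# Hypothetical graph: genes 1 and 2 are both linked to gene 3, not to each other.
graph = {1: {3}, 2: {3}, 3: {1, 2}}
print(orient_v_structures(graph, {frozenset((1, 2)): set()}))
```

The ordering test `k > i and k > j` is what distinguishes this step from the plain IC rule: a common neighbour that precedes either gene in the ordering is never turned into a collider.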
Results from mIC algorithm
Melanoma dataset
• The 10 genes involved in this study were chosen from 587 genes in the melanoma data;
• Previous studies show that WNT5A has been identified as a gene of interest involved in melanoma;
• Controlling the influence of WNT5A in the regulation can reduce the chance of melanoma metastasizing;
Applying mIC algorithm on Melanoma Dataset
[Figure: gene network including WNT5A]
• Partial biological prior knowledge: MMP3 is expected to be at the end of the pathway;
• Pirin causatively influences WNT5A: to maintain the level of WNT5A, we need to control WNT5A directly or through pirin;
• WNT5A directly causes MART-1;
Conclusion
• Evaluated the IC algorithm using simulated data;
• Presented the mIC algorithm, which can infer gene causal relationships from steady-state data with gene topological ordering information;
• Performed simulation based on Boolean networks to evaluate the performance of the causal algorithms;
• Applied the mIC algorithm to real biological microarray data, the melanoma dataset;
• The results showed that some of the important causal relationships associated with the WNT5A gene were identified using the mIC algorithm.