module networks sushmita roy bmi/cs 576 [email protected] nov 18 th & 20th, 2014
TRANSCRIPT
![Page 2: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/2.jpg)
RECAP
• Probabilistic graphical models (PGMs) provide a natural representation of molecular networks
• Learning PGMs from data requires us to solve two problems – Parameter estimation given a graph structure– Structure learning which requires us to learn both structure and parameters
• Bayesian networks are a type of PGMs popular for representing molecular networks
• Learning Bayesian networks from gene expression data requires additional considerations– Sparse candidate algorithm: heuristics to reduce the number of candidate
parents– Bootstrap to assess model confidence– Module networks
![Page 3: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/3.jpg)
What you should know
• Motivation of module networks• What is a module network?• Compare and contrast module network
learning algorithm to other network inference algorithms– E.g. Sparse candidate
![Page 4: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/4.jpg)
Module Networks
• Motivation: – Most complex systems have too many variables– Not enough data to robustly learn networks– Large networks are hard to interpret
• Key idea: Group similarly behaving variables into “modules” and learn the same parents and parameters for each module
• Relevance to gene regulatory networks– Genes that are co-expressed are likely regulated in
similar ways
Segal et al 2005
![Page 5: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/5.jpg)
Definition of a module
• Statistical definition (specific to module networks by Segal 2005)– A set of random variables that share a statistical
model• Biological definition of a module– Set of genes that are co-expressed and co-
regulated
![Page 6: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/6.jpg)
An expression module
Set of genes that behave similarly across conditions
Modules
Gasch & Eisen, 2002
Genes
Genes
Genes
![Page 7: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/7.jpg)
Key questions of Module Networks
• How to represent the Conditional Probability Distributions (CPD) for children?– Regression Tree
• How to learn module networks?
![Page 8: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/8.jpg)
Defining a Module Network
• A probabilistic graphical model over N random variables
• Set of module variables M1.. MK
• Module assignments A that specifies the module (1-to-K) for each Xi
• CPD per module P(Mj|PaMj), PaMj are parents of module Mj
– Each variable Xi in Mj has the same conditional distribution
![Page 9: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/9.jpg)
Bayesian network vs Module network
Each variable takes three values: UP, DOWN, SAME
![Page 10: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/10.jpg)
Bayesian network vs Module network
• Bayesian network– Different CPD per random variable– Learning only requires to search for parents
• Module network– CPD per module• Same CPD for all random variables in the same module
– Learning requires parent search and module membership assignment
![Page 11: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/11.jpg)
Learning a Module Network
• Given training dataset , fixed number of modules (K)
• Learn– Module assignments A of each variable to a
module – The parents of each module
![Page 12: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/12.jpg)
Score of a Module networkModule network
Data
K: number of modules, Xj : jth module PaMj Parents of module Mj
Likelihood of module j
![Page 13: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/13.jpg)
Module network learning algorithm
![Page 14: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/14.jpg)
Module assignment search
• Happens in two places• Module initialization– Interpret as clustering of the random variables
• Module re-assignment
![Page 15: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/15.jpg)
Module initialization as clustering of variables
for module network
![Page 16: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/16.jpg)
Module re-assignment
• Must preserve the acyclic graph structure• Must improve score• Module re-assignment happens using a
sequential update procedure:– Update only one variable at a time– The change in score of moving a variable from one
module to another while keeping the other variables fixed
![Page 17: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/17.jpg)
Module re-assignment via sequential update
![Page 18: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/18.jpg)
Representing the Conditional probability distribution
• Xi are continuous variables
• How to represent the distribution of Xi given the state of its parents?
• How to capture context-specific dependencies?
• Module networks use a regression tree
![Page 19: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/19.jpg)
Modeling the relationship between regulators and targets
• suppose we have a set of (8) genes that all have in their upstream regions the same activator/repressor binding sites
![Page 20: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/20.jpg)
A regression tree
• A rooted binary tree T• Each node in the tree is either an interior node
or a leaf node• Interior nodes are labeled with a binary test
Xi<u, u is a real number observed in the data• Leaf nodes are associated with univariate
distributions of the child
![Page 21: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/21.jpg)
A regression tree to capture a CPD
X1 > e1
X2 > e2
YES
NO
NO YESLeaf
Interior node
X3
X1 X2
Expression of gene represented by X3 modeled using Gaussians at each leaf node
e1, e2 are values seen in the data
![Page 22: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/22.jpg)
An example regression tree for a Module network
![Page 23: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/23.jpg)
A very simple regression tree
X2
X3
e1 e2
X2 > e1
X2 > e2
YESNO
NO YESX3
![Page 24: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/24.jpg)
Algorithm for growing a regression tree
• Input: dataset D, child variable Xi, candidate parents Ci of Xi
• Output: Tree T• Initialize T to a leaf node, estimated from all
samples of Xi
• While not converged– For every leaf node l in T
• Find with the best split at l• If split improves score
– add two leaf nodes, i and j below l– Update samples and parameters associated with , i and j
![Page 25: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/25.jpg)
Learning a regression tree• Assume we are searching for the parents of a variable X3 and it
already has two parents X1 and X2
• X4 will be considered using “split” operations of existing leaf nodes
X1 > e1
X2 > e2
YES
NO
NO YES
X4 > e3
NO YES
X1 > e1
X2 > e2
YES
NO
NO YES
N1 N2N3
Nl: Gaussian associated with leaf l
N2N3
N4N5
![Page 26: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/26.jpg)
Convergence in regression tree
• Depth of tree• Improvement in score• Maximum number of parents• Minimum number of samples per leaf node
![Page 27: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/27.jpg)
Assessing the value of using Module Networks
• Using simulated data– Generate data from a known module network – Known module network was in turn learned from real data
• 10 modules, 500 variables
– Evaluate using• Test data likelihood• Recovery of true parent-child relationships are recovered in learned
module network
• Using gene expression data– External validation of modules (Gene ontology, motif
enrichment)– Cross-check with literature
![Page 28: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/28.jpg)
Test data likelihood
Each line type represents size of training data
10 Modules is the best for almost all training data set sizes
![Page 29: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/29.jpg)
Recovery of graph structure
![Page 30: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/30.jpg)
Application of Module networks to yeast expression data
Segal et al, Regev, Pe’er, Gasch 2005
![Page 31: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/31.jpg)
Module networks has better performance than simple Bayesian network
Gain in test data likelihood over Bayesian network using expression data
![Page 32: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/32.jpg)
The Respiration and Carbon Module
Regulation tree
![Page 33: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/33.jpg)
Global View of Modules
• modules for common processes often share common– regulators– binding site motifs
![Page 34: Module networks Sushmita Roy BMI/CS 576 sroy@biostat.wisc.edu Nov 18 th & 20th, 2014](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1f5503460f94c37264/html5/thumbnails/34.jpg)
Summary
• Module networks– A type of Bayesian network– Identifies modules (sets of similarly behaving random
variables) and learns parents for each module– Conditional probability distributions capture “rules” of
regulatory relationships– Learning requires inferring parent->module
relationships and module assignments• In practice give more realistic networks compared
to Bayesian networks