Post on 01-Jan-2016
Probabilistic modelling in computational biology
Dirk Husmeier
Biomathematics & Statistics Scotland
James Watson & Francis Crick, 1953
Frederick Sanger, 1980
Network reconstruction from postgenomic data
Model parameters θ
Friedman et al. (2000), J. Comp. Biol. 7, 601-620
Marriage between
graph theory
and
probability theory
Bayes net
ODE model
Model parameters θ
Probability theory: likelihood
Bayesian networks: integral analytically tractable!
UAI 1994
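The integral referred to above is presumably the marginal likelihood of a network structure, obtained by integrating the likelihood over the model parameters θ. A sketch in standard notation (the symbols M for the network structure and D for the data are assumptions, not taken from the slides):

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta,
\qquad
P(M \mid D) \propto P(D \mid M)\, P(M)
```

When this integral has a closed form, structures can be scored and compared without estimating θ explicitly.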
Identify the best network structure
Ideal scenario: Large data sets, low noise
Uncertainty about the best network structure
Limited number of experimental replications, high noise
Sample of high-scoring networks
Feature extraction, e.g. marginal posterior probabilities of the edges
High-confidence edge
High-confidence non-edge
Uncertainty about edges
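A minimal sketch of the feature-extraction step, assuming the sampled networks are available as 0/1 adjacency matrices (the function name and the toy sample are hypothetical): the marginal posterior probability of an edge is estimated as the fraction of sampled networks that contain it.

```python
def edge_posteriors(sampled_networks):
    """Estimate marginal edge posteriors as edge frequencies in the sample.

    sampled_networks: list of square 0/1 adjacency matrices (lists of lists),
    assumed to be draws from the posterior over structures, e.g. from MCMC.
    """
    n_samples = len(sampled_networks)
    n_nodes = len(sampled_networks[0])
    return [
        [sum(net[i][j] for net in sampled_networks) / n_samples
         for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


# Toy sample of four networks over three nodes (illustrative data only).
sample = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
]
posteriors = edge_posteriors(sample)
```

In this toy sample the edge 0→1 appears in every network (high confidence), the edge 0→2 in none (high-confidence non-edge), and the edge 1→2 in half of them (uncertain).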
Number of structures as a function of the number of nodes
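To give a feel for how quickly the number of structures grows with the number of nodes, here is a small sketch that counts labelled directed acyclic graphs with Robinson's recurrence. It assumes the structures in question are DAGs; the function name is illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# count_dags(1), ..., count_dags(4) give 1, 3, 25, 543
counts = [count_dags(n) for n in range(1, 11)]
```

The super-exponential growth is why exhaustive enumeration is infeasible beyond a handful of nodes and sampling is used instead.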
Sampling with MCMC
Madigan & York (1995), Giudici & Castelo (2003)
Overview
• Introduction
• Limitations
• Methodology
• Application to morphogenesis
• Application to synthetic biology
Homogeneity assumption
Interactions don’t change with time
Limitations of the homogeneity assumption
Example: 4 genes, 10 time points
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10
X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10
X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10
X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10
Supervised learning. Here: 2 components
Changepoint model
Parameters can change with time
Unsupervised learning. Here: 3 components
Extension of the model
θ: model parameters
k: number of components (here: 3)
h: allocation vector
Analytically integrate out the parameters θ
P(network structure | changepoints, data)
P(changepoints | network structure, data)
Birth, death, and relocation moves
RJMCMC within Gibbs
Dynamic programming, complexity N²
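The dynamic programming scheme used in the talk is not spelled out in the transcript. As a stand-in that shows where a quadratic cost in the number of time points N comes from, here is a minimal sketch of a different, simpler technique: optimal partitioning of a single time series under a squared-error cost with a per-segment penalty. The function names, the cost, and the penalty are illustrative assumptions, and the sketch returns one best segmentation rather than posterior samples.

```python
def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean of values[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def optimal_changepoints(values, penalty):
    """Best segmentation by dynamic programming over all N(N+1)/2 segments."""
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    best = [0.0] + [float("inf")] * n   # best[t]: optimal cost of values[:t]
    last = [0] * (n + 1)                # last[t]: start of the final segment
    for end in range(1, n + 1):         # outer loop over end points
        for start in range(end):        # inner loop over start points: O(N^2)
            cost = best[start] + segment_cost(prefix, prefix_sq, start, end) + penalty
            if cost < best[end]:
                best[end], last[end] = cost, start

    # Trace back the segment starts; position 0 is not a changepoint.
    changepoints, end = [], n
    while end > 0:
        start = last[end]
        if start > 0:
            changepoints.append(start)
        end = start
    return sorted(changepoints)


found = optimal_changepoints([0, 0, 0, 0, 5, 5, 5, 5], penalty=1.0)
```

The two nested loops over segment end and start points are what make the cost quadratic in N; prefix sums keep each segment cost at constant time.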
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Transcriptional profiles at 4 × 13 time points in 2 h intervals under constant light for 4 experimental conditions
Circadian rhythms in Arabidopsis thaliana
Comparison with the literature
Precision: proportion of identified interactions that are correct
Recall = sensitivity: proportion of true interactions that we successfully recovered
Specificity: proportion of non-interactions that are successfully avoided
CCA1
LHY
PRR9
GI
ELF3
TOC1
ELF4
PRR5
PRR3
False negative
Which interactions from the literature are found?
True positive
Blue: activations
Red: inhibitions
True positives (TP) = 8
False negatives (FN) = 5
Recall = 8/13 = 62%
Which proportion of predicted interactions are confirmed by the literature?
False positives
Blue: activations
Red: inhibitions
True positive
True positives (TP) = 8
False positives (FP) = 13
Precision = 8/21 = 38%
Precision = 38%
Recall = 62%
True positives (TP) = 8
False positives (FP) = 13
False negatives (FN) = 5
True negatives (TN) = 9²-8-13-5= 55
Sensitivity (recall) = TP/[TP+FN] = 62%
Specificity (proportion of avoided non-interactions) = TN/[TN+FP] = 81%
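The arithmetic above can be reproduced with a few lines of Python. The counts are those from the slides; the function name is illustrative, and the number of possible interactions is taken as 9² as in the true-negative calculation above.

```python
def confusion_metrics(tp, fp, fn, n_nodes):
    """Precision, recall (sensitivity) and specificity from edge counts."""
    tn = n_nodes ** 2 - tp - fp - fn   # all remaining node pairs are true negatives
    return {
        "tn": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


metrics = confusion_metrics(tp=8, fp=13, fn=5, n_nodes=9)
for name in ("precision", "recall", "specificity"):
    print(f"{name}: {metrics[name]:.0%}")
```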
Model extension
So far: non-stationarity in the regulatory process
Non-stationarity in the network structure
Flexible network structure
Model parameters θ
Use prior knowledge!
Flexible network structure with regularization
Hyperparameter
Normalization factor
Flexible network structure with regularization
Exponential prior versus binomial prior with conjugate beta hyperprior
NIPS 2010
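The formula for the regularizing prior did not survive in the transcript. One common form of an exponential prior of this kind, given here as a sketch with assumed notation (M_i for the network structure in segment i, β for the hyperparameter, Z(β) for the normalization factor; none of these symbols appear in the transcript), penalizes the number of edges that differ between adjacent segments:

```latex
P(M_i \mid M_{i-1}, \beta)
  = \frac{\exp\!\big(-\beta\, |M_i - M_{i-1}|\big)}{Z(\beta)},
\qquad
Z(\beta) = \sum_{M'} \exp\!\big(-\beta\, |M' - M_{i-1}|\big)
```

Here |M_i − M_{i−1}| denotes the number of edges that differ between the two structures, so larger β couples neighbouring segments more strongly.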
Morphogenesis in Drosophila melanogaster
• Gene expression measurements at 66 time points during the life cycle of Drosophila (Arbeitman et al., Science, 2002).
• Selection of 11 genes involved in muscle development.
Zhao et al. (2006), Bioinformatics 22
Can we learn the morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult?
Average posterior probabilities of transitions
Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult
Can we learn changes in the regulatory network structure?
Can we learn the switch Galactose → Glucose?
Can we learn the network structure?
Task 1: Changepoint detection
Switch of the carbon source: Galactose → Glucose
Task 2: Network reconstruction
Precision: proportion of identified interactions that are correct
Recall: proportion of true interactions that we successfully recovered
BANJO: Conventional homogeneous DBN
TSNI: Method based on differential equations
Inference: optimization, “best” network
Sample of high-scoring networks
Marginal posterior probabilities of the edges
Precision and recall: edges with marginal posterior probabilities P=1, P=0.5 and P=0 compared with the true network at decreasing thresholds
Threshold 0.9 0.4 -0.01
Precision 1 2/3 1/2
Recall 1/2 1 1
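The threshold example can be reproduced in code. The figure itself is lost, so the edge list below is an assumption chosen to be consistent with the tabulated values: four candidate edges with posterior probabilities 1, 0.5, 0.5 and 0, of which the first two are in the true network. Function and variable names are illustrative.

```python
from fractions import Fraction


def precision_recall(posteriors, truth, threshold):
    """Precision and recall when edges with posterior > threshold are predicted."""
    predicted = [p > threshold for p in posteriors]
    tp = sum(1 for pred, is_true in zip(predicted, truth) if pred and is_true)
    n_predicted = sum(predicted)
    n_true = sum(truth)
    precision = Fraction(tp, n_predicted) if n_predicted else None
    recall = Fraction(tp, n_true)
    return precision, recall


# Assumed toy example (not recovered from the slides).
posteriors = [1.0, 0.5, 0.5, 0.0]
truth = [True, True, False, False]

curve = {t: precision_recall(posteriors, truth, t) for t in (0.9, 0.4, -0.01)}
for threshold, (precision, recall) in curve.items():
    print(threshold, precision, recall)
```

Lowering the threshold trades precision for recall; sweeping it over all values traces out the precision-recall curve used to compare methods.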
Future work
How are we getting from here …
… to there ?!
Input:Learn:MCMC
Prior knowledge