probabilistic modelling in computational biology

Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland

Post on 01-Jan-2016

TRANSCRIPT

Page 1: Probabilistic modelling in computational biology

Probabilistic modelling in computational biology

Dirk Husmeier

Biomathematics & Statistics Scotland

Page 2: Probabilistic modelling in computational biology

James Watson & Francis Crick, 1953

Page 3: Probabilistic modelling in computational biology

Frederick Sanger, 1980

Page 4: Probabilistic modelling in computational biology
Page 5: Probabilistic modelling in computational biology
Page 6: Probabilistic modelling in computational biology
Page 7: Probabilistic modelling in computational biology

Network reconstruction from postgenomic data

Page 8: Probabilistic modelling in computational biology

Model parameters θ

Page 9: Probabilistic modelling in computational biology

Friedman et al. (2000), J. Comp. Biol. 7, 601-620

Marriage between

graph theory

and

probability theory

Page 10: Probabilistic modelling in computational biology

Bayes net

ODE model

Page 11: Probabilistic modelling in computational biology

Model parameters θ

Probability theory Likelihood

Page 12: Probabilistic modelling in computational biology

Model parameters θ

Bayesian networks: integral analytically tractable!
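
The integral referred to here is presumably the marginal likelihood, in which the model parameters are integrated out so that a network structure can be scored directly against the data. A sketch in generic notation (the symbols M for the network structure, D for the data and \theta for the parameters are assumptions for illustration, not taken from the slide):

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta,
\qquad
P(M \mid D) \propto P(D \mid M)\, P(M)
```

When the integral has a closed form, structures can be compared without estimating the parameters of each candidate network separately.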

Page 13: Probabilistic modelling in computational biology

UAI 1994

Page 14: Probabilistic modelling in computational biology

Identify the best network structure

Ideal scenario: Large data sets, low noise

Page 15: Probabilistic modelling in computational biology

Uncertainty about the best network structure

Limited number of experimental replications, high noise

Page 16: Probabilistic modelling in computational biology

Sample of high-scoring networks

Page 17: Probabilistic modelling in computational biology

Sample of high-scoring networks

Feature extraction, e.g. marginal posterior probabilities of the edges

High-confidence edge

High-confidence non-edge

Uncertainty about edges
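
A minimal sketch of the feature-extraction step, assuming each sampled network is stored as a binary adjacency matrix: the marginal posterior probability of an edge is estimated as the fraction of sampled networks that contain it. The function name `edge_posteriors` and the three-network sample are hypothetical and only illustrate the idea.

```python
def edge_posteriors(sampled_networks):
    """Estimate marginal posterior edge probabilities from sampled networks.

    sampled_networks: list of n-by-n binary adjacency matrices (nested lists),
    where entry [i][j] == 1 means the sample contains the edge i -> j.
    """
    n_samples = len(sampled_networks)
    n_nodes = len(sampled_networks[0])
    return [
        [sum(net[i][j] for net in sampled_networks) / n_samples
         for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


# Hypothetical sample of three 3-node networks (illustrative, not real data)
sample = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
]
posteriors = edge_posteriors(sample)
```

In this toy sample the edge 0 -> 1 appears in every network (a high-confidence edge), 2 -> 0 appears in none (a high-confidence non-edge), and the remaining edges are uncertain.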

Page 18: Probabilistic modelling in computational biology

Number of structures

Number of nodes

Sampling with MCMC
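
The plot on this slide relates the number of candidate structures to the number of nodes. Assuming the structures are labelled directed acyclic graphs (the slide does not say which graph class is plotted), the count can be reproduced with Robinson's recurrence. The function name `count_dags` is chosen here for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Print the number of structures for small networks
for n_nodes in range(1, 8):
    print(n_nodes, count_dags(n_nodes))
```

The count grows super-exponentially, which is why exhaustive enumeration is replaced by sampling once the network has more than a handful of nodes.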

Page 19: Probabilistic modelling in computational biology

Madigan & York (1995), Giudici & Castelo (2003)

Page 20: Probabilistic modelling in computational biology
Page 21: Probabilistic modelling in computational biology

Overview

• Introduction

• Limitations

• Methodology

• Application to morphogenesis

• Application to synthetic biology

Page 22: Probabilistic modelling in computational biology

Homogeneity assumption

Interactions don’t change with time

Page 23: Probabilistic modelling in computational biology

Limitations of the homogeneity assumption

Page 24: Probabilistic modelling in computational biology

Example: 4 genes, 10 time points

t1 t2 t3 t4 t5 t6 t7 t8 t9 t10

X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10

X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10

X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10

X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10

Page 25: Probabilistic modelling in computational biology

Supervised learning. Here: 2 components

t1 t2 t3 t4 t5 t6 t7 t8 t9 t10

X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10

X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10

X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10

X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10

Page 26: Probabilistic modelling in computational biology

Changepoint model

Parameters can change with time

Page 27: Probabilistic modelling in computational biology

Changepoint model

Parameters can change with time

Page 28: Probabilistic modelling in computational biology

t1 t2 t3 t4 t5 t6 t7 t8 t9 t10

X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10

X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10

X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10

X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10

Unsupervised learning. Here: 3 components
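
One way to picture the segmentation is as a vector that assigns every time point to a component. The sketch below builds such a vector from a set of changepoints; the changepoint positions (t4 and t8) are hypothetical, chosen only to give 3 components over the 10 time points of the example.

```python
from bisect import bisect_right


def allocation_vector(n_timepoints, changepoints):
    """Map each time point t = 1..n_timepoints to a component label 1..K.

    changepoints: time indices at which a new component starts.
    """
    starts = sorted(changepoints)
    # The label is one plus the number of changepoints at or before t
    return [bisect_right(starts, t) + 1 for t in range(1, n_timepoints + 1)]


# 10 time points, hypothetical changepoints at t4 and t8 -> 3 components
allocation = allocation_vector(10, [4, 8])
print(allocation)
```

In the unsupervised setting the changepoints, and hence this vector, are not given but inferred from the data.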

Page 29: Probabilistic modelling in computational biology

Extension of the model

θ

Page 30: Probabilistic modelling in computational biology

Extension of the model

θ

Page 31: Probabilistic modelling in computational biology

Extension of the model

θ

k

h

Number of components (here: 3)

Allocation vector

Page 32: Probabilistic modelling in computational biology

Analytically integrate out the parameters

θ

k

h

Number of components (here: 3)

Allocation vector

Page 33: Probabilistic modelling in computational biology
Page 34: Probabilistic modelling in computational biology

P(network structure | changepoints, data)

P(changepoints | network structure, data)

Birth, death, and relocation moves

RJMCMC within Gibbs
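
A rough sketch of the proposal step for the three move types, assuming changepoints are stored as a sorted list of distinct time indices between 2 and the number of time points. Only the proposals are shown; the acceptance probability, which in a reversible-jump sampler depends on the marginal likelihood ratio and the proposal ratio, is deliberately omitted. The function name `propose_move` and the uniform choice of positions are assumptions, not the scheme used in the talk.

```python
import random


def propose_move(changepoints, n_timepoints, rng=random):
    """Propose a birth, death, or relocation move on a set of changepoints.

    changepoints: distinct time indices in 2..n_timepoints (n_timepoints >= 2).
    Returns (move_name, new_changepoints); the input list is not modified.
    """
    current = sorted(changepoints)
    free = [t for t in range(2, n_timepoints + 1) if t not in current]

    # Only offer moves that are possible in the current state
    moves = []
    if free:
        moves.append("birth")
    if current:
        moves.append("death")
    if current and free:
        moves.append("relocation")

    move = rng.choice(moves)
    new = list(current)
    if move == "birth":            # add a changepoint: one more component
        new.append(rng.choice(free))
    elif move == "death":          # remove a changepoint: one fewer component
        new.remove(rng.choice(current))
    else:                          # relocation: number of components unchanged
        new.remove(rng.choice(current))
        new.append(rng.choice(free))
    return move, sorted(new)


move, proposal = propose_move([4, 8], n_timepoints=10, rng=random.Random(1))
```

Birth and death moves change the dimension of the model, which is why a reversible-jump scheme is needed for the changepoint update, while the network structure can be updated conditionally on the current changepoints.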

Page 35: Probabilistic modelling in computational biology

Dynamic programming, complexity N²
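
The slide does not spell out the recursion, so the following is only a generic illustration of where a quadratic cost comes from: a dynamic programme over changepoint positions considers every (start, end) pair of a candidate segment once. The sketch finds the segmentation of a univariate series that minimises squared error plus a per-segment penalty; the cost function, the penalty and the toy series are all assumptions and stand in for whatever segment score the actual method uses.

```python
def best_segmentation(values, penalty):
    """Optimal partitioning of a series by dynamic programming.

    Minimises the sum of within-segment squared errors plus `penalty`
    per segment. The double loop over (end, start) gives O(N^2) segment
    evaluations; prefix sums make each evaluation O(1).
    Returns the 0-based start index of every segment.
    """
    n = len(values)
    s1 = [0.0] * (n + 1)  # prefix sums of the values
    s2 = [0.0] * (n + 1)  # prefix sums of the squared values
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(start, end):
        """Squared error of values[start:end] around its own mean."""
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    best = [0.0] + [float("inf")] * n  # best[t]: optimal cost of values[:t]
    prev = [0] * (n + 1)               # prev[t]: start of the last segment
    for end in range(1, n + 1):
        for start in range(end):
            candidate = best[start] + cost(start, end) + penalty
            if candidate < best[end]:
                best[end] = candidate
                prev[end] = start

    # Backtrack from the end of the series to recover the segment starts
    starts = []
    t = n
    while t > 0:
        starts.append(prev[t])
        t = prev[t]
    return sorted(starts)


series = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 10.0, 10.1]
segments = best_segmentation(series, penalty=1.0)
```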

Page 36: Probabilistic modelling in computational biology
Page 37: Probabilistic modelling in computational biology

Circadian rhythms in Arabidopsis thaliana

Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)

- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3

- Transcriptional profiles at 4*13 time points in 2h intervals under constant light for 4 experimental conditions

Page 38: Probabilistic modelling in computational biology

Comparison with the literature

Precision: Proportion of identified interactions that are correct

Recall = Sensitivity: Proportion of true interactions that we successfully recovered

Specificity: Proportion of non-interactions that are successfully avoided

Page 39: Probabilistic modelling in computational biology

Which interactions from the literature are found?

[Figure: network over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5 and PRR3, with true positives and false negatives marked. Blue: activations. Red: inhibitions.]

True positives (TP) = 8

False negatives (FN) = 5

Recall = 8/13 = 62%

Page 40: Probabilistic modelling in computational biology

Which proportion of predicted interactions are confirmed by the literature?

[Figure: predicted network with true positives and false positives marked. Blue: activations. Red: inhibitions.]

True positives (TP) = 8

False positives (FP) = 13

Precision = 8/21 = 38%

Page 41: Probabilistic modelling in computational biology

[Figure: network over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5 and PRR3]

Precision = 38%

Recall = 62%

Page 42: Probabilistic modelling in computational biology

True positives (TP) = 8

False positives (FP) = 13

False negatives (FN) = 5

True negatives (TN) = 9² - 8 - 13 - 5 = 55

Sensitivity = TP/[TP+FN] = 62% (recall)

Specificity = TN/[TN+FP] = 81% (proportion of avoided non-interactions)
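
The arithmetic on this slide can be checked with a few lines of Python. The sketch follows the slide in counting all 9² ordered gene pairs as possible interactions; the function name `confusion_metrics` is chosen here for illustration.

```python
def confusion_metrics(tp, fp, fn, n_nodes):
    """Precision, recall and specificity for a reconstructed directed network."""
    n_possible = n_nodes ** 2          # all ordered pairs, as on the slide
    tn = n_possible - tp - fp - fn     # everything that is neither TP, FP nor FN
    return {
        "TN": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),      # identical to sensitivity
        "specificity": tn / (tn + fp),
    }


metrics = confusion_metrics(tp=8, fp=13, fn=5, n_nodes=9)
for name, value in metrics.items():
    print(name, value)
```

Because true negatives dominate in a sparse network, specificity is high even though most predicted interactions are false positives, which is why precision and recall are the more informative pair here.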

Page 43: Probabilistic modelling in computational biology

Model extension

So far: non-stationarity in the regulatory process

Page 44: Probabilistic modelling in computational biology

Non-stationarity in the network structure

Page 45: Probabilistic modelling in computational biology

Flexible network structure

Page 46: Probabilistic modelling in computational biology

Model parameters θ

Page 47: Probabilistic modelling in computational biology

Model parameters θ

Use prior knowledge!

Page 48: Probabilistic modelling in computational biology

Flexible network structure

Page 49: Probabilistic modelling in computational biology

Flexible network structure with regularization

Hyperparameter

Normalization factor
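
The equation on this slide did not survive in the transcript. One common form of such a regularizing prior, given here only as an assumed illustration, penalizes the number of edge differences between the network of a segment and a reference network, with a hyperparameter controlling the strength of the penalty and a normalization factor making the prior sum to one. The symbols M_h (network of segment h), M^{*} (reference network), \beta and Z are not taken from the slide.

```latex
P(M_h \mid M^{*}, \beta) = \frac{\exp\left(-\beta\, |M_h - M^{*}|\right)}{Z(\beta)},
\qquad
Z(\beta) = \sum_{M} \exp\left(-\beta\, |M - M^{*}|\right)
```

With \beta = 0 the segment networks are unconstrained; larger values of \beta pull them towards the reference network.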

Page 50: Probabilistic modelling in computational biology

Flexible network structure with regularization

Exponential prior versus binomial prior with conjugate beta hyperprior

Page 51: Probabilistic modelling in computational biology

NIPS 2010

Page 52: Probabilistic modelling in computational biology

Overview

• Introduction

• Limitations

• Methodology

• Application to morphogenesis

• Application to synthetic biology

Page 53: Probabilistic modelling in computational biology

Morphogenesis in Drosophila melanogaster

• Gene expression measurements at 66 time points during the life cycle of Drosophila (Arbeitman et al., Science, 2002).

• Selection of 11 genes involved in muscle development.

Zhao et al. (2006), Bioinformatics 22

Page 54: Probabilistic modelling in computational biology

Can we learn the morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult?

Page 55: Probabilistic modelling in computational biology

Average posterior probabilities of transitions

Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult

Page 56: Probabilistic modelling in computational biology
Page 57: Probabilistic modelling in computational biology

Can we learn changes in the regulatory network structure?

Page 58: Probabilistic modelling in computational biology
Page 59: Probabilistic modelling in computational biology

Overview

• Introduction

• Limitations

• Methodology

• Application to morphogenesis

• Application to synthetic biology

Page 60: Probabilistic modelling in computational biology
Page 61: Probabilistic modelling in computational biology
Page 62: Probabilistic modelling in computational biology

Can we learn the switch Galactose → Glucose?

Can we learn the network structure?

Page 63: Probabilistic modelling in computational biology

Task 1: Changepoint detection

Switch of the carbon source: Galactose → Glucose

Page 64: Probabilistic modelling in computational biology

Galactose Glucose

Page 65: Probabilistic modelling in computational biology

Task 2: Network reconstruction

Precision: Proportion of identified interactions that are correct

Recall: Proportion of true interactions that we successfully recovered

Page 66: Probabilistic modelling in computational biology

BANJO: Conventional homogeneous DBN

TSNI: Method based on differential equations

Inference: optimization, “best” network

Page 67: Probabilistic modelling in computational biology
Page 68: Probabilistic modelling in computational biology

Sample of high-scoring networks

Page 69: Probabilistic modelling in computational biology

Sample of high-scoring networks

Marginal posterior probabilities of the edges

P=1

P=0

P=0.5

Page 70: Probabilistic modelling in computational biology

P=1

True network

Thresh 0.9

Prec 1

Recall 1/2

Precision / Recall

Page 71: Probabilistic modelling in computational biology

P=1 P=0.5

True network

Thresh 0.9 0.4

Prec 1 2/3

Recall 1/2 1

Precision / Recall

Page 72: Probabilistic modelling in computational biology

P=1

P=0

P=0.5

True network

Thresh 0.9 0.4 -0.01

Prec 1 2/3 1/2

Recall 1/2 1 1

Precision / Recall
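
The threshold table can be reproduced with a small sketch. The edge set below is hypothetical: it is one configuration consistent with the numbers on the slides (a true network with two edges, one candidate edge with P=1, two with P=0.5 and one with P=0), not the network drawn in the original figure. Edge names and the function name `precision_recall` are likewise assumptions.

```python
from fractions import Fraction


def precision_recall(edge_probs, true_edges, threshold):
    """Precision and recall when predicting all edges with P > threshold."""
    predicted = {edge for edge, p in edge_probs.items() if p > threshold}
    tp = len(predicted & true_edges)
    precision = Fraction(tp, len(predicted)) if predicted else None
    recall = Fraction(tp, len(true_edges))
    return precision, recall


# Hypothetical marginal posterior probabilities and true network
edge_probs = {"A->B": 1.0, "B->C": 0.5, "A->C": 0.5, "C->A": 0.0}
true_edges = {"A->B", "B->C"}

curve = [precision_recall(edge_probs, true_edges, t) for t in (0.9, 0.4, -0.01)]
for threshold, (prec, rec) in zip((0.9, 0.4, -0.01), curve):
    print(threshold, prec, rec)
```

Lowering the threshold admits more edges, so recall can only increase, while precision typically drops; sweeping the threshold over all values traces out the precision-recall curve.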

Page 73: Probabilistic modelling in computational biology
Page 74: Probabilistic modelling in computational biology

Future work

Page 75: Probabilistic modelling in computational biology

How are we getting from here …

Page 76: Probabilistic modelling in computational biology

… to there ?!

Page 77: Probabilistic modelling in computational biology

Input: Learn: MCMC

Prior knowledge