problem limited number of experimental replications. postgenomic data intrinsically noisy. poor...


Upload: lynne-phillips

Post on 05-Jan-2016



TRANSCRIPT

Page 1: Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction

Problem

• Limited number of experimental replications.

• Postgenomic data intrinsically noisy.

• Poor network reconstruction.


• Can we improve the network reconstruction by systematically integrating different sources of biological prior knowledge?


• Which sources of prior knowledge are reliable?

• How do we trade off the different sources of prior knowledge against each other and against the data?


Overview of the talk

• Revision: Bayesian networks

• Integration of prior knowledge

• Empirical evaluation



Bayesian networks

[Figure: directed acyclic graph over the nodes A, B, C, D, E and F, with labels for NODES and EDGES]

• Marriage between graph theory and probability theory.

• Directed acyclic graph (DAG) representing conditional independence relations.

• It is possible to score a network in light of the data: P(D|M), where D is the data and M is the network structure.

• We can infer how well a particular network explains the observed data.

P(A,B,C,D,E,F) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|D) P(F|C,D)
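As a rough illustration of the factorisation on this slide, the sketch below multiplies one conditional probability per node for binary variables. The function names and all probability values are made up for the example and are not taken from the talk.

```python
from itertools import product

def bern(p_one, x):
    """Probability that a binary variable with P(X=1) = p_one takes value x."""
    return p_one if x == 1 else 1.0 - p_one

def joint(a, b, c, d, e, f):
    """P(A,B,C,D,E,F) as the product of one conditional per node.

    All numbers are hypothetical; only the parent sets follow the slide.
    """
    return (
        bern(0.3, a)                                                # P(A)
        * bern(0.8 if a else 0.1, b)                                # P(B|A)
        * bern(0.6 if a else 0.2, c)                                # P(C|A)
        * bern(0.9 if (b and c) else 0.4 if (b or c) else 0.05, d)  # P(D|B,C)
        * bern(0.7 if d else 0.1, e)                                # P(E|D)
        * bern(0.5 if (c and d) else 0.3 if (c or d) else 0.02, f)  # P(F|C,D)
    )

# A valid factorisation sums to one over all 2**6 joint states.
total = sum(joint(*state) for state in product((0, 1), repeat=6))
print(total)
```

Each factor only needs the values of a node's parents, which is what keeps the number of parameters small compared with a full joint table.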


Bayesian networks versus causal networks

Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.


[Figure: two graphs over the nodes A, B and C, captioned "True causal graph" and "Node A unknown"]


• Equivalence classes: networks with the same scores: P(D|M).

• Equivalent networks cannot be distinguished in light of the data.

[Figure: four graphs over the nodes A, B and C]
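A small numerical check of the equivalence idea, under a simplifying assumption: instead of the marginal likelihood P(D|M) used in the talk, the sketch compares the maximised log-likelihood of the two-node structures A → B and B → A on synthetic binary data. The data generator and the function name `max_loglik` are invented for the illustration.

```python
import math
import random
from collections import Counter

random.seed(0)

# Synthetic binary data in which B depends on A (made-up parameters).
data = []
for _ in range(200):
    a = int(random.random() < 0.4)
    b = int(random.random() < (0.9 if a else 0.2))
    data.append((a, b))

def max_loglik(pairs):
    """Maximised log-likelihood of the model P(X) P(Y|X) for pairs (x, y)."""
    n = len(pairs)
    joint_counts = Counter(pairs)
    parent_counts = Counter(x for x, _ in pairs)
    ll = 0.0
    for (x, y), count in joint_counts.items():
        # Maximum-likelihood estimates: P(x) = N_x / N, P(y|x) = N_xy / N_x.
        ll += count * (math.log(parent_counts[x] / n)
                       + math.log(count / parent_counts[x]))
    return ll

ll_a_to_b = max_loglik(data)                       # structure A -> B
ll_b_to_a = max_loglik([(b, a) for a, b in data])  # structure B -> A
print(ll_a_to_b, ll_b_to_a)
```

Both structures reach the same value because each can represent the empirical joint distribution exactly, so the data alone cannot orient the edge.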


Symmetry breaking

[Figure: graphs over the nodes A, B and C, with the label "Prior knowledge"]

P(M|D) = P(D|M) P(M) / Z

D: data. M: network structure


P(D|M)


Prior knowledge:

B is a transcription factor with binding sites in the upstream regions of A and C

P(M)


P(M|D) ∝ P(D|M) P(M)
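To make the symmetry-breaking step concrete, here is a minimal sketch that combines likelihood and prior for three candidate structures over A, B and C. The structure labels, the common log-likelihood value and the prior weights are all hypothetical; the informed prior simply favours the structure in which B regulates both A and C.

```python
import math

def posterior(log_lik, log_prior):
    """Return P(M|D) for each candidate M from log P(D|M) and log P(M)."""
    log_joint = {m: log_lik[m] + log_prior[m] for m in log_lik}
    top = max(log_joint.values())
    # Log-sum-exp gives the log of the normalisation constant Z.
    log_z = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    return {m: math.exp(v - log_z) for m, v in log_joint.items()}

# Three score-equivalent structures: identical (made-up) log P(D|M).
log_lik = {"A->B->C": -120.0, "A<-B<-C": -120.0, "A<-B->C": -120.0}

flat_prior = {m: math.log(1.0 / 3.0) for m in log_lik}
tf_prior = {"A->B->C": math.log(0.1),
            "A<-B<-C": math.log(0.1),
            "A<-B->C": math.log(0.8)}  # B binds upstream of A and C

post_flat = posterior(log_lik, flat_prior)
post_tf = posterior(log_lik, tf_prior)
print(post_flat)
print(post_tf)
```

With a flat prior the three structures stay tied; with the informed prior the posterior follows the prior, because the likelihood terms cancel.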


Learning Bayesian networks

P(M|D) = P(D|M) P(M) / Z

M: Network structure. D: Data


Overview of the talk

• Revision: Bayesian networks

• Integration of prior knowledge

• Empirical evaluation


Use TF binding motifs in promoter sequences


Biological prior knowledge matrix

Biological Prior Knowledge

Indicates some knowledge about the relationship between genes i and j


Define the energy of a Graph G



Notation

• Prior knowledge matrix:

P → B (for “belief”)

• Network structure:

G (for “graph”) or M (for “model”)

• P: Probabilities


Prior distribution over networks

Energy of a network
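The formulas on this slide did not survive extraction. As a sketch only, the code below assumes one common form: the energy of a graph G is the summed absolute deviation from the prior knowledge matrix B, E(G) = Σ |B_ij − G_ij| over ordered pairs i ≠ j, and the prior is proportional to exp(−β E(G)). The matrix entries and function names are hypothetical.

```python
import math

def energy(B, G):
    """Assumed energy: sum over ordered pairs i != j of |B[i][j] - G[i][j]|."""
    n = len(B)
    return sum(abs(B[i][j] - G[i][j])
               for i in range(n) for j in range(n) if i != j)

def unnormalised_prior(B, G, beta):
    """exp(-beta * E(G)); dividing by the partition function gives P(G|beta)."""
    return math.exp(-beta * energy(B, G))

# Hypothetical prior knowledge: 0.5 means "no information" about an edge.
B = [[0.5, 0.9, 0.5],
     [0.1, 0.5, 0.5],
     [0.5, 0.5, 0.5]]

G_agree = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]     # edge 0 -> 1, supported by B
G_conflict = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]  # edge 1 -> 0, contradicted by B

print(energy(B, G_agree), energy(B, G_conflict))
print(unnormalised_prior(B, G_agree, 2.0), unnormalised_prior(B, G_conflict, 2.0))
```

Under this assumption, graphs that agree with the prior knowledge have lower energy and therefore higher prior weight, and β = 0 switches the prior knowledge off.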


Sample networks and hyperparameters from the posterior distribution

• Capture intrinsic inference uncertainty

• Learn the trade-off parameters automatically

P(M|D) = P(D|M) P(M) / Z



Rewriting the energy

Energy of a network


Approximation of the partition function

Partition function of a perfect gas
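The slide's derivation is lost, so the following is a hedged sketch of how such an approximation can work. It assumes the energy E(G) = Σ |B_ij − G_ij| and drops the acyclicity constraint, so that every possible edge contributes independently, much as non-interacting particles do in a perfect gas. The matrix B and the function names are hypothetical.

```python
import math
from itertools import product

def z_brute_force(B, beta):
    """Sum exp(-beta * E(G)) over all directed graphs, cyclic ones included."""
    n = len(B)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = 0.0
    for bits in product((0, 1), repeat=len(pairs)):
        e = sum(abs(B[i][j] - g) for (i, j), g in zip(pairs, bits))
        total += math.exp(-beta * e)
    return total

def z_factorised(B, beta):
    """Same quantity as a product of two-state sums, one per possible edge."""
    n = len(B)
    z = 1.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Edge absent costs B[i][j]; edge present costs 1 - B[i][j].
                z *= math.exp(-beta * B[i][j]) + math.exp(-beta * (1.0 - B[i][j]))
    return z

B = [[0.5, 0.9, 0.5],
     [0.1, 0.5, 0.2],
     [0.7, 0.5, 0.5]]

print(z_brute_force(B, 2.0), z_factorised(B, 2.0))
```

The brute-force sum grows as 2^(n(n−1)) while the product needs only n(n−1) terms; the price is that cyclic graphs are counted too, which is why the result is an approximation for DAGs.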


Multiple sources of prior knowledge


MCMC sampling scheme


Sample networks and hyperparameters from the posterior distribution

Metropolis-Hastings scheme

Proposal probabilities
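The sampling scheme itself is not recoverable from the transcript. The toy sketch below shows only the hyperparameter half of such a scheme: a Metropolis-Hastings random walk over one trade-off parameter β for a fixed graph G, with the network moves omitted. It assumes the energy E(G) = Σ |B_ij − G_ij|, a partition function factorised over edges, and a uniform prior on [0, beta_max]; the value 30 for `beta_max`, the step size and all names are assumptions.

```python
import math
import random

def energy(B, G):
    n = len(B)
    return sum(abs(B[i][j] - G[i][j])
               for i in range(n) for j in range(n) if i != j)

def log_z(B, beta):
    # Factorised partition function (acyclicity ignored): one term per edge.
    n = len(B)
    return sum(
        math.log(math.exp(-beta * B[i][j]) + math.exp(-beta * (1.0 - B[i][j])))
        for i in range(n) for j in range(n) if i != j)

def log_target(beta, B, G, beta_max):
    """log P(G|beta) plus a uniform prior on [0, beta_max]."""
    if not 0.0 <= beta <= beta_max:
        return -math.inf
    return -beta * energy(B, G) - log_z(B, beta)

def sample_beta(B, G, n_steps=5000, beta_max=30.0, step=1.0, seed=0):
    rng = random.Random(seed)
    beta = beta_max / 2.0
    samples = []
    for _ in range(n_steps):
        proposal = beta + rng.uniform(-step, step)  # symmetric proposal
        log_ratio = (log_target(proposal, B, G, beta_max)
                     - log_target(beta, B, G, beta_max))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            beta = proposal
        samples.append(beta)
    return samples[n_steps // 5:]  # discard the first fifth as burn-in

G = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
B_agree = [[float(g) for g in row] for row in G]   # prior matches the graph
B_conflict = [[1.0 - g for g in row] for row in G]  # prior contradicts it

s_agree = sample_beta(B_agree, G)
s_conflict = sample_beta(B_conflict, G)
mean_agree = sum(s_agree) / len(s_agree)
mean_conflict = sum(s_conflict) / len(s_conflict)
print(mean_agree, mean_conflict)
```

A prior source that contradicts the graph is pushed towards small β, which is the mechanism by which an unreliable source can be switched off without user intervention.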


Bayesian networks with biological prior knowledge

• Biological prior knowledge: information about the interactions between the nodes.

• We use two distinct sources of biological prior knowledge.

• Each source of biological prior knowledge is associated with its own trade-off parameter: β1 and β2.

• The trade-off parameter indicates how much biological prior information is used.

• The trade-off parameters are inferred. They are not set by the user!


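Bayesian networks with two sources of prior knowledge

[Diagram: the data, prior source 1 (trade-off parameter β1) and prior source 2 (trade-off parameter β2) enter BNs + MCMC, which returns the recovered networks and trade-off parameters]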



Overview of the talk

• Revision: Bayesian networks

• Integration of prior knowledge

• Empirical evaluation


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?


Raf regulatory network

From Sachs et al Science 2005



Evaluation: Raf signalling pathway

• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells

• Deregulation → carcinogenesis

• Extensively studied in the literature → gold standard network


Data and prior knowledge


Flow cytometry data

• Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins

• 5400 cells have been measured under 9 different cellular conditions (cues)

• Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments
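The downsampling step can be sketched as follows. The slide does not say how the five subsets were drawn, so this example assumes they are disjoint and drawn uniformly at random from the 5400 measured cells; the function name, the seed and the use of row indices are illustrative choices.

```python
import random

def draw_subsets(n_total, subset_size=100, n_subsets=5, seed=0):
    """Return n_subsets disjoint lists of row indices, each of length subset_size."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees that no cell is used twice.
    chosen = rng.sample(range(n_total), subset_size * n_subsets)
    return [chosen[k * subset_size:(k + 1) * subset_size]
            for k in range(n_subsets)]

subsets = draw_subsets(5400)
print([len(s) for s in subsets])
```

Each index list would then select the rows of the measurement table that form one small data set, mimicking the sample sizes typical of microarray experiments.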


Microarray examples

• Spellman et al (1998): cell cycle, 73 samples

• Tu et al (2005): metabolic cycle, 36 samples

[Figure: two panels, each showing genes against time]



KEGG PATHWAYS are a collection of manually drawn pathway maps representing our knowledge of molecular interactions and reaction networks.

http://www.genome.jp/kegg/

Flow cytometry data and KEGG


Prior knowledge from KEGG


Prior distribution


The data and the priors

+ KEGG

+ Random


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?


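Bayesian networks with two sources of prior knowledge

[Diagram: the data, prior source 1 (trade-off parameter β1) and prior source 2 (trade-off parameter β2) enter BNs + MCMC, which returns the recovered networks and trade-off parameters]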



Sampled values of the hyperparameters


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?


How can we evaluate the reconstruction accuracy?


Flow cytometry data and KEGG


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?


Learning the trade-off hyperparameter

• Repeat MCMC simulations for a large set of fixed hyperparameters β

• Obtain AUC scores for each value of β

• Compare with the proposed scheme in which β is automatically inferred.

Mean and standard deviation of the sampled trade-off parameter
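For readers unfamiliar with the score, here is a minimal sketch of how an AUC value can be computed for a reconstructed network. It assumes that AUC means the area under the ROC curve, that each candidate edge has a posterior probability estimated from the sampled networks, and that the gold standard marks each edge as present (1) or absent (0). The function name and the example numbers are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen true edge receives a
    higher score than a randomly chosen non-edge; ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Made-up posterior edge probabilities and gold-standard labels.
edge_scores = [0.9, 0.8, 0.3, 0.6, 0.1]
gold_standard = [1, 1, 0, 0, 1]
print(auc(edge_scores, gold_standard))
```

A value of 0.5 corresponds to random ranking of the edges and 1.0 to a perfect ranking, which makes scores comparable across different values of β.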


Conclusion

• Bayesian scheme for the systematic integration of different sources of biological prior knowledge.

• The method can automatically evaluate how useful the different sources of prior knowledge are.

• We get an improvement in the regulatory network reconstruction.

• This improvement is close to optimal.


Thank you