elucidating regulatory mechanisms downstream of a signaling pathway using informative experiments...


Upload: geraldine-porter

Post on 17-Jan-2018


DESCRIPTION

Environmental stimuli trigger signaling cascades, which regulate transcription. Suppose we qualitatively understand a signaling pathway. How can we reconstruct the underlying regulatory mechanisms? Must run targeted experiments. Experimental Design problem: Find the best set of experiments to run.

TRANSCRIPT

Elucidating regulatory mechanisms downstream of a signaling pathway using informative experiments
Discussion leader: Navneet
Scribe: James
Computational Network Biology, BMI 826/Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
Paper by Ewa Szczurek, Irit Gat-Viks, Jerzy Tiuryn and Martin Vingron, Molecular Systems Biology, 2009

Problem Overview
Environmental stimuli trigger signaling cascades, which regulate transcription.
Suppose we qualitatively understand a signaling pathway. How can we reconstruct the underlying regulatory mechanisms?
Must run targeted experiments.
Experimental design problem: find the best set of experiments to run.

Detour: Let's Play Hangman

Hangman
Some word is chosen from a dictionary:
  least, setal, slate, stale, steal, stela, taels, tales, teals, tesla
  (anagrams of "aestl", from the Scrabble dictionary)
Guess a letter to identify its position in the chosen word.
Strategy?

Strategy
Guessing "a" tells you its location in the word. We can partition the words based on this location:
  taels, tales | least, slate, stale, teals | setal, steal | stela, tesla
Guessing "t" gives us a different partition:
  taels, tales, teals, tesla | stale, steal, stela | setal | slate | least
Which is better, "a" or "t"?

Strategy
Guessing "t" gives us a partition with more classes, with fewer words per class, than guessing "a".
i.e., "t" distinguishes the words better than "a".
This is captured in the notion of entropy.
  Partition by "a": taels, tales | least, slate, stale, teals | setal, steal | stela, tesla
  vs.
  Partition by "t": taels, tales, teals, tesla | stale, steal, stela | setal | slate | least

MEED: Model Expansion Experiment Design
Inputs
  1. A logical model of the signaling pathway
  2. A set of candidate experiments
  3. A set of regulation functions
[Note: No high-throughput TF-DNA binding data required.]
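
The Hangman detour can be checked numerically. The sketch below is an illustration only: the helper names `partition_by_letter` and `entropy` are made up here, and the natural logarithm is an assumed choice of base. It partitions the ten words by the position of a guessed letter and compares the entropy of the two partitions.

```python
import math
from collections import defaultdict

# The ten anagrams from the Hangman slide.
WORDS = ["least", "setal", "slate", "stale", "steal",
         "stela", "taels", "tales", "teals", "tesla"]


def partition_by_letter(words, letter):
    """Group words by the position of `letter` (each word contains it exactly once)."""
    blocks = defaultdict(list)
    for word in words:
        blocks[word.index(letter)].append(word)
    return list(blocks.values())


def entropy(blocks):
    """Shannon entropy (natural log) of a partition given as a list of blocks."""
    total = sum(len(block) for block in blocks)
    return -sum(len(b) / total * math.log(len(b) / total) for b in blocks)


blocks_a = partition_by_letter(WORDS, "a")
blocks_t = partition_by_letter(WORDS, "t")

# "t" splits the words into more blocks than "a", and its partition has higher entropy.
print(len(blocks_a), round(entropy(blocks_a), 3))
print(len(blocks_t), round(entropy(blocks_t), 3))
```

Guessing "t" yields five blocks against four for "a", and the higher entropy reflects that the words are spread over more, smaller blocks.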
Signaling Pathway Model
Stimulator variables
  Environmental signals
  States are different external stimulations
Regulator variables
  Signaling molecules controlling transcription
  States:
    o Activated (+1)
    o Neutral (0)
    o Deactivated (-1)

Signaling Pathway Model
Regulation function
  Determines the state of a variable as a function of its upstream effectors' states
Structure
  Topology of the signaling network
  May be cyclic
  Stimulators have in-degree 0

Experiment Definition
Stimulation
  States of all the stimulator variables, i.e., all environmental signals applied in the experiment
Perturbed variables
  Model variables (regulators) subject to perturbation
  [Note: At most one perturbed variable is allowed in MEED]
Perturbation states
  Fixed state of perturbed variables in the experiment, e.g., knockout (-1) or over-activation (+1)

Predicted Model State
[Diagram labels: Logical Model, Expt. Definition, Predicted Model State]
Assignment of states to all the model variables.
Unique predicted model state for acyclic models.
But cyclic models may have none or multiple model states.

Regulation Functions
[Diagram labels: Predicted Model State, Regulation Functions]
Map states of a regulator to the state of the regulatory target.
[Note: MEED only considers single-regulator functions.]
There are 27 (= 3^3) different possible regulation functions. Not all of them are biologically relevant.

Regulatory Program
Set of regulators and corresponding regulation functions.
Specifies who regulates, and how, respectively.

[Diagram labels, in slide order: Regulation Functions, Regulatory Program, Predicted Response, Predicted Model State, Upregulated, Neutral, Downregulated, Predicted Profile, Candidate Expts, Predicted Response, Predicted Profiles]

Distinguishing Regulatory Programs
Recall how "t" distinguishes words in the Hangman example:
  taels, tales, teals, tesla | stale, steal, stela | setal | slate | least
Similarly, an experiment or a set of experiments may distinguish regulatory programs.
A set of experiments distinguishes two regulatory programs if their predicted profiles are different.
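
To make the count of 27 and the notion of "distinguishing" concrete, here is a minimal sketch under stated assumptions. It represents a single-regulator function as a lookup table over the three states, and it takes the regulator's predicted state in each experiment as given rather than deriving it from a logical model. The names `activation` and `inhibition` are illustrative and are not claimed to be the functions used in the paper.

```python
from itertools import product

STATES = (-1, 0, 1)  # deactivated, neutral, activated


def all_regulation_functions():
    """Enumerate every map from a regulator state to a target response: 3^3 = 27."""
    return [dict(zip(STATES, outputs)) for outputs in product(STATES, repeat=3)]


def predicted_profile(function, regulator_states):
    """Predicted target response in each experiment, given the regulator's state there."""
    return tuple(function[state] for state in regulator_states)


def distinguishes(f, g, regulator_states):
    """Experiments distinguish two programs if their predicted profiles differ."""
    return predicted_profile(f, regulator_states) != predicted_profile(g, regulator_states)


functions = all_regulation_functions()

# Two illustrative functions (hypothetical names).
activation = {-1: -1, 0: 0, 1: 1}
inhibition = {-1: 1, 0: 0, 1: -1}

print(len(functions))                              # 27
print(distinguishes(activation, inhibition, [1]))  # regulator activated: profiles differ
print(distinguishes(activation, inhibition, [0]))  # regulator neutral: profiles agree
```

An experiment that leaves the regulator neutral cannot separate these two functions, which is why the choice of stimulation and perturbation states matters.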
Distinguishing Regulatory Programs
e.g., e_2 distinguishes the regulatory programs f_2(A) and f_3(A).
e.g., e_2 does not distinguish f_1(A) and f_2(A), but e_6 does.
e.g., the set of experiments e_2, e_3 and e_6 does not distinguish f_1(B) and f_3(B).
MEED tries to choose a minimal set of experiments from the candidate set that maximally distinguishes the regulatory programs.

Entropy Score
MEED tries to choose a minimal set of experiments from the candidate set that maximally distinguishes the regulatory programs.
But the general problem is NP-hard.
MEED uses a greedy heuristic based on an entropy score.
Suppose a list E of experiments partitions r regulatory programs into C disjoint blocks, with n_c programs in block c, 1 \le c \le C. The entropy of E is
  H(E) = -\sum_{c=1}^{C} \frac{n_c}{r} \log \frac{n_c}{r}
Entropy is 0 when there is only one block with all programs.
Entropy is \log(r) if each block contains exactly one program, i.e., if all the regulatory programs are distinguished by E.
Intuitively, higher entropy means programs are more spread out into more blocks.
Entropy gain from adding an experiment e to a list E is
  H(E + e) - H(E), where E + e denotes E with e appended
[Note: Not the same as H({e}).]

Entropy Score
Greedy heuristic
  Start with E = empty list of experiments.
  Find the experiment e with maximum entropy gain.
  Append e to the ordered list E.
  If no experiment has any entropy gain, stop.
Provably approximates the optimal solution of the NP-hard problem within a factor logarithmic in r.
Output list may not distinguish all regulatory programs.
Output list distinguishes as many regulatory programs as the candidate experiment set.

MEED Algorithm: Experimental Design
Note: Everything so far is deterministic.
Note: We have not actually run any experiments or obtained any data yet. Everything is based on logical model predictions so far.
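
The greedy heuristic can be sketched in a few lines. This is a toy illustration and not the MEED implementation: the predicted profiles are supplied directly as a dictionary (hypothetical programs f1 to f4 and experiments e1 to e3), the natural logarithm is assumed, and ties are broken by candidate order.

```python
import math
from collections import Counter


def entropy(profiles, experiments):
    """Entropy of the partition of programs induced by their profiles on `experiments`."""
    r = len(profiles)
    blocks = Counter(tuple(p[e] for e in experiments) for p in profiles.values())
    return -sum(n / r * math.log(n / r) for n in blocks.values())


def greedy_design(profiles, candidates, eps=1e-12):
    """Repeatedly append the experiment with maximum entropy gain; stop when none gains."""
    chosen = []
    remaining = list(candidates)
    current = entropy(profiles, chosen)
    while remaining:
        gains = {e: entropy(profiles, chosen + [e]) - current for e in remaining}
        best = max(remaining, key=lambda e: gains[e])  # first candidate wins ties
        if gains[best] <= eps:
            break
        chosen.append(best)
        remaining.remove(best)
        current = entropy(profiles, chosen)
    return chosen


# Toy predicted profiles: program -> experiment -> predicted response (made-up values).
profiles = {
    "f1": {"e1": 1, "e2": 1, "e3": 0},
    "f2": {"e1": 1, "e2": -1, "e3": 0},
    "f3": {"e1": -1, "e2": 1, "e3": 0},
    "f4": {"e1": -1, "e2": -1, "e3": 0},
}

design = greedy_design(profiles, ["e1", "e2", "e3"])
print(design)
```

On this toy input, e3 never separates any programs and so has no entropy gain, and the selected list reaches the maximum entropy log(4) with two experiments.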
[Diagram labels: MEED Greedy Algorithm, Predicted Profiles, Ordered List of Expts]

Expansion
Regulatory modules
  Assignment of target genes to the best-matching regulatory program
Input
  Logical model
  List of experiments
  Experiment measurements (gene expression profiles)
Probabilistic matching
  Compare observed expression profiles with the predicted profiles and compute the probability that they match.
  Considered a match if the probability exceeds the cutoff threshold p^(# of experiments), with p = 0.7
[Diagram labels: Expansion, Predicted Profiles, Regulatory Modules, Observed Profiles from Expts]

Evaluation

Evaluation of Experiment Design
Evaluation metric: FUP score (Fraction of Undistinguished Pairs)
  FUP is 0 when all regulatory programs are distinguished.
  FUP is 1 when no regulatory programs are distinguished.
  Intuitively, a smaller FUP score means more program pairs are distinguished.
  We want to minimize the FUP score.
  Based only on model predictions, no measurements needed.

Human Pathway Experiments
Input
  Tests on four human signaling pathways
  Structure: 1000 cyclic models each, generated through random shuffling
  Regulators: all variables that are not stimulators
  Regulation function: Only activation both

Human Pathway Experiments
Alternative method 1: INDEP
  Same entropy measure, but list generated independently
Alternative method 2: Random network-based
  Perturbed variables chosen using topological features in structure
  Perturbation and stimulation states chosen randomly
  R-IN_DEGREE, R-OUT_DEGREE, R-CONNECTIONS, R-TOPOL
Alternative method 3: Hybrid network-based
  Perturbed variables chosen using topological features in structure
  Perturbation and stimulation states chosen using MEED
  M-IN_DEGREE, M-OUT_DEGREE, M-CONNECTIONS, M-TOPOL
MEED outperforms INDEP
  Important to score sets of experiments together, rather than independently
MEED outperforms network-based methods
Hybrid network-based methods outperform random methods
  MEED is useful in selecting states even when perturbation variables are fixed

Yeast Pathway Experiments
2 stimulators: environmental osmotic concentration, pheromone
15 regulators: all variables except Hog scaffold
6 biologically relevant regulation functions (shown before)
Therefore, 15 x 6 = 90 regulatory programs
25 candidate experiments from microarray databases
MEED proposes 11 out of 25 experiments
MEED outperforms INDEP and network-based methods
Hybrid network-based methods outperform random methods
  MEED is useful in selecting states even when perturbation variables are fixed

Evaluation of Expansion Procedure
Unambiguous module: matches exactly one regulatory program
Ambiguous module: matches more than one regulatory program
Ambiguity network
  Nodes: regulatory programs that matched ambiguous modules; size of ambiguous module
  Edges: from size node to matching regulatory programs
Example ambiguity network
  After 5 experiments, there is one ambiguous module with 58 genes matching 7 regulatory programs

Evaluation of Expansion Procedure
Evaluation metric: Ambiguity score
  Average number of regulatory programs identified for each gene
  Intuitively, the more regulatory programs matching each ambiguous module, and the more genes it contains, the higher the overall ambiguity score.
  Utilizes experimental data, not just model predictions (unlike FUP)
Comparison with extant methods: Barrett and Palsson (B&P)
  More modules than B&P
  MEED mostly showed lower ambiguity score than alternatives (log scale)

Other Results
Some of the larger regulatory modules are significantly enriched for GO annotations, so are functionally coherent.
During expansion using successive experiments,
  subgraphs in ambiguity networks break up, showing reduced ambiguity scores
  genes rarely get reassigned to different regulatory modules
Using additional candidate experiments not in databases, MEED identified two experiments that can significantly reduce ambiguity beyond the prior 11 out of 25.

Evaluation
Advantages
General framework for discovering regulatory modules downstream of a studied signaling pathway.
MEED significantly reduces the lab effort needed to attain the same level of ambiguity in regulatory module assignment.
MEED consistently outperforms alternatives, including INDEP and network-based ones.
MEED outperforms prior work such as Barrett, Ideker, etc.
  Even though they are online, i.e., need the results of one experiment to suggest the next one
MEED does not require high-throughput binding data.
Logical model can encode expert knowledge succinctly.

Extensions
Signaling model is not probabilistic
  Should be possible to extend to probabilistic network models (e.g., Bayesian networks)
  Bayesian framework can reduce reliance on correctness of prior knowledge (signaling network model)
Limitation to single-regulator programs is necessary only for data and computational complexity reasons
  Incorporating biologically relevant combinatorial regulatory programs may significantly improve regulatory module assignment
  MEED is linear in the number of regulatory programs
MEED only considers steady-state regulation. What about dynamics?
Can we incorporate high-throughput experimental data?
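
For reference, the FUP score used in the evaluation slides can be sketched as follows. The exact formula is not given in the slides, so this assumes the literal reading of "Fraction of Undistinguished Pairs": the number of program pairs with identical predicted profiles on the chosen experiments, divided by the total number of pairs. The toy profiles and all names are hypothetical.

```python
from collections import Counter
from math import comb


def fup_score(profiles, experiments):
    """Fraction of program pairs whose predicted profiles agree on all `experiments`."""
    r = len(profiles)
    if r < 2:
        raise ValueError("need at least two regulatory programs to form a pair")
    blocks = Counter(tuple(p[e] for e in experiments) for p in profiles.values())
    # Pairs inside the same block are exactly the undistinguished pairs.
    undistinguished = sum(comb(n, 2) for n in blocks.values())
    return undistinguished / comb(r, 2)


# Toy predicted profiles: program -> experiment -> predicted response (made-up values).
toy_profiles = {
    "f1": {"e1": 1, "e2": 1},
    "f2": {"e1": 1, "e2": -1},
    "f3": {"e1": -1, "e2": 1},
    "f4": {"e1": -1, "e2": -1},
}

print(fup_score(toy_profiles, []))            # no experiments: nothing is distinguished
print(fup_score(toy_profiles, ["e1"]))        # two blocks of two programs each
print(fup_score(toy_profiles, ["e1", "e2"]))  # every program is in its own block
```

Under this reading the score matches the two boundary cases stated in the slides: 1 when no programs are distinguished and 0 when all are.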