RNA Secondary Structure Prediction Using Stochastic Context-Free Grammar
and Evolutionary History
B. Knudsen and J. Hein
Department of Genetics and Ecology
The Institute of Biological Sciences
University of Aarhus, Denmark
Presented by Jing Cui
Nov. 22, 2002
Outline of the Lecture
• Introduction
• Algorithms
The grammar
Probabilities of columns
Probabilities of an alignment
The full model
• Implementation
The database
Frequencies
Mutation rates
Grammar parameters
• Results
The test sequences
Using related sequences
Neglecting phylogeny
Weight of results
Comparison with other methods
• Conclusion
The limitations
The improvements
Introduction
• Single sequence, e.g. Zuker (1989)
uses prior information on RNA structures, through energy functions
not ideal when estimating structures of sequences with known homologs
• Multiple sequences
1. Covariance methods (Eddy and Durbin, 1994)
2. Profile stochastic context-free grammars (SCFGs; Sakakibara et al., 1994)
Characteristics: do not explicitly take phylogeny into account, and do not use a prior probability distribution of structures
• Maximum weighted matching methods (Cary and Stormo, 1995; Tabaska et al., 1998)
share the above characteristics
• The method in this paper uses prior knowledge about RNA structure to make a maximum a posteriori (MAP) estimate of the secondary structure.
It is performed on an alignment of sequences assumed to have identical secondary structure, i.e. the alignment is assumed to be a structural alignment.
It takes the phylogenetic tree of the sequences into account, including branch lengths, using a model of mutation processes in RNA.
The tree can be estimated by a maximum likelihood (ML) method.
• Originating from Goldman et al. (1996)
Predicting protein secondary structure using HMMs that include phylogenetic information
• Difference
Secondary structure interactions in RNA are not local, unlike in proteins
SCFGs are used here instead of HMMs
• Limitation
SCFGs are unable to model crossing interactions, thus pseudoknots cannot be predicted
Algorithm
• Input
an alignment of RNA sequences
• Output
a single common structure for the sequences
• The model
The SCFG
The evolutionary model
• The grammar
a set of symbols, some terminal and some non-terminal
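The production rules themselves are on a slide figure that this transcript does not reproduce. As a rough sketch, assuming the grammar is the one published by Knudsen and Hein (1999), namely S → LS | L, F → dFd | LS, L → s | dFd, the rules can be written down and sampled from as follows. The rule probabilities are made-up placeholders (not the trained values), and `RULES` and `sample_structure` are hypothetical names introduced only for this illustration.

```python
import random

# Production rules, ASSUMED to follow Knudsen and Hein (1999):
#   S -> L S | L      F -> d F d | L S      L -> s | d F d
# 's' is an unpaired position; a d...d pair is written as "(" ... ")".
# The probabilities are illustrative placeholders, NOT trained values.
RULES = {
    "S": [(0.7, ["L", "S"]), (0.3, ["L"])],
    "F": [(0.7, ["(", "F", ")"]), (0.3, ["L", "S"])],
    "L": [(0.9, ["s"]), (0.1, ["(", "F", ")"])],
}

def sample_structure(rng, start="S"):
    """Expand non-terminals left to right; return a dot-bracket string."""
    out = []
    stack = [start]
    while stack:
        sym = stack.pop()
        if sym in RULES:
            weights = [p for p, _ in RULES[sym]]
            choices = [rhs for _, rhs in RULES[sym]]
            rhs = rng.choices(choices, weights=weights)[0]
            # push in reverse so the leftmost symbol is expanded first
            stack.extend(reversed(rhs))
        else:
            out.append("." if sym == "s" else sym)
    return "".join(out)

if __name__ == "__main__":
    print(sample_structure(random.Random(1)))
```

Because every rule has a fixed probability of being applied again, stem and loop lengths drawn this way are geometrically distributed, and nested brackets can never cross.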
• Probabilities of columns
• Given the tree:
A column of non-pairing bases is assumed to be independent of the other columns
Two pairing columns are assumed to be independent of any other columns
• P = (pA, pU, pG, pC): the distribution of bases in loop regions of RNA sequences
• The rate matrix for single bases (4 by 4)
• The rate matrix for base pairs (16 by 16)
• Given a tree, including branch lengths, the column probabilities are calculated using post-order traversal as described by Felsenstein (1981)
Mutations are assumed to be reversible
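The post-order traversal of Felsenstein (1981) can be sketched for a single unpaired column as below. This is an illustration under stated assumptions, not the authors' implementation: the transition probabilities come from a Jukes-Cantor-style model used purely as a stand-in for the estimated rate matrix, the tree encoding (nested lists of `(child, branch_length)` pairs) is invented for the example, and gaps are not handled.

```python
import math

BASES = "AUGC"

def jc_transition(t):
    """Transition probabilities over a branch of length t.

    ASSUMPTION: Jukes-Cantor-style closed form, standing in for
    exp(Rt) with the estimated rate matrix R.
    """
    same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return [[same if a == b else diff for b in range(4)] for a in range(4)]

def partial_likelihoods(node, column):
    """Post-order traversal: P(leaves below node | node has base x), for each x.

    A leaf is a sequence name (str); an internal node is a list of
    (child, branch_length) pairs. `column` maps sequence name -> base.
    """
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        below = partial_likelihoods(child, column)
        p = jc_transition(t)
        for x in range(4):
            result[x] *= sum(p[x][y] * below[y] for y in range(4))
    return result

def column_probability(tree, column, freqs):
    """Weight the root partial likelihoods by the equilibrium frequencies."""
    root = partial_likelihoods(tree, column)
    return sum(f * l for f, l in zip(freqs, root))

if __name__ == "__main__":
    tree = [("a", 0.1), ([("b", 0.2), ("c", 0.3)], 0.05)]
    print(column_probability(tree, {"a": "G", "b": "G", "c": "A"}, [0.25] * 4))
```

A pair of columns would be handled by the same recursion over 16 base-pair states instead of 4 single-base states.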
• Probability of an alignment
The input data: D = (C1, C2, …, Cl), the columns of the alignment
The model: M
The tree: T
Secondary structure: σ
s: a single base
d: the left column of a pair
dc: the right column of the pair
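Under the assumption that unpaired columns, and pairs of columns, are mutually independent given the tree, the probability of the alignment for a fixed structure factorizes over columns. The expression below is reconstructed from that assumption rather than copied from the slide; the index sets $s(\sigma)$ (single-base positions) and $d(\sigma)$ (paired positions, left column $i$ and right column $j$) are notation introduced here.

```latex
P(D \mid \sigma, T, M)
  = \prod_{i \in s(\sigma)} P(C_i \mid T, M)
    \prod_{(i,j) \in d(\sigma)} P(C_i, C_j \mid T, M)
```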
• The core model
• The SCFG
• The evolutionary model
The grammar is equivalent to a grammar that generates columns of an alignment instead of just secondary structure, meaning that for a two-sequence alignment, a production rule covers the following rules:
• The full model
ML estimate of the tree, given the model (used if no phylogenetic tree is available)
MAP (maximum a posteriori) estimate of the secondary structure
by Bayes' theorem
where
P(σ|T,M) is the prior distribution of structures given by the SCFG
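The Bayes step can be written out as follows. This is a reconstruction from the symbols defined on these slides, not the slide's own formula; $\hat{\sigma}_{\mathrm{MAP}}$ is notation introduced here. The denominator does not depend on $\sigma$, so it drops out of the maximization.

```latex
\hat{\sigma}_{\mathrm{MAP}}
  = \arg\max_{\sigma} P(\sigma \mid D, T, M)
  = \arg\max_{\sigma} \frac{P(D \mid \sigma, T, M)\, P(\sigma \mid T, M)}{P(D \mid T, M)}
  = \arg\max_{\sigma} P(D \mid \sigma, T, M)\, P(\sigma \mid T, M)
```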
Implementation
• The database
• The database used for estimating this model should represent RNA structure in general.
• The database should be composed of various types of RNA:
tRNA database by Sprinzl et al. (1998)
large subunit ribosomal RNA (LSU rRNA) database by De Rijk et al. (1998)
• Frequencies
• The single-base frequencies were estimated from counts of the bases in the single-base positions of the sequences.
• Base-pair frequencies were estimated by counting base pairs.
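The counting described above can be sketched as follows, assuming the training data is available as sequences annotated with dot-bracket structures. That input format and the function name `count_frequencies` are assumptions made for the example, and any smoothing or symmetrization the authors may have applied is not modeled.

```python
from collections import Counter

def count_frequencies(annotated):
    """Estimate single-base and base-pair frequencies by counting.

    annotated: iterable of (sequence, dot_bracket) pairs of equal length,
    with balanced brackets. Assumes at least one unpaired base and one
    base pair occur in total (otherwise the division below fails).
    """
    single = Counter()
    pairs = Counter()
    for seq, struct in annotated:
        stack = []  # positions of currently open brackets
        for i, (base, mark) in enumerate(zip(seq, struct)):
            if mark == ".":
                single[base] += 1
            elif mark == "(":
                stack.append(i)
            elif mark == ")":
                j = stack.pop()
                pairs[(seq[j], base)] += 1  # (left base, right base)
    n_single = sum(single.values())
    n_pairs = sum(pairs.values())
    single_freq = {b: c / n_single for b, c in single.items()}
    pair_freq = {p: c / n_pairs for p, c in pairs.items()}
    return single_freq, pair_freq

if __name__ == "__main__":
    data = [("GGAAACC", "((...))"), ("GAAAU", "(...)")]
    print(count_frequencies(data))
```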
• Mutation rates
For a given pair of sequences, P:
tp: the time between the sequences
Np: the number of columns in the two-sequence alignment
Ps: the probability of a base being in a single-base position
• Grammar parameters
estimated by the inside-outside algorithm (an expectation-maximization procedure) on the training set of secondary structures (Baker, 1979; Lari and Young, 1990)
This is analogous to the forward-backward algorithm for HMMs.
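To make the analogy concrete, here is a minimal sketch of the inside pass only (the counterpart of the HMM forward pass); the outside pass and the EM re-estimation step are omitted. It assumes the grammar S → LS | L, F → dFd | LS, L → s | dFd of Knudsen and Hein (1999), works on a single non-empty sequence rather than on alignment columns, and takes rule and emission probabilities as hypothetical inputs.

```python
def inside(seq, rules, single, pair):
    """Inside probability P(seq | grammar), summed over all structures.

    ASSUMED grammar: S -> LS | L,  F -> dFd | LS,  L -> s | dFd.
    rules:  dict of rule probabilities keyed like "S->LS"
    single: dict base -> emission probability of an unpaired base
    pair:   dict (left_base, right_base) -> emission probability of a pair
    """
    n = len(seq)
    # X[i][j] = probability that non-terminal X generates seq[i..j] inclusive
    S = [[0.0] * n for _ in range(n)]
    F = [[0.0] * n for _ in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # shared right-hand side "L S": sum over all split points
            split = sum(L[i][k] * S[k + 1][j] for k in range(i, j))
            # shared right-hand side "d F d": pair i with j around an F
            stem = 0.0
            if length >= 3:
                stem = pair.get((seq[i], seq[j]), 0.0) * F[i + 1][j - 1]
            L[i][j] = rules["L->dFd"] * stem
            if length == 1:
                L[i][j] += rules["L->s"] * single[seq[i]]
            F[i][j] = rules["F->dFd"] * stem + rules["F->LS"] * split
            S[i][j] = rules["S->LS"] * split + rules["S->L"] * L[i][j]
    return S[0][n - 1]

if __name__ == "__main__":
    rules = {"S->LS": 0.7, "S->L": 0.3, "F->dFd": 0.7, "F->LS": 0.3,
             "L->s": 0.9, "L->dFd": 0.1}
    single = {b: 0.25 for b in "AUGC"}
    pair = {("G", "C"): 0.25, ("C", "G"): 0.25, ("A", "U"): 0.25, ("U", "A"): 0.25}
    print(inside("GGAAACC", rules, single, pair))
```

The three nested loops (span length, start position, split point) are what give this kind of dynamic programming its cubic running time in the sequence length.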
Results
• The test sequences
4 bacterial RNase P RNA sequences
alignment: 385 columns
pairwise sequence identities: 65-92%
• Pseudoknot 68-76 and 368-361; 18-12 and 370-364
• At least 22 positions wrongly predicted in each sequence
• Using related sequences
• Weight of results
Using the inside and outside variables, calculate the probability that each position is correctly predicted.
This shows how certain the predictions are, assuming that the model is correct.
• Comparison with other methods
• The energy minimization method has more parameters and better results
• COVE (Eddy and Durbin, 1994) has lower accuracy
• This shows the significance of the method described here in situations where only a few sequences are known.
Conclusion
• Limitations
• Inability to predict pseudoknots.
• Loop and stem lengths are assumed to be geometrically distributed
This follows from the nature of the specific SCFG used here
• A good alignment is needed, which is hard to obtain
• The dynamic programming algorithms are relatively slow. [They have a time complexity of O(N^3) with respect to the length of the alignment.]
• Possible improvements
1. Profile SCFGs and covariance models predict secondary structure at the same time as making alignments
2. Modeling base stacking
3. The evolutionary model
4. Reduce the number of parameters for the rate matrix
Thank you!
Have a nice weekend