rna secondary structure prediction
Post on 08-Jan-2016
RNA Secondary Structure Prediction
Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: xudong@missouri.edu
573-882-7064 (O)
http://digbio.missouri.edu
Final Report
Due on Dec. 8. A numerical score (0-15) will be assigned based on:
Clear formulation of the project (2)
Method (4)
Significant results achieved (4)
Discussion (3)
Writing of the project (2)
Final Presentation
Preferably presented as a PowerPoint file; PDF is fine
Preferably 20 minutes (up to 25 min), plus 5 min for questions
15 points for the presentation (introduction, methods, results, discussion)
15 points for the software demo:
Implementation of the software
Major functionalities
Documentation
Perform a test run
Presentation Evaluation
A numerical score (0-15) will be assigned based on:
Did the student put in enough effort? (3)
Is the work interesting or novel? (3)
Is the method technically sound? (3)
Is the discussion insightful? (3)
Is the presentation clear? (3)
Software Demo
A numerical score (0-15) will be assigned based on:
Whether the program can run using actual biological data (3)
Documentation (3)
Whether it is easy to use (3)
Performance in accuracy (3)
Performance in computing time and memory usage (3)
Outline
RNA Secondary Structure
Comparative Approach
Base-Pair Maximization
Free Energy Minimization
Local Structure Prediction
RNA Types
siRNA: short interfering RNA; miRNA: microRNA; stRNA: small temporal RNA; snoRNA: small nucleolar RNA; snRNA: small nuclear RNA.
Features of RNA
RNA: polymer composed of a combination of four nucleotides:
adenine (A)
cytosine (C)
guanine (G)
uracil (U)
Features of RNA
G-C and A-U form complementary hydrogen bonded base pairs (canonical Watson-Crick)
G-C base pairs are more stable (3 hydrogen bonds); A-U base pairs are less stable (2 hydrogen bonds)
Non-canonical pairs can occur in RNA; the most common is G-U
RNA Pairs
A-U G-C
G-U
RNA Structure Hierarchy
Primary structure: 5’ ACCACCUGCUGA 3’
Secondary structure: (figure)
Tertiary structure: (figure)
Secondary Structure Categories
Pseudoknots
Hairpin loop
Internal loop
Bulge loop
Stem
tRNA structure
Circular Representation
Assumptions in Secondary Structure Prediction
Most likely structure similar to energetically most stable structure
Energy associated with any position is only influenced by local sequence and structure
Structure formed does not produce pseudoknots
Exceptions
Pseudoknot, kissing hairpins, hairpin-bulge contacts
These do not obey the “parentheses rule”
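The "parentheses rule" can be checked mechanically: a structure is pseudoknot-free when no two base pairs interleave. The following is a minimal sketch, assuming pairs are given as 0-based index tuples and that each base takes part in at most one pair; the function name `has_crossing_pairs` is hypothetical.

```python
def has_crossing_pairs(pairs):
    """Return True if any two pairs (i, j) and (k, l) interleave as i < k < j < l.

    Interleaved pairs cannot be written with one kind of balanced
    parentheses, which is what the "parentheses rule" forbids.
    """
    # Normalise each pair so the smaller index comes first, then sort by it.
    norm = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for idx, (i, j) in enumerate(norm):
        for k, l in norm[idx + 1:]:
            if i < k < j < l:
                return True
    return False


nested = [(0, 9), (1, 8)]        # ((......)) style nesting
side_by_side = [(0, 3), (4, 7)]  # two separate hairpins
knotted = [(0, 5), (3, 8)]       # the second pair opens inside the first and closes outside
```

Here `has_crossing_pairs(knotted)` is true, while the nested and side-by-side examples pass the check.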
Inferring Structure By Comparative Sequence Analysis
First step is to calculate a multiple sequence alignment
Requires sequences be similar enough so that they can be initially aligned
Sequences should be dissimilar enough for correlated mutation to be detected
Mutual Information
f_{x_i}: frequency of base x_i in column i
f_{x_i x_j}: joint (pairwise) frequency of the base pair (x_i, x_j) in columns i and j
Information ranges from 0 to 2 bits
If i and j are uncorrelated, mutual information is 0
\[
M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}
\]
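As a rough illustration of the mutual information score, the sketch below computes it for two alignment columns passed as equal-length strings (one character per sequence). The function name `mutual_information` is hypothetical, and gap characters are not treated specially, which is a simplifying assumption.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    f_i = Counter(col_i)             # single-column base counts for column i
    f_j = Counter(col_j)             # single-column base counts for column j
    f_ij = Counter(zip(col_i, col_j))  # joint counts of (base in i, base in j)
    total = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        total += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return total


covarying = mutual_information("GCAU", "CGUA")     # every base change is compensated
independent = mutual_information("GGCC", "AUAU")   # no correlation between columns
```

With four equally frequent, perfectly compensated pairs the score reaches the 2-bit maximum; for the uncorrelated columns it is 0.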
Mutual Information Plot
Dot Plot
Base-Pair Maximization
Find structure with the most base pairs
Efficient dynamic programming approach to this problem introduced by Nussinov (1970s).
Four ways to get the best structure between position i and j from the best structures of the smaller subsequences
Nussinov Algorithm
1) Add the i,j pair onto the best structure found for subsequence i+1…j-1
2) Add unpaired position i onto the best structure for subsequence i+1…j
3) Add unpaired position j onto the best structure for subsequence i…j-1
4) Combine two optimal structures for i…k and k+1…j
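Notation:
e(ri,rj): score of a base pair joining ri and rj (the magnitude of its stabilizing free energy; in the worked example, 3 for G-C, 2 for A-U, 1 for G-U, 0 otherwise)
S(i,j): optimal (maximum) score for segment ri…rj; the corresponding free energy is -S(i,j)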
Dynamic Programming - 1
1. i is unpaired, added on to a structure for i+1…j:
S(i,j) = S(i+1,j)
2. j is unpaired, added on to a structure for i…j-1:
S(i,j) = S(i,j-1)
Dynamic Programming - 2
3. i and j are paired with each other, added on to a structure for i+1…j-1:
S(i,j) = S(i+1,j-1) + e(ri,rj)
4. i and j are both paired, but not to each other; the structure for i…j adds together the structures for two subregions, i…k and k+1…j:
S(i,j) = max {S(i,k) + S(k+1,j)} over i<k<j
Dynamic Programming - 3
Since there are only four cases, the optimal score S(i,j) is just the maximum of the four possibilities:
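\[
S(i,j) = \max
\begin{cases}
S(i+1,j) & r_i \text{ unpaired} \\
S(i,j-1) & r_j \text{ unpaired} \\
S(i+1,j-1) + e(r_i,r_j) & i,j \text{ base pair} \\
\max_{i<k<j} \{ S(i,k) + S(k+1,j) \} & i,j \text{ paired, but not to each other}
\end{cases}
\]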
Dynamic Programming - 4
Initialisation:
No close base pairs: S(i,j) = 0 whenever j - i < 4. Rows are indexed by i, columns by j.
   A  U  A  C  C  C  U  G  U  G  G  U  A  U
A  0  0  0  0
U     0  0  0  0
A        0  0  0  0
C           0  0  0  0
C              0  0  0  0
C                 0  0  0  0
U                    0  0  0  0
G                       0  0  0  0
U                          0  0  0  0
G                             0  0  0  0
G                                0  0  0  0
U                                   0  0  0
A                                      0  0
U                                         0
Propagation:
   A  U  A  C  C  C  U  G  U  G  G  U  A  U
A  0  0  0  0  0
U     0  0  0  0  0
A        0  0  0  0  2
C           0  0  0  0  3
C              0  0  0  0  0
C                 0  0  0  0  3
U                    0  0  0  0  1
G                       0  0  0  0  1
U                          0  0  0  0  2
G                             0  0  0  0  1
G                                0  0  0  0
U                                   0  0  0
A                                      0  0
U                                         0
C5…U9:
C5 unpaired: S(6,9) = 0
U9 unpaired: S(5,8) = 0
C5-U9 paired: S(6,8) + e(C,U) = 0
C5 paired, U9 paired: S(5,6)+S(7,9) = 0; S(5,7)+S(8,9) = 0
Propagation:
   A  U  A  C  C  C  U  G  U  G  G  U  A  U
A  0  0  0  0  0  0  2
U     0  0  0  0  0  2  3
A        0  0  0  0  2  3  5
C           0  0  0  0  3  3  3
C              0  0  0  0  0  3  6
C                 0  0  0  0  3  3
U                    0  0  0  0  1  1
G                       0  0  0  0  1  2
U                          0  0  0  0  2  2
G                             0  0  0  0  1
G                                0  0  0  0
U                                   0  0  0
A                                      0  0
U                                         0
C5…G11:
C5 unpaired: S(6,11) = 3
G11 unpaired: S(5,10) = 3
C5-G11 paired: S(6,10) + e(C,G) = 6
C5 paired, G11 paired: S(5,6)+S(7,11) = 1; S(5,7)+S(8,11) = 0; S(5,8)+S(9,11) = 0; S(5,9)+S(10,11) = 0
Propagation:
   A  U  A  C  C  C  U  G  U  G  G  U  A  U
A  0  0  0  0  0  0  2  3  5  6  6  8 10 12
U     0  0  0  0  0  2  3  5  6  6  8 10 10
A        0  0  0  0  2  3  5  5  6  8  8  8
C           0  0  0  0  3  3  3  6  6  6  6
C              0  0  0  0  0  3  6  6  6  6
C                 0  0  0  0  3  3  3  3  3
U                    0  0  0  0  1  1  3  3
G                       0  0  0  0  1  2  3
U                          0  0  0  0  2  2
G                             0  0  0  0  1
G                                0  0  0  0
U                                   0  0  0
A                                      0  0
U                                         0
Traceback:
Starting from S(1,14) = 12 in the completed matrix, retrace which of the four cases produced each maximum to recover the base pairs.
Final Prediction
5' A - U 3'
   U - A
   A - U
   C - G
   C - G
   C   U
   U   G
AUACCCUGUGGUAU
Total free energy: -12 kcal/mol
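The fill and traceback steps of the worked example can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's own code: the pair scores (3 for G-C, 2 for A-U, 1 for G-U) and the minimum of three unpaired bases in a hairpin are inferred from the example matrices, and the names `fill`, `traceback` and `dot_bracket` are hypothetical. Indices are 0-based, so S(1,14) in the slides is `S[0][13]` here.

```python
# Pair scores inferred from the worked example (an assumption, not measured energies).
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}
MIN_LOOP = 3  # assumption: at least 3 unpaired bases inside a hairpin loop


def fill(seq):
    """Return matrix S where S[i][j] is the best score for seq[i..j]."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j], S[i][j - 1])        # cases 1 and 2: i or j unpaired
            e = PAIR_SCORE.get((seq[i], seq[j]), 0)
            if e:
                best = max(best, S[i + 1][j - 1] + e)    # case 3: i pairs with j
            for k in range(i + 1, j):                    # case 4: split into two parts
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S


def traceback(S, seq):
    """Recover one optimal set of base pairs, preferring the i-j pair on ties."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        e = PAIR_SCORE.get((seq[i], seq[j]), 0)
        if e and S[i][j] == S[i + 1][j - 1] + e:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        elif S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))
        else:
            for k in range(i + 1, j):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


def dot_bracket(n, pairs):
    """Render pairs as a dot-bracket string of length n."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


seq = "AUACCCUGUGGUAU"
S = fill(seq)
pairs = traceback(S, seq)
structure = dot_bracket(len(seq), pairs)
print(S[0][-1], structure)
```

For the example sequence this yields a score of 12 and the five-pair hairpin shown above. Other structures with the same score exist; which one the traceback returns depends on the order in which the four cases are tried.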
Some Notes
Computational complexity: O(N^3)
Does not work with pseudoknots (they would invalidate the DP algorithm)
Methods that include pseudoknots: Rivas and Eddy, JMB 285, 2053 (1999)
These methods are at least O(N^6)
Energy Minimization Methods
RNA folding is determined by biophysical properties
The energy minimization algorithm predicts the secondary structure by minimizing the free energy (ΔG)
ΔG is calculated as the sum of individual contributions of: loops, base pairs, and other secondary structure elements
Energies of stems calculated as stacking contributions between neighboring base pairs
Example for Thermodynamic Parameters
Calculating Best Structure
The sequence is compared against itself using a dynamic programming approach similar to the one used for the maximum base-paired structure
Instead of a simple base-pair count, the score is based upon the free energy values
Gaps represent some form of a loop
The most widely used software that incorporates this minimum free energy algorithm is MFOLD.
How well do they perform?
Current RNA folding programs get about 60-70% of base pairs correct, on average: useful, but not yet good.
The problem is the scoring system: thermodynamic model is accurate within 5-10%, and many alternative structures are within 10%.
Possible solution: combination of thermodynamic score with comparative sequence information
RNA Motif in HIV
TAR motif:
Transactivating Response Element
RNA Motifs Associated with Transcription termination
A Rho-independent terminator stops the transcription process via its hairpin structure
Algorithm in Rnall
Definition 1. A “match”: canonical base pair
Definition 2. A “mismatch”: non-canonical base pair
Definition 3. An “insertion”/“deletion”: nucleotide unpaired
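The three definitions can be read as a simple classification of one aligned position. The sketch below follows them literally, taking only the Watson-Crick pairs (A-U, G-C) as canonical, so G-U counts as a mismatch; it is not Rnall's actual code, and the function name `classify_position` is hypothetical.

```python
# Canonical (Watson-Crick) pairs as listed earlier in the slides.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def classify_position(x, y):
    """Classify one aligned position; None stands for an unpaired nucleotide."""
    if x is None or y is None:
        return "insertion/deletion"
    return "match" if (x, y) in CANONICAL else "mismatch"


labels = [classify_position("G", "C"),
          classify_position("G", "U"),
          classify_position("A", None)]
```

Here `labels` holds one label per definition, in order: a match, a mismatch, and an insertion/deletion.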
RNA LSS in HIV
TAR (30)
PolyA (82)
DIS (260)
SD (292)
PSI (319)
Some RNA Resources
Comparative RNA web site: http://www.rna.icmb.utexas.edu/
RNA world: http://www.imb-jena.de/RNA.html
RNA page by Michael Zuker: http://www.bioinfo.rpi.edu/~zukerm/rna/
RNA structure database: http://www.rnabase.org/
Nucleic acid database: http://ndbserver.rutgers.edu/
Non-canonical base pairs: http://prion.bchs.uh.edu/bp_type/
RNA structure classification: http://scor.berkeley.edu/
RNA visualisation: http://ndbserver.rutgers.edu/services/download/index.html#rnaview and http://rutchem.rutgers.edu/~xiangjun/3DNA/
Reading Assignments
Suggested reading: Chapter 14 in “Current Topics in Computational Molecular Biology”, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press. 2002.
Optional reading: http://www.bioinfo.rpi.edu/~zukerm/seqanal/mfold-3.0-manual.pdf