RNA Secondary Structure Prediction
RNA structure prediction methods
Base-Pair Maximization
Context-Free Grammar Parsing.
Free Energy Methods
Covariance Models
The Nussinov-Jacobson Algorithm
Sequence (positions 1 to 9): A C A G U U G C A
q = 9

| i \ j | 1 (A) | 2 (C) | 3 (A) | 4 (G) | 5 (U) | 6 (U) | 7 (G) | 8 (C) | 9 (A) |
|---|---|---|---|---|---|---|---|---|---|
| 1 (A) | 0 | 0 | 0 | 1 | 2 | 2 | 2 | 3 |   |
| 2 (C) | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 2 | 3 |
| 3 (A) |   | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 3 |
| 4 (G) |   |   | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 5 (U) |   |   |   | 0 | 0 | 0 | 0 | 1 | 2 |
| 6 (U) |   |   |   |   | 0 | 0 | 0 | 1 | 2 |
| 7 (G) |   |   |   |   |   | 0 | 0 | 1 | 1 |
| 8 (C) |   |   |   |   |   |   | 0 | 0 | 0 |
| 9 (A) |   |   |   |   |   |   |   | 0 | 0 |
SCFG Version
• The Nussinov algorithm can be converted to a stochastic context-free grammar:
• S → W
• W → aW | cW | gW | uW
• W → Wa | Wc | Wg | Wu
• W → aWu | cWg | uWa | gWc
• W → WW
SCFGs
• Stochastic Context Free Grammars (SCFGs) have also been used to model RNA secondary structure
• Examples:
– tRNAscan-SE
– a program created to find snoRNAs
• Grammars are created by using a training set of data, and then the grammars are applied to potential sequences to see if they fit into the language
SCFGs
• SCFGs allow the detection of sequences belonging to a family
– tRNAs
– group I introns
– snoRNAs
– snRNAs
SCFGs
• Any RNA structure can be reduced to an SCFG (see Durbin et al., pp. 278-279)
Transformational Grammars
• First described by linguist Noam Chomsky in the 1950s.
– (Yes, the same Noam Chomsky who has expressed various dissident political views throughout the years!)
13 June 2006
Transformational Grammars
• Very important in computer science, most notably in compiler design
• Covered in detail in compiler and automata theory classes
Transformational Grammars
• Idea: take a set of outputs (sentence, RNA structure) and determine if it can be produced using a set of rules
• Consist of a set of symbols and production rules
• The symbols can be terminal (emitting) symbols or non-terminal symbols
Grammar for Palindromes
• Consider palindromic DNA sequences
• Five possible terminal symbols: {a, c, g, t, ε} (ε represents the blank terminal symbol)
Grammar for Palindromes
• Production rules, where S and W are non-terminal symbols:
• S → W
• W → aWa | cWc | gWg | tWt
• W → a | c | g | t | ε
Derivation of Sequences
• Using these production rules, a derivation of the palindromic sequence acttgttca follows:
• S ⇒ W ⇒ aWa ⇒ acWca ⇒ actWtca ⇒ acttWttca ⇒ acttgttca
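The derivation above can be generated mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `derive_palindrome` is hypothetical. It applies W → xWx once for each outer letter pair and finishes with W → x or W → ε.

```python
def derive_palindrome(seq):
    """Return the sentential forms S, W, ..., seq of a derivation, or None.

    Uses the palindrome grammar: S -> W, W -> xWx, W -> x | epsilon,
    where x is one of a, c, g, t.
    """
    if not set(seq) <= set("acgt") or seq != seq[::-1]:
        return None  # not in the language of the palindrome grammar
    half = len(seq) // 2
    steps = ["S", "W"]
    for n in range(1, half + 1):
        # After n applications of W -> xWx the first n letters sit on the
        # left of W and their mirror image sits on the right.
        steps.append(seq[:n] + "W" + seq[:n][::-1])
    # Final step: W -> middle letter (odd length) or W -> epsilon (even length).
    steps.append(seq)
    return steps


if __name__ == "__main__":
    print(" => ".join(derive_palindrome("acttgttca")))
```

For the slide's sequence this reproduces the seven sentential forms listed in the derivation.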
SCFGs for RNA
• Base-paired columns are modeled by pairwise-emitting nonterminals
– aWu; uWa; gWc; cWg; ...
• Single-stranded columns are modeled by leftwise-emitting nonterminals (when possible)
– aW; cW; gW; uW; ...
Parse Trees
• A context-free grammar can be aligned to a sequence using a parse tree
• Root of the tree is the non-terminal start symbol, S
• Leaves are terminal symbols
• Internal nodes are the nonterminals
• Leaves can be parsed from left to right to view the results of production
Parse Tree
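[Figure: parse tree for the palindrome acttgttca. The root is S, followed by a chain of five nested W nodes; the leaves, read left to right, are a c t t g t t c a.]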
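To make the tree concrete, here is a small Python sketch (an illustration only; the nested-tuple representation and the names `palindrome_tree`, `leaves`, and `count_symbol` are assumptions, not from the slides). It builds the parse tree for a palindrome and reads the leaves from left to right to recover the sequence.

```python
def palindrome_tree(seq):
    """Build a parse tree as nested tuples: (symbol, child, child, ...).

    Leaves are plain strings (terminals, or "" for epsilon).
    Assumes seq is a palindrome over a, c, g, t.
    """
    def w(s):
        if len(s) <= 1:
            return ("W", s)                    # W -> x  or  W -> epsilon
        if s[0] != s[-1]:
            raise ValueError("not a palindrome")
        return ("W", s[0], w(s[1:-1]), s[-1])  # W -> xWx
    return ("S", w(seq))                       # S -> W


def leaves(node):
    """Concatenate the terminal leaves of a tree from left to right."""
    if isinstance(node, str):
        return node
    return "".join(leaves(child) for child in node[1:])


def count_symbol(node, symbol):
    """Count internal nodes labelled with the given nonterminal."""
    if isinstance(node, str):
        return 0
    return (node[0] == symbol) + sum(count_symbol(c, symbol) for c in node[1:])


if __name__ == "__main__":
    tree = palindrome_tree("acttgttca")
    print(leaves(tree), count_symbol(tree, "W"))
```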
CYK (Cocke-Younger-Kasami)
Parsing Algorithm
Seyed Mohammad Hossein Moattar
Natural Language Processing
Amirkabir University of Technology
Faculty of Computer Engineering
Parsing Algorithms
• CFGs are the basis for describing the (syntactic) structure of NL sentences
• Thus, parsing algorithms are the core of NL analysis systems
• Recognition vs. parsing:
– Recognition: deciding membership in the language
– Parsing: recognition plus producing a parse tree for the input
• Is parsing more “difficult” than recognition? (time complexity)
• Ambiguity: an input may have exponentially many parses
CYK (Cocke-Younger-Kasami)
• One of the earliest recognition and parsing algorithms
• The standard version of CYK operates only on context-free grammars given in Chomsky Normal Form (CNF).
• It is also possible to extend the CYK algorithm to handle some grammars which are not in CNF
– Harder to understand
• Based on a “dynamic programming” approach:
– Build solutions compositionally from sub-solutions
– Store sub-solutions and re-use them whenever necessary
• Recognition version: decide whether S ⇒* w
CYK Algorithm
• The CYK algorithm for the membership problem is as follows:
– Let the input string be a sequence of n letters a1 ... an.
– Let the grammar contain r nonterminal symbols R1 ... Rr, and let R1 be the start symbol.
– Let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
– For each i = 1 to n
  • For each unit production Rj -> ai, set P[i,1,j] = true.
– For each i = 2 to n -- Length of span
  • For each j = 1 to n-i+1 -- Start of span
    – For each k = 1 to i-1 -- Partition of span
      » For each production RA -> RB RC
      » If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
– If P[1,n,1] is true
  • Then string is member of language
  • Else string is not member of language
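The steps above translate almost line for line into code. The sketch below is one possible Python rendering, assuming a grammar already in CNF and given as two lists of rules; the function name `cyk_member` and the small grammar for strings of the form a^n b^n are made up for illustration and do not come from the slides.

```python
def cyk_member(word, unit_rules, binary_rules, nonterminals, start):
    """Boolean-array CYK. P[j][i][a] is True when the span that starts at
    position j (1-based) and has length i can be derived from nonterminal a."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar without an epsilon rule derives no empty string
    idx = {sym: k for k, sym in enumerate(nonterminals)}
    r = len(nonterminals)
    P = [[[False] * r for _ in range(n + 1)] for _ in range(n + 2)]

    for i in range(1, n + 1):                      # spans of length 1
        for lhs, terminal in unit_rules:
            if terminal == word[i - 1]:
                P[i][1][idx[lhs]] = True

    for i in range(2, n + 1):                      # length of span
        for j in range(1, n - i + 2):              # start of span
            for k in range(1, i):                  # partition of span
                for a, b, c in binary_rules:       # production a -> b c
                    if P[j][k][idx[b]] and P[j + k][i - k][idx[c]]:
                        P[j][i][idx[a]] = True

    return P[1][n][idx[start]]


# Hypothetical CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
NONTERMINALS = ["S", "T", "A", "B"]
UNIT_RULES = [("A", "a"), ("B", "b")]
BINARY_RULES = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]

if __name__ == "__main__":
    for w in ["ab", "aabb", "aab", "ba"]:
        print(w, cyk_member(w, UNIT_RULES, BINARY_RULES, NONTERMINALS, "S"))
```

The three nested loops over span length, start, and partition give the O(n³) running time (times the number of productions).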
CYK Pseudocode
On input x = x1x2 … xn :
for (i = 1 to n) // create middle diagonal
  for (each var. A)
    if (A → xi)
      add A to table[i-1][i]
for (d = 2 to n) // d'th diagonal: substrings of length d
  for (i = 0 to n-d) // substring runs from position i to i+d
    for (k = i+1 to i+d-1) // split point
      for (each var. A)
        for (each var. B in table[i][k])
          for (each var. C in table[k][i+d])
            if (A → BC)
              add A to table[i][i+d]
return S ∈ table[0][n] ? ACCEPT : REJECT
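A runnable Python version of this pseudocode might look as follows. This is a sketch, not the author's code: `cyk_table` and `accepts` are hypothetical names, each table cell is a Python set, and the loop over variables A, B, C is replaced by an equivalent loop over the binary productions. The grammar is the one used in the aaabbb example on the following slides (S → ε | AB | XB, T → AB | XB, X → AT, A → a, B → b), with the ε rule handled as a special case.

```python
def cyk_table(x, unit_rules, binary_rules):
    """Fill table[i][j] with the set of variables that derive x[i:j]."""
    n = len(x)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                    # middle diagonal: length 1
        for a, terminal in unit_rules:
            if terminal == x[i - 1]:
                table[i - 1][i].add(a)
    for d in range(2, n + 1):                    # d'th diagonal: length d
        for i in range(0, n - d + 1):            # substring x[i:i+d]
            for k in range(i + 1, i + d):        # split into x[i:k] and x[k:i+d]
                for a, b, c in binary_rules:     # production a -> b c
                    if b in table[i][k] and c in table[k][i + d]:
                        table[i][i + d].add(a)
    return table


# Grammar from the aaabbb example (the S -> epsilon rule is handled in accepts).
UNIT = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "X", "B"),
          ("T", "A", "B"), ("T", "X", "B"),
          ("X", "A", "T")]


def accepts(x, start="S"):
    if x == "":
        return True                              # S -> epsilon
    return start in cyk_table(x, UNIT, BINARY)[0][len(x)]


if __name__ == "__main__":
    print("ACCEPT" if accepts("aaabbb") else "REJECT")
```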
CYK Algorithm
• This algorithm considers every possible consecutive subsequence of the sequence of letters and sets P[i,j,k] to true if the subsequence starting at position i with length j can be generated from Rk.
• Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on.
• For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two non-empty parts, and checks whether there is some production P -> Q R such that Q matches the first part and R matches the second part. If so, it records P as matching the whole subsequence.
• Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire string is matched by the start symbol.
CYK Algorithm for Deciding Context-Free Languages
Q: Consider the grammar G given by
S → ε | AB | XB
T → AB | XB
X → AT
A → a
B → b
1. Is x = aaabbb in L(G)?
CYK Algorithm for Deciding Context-Free Languages
Now look at aaabbb, working bottom-up with the grammar
S → ε | AB | XB
T → AB | XB
X → AT
A → a
B → b
1) Write variables for all length 1 substrings: a a a b b b gives A A A B B B.
2) Write variables for all length 2 substrings: ab gives S,T.
3) Write variables for all length 3 substrings: aab gives X.
4) Write variables for all length 4 substrings: aabb gives S,T.
5) Write variables for all length 5 substrings: aaabb gives X.
6) Write variables for all length 6 substrings: aaabbb gives S,T.
S is included so aaabbb accepted!
CYK Algorithm for Deciding Context-Free Languages
Can also use a table for same purpose. Rows give the start position (0 to 5) and columns the end position (1 to 6) of each substring of aaabbb. The table is filled one substring length at a time: first the variables for length 1 substrings (the diagonal), then length 2, and so on up to length 6 (the top-right cell).

| start at \ end at | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 0 | A | - | - | - | X | S,T |
| 1 |   | A | - | X | S,T | - |
| 2 |   |   | A | S,T | - | - |
| 3 |   |   |   | B | - | - |
| 4 |   |   |   |   | B | - |
| 5 |   |   |   |   |   | B |

S appears in the cell for the whole string (start at 0, end at 6), so aaabbb is ACCEPTED!
Parsing results
• We keep the results for every substring wij in a table.
• Note that we only need to fill in entries up to the diagonal: the longest substring starting at i is of length n-i+1
Constructing parse tree
• We need to construct parse trees for string w
• Idea:
– Keep back-pointers to the table entries that we combine
– At the end, reconstruct a parse from the back-pointers
• This allows us to find all parse trees
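The back-pointer idea can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the names `cyk_parse`, `UNIT`, and `BINARY` are hypothetical, the grammar is the aaabbb example grammar without its ε rule, and only the first back-pointer found for each (span, variable) is kept, so a single parse tree is returned. Storing a list of back-pointers per entry instead would allow all parse trees to be enumerated.

```python
# Example grammar from the aaabbb slides (epsilon rule omitted).
UNIT = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "X", "B"),
          ("T", "A", "B"), ("T", "X", "B"),
          ("X", "A", "T")]


def cyk_parse(x, unit_rules, binary_rules, start="S"):
    """Return one parse tree of x as nested tuples, or None if x is rejected."""
    n = len(x)
    # back[(i, j, A)] is the terminal for length-1 spans, otherwise the
    # split point and the two child variables: (k, B, C).
    back = {}
    for i in range(n):
        for a, terminal in unit_rules:
            if terminal == x[i]:
                back[(i, i + 1, a)] = terminal
    for d in range(2, n + 1):
        for i in range(0, n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for a, b, c in binary_rules:
                    if ((i, k, b) in back and (k, j, c) in back
                            and (i, j, a) not in back):
                        back[(i, j, a)] = (k, b, c)

    def build(i, j, a):
        entry = back[(i, j, a)]
        if isinstance(entry, str):
            return (a, entry)                      # leaf: A -> terminal
        k, b, c = entry
        return (a, build(i, k, b), build(k, j, c))

    if (0, n, start) not in back:
        return None
    return build(0, n, start)


if __name__ == "__main__":
    print(cyk_parse("aaabbb", UNIT, BINARY))
```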
References
• Hopcroft and Ullman, “Intro. to Automata Theory, Lang. and Comp.”, Section 6.3, pp. 139-141
• “CYK algorithm”, Wikipedia, the free encyclopedia
• A presentation by Zeph Grunschlag
The Nussinov-Jacobson Algorithm
[Figure: the dynamic-programming matrix for A C A G U U G C A (positions 1 to 9) shown again, with the subsequence from i to j split between positions q-1 and q, where i < q ≤ j.]
A U C A U G G C A U
• Co-terminus foldings:
• Partitionable foldings:
A C A G U U G C A
1 2 3 4 5 6 7 8 9
Another way to write the Nussinov-Jacobson recursion
• Initialization:
$$\gamma(i, i-1) = 0 \quad \text{for } i = 2 \text{ to } L; \qquad \gamma(i, i) = 0 \quad \text{for } i = 1 \text{ to } L$$
• Recursion:
$$\gamma(i,j) = \max \begin{cases} \gamma(i+1,\, j); \\ \gamma(i,\, j-1); \\ \gamma(i+1,\, j-1) + \mathrm{BasePairScore}(i,j); \\ \max_{i<k<j} \left[ \gamma(i,k) + \gamma(k+1,\, j) \right]. \end{cases}$$
The first two cases are two special cases of partitionable folding, the third case is the co-terminus folding, and the fourth case is the general partitionable folding.
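As a concrete illustration of this recursion, here is a minimal Python sketch of the fill stage. It rests on assumptions that the slides do not state explicitly: BasePairScore is taken to be 1 for the Watson-Crick pairs A-U and G-C and 0 otherwise, G-U pairs are not scored, and no minimum hairpin loop length is enforced. The function name `nussinov` is hypothetical. Under these assumptions the sketch reproduces the entries of the matrix shown earlier for ACAGUUGCA.

```python
WATSON_CRICK = frozenset({("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")})


def nussinov(seq, pairs=WATSON_CRICK):
    """Return the matrix g, where g[i][j] (0-based, inclusive) is the maximum
    number of nested base pairs in seq[i..j]."""
    L = len(seq)
    g = [[0] * L for _ in range(L)]        # gamma(i, i) = 0; gamma(i, i-1) is treated as 0
    for span in range(1, L):               # fill diagonals of increasing length
        for i in range(L - span):
            j = i + span
            delta = 1 if (seq[i], seq[j]) in pairs else 0
            inner = g[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = max(g[i + 1][j],        # i unpaired
                       g[i][j - 1],        # j unpaired
                       inner + delta)      # i and j paired (co-terminus)
            for k in range(i + 1, j):      # bifurcation (partitionable)
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g


if __name__ == "__main__":
    g = nussinov("ACAGUUGCA")
    print(g[0][8])  # score for the whole sequence, gamma(1, 9)
```

With this scoring, the value for the whole sequence, γ(1,9), comes out as 4, reached through the bifurcation γ(1,5) + γ(6,9) = 2 + 2.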
SCFG version of the Nussinov-Jacobson algorithm
• Stochastic Context-Free Grammars
• Makes use of production rules:
– W → aW | cW | gW | uW (i unpaired)
• Every production rule has an associated probability parameter.
• The maximum probability parse is equivalent to the maximum probability secondary structure.
SCFG Version of Nussinov-Jacobson Algorithm
• The algorithm can be converted to a stochastic context-free grammar:
• S → W
• W → aW | cW | gW | uW
• W → Wa | Wc | Wg | Wu
• W → aWu | cWg | uWa | gWc
• W → WW
Needed terminology
• The inside-outside (recursive dynamic programming) algorithm for SCFGs in Chomsky normal form is the natural counterpart of the forward-backward algorithm for HMMs.
• The best-path variant of the inside-outside algorithm is the Cocke-Younger-Kasami (CYK) algorithm. It finds the maximum-probability alignment of the SCFG to the sequence.
CYK for Nussinov-style RNA SCFG
• Initialization:
$$\gamma(i, i-1) = -\infty \quad \text{for } i = 2 \text{ to } L; \qquad \gamma(i, i) = \max \begin{cases} \log p(x_i W) \\ \log p(W x_i) \end{cases} \quad \text{for } i = 1 \text{ to } L$$
• Recursion:
$$\gamma(i,j) = \max \begin{cases} \gamma(i+1,\, j) + \log p(x_i W); \\ \gamma(i,\, j-1) + \log p(W x_j); \\ \gamma(i+1,\, j-1) + \log p(x_i W x_j); \\ \max_{i<k<j} \left[ \gamma(i,k) + \gamma(k+1,\, j) + \log p(WW) \right]. \end{cases}$$
The log p terms are the addition to the fill stage of the Nussinov algorithm. The principal difference is that the SCFG description is a probabilistic model.
As before, the first two cases are two special cases of partitionable folding, the third case is the co-terminus folding, and the fourth case is the general partitionable folding.
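The probabilistic fill stage can be sketched in Python as below. This is only an illustration: the function name `cyk_scfg` is hypothetical and the probability values are invented placeholders (chosen so that the W productions sum to 1), not trained parameters. Note that, with γ(i, i-1) = -∞ as in the initialization, two adjacent bases cannot pair in this model.

```python
import math

NEG_INF = -math.inf


def _log(p):
    """log p, with log 0 defined as -infinity."""
    return math.log(p) if p > 0 else NEG_INF


def cyk_scfg(seq, p_left, p_right, p_pair, p_bif):
    """Return g, where g[i][j] (0-based, inclusive) is the best log-probability
    with which W derives seq[i..j] under a Nussinov-style SCFG.

    p_left[x]   : probability of W -> x W
    p_right[x]  : probability of W -> W x
    p_pair[x,y] : probability of W -> x W y
    p_bif       : probability of W -> W W
    """
    L = len(seq)
    g = [[NEG_INF] * L for _ in range(L)]
    for i in range(L):                                   # initialization
        g[i][i] = max(_log(p_left[seq[i]]), _log(p_right[seq[i]]))
    for span in range(1, L):
        for i in range(L - span):
            j = i + span
            inner = g[i + 1][j - 1] if i + 1 <= j - 1 else NEG_INF
            best = max(
                g[i + 1][j] + _log(p_left[seq[i]]),      # i unpaired
                g[i][j - 1] + _log(p_right[seq[j]]),     # j unpaired
                inner + _log(p_pair.get((seq[i], seq[j]), 0.0)),  # i, j paired
            )
            for k in range(i + 1, j):                    # bifurcation
                best = max(best, g[i][k] + g[k + 1][j] + _log(p_bif))
            g[i][j] = best
    return g


# Invented placeholder parameters: 4 * 0.1 + 4 * 0.05 + 4 * 0.075 + 0.1 = 1.0
P_LEFT = {b: 0.1 for b in "ACGU"}
P_RIGHT = {b: 0.05 for b in "ACGU"}
P_PAIR = {("A", "U"): 0.075, ("U", "A"): 0.075,
          ("G", "C"): 0.075, ("C", "G"): 0.075}
P_BIF = 0.1

if __name__ == "__main__":
    g = cyk_scfg("AGU", P_LEFT, P_RIGHT, P_PAIR, P_BIF)
    print(g[0][2])  # best log-probability for the whole sequence
```

For the toy input AGU the best parse pairs A with U around the single G, because 0.075 × 0.1 exceeds the probability of any all-unpaired parse under these placeholder values.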
CYK for Nussinov-style RNA SCFG (2)
• The value $\gamma(1, L)$ is the log likelihood $\log P(x, \hat{\pi} \mid \theta)$ of the optimal structure $\hat{\pi}$ given the SCFG model $\theta$
• The traceback to find the secondary structure corresponding to the best score is performed analogously to the traceback in the Nussinov algorithm
Example of RNA Structure SCFG
• The RNA structure produced by MFOLD for the following sequence (5’ to 3’) can be constructed with the grammar:
• GCUUACGACCAUAUCACGUUGAAUGCAC
GCCAUCCCGUCCGAUCUGGCAAGUUAAG
CAACGUUGAGUCCAGUUAGUACUUGGAU
CGGAGACGGCCUGGGAAUCCUGGAUGU
UGUAAGCU
Example Construction
• S
• W
• Wu
• gWcu
• gcWgcu
• gcuWagcu
• gcuuWaagcu
• gcuuaWuaagcu
• gcuuacWguaagcu
• gcuuacgWuguaagcu
• gcuuacgaWuuguaagcu
• gcuuacgacWguuguaagcu
• gcuuacgaccWguuguaagcu
• gcuuacgaccaWguuguaagcu....
CYK for Nussinov-style RNA SCFG
• Good starting example, but it is too simple to be an accurate RNA folder
• The algorithm does not consider important structural features like preferences for certain:
– Loop lengths
– Nearest neighbours in the structure, caused by stacking interactions between neighbouring base pairs in a stem.