Structural Alignment of Pseudo-knotted RNA

Upload: maurilio-nihill

Post on 01-Jan-2016



TRANSCRIPT

Page 1: Structural Alignment of Pseudo-knotted RNA

Structural Alignment of Pseudo-knotted RNA

Page 2: Structural Alignment of Pseudo-knotted RNA

RNA pseudo-knotted structures

The RNA alignment problem has been solved for RNAs with a regular structure, i.e. non-pseudo-knotted structures.

Regular structure:

All base pairs are non-crossing.

Pseudo-knotted structure:

Some of the base pairs are crossing.
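
The distinction between regular and pseudo-knotted structures can be checked mechanically. The following is a minimal sketch, assuming a structure is given as a list of base pairs (a, b) of sequence positions; the function names `crossing_pairs` and `is_regular` are illustrative and do not come from the slides.

```python
from itertools import combinations

def crossing_pairs(base_pairs):
    """Return every two base pairs (a, b), (c, d) with a < c < b < d."""
    # Normalise each pair so the smaller position comes first, then sort.
    normalized = sorted(tuple(sorted(p)) for p in base_pairs)
    crossings = []
    for (a, b), (c, d) in combinations(normalized, 2):
        # After sorting, a <= c. The pairs cross when c lies inside (a, b)
        # and d lies outside it.
        if a < c < b < d:
            crossings.append(((a, b), (c, d)))
    return crossings

def is_regular(base_pairs):
    """A structure is regular when no two base pairs cross."""
    return not crossing_pairs(base_pairs)
```

For example, the nested pairs (1, 10), (2, 9), (4, 6) form a regular structure, while (1, 5) and (3, 8) cross and therefore form a pseudo-knot.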

Page 3: Structural Alignment of Pseudo-knotted RNA

Solving problem for pseudo-knotted RNAs

Dynamic programming technique used to align subsequences.

Challenge: Aligning RNAs with general pseudoknot structures is hard (Jiang et al., JCB 2002).

Goal: a formal definition that classifies pseudoknot structures so that the most common, biologically important pseudoknots are computable and the computation is not very expensive.

Page 4: Structural Alignment of Pseudo-knotted RNA

Definition: simple pseudo-knot

[Figure: a simple pseudo-knot with end points i0 and k0 and inner boundary points j1 and j2, shown in several drawings.]

• How can we define a pseudo-knot?

• There are many pseudo-knot definitions: Akutsu [journal 2002?], Rivas & Eddy, ….

• For prediction.

• We start with Akutsu’s simple pseudo-knot formalism:

• All base pairs are non-crossing and horizontal when the structure is rotated to form 2 loops.

Page 5: Structural Alignment of Pseudo-knotted RNA

Sub-structure for a simple pseudo-knot

• Regular structure: contiguous subintervals serve as the sub-structure of the recursion.

• Simple pseudo-knot: cannot use this sub-structure because of the interweaving base pairs.

[Figure: a regular structure and a simple pseudo-knot, each drawn between i0 and k0.]

For a DP algorithm, how should the sub-structure be defined?

Page 6: Structural Alignment of Pseudo-knotted RNA

Sub-structure for a simple pseudo-knot

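[Figure: a simple pseudo-knot starting at i0, with the positions i, j and k marked.]

Define the sub-pseudoknot P(i, j, k) as the union of two subintervals:

P(i, j, k) = [i0, i] U [j, k]

The triple (i, j, k) is called the frontier.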
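
As a small illustration of the definition P(i, j, k) = [i0, i] U [j, k], the sketch below lists the positions covered by a sub-pseudoknot. It assumes 1-based, inclusive positions; the helper name `sub_pseudoknot` is hypothetical.

```python
def sub_pseudoknot(i0, i, j, k):
    """Positions in P(i, j, k) = [i0, i] U [j, k] (1-based, inclusive)."""
    left = list(range(i0, i + 1))    # the interval [i0, i]
    right = list(range(j, k + 1))    # the interval [j, k]
    return left + right
```

With i0 = 1, the frontier (13, 14, 39) covers all of positions 1 to 39, while (10, 16, 35) leaves out positions 11 to 15 and 36 to 39.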

Page 7: Structural Alignment of Pseudo-knotted RNA

Naive approach

[Figure: frontier (i, j, k) with left end i0 in the query, and frontier (i’, j’, k’) in the target.]

Compute B[i, j, k, i’, j’, k’]: O(m^3 n^3) scores (m: query length, n: target length).

Instead of all triplets in the query, consider only the valid sub-pseudo-knots that will represent the simple pseudo-knot.

B[i, j, k, i’, j’, k’]: optimal score of the alignment of the sub-pseudoknot P’(i’, j’, k’) in the target to the sub-pseudoknot P(i, j, k) in the query.

Page 8: Structural Alignment of Pseudo-knotted RNA

Use a chain of sub-pseudoknots to represent a simple pseudo-knot

[Figure: a simple pseudo-knot on bases 1 to 39, with the frontiers of the chain marked.]

P(13, 14, 39)

P(13, 14, 38)

P(13, 14, 37)

P(13, 14, 36)

P(13, 15, 35)

P(12, 15, 35)

P(11, 16, 35)

P(10, 16, 35)

……..
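
One way such a chain could be generated is sketched below. The shrinking rule is an assumption reconstructed from the example chain on this page, not a rule stated on the slides: remove the base pair (i, j) or (j, k) when it sits on the frontier, otherwise remove an unpaired base at k, then i, then j. The function name `frontier_chain` and the argument `j1` (the position where the chain starts, with i = j1 - 1 and j = j1) are hypothetical.

```python
def frontier_chain(base_pairs, i0, j1, k0):
    """List the frontiers (i, j, k) from (j1 - 1, j1, k0) down to empty.

    Assumed rule (reconstructed from the example, not taken from the
    authors): peel a base pair on the frontier if there is one,
    otherwise peel an unpaired base at k, then at i, then at j.
    """
    partner = {}
    for a, b in base_pairs:
        partner[a] = b
        partner[b] = a

    i, j, k = j1 - 1, j1, k0
    chain = []
    while i >= i0 or j <= k:
        chain.append((i, j, k))
        left_ok = i >= i0      # [i0, i] is non-empty
        right_ok = j <= k      # [j, k] is non-empty
        if left_ok and right_ok and partner.get(i) == j:
            i, j = i - 1, j + 1            # remove base pair (i, j)
        elif right_ok and j < k and partner.get(j) == k:
            j, k = j + 1, k - 1            # remove base pair (j, k)
        elif right_ok and k not in partner:
            k -= 1                         # remove unpaired base k
        elif left_ok and i not in partner:
            i -= 1                         # remove unpaired base i
        elif right_ok and j not in partner:
            j += 1                         # remove unpaired base j
        else:
            raise ValueError("not a simple pseudo-knot for this start")
    return chain
```

Every step removes one or two bases, so the chain for a query of length m has at most m frontiers. On a toy structure with pairs (3, 5), (2, 7), (6, 9), i0 = 1, j1 = 4 and k0 = 10, the sketch yields seven frontiers, starting at (3, 4, 10).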

Page 9: Structural Alignment of Pseudo-knotted RNA

Why Chaining?

• DP: use the optimal score of the child sub-structure to compute the optimal score at each step.

• Compute B[i, j, k, i’, j’, k’] only for the frontiers (i, j, k) on the chain.

• => O(m n^3) scores

• (m: query length, n: target length)

P(13, 14, 39)

P(13, 14, 38)

P(13, 14, 37)

P(13, 14, 36)

P(13, 15, 35)

P(12, 15, 35)

P(11, 16, 35)

P(10, 16, 35)

……..

[Figure: a simple pseudo-knot on bases 1 to 39, with the frontiers of the chain marked.]

Page 10: Structural Alignment of Pseudo-knotted RNA

Alignment Algorithm Recursions: (i, j) is a base pair

[Figure: query frontier (i, j, k) and target frontier (i’, j’, k’); the positions i-1, j+1 in the query and i’-1, j’+1 in the target are marked.]

MATCH: (i, j) and (i’, j’) are corresponding pairs

• B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE}

Page 11: Structural Alignment of Pseudo-knotted RNA

Alignment Algorithm Recursions: (i, j) is a base pair

[Figure: query frontier (i, j, k) and target frontier (i’, j’, k’); the position i-1 in the query is marked.]

DELETION: i is deleted

• B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE}

Page 12: Structural Alignment of Pseudo-knotted RNA

Alignment Algorithm Recursions: (i, j) is a base pair

[Figure: query frontier (i, j, k) and target frontier (i’, j’, k’); the position j+1 in the query is marked.]

DELETION: j is deleted

• B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE}

Page 13: Structural Alignment of Pseudo-knotted RNA

Alignment Algorithm Recursions: (i, j) is a base pair

[Figure: query frontier (i, j, k) and target frontier (i’, j’, k’); the positions i-1 and j+1 in the query are marked.]

DELETION: i and j are deleted

• B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE}

Page 14: Structural Alignment of Pseudo-knotted RNA

Alignment Algorithm Recursions: (i, j) is a base pair

[Figure: query frontier (i, j, k) and target frontier (i’, j’, k’); the position i’-1 in the target is marked.]

INSERTION: i’ is inserted

• B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE}

Page 15: Structural Alignment of Pseudo-knotted RNA

Alignment Algorithm Recursions: (i, j) is a base pair

[Figure: query frontier (i, j, k) and target frontier (i’, j’, k’); the position j’+1 in the target is marked.]

INSERTION: j’ is inserted

• B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE}

Page 16: Structural Alignment of Pseudo-knotted RNA

Alignment Algorithm Recursions: (i, j) is a base pair

[Figure: query frontier (i, j, k) and target frontier (i’, j’, k’); the position k’-1 in the target is marked.]

INSERTION: k’ is inserted

• B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE}
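
The seven cases for a base pair (i, j) can be collected into one table of index shifts. The sketch below reads the shifts literally off the labels on the preceding pages (for example "i-1 j+1"). It assumes that coordinates not mentioned on a page stay unchanged, and the per-move scores passed in as `move_score` are placeholders, since the slides do not give a scoring scheme. The names `MOVES`, `predecessors` and `best_score` are hypothetical.

```python
# Index shifts as labelled on the slides; unlisted coordinates are
# assumed to stay where they are.
MOVES = {
    "MATCH":      {"i": -1, "j": +1, "i'": -1, "j'": +1},
    "DELETE i":   {"i": -1},
    "DELETE j":   {"j": +1},
    "DELETE i,j": {"i": -1, "j": +1},
    "INSERT i'":  {"i'": -1},
    "INSERT j'":  {"j'": +1},
    "INSERT k'":  {"k'": -1},
}
NAMES = ("i", "j", "k", "i'", "j'", "k'")

def predecessors(state):
    """Map each move to the table cell it reads from."""
    current = dict(zip(NAMES, state))
    return {
        move: tuple(current[n] + shifts.get(n, 0) for n in NAMES)
        for move, shifts in MOVES.items()
    }

def best_score(state, table, move_score):
    """B[state] = max over moves of B[predecessor] + score of the move.

    Only predecessors already present in `table` are considered.
    """
    return max(
        table[cell] + move_score[move]
        for move, cell in predecessors(state).items()
        if cell in table
    )
```

For the state (5, 8, 12, 4, 7, 11), MATCH reads from (4, 9, 12, 3, 8, 11) and the insertion of k’ reads from (5, 8, 12, 4, 7, 10).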

Page 17: Structural Alignment of Pseudo-knotted RNA

Simple Pseudo-knot in a Regular Structure: S in R

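Use a binary tree to represent the RNA.

• Solid circular nodes correspond to the actual base pairs.

• Empty circular nodes correspond to unpaired bases.

• A rectangular node corresponds to the sub-tree representing the pseudo-knotted region.

[Figure: a regular structure containing a pseudo-knotted region between i0 and k0, and its binary tree with a rectangular "pseudo region" node.]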
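
The three node types could be modelled as follows. This is only a sketch of the data structure described on this page: the field names, the `span` convention (first and last position covered by the node) and the example tree are assumptions, not details from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class NodeKind(Enum):
    PAIR = "base pair"                     # solid circular node
    UNPAIRED = "unpaired base"             # empty circular node
    PSEUDOKNOT = "pseudo-knotted region"   # rectangular node

@dataclass
class Node:
    kind: NodeKind
    span: Tuple[int, int]                  # assumed: positions covered
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def count_kinds(root):
    """Count the nodes of each kind in the tree below `root`."""
    counts = {kind: 0 for kind in NodeKind}
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.kind] += 1
        for child in (node.left, node.right):
            if child is not None:
                stack.append(child)
    return counts

# Hypothetical tree: a base pair (1, 20) enclosing one unpaired base
# and a pseudo-knotted region spanning positions 3 to 19.
example_tree = Node(
    NodeKind.PAIR, (1, 20),
    left=Node(NodeKind.UNPAIRED, (2, 2)),
    right=Node(NodeKind.PSEUDOKNOT, (3, 19)),
)
```

Keeping the pseudo-knotted region as a single node lets the regular-structure recursion treat it as one unit and hand it to the pseudo-knot recursion.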

Page 18: Structural Alignment of Pseudo-knotted RNA

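Simple pseudo-knot in a simple pseudo-knot: recursive simple pseudo-knot

[Figure: a simple pseudo-knot with end points i0 and k0 and boundary points j1 and j2, containing nested sub-structures.]

• S in S

• R in S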

Page 19: Structural Alignment of Pseudo-knotted RNA

Which structures can we handle?

• Time complexity increases with the recursion depth of the pseudo-knotted region!

• R: regular structure

• S: simple pseudo-knot

• R: O(m n^3)

• S: O(m n^4)

• S in R: O(m n^4)

• R in S: O(m n^5)

• R in S in R: O(m n^5); S in S in R: O(m n^5)

• R in S in S in R: O(m n^6)

• .…….

Page 20: Structural Alignment of Pseudo-knotted RNA

Can we handle simple pseudo-knots with higher degree: standard pseudo-knots?

[Figure: a standard pseudo-knot with end points i0 and k0 and boundary points j1, j2, j3, …, jd-1.]

Page 21: Structural Alignment of Pseudo-knotted RNA

Can we handle simple pseudo-knots with higher degree: standard pseudo-knots?

• Yes! By revising the sub-pseudoknot structure and the recursion cases accordingly.

[Figure: query frontier (i, j, k, l) and target frontier (i’, j’, k’, l’).]

Page 22: Structural Alignment of Pseudo-knotted RNA

Can we handle recursive standard pseudoknots?

[Figure: a recursive standard pseudo-knot with end points i0 and k0 and boundary points j1, j2, j3, …, jd-1.]

Yes! Same reasoning with recursive simple pseudoknots.

Page 23: Structural Alignment of Pseudo-knotted RNA

What is left? What can we NOT handle?

We can handle the class of pseudoknots defined by Akutsu, which is the second largest class currently defined. We can additionally handle standard and recursive standard pseudoknots, which are defined by us.

A&U ⊂ A&U U {standard/recursive standard pseudoknots} ⊂ R&E

The largest class is defined by Rivas and Eddy. An example from this class that we cannot handle:

We can handle this! (Standard pseudo-knot of degree 4)

We can NOT handle this!

Page 24: Structural Alignment of Pseudo-knotted RNA

Implementation: PAL

• C++ implementation of our algorithm.

– input:

• a query sequence with known structure (R/S/S in R)

• a target sequence

– output:

• all high scoring local alignments in the target sequence

Page 25: Structural Alignment of Pseudo-knotted RNA

Testing

• Test data: Rfam database, 6 RNA families with simple pseudo-knotted structures (simple pseudo-knots in a regular structure):

• UPSK

• Antizyme

• Corona FSE

• Corona pk3

• Parecho CRE

• IFN gamma

Page 26: Structural Alignment of Pseudo-knotted RNA

Test 1: Structure Prediction

• How good is PAL at inferring the structure of the target sequence?

– Pick 2 seed members of an RNA family as query and target.

– Align them.

– Compare the inferred structure of the target with the annotated structure in Rfam.

Page 27: Structural Alignment of Pseudo-knotted RNA

Test 1: Structure Prediction Results

• TP, FP, FN, Sensitivity, Specificity

• Specificity = TP/(TP+FP)

• Sensitivity = TP/(TP+FN)

• Both measures are ~0.95

• PAL is a strong predictor of structure

RNA Family    Specificity (Mean)    Sensitivity (Mean)
UPSK          1                     1
Antizyme      0.99                  0.99
Parecho       0.95                  0.94
Corona FSE    0.94                  0.94
Corona pk3    0.97                  0.97
IFN Gamma     0.93                  0.93
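
The two measures can be computed directly from the formulas above. The sketch below assumes that TP, FP and FN are counted over base pairs, comparing the inferred pairs of the target with the annotated ones; the slides do not state the counting unit, and the function name `pair_metrics` is hypothetical.

```python
def pair_metrics(predicted, annotated):
    """Return (TP, FP, FN, specificity, sensitivity) over base pairs."""
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)   # inferred and annotated
    fp = len(predicted - annotated)   # inferred but not annotated
    fn = len(annotated - predicted)   # annotated but not inferred
    # Formulas as given on the slide; 0.0 when a denominator is zero.
    specificity = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, specificity, sensitivity
```

For instance, if three pairs are inferred and two of them appear among three annotated pairs, both measures are 2/3.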

Page 28: Structural Alignment of Pseudo-knotted RNA

Test 2: Homologue Search

• How good is PAL at finding the homologues of an RNA sequence?

– Generate a random genome.

– Insert the members of an RNA family.

– Pick one of the members as a query.

– Search for the homologues of the query.

– Can we locate the members?
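
The first two steps of this test could be set up as in the sketch below. It assumes a uniformly random background over A, C, G, U and a fixed-length random spacer before each family member; the slides do not say how the random genome was generated, and the name `build_test_genome` and its parameters are hypothetical.

```python
import random

def build_test_genome(members, spacer_len=50, seed=0):
    """Embed family members in random sequence; return (genome, starts).

    `starts` holds the 0-based start position of each member, so a
    search result can later be checked against the planted locations.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    parts, starts, offset = [], [], 0
    for member in members:
        spacer = "".join(rng.choice("ACGU") for _ in range(spacer_len))
        parts.append(spacer)
        offset += spacer_len
        starts.append(offset)          # the member begins here
        parts.append(member)
        offset += len(member)
    # Trailing spacer so the last member is not at the genome's end.
    parts.append("".join(rng.choice("ACGU") for _ in range(spacer_len)))
    return "".join(parts), starts
```

A search hit can then be counted as locating a member when it overlaps the planted interval; that criterion is likewise an assumption.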

Page 29: Structural Alignment of Pseudo-knotted RNA

Test 2: Homologue Search Results

Page 30: Structural Alignment of Pseudo-knotted RNA

Novel Homologue Search

– Searched mouse, rat and gerbil genomes for homologues of the IFN-gamma RNA family.

Page 31: Structural Alignment of Pseudo-knotted RNA
Page 32: Structural Alignment of Pseudo-knotted RNA

Conclusion

• PAL is a viable tool in finding novel homologues and inferring structure.

• We hope PAL will help to understand and explore the impact of pseudo-knotted RNAs in cellular function.