Post on 26-Mar-2015


Alain Denise, Bioinformatics, LRI Orsay, UMR CNRS 8623, Université Paris-Sud 11

Algorithms for comparing RNA secondary structures

© Ebbe Sloth Andersen

The multiple roles of RNA


Why RNA?

• Present in all cellular processes
• The only molecule that can act both as a genome and as a catalyst
• Origin of life (?): RNA world
• Frequent target for antibiotics

© E.Westhof 2005

RNA structure: tRNA

Primary structure: GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCCACAGAAUUCGCACCA

Secondary structure

Tertiary structure

RNA structure levels

RNA structure ~ graph of bounded degree, containing a (known) Hamiltonian path.

Arc-annotated sequences

• General (tertiary structure)
• Crossing (secondary structure with pseudoknots)
• Nested (secondary structure without pseudoknots)
• Plain (primary structure)
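The four levels can be told apart mechanically. The sketch below assumes the usual definitions for arc-annotated sequences, which the slide does not spell out: Plain has no arcs, Nested has no shared endpoints and no crossing arcs, Crossing has no shared endpoints, and General is unrestricted. The function name `classify` and the representation of arcs as index pairs are choices made for this illustration.

```python
from itertools import combinations

def classify(arcs):
    """Return the level of an arc-annotated sequence.

    arcs: iterable of (i, j) position pairs (assumed representation).
    """
    arcs = sorted({tuple(sorted(arc)) for arc in arcs})
    if not arcs:
        return "plain"
    # A position touched by two different arcs is only allowed at the General level.
    endpoints = [p for arc in arcs for p in arc]
    if len(endpoints) != len(set(endpoints)):
        return "general"
    # With distinct endpoints, arcs (i, j) and (k, l) cross when i < k < j < l.
    for (i, j), (k, l) in combinations(arcs, 2):
        if i < k < j < l:
            return "crossing"
    return "nested"

print(classify([(0, 5), (1, 4)]))  # two arcs, one inside the other
print(classify([(0, 3), (1, 5)]))  # two arcs that interleave
```

A pseudoknot shows up here as a pair of interleaving arcs, which is why it moves a structure from Nested to Crossing.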

RNA « Bio-Algorithmics »

• Structure prediction (given sequence)
• Design: sequence prediction (given structure)
• Structural pattern-matching
• Comparison of two or several structures

Why compare RNA structures?

• How similar (or different) are they? (classification, phylogeny)

• Which parts of the two structures are the most similar?

• Is the small one similar to a part of the large one?

Comparison score + correspondence between the structures

Editing and alignment

We are given a set of basic operations and a score function associated with each of them.

Data: two structures S1 and S2.

• Edit(S1,S2): find a best-scoring sequence of operations which changes S1 into S2.

• Align(S1,S2): find a structure S which contains S1 and S2 as substructures, so as to maximize Score(Edit(S1,S)+Edit(S2,S)).

Example: sequence comparison

Two sequences v = v1v2…vn and w = w1w2…wm

Edit operations:
• ins(x,i)
• del(x,i)
• subs(x,y,i)

CHAT → del(C,1) → HAT → subs(H,R,1) → RAT

(For sequences, editing ~ alignment: CHAT aligned with -RAT)
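The CHAT / RAT example can be checked with the classical dynamic program for sequences. This is a minimal sketch under assumed scores that are not given on the slide: 0 for a match, -1 for a substitution, -1 for an insertion or a deletion, so that the best score is the negated edit distance. The name `edit_score` is hypothetical.

```python
def edit_score(v, w, subs=lambda x, y: 0 if x == y else -1, indel=-1):
    """Best score of a sequence of ins/del/subs operations changing v into w."""
    n, m = len(v), len(w)
    # best[i][j] = best score for changing v[:i] into w[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + indel        # delete v[i-1]
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + indel        # insert w[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + subs(v[i - 1], w[j - 1]),  # substitute or match
                best[i - 1][j] + indel,                          # delete v[i-1]
                best[i][j - 1] + indel,                          # insert w[j-1]
            )
    return best[n][m]

print(edit_score("CHAT", "RAT"))  # -2: one deletion and one substitution
```

The two operations found here are exactly the deletion of C and the substitution of H by R shown above.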

Example: tree comparison

Editing vs alignment

Alignment

Editing: Ins( ), Del( ), Subs( , )

Ancestor relations are conserved

The nested case

Secondary structures (without pseudoknots): tree comparison

Tree comparison
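One way to see why the nested case reduces to tree comparison is to build the tree explicitly. The sketch below assumes the structure is given in dot-bracket notation (a convention not used in the slides) and uses one possible encoding: each base pair becomes an internal node labeled `X-Y`, each unpaired base becomes a leaf. The function name `structure_to_forest` is hypothetical.

```python
def structure_to_forest(sequence, dot_bracket):
    """Turn a nested structure into an ordered forest of (label, children) nodes."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    forest = []
    stack = [forest]   # children lists of the currently open base pairs
    opening = []       # 5' bases of the currently open base pairs
    for base, symbol in zip(sequence, dot_bracket):
        if symbol == "(":
            opening.append(base)
            stack.append([])
        elif symbol == ")":
            if not opening:
                raise ValueError("unbalanced structure")
            children = stack.pop()
            stack[-1].append((opening.pop() + "-" + base, children))
        elif symbol == ".":
            stack[-1].append((base, []))
        else:
            raise ValueError("unexpected symbol: " + symbol)
    if opening:
        raise ValueError("unbalanced structure")
    return forest

# A hairpin: one G-C pair closing a loop of three unpaired A bases.
print(structure_to_forest("GAAAC", "(...)"))
```

Because arcs never cross, every base pair encloses a well-defined sub-forest, which is what makes this construction possible; with pseudoknots it fails.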

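Tree edit algorithm (Zhang, Shasha 1989)

Notation: $a(f)$ is the tree with root $a$ and child forest $f$; $t_1 \circ \dots \circ t_p$ is the forest made of the trees $t_1, \dots, t_p$.

$$
\mathrm{Score}\big(a(f),\, a'(f')\big) = \max
\begin{cases}
\mathrm{Subs}(a, a') + \mathrm{Score}(f, f') \\
\mathrm{Ins}(a') + \mathrm{Score}\big(a(f),\, f'\big) \\
\mathrm{Del}(a) + \mathrm{Score}\big(f,\, a'(f')\big)
\end{cases}
$$

$$
\mathrm{Score}\big([a(f) \circ t_1 \circ \dots \circ t_p],\, [a'(f') \circ t'_1 \circ \dots \circ t'_q]\big) = \max
\begin{cases}
\mathrm{Score}\big(a(f),\, a'(f')\big) + \mathrm{Score}\big([t_1 \circ \dots \circ t_p],\, [t'_1 \circ \dots \circ t'_q]\big) \\
\mathrm{Ins}(a') + \mathrm{Score}\big([a(f) \circ t_1 \circ \dots \circ t_p],\, [f' \circ t'_1 \circ \dots \circ t'_q]\big) \\
\mathrm{Del}(a) + \mathrm{Score}\big([f \circ t_1 \circ \dots \circ t_p],\, [a'(f') \circ t'_1 \circ \dots \circ t'_q]\big)
\end{cases}
$$

[Figure: the forest f, t1, t2, …, tp]

Zhang, Shasha 1989

O(n^3 log n) [Klein 1998]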
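The forest recurrence above can be run directly with memoization. This is a naive sketch, not the Zhang-Shasha or Klein algorithm: it makes no attempt to reach their complexity bounds. It folds the tree-against-tree case into the forest case by writing the first option as Subs + Score(f, f') + Score(rest, rest'). The scores (0 for equal labels, -1 otherwise, -1 per insertion or deletion) and the helper names `tree`, `subs`, `INDEL` and `edit_score` are assumptions for this illustration.

```python
from functools import lru_cache

def tree(label, *children):
    """A tree is (label, children); a forest is a tuple of trees."""
    return (label, tuple(children))

def subs(a, b):
    return 0 if a == b else -1   # assumed substitution score

INDEL = -1                       # assumed score for Ins and Del

@lru_cache(maxsize=None)
def edit_score(forest1, forest2):
    """Best edit score between two ordered forests."""
    if not forest1 and not forest2:
        return 0
    if not forest2:              # only deletions remain
        (a, f), rest = forest1[0], forest1[1:]
        return INDEL + edit_score(f + rest, ())
    if not forest1:              # only insertions remain
        (b, g), rest = forest2[0], forest2[1:]
        return INDEL + edit_score((), g + rest)
    (a, f), rest1 = forest1[0], forest1[1:]
    (b, g), rest2 = forest2[0], forest2[1:]
    return max(
        subs(a, b) + edit_score(f, g) + edit_score(rest1, rest2),
        INDEL + edit_score(forest1, g + rest2),   # Ins(b): children of b join the forest
        INDEL + edit_score(f + rest1, forest2),   # Del(a): children of a join the forest
    )

t1 = tree("a", tree("b", tree("c"), tree("d")), tree("e"))
t2 = tree("a", tree("c"), tree("f", tree("d"), tree("e")))
print(edit_score((t1,), (t2,)))  # -2: delete b, insert f
```

Trees are nested tuples so that they are hashable and can be used as cache keys.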

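Tree alignment algorithm (Jiang, Wang, Zhang 1995)

Notation: $a(f)$ is the tree with root $a$ and child forest $f$; $t_1 \circ \dots \circ t_p$ is the forest made of the trees $t_1, \dots, t_p$.

$$
\mathrm{Score}\big(a(f),\, a'(f')\big) = \max
\begin{cases}
\mathrm{Subs}(a, a') + \mathrm{Score}(f, f') \\
\mathrm{Ins}(a') + \mathrm{Score}\big(a(f),\, f'\big) \\
\mathrm{Del}(a) + \mathrm{Score}\big(f,\, a'(f')\big)
\end{cases}
$$

$$
\mathrm{Score}\big(a(f) \circ t_1 \circ \dots \circ t_p \,;\, a'(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max
\begin{cases}
\mathrm{Score}\big(a(f) \,;\, a'(f')\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p \,;\, t'_1 \circ \dots \circ t'_q\big) \\
\mathrm{Ins}(a') + \max_i \big\{ \mathrm{Score}\big(a(f) \circ \dots \circ t_i \,;\, f'\big) + \mathrm{Score}\big(t_{i+1} \circ \dots \circ t_p \,;\, t'_1 \circ \dots \circ t'_q\big) \big\} \\
\mathrm{Del}(a) + \max_j \big\{ \mathrm{Score}\big(f \,;\, a'(f') \circ t'_1 \circ \dots \circ t'_j\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p \,;\, t'_{j+1} \circ \dots \circ t'_q\big) \big\}
\end{cases}
$$

[Figure: the forest f, t1, t2, …, tp]

Jiang, Wang, Zhang 1995

O(n^4)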
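The alignment recurrence can be sketched the same way. This memoized version is only an illustration of the recurrence, not the O(n^4) algorithm of Jiang, Wang and Zhang. It lets the split index also select an empty prefix (the inserted node then has none of the other forest under it), which is an assumption about the intended range of i and j. Scores and helper names (`tree`, `subs`, `INDEL`, `size`, `align_score`) are hypothetical.

```python
from functools import lru_cache

def tree(label, *children):
    """A tree is (label, children); a forest is a tuple of trees."""
    return (label, tuple(children))

def subs(a, b):
    return 0 if a == b else -1   # assumed substitution score

INDEL = -1                       # assumed score for Ins and Del

def size(forest):
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def align_score(forest1, forest2):
    """Best alignment score between two ordered forests."""
    if not forest1:
        return INDEL * size(forest2)
    if not forest2:
        return INDEL * size(forest1)
    (a, f), rest1 = forest1[0], forest1[1:]
    (b, g), rest2 = forest2[0], forest2[1:]
    candidates = [subs(a, b) + align_score(f, g) + align_score(rest1, rest2)]
    # Ins(b): the children of b are aligned with a prefix of forest1.
    for i in range(len(forest1) + 1):
        candidates.append(
            INDEL + align_score(forest1[:i], g) + align_score(forest1[i:], rest2))
    # Del(a): the children of a are aligned with a prefix of forest2.
    for j in range(len(forest2) + 1):
        candidates.append(
            INDEL + align_score(f, forest2[:j]) + align_score(rest1, forest2[j:]))
    return max(candidates)

t1 = tree("a", tree("b", tree("c"), tree("d")), tree("e"))
t2 = tree("a", tree("c"), tree("f", tree("d"), tree("e")))
print(align_score((t1,), (t2,)))  # -4
```

For these two trees editing needs only two operations (delete b, insert f), but no common supertree can contain both b above c, d and f above d, e while keeping ancestor relations, so the alignment has to give up one more matched node and scores -4.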

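Editing vs alignment

Insertion case, editing:

$$
\mathrm{Score}\big([a(f), t_1, \dots, t_p],\, [a'(f'), t'_1, \dots, t'_q]\big) = \max \big\{ \dots,\ \mathrm{Ins}(a') + \mathrm{Score}\big([a(f), t_1, \dots, t_p],\, [f', t'_1, \dots, t'_q]\big),\ \dots \big\}
$$

Insertion case, alignment:

$$
\mathrm{Score}\big([a(f), t_1, \dots, t_p],\, [a'(f'), t'_1, \dots, t'_q]\big) = \max \big\{ \dots,\ \mathrm{Ins}(a') + \max_i \big\{ \mathrm{Score}\big([a(f), \dots, t_i],\, f'\big) + \mathrm{Score}\big([t_{i+1}, \dots, t_p],\, [t'_1, \dots, t'_q]\big) \big\},\ \dots \big\}
$$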

[Figure: the insertion cases of the editing and alignment recurrences drawn on tree diagrams; in the alignment case the first forest is split between positions i and i+1]

Can be inserted anywhere

Complexity

Editing [Zhang, Shasha 1989, Klein 1998]

• Worst case: O(n^4) [Zhang-Shasha 1989], O(n^3 log n) [Klein 1998, Dulucq-Touzet 2003]

• On average: O(n^3) [Dulucq-Tichit 2003]

Alignment [Jiang, Wang, Zhang 1995]

• Worst case: O(n^4)

Edit operations: problem

[Figure: the stems formed by AUGG…….UCAU and AUGG…….UCUU; in the second, one base pair is replaced by two unpaired U bases]

With tree operations this takes 3 operations: Delete( ), Insert( ), Insert( ).

Edit operations on RNA

Operations on bases:
• Substitution: A → C
• Deletion / Insertion: A

Operations on arcs:
• Arc-substitution: C G → U A
• Arc-deletion / Arc-insertion: C G
• Arc-breaking / : (new) C G → C G
• Arc-altering / : (new) C G → C -
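To make the arc operations concrete, the sketch below applies them to a small arc-annotated sequence. It assumes that arc-breaking removes the arc and keeps both bases, arc-altering removes the arc and one of its two bases, and arc-deletion removes the arc and both bases. Deleted bases are shown as `-`, as in the slide's "C G → C -". The representation (list of bases plus a set of index pairs) and the function names are hypothetical.

```python
def _check(arcs, arc):
    if arc not in arcs:
        raise ValueError("no such arc: %r" % (arc,))

def arc_breaking(bases, arcs, arc):
    """Remove the arc, keep both bases (paired C G -> unpaired C G)."""
    _check(arcs, arc)
    return list(bases), set(arcs) - {arc}

def arc_altering(bases, arcs, arc, deleted_end):
    """Remove the arc and delete one of its two bases (C G -> C -)."""
    _check(arcs, arc)
    if deleted_end not in arc:
        raise ValueError("deleted_end must be an endpoint of the arc")
    new_bases = list(bases)
    new_bases[deleted_end] = "-"
    return new_bases, set(arcs) - {arc}

def arc_deletion(bases, arcs, arc):
    """Remove the arc and delete both of its bases."""
    _check(arcs, arc)
    new_bases = list(bases)
    for end in arc:
        new_bases[end] = "-"
    return new_bases, set(arcs) - {arc}

bases, arcs = list("CAAG"), {(0, 3)}     # C and G are paired
print(arc_breaking(bases, arcs, (0, 3)))
print(arc_altering(bases, arcs, (0, 3), deleted_end=3))
print(arc_deletion(bases, arcs, (0, 3)))
```

Each function returns a new pair and leaves its inputs untouched, so the three outcomes can be compared on the same starting structure.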

A first solution

[Figure: two structures on the sequence AUGG…….UCAU, one with the U-A pair intact and one with U and A unpaired, each redrawn as a tree whose nodes carry the bases A, U, G, C, U, A, C, U]

But this implies some constraints on the scores. For example:

Arc-deletion = Arc-Breaking + 2 Base-Deletion

Höchsmann, Töller, Giegerich, Kurtz 2003 (RNAforester)


Complexity of the edit problem

Levels of arc-annotated sequences: General, Crossing, Nested, Plain. Complexity of Edit between two levels:

• General vs. any level: NP-complete
• Crossing vs. Crossing, Nested or Plain: NP-complete
• Nested vs. Nested: NP-complete
• Nested vs. Plain: O(nm^3)
• Plain vs. Plain: O(nm / log n)

• Jiang, Lin, Ma, Zhang 2002
• Blin, Fertin, Rusu, Sinoquet 2003
• Crochemore, Landau, Ziv-Ukelson 2002

If 2 Score(Arc-altering) = Score(Arc-breaking) + Score(Arc-removing), then there is an O(n^3 m) algorithm for Edit(crossing, nested) and Edit(nested, nested).


Complexity of secondary structure comparison

• Editing, tree operations: O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]
• Editing, RNA operations: NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
• Alignment, tree operations: O(n^4) [Jiang, Wang, Zhang 1995]
• Alignment, RNA operations: ?

Secondary structure alignment

ABCDEFG, ABBDFFG

A-BCD-EFG
ABB-DF-FG

AB---CDEFG
ABBDF---FG

Editing / Alignment

New edit operations on trees

• Arc-breaking / : C G → C G
• Arc-altering / : C G → C -

Alignment algorithm (1/5 to 5/5)

[Figures: step-by-step animation of the alignment algorithm; the only surviving labels are f and t]

Complexity of secondary structure comparison

• Editing, tree operations: O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]
• Editing, RNA operations: NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
• Alignment, tree operations: O(n^4) [Jiang, Wang, Zhang 1995]
• Alignment, RNA operations: O(n^4) [Herrbach, Denise, Dulucq, Touzet 2005]


Complexity of the alignment problem for the other structure levels: [Blin, Touzet 2006]

Example: two tRNAs

Homo sapiens Bacillus subtilis

Drawing: Tulip (David Auber et al., LaBRI)

Base-subs / Arc-subs

Deletions / Insertions

Arc-breaking

Arc-altering

And in real life?

Alignment of RNases P

To do…

• Biological validation:
  • Test on real data
  • Comparison with other software (RNAForester, MiGal [J. Allali, M.F. Sagot])
  • Combined approaches ([J. Allali, A. Ouangraoua-P. Ferraro])
  • Parameters: substitution matrices etc.
  • Statistical evaluation of results

Relevant algorithms and parameters; useful and user-friendly programs

• Sequence/structure alignment
• Multiple alignment
• …

Credits

• Julien Allali
• David Auber
• Serge Dulucq
• Claire Herrbach
• Rym Kachouri
• Yann Ponty
• Michel Termier
• Laurent Tichit
• Hélène Touzet
• Eric Westhof
