Genomic Sorting with Length-Weighted Intervals
236818 - Seminar in Bioinformatics: Advanced Algorithms in Computational Biology
Spring 2005, Technion
Asaf Merschon

Post on 21-Dec-2015


Page 1: Genomic Sorting with Length-Weighted Intervals 236818 - Seminar in Bioinformatics Advanced Algorithms in Computational Biology Spring 2005, Technion Asaf

Genomic Sorting with Length-Weighted Intervals

236818 - Seminar in Bioinformatics: Advanced Algorithms in Computational Biology

Spring 2005, Technion

Asaf Merschon

Page 2:

What we saw so far

Current algorithms for genome rearrangements ignore the length of reversals; rather, they only count their number.

Traditionally, such analysis assumes that each reversal is of unit cost.

Page 3:

Motivation

The assumption of unit-cost reversals is not completely defensible biologically:
    A longer genomic reversal will cause more upheaval to the organism, resulting in a lower likelihood of the organism surviving to pass on the mutation.

The mechanics of genome reversal may suggest that the probability of a reversal depends on its length (among other factors).

Page 4:

The topics covered

On top of the surface:
    Introduction to Genomic Sorting with Length-Weighted Intervals.
        Lower and upper bounds on complexity of solution.
        Proofs (Partial).

Down under:
    Improved bounds on Sorting with Length-Weighted Reversals (Extended Abstract).
        Concept and examples.
    Sorting by Length-weighted Reversals: Dealing with Signs and Circularity.
        General approach to solutions.

Page 5:

Goal

Find an algorithm that efficiently sorts one sequence into another by reversals under length-sensitive cost models.
    Focus is on sorting unsigned permutations by reversals.
    The problem remains NP-hard in our new model, and hence we will try to reach approximation results.

Page 6:

Definitions (1)

Let the function $f(l)$ denote the cost of a reversal of length $l$.

Traditionally, $f(l) = 1$.

We say a function $f$ is:
    Additive if $f(x) + f(y) = f(x + y)$
    Subadditive if $f(x) + f(y) \ge f(x + y)$
    Superadditive if $f(x) + f(y) \le f(x + y)$
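The three classes can be checked numerically for a candidate cost function. The sketch below is only an illustration: the helper name `classify` and the finite test range `max_len` are assumptions, and a finite check can suggest, but not prove, which class a function belongs to.

```python
from itertools import product

def classify(f, max_len=50):
    """Classify f on lengths 1..max_len (hypothetical helper, finite check only)."""
    pairs = list(product(range(1, max_len + 1), repeat=2))
    if all(f(x) + f(y) == f(x + y) for x, y in pairs):
        return "additive"
    if all(f(x) + f(y) >= f(x + y) for x, y in pairs):
        return "subadditive"
    if all(f(x) + f(y) <= f(x + y) for x, y in pairs):
        return "superadditive"
    return "none"

print(classify(lambda l: 1))      # unit cost
print(classify(lambda l: l))      # linear cost
print(classify(lambda l: l * l))  # quadratic cost
```

Under these definitions the unit cost is reported as subadditive, the linear cost as additive, and the quadratic cost as superadditive.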

Page 7:

Definitions (2)

A Reversal Graph of permutations of length $n$ is a graph where:
    The vertices are all the permutations of length $n$.
    There is an edge $(p_1, p_2)$ of weight $f(l)$ if there exists one $l$-reversal that transforms the permutation $p_1$ into the permutation $p_2$.
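For very small $n$ the reversal graph can be explored exhaustively. The following sketch (the function name `reversal_diameter` is an assumption, not something from the slides) runs Dijkstra's algorithm from the identity permutation. Relabeling elements maps any permutation to the identity while preserving edges and weights, so the largest distance from the identity equals the diameter. The graph has $n!$ vertices, so this is only practical for tiny $n$.

```python
import heapq

def reversal_diameter(n, f):
    """Largest shortest-path cost from the identity in the reversal graph on n elements."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, p = heapq.heappop(heap)
        if d > dist[p]:
            continue  # stale heap entry
        for i in range(n):
            for j in range(i + 2, n + 1):  # reverse p[i:j], length j - i >= 2
                q = p[:i] + p[i:j][::-1] + p[j:]
                nd = d + f(j - i)
                if nd < dist.get(q, float("inf")):
                    dist[q] = nd
                    heapq.heappush(heap, (nd, q))
    return max(dist.values())

print(reversal_diameter(3, lambda l: 1))  # unit cost
print(reversal_diameter(3, lambda l: l))  # linear cost
```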

Page 8:

Wanted Results

1. Minimize the cost sufficient to sort any permutation of $n$ elements (actually achieving an upper bound). Equivalent to computing the diameter of the reversal graph under the shortest-path metric.

2. Approximate the minimum-cost reversal sequence for a given permutation. We would like a heuristic that assures the resulting sequence costs no more than a slowly growing function of $n$ times that of the optimal sequence.

Page 9:

Important notes

The relatively coarse bounds generated by the following techniques limit their applicability to biological data.

The work presented leads to interesting algorithmic results and raises some interesting questions as a basis for further bioinformatics studies.

Page 10:

Previous Work

Sorting by unit-cost, unsigned reversals was shown to be NP-hard by Caprara. Our problem inherits hardness under more general metrics from this result.

Kececioglu & Sankoff gave approximation algorithms for reversal distance that guarantee results at most 2 times optimal.
    Bafna & Pevzner improved this to a factor of 7/4.
    Berman et al. improved this factor to 1.375.

Minimum-cost unsigned reversal sorting has also been studied under models where cost increases so dramatically that only length-2 reversals are affordable.

Experiments were done both on the mitochondrial genomes of two fungi and on random samples. They suggest that length may play an important role in biasing certain rearrangement patterns.

Page 11:

Goal 1 – Bounding the diameter of the Reversal Graph

By bounding the diameter of the Reversal Graph, we establish an upper bound on the cost of sorting any $n$-element permutation.

Standard sorting algorithms exhibit interesting performance on highly subadditive and superadditive functions, but not on additive measures. The primary result of this section is a new reversal-based sorting algorithm which performs well on additive cost functions. (Examples in next slides.)

Page 12:

Examples on Highly Subadditive & Superadditive functions

Subadditive: A reversal-based version of selection sort performs at most $n-1$ reversals, a fraction of which are potentially $O(n)$ in length. Thus selection sort gives an $O(n \cdot f(n))$ diameter algorithm. Especially efficient for $f(l) = 1$.

Superadditive: Bubble sort and insertion sort perform transpositions of neighboring elements, one for each inversion in the input permutation. This gives an $O(n^2 \cdot f(2))$ diameter algorithm. Particularly efficient for $f(s) = s^2$.
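Both baseline strategies are easy to simulate. The sketch below is an illustration under stated assumptions: the function names are hypothetical, permutations hold the values 1..n, and a swap of two neighbors is counted as a reversal of length 2.

```python
def selection_sort_by_reversals(perm, f):
    """Bring element i+1 to position i with one reversal; at most n-1 reversals."""
    p = list(perm)
    reversals, cost = 0, 0
    for i in range(len(p)):
        j = p.index(i + 1)  # current position of the element that belongs at i
        if j != i:
            p[i:j + 1] = p[i:j + 1][::-1]
            reversals += 1
            cost += f(j - i + 1)
    return p, reversals, cost

def bubble_sort_by_reversals(perm, f):
    """One length-2 reversal per inversion of the input."""
    p = list(perm)
    swaps = 0
    for end in range(len(p) - 1, 0, -1):
        for k in range(end):
            if p[k] > p[k + 1]:
                p[k], p[k + 1] = p[k + 1], p[k]
                swaps += 1
    return p, swaps, swaps * f(2)

print(selection_sort_by_reversals([3, 1, 4, 2], lambda l: 1))
print(bubble_sort_by_reversals([3, 1, 4, 2], lambda l: l * l))
```

Selection sort uses few but possibly long reversals, which suits subadditive costs; bubble sort uses many short ones, which suits superadditive costs.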

Page 13:

The interesting case

Additive functions, particularly $f(s) = s$.

Presented is an algorithm for sorting any permutation of $n$ elements in $O(f(n) \log^2 n)$ cost using divide and conquer.

The key operation is MedianEject.

Page 14:

Definitions (3)

Sorting a permutation involves putting element $i$ in position $i$.

Let $p(i)$ denote the element in the $i$-th position in the permutation.

Let $p^{-1}(i)$ denote the position of element $i$ in the permutation.

An element $x$ is wrong-sided if $x$ and $p^{-1}(x)$ are on different sides of the median $n/2$, meaning $x \le n/2$ and $p^{-1}(x) > n/2$, or vice versa.
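As a small illustration of the wrong-sided definition, the sketch below lists the maximal runs of wrong-sided elements in a permutation. The function name, the 1-based positions in the output, and the use of `n // 2` as the median are assumptions made for this example; runs are never merged across the median.

```python
def wrong_sided_runs(p):
    """Return maximal runs (first_pos, last_pos), 1-based, of wrong-sided elements."""
    n = len(p)
    half = n // 2  # assumed median: positions/elements 1..half form the left side
    runs, start = [], None
    for pos, x in enumerate(p, start=1):
        wrong = (x <= half) != (pos <= half)
        if pos == half + 1 and start is not None:
            runs.append((start, pos - 1))  # close a run at the median boundary
            start = None
        if wrong and start is None:
            start = pos
        elif not wrong and start is not None:
            runs.append((start, pos - 1))
            start = None
    if start is not None:
        runs.append((start, n))
    return runs

print(wrong_sided_runs([5, 2, 6, 4, 1, 3]))
```

For the permutation 5 2 6 4 1 3, elements 5 and 6 sit on the left but belong on the right, and elements 1 and 3 sit on the right but belong on the left.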

Page 15:

MedianEject

We apply MedianEject to portions of the permutation from position a to b. One round of MedianEject moves all wrong-sided elements in the interval [a,b] to the correct side relative to its median in the following manner:

MedianEject(a,b) =
    Identify the r maximal runs of wrong-sided elements and the median (a+b)/2.
    for (i = 1 to log r)
        reduce the number of wrong-sided runs by half using non-overlapping reversals, none crossing the median.
    With two reversals, move the remaining wrong-sided runs to the median boundary.
    Reverse the left and right wrong-sided runs using a single reversal.

Page 16:

MedianEject – Sample Run

Page 17:

Lemmas (1)

Lemma 1: MedianEject costs $O(f(b-a) \log r)$ for any additive cost function $f$.
    Proof (intuitively): There are $O(\log r)$ passes of reversals, since with each pass there are half as many maximal runs of wrong-sided elements on each side of the median. The non-overlapping reversals of one pass reverse at most $b-a$ elements in total and hence, because $f$ is additive, cost $O(f(b-a))$, resulting in a total of $O(f(b-a) \log r)$.

Page 18:

Reversal Sort

MedianEject is the partitioning operation of the following Quicksort-like algorithm:

ReversalSort(a, b)
    MedianEject(a, b)
    ReversalSort(a, (a+b)/2)
    ReversalSort((a+b)/2, b)

Page 19:

Lemmas (2)

Lemma 2: ReversalSort sorts at cost $O(f(n) \log^2 n)$ for any additive cost function $f(n)$.
    Proof: By the master theorem, the recurrence $T(n) = 2T(n/2) + O(f(n) \log n)$ evaluates to $O(f(n) \log^2 n)$.
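As a worked check of the recurrence, it can be unrolled by hand for the linear cost function. The derivation below is a sketch under explicit assumptions: $f(n) = n$, $n$ is a power of two, and the MedianEject term is taken to be exactly $c\,n\log_2 n$ for some constant $c$.

```latex
% Assumptions: f(n) = n, n a power of two, MedianEject term exactly c n log_2 n.
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n\log_2 n \\
     &= \sum_{k=0}^{\log_2 n - 1} 2^k \cdot c\,\frac{n}{2^k}\,\log_2\frac{n}{2^k} \;+\; n\,T(1) \\
     &= c\,n \sum_{k=0}^{\log_2 n - 1} \left(\log_2 n - k\right) \;+\; n\,T(1) \\
     &= c\,n\,\frac{\log_2 n\,(\log_2 n + 1)}{2} \;+\; n\,T(1) \;=\; O(n \log^2 n).
\end{align*}
```

Each of the $\log_2 n$ recursion levels contributes at most $c\,n\log_2 n$, which is where the second logarithmic factor comes from.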

Page 20:

Goal 2 – Approximating Distance

From a biological point of view, constructing the least expensive transformation from a given permutation A to another permutation B is more interesting than minimizing diameter. This is because we want to reconstruct the evolutionary history from A and B, a history which presumably took the most parsimonious possible path.

Page 21:

Definitions (4)

We now show that for all permutations, the reversal sorting algorithm yields a cost which is $O(\log^2 n)$ times optimal for any additive cost function.

Our analysis requires the definition of a weighted graph $G(p)$ associated with a given permutation $p$.

The vertices of $G(p)$ will be the $n$ elements (positions) of $p$. There will be an edge $(i,j)$ in $G(p)$ where $i = p^{-1}(j)$, $1 \le j \le n$. The weight of this edge is $f(|i - j|)$.

Page 22:

Definitions (5)

$G(p)$ may be used to provide lower bounds on the optimal cost of sorting. However, these bounds can be very coarse.

Instead, we bound the optimal cost in terms of the weight of the heaviest non-crossing matching $M(G(p))$.

We say a matching $M(G(p))$ (namely a group of edges from $G(p)$) is non-crossing if
$\forall (i,j), (k,l) \in M(G(p)): \nexists x : i \le x \le j \wedge k \le x \le l$

Such a maximum-weight matching can be easily found using dynamic programming.
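One way to realize the dynamic program is the classic weighted interval scheduling recurrence, since non-crossing edges correspond to closed intervals that share no position. This is a sketch under assumptions: the function name is hypothetical, each edge between positions $i$ and $j$ is treated as the interval $[\min(i,j), \max(i,j)]$ with weight $f(|i-j|)$, and in-place elements contribute no edge. The slides do not say which dynamic program the authors had in mind.

```python
from bisect import bisect_left

def heaviest_noncrossing_matching(p, f):
    """Weight of the heaviest set of pairwise disjoint edge intervals of G(p)."""
    edges = set()
    for i, j in enumerate(p, start=1):  # element j sits at position i
        if i != j:
            edges.add((max(i, j), min(i, j)))  # stored as (hi, lo)
    edges = sorted(edges)  # by right endpoint
    ends = [hi for hi, _ in edges]
    best = [0] * (len(edges) + 1)  # best[k]: optimum over the first k edges
    for k, (hi, lo) in enumerate(edges, start=1):
        m = bisect_left(ends, lo)  # edges ending strictly before lo
        best[k] = max(best[k - 1], best[m] + f(hi - lo))
    return best[-1]

print(heaviest_noncrossing_matching([2, 1, 4, 3], lambda l: l))
print(heaviest_noncrossing_matching([2, 3, 1], lambda l: l))
```

After sorting, the recurrence itself takes one binary search per edge, so the whole computation is $O(n \log n)$ for a permutation of $n$ elements.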

Page 23:

Lemmas (3)

Theorem 3: The greedy breakpoint-merging heuristic can yield a reversal sequence whose cost is $\Omega(n / \log^2 n)$ times optimal.
    Proof: Won’t be provided in this presentation.

Lemma 4: The weight of $M(G(p))$ is a lower bound on the reversal-sorting cost for permutation $p$ under additive weight functions.
    Proof: Consider the simpler task of just placing the elements defining edges from $M(G(p))$ into their proper position. This task can be done in cost $f(w)$, where $w$ is the total weight of $M(G(p))$, by performing the reversals defined by the edges in the matching. Because none of the intervals overlap or nest, no longer reversal can be helpful to move multiple elements into the proper position; because the cost function is additive, we cannot benefit by using shorter reversals.

Page 24:

Lemmas (4)

To argue that the weight of $M(G(p))$ is a good lower bound, we will bound certain properties of $p$ and $G(p)$ in the size of this matching.

Lemma 5: Let $E_k$ denote the $k$-th edge of $M(G(p))$, where $1 \le k \le |M(G(p))|$. Let $\delta(E_k, i, j)$ be a function which equals 1 if $E_k$ intersects the interval $[i, \ldots, j]$ and is zero otherwise. Then there is no edge $(i,j) \in G(p)$ if
$|j - i| > \sum_{k=1}^{|M(G(p))|} f(E_k)\,\delta(E_k, i, j)$

Proof: By definition, $M(G(p))$ is the maximum-cost non-crossing matching. Hence such an edge $(i, j)$ cannot exist in $G(p)$, for if so we could remove all intersected matching edges and insert $(i, j)$ into $M(G(p))$ to yield a higher-cost non-crossing matching.

Page 25:

Lemmas (5)

Lemma 6: The number of out-of-position elements in $p$ is at most $3\,M(G(p))$.
    Proof: Won’t be provided in this presentation.

Lemma 7: No element outside of the penumbra moves during the execution of MedianEject.
    Definition: The penumbra is the set of positions where out-of-position elements potentially lie, unioned with all positions overlapped by edges of $M(G(p))$.
    Proof: Won’t be provided in this presentation.
    Implied (by Lemma 7): Every round of non-overlapping reversals costs at most $O(3\,M(G(p)))$ throughout the execution of ReversalSort.

Page 26:

Lemmas (6)

Corollary 1: The cost of each round of MedianEject is $O(M(G(p)) \log n)$, and therefore ReversalSort costs $O(M(G(p)) \log^2 n)$.

Theorem 8: The ReversalSort heuristic solution is at most a factor of $O(\log^2 n)$ times the optimal solution.
    Proof: Derived from the previous lemmas.

Page 27:

Coming Up Next

Improved bounds on Sorting with Length-Weighted Reversals (Extended Abstract).

Sorting by Length-weighted Reversals: Dealing with Signs and Circularity.

Conclusions, Suggestions & Questions raised.

Comments!?

Page 28:

Improved bounds on Sorting with Length-Weighted Reversals

We will now approach the problem of sorting integer sequences by length-weighted reversals using a wider range of cost functions.

For the cost function we consider a wide class of functions, namely $f(l) = l^{\alpha}$ with $\alpha \ge 0$, where $l$ is the length of the reversal.

So far we have mainly dealt with the case where $\alpha = 1$.

Page 29:

Sorting Sequences of 0’s and 1’s

To sort a sequence of 0’s and 1’s:
    Recursively sort the left and right halves.
    Perform one more reversal across the median, for a sorting cost of:
    $B(n) = 2B(n/2) + O(n) = O(n \log n)$

Pinter and Skiena used this algorithm to obtain an upper bound of $O(n \log^2 n)$ on the diameter for linear-cost reversals.
    As was shown in the first part of the presentation.
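The divide-and-conquer scheme for 0/1 sequences can be sketched directly. The function name `sort01` and its in-place, half-open interface are assumptions made for this illustration. After both halves are sorted, the left half ends in a block of 1’s and the right half starts with a block of 0’s, so a single reversal spanning those two blocks finishes the job.

```python
def sort01(seq, f, lo=0, hi=None):
    """Sort seq[lo:hi] (a list of 0/1 values) in place by reversals; return the total cost."""
    if hi is None:
        hi = len(seq)
    if hi - lo <= 1:
        return 0
    mid = (lo + hi) // 2
    cost = sort01(seq, f, lo, mid) + sort01(seq, f, mid, hi)
    i = lo
    while i < mid and seq[i] == 0:      # i: first 1 in the sorted left half
        i += 1
    j = hi
    while j > mid and seq[j - 1] == 1:  # j: one past the last 0 in the sorted right half
        j -= 1
    if i < mid and j > mid:             # a block of 1's is followed by a block of 0's
        seq[i:j] = seq[i:j][::-1]       # one reversal across the median
        cost += f(j - i)
    return cost

seq = [0, 1] * 8
print(sort01(seq, lambda l: l), seq)
```

With the linear cost each merging reversal costs at most the length of its segment, which gives the recurrence $B(n) = 2B(n/2) + O(n)$.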

Page 30:

Bounds and Approximation Ratios for different $\alpha$ values

The table summarizes the bounds and approximation ratios found for different $\alpha$ values.

Proofs for some of the bounds and approximation ratios will be presented as proof of concept.

Page 31:

Upper Bounds on Diameter (1)

In the case of additive cost functions we saw that the upper bound on sorting any given permutation is $O(n \log^2 n)$.

Similarly, we would like to find such bounds for other functions in the class we are using (i.e. $\alpha \ne 1$).

To do this, we will use the concept of sorting sequences of 0’s and 1’s.

Page 32:

Upper Bounds on Diameter (2)

Case 1 – $0 \le \alpha < 1$:
    Consider the divide and conquer sorting algorithm described earlier (Sorting Sequences of 0’s and 1’s). The recursion relation for sorting the 0’s and 1’s becomes:
    $B(n) = 2B(n/2) + n^{\alpha} = O(n)$
    For permutations, the cost for the recursion sorting becomes:
    $P(n) = 2P(n/2) + B(n) = O(n \log n)$
    Obviously, these results are upper bounds.

Page 33:

Upper Bounds on Diameter (3)

Case 2 – $1 < \alpha < 2$:
    Consider the divide and conquer sorting algorithm described earlier (Sorting Sequences of 0’s and 1’s). The recursion relation for sorting the 0’s and 1’s becomes:
    $B(n) = 2B(n/2) + n^{\alpha} = O(n^{\alpha})$
    For permutations, the cost for the recursion sorting becomes:
    $P(n) = 2P(n/2) + B(n) = O(n^{\alpha})$
    Obviously, these results are upper bounds.
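The two recurrences, for $0 \le \alpha < 1$ and for $1 < \alpha < 2$, can be evaluated numerically to see the claimed growth rates. This is a sketch under assumptions: the helper `make_bounds` is hypothetical, the base case is $B(1) = P(1) = 0$, $n$ is a power of two, and the additive term is taken to be exactly $n^{\alpha}$.

```python
from functools import lru_cache
from math import log2

def make_bounds(alpha):
    """Return (B, P) evaluating the 0/1 and permutation recurrences for cost l**alpha."""
    @lru_cache(maxsize=None)
    def B(n):
        return 0.0 if n <= 1 else 2 * B(n // 2) + n ** alpha

    @lru_cache(maxsize=None)
    def P(n):
        return 0.0 if n <= 1 else 2 * P(n // 2) + B(n)

    return B, P

n = 2 ** 20
B, P = make_bounds(0.5)  # sublinear case
print(B(n) / n, P(n) / (n * log2(n)))        # both ratios stay bounded as n grows
B, P = make_bounds(1.5)  # superlinear case
print(B(n) / n ** 1.5, P(n) / n ** 1.5)      # both ratios stay bounded as n grows
```

For $\alpha < 1$ the leaves of the recursion dominate $B(n)$, while for $\alpha > 1$ the top-level reversal dominates, which is why the bounds change shape at $\alpha = 1$.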

Page 34:

Upper Bounds on Diameter (4)

Case 3 – $\alpha \ge 2$:
    This case has no use for reversals of more than two elements. As such, bubble sort is an asymptotically optimal solution.
    As a result of this, a tight bound (upper and lower) on the diameter is:
    $\Theta(n^2)$

Page 35:

Lower Bounds on Diameter: Concept

Proving the lower bounds on the diameters for different values of $\alpha$ is much more complex than proving the upper bounds.

We will see the proof of a lower bound for a linear cost function ($\alpha = 1$).
    Tighter than what we have already seen.

Page 36:

Lemmas (7)

Theorem 2.3: The cost to sort $n$ elements by reversals with a linear cost function ($\alpha = 1$) is $\Omega(n \log n)$, even when all elements are 0’s and 1’s.

Thus, our bounds for sorting 0/1 sequences are tight (same upper and lower bounds), but a multiplicative gap of $O(\log n)$ exists for sorting permutations.

Page 37:

Proof of Lower Bound on Diameter for the Linear Cost Function (1)

We will approach the problem by exhibiting a difficult sorting instance.
    Specifically, we will prove a lower bound of $\Omega(n \log n)$ on the cost of sorting the length-$n$ sequence 010101…01 by reversals.

The proof follows a potential function argument.

Page 38: Genomic Sorting with Length-Weighted Intervals 236818 - Seminar in Bioinformatics Advanced Algorithms in Computational Biology Spring 2005, Technion Asaf

Definitions (6)

Before the sorting begins, we match the $i^{th}$ 0 with the $i^{th}$ 1. Throughout the sorting algorithm we will keep this matching.

Let $d_i(t)$ be the current distance between the $i^{th}$ 0 and the $i^{th}$ 1 after the $t^{th}$ reversal.

When there is no ambiguity, we abbreviate $d_i(t)$ by $d_i$.

The potential function is:

$$P(t) = \sum_i \log d_i(t)$$
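As an illustration of the definition, here is a minimal Python sketch that evaluates the potential as the sum of $\log d_i$ over the matched pairs. The names `alternating`, `potential` and `reverse` are hypothetical helpers introduced only for this example, and the logarithm is assumed to be base 2.

```python
import math
from typing import List, Tuple

# Each element is (bit, pair index): the i-th 0 and the i-th 1 share index i.
Element = Tuple[int, int]

def alternating(n: int) -> List[Element]:
    """Build 0101...01 of even length n with the i-th 0 matched to the i-th 1."""
    return [(pos % 2, pos // 2) for pos in range(n)]

def potential(seq: List[Element]) -> float:
    """Sum of log2 of the distance between each matched 0 and 1."""
    zero_pos, one_pos = {}, {}
    for pos, (bit, i) in enumerate(seq):
        (one_pos if bit else zero_pos)[i] = pos
    return sum(math.log2(abs(one_pos[i] - zero_pos[i])) for i in zero_pos)

def reverse(seq: List[Element], start: int, length: int) -> List[Element]:
    """Return a copy of seq with seq[start:start+length] reversed."""
    return seq[:start] + seq[start:start + length][::-1] + seq[start + length:]

# Every matched pair starts at distance 1, so the initial potential is 0.
print(potential(alternating(8)))
# Reversing the middle two elements of 0101 gives 0011: both pairs are at distance 2.
print(potential(reverse(alternating(4), 1, 2)))
```

Keeping the pair index on every element is what lets the matching survive arbitrary reversals.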


Lemmas (8)

Lemma 2.1: The initial value of the potential function is 0, and the final value is $\Omega(n \log n)$.

We will show how a reversal affects the value of $d_i$ in the potential function by considering the $i^{th}$ (0,1) pair.

Observation 2.1: The distance $d_i$ can only change when one element of the pair is inside the reversal and the other is outside.

Lemma 2.2: A reversal of length $k$ increases the potential $P(t)$ by at most $4k$.

Together, these two lemmas imply Theorem 2.3: a reversal of length $k$ costs $k$ under the linear cost function and raises the potential by at most $4k$, so raising the potential from 0 to $\Omega(n \log n)$ costs $\Omega(n \log n)$.


Proof of Lower Bound on Diameter for the Linear Cost Function (2)

Proof: Suppose that for a reversal of length $k$, one of the elements of a (0,1) pair is inside the reversal and the other is outside, so that $d_i$ is affected by the reversal.

At most, the distance between the two elements of this pair can increase by $k$, because each element is moved by a distance of at most $k$.


Proof of Lower Bound on Diameter for the Linear Cost Function (3)

[Figure: the sequence before the reversal and after the reversal]


Proof of Lower Bound on Diameter for the Linear Cost Function (4)

Let us assume by symmetry that the 0 is outside the reversed sequence and the 1 is inside. Suppose that the distance from the 0 to the closest element in the reversal is $l$.

The increase of the potential caused by the change in $d_i$ for this pair is at most:

$$\log(d_i + k) - \log d_i = \log(1 + k/d_i) \le \log(1 + k/l)$$


Proof of Lower Bound on Diameter for the Linear Cost Function (5)

The distance $l$ must be a natural number, and each value of $l$ occurs at most twice in one reversal: once on the left side and once on the right side of the reversed sequence.

According to Observation 2.1, there are at most $k$ such pairs whose distance changes the value of the potential function.


Proof of Lower Bound on Diameter for the Linear Cost Function (6)

As a result, the value of the potential function increases by at most:

$$2 \sum_{j=1}^{k/2} \log(1 + k/j) \le 2 \sum_{j=1}^{k/2} \left(1 + \log(k/j)\right) \le k + 2 \sum_{j=1}^{k} \log(k/j) = k + 2 \log\left(k^k / k!\right)$$

Notice that $\log(1 + k/l)$ grows as $l$ gets smaller.
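The chain of inequalities can be sanity-checked numerically. The sketch below is only an illustration and not part of the proof: it assumes base-2 logarithms, rounds $k/2$ down for odd $k$, and uses a hypothetical helper name `chain`.

```python
import math

def chain(k: int):
    """Return the four expressions of the bound, left to right, for reversal length k."""
    half = k // 2
    first = 2 * sum(math.log2(1 + k / j) for j in range(1, half + 1))
    second = 2 * sum(1 + math.log2(k / j) for j in range(1, half + 1))
    third = k + 2 * sum(math.log2(k / j) for j in range(1, k + 1))
    # log2(k^k / k!) computed through lgamma to avoid huge integers.
    closed = k + 2 * (k * math.log2(k) - math.lgamma(k + 1) / math.log(2))
    return first, second, third, closed

for k in (2, 10, 100):
    first, second, third, closed = chain(k)
    # Each value should be no larger than the next, and the last at most 4k.
    print(k, round(first, 2), round(second, 2), round(third, 2), round(closed, 2), 4 * k)
```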


Proof of Lower Bound on Diameter for the Linear Cost Function (7)

By Stirling's approximation, $\forall k \ge 1: k^k / k! \le e^k$, therefore $\log\left(k^k / k!\right) \le k \log e \le \tfrac{3}{2} k$, and the potential thus increases by at most $k + 2 \cdot \tfrac{3}{2} k = 4k$.
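The $4k$ bound of Lemma 2.2 can also be probed empirically. The following sketch applies random reversals to the sequence 0101…01 and records the largest observed potential increase per unit of reversal length. It is an experiment under stated assumptions (base-2 logarithm, a hypothetical helper `max_increase_ratio`, a fixed random seed), not a substitute for the proof.

```python
import math
import random

def potential(seq):
    """Sum of log2 distances between the matched i-th 0 and i-th 1."""
    zero_pos, one_pos = {}, {}
    for pos, (bit, i) in enumerate(seq):
        (one_pos if bit else zero_pos)[i] = pos
    return sum(math.log2(abs(one_pos[i] - zero_pos[i])) for i in zero_pos)

def max_increase_ratio(n: int, steps: int, seed: int = 0) -> float:
    """Largest (increase in potential) / (reversal length) seen over random reversals."""
    rng = random.Random(seed)
    # Start from 0101...01; element (bit, i) is the i-th 0 or the i-th 1.
    seq = [(pos % 2, pos // 2) for pos in range(n)]
    worst = 0.0
    for _ in range(steps):
        k = rng.randint(2, n)
        start = rng.randint(0, n - k)
        before = potential(seq)
        seq[start:start + k] = seq[start:start + k][::-1]
        worst = max(worst, (potential(seq) - before) / k)
    return worst

# Lemma 2.2 predicts that this value never exceeds 4.
print(max_increase_ratio(16, 2000))
```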


Sorting by Length-weighted Reversals: Dealing with Signs and Circularity.

Abstract:

Sorting linear and circular permutations and 0/1 sequences by reversals in a length-sensitive cost model.

We consider both the signed and unsigned case.


What Lies Ahead

Lower and upper bounds on the various cases.

Mentions of some approximations that guarantee the bounds shown.

Partial proofs of some of the bounds and approximations.

Cost functions are still of the class $\{ f(l) = l^\alpha \mid \alpha \ge 0 \}$.


A Word (or Two) on Circularity

Circularity generally offers more opportunities to reduce the optimal cost to sort a given permutation by reversals.

At the same time, it presents a greater challenge of finding a more efficient solution.

A non-unit cost model exacerbates these problems even further.

Take as an example the 0/1 sequence $1\,0\,1^{n/2-1}\,0^{n/2-1}$.
One can sort it by using two reversals.
In the circular case, where the two ends of the sequence meet, one can sort it by using one reversal.
In the case of a unit cost model, the ratio of the costs is 2.
However, in the case of a linear cost model, the ratio is $\Theta(n)$.
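The example can be traced in code. The sketch below assumes the sequence is 1, 0, followed by $n/2-1$ ones and $n/2-1$ zeros, and compares one particular two-reversal linear solution with a one-reversal circular solution. The helper names are hypothetical, and the reversal sequences shown are illustrative rather than proven optimal.

```python
def example(n: int):
    """The sequence 1 0 1^(n/2-1) 0^(n/2-1) for even n."""
    half = n // 2 - 1
    return [1, 0] + [1] * half + [0] * half

def reverse(seq, start, length):
    """Return a copy of seq with seq[start:start+length] reversed."""
    return seq[:start] + seq[start:start + length][::-1] + seq[start + length:]

def is_sorted_linear(seq):
    return seq == sorted(seq)

def is_sorted_circular(seq):
    """Sorted on a circle: at most one block of 0's and one block of 1's."""
    n = len(seq)
    return sum(seq[i] != seq[(i + 1) % n] for i in range(n)) <= 2

def compare(n: int):
    seq = example(n)
    step1 = reverse(seq, 0, 2)        # swap the leading "1 0" (length 2)
    step2 = reverse(step1, 1, n - 1)  # move the block of 1's to the end (length n-1)
    assert is_sorted_circular(step1) and not is_sorted_linear(step1)
    assert is_sorted_linear(step2)
    linear_lengths, circular_lengths = [2, n - 1], [2]
    return {
        "unit_ratio": len(linear_lengths) / len(circular_lengths),
        "linear_ratio": sum(linear_lengths) / sum(circular_lengths),
    }

print(compare(10))
print(compare(1000))
```

With the linear cost model the second reversal alone has length $n-1$, while the circular solution stops after a reversal of length 2, which is where the growth with $n$ comes from.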


Relationship of Costs for the Different Cases

The following relationships hold for the optimal sorting cost in the four different cases:

1. unsigned circular $\le$ unsigned linear $\le$ signed linear

2. unsigned circular $\le$ signed circular $\le$ signed linear


Bounds and Approximation Ratios

Lower and upper bounds for SBR of signed or unsigned and linear or circular 0/1 sequences and permutations.

Approximation ratios for SBR of signed linear as well as signed and unsigned circular 0/1 sequences and permutations.


Approximation Algorithms for Sorting 0/1 Sequences

We will now introduce lower bounds for sorting linear signed as well as circular unsigned 0/1 sequences.

We will see an approximation algorithm for linear signed 0/1 sequences.

We will deal with the case of $0 \le \alpha < 1$.


SBR of Circular Unsigned 0/1 Sequences – Definitions

Given a circular sequence $S$, denote the lengths of the 0 and 1 blocks contained in $S$ by $z_1, z_2, \ldots, z_k$ and $w_1, w_2, \ldots, w_k$ respectively.

Let $Z = \max_{1 \le i \le k} z_i$ and $W = \max_{1 \le i \le k} w_i$.

We define the potential function $P(S)$ as follows:

$$P(S) = \sum_{i=1}^{k} \left( z_i^\alpha + w_i^\alpha \right) - Z^\alpha - W^\alpha$$
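To make the definition concrete, here is a small Python sketch that computes the block lengths of a circular 0/1 sequence and evaluates a potential of the form $\sum_i (z_i^\alpha + w_i^\alpha) - Z^\alpha - W^\alpha$. The exponent $\alpha$ on every term is an assumption made for this sketch (matching the cost class $f(l) = l^\alpha$), and the helper names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

def circular_blocks(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Lengths of the 0-blocks and 1-blocks of a non-empty circular 0/1 sequence."""
    n = len(seq)
    # Rotate so that position 0 starts a block (seq[-1] is the circular predecessor).
    start = next((i for i in range(n) if seq[i - 1] != seq[i]), None)
    if start is None:  # the whole circle is a single block
        return ([n], []) if seq[0] == 0 else ([], [n])
    rotated = seq[start:] + seq[:start]
    zeros, ones = [], []
    for bit, run in groupby(rotated):
        (ones if bit else zeros).append(len(list(run)))
    return zeros, ones

def potential(seq: List[int], alpha: float) -> float:
    """Sum of block lengths to the power alpha, minus the largest 0-block and 1-block terms."""
    total = 0.0
    for lengths in circular_blocks(seq):
        if lengths:
            total += sum(l ** alpha for l in lengths) - max(lengths) ** alpha
    return total

# A circularly sorted sequence has one block of each kind, so its potential is 0.
print(potential([0, 0, 0, 1, 1], 0.5))
# The alternating circle 0101 has two unit blocks of each kind.
print(potential([0, 1, 0, 1], 0.5))
```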


SBR of Circular Unsigned 0/1 Sequences – Lemmas

Lemma 1: A reversal of length $r$ acting on a circular sequence $S$ increases the value of the potential function $P(S)$ by at most $4r^\alpha$.
Proof: Won't be provided in this presentation.

Lemma 2: The function $V(S) = \tfrac{1}{4} P(S)$ is a lower bound for sorting an unsigned circular sequence $S$.
Proof: By induction (next slide).


SBR of Circular Unsigned 0/1 Sequences – Proof

Let $m$ be the number of reversals in some optimal sorting solution. We want to prove that if an optimal sorting solution uses exactly $m$ reversals, it costs at least $V(S)$.

Base case: $m = 0$ is trivial.

Induction step: Suppose the claim holds for all $m \le k$. Consider a 0/1 sequence $S$ that has an optimal sorting series of $m = k + 1$ reversals. Denote the first reversal by $S \to S'$ and let $r$ be its length. $S'$ can be sorted by $k$ reversals, and hence $V(S')$ is a lower bound for sorting $S'$. By Lemma 1 (applied to the same reversal, which takes $S'$ back to $S$) we get $P(S') + 4r^\alpha \ge P(S)$, and by the definition of $V$ we get $V(S') + r^\alpha \ge V(S)$. Therefore $opt(S) = opt(S') + r^\alpha \ge V(S') + r^\alpha \ge V(S)$, as needed.


SBR of Linear Signed 0/1 Sequences – Definitions

Consider a linear signed 0/1 sequence. Define a block in the sequence to be a contiguous segment of 0's or 1's of the same sign.
Notice that there are four kinds of blocks in such a sequence.

We represent the sequence as a series of blocks $b_1, b_2, \ldots, b_m$. Let us denote $V(S) = \tfrac{1}{2} \sum_{i=1}^{m} |b_i|^\alpha$ as the potential function for such a linear sequence $S$.


SBR of Linear Signed 0/1 Sequences – Lemmas

Lemma 3: The potential $V(S)$ is a lower bound on the cost of sorting linear signed sequences.
Proof: Won't be provided in this presentation.

Theorem 2: The algorithm signedImprovedDC is an $O(1)$ approximation algorithm.
Proof: Won't be provided in this presentation.

And the algorithm? In the next slide…


SBR of Linear Signed 0/1 Sequences – Approximation Algorithm

Given a signed sequence $S$, let unsign($S$) represent the sequence without the signs.

signedImprovedDC(S)
1. U ← unsign(S)
2. u ← improvedDC(U)
3. Mimic the reversals used to sort U on S. Denote the resulting sequence as S'.
4. Reverse elements of S' with a negative sign. Let s be the cost of this step.
5. Output s + u

improvedDC(S) is an $O(1)$ approximation algorithm for unsigned sorting of linear 0/1 sequences when $\alpha < 1$. (Not supplied and not proved in this presentation.)
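The five steps can be sketched in Python as follows. Since improvedDC is not supplied in this presentation, the sketch substitutes a naive unsigned sorter, `stand_in_unsigned_sort`, which is a hypothetical placeholder with no approximation guarantee. Two further assumptions: a reversal flips the sign of every element it covers, and step 4 is carried out by reversing each maximal run of identical negative elements with a single reversal.

```python
from typing import List, Tuple

Signed = Tuple[int, int]  # (bit, sign) with sign in {+1, -1}

def cost(length: int, alpha: float) -> float:
    """Cost f(l) = l**alpha of a reversal of length l."""
    return float(length) ** alpha

def stand_in_unsigned_sort(bits: List[int]) -> List[Tuple[int, int]]:
    """Placeholder for improvedDC (NOT the real algorithm): reversals (start, end) sorting bits."""
    bits = list(bits)
    reversals = []
    while 1 in bits:
        i = bits.index(1)            # first 1
        if 0 not in bits[i:]:
            break                    # no 0 after the first 1: sorted
        j = bits.index(0, i)         # first 0 after it
        bits[i:j + 1] = bits[i:j + 1][::-1]
        reversals.append((i, j + 1))
    return reversals

def signed_improved_dc(s: List[Signed], alpha: float):
    s = list(s)
    unsigned = [bit for bit, _ in s]                      # step 1: U <- unsign(S)
    reversals = stand_in_unsigned_sort(unsigned)          # step 2: sort U, cost u
    u = sum(cost(end - start, alpha) for start, end in reversals)
    for start, end in reversals:                          # step 3: mimic the reversals on S
        s[start:end] = [(bit, -sign) for bit, sign in reversed(s[start:end])]
    flip_cost, i = 0.0, 0                                 # step 4: fix negative signs
    while i < len(s):
        if s[i][1] < 0:
            j = i
            while j < len(s) and s[j] == s[i]:
                j += 1
            # Reversing a run of identical negative elements only flips their signs.
            s[i:j] = [(bit, +1) for bit, _ in s[i:j]]
            flip_cost += cost(j - i, alpha)
            i = j
        else:
            i += 1
    return flip_cost + u, s                               # step 5: output s + u

total, final = signed_improved_dc([(1, 1), (0, -1), (1, -1), (0, 1)], alpha=0.5)
print(total, final)
```

Swapping in a real implementation of improvedDC only requires replacing `stand_in_unsigned_sort` with a function that returns the list of reversals it performs.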


Summary – What We've Seen (1)

The introduction of Length Weighted Models for Sorting By Reversals.
Incentive:
Unit cost isn't biologically defensible.
Experiments show that length weighted models may help substantially in biasing between two evolutionary paths.

Lower and Upper Bounds on sorting with additive cost functions.
Upper Bounds: For any given permutation.
Lower Bounds: For a specific permutation p.


Summary – What We've Seen (2)

Improved Bounds on Cost of Length Weighted Sorting By Reversals:
Dealing with a wider range of functions $f(l) = l^\alpha$.
Improved Upper Bounds on sorting unsigned 0/1 sequences and permutations for all values of $\alpha \ge 0$.
Improved Lower Bound for the case of $\alpha = 1$.
An improvement from something we've already seen.


Summary – What We've Seen (3)

Sorting By Reversals by Length Weighted Models – Dealing with Signs and Circularity:
Still with the same family of functions.
Lower Bounds for the cases of circular unsigned and linear signed 0/1 sequences.
Approximation algorithm for the sorting of linear signed 0/1 sequences.
Many lemmas, theorems and corollaries.


Questions Raised

Aside from what hasn't been covered in this presentation (which is, other than more bounds and approximation algorithms, another gargantuan set of lemmas, theorems and corollaries) there are many questions left open.
What is the right cost function, or what are the right cost functions for various types of sequences?
Is the family of functions presented in this presentation large enough to contain the right one(s)?
Is the real cost function defined differently over different ranges? Should it be species specific?
Should we include more data (other than length) for computing a reversal's cost? E.g., the place of the reversal or the sequences being reversed.


And least but not last… (as far as this presentation goes)

Questions?

Comments!?


Bibliography

Pinter, R. Y., and Skiena, S., "Sorting with length-weighted reversals", Proceedings of the 13th International Conference on Genome Informatics (GIW 2002), December 2002, pp. 103-111.

Bender, M. A., Ge, D., He, S., Hu, H., Pinter, R. Y., Skiena, S., and Swidan, F., "Improved Bounds on Sorting with Length-Weighted Reversals (Extended Abstract)", Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004, pp. 912-921.

Swidan, F., Bender, M. A., Ge, D., He, S., Hu, H., and Pinter, R., "Sorting by length-weighted reversals: Dealing with signs and circularity", Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM), Lecture Notes in Computer Science (LNCS), Vol. 3109, July 2004, pp. 32-46.