Post on 21-Dec-2015

Simple and fast linear space computation of longest common subsequences

Claus Rick, 1999

What is the LCS problem?

A A B A C

A B C

…finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
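As a baseline, the definition can be turned into code directly. The following is a minimal sketch of the textbook dynamic-programming solution (quadratic space, not the linear-space method of this talk); the function name `lcs` is illustrative and does not come from the slides.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AABAC", "ABC"))  # the two strings from the slide
```

For the two strings on the slide, `lcs("AABAC", "ABC")` returns `"ABC"`.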

Some boring history…

Year  Author              Time                         Constants   Paradigm
1975  Hirschberg          O(mn)                        2           Dyn. prog.
1985  Apostolico, Guerra  O(m lg m + pm)               [2, log m]  Contours
1986  Myers               O(n(n-p))                    2           Shortest path
1987  Kumar, Rangan       O(n(m-p))                    3           Contours
1990  Wu et al.           O(n(m-p))                    2           Shortest path
1992  Apostolico et al.   O(n(m-p))                    3           Contours
1992  Apostolico et al.   O(pm)                        3           Contours
1999  Goeman, Clausen     O(min(pm, m lg m + p(n-p)))  [5,25,lgM]  Contours
1999  This article        O(min(pm, p(n-p)))           2           Contours

Pre-Info

Divide and conquer Midpoint
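The divide-and-conquer idea goes back to Hirschberg: find a midpoint that some LCS passes through, then solve the two halves recursively. Below is a hedged sketch of that classic scheme, not of the contour-based midpoint search presented later. The names `lcs_lengths` and `hirschberg_lcs` are illustrative, and string slicing is used for brevity, so this sketch does not meet a strict linear-space bound (an index-based version would be needed for that).

```python
def lcs_lengths(a: str, b: str) -> list:
    """Last row of the LCS-length table: entry j is LCS(a, b[:j])."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur  # only two rows are ever alive
    return prev


def hirschberg_lcs(a: str, b: str) -> str:
    """Return one LCS of a and b by splitting a in the middle."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""

    mid = len(a) // 2
    n = len(b)
    # left[j]  = LCS(a[:mid], b[:j])
    left = lcs_lengths(a[:mid], b)
    # right[k] = LCS(a[mid:], last k symbols of b), computed on reversed strings
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    # The midpoint: the split of b that maximises the combined length.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg_lcs(a[:mid], b[:split]) + hirschberg_lcs(a[mid:], b[split:])


print(hirschberg_lcs("AABAC", "ABC"))
```

The point of the contour approach in the rest of the talk is to find such a midpoint faster than by filling two full rows of length n for every split.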

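Some basic terms

A A B A C

A B C

Ordered pair (i,j): position i of A paired with position j of B, e.g. (2,3) = (A,C)

Match: an ordered pair (i,j) whose two symbols are equal

Chain: a sequence of matches that is strictly increasing in both i and j

Rank k: a match has rank k if the longest chain ending at it contains k matches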
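To make the terms concrete, here is a small sketch that lists the matches of two strings and assigns each its rank. It assumes the usual reading of rank (length of the longest chain ending at the match) and 1-based positions; `matches` and `forward_ranks` are illustrative names, and the quadratic loop is chosen for readability, not speed.

```python
def matches(a: str, b: str) -> list:
    """All 1-based pairs (i, j) with a[i-1] == b[j-1], sorted by i, then j."""
    return [(i, j)
            for i in range(1, len(a) + 1)
            for j in range(1, len(b) + 1)
            if a[i - 1] == b[j - 1]]


def forward_ranks(a: str, b: str) -> dict:
    """Map each match to the length of the longest chain ending at it."""
    rank = {}
    for (i, j) in matches(a, b):
        # A chain may be extended only by a match strictly above and to the left.
        best = 0
        for (pi, pj), r in rank.items():
            if pi < i and pj < j:
                best = max(best, r)
        rank[(i, j)] = best + 1
    return rank


for match, k in forward_ranks("AABAC", "ABC").items():
    print(match, "rank", k)
```

For the slide's strings this gives rank 1 to (1,1), (2,1) and (4,1), rank 2 to (3,2) and rank 3 to (5,3). The pair (2,3) from the slide is not a match, because A and C differ.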

Matching matrix

[Figure: matching matrix of the strings "cbabbacac" and "abacbcba"]

Dominant matches

In each rank, the upper-left matches: those with no other match of the same rank above and to the left of them
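A short sketch of how dominant matches could be picked out. It assumes that a match of rank k is dominant when no other rank-k match has both a smaller-or-equal row and a smaller-or-equal column; ranks are read off the standard LCS-length table. The name `dominant_matches` is illustrative.

```python
def dominant_matches(a: str, b: str) -> dict:
    """Map each rank k to the list of dominant matches of that rank."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]; at a match this is its rank.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    by_rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                by_rank.setdefault(table[i][j], []).append((i, j))
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    dominant = {}
    for k, group in by_rank.items():
        # Keep a match only if no other match of rank k is upper-left of it.
        dominant[k] = [
            (i, j) for (i, j) in group
            if not any((pi, pj) != (i, j) and pi <= i and pj <= j
                       for (pi, pj) in group)
        ]
    return dominant


print(dominant_matches("AABAC", "ABC"))
```

On "AABAC" and "ABC" only (1,1) survives in rank 1, since (2,1) and (4,1) lie directly below it in the same column.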

[Figures: dominant matches marked on the matching matrices of "AABAC" / "ABC" and of "cbabbacac" / "abacbcba"; in the second matrix they are grouped by rank, numbered 1 to 5]

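Some last basic terms

FCk: the kth forward contour (ranks counted from the start of the strings)

BCk: the kth backward contour (ranks counted from the end of the strings)

[Figure: forward contours (FC) and backward contours (BC) of "cbabbacac" and "abacbcba", each numbered 1 to 5]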

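Lemma 1

Let p be the length of an LCS between strings A and B. Then for every match (i,j) the following holds:

• There is an LCS containing (i,j) if and only if (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.

Lemma 1 - proof

If (i,j) is on FCk and on BC(p-k+1), a chain of length k ends at (i,j) and a chain of length p-k+1 starts at it. Joined at (i,j), they form a chain of length k + (p-k+1) - 1 = p, which is an LCS.

Conversely, if (i,j) is the kth match of an LCS, its forward rank is at least k and its backward rank is at least p-k+1. If either were larger, the joined chain would be longer than p.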
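Lemma 1 can be tried out on small inputs. The sketch below applies the lemma's condition, it does not prove it: forward ranks come from the LCS-length table of the strings, backward ranks from the table of the reversed strings, and the matches that satisfy the condition are returned. The names `lcs_table` and `matches_on_some_lcs` are illustrative.

```python
def lcs_table(a: str, b: str) -> list:
    """table[i][j] = LCS length of a[:i] and b[:j]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def matches_on_some_lcs(a: str, b: str):
    """Return (p, [(i, j, k), ...]) for matches on FCk and BC(p-k+1)."""
    m, n = len(a), len(b)
    fwd = lcs_table(a, b)
    bwd = lcs_table(a[::-1], b[::-1])
    p = fwd[m][n]
    result = []
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                k = fwd[i][j]                      # forward rank
                back = bwd[m - i + 1][n - j + 1]   # backward rank
                if back == p - k + 1:              # condition of Lemma 1
                    result.append((i, j, k))
    return p, result


print(matches_on_some_lcs("AABAC", "ABC"))
```

For "AABAC" and "ABC" the match (4,1) is filtered out: its forward rank is 1 but only a chain of length 2 starts at it, so no LCS of length 3 can contain it.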

Start calculating

FC1, BC1, FC2, BC2, …

Sooner or later…

Really really last terms

Define sets Mi as:

M0 = M

M1 = M0 \ FC1

M2 = M1 \ BC1

…

M(2i-1) = M(2i-2) \ FCi

M(2i) = M(2i-1) \ BCi

[Figure: the full match set M, then the shrinking sets M1, M2, M3, M4, M5, on the matching matrix of "cbabbacac" and "abacbcba"]

Let's call the first empty Mi: Mp’

Lemma 2

The length of an LCS is p’ and each match in M(p’-1) is a possible midpoint.

Lemma 2 - proof

[Figure: proof sketch showing the sets M0, M1, M2, …, M(k-1), Mk, labelled k, k-1, k-2, …, 1, 0, with k = p]
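The peeling process behind Lemma 2 can be sketched naively by storing every match, which is exactly the expensive variant the talk goes on to avoid. The sketch assumes FCi and BCi are the matches of forward rank i and backward rank i in the full match set M; `lcs_table` and `peel` are illustrative names.

```python
def lcs_table(a: str, b: str) -> list:
    """table[i][j] = LCS length of a[:i] and b[:j]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def peel(a: str, b: str):
    """Return (p', M(p'-1)) by removing FC1, BC1, FC2, BC2, ... in turn."""
    m, n = len(a), len(b)
    fwd = lcs_table(a, b)
    bwd = lcs_table(a[::-1], b[::-1])
    # ranks[(i, j)] = (forward rank, backward rank), 1-based positions
    ranks = {
        (i, j): (fwd[i][j], bwd[m - i + 1][n - j + 1])
        for i in range(1, m + 1)
        for j in range(1, n + 1)
        if a[i - 1] == b[j - 1]
    }
    remaining = set(ranks)   # M0 = M
    previous = set()
    step = 0
    while remaining:
        step += 1
        previous = remaining
        if step % 2 == 1:    # odd step 2i-1: remove FCi
            k = (step + 1) // 2
            remaining = {mt for mt in remaining if ranks[mt][0] != k}
        else:                # even step 2i: remove BCi
            k = step // 2
            remaining = {mt for mt in remaining if ranks[mt][1] != k}
    return step, previous    # first empty set is M(step); previous is M(step-1)


print(peel("AABAC", "ABC"))
```

For "AABAC" and "ABC" the sets become empty at step 3, which is the LCS length, and the last non-empty set contains only (3,2), the middle match of the LCS "ABC".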

Little problem…

We can't keep track of every set: that is very expensive.

What do we do?

Keep only dominant matches…

When we see a dominant match below: done.

Let's define:

FCf’, BCb’: the contours with the minimal indices f’ and b’ as stated above

Lemma 3

The length of an LCS is b’ + f’ - 1.

Complexity

Finding the dominant matches of each contour: O(min(m, n-p))

Number of contours: p

Total: O(min(pm, p(n-p)))

The End

Simple and fast linear space computation of longest common subsequence

Written by: Claus Rick, 1999

Based on algorithm by: D. Hirschberg, 1975

Cast:

Matrices, Lines

Arrows, Squares

Blue, Red

Brown, Grey, Black

String A, String B

Presentation: Uri Scheiner

No Dominant Matches were harmed during the making of this presentation

Appendix

What is the LCS

Divide and Conquer

Match

Chain

Dominant Matches

FC

BC

Lemma 1

Define M…

Lemma 2

Keep just Dominant…

Lemma 3

Complexity
