Simple and fast linear space computation of Longest common subsequences Claus Rick, 1999


Page 1: Simple and fast linear space computation of Longest common subsequences Claus Rick, 1999

Simple and fast linear space computation of Longest common subsequences

Claus Rick, 1999

Page 2

What is the LCS problem?

A = A A B A C

B = A B C

Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
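To make the definition concrete, here is a minimal sketch of the textbook quadratic-space dynamic program for the slide's example strings. It is not the linear-space method of the paper; the function name `lcs` is chosen only for this illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AABAC", "ABC"))  # ABC
```

The table takes O(mn) space, which is exactly the cost that the linear-space approaches discussed in these slides avoid.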

Page 3

Some boring history…

Year  Author              Time                         Constants    Paradigm
1975  Hirschberg          O(mn)                        2            Dyn. Prog.
1985  Apostolico, Guerra  O(m lg m + pm)               [2,logm]     contours
1986  Myers               O(n(n-p))                    2            Shortest path
1987  Kumar, Rangan       O(n(m-p))                    3            contours
1990  Wu et al.           O(n(m-p))                    2            Shortest path
1992  Apostolico et al.   O(n(m-p))                    3            contours
1992  Apostolico et al.   O(pm)                        3            contours
1999  Goeman, Clausen     O(min(pm, m lg m + p(n-p)))  [5,25,lgM]   contours
1999  This article        O(min(pm, p(n-p)))           2            contours

Page 4

Pre-Info

Divide and conquer Midpoint
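The slide only names the idea. As a hedged illustration, the following sketch shows the classic divide-and-conquer scheme credited to Hirschberg (1975) at the end of this deck: split A in the middle, find the position in B where an LCS crosses that split (the midpoint), and recurse on both halves. The helper names `lcs_lengths` and `hirschberg` are assumptions of this sketch, and the midpoint is found here with plain row-by-row dynamic programming, not with the contour technique of the paper.

```python
def lcs_lengths(a: str, b: str) -> list:
    """Last row of the LCS table of a versus b, using O(len(b)) space."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg(a: str, b: str) -> str:
    """Return one LCS of a and b while keeping only linear-size rows."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    n = len(b)
    # left[k]  = LCS length of a[:mid] and b[:k]
    # right[t] = LCS length of a[mid:] and the last t symbols of b
    left = lcs_lengths(a[:mid], b)
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    # The midpoint is the split of b that maximises the combined length.
    k = max(range(n + 1), key=lambda s: left[s] + right[n - s])
    return hirschberg(a[:mid], b[:k]) + hirschberg(a[mid:], b[k:])


print(hirschberg("AABAC", "ABC"))  # ABC
```

Only two rows of the table are alive at any time, so the working space is linear in the length of B, at the price of recomputing rows in each recursive call.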

Page 5

Some basic terms

Ordered pair (i,j): position i of A paired with position j of B

A = A A B A C

B = A B C

(2,3) = (A,C)

Page 6

Some basic terms

Match: an ordered pair (i,j) whose symbols are equal, a_i = b_j

A = A A B A C

B = A B C

Page 7

Some basic terms

Chain: a sequence of matches that is strictly increasing in both components

A = A A B A C

B = A B C

Page 8

Rank k: a match has rank k if the longest chain ending at it has length k

A = A A B A C

B = A B C
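The terms above can be checked on the example strings with a short sketch. It assumes the usual definitions (a match pairs equal symbols, a chain increases strictly in both coordinates, the rank of a match is the length of the longest chain ending at it) and 1-based positions as in the (2,3) = (A,C) example. The function name `match_ranks` is hypothetical.

```python
def match_ranks(a: str, b: str) -> dict:
    """Map every match (i, j), 1-based, to its rank."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # The longest chain ending at (i, j) extends the longest
                # chain that lies strictly above and to the left of it.
                table[i][j] = table[i - 1][j - 1] + 1
                ranks[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return ranks


for match, rank in sorted(match_ranks("AABAC", "ABC").items()):
    print(match, rank)
```

For A = AABAC and B = ABC the three A-matches (1,1), (2,1) and (4,1) have rank 1, the match (3,2) has rank 2 and the match (5,3) has rank 3, so the chain (1,1), (3,2), (5,3) spells out the LCS.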

Page 9

Some basic terms

Matching matrix

[Figure: matching matrix for the strings c b a b b a c a c and a b a c b c b a]

Page 10

Some basic terms

Dominant matches

All upper-left matches in each rank: a match of rank k is dominant if no other match of rank k lies weakly above and to the left of it.
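As an illustration, the sketch below filters the matches of each rank down to the dominant ones. It assumes that a match (i,j) of rank k is dominant when no other match (i',j') of rank k has i' <= i and j' <= j. The names `match_ranks` and `dominant_matches` are hypothetical, and the quadratic scan is for clarity only, not how an efficient algorithm would find them.

```python
from collections import defaultdict


def match_ranks(a: str, b: str) -> dict:
    """Map every match (i, j), 1-based, to the length of the longest chain ending there."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                ranks[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return ranks


def dominant_matches(a: str, b: str) -> dict:
    """Map each rank k to the sorted list of its dominant matches."""
    by_rank = defaultdict(list)
    for match, k in match_ranks(a, b).items():
        by_rank[k].append(match)
    dominant = {}
    for k, matches in by_rank.items():
        # Keep a match only if no other match of the same rank sits
        # weakly above and to the left of it.
        dominant[k] = sorted(
            (i, j)
            for (i, j) in matches
            if not any((x, y) != (i, j) and x <= i and y <= j for (x, y) in matches)
        )
    return dominant


print(dominant_matches("AABAC", "ABC"))
```

On the small example, rank 1 has three matches but only (1,1) is dominant, which shows why keeping dominant matches alone can save a lot of bookkeeping.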

Page 11

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a with the dominant matches marked, ranks numbered 1 to 5]

Page 12

[Figure: the example strings A A B A C and A B C]

Page 13

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a]

Page 14

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a with the backward contours (BC) numbered 1 to 5]

Page 15

Some last basic terms

FC_k: the kth forward contour

BC_k: the kth backward contour

Page 16

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a with the forward contours (FC) numbered 1 to 5]

Page 17

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a with the backward contours (BC) numbered 1 to 5]

Page 18

Lemma 1

Let p be the length of an LCS of strings A and B. Then for every match (i,j) the following holds:

• There is an LCS containing (i,j) if and only if, for some k, (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
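Lemma 1 can be checked on small inputs with the sketch below. It assumes that the kth forward contour holds the matches of rank k and that the backward contours are the same construction applied to the reversed strings. The names `match_ranks` and `matches_on_some_lcs` are hypothetical.

```python
def match_ranks(a: str, b: str) -> dict:
    """Map every match (i, j), 1-based, to the length of the longest chain ending there."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                ranks[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return ranks


def matches_on_some_lcs(a: str, b: str):
    """Return (p, matches lying on FC_k and BC_(p-k+1) for some k)."""
    m, n = len(a), len(b)
    fwd = match_ranks(a, b)
    # Backward ranks: rank in the reversed strings, mapped back to the
    # original 1-based positions.
    rev = match_ranks(a[::-1], b[::-1])
    bwd = {(m + 1 - i, n + 1 - j): r for (i, j), r in rev.items()}
    p = max(fwd.values(), default=0)
    # On FC_k and BC_(p-k+1) means forward rank + backward rank = p + 1.
    return p, {match for match in fwd if fwd[match] + bwd[match] == p + 1}


print(matches_on_some_lcs("AABAC", "ABC"))
```

For A = AABAC and B = ABC this reports p = 3 and every match except (4,1): the A at position 4 of A comes after the only B, so no LCS can use it.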

Page 19

Lemma 1 - proof

[Figure: a match lying on FC_k and on BC_(p-k+1); labels: |FC| = k, |BC| = p-k+1, p]

Page 20

Start calculating

FC_1, BC_1, FC_2, BC_2, …

Sooner or later…

Page 21

Really really last terms

Define sets M_i as:

M_0 = M

M_1 = M_0 \ FC_1

M_2 = M_1 \ BC_1

…

M_(2i-1) = M_(2(i-1)) \ FC_i

M_(2i) = M_(2i-1) \ BC_i

Page 22

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a showing the set M]

Page 23

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a showing the sets M_1, M_2, M_3, M_4, M_5]

Page 24

Let us call the first empty M_i…

M_p’

Page 25

Lemma 2

The length of an LCS is p’, and each match in M_(p’-1) is a possible midpoint.
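The peeling process behind Lemma 2 can be sketched as follows. The code assumes M is the set of all matches, FC_i the matches of forward rank i, and BC_i the matches of backward rank i (rank in the reversed strings). It stores every set explicitly, so it only illustrates the lemma and is not space-efficient. The names `match_ranks` and `peel` are hypothetical.

```python
def match_ranks(a: str, b: str) -> dict:
    """Map every match (i, j), 1-based, to the length of the longest chain ending there."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                ranks[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return ranks


def peel(a: str, b: str):
    """Return (p', M_(p'-1)): index of the first empty set and the last non-empty set."""
    m, n = len(a), len(b)
    fwd = match_ranks(a, b)
    rev = match_ranks(a[::-1], b[::-1])
    bwd = {(m + 1 - i, n + 1 - j): r for (i, j), r in rev.items()}

    sets = [set(fwd)]  # M_0 = M, the set of all matches
    index = 0
    while sets[-1]:
        index += 1
        i = (index + 1) // 2
        if index % 2 == 1:
            # M_(2i-1) = M_(2(i-1)) \ FC_i
            nxt = {match for match in sets[-1] if fwd[match] != i}
        else:
            # M_(2i) = M_(2i-1) \ BC_i
            nxt = {match for match in sets[-1] if bwd[match] != i}
        sets.append(nxt)

    p_prime = len(sets) - 1
    midpoints = sets[-2] if p_prime > 0 else set()
    return p_prime, midpoints


print(peel("AABAC", "ABC"))  # (3, {(3, 2)})
```

For A = AABAC and B = ABC the sets shrink from five matches to {(3,2),(5,3)} to {(3,2)} to the empty set, so p’ = 3 and (3,2), the B in the middle of the LCS, is the midpoint.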

Page 26

Lemma 2 - proof

[Figure: the sets M_0, M_1, M_2, …, M_(k-1), M_k labelled k, k-1, k-2, …, 1, 0, with k = p]

Page 27

Little problem…

We can’t keep track of every set: that is very expensive.

Page 28

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a]

Page 29

What do we do?

Keep only dominant matches…

Once we see a dominant match below, we are done.

Page 30

[Figure: matching matrix for c b a b b a c a c and a b a c b c b a]

Page 31

Let’s define:

FC_f’, BC_b’: the contours with the minimal indices f’ and b’ as stated above

Page 32

Lemma 3

The length of an LCS is b’ + f’ - 1.

Page 33

Complexity

Finding the dominant matches of each contour: O(min(m, n-p))

Number of contours: p

Total: O(min(pm, p(n-p)))

Page 34

The End

Page 35

Simple and fast linear space computation of longest common subsequences

Written by: Claus Rick, 1999

Based on algorithm by: D. Hirschberg, 1975

Cast:

Matrices, Lines

Arrows, Squares

Blue, Red

Brown, Grey, Black

String A, String B

Presentation: Uri Scheiner

No dominant matches were harmed during the making of this presentation

Page 36

Appendix

What is the LCS

Divide and conquer

Match

Chain

Dominant Matches

FC

BC

Lemma 1

Define M…

Lemma 2

Keep just Dominant…

Lemma 3

Complexity