String Matching with Errors
The Theory and Computation of Evolutionary Distances: Pattern Recognition, Sellers, P. H., Journal of Algorithms, Vol. 1, No. 4, 1980, pp. 359-373.
Speaker: C. C. Lin
Adviser: R. C. T. Lee
In the following, we present a problem related
to the notion of edit distance.
We first introduce edit distance.
In edit distance, there are three types of differences
between two strings X and Y, each with cost 1:
Insertion: a symbol of Y is missing in X at a
corresponding position.
  X: A - T
  Y: A G T
Substitution: symbols at corresponding positions are
distinct.
  X: A C C
  Y: T C C
Deletion: a symbol of X is missing in Y at a
corresponding position.
  X: G C A
  Y: G - A
Given two strings X and Y, the edit distance
between X and Y is the minimum number of
insertions, deletions and substitutions needed to
transform Y to X.
String X: ATGAATCTTACCGCCTCG
String Y: ATGAGGCTCTGGCCCCTG
Transformation (from string Y to string X):
String X: A T G A A - - T C T T A C C G C C T C G
String Y: A T G A G G C T C T G G C C - C C T - G
EDIT(X, Y)=7 (2 insertions, 2 deletions and 3 substitutions).
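The operation counts for this alignment can be checked mechanically. The sketch below is only an illustration: the function name `count_operations` and the use of `-` as the gap symbol are assumptions, and it follows the convention of the definitions above (a gap in X is an insertion, a gap in Y is a deletion). It scores a given alignment; it does not prove that the alignment is optimal.

```python
def count_operations(aligned_x: str, aligned_y: str, gap: str = "-") -> dict:
    """Count edit operations column by column in a gapped alignment.

    Convention (assumed from the definitions above): a gap in X is an
    insertion, a gap in Y is a deletion, and two distinct symbols in the
    same column are a substitution.
    """
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned strings must have the same length")
    counts = {"insertion": 0, "deletion": 0, "substitution": 0}
    for a, b in zip(aligned_x, aligned_y):
        if a == gap and b == gap:
            raise ValueError("a column cannot consist of two gaps")
        if a == gap:
            counts["insertion"] += 1
        elif b == gap:
            counts["deletion"] += 1
        elif a != b:
            counts["substitution"] += 1
    return counts


# The alignment from the slide, written without spaces.
aligned_x = "ATGAA--TCTTACCGCCTCG"
aligned_y = "ATGAGGCTCTGGCC-CCT-G"
ops = count_operations(aligned_x, aligned_y)
total_cost = sum(ops.values())
print(ops, total_cost)
```

Each operation costs 1, so the total cost of the alignment is the sum of the three counts.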
Next, we will introduce a dynamic programming
method to compute the edit distance between
two strings X and Y.
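Dynamic Programming for Edit Distance:

$$
\mathrm{EDIT}[i, j] = \min
\begin{cases}
\mathrm{EDIT}[i-1, j] + 1 & \text{(Delete)} \\
\mathrm{EDIT}[i, j-1] + 1 & \text{(Insert)} \\
\mathrm{EDIT}[i-1, j-1] + \delta(x[i], y[j]) & \text{(Substitute)}
\end{cases}
$$

where $\delta(x[i], y[j]) = 0$ if $x[i] = y[j]$, and $\delta(x[i], y[j]) = 1$ otherwise, and

$$
\mathrm{EDIT}[i, 0] = i, \qquad \mathrm{EDIT}[0, j] = j.
$$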
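The recurrence above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: unit costs, 0-based Python strings (so `x[i - 1]` plays the role of x[i]), and the hypothetical names `edit_table` and `edit_distance`, which do not come from the slides.

```python
def edit_table(x: str, y: str) -> list[list[int]]:
    """Fill the (len(x)+1) x (len(y)+1) table of the recurrence above."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    # Boundary conditions: EDIT[i, 0] = i and EDIT[0, j] = j.
    for i in range(1, m + 1):
        table[i][0] = i
    for j in range(1, n + 1):
        table[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            delta = 0 if x[i - 1] == y[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,          # gap step, cost 1
                table[i][j - 1] + 1,          # gap step, cost 1
                table[i - 1][j - 1] + delta,  # match (0) or substitution (1)
            )
    return table


def edit_distance(x: str, y: str) -> int:
    """Edit distance is the bottom-right entry of the table."""
    return edit_table(x, y)[len(x)][len(y)]


print(edit_distance("abcabba", "cbabac"))
```

Because all three operations cost 1, the distance is symmetric, so the order of the two arguments affects only the orientation of the table, not the result.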
Given
X=abcabba
Y=cbabac
The columns of the table are indexed by X and the rows by Y. Row 0 and column 0 hold the boundary values; the remaining cells are filled row by row, from left to right (the first cells computed are 1, 2, 2, 3 in the row for c). The completed table:

|   |   | a | b | c | a | b | b | a |
|---|---|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| c | 1 | 1 | 2 | 2 | 3 | 4 | 5 | 6 |
| b | 2 | 2 | 1 | 2 | 3 | 3 | 4 | 5 |
| a | 3 | 2 | 2 | 2 | 2 | 3 | 4 | 4 |
| b | 4 | 3 | 2 | 3 | 3 | 2 | 3 | 4 |
| a | 5 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
| c | 6 | 5 | 4 | 3 | 4 | 4 | 4 | 4 |
EDIT(X, Y)=4, the value in the bottom-right cell of the table.
Tracing back from that cell to the top-left cell recovers an alignment, built from right to left (X symbol / Y symbol):
Substitute: a / c
Substitute: b / a
Match: b / b
Match: a / a
Insert: c / -
Match: b / b
Substitute: a / c
The resulting alignment:
X: a b c a b b a
Y: c b - a b a c
The traceback is not unique. Another alignment with EDIT(X, Y)=4:
X: a b c a b b a -
Y: c b - a b - a c
Operations, from left to right: Substitute, Match, Insert, Match, Match, Insert, Match, Delete.
Yet another optimal alignment, again with EDIT(X, Y)=4:
X: a b c a b b a -
Y: c b - a - b a c
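The traceback shown on the preceding slides can be sketched as below. This is an illustration under assumptions rather than the speaker's code: the name `align` is hypothetical, `-` stands for a gap, rows are indexed by Y and columns by X as in the table, and ties are broken by preferring the diagonal step, then the horizontal step, then the vertical step. A different tie-breaking rule yields the other optimal alignments.

```python
def align(x: str, y: str, gap: str = "-") -> tuple[str, str]:
    """Return one optimal alignment of x and y recovered by traceback."""
    rows, cols = len(y), len(x)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows + 1):
        table[i][0] = i
    for j in range(cols + 1):
        table[0][j] = j
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            delta = 0 if y[i - 1] == x[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,
                              table[i][j - 1] + 1,
                              table[i - 1][j - 1] + delta)

    out_x, out_y = [], []
    i, j = rows, cols
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            delta = 0 if y[i - 1] == x[j - 1] else 1
            if table[i][j] == table[i - 1][j - 1] + delta:
                # Diagonal step: match or substitution.
                out_x.append(x[j - 1])
                out_y.append(y[i - 1])
                i, j = i - 1, j - 1
                continue
        if j > 0 and table[i][j] == table[i][j - 1] + 1:
            # Horizontal step: a symbol of x faces a gap in y.
            out_x.append(x[j - 1])
            out_y.append(gap)
            j -= 1
        else:
            # Vertical step: a symbol of y faces a gap in x.
            out_x.append(gap)
            out_y.append(y[i - 1])
            i -= 1
    # The alignment was collected from right to left, so reverse it.
    return "".join(reversed(out_x)), "".join(reversed(out_y))


aligned_x, aligned_y = align("abcabba", "cbabac")
print(aligned_x)
print(aligned_y)
```

With this tie-breaking rule the sketch reproduces the first alignment of the slides for X=abcabba and Y=cbabac.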
The algorithm above computes the edit distance in O(mn) time
and O(mn) space, where n and m are the lengths of the two
strings (the text and the pattern in the problem introduced
next).
We now introduce the main topic: the
"string matching with errors" problem.
The definition of the problem: Given a pattern P of length m and a text T of length n, find a substring S of T such that EDIT(S, P) is minimal.
Given: T=abcabba
P=cbabac
Find: S=cabba
EDIT(S, P)=3
  P: c b a b a c
  S: c - a b b a
For comparison, another substring of T has a larger distance to P:
T's substring K=bcabb
EDIT(K, P)=4
  P: - c b a b a c
  K: b c - a b - b
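Dynamic Programming for the String Matching with Errors Problem:

$$
\mathrm{SE}[i, j] = \min
\begin{cases}
\mathrm{SE}[i-1, j] + 1 \\
\mathrm{SE}[i, j-1] + 1 \\
\mathrm{SE}[i-1, j-1] + \delta(x[i], y[j])
\end{cases}
$$

where $\delta(x[i], y[j]) = 0$ if $x[i] = y[j]$, and $\delta(x[i], y[j]) = 1$ otherwise, and

$$
\mathrm{SE}[0, 0] = 0, \qquad \mathrm{SE}[i, 0] = i, \qquad \mathrm{SE}[0, j] = 0.
$$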
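A minimal Python sketch of this recurrence, under the assumption that the first index runs over the pattern P and the second over the text T (so row 0 is the all-zero row). The function name `se_table` is illustrative and does not come from the slides.

```python
def se_table(pattern: str, text: str) -> list[list[int]]:
    """Fill the SE table: rows follow the pattern, columns follow the text."""
    m, n = len(pattern), len(text)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    # Boundary conditions: SE[i, 0] = i, while SE[0, j] = 0 for every j,
    # so row 0 simply keeps its initial zeros.
    for i in range(1, m + 1):
        table[i][0] = i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            delta = 0 if pattern[i - 1] == text[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,
                              table[i][j - 1] + 1,
                              table[i - 1][j - 1] + delta)
    return table


table = se_table("cbabac", "abcabba")
best = min(table[-1])  # smallest EDIT(S, P) over all substrings S of T
print(table[-1], best)
```

The minimum of the last row is the smallest distance between the pattern and any substring of the text; for T=abcabba and P=cbabac it is 3.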
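The only difference between SE[i, j] and EDIT[i, j] is the boundary condition: EDIT[0, j]=j for the edit distance problem, whereas SE[0, j]=0 for the string matching with errors problem.
The dynamic programming approach for the edit distance problem:

$$
\mathrm{EDIT}[i, j] = \min
\begin{cases}
\mathrm{EDIT}[i-1, j] + 1 \\
\mathrm{EDIT}[i, j-1] + 1 \\
\mathrm{EDIT}[i-1, j-1] + \delta(x[i], y[j])
\end{cases}
$$

where $\delta(x[i], y[j]) = 0$ if $x[i] = y[j]$, and $\delta(x[i], y[j]) = 1$ otherwise, and

$$
\mathrm{EDIT}[i, 0] = i, \qquad \mathrm{EDIT}[0, j] = j.
$$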
In the edit distance problem, we have EDIT[0, j]=j.
In the string matching with errors problem, we set SE[0, j]=0, so that a match may start at any position of T at no cost.
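T=abcabba
P=cbabac

|   |   | a | b | c | a | b | b | a |
|---|---|---|---|---|---|---|---|---|
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| c | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| b | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 |
| a | 3 | 2 | 2 | 2 | 1 | 2 | 2 | 1 |
| b | 4 | 3 | 2 | 3 | 2 | 1 | 2 | 2 |
| a | 5 | 4 | 3 | 3 | 3 | 2 | 2 | 2 |
| c | 6 | 5 | 4 | 3 | 4 | 3 | 3 | 3 |

A traceback path that starts at a cell with value 3 in the bottom row and ends in the top row, where SE[0, j]=0, shows that there exists a substring S of T such that EDIT(S, P)=3.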
We find the lowest value in the last row and trace
back from that cell until we reach the top row.
Several cells may attain the minimum, so the output may consist of several substrings.
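The procedure just described can be sketched as follows. This is an illustration under assumptions, not the method of the paper: the name `match_with_errors` is hypothetical, rows follow P and columns follow T, one traceback is performed per minimal cell of the last row, and ties are broken by preferring the diagonal step, then the vertical step, then the horizontal step. Other tie-breaking rules may report other substrings with the same distance.

```python
def match_with_errors(pattern: str, text: str):
    """Return (best, matches): the minimal EDIT(S, P) over substrings S of
    the text, and one (start, end, substring) triple per minimal cell."""
    m, n = len(pattern), len(text)
    se = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        se[i][0] = i  # SE[i, 0] = i; row 0 stays 0 because SE[0, j] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            delta = 0 if pattern[i - 1] == text[j - 1] else 1
            se[i][j] = min(se[i - 1][j] + 1,
                           se[i][j - 1] + 1,
                           se[i - 1][j - 1] + delta)

    best = min(se[m])
    matches = []
    for end in range(n + 1):
        if se[m][end] != best:
            continue
        i, j = m, end
        while i > 0:  # stop as soon as the top row is reached
            if j > 0:
                delta = 0 if pattern[i - 1] == text[j - 1] else 1
                if se[i][j] == se[i - 1][j - 1] + delta:
                    i, j = i - 1, j - 1  # diagonal step
                    continue
            if se[i][j] == se[i - 1][j] + 1:
                i -= 1  # vertical step: a pattern symbol faces a gap
            else:
                j -= 1  # horizontal step: a text symbol faces a gap
        matches.append((j, end, text[j:end]))
    return best, matches


best, matches = match_with_errors("cbabac", "abcabba")
print(best, [s for _, _, s in matches])
```

For T=abcabba and P=cbabac the minimum 3 occurs in four cells of the last row, and the reported substrings include cabba.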
T=abcabba
P=cbabac
Tracing back from SE[6, 7]=3 to the top row gives S=cabba:
T: a b c - a b b a
P:     c b a b a c
T=abcabba
P=cbabac
The ordinary edit distance table for S=cabba (columns) and P=cbabac (rows) confirms the result:

|   |   | c | a | b | b | a |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| c | 1 | 0 | 1 | 2 | 3 | 4 |
| b | 2 | 1 | 1 | 1 | 2 | 3 |
| a | 3 | 2 | 1 | 2 | 2 | 2 |
| b | 4 | 3 | 2 | 1 | 2 | 3 |
| a | 5 | 4 | 3 | 2 | 2 | 2 |
| c | 6 | 5 | 4 | 3 | 3 | 3 |

EDIT(S, P)=3
S: c - a b b a
P: c b a b a c
T=abcabba
P=cbabac
Other tracebacks from minimal cells of the last row give further outputs, each with EDIT(S, P)=3:
S=cabba, with a different alignment:
S: c a b b a -
P: c b a b a c
S=cab:
S: c - a b - -
P: c b a b a c
S=abc:
S: - - a b - c
P: c b a b a c
References
For edit distance computation:
[NW70] Needleman, S. B., and Wunsch, C. D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, Vol. 48, 1970, pp. 443-453.
For string matching with errors:
[S80] Sellers, P. H., The Theory and Computation of Evolutionary Distances: Pattern Recognition, Journal of Algorithms, Vol. 1, No. 4, 1980, pp. 359-373.
Thank you