Dynamic Edit Distance Table under a General Weighted Cost Function
Heikki Hyyrö (University of Tampere, Finland), Kazuyuki Narisawa (Kyushu University, Japan), and Shunsuke Inenaga (Kyushu University, Japan)
![Page 1: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/1.jpg)
![Page 2: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/2.jpg)
Contents
• Edit Distance
• Left Increment/Decrement Edit Distance Problem
• Related Work
• Our Algorithm
• Experiments
• Summary
![Page 4: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/4.jpg)
Edit Distance
The minimum total cost d for transforming string x[1:n] into y[1:m].

Edit operation costs
• Insertion: Ins. = δ(ε, b)
• Deletion: Del. = δ(a, ε)
• Substitution: Sub. = δ(a, b)

Example
x = prague, y = passage, Ins. = Del. = Sub. = 1
p r a g u e ⇒ p a s s a g e
Edit Distance = Sub. + Ins. + Ins. + Del. = 1 + 1 + 1 + 1 = 4
![Page 5: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/5.jpg)
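Dynamic Programming

$$
D[i,0] = \sum_{h=1}^{i} \delta(a_h, \varepsilon) \quad (0 \le i \le m), \qquad
D[0,j] = \sum_{h=1}^{j} \delta(\varepsilon, b_h) \quad (0 \le j \le n)
$$

$$
D[i,j] = \min \{\, D[i,j-1] + \delta(\varepsilon, b_j),\; D[i-1,j] + \delta(a_i, \varepsilon),\; D[i-1,j-1] + \delta(a_i, b_j) \,\} \quad (1 \le i \le m,\ 1 \le j \le n)
$$

Table D (rows: passage, columns: prague, all costs 1)

| | 0 | p 1 | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| p 1 | 1 | 0 | 1 | 2 | 3 | 4 | 5 |
| a 2 | 2 | 1 | 1 | 1 | 2 | 3 | 4 |
| s 3 | 3 | 2 | 2 | 2 | 2 | 3 | 4 |
| s 4 | 4 | 3 | 3 | 3 | 3 | 3 | 4 |
| a 5 | 5 | 4 | 4 | 3 | 4 | 4 | 4 |
| g 6 | 6 | 5 | 5 | 4 | 3 | 4 | 5 |
| e 7 | 7 | 6 | 6 | 5 | 4 | 4 | 4 |

Cell example (columns: a a, rows: b a)

| | a | a |
|---|---|---|
| b | 1 | 2 |
| a | 1 | 1 |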
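The recurrence on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the names `edit_distance_table` and `unit_cost` are hypothetical, the empty string `""` stands in for ε, and rows are indexed by A = passage while columns are indexed by B = prague, as in the table.

```python
def unit_cost(x, y):
    """Hypothetical cost function: every insertion, deletion, substitution costs 1."""
    return 0 if x == y else 1


def edit_distance_table(a, b, delta):
    """Fill the (m+1) x (n+1) table D for strings a (rows) and b (columns).

    delta(x, "") is the deletion cost of x, delta("", y) the insertion cost
    of y, and delta(x, y) the substitution cost (0 when x == y is assumed).
    """
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column: delete a[1..i]
        D[i][0] = D[i - 1][0] + delta(a[i - 1], "")
    for j in range(1, n + 1):          # first row: insert b[1..j]
        D[0][j] = D[0][j - 1] + delta("", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i][j - 1] + delta("", b[j - 1]),             # insertion
                D[i - 1][j] + delta(a[i - 1], ""),             # deletion
                D[i - 1][j - 1] + delta(a[i - 1], b[j - 1]),   # substitution
            )
    return D


D = edit_distance_table("passage", "prague", unit_cost)
print(D[7][6])  # edit distance between passage and prague
```

Passing a different `delta` gives the general weighted case; the table shape and the O(nm) running time stay the same.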
![Page 7: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/7.jpg)
Right Increment/Decrement
• Right I/D of Edit Distance
▫ input: D of strings A and B
▫ output: D' of strings A and B' (B = B'a for decrement, Ba = B' for increment)
▫ easy to compute
▫ insert or delete the rightmost column of D → D': O(m)

[Tables: D for A = passage, B = prague, and D' for B' = pragu. D' equals D without its rightmost column (e). Decrement maps D to D'; increment maps D' back to D.]
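To illustrate why the right-hand case is easy, here is a minimal Python sketch that grows or shrinks the table one column at a time. The function names (`empty_table`, `right_increment`, `right_decrement`) and the unit cost function are assumptions made for this example; each call touches only the m + 1 cells of one column.

```python
def unit_cost(x, y):
    """Hypothetical cost function: all operations cost 1, matches cost 0."""
    return 0 if x == y else 1


def empty_table(a, delta):
    """Table for A against the empty string: one column of deletion costs."""
    D = [[0]]
    for ch in a:
        D.append([D[-1][0] + delta(ch, "")])
    return D


def right_increment(D, a, c, delta):
    """Append the column for B' = Bc in place; only m + 1 cells are written."""
    j = len(D[0])                       # index of the new column
    D[0].append(D[0][j - 1] + delta("", c))
    for i in range(1, len(a) + 1):
        D[i].append(min(
            D[i][j - 1] + delta("", c),              # insertion of c
            D[i - 1][j] + delta(a[i - 1], ""),       # deletion of a_i
            D[i - 1][j - 1] + delta(a[i - 1], c),    # substitution
        ))


def right_decrement(D):
    """Drop the rightmost column in place (B = B'a becomes B')."""
    for row in D:
        row.pop()


a = "passage"
D = empty_table(a, unit_cost)
for c in "prague":
    right_increment(D, a, c, unit_cost)
print(D[7])            # last row of D for passage vs. prague
right_decrement(D)
print(D[7])            # last row of D' for passage vs. pragu
```

No existing cell is ever rewritten, which is exactly what fails for the left-hand case discussed next.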
![Page 8: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/8.jpg)
Left Increment/Decrement
• Left I/D of ED
▫ input: D of strings A and B
▫ output: D' of strings A and B' (B = aB' for decrement, aB = B' for increment)
▫ difficult to compute: values on the left side affect the values on the right side

Table D' (rows: passage, columns: rague). D is the table for passage and prague shown on the Dynamic Programming slide; decrement maps D to D', increment maps D' to D.

| | 0 | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 | 5 |
| p 1 | 1 | 1 | 2 | 3 | 4 | 5 |
| a 2 | 2 | 2 | 1 | 2 | 3 | 4 |
| s 3 | 3 | 3 | 2 | 2 | 3 | 4 |
| s 4 | 4 | 4 | 3 | 3 | 3 | 4 |
| a 5 | 5 | 5 | 4 | 4 | 4 | 4 |
| g 6 | 6 | 6 | 5 | 4 | 5 | 5 |
| e 7 | 7 | 7 | 6 | 5 | 5 | 5 |
![Page 9: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/9.jpg)
Contribution
• We propose an efficient algorithm for the Left I/D problem with any nonnegative integer costs
• Left I/D problem
▫ input: ED table D of strings A and B
▫ output: ED table D' of strings A and B', where B = aB' (decrement) or B' = aB (increment)
▫ costs of operations are nonnegative integers
![Page 10: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/10.jpg)
Applications
• Cyclic string comparison [Landau et al. 1998]
• Computing approximate periods [Schmidt 1998]
• Edit distance for a sliding window
• String kernel based on edit distance
▫ a kernel is a mapping to a high-dimensional feature space, used in Support Vector Machines (classifiers)
![Page 12: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/12.jpg)
Related Work
• Naïve method
▫ compute D' from scratch
▫ O(nm) time
• Kim & Park algorithm [2004]
▫ each operation has cost 1
▫ computes the difference representation DR of table D, using the change table Ch
▫ O(n+m) time
![Page 13: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/13.jpg)
Definition
• Left Increment/Decrement Problem
• input: DR table of strings A and B
• output: DR' table of strings A and B'
▫ B = aB' (decrement)
▫ B' = aB (increment)
• Each cost (Ins., Del., Sub.) is a nonnegative integer
▫ Kim & Park algorithm: each cost is 1
![Page 14: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/14.jpg)
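Difference Representation

$$
DR[i,j].U = D[i,j] - D[i-1,j] \quad \text{(lower minus upper)}
$$

$$
DR[i,j].L = D[i,j] - D[i,j-1] \quad \text{(right minus left)}
$$

D is the table for passage and prague shown on the Dynamic Programming slide. Row 0 and column 0 are left blank in both tables below.

DR.U

| | p 1 | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|---|
| p 1 | -1 | -1 | -1 | -1 | -1 | -1 |
| a 2 | 1 | 0 | -1 | -1 | -1 | -1 |
| s 3 | 1 | 1 | 1 | 0 | 0 | 0 |
| s 4 | 1 | 1 | 1 | 1 | 0 | 0 |
| a 5 | 1 | 1 | 0 | 1 | 1 | 0 |
| g 6 | 1 | 1 | 1 | -1 | 0 | 1 |
| e 7 | 1 | 1 | 1 | 1 | 0 | -1 |

DR.L

| | p 1 | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|---|
| p 1 | -1 | 1 | 1 | 1 | 1 | 1 |
| a 2 | -1 | 0 | 0 | 1 | 1 | 1 |
| s 3 | -1 | 0 | 0 | 0 | 1 | 1 |
| s 4 | -1 | 0 | 0 | 0 | 0 | 1 |
| a 5 | -1 | 0 | -1 | 1 | 0 | 0 |
| g 6 | -1 | 0 | -1 | -1 | 1 | 1 |
| e 7 | -1 | 0 | -1 | -1 | 0 | 0 |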
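The two definitions translate directly into code. The sketch below is illustrative only: `difference_representation` is a hypothetical name, and the table `D` is copied from the slide (passage against prague with unit costs) so that the example stays self-contained.

```python
# D for A = passage (rows) and B = prague (columns), all costs 1 (from the slide).
D = [
    [0, 1, 2, 3, 4, 5, 6],
    [1, 0, 1, 2, 3, 4, 5],
    [2, 1, 1, 1, 2, 3, 4],
    [3, 2, 2, 2, 2, 3, 4],
    [4, 3, 3, 3, 3, 3, 4],
    [5, 4, 4, 3, 4, 4, 4],
    [6, 5, 5, 4, 3, 4, 5],
    [7, 6, 6, 5, 4, 4, 4],
]


def difference_representation(D):
    """Return (U, L) with U[i][j] = D[i][j] - D[i-1][j] and
    L[i][j] = D[i][j] - D[i][j-1]; undefined border entries are None."""
    rows, cols = len(D), len(D[0])
    U = [[None] * cols for _ in range(rows)]
    L = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                U[i][j] = D[i][j] - D[i - 1][j]   # lower minus upper
            if j > 0:
                L[i][j] = D[i][j] - D[i][j - 1]   # right minus left
    return U, L


U, L = difference_representation(D)
print(U[2][1:])  # row "a 2" of DR.U, columns p..e
print(L[5][1:])  # row "a 5" of DR.L, columns p..e
```

Any entry of D can be recovered by summing differences along a path from D[0,0], so DR carries the same information as D.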
![Page 15: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/15.jpg)
DR' – DR
We need not update all cells: most entries of DR' – DR are 0. DR.U and DR.L are the tables of the Difference Representation slide; the tables below cover columns r to e.

DR'.U

| | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|
| p 1 | 0 | 0 | 0 | 0 | 0 |
| a 2 | 1 | -1 | -1 | -1 | -1 |
| s 3 | 1 | 1 | 0 | 0 | 0 |
| s 4 | 1 | 1 | 1 | 0 | 0 |
| a 5 | 1 | 1 | 1 | 1 | 0 |
| g 6 | 1 | 1 | 0 | 1 | 1 |
| e 7 | 1 | 1 | 1 | 0 | 0 |

DR'.U – DR.U

| | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|
| p 1 | 1 | 1 | 1 | 1 | 1 |
| a 2 | 1 | 0 | 0 | 0 | 0 |
| s 3 | 0 | 0 | 0 | 0 | 0 |
| s 4 | 0 | 0 | 0 | 0 | 0 |
| a 5 | 0 | 1 | 0 | 0 | 0 |
| g 6 | 0 | 0 | 1 | 1 | 0 |
| e 7 | 0 | 0 | 0 | 0 | 1 |

DR'.L

| | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|
| p 1 | 0 | 1 | 1 | 1 | 1 |
| a 2 | 0 | -1 | 1 | 1 | 1 |
| s 3 | 0 | -1 | 0 | 1 | 1 |
| s 4 | 0 | -1 | 0 | 0 | 1 |
| a 5 | 0 | -1 | 0 | 0 | 0 |
| g 6 | 0 | -1 | -1 | 1 | 0 |
| e 7 | 0 | -1 | -1 | 0 | 0 |

DR'.L – DR.L

| | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|
| p 1 | -1 | 0 | 0 | 0 | 0 |
| a 2 | 0 | -1 | 0 | 0 | 0 |
| s 3 | 0 | -1 | 0 | 0 | 0 |
| s 4 | 0 | -1 | 0 | 0 | 0 |
| a 5 | 0 | 0 | -1 | 0 | 0 |
| g 6 | 0 | 0 | 0 | 0 | -1 |
| e 7 | 0 | 0 | 0 | 0 | 0 |
![Page 16: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/16.jpg)
Change Table
• Ch[i, j] = D'[i, j] – D[i, j]
• cost = 1
▫ values in Ch: –1, 0, 1
▫ Ch is separated into three areas

Ch = D' – D, where D' (passage vs. rague) and D (passage vs. prague) are the tables of the Left Increment/Decrement slide. The first column of D' is compared with column p of D.

| | p 1 | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|---|
| 0 | -1 | -1 | -1 | -1 | -1 | -1 |
| p 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| a 2 | 1 | 1 | 0 | 0 | 0 | 0 |
| s 3 | 1 | 1 | 0 | 0 | 0 | 0 |
| s 4 | 1 | 1 | 0 | 0 | 0 | 0 |
| a 5 | 1 | 1 | 1 | 0 | 0 | 0 |
| g 6 | 1 | 1 | 1 | 1 | 1 | 0 |
| e 7 | 1 | 1 | 1 | 1 | 1 | 1 |
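The change table can be reproduced by computing D and D' from scratch and subtracting aligned columns. The sketch below is only an illustration of the definition (it costs O(nm), which is what the incremental algorithms avoid); `make_delta`, `dp_table` and `change_table` are hypothetical names. It also runs the weighted setting Ins. = 2, Del. = 2, Sub. = 1 used later in the talk, where Ch is no longer limited to –1, 0, 1.

```python
def make_delta(ins, dele, sub):
    """Build a cost function delta(x, y); "" stands for the empty symbol."""
    def delta(x, y):
        if x == y:
            return 0
        if x == "":
            return ins
        if y == "":
            return dele
        return sub
    return delta


def dp_table(a, b, delta):
    """Plain O(nm) edit distance table, rows for a and columns for b."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + delta(a[i - 1], "")
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + delta("", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i][j - 1] + delta("", b[j - 1]),
                          D[i - 1][j] + delta(a[i - 1], ""),
                          D[i - 1][j - 1] + delta(a[i - 1], b[j - 1]))
    return D


def change_table(a, b, delta):
    """Ch for a left decrement: column j of D' (for b[1:]) is compared
    with column j + 1 of D (for b), i.e. the same suffix end point."""
    D = dp_table(a, b, delta)
    D2 = dp_table(a, b[1:], delta)
    return [[D2[i][j] - D[i][j + 1] for j in range(len(b))]
            for i in range(len(a) + 1)]


unit = change_table("passage", "prague", make_delta(1, 1, 1))
weighted = change_table("passage", "prague", make_delta(2, 2, 1))
print(sorted({v for row in unit for v in row}))
print(sorted({v for row in weighted for v in row}))
```

The first printed set contains three values and the second contains four, which matches the observation that the number of areas in Ch depends on the costs.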
![Page 17: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/17.jpg)
Affected Entries
• entries where DR'[i, j] ≠ DR[i, j]
▫ they must be updated
▫ affected entries lie along the borders of the three areas in Ch

[Tables: DR'.U – DR.U, DR'.L – DR.L, and Ch = D' – D for columns r to e. The nonzero entries of the two difference tables follow the borders between the –1, 0 and 1 areas of Ch.]
![Page 18: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/18.jpg)
Sketch of Kim & Park Algorithm
• Update affected entries
▫ scan the borders in Ch, computing Ch and DR'
• Time complexity: O(n+m)

[Table: Ch for columns r to e, as on the Affected Entries slide.]
![Page 20: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/20.jpg)
General Costs
• Ch can be separated into more than three areas
▫ the number of areas depends on the costs
▫ the values are not limited to –1, 0, 1
• Kim & Park algorithm
▫ is specialized to the three-area case
▫ cannot be applied with general costs

Example: Ins. = 2, Del. = 2, Sub. = 1

Ch

| | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|
| 0 | -2 | -2 | -2 | -2 | -2 |
| p 1 | -1 | -1 | -1 | -1 | -1 |
| a 2 | 2 | -1 | -1 | -1 | -1 |
| s 3 | 2 | 1 | -1 | -1 | -1 |
| s 4 | 2 | 1 | 1 | -1 | -1 |
| a 5 | 2 | 2 | 1 | 1 | -1 |
| g 6 | 2 | 2 | 2 | 1 | 1 |
| e 7 | 2 | 2 | 2 | 2 | 1 |
![Page 21: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/21.jpg)
Our Algorithm
• Update only affected entries
▫ without Ch
▫ compute only DR'.U and DR'.L
• Time complexity: O(min{c(n+m), nm})
▫ c is the maximum cost

Example with Ins. = 2, Del. = 2, Sub. = 1 (Ch = D' – D is the table of the General Costs slide).

DR'.U – DR.U

| | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|
| p 1 | 1 | 1 | 1 | 1 | 1 |
| a 2 | 3 | 0 | 0 | 0 | 0 |
| s 3 | 0 | 2 | 0 | 0 | 0 |
| s 4 | 0 | 0 | 2 | 0 | 0 |
| a 5 | 0 | 1 | 0 | 2 | 0 |
| g 6 | 0 | 0 | 1 | 0 | 2 |
| e 7 | 0 | 0 | 0 | 1 | 0 |

DR'.L – DR.L

| | r 2 | a 3 | g 4 | u 5 | e 6 |
|---|---|---|---|---|---|
| p 1 | -3 | 0 | 0 | 0 | 0 |
| a 2 | 0 | -3 | 0 | 0 | 0 |
| s 3 | 0 | -1 | -2 | 0 | 0 |
| s 4 | 0 | -1 | 0 | -2 | 0 |
| a 5 | 0 | 0 | -1 | 0 | -2 |
| g 6 | 0 | 0 | 0 | -1 | 0 |
| e 7 | 0 | 0 | 0 | 0 | -1 |
![Page 22: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/22.jpg)
Affected Entry
• DR'[i, j] ≠ DR[i, j]
• Kim & Park algorithm
▫ computes DR' and Ch to find the affected entries
• Our algorithm
▫ finds the affected entries from the DR table alone
▫ uses the following lemma

Lemma: DR'[i, j] is an affected entry ⇔ DR'[i–1, j].L ≠ DR[i–1, j].L or DR'[i, j–1].U ≠ DR[i, j–1].U
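One way to see where the lemma comes from is to rewrite the edit distance recurrence in terms of the difference representation. The derivation below is a sketch; x, y and z are shorthand symbols introduced here for the two neighbouring differences and for D[i,j] – D[i–1,j–1].

```latex
\begin{align*}
x &= DR[i-1,j].L = D[i-1,j] - D[i-1,j-1], \\
y &= DR[i,j-1].U = D[i,j-1] - D[i-1,j-1], \\
z &= D[i,j] - D[i-1,j-1]
   = \min\{\, x + \delta(a_i,\varepsilon),\; y + \delta(\varepsilon,b_j),\; \delta(a_i,b_j) \,\}, \\
DR[i,j].L &= D[i,j] - D[i,j-1] = z - y, \\
DR[i,j].U &= D[i,j] - D[i-1,j] = z - x.
\end{align*}
```

DR[i, j] is therefore a function of x, y and the fixed costs only. If neither the L value above nor the U value to the left changes, the entry keeps its old value, so an entry can only be affected when at least one of those two neighbours is.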
![Page 23: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/23.jpg)
Comparison of pseudocodes

Our Algorithm

    for i = 1 to m do
        prevΔ[i] = i + 1; DR[i,1].U = δ(a_i, ε);
    i = 1; j = 1; DR[0, j].L = δ(ε, b_j); currIdx = 1; prevIdx = 1;
    while i ≤ m and j ≤ n do
        while i ≤ m do
            x = DR[i-1, j].L; y = DR[i, j-1].U
            z = min{x + δ(a_i, ε), y + δ(ε, b_j), δ(a_i, b_j)}
            new.L = z - y; new.U = z - x;
            old.L = DR[i, j].L; old.U = DR[i, j].U;
            DR[i, j].L = new.L; DR[i, j].U = new.U;
            if old.U ≠ new.U then
                currΔ[currIdx] = i; currIdx = currIdx + 1;
            i = i + 1;
            if old.L = new.L then
                now = i;
                repeat
                    i = prevΔ[prevIdx]; prevIdx = prevIdx + 1;
                until i ≥ now
        currΔ[currIdx] = m + 1;
        Interchange the roles of the tables currΔ and prevΔ;
        currIdx = 1; i = prevΔ[1]; prevIdx = 2; j = j + 1;

Kim & Park Algorithm

    Let k be the smallest index in A such that A[k] = B[1]
    i₋₁ = 0; j₋₁ = 1; i₁ = k; j₁ = 0; f(0) = 0; g(0) = k;
    finished₋₁ = false; finished₁ = false;
    while (finished₋₁ = false) or (finished₁ = false) do
        if i₋₁ < i₁ – 1 then /* case1 */
            if j₋₁ > j₁ + 1 then
                if j₋₁ > j₁ + 1 then X = -1;
                else X = 0;
                Y = 0;
            else
                if f(i₋₁) < j₋₁ then X = -1;
                else if g(j₁) ≤ i₋₁ then X = 1;
                else X = 0;
                if g(j₁) ≤ i₋₁ + 1 then Y = 1;
                else Y = 0;
            Z = -1;
            Ch[i₋₁+1, j₋₁] = min{ -DR[i₋₁+1, j₋₁+1].UL + X + δ(i₋₁+1, j₋₁+1), -DR[i₋₁+1, j₋₁+1].U + Z + 1, -DR[i₋₁+1, j₋₁+1].L + Y + 1 };
            DR'[i₋₁+1, j₋₁].U = DR[i₋₁+1, j₋₁+1].U – Ch[i₋₁+1, j₋₁] + Z;
            DR'[i₋₁+1, j₋₁].L = DR[i₋₁+1, j₋₁+1].L – Ch[i₋₁+1, j₋₁] + Y;
            if Ch[i₋₁+1, j₋₁] = -1 then i₋₁ = i₋₁ + 1; f(i₋₁) = j₋₁;
            else j₋₁ = j₋₁ + 1;
        else if j₁ < j₋₁ – 1 then /* case2 */
            if i₁ > i₋₁ + 1 then
                if g(j₁) < i₁ then X = 1;
                else X = 0;
                Y = 0;
            else
                if g(j₁) < i₁ then X = 1;
                else if f(i₋₁) ≤ j₁ then X = 0;
                else X = 0;
                if f(i₁ – 1) ≤ j₁ + 1 then Y = -1;
                else Y = 0;
            Z = 1;
            Ch[i₁, j₁+1] = min{ -DR[i₁, j₁+2].UL + X + δ(i₁, j₁+2), -DR[i₁, j₁+2].U + Y + 1, -DR[i₁, j₁+2].L + Z + 1 };
            DR'[i₁, j₁+1].U = DR[i₁, j₋₁+2].U – Ch[i₁, j₁+1] + Y;
            DR'[i₁, j₁+1].L = DR[i₁, j₋₁+2].L – Ch[i₁, j₁+1] + Z;
            if Ch[i₁, j₁+1] = 1 then j₁ = j₁ + 1; g(j₁) = i₁;
            else i₁ = i₁ + 1;
        else /* case3 */
            if f(i₋₁ < j₋₁) then X = -1;
            else if g(j₁) ≤ i₋₁ then X = 1;
            else X = 0;
            Y = -1; Z = 1;
            Ch[i₋₁+1, j₋₁] = min{ -DR[i₋₁+1, j₋₁+1].UL + X + δ(i₋₁+1, j₋₁+1), -DR[i₋₁+1, j₋₁+1].U + Y + 1, -DR[i₋₁+1, j₋₁+1].L + Z + 1 };
            DR'[i₋₁+1, j₋₁].U = DR[i₋₁+1, j₋₁+1].U – Ch[i₋₁+1, j₋₁] + Y;
            DR'[i₋₁+1, j₋₁].L = DR[i₋₁+1, j₋₁+1].L – Ch[i₋₁+1, j₋₁] + Z;
            if Ch[i₋₁+1, j₋₁] = 1 then j₋₁ = j₋₁ + 1; j₁ = j₁ + 1; g(j₁) = i₁;
            else if Ch[i₋₁+1, j₋₁] = 1 then j₋₁ = j₋₁ + 1; j₁ = j₁ + 1; g(j₁) = i₁;
            else j₋₁ = j₋₁ + 1; i₁ = i₁ + 1;
        if (i₋₁ = m) or (j₋₁ = n) then
            finished₋₁ = true;
        if (i₁ = m+1) or (j₁ = n-1) then
            finished₁ = true;
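To make the idea behind the shorter pseudocode concrete, here is a simplified Python sketch of a left decrement on the DR tables. It is not a transcription of the authors' code: the names `dr_tables` and `left_decrement`, the in-place layout (column 1 of the old table becomes the new border column), and the returned counter are assumptions made for this example. A cell is recomputed only when the L value above it or the U value to its left has changed.

```python
def dr_tables(a, b, delta):
    """Reference construction: build DR.U and DR.L from a full O(nm) table."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + delta(a[i - 1], "")
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + delta("", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + delta(a[i - 1], ""),
                          D[i][j - 1] + delta("", b[j - 1]),
                          D[i - 1][j - 1] + delta(a[i - 1], b[j - 1]))
    U = [[D[i][j] - D[i - 1][j] if i else None for j in range(n + 1)]
         for i in range(m + 1)]
    L = [[D[i][j] - D[i][j - 1] if j else None for j in range(n + 1)]
         for i in range(m + 1)]
    return U, L


def left_decrement(U, L, a, b, delta):
    """Update U and L in place so that columns 1..n describe a vs. b[1:].

    Column 1 becomes the border column (L[.][1] is then meaningless).
    Returns the number of cells that were written.
    """
    m, n = len(a), len(b)
    prev = []                              # rows whose U changed in column j-1
    for i in range(1, m + 1):
        new_u = delta(a[i - 1], "")
        if U[i][1] != new_u:
            prev.append(i)
        U[i][1] = new_u
    touched = m
    for j in range(2, n + 1):
        if not prev:                       # nothing changed: later columns are final
            break
        curr, k, i = [], 0, prev[0]
        while i <= m:
            x, y = L[i - 1][j], U[i][j - 1]
            z = min(x + delta(a[i - 1], ""), y + delta("", b[j - 1]),
                    delta(a[i - 1], b[j - 1]))
            new_l, new_u = z - y, z - x
            l_changed = new_l != L[i][j]
            if new_u != U[i][j]:
                curr.append(i)             # affects row i of column j+1
            L[i][j], U[i][j] = new_l, new_u
            touched += 1
            if l_changed:
                i += 1                     # the cell below sees a changed L
            else:
                while k < len(prev) and prev[k] <= i:
                    k += 1
                if k == len(prev):
                    break
                i = prev[k]                # jump to the next row with a changed U
        prev = curr
    return touched


unit = lambda x, y: 0 if x == y else 1
a, b = "passage", "prague"
U, L = dr_tables(a, b, unit)
print(left_decrement(U, L, a, b, unit), "cells written out of", len(a) * len(b))
```

Checking the result against `dr_tables(a, b[1:], delta)` is a convenient way to test such an update, since column j of the updated tables should match column j – 1 of the tables built from scratch.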
![Page 24: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/24.jpg)
Comparison of behaviors
[Figure: four grid panels with columns 0 … m and rows 0 … n, comparing the behavior of our algorithm with that of the Kim & Park algorithm.]
![Page 26: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/26.jpg)
Experiments
• strings A[1:m] and B[1:m]
▫ total time for computing the representations of the edit distance between A and B[j:m] for j = m, m–1, …, 1 (left incremental computation)
• Machine specifications
▫ CentOS Linux
▫ Xeon 3.0 GHz
▫ 16 GB memory
![Page 27: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/27.jpg)
Experiment 1
• Time comparison with naïve algorithm
• Costs
▫ chosen randomly: Insertion = 137, Deletion = 116, Substitution = 242
• Random data
▫ alphabet size 2, 3, …, 52
▫ string length 100, 200, …, 5000
![Page 28: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/28.jpg)
Result 1
![Page 29: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/29.jpg)
Result 1
![Page 30: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/30.jpg)
Experiment 2
• Time comparison with Kim & Park algorithm
• Costs
▫ Insertion = Deletion = Substitution = 1
• Random data
▫ alphabet size 2, 3, …, 52
▫ string length 100, 200, …, 5000
![Page 31: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/31.jpg)
Result 2
![Page 32: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/32.jpg)
Result 2
![Page 33: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/33.jpg)
Experiment 3
• Time comparison with naïve algorithm
• Corpus
▫ English (Reuters news)
  costs: Insertion = 137, Deletion = 116, Substitution = 242
  string length: 1000, 2000, 3000, 4000, 5000
▫ Protein data (Canterbury corpus: E.coli)
  costs proposed in [Kurtz 1996]
  string length: 1000, 2000, 3000, 4000, 5000

| δ | ε | A | C | G | T |
|---|---|---|---|---|---|
| ε | 0 | 3 | 3 | 3 | 3 |
| A | 3 | 0 | 2 | 1 | 3 |
| C | 3 | 2 | 0 | 2 | 1 |
| G | 3 | 1 | 2 | 0 | 2 |
| T | 3 | 2 | 1 | 2 | 0 |
![Page 34: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/34.jpg)
Result 3

English News

| Length | Our algorithm (seconds) | Naïve algorithm (seconds) |
|---|---|---|
| 1000 | 0.04 | 1.50 |
| 2000 | 0.27 | 12.0 |
| 3000 | 0.71 | 40.4 |
| 4000 | 1.36 | 97.1 |
| 5000 | 2.29 | 189 |

Protein Data

| Length | Our algorithm (seconds) | Naïve algorithm (seconds) |
|---|---|---|
| 1000 | 0.01 | 1.43 |
| 2000 | 0.09 | 11.5 |
| 3000 | 0.23 | 38.8 |
| 4000 | 0.43 | 92.8 |
| 5000 | 0.70 | 181 |
![Page 35: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/35.jpg)
Summary
• Algorithm for the Left I/D problem
▫ nonnegative integer costs
▫ O(min{c(n+m), nm}), where c is the maximum cost
▫ experimentally fast

| | Our Algorithm | Naïve Algorithm | Kim & Park Algorithm |
|---|---|---|---|
| Costs | Nonnegative integer | Real number | 1 |
| Tables to compute | DR | D | DR and Ch |
| Time complexity | O(min{c(n+m), nm}) | O(nm) | O(n+m) |
| Source code | Simple | Simple | Cumbersome |
| Speed | Fast | Very slow | Slower |
![Page 36: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/36.jpg)
![Page 37: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/37.jpg)
![Page 38: Dynamic Edit Distance Table under a General Weighted Cost Function](https://reader035.vdocument.in/reader035/viewer/2022062501/56816256550346895dd2a5f5/html5/thumbnails/38.jpg)
Related Work
• Naïve method
▫ compute D' from scratch
▫ O(nm) time
• Kim & Park algorithm [2004]
▫ each operation has cost 1
▫ computes the difference representation DR → DR' using the change table Ch
▫ O(n+m) time

[Diagram: the naïve method works on D and D'; the Kim & Park algorithm works on DR, Ch and DR', Ch; both lead to the edit distance. Edge labels: O(nm), O(nm), O(1), O(n+m), O(n+m).]