Download - 20121020 semi local-string_comparison_tiskin
![Page 1: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/1.jpg)
Semi-local string comparison:Algorithmic techniques and applications
Alexander Tiskin
Department of Computer ScienceUniversity of Warwick
http://go.warwick.ac.uk/alextiskin
Alexander Tiskin (Warwick) Semi-local string comparison 1 / 132
![Page 2: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/2.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 2 / 132
![Page 3: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/3.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 3 / 132
![Page 4: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/4.jpg)
Introduction
String matching: finding an exact pattern in a string
String comparison: finding similar patterns in two strings
Applications: computational biology, image recognition, . . .
Standard types of string comparison:
global: whole string vs whole string
local: substrings vs substrings
Main focus of this work:
semi-local: whole string vs substrings; prefixes vs suffixes
Closely related to approximate string matching (no relation toapproximation algorithms!)
Main tool: implicit unit-Monge matrices (a.k.a. seaweed matrices)
Alexander Tiskin (Warwick) Semi-local string comparison 4 / 132
![Page 5: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/5.jpg)
Introduction
String matching: finding an exact pattern in a string
String comparison: finding similar patterns in two strings
Applications: computational biology, image recognition, . . .
Standard types of string comparison:
global: whole string vs whole string
local: substrings vs substrings
Main focus of this work:
semi-local: whole string vs substrings; prefixes vs suffixes
Closely related to approximate string matching (no relation toapproximation algorithms!)
Main tool: implicit unit-Monge matrices (a.k.a. seaweed matrices)
Alexander Tiskin (Warwick) Semi-local string comparison 4 / 132
![Page 6: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/6.jpg)
IntroductionTerminology and notation
x− = x − 12 x+ = x + 1
2
Integers: {. . .− 2,−1, 0, 1, 2, . . .}Half-integers:
{. . .− 3
2 ,−12 ,
12 ,
32 ,
52 , . . .
}={. . . (−2)+, (−1)+, 0+, 1+, 2+
}(i , j)� (i ′, j ′) iff i < i ′ and j < j ′ (i , j) ≷ (i ′, j ′) iff i > i ′ and j < j ′
A permutation matrix is a 0/1 matrix with exactly one nonzero per rowand per column0 1 0
1 0 00 0 1
Alexander Tiskin (Warwick) Semi-local string comparison 5 / 132
![Page 7: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/7.jpg)
IntroductionTerminology and notation
Given matrix D, its distribution matrix is made up of ≷-dominance sums:
Given matrix E , its density matrix is made up of quadrangle differences:E�(ı, ) = E (ı−, +)− E (ı−, −)− E (ı+, +) + E (ı+, −)
where DΣ, E over integers; D, E� over half-integers0 1 01 0 00 0 1
Σ
=
0 1 2 30 1 1 20 0 0 10 0 0 0
0 1 2 30 1 1 20 0 0 10 0 0 0
�
=
0 1 01 0 00 0 1
(DΣ)� = D for all D
Matrix E is simple, if (E�)Σ = E ; equivalently, if it has all zeros in the leftcolumn and bottom row
Alexander Tiskin (Warwick) Semi-local string comparison 6 / 132
![Page 8: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/8.jpg)
IntroductionTerminology and notation
Given matrix D, its distribution matrix is made up of ≷-dominance sums:
Given matrix E , its density matrix is made up of quadrangle differences:E�(ı, ) = E (ı−, +)− E (ı−, −)− E (ı+, +) + E (ı+, −)
where DΣ, E over integers; D, E� over half-integers
0 1 01 0 00 0 1
Σ
=
0 1 2 30 1 1 20 0 0 10 0 0 0
0 1 2 30 1 1 20 0 0 10 0 0 0
�
=
0 1 01 0 00 0 1
(DΣ)� = D for all D
Matrix E is simple, if (E�)Σ = E ; equivalently, if it has all zeros in the leftcolumn and bottom row
Alexander Tiskin (Warwick) Semi-local string comparison 6 / 132
![Page 9: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/9.jpg)
IntroductionTerminology and notation
Given matrix D, its distribution matrix is made up of ≷-dominance sums:
Given matrix E , its density matrix is made up of quadrangle differences:E�(ı, ) = E (ı−, +)− E (ı−, −)− E (ı+, +) + E (ı+, −)
where DΣ, E over integers; D, E� over half-integers0 1 01 0 00 0 1
Σ
=
0 1 2 30 1 1 20 0 0 10 0 0 0
0 1 2 30 1 1 20 0 0 10 0 0 0
�
=
0 1 01 0 00 0 1
(DΣ)� = D for all D
Matrix E is simple, if (E�)Σ = E ; equivalently, if it has all zeros in the leftcolumn and bottom row
Alexander Tiskin (Warwick) Semi-local string comparison 6 / 132
![Page 10: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/10.jpg)
IntroductionTerminology and notation
Given matrix D, its distribution matrix is made up of ≷-dominance sums:
Given matrix E , its density matrix is made up of quadrangle differences:E�(ı, ) = E (ı−, +)− E (ı−, −)− E (ı+, +) + E (ı+, −)
where DΣ, E over integers; D, E� over half-integers0 1 01 0 00 0 1
Σ
=
0 1 2 30 1 1 20 0 0 10 0 0 0
0 1 2 30 1 1 20 0 0 10 0 0 0
�
=
0 1 01 0 00 0 1
(DΣ)� = D for all D
Matrix E is simple, if (E�)Σ = E ; equivalently, if it has all zeros in the leftcolumn and bottom row
Alexander Tiskin (Warwick) Semi-local string comparison 6 / 132
![Page 11: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/11.jpg)
IntroductionTerminology and notation
Matrix E is Monge, if E� is nonnegative
Intuition: boundary-to-boundary distances in a (weighted) planar graph
Matrix E is unit-Monge, if E� is a permutation matrix
Intuition: boundary-to-boundary distances in a grid-like graph
Simple unit-Monge matrix: PΣ, where P is a permutation matrix
Seaweed matrix: P used as an implicit representation of PΣ0 1 01 0 00 0 1
Σ
=
0 1 2 30 1 1 20 0 0 10 0 0 0
Alexander Tiskin (Warwick) Semi-local string comparison 7 / 132
![Page 12: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/12.jpg)
IntroductionTerminology and notation
Matrix E is Monge, if E� is nonnegative
Intuition: boundary-to-boundary distances in a (weighted) planar graph
Matrix E is unit-Monge, if E� is a permutation matrix
Intuition: boundary-to-boundary distances in a grid-like graph
Simple unit-Monge matrix: PΣ, where P is a permutation matrix
Seaweed matrix: P used as an implicit representation of PΣ0 1 01 0 00 0 1
Σ
=
0 1 2 30 1 1 20 0 0 10 0 0 0
Alexander Tiskin (Warwick) Semi-local string comparison 7 / 132
![Page 13: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/13.jpg)
IntroductionImplicit unit-Monge matrices
Efficient PΣ queries: range tree on nonzeros of P [Bentley: 1980]
binary search tree by i-coordinate
under every node, binary search tree by j-coordinate
••••−→ •
•••−→ •
•••
↓
••••−→ •
•••−→ •
•••
↓
••••−→ •
•••−→ •
•••
Alexander Tiskin (Warwick) Semi-local string comparison 8 / 132
![Page 14: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/14.jpg)
IntroductionImplicit unit-Monge matrices
Efficient PΣ queries: (contd.)
Every node of the range tree represents a canonical range (rectangularregion), and stores its nonzero count
Overall, ≤ n log n canonical ranges are non-empty
A PΣ query is equivalent to ≷-dominance counting: how many nonzerosare ≷-dominated by query point?
Answer: sum up nonzero counts in ≤ log2 n disjoint canonical ranges
Total size O(n log n), query time O(log2 n)
There are asymptotically more efficient (but less practical) data structures
Total size O(n), query time O( log n
log log n
)[JaJa+: 2004]
[Chan, Patrascu: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 9 / 132
![Page 15: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/15.jpg)
IntroductionImplicit unit-Monge matrices
Efficient PΣ queries: (contd.)
Every node of the range tree represents a canonical range (rectangularregion), and stores its nonzero count
Overall, ≤ n log n canonical ranges are non-empty
A PΣ query is equivalent to ≷-dominance counting: how many nonzerosare ≷-dominated by query point?
Answer: sum up nonzero counts in ≤ log2 n disjoint canonical ranges
Total size O(n log n), query time O(log2 n)
There are asymptotically more efficient (but less practical) data structures
Total size O(n), query time O( log n
log log n
)[JaJa+: 2004]
[Chan, Patrascu: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 9 / 132
![Page 16: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/16.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 10 / 132
![Page 17: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/17.jpg)
Matrix distance multiplicationSeaweed braids
Distance algebra (a.k.a (min,+) or tropical algebra):
addition ⊕ given by min
multiplication � given by +
Matrix �-multiplication
A� B = C C (i , k) =⊕
j
(A(i , j)� B(j , k)
)= minj
(A(i , j) + B(j , k)
)Matrix classes closed under �-multiplication (for given n):
general numerical (integer, real) matrices
Monge matrices
simple unit-Monge matrices (!)
Alexander Tiskin (Warwick) Semi-local string comparison 11 / 132
![Page 18: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/18.jpg)
Matrix distance multiplicationSeaweed braids
Recall that simple unit-Monge matrices are represented implicitly bypermutation (seaweed) matrices
Define PA � PB = PC as PΣA � PΣ
B = PΣC
The seaweed monoid Tn:
simple unit-Monge matrices under �equivalently, permutation (seaweed) matrices under �
Also known as the 0-Hecke monoid of the symmetric group H0(Sn)
Alexander Tiskin (Warwick) Semi-local string comparison 12 / 132
![Page 19: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/19.jpg)
Matrix distance multiplicationSeaweed braids
PA � PB = PC can be seen as combing of seaweed braids
•••
••
•PA
•••
••
•PB
••
••
••PC
PA
PB
PC
Alexander Tiskin (Warwick) Semi-local string comparison 13 / 132
![Page 20: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/20.jpg)
Matrix distance multiplicationSeaweed braids
PA � PB = PC can be seen as combing of seaweed braids
•••
••
•PA
•••
••
•PB
••
••
••PC
PA
PB
PC
Alexander Tiskin (Warwick) Semi-local string comparison 13 / 132
![Page 21: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/21.jpg)
Matrix distance multiplicationSeaweed braids
PA � PB = PC can be seen as combing of seaweed braids
•••
••
•PA
•••
••
•PB
••
••
••PC
PA
PB
PC
Alexander Tiskin (Warwick) Semi-local string comparison 13 / 132
![Page 22: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/22.jpg)
Matrix distance multiplicationSeaweed braids
PA � PB = PC can be seen as combing of seaweed braids
•••
••
•PA
•••
••
•PB
••
••
••PC
PA
PB
PC
Alexander Tiskin (Warwick) Semi-local string comparison 13 / 132
![Page 23: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/23.jpg)
Matrix distance multiplicationSeaweed braids
The seaweed monoid Tn:
n! elements (permutations of size n)
n − 1 generators g1, g2, . . . , gn−1 (elementary crossings)
Idempotence:g2i = gi for all i =
Far commutativity:gigj = gjgi j − i > 1 · · · = · · ·
Braid relations:gigjgi = gjgigj j − i = 1 =
Alexander Tiskin (Warwick) Semi-local string comparison 14 / 132
![Page 24: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/24.jpg)
Matrix distance multiplicationSeaweed braids
Identity: 1� x = x
1 =
• · · ·· • · ·· · • ·· · · •
=
Zero: 0� x = 0
0 =
· · · •· · • ·· • · ·• · · ·
=
Alexander Tiskin (Warwick) Semi-local string comparison 15 / 132
![Page 25: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/25.jpg)
Matrix distance multiplicationSeaweed braids
Related structures:
positive braids: far comm; braid relations
braids: gig−1i = 1; far comm; braid relations
Coxeter’s presentation of Sn: g2i = 1; far comm; braid relations
locally free idempotent monoid: idem; far comm [Vershik+: 2000]
Generalisations:
general 0-Hecke monoids [Fomin, Greene: 1998; Buch+: 2008]
Coxeter monoids [Tsaranov: 1990; Richardson, Springer: 1990]
J -trivial monoids [Denton+: 2011]
Alexander Tiskin (Warwick) Semi-local string comparison 16 / 132
![Page 26: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/26.jpg)
Matrix distance multiplicationSeaweed braids
Computation in the seaweed monoid: a confluent rewriting system can beobtained by software (Semigroupe, GAP)
T3: 1, a = g1, b = g2; ab, ba, aba = 0
aa→ a bb → b bab → 0 aba→ 0
T4: 1, a = g1, b = g2, c = g3; ab, ac, ba, bc, cb, aba, abc, acb, bac,bcb, cba, abac , abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
aa→ abb → b
ca→ accc → c
bab → abacbc → bcb
cbac → bcbaabacba→ 0
Easy to use, but not an efficient algorithm
Alexander Tiskin (Warwick) Semi-local string comparison 17 / 132
![Page 27: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/27.jpg)
Matrix distance multiplicationSeaweed braids
Computation in the seaweed monoid: a confluent rewriting system can beobtained by software (Semigroupe, GAP)
T3: 1, a = g1, b = g2; ab, ba, aba = 0
aa→ a bb → b bab → 0 aba→ 0
T4: 1, a = g1, b = g2, c = g3; ab, ac, ba, bc, cb, aba, abc, acb, bac,bcb, cba, abac , abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
aa→ abb → b
ca→ accc → c
bab → abacbc → bcb
cbac → bcbaabacba→ 0
Easy to use, but not an efficient algorithm
Alexander Tiskin (Warwick) Semi-local string comparison 17 / 132
![Page 28: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/28.jpg)
Matrix distance multiplicationSeaweed braids
Computation in the seaweed monoid: a confluent rewriting system can beobtained by software (Semigroupe, GAP)
T3: 1, a = g1, b = g2; ab, ba, aba = 0
aa→ a bb → b bab → 0 aba→ 0
T4: 1, a = g1, b = g2, c = g3; ab, ac, ba, bc, cb, aba, abc, acb, bac,bcb, cba, abac , abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
aa→ abb → b
ca→ accc → c
bab → abacbc → bcb
cbac → bcbaabacba→ 0
Easy to use, but not an efficient algorithm
Alexander Tiskin (Warwick) Semi-local string comparison 17 / 132
![Page 29: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/29.jpg)
Matrix distance multiplicationSeaweed braids
Computation in the seaweed monoid: a confluent rewriting system can beobtained by software (Semigroupe, GAP)
T3: 1, a = g1, b = g2; ab, ba, aba = 0
aa→ a bb → b bab → 0 aba→ 0
T4: 1, a = g1, b = g2, c = g3; ab, ac, ba, bc, cb, aba, abc, acb, bac,bcb, cba, abac , abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
aa→ abb → b
ca→ accc → c
bab → abacbc → bcb
cbac → bcbaabacba→ 0
Easy to use, but not an efficient algorithm
Alexander Tiskin (Warwick) Semi-local string comparison 17 / 132
![Page 30: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/30.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
The implicit unit-Monge matrix �-multiplication problem
Given permutation matrices PA, PB , compute PC , such thatPΣA � PΣ
B = PΣC (equivalently, PA � PB = PC )
Matrix �-multiplication: running time
type time
general O(n3) standard
O(n3(log log n)3
log2 n
)[Chan: 2007]
Monge O(n2) via [Aggarwal+: 1987]
implicit unit-Monge O(n1.5) [T: 2006]O(n log n) [T: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 18 / 132
![Page 31: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/31.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
The implicit unit-Monge matrix �-multiplication problem
Given permutation matrices PA, PB , compute PC , such thatPΣA � PΣ
B = PΣC (equivalently, PA � PB = PC )
Matrix �-multiplication: running time
type time
general O(n3) standard
O(n3(log log n)3
log2 n
)[Chan: 2007]
Monge O(n2) via [Aggarwal+: 1987]
implicit unit-Monge O(n1.5) [T: 2006]O(n log n) [T: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 18 / 132
![Page 32: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/32.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA
•••• •
•••• •
•• ••••••••
PB• • ••• ••• • •• •• • •••• • •
PC
?
Alexander Tiskin (Warwick) Semi-local string comparison 19 / 132
![Page 33: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/33.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA,lo , PA,hi
•••• •
•••• •
•• ••••••••
PB,lo , PB,hi
• • ••• ••• • •• •• • •••• • •
•••• •
•••• •
• • ••••
• •••PC ,lo + PC ,hi
•••• •
•••• •
• • ••••
• •••
Alexander Tiskin (Warwick) Semi-local string comparison 20 / 132
![Page 34: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/34.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA,lo , PA,hi
•••• •
•••• •
•• ••••••••
PB,lo , PB,hi
• • ••• ••• • •• •• • •••• • •
•••• •
•••• •
• • ••••
• •••PC ,lo + PC ,hi
•••• •
•••• •
• • ••••
• •••
Alexander Tiskin (Warwick) Semi-local string comparison 20 / 132
![Page 35: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/35.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA,lo , PA,hi
•••• •
•••• •
•• ••••••••
PB,lo , PB,hi
• • ••• ••• • •• •• • •••• • •
•••• •
•••• •
• • ••••
• •••
PC ,lo + PC ,hi
•••• •
•••• •
• • ••••
• •••
Alexander Tiskin (Warwick) Semi-local string comparison 20 / 132
![Page 36: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/36.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA,lo , PA,hi
•••• •
•••• •
•• ••••••••
PB,lo , PB,hi
• • ••• ••• • •• •• • •••• • •
•••• •
•••• •
• • ••••
• •••
PC ,lo + PC ,hi
•••• •
•••• •
• • ••••
• •••
Alexander Tiskin (Warwick) Semi-local string comparison 20 / 132
![Page 37: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/37.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA,lo , PA,hi
•••• •
•••• •
•• ••••••••
PB,lo , PB,hi
• • ••• ••• • •• •• • •••• • •
PC ,lo + PC ,hi
•••• •
•••• •
• • ••••
• •••
PC
••••
••••
•••
•
•
•••
• •••
Alexander Tiskin (Warwick) Semi-local string comparison 21 / 132
![Page 38: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/38.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA,lo , PA,hi
•••• •
•••• •
•• ••••••••
PB,lo , PB,hi
• • ••• ••• • •• •• • •••• • •
PC ,lo + PC ,hi
•••• •
•••• •
• • ••••
• •••
PC
••••
••••
•••
•
•
•••
• •••
Alexander Tiskin (Warwick) Semi-local string comparison 21 / 132
![Page 39: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/39.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
PA
•••• •
•••• •
•• ••••••••
PB• • ••• ••• • •• •• • •••• • •
PC
••••
••••
•••
•
•
•••
• •••
Alexander Tiskin (Warwick) Semi-local string comparison 22 / 132
![Page 40: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/40.jpg)
Matrix distance multiplicationSeaweed matrix multiplication
Implicit unit-Monge matrix �-multiplication: the algorithm
PΣC (i , k) = minj
(PΣA (i , j) + PΣ
B (j , k))
Divide-and-conquer on the range of j
Divide PA horizontally, PB vertically: two subproblems of effective size n/2
PΣA,lo � PΣ
B,lo = PΣC ,lo PΣ
A,hi � PΣB,hi = PΣ
C ,hi
Conquer:
�-low nonzeros of PC ,lo and �-high nonzeros of PC ,hi appear in PC
The remaining nonzeros of PC ,lo and PC ,hi are “wrong”, and need to becorrected to obtain the remaining nonzeros of PC
Correction can be done in time O(n) using the unit-Monge property
Overall time O(n log n)
Alexander Tiskin (Warwick) Semi-local string comparison 23 / 132
![Page 41: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/41.jpg)
Matrix distance multiplicationBruhat order
Comparing permutations by the “degree of sortedness”
Bruhat order
Permutation A is lower (“more sorted”) than permutation B in the Bruhatorder (A � B), if B can be transformed to A by successive pairwise sortingbetween arbitrary pairs of elements.
Permutation matrices: PA � PB , if PB can be transformed to PA bysuccessive submatrix substitution: ( 0 1
1 0 ) ( 1 00 1 )
Alexander Tiskin (Warwick) Semi-local string comparison 24 / 132
![Page 42: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/42.jpg)
Matrix distance multiplicationBruhat order
Bruhat comparability: running time
O(n2) folkloreO(n log n) [T: NEW]
PA � PB iff PΣA ≤ PΣ
B elementwise, time O(n2) folklore
PA � PB iff PRA � PB = IdR , time O(n log n) [T: NEW]
where PR denotes clockwise rotation of matrix P
Alexander Tiskin (Warwick) Semi-local string comparison 25 / 132
![Page 43: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/43.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 26 / 132
![Page 44: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/44.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
Consider strings (= sequences) over an alphabet of size σ
Distinguish contiguous substrings and not necessarily contiguoussubsequences
Special cases of substring: prefix, suffix
Notation: strings a, b of length m, n respectively
Assume where necessary: m ≤ n; m, n reasonably close
The longest common subsequence (LCS) score:
length of longest string that is a subsequence of both a and b
equivalently, alignment score, where score(match) = 1 andscore(mismatch) = 0
In biological terms, “loss-free alignment” (unlike “lossy” BLAST)
Alexander Tiskin (Warwick) Semi-local string comparison 27 / 132
![Page 45: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/45.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
Consider strings (= sequences) over an alphabet of size σ
Distinguish contiguous substrings and not necessarily contiguoussubsequences
Special cases of substring: prefix, suffix
Notation: strings a, b of length m, n respectively
Assume where necessary: m ≤ n; m, n reasonably close
The longest common subsequence (LCS) score:
length of longest string that is a subsequence of both a and b
equivalently, alignment score, where score(match) = 1 andscore(mismatch) = 0
In biological terms, “loss-free alignment” (unlike “lossy” BLAST)
Alexander Tiskin (Warwick) Semi-local string comparison 27 / 132
![Page 46: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/46.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
The LCS problem
Give the LCS score for a vs b
LCS: running time
O(mn) [Wagner, Fischer: 1974]O(
mnlog2 n
)σ = O(1) [Masek, Paterson: 1980]
[Crochemore+: 2003]
O(mn(log log n)2
log2 n
)[Paterson, Dancık: 1994]
[Bille, Farach-Colton: 2008]
Running time varies depending on the RAM model version
We assume word-RAM with word size log n (where it matters)
Alexander Tiskin (Warwick) Semi-local string comparison 28 / 132
![Page 47: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/47.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
The LCS problem
Give the LCS score for a vs b
LCS: running time
O(mn) [Wagner, Fischer: 1974]O(
mnlog2 n
)σ = O(1) [Masek, Paterson: 1980]
[Crochemore+: 2003]
O(mn(log log n)2
log2 n
)[Paterson, Dancık: 1994]
[Bille, Farach-Colton: 2008]
Running time varies depending on the RAM model version
We assume word-RAM with word size log n (where it matters)
Alexander Tiskin (Warwick) Semi-local string comparison 28 / 132
![Page 48: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/48.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
LCS on the alignment graph (directed, acyclic)
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A blue = 0red = 1
score(“BAABCBCA”, “BAABCABCABACA”) = len(“BAABCBCA”) = 8
LCS = highest-score path from top-left to bottom-right
Alexander Tiskin (Warwick) Semi-local string comparison 29 / 132
![Page 49: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/49.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
LCS: dynamic programming [WF: 1974]
Sweep cells in any �-compatible order
Cell update: time O(1)
Overall time O(mn)
Alexander Tiskin (Warwick) Semi-local string comparison 30 / 132
![Page 50: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/50.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
‘Begin at the beginning,’ the King saidgravely, ‘and go on till you come to the end:then stop.’
L. Carroll, Alice in Wonderland(The standard approach in dynamicprogramming)
Alexander Tiskin (Warwick) Semi-local string comparison 31 / 132
![Page 51: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/51.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
Sometimes dynamic programming can be run from both ends for extraflexibility
Is there a better, fully flexible alternative (e.g. for comparing compressedstrings, comparing strings dynamically or in parallel, etc.)?
Alexander Tiskin (Warwick) Semi-local string comparison 32 / 132
![Page 52: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/52.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
Sometimes dynamic programming can be run from both ends for extraflexibility
Is there a better, fully flexible alternative (e.g. for comparing compressedstrings, comparing strings dynamically or in parallel, etc.)?
Alexander Tiskin (Warwick) Semi-local string comparison 32 / 132
![Page 53: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/53.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
LCS: micro-block dynamic programming [MP: 1980; BF: 2008]
Sweep cells in micro-blocks, in any �-compatible order
Micro-block size:
t = O(log n) when σ = O(1)
t = O( log n
log log n
)otherwise
Micro-block interface:
O(t) characters, each O(log σ) bits, can be reduced to O(log t) bits
O(t) small integers, each O(1) bits
Micro-block update: time O(1), by precomputing all possible interfaces
Overall time O(
mnlog2 n
)when σ = O(1), O
(mn(log log n)2
log2 n
)otherwise
Alexander Tiskin (Warwick) Semi-local string comparison 33 / 132
![Page 54: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/54.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
The semi-local LCS problem
Give the (implicit) matrix of O((m + n)2
)LCS scores:
string-substring LCS: string a vs every substring of b
prefix-suffix LCS: every prefix of a vs every suffix of b
suffix-prefix LCS: every suffix of a vs every prefix of b
substring-string LCS: every substring of a vs string b
Cf.: dynamic programming gives prefix-prefix LCS
Alexander Tiskin (Warwick) Semi-local string comparison 34 / 132
![Page 55: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/55.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
The semi-local LCS problem
Give the (implicit) matrix of O((m + n)2
)LCS scores:
string-substring LCS: string a vs every substring of b
prefix-suffix LCS: every prefix of a vs every suffix of b
suffix-prefix LCS: every suffix of a vs every prefix of b
substring-string LCS: every substring of a vs string b
Cf.: dynamic programming gives prefix-prefix LCS
Alexander Tiskin (Warwick) Semi-local string comparison 34 / 132
![Page 56: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/56.jpg)
Semi-local string comparisonSemi-local LCS and edit distance
Semi-local LCS on the alignment graph
B
A
A
B
C
B
C
A
B A A B C A B C A B A C AC A B C A B A blue = 0red = 1
score(“BAABCBCA”, “CABCABA”) = len(“ABCBA”) = 5
String-substring LCS: all highest-score top-to-bottom paths
Semi-local LCS: all highest-score boundary-to-boundary paths
Alexander Tiskin (Warwick) Semi-local string comparison 35 / 132
![Page 57: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/57.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The score matrix H
0 1 2 3 4 5 6 6 7 8 8 8 8 8
-1 0 1 2 3 4 5 5 6 7 7 7 7 7
-2 -1 0 1 2 3 4 4 5 6 6 6 6 7
-3 -2 -1 0 1 2 3 3 4 5 5 6 6 7
-4 -3 -2 -1 0 1 2 2 3 4 4 5 5 6
-5 -4 -3 -2 -1 0 1 2 3 4 4 5 5 6
-6 -5 -4 -3 -2 -1 0 1 2 3 3 4 4 5
-7 -6 -5 -4 -3 -2 -1 0 1 2 2 3 3 4
-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 3 4
-9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3
-11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2
-12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1
-13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
5
a = “BAABCBCA”
b = “BAABCABCABACA”
H(i , j) = score(a, b〈i : j〉)H(4, 11) = 5
H(i , j) = j − i if i > j
Alexander Tiskin (Warwick) Semi-local string comparison 36 / 132
![Page 58: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/58.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
Semi-local LCS: output representation and running time
size query time
O(n2) O(1) trivial
O(m1/2n) O(log n) string-substring [Alves+: 2003]O(n) O(n) string-substring [Alves+: 2005]O(n log n) O(log2 n) [T: 2006]
. . . or any 2D orthogonal range counting data structure
running time
O(mn2) naiveO(mn) string-substring [Schmidt: 1998; Alves+: 2005]O(mn) [T: 2006]O(
mnlog0.5 n
)[T: 2006]
O(mn(log log n)2
log2 n
)[T: 2007]
Alexander Tiskin (Warwick) Semi-local string comparison 37 / 132
![Page 59: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/59.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The score matrix H and the seaweed matrix P
H(i , j): the number of matched characters for a vs substring b〈i : j〉j − i − H(i , j): the number of unmatched characters
Properties of matrix j − i − H(i , j):
simple unit-Monge
therefore, = PΣ, where P = −H� is a permutation matrix
P is the seaweed matrix, giving an implicit representation of H
Range tree for P: memory O(n log n), query time O(log2 n)
Alexander Tiskin (Warwick) Semi-local string comparison 38 / 132
![Page 60: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/60.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The score matrix H and the seaweed matrix P
•
•
•
•
•
0 1 2 3 4 5 6 6 7 8 8 8 8 8
-1 0 1 2 3 4 5 5 6 7 7 7 7 7
-2 -1 0 1 2 3 4 4 5 6 6 6 6 7
-3 -2 -1 0 1 2 3 3 4 5 5 6 6 7
-4 -3 -2 -1 0 1 2 2 3 4 4 5 5 6
-5 -4 -3 -2 -1 0 1 2 3 4 4 5 5 6
-6 -5 -4 -3 -2 -1 0 1 2 3 3 4 4 5
-7 -6 -5 -4 -3 -2 -1 0 1 2 2 3 3 4
-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 3 4
-9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3
-11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2
-12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1
-13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
5
a = “BAABCBCA”
b = “BAABCABCABACA”
H(i , j) = score(a, b〈i : j〉)H(4, 11) = 5
H(i , j) = j − i if i > j
blue: difference in H is 0
red : difference in H is 1
green: P(i , j) = 1
H(i , j) = j − i − PΣ(i , j)
Alexander Tiskin (Warwick) Semi-local string comparison 39 / 132
![Page 61: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/61.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The score matrix H and the seaweed matrix P
•
•
•
•
•
0 1 2 3 4 5 6 6 7 8 8 8 8 8
-1 0 1 2 3 4 5 5 6 7 7 7 7 7
-2 -1 0 1 2 3 4 4 5 6 6 6 6 7
-3 -2 -1 0 1 2 3 3 4 5 5 6 6 7
-4 -3 -2 -1 0 1 2 2 3 4 4 5 5 6
-5 -4 -3 -2 -1 0 1 2 3 4 4 5 5 6
-6 -5 -4 -3 -2 -1 0 1 2 3 3 4 4 5
-7 -6 -5 -4 -3 -2 -1 0 1 2 2 3 3 4
-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 3 4
-9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3
-11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2
-12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1
-13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
5
a = “BAABCBCA”
b = “BAABCABCABACA”
H(i , j) = score(a, b〈i : j〉)H(4, 11) = 5
H(i , j) = j − i if i > j
blue: difference in H is 0
red : difference in H is 1
green: P(i , j) = 1
H(i , j) = j − i − PΣ(i , j)
Alexander Tiskin (Warwick) Semi-local string comparison 39 / 132
![Page 62: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/62.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The score matrix H and the seaweed matrix P
•
•
•
•
•
0 1 2 3 4 5 6 6 7 8 8 8 8 8
-1 0 1 2 3 4 5 5 6 7 7 7 7 7
-2 -1 0 1 2 3 4 4 5 6 6 6 6 7
-3 -2 -1 0 1 2 3 3 4 5 5 6 6 7
-4 -3 -2 -1 0 1 2 2 3 4 4 5 5 6
-5 -4 -3 -2 -1 0 1 2 3 4 4 5 5 6
-6 -5 -4 -3 -2 -1 0 1 2 3 3 4 4 5
-7 -6 -5 -4 -3 -2 -1 0 1 2 2 3 3 4
-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 3 4
-9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3
-11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2
-12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1
-13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
5
a = “BAABCBCA”
b = “BAABCABCABACA”
H(i , j) = score(a, b〈i : j〉)H(4, 11) = 5
H(i , j) = j − i if i > j
blue: difference in H is 0
red : difference in H is 1
green: P(i , j) = 1
H(i , j) = j − i − PΣ(i , j)
Alexander Tiskin (Warwick) Semi-local string comparison 39 / 132
![Page 63: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/63.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The score matrix H and the seaweed matrix P
•
•
•
•
•
0 1 2 3 4 5 6 6 7 8 8 8 8 8
-1 0 1 2 3 4 5 5 6 7 7 7 7 7
-2 -1 0 1 2 3 4 4 5 6 6 6 6 7
-3 -2 -1 0 1 2 3 3 4 5 5 6 6 7
-4 -3 -2 -1 0 1 2 2 3 4 4 5 5 6
-5 -4 -3 -2 -1 0 1 2 3 4 4 5 5 6
-6 -5 -4 -3 -2 -1 0 1 2 3 3 4 4 5
-7 -6 -5 -4 -3 -2 -1 0 1 2 2 3 3 4
-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 3 4
-9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3
-11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2
-12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1
-13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
5
a = “BAABCBCA”
b = “BAABCABCABACA”
H(4, 11) =11− 4− PΣ(i , j) =11− 4− 2 = 5
Alexander Tiskin (Warwick) Semi-local string comparison 40 / 132
![Page 64: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/64.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The seaweed braid in the alignment graph
B
A
A
B
C
B
C
A
B A A B C A B C A B A C AC A B C A B A a = “BAABCBCA”
b = “BAABCABCABACA”
H(4, 11) =11− 4− PΣ(i , j) =11− 4− 2 = 5
P(i , j) = 1 corresponds to seaweed top i bottom j
Also define top right, left right, left bottom seaweeds
Gives bijection between top-left and bottom-right graph boundaries
Alexander Tiskin (Warwick) Semi-local string comparison 41 / 132
![Page 65: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/65.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
The seaweed braid in the alignment graph
B
A
A
B
C
B
C
A
B A A B C A B C A B A C AC A B C A B A a = “BAABCBCA”
b = “BAABCABCABACA”
H(4, 11) =11− 4− PΣ(i , j) =11− 4− 2 = 5
P(i , j) = 1 corresponds to seaweed top i bottom j
Also define top right, left right, left bottom seaweeds
Gives bijection between top-left and bottom-right graph boundaries
Alexander Tiskin (Warwick) Semi-local string comparison 41 / 132
![Page 66: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/66.jpg)
Semi-local string comparisonScore matrices and seaweed matrices
Seaweed braid: a highly symmetric object (element of the 0-Hecke monoidof the symmetric group)
Can be built recursively by assembling subbraids from separate parts
Highly flexible: local alignment, compression, parallel computation. . .
Alexander Tiskin (Warwick) Semi-local string comparison 42 / 132
![Page 67: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/67.jpg)
Semi-local string comparisonWeighted alignment
The LCS problem is a special case of the weighted alignment scoreproblem with weighted matches (wM), mismatches (wX ) and gaps (wG)
LCS score: wM = 1, wX = wG = 0
Levenshtein score: wM = 2, wX = 1, wG = 0
Alignment score is rational, if wM , wX , wG are rational numbers
Equivalent to LCS score on blown-up strings
Edit distance: minimum cost to transform a into b by weighted characteredits (insertion, deletion, substitution)
Corresponds to weighted alignment score with wM = 0, insertion/deletionweight −wG , substitution weight −wX
Alexander Tiskin (Warwick) Semi-local string comparison 43 / 132
![Page 68: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/68.jpg)
Semi-local string comparisonWeighted alignment
The LCS problem is a special case of the weighted alignment scoreproblem with weighted matches (wM), mismatches (wX ) and gaps (wG)
LCS score: wM = 1, wX = wG = 0
Levenshtein score: wM = 2, wX = 1, wG = 0
Alignment score is rational, if wM , wX , wG are rational numbers
Equivalent to LCS score on blown-up strings
Edit distance: minimum cost to transform a into b by weighted characteredits (insertion, deletion, substitution)
Corresponds to weighted alignment score with wM = 0, insertion/deletionweight −wG , substitution weight −wX
Alexander Tiskin (Warwick) Semi-local string comparison 43 / 132
![Page 69: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/69.jpg)
Semi-local string comparisonWeighted alignment
The LCS problem is a special case of the weighted alignment scoreproblem with weighted matches (wM), mismatches (wX ) and gaps (wG)
LCS score: wM = 1, wX = wG = 0
Levenshtein score: wM = 2, wX = 1, wG = 0
Alignment score is rational, if wM , wX , wG are rational numbers
Equivalent to LCS score on blown-up strings
Edit distance: minimum cost to transform a into b by weighted characteredits (insertion, deletion, substitution)
Corresponds to weighted alignment score with wM = 0, insertion/deletionweight −wG , substitution weight −wX
Alexander Tiskin (Warwick) Semi-local string comparison 43 / 132
![Page 70: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/70.jpg)
Semi-local string comparisonWeighted alignment
Weighted alignment graph
B
A
A
B
C
B
C
A
B A A B C A B C A B A C AC A B C A B A blue = 0red (solid) = 2
red (dotted) = 1
Levenshtein score(“BAABCBCA”, “CABCABA”) = 11
Alexander Tiskin (Warwick) Semi-local string comparison 44 / 132
![Page 71: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/71.jpg)
Semi-local string comparisonWeighted alignment
Alignment graph for blown-up strings
$B
$A
$A
$B
$C
$B
$C
$A
$B $A $A $B $C $A $B $C $A $B $A $C $A$C$A$B$C$A$B$A blue = 0red = 0.5 or 1
Levenshtein score(“BAABCBCA”, “CABCABA”) = 2 · 5.5
Alexander Tiskin (Warwick) Semi-local string comparison 45 / 132
![Page 72: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/72.jpg)
Semi-local string comparisonWeighted alignment
Rational-weighted semi-local alignment reduced to semi-local LCS
$B
$A
$A
$B
$C
$B
$C
$A
$B $A $A $B $C $A $B $C $A $B $A $C $A$C$A$B$C$A$B$A
Let wM = 1, wX = µν , wG = 0
Increase × ν2 in complexity (can be reduced to ν)
Alexander Tiskin (Warwick) Semi-local string comparison 46 / 132
![Page 73: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/73.jpg)
Alexander Tiskin (Warwick) Semi-local string comparison 47 / 132
![Page 74: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/74.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 48 / 132
![Page 75: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/75.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 49 / 132
![Page 76: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/76.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 77: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/77.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 78: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/78.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 79: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/79.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 80: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/80.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 81: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/81.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 82: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/82.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 83: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/83.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 84: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/84.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 85: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/85.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 86: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/86.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 87: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/87.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 88: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/88.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 89: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/89.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 90: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/90.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 91: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/91.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 92: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/92.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 93: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/93.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 94: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/94.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 95: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/95.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 96: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/96.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 97: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/97.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 98: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/98.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 99: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/99.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 100: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/100.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 101: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/101.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 102: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/102.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 103: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/103.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 104: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/104.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 105: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/105.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 106: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/106.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 107: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/107.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 108: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/108.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 109: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/109.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 110: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/110.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 111: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/111.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 112: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/112.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 50 / 132
![Page 113: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/113.jpg)
The seaweed methodSeaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 51 / 132
![Page 114: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/114.jpg)
The seaweed methodSeaweed combing
Semi-local LCS: seaweed combing [T: 2006]
Initialise seaweed braid: crossings in all mismatch cells
Sweep cells in any �-compatible order
Match cell: two seaweeds uncrossed; skip
Mismatch cell: two seaweeds cross
if the same seaweeds crossed before, uncross them
otherwise skip, keep seaweeds crossed
Cell update: time O(1)
Overall time O(mn)
Alexander Tiskin (Warwick) Semi-local string comparison 52 / 132
![Page 115: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/115.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 53 / 132
![Page 116: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/116.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 117: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/117.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 118: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/118.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 119: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/119.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 120: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/120.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 121: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/121.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 122: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/122.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 123: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/123.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 124: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/124.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 125: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/125.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 126: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/126.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 127: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/127.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 128: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/128.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 54 / 132
![Page 129: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/129.jpg)
The seaweed methodMicro-block seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 55 / 132
![Page 130: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/130.jpg)
The seaweed methodMicro-block seaweed combing
Semi-local LCS: micro-block seaweed combing [T: 2007]
Initialise seaweed braid: crossings in all mismatch cells
Sweep cells in micro-blocks, in any �-compatible order
Micro-block size: t = O( log n
log log n
)Micro-block interface:
O(t) characters, each O(log σ) bits, can be reduced to O(log t) bits
O(t) integers, each O(log n) bits, can be reduced to O(log t) bits
Micro-block update: time O(1), by precomputing all possible interfaces
Overall time O(mn(log log n)2
log2 n
)
Alexander Tiskin (Warwick) Semi-local string comparison 56 / 132
![Page 131: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/131.jpg)
The seaweed methodCyclic LCS
The cyclic LCS problem
Give the maximum LCS score for a vs all cyclic rotations of b
Cyclic LCS: running time
O(mn2
log n
)naive
O(mn logm) [Maes: 1990]O(mn) [Bunke, Buhler: 1993; Landau+: 1998; Schmidt: 1998]
O(mn(log log n)2
log2 n
)[T: 2007]
Alexander Tiskin (Warwick) Semi-local string comparison 57 / 132
![Page 132: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/132.jpg)
The seaweed methodCyclic LCS
The cyclic LCS problem
Give the maximum LCS score for a vs all cyclic rotations of b
Cyclic LCS: running time
O(mn2
log n
)naive
O(mn logm) [Maes: 1990]O(mn) [Bunke, Buhler: 1993; Landau+: 1998; Schmidt: 1998]
O(mn(log log n)2
log2 n
)[T: 2007]
Alexander Tiskin (Warwick) Semi-local string comparison 57 / 132
![Page 133: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/133.jpg)
The seaweed methodCyclic LCS
Cyclic LCS: the algorithm
Micro-block seaweed combing on a vs bb, time O(mn(log log n)2
log2 n
)Make n string-substring LCS queries, time negligible
Alexander Tiskin (Warwick) Semi-local string comparison 58 / 132
![Page 134: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/134.jpg)
The seaweed methodLongest repeating subsequence
The longest repeating subsequence problem
Find the longest subsequence of a that is a square (a repetition of twoidentical strings)
Longest repeating subsequence: running time
O(m3) naiveO(m2) [Kosowski: 2004]
O(m2(log log m)2
log2 m
)[T: 2007]
Alexander Tiskin (Warwick) Semi-local string comparison 59 / 132
![Page 135: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/135.jpg)
The seaweed methodLongest repeating subsequence
The longest repeating subsequence problem
Find the longest subsequence of a that is a square (a repetition of twoidentical strings)
Longest repeating subsequence: running time
O(m3) naiveO(m2) [Kosowski: 2004]
O(m2(log log m)2
log2 m
)[T: 2007]
Alexander Tiskin (Warwick) Semi-local string comparison 59 / 132
![Page 136: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/136.jpg)
The seaweed methodLongest repeating subsequence
Longest repeating subsequence: the algorithm
Micro-block seaweed combing on a vs a, time O(m2(log log m)2
log2 m
)Make m − 1 suffix-prefix LCS queries, time negligible
Alexander Tiskin (Warwick) Semi-local string comparison 60 / 132
![Page 137: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/137.jpg)
The seaweed methodApproximate matching
The approximate pattern matching problem
Give the substring closest to a by alignment score, starting at eachposition in b
Assume rational alignment score
Approximate pattern matching: running time
O(mn) [Sellers: 1980]O(
mnlog n
)σ = O(1) via [Masek, Paterson: 1980]
O(mn(log log n)2
log2 n
)via [Bille, Farach-Colton: 2008]
Alexander Tiskin (Warwick) Semi-local string comparison 61 / 132
![Page 138: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/138.jpg)
The seaweed methodApproximate matching
Approximate pattern matching: the algorithm
Micro-block seaweed combing on a vs b (with blow-up), time
O(mn(log log n)2
log2 n
)The implicit semi-local edit score matrix:
an anti-Monge matrix
approximate pattern matching ∼ row minima
Row minima in O(n) element queries [Aggarwal+: 1987]
Each query in time O(log2 n) using the range tree representation,combined query time negligible
Overall running time O(mn(log log n)2
log2 n
), same as [Bille, Farach-Colton: 2008]
Alexander Tiskin (Warwick) Semi-local string comparison 62 / 132
![Page 139: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/139.jpg)
Alexander Tiskin (Warwick) Semi-local string comparison 63 / 132
![Page 140: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/140.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 64 / 132
![Page 141: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/141.jpg)
Periodic string comparisonWraparound seaweed combing
The periodic string-substring LCS problem
Give (implicit) LCS scores for a vs each substring of b = . . . uuu . . . = u±∞
Let u be of length p
May assume that every character of a occurs in u
Only substrings of b of length at most mp (otherwise LCS score is m)
Alexander Tiskin (Warwick) Semi-local string comparison 65 / 132
![Page 142: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/142.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 66 / 132
![Page 143: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/143.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 144: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/144.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 145: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/145.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 146: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/146.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 147: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/147.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 148: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/148.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 149: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/149.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 150: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/150.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 151: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/151.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 152: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/152.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 153: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/153.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 154: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/154.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 155: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/155.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 156: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/156.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 157: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/157.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 158: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/158.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 159: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/159.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 160: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/160.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 161: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/161.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 162: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/162.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 163: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/163.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 164: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/164.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 165: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/165.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 166: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/166.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 167: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/167.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 168: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/168.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 169: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/169.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 170: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/170.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 171: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/171.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 172: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/172.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 173: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/173.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 174: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/174.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 175: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/175.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 176: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/176.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 177: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/177.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 178: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/178.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 179: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/179.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 180: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/180.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 181: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/181.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 182: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/182.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 67 / 132
![Page 183: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/183.jpg)
Periodic string comparisonWraparound seaweed combing
B
A
A
B
C
B
C
A
B A A B C A B C A B A C A
Alexander Tiskin (Warwick) Semi-local string comparison 68 / 132
![Page 184: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/184.jpg)
Periodic string comparisonWraparound seaweed combing
Periodic string-substring LCS: Wraparound seaweed combing
Initialise seaweed braid: crossings in all mismatch cells
Sweep cells row-by-row: each row starts at match cell, wraps at boundary
Match cell: two seaweeds uncrossed; skip
Mismatch cell: two seaweeds cross
if the same seaweeds crossed before (with wrapping), uncross them
otherwise skip, keep seaweeds crossed
Cell update: time O(1)
Overall time O(mn)
String-substring LCS score: count seaweeds with multiplicities
Alexander Tiskin (Warwick) Semi-local string comparison 69 / 132
![Page 185: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/185.jpg)
Periodic string comparisonWraparound seaweed combing
The tandem LCS problem
Give LCS score for a vs b = uk
We have n = kp; may assume k ≤ m
Tandem LCS: running time
O(mkp) naiveO(m(k + p)
)[Landau, Ziv-Ukelson: 2001]
O(mp) [T: 2009]
Direct application of wraparound seaweed combing
Alexander Tiskin (Warwick) Semi-local string comparison 70 / 132
![Page 186: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/186.jpg)
Periodic string comparisonWraparound seaweed combing
The tandem alignment problem
Give the substring closest to a by alignment score among certainsubstrings of b = u±∞:
global: substrings uk of length kp across all k
cyclic: substrings of length kp across all k
local: substrings of any length
Tandem alignment: running time
O(m2p) all naive
O(mp) global [Myers, Miller: 1989]
O(mp log p) cyclic [Benson: 2005]O(mp) cyclic [T: 2009]
O(mp) local [Myers, Miller: 1989]
Alexander Tiskin (Warwick) Semi-local string comparison 71 / 132
![Page 187: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/187.jpg)
Periodic string comparisonWraparound seaweed combing
Cyclic tandem alignment: the algorithm
Periodic seaweed combing for a vs b (with blow-up), time O(mp)
For each k ∈ [1 : m]:
solve tandem LCS (under given alignment score) for a vs uk
obtain scores for a vs p successive substrings of b of length kp by LCSbatch query: time O(1) per substring
Running time O(mp)
Alexander Tiskin (Warwick) Semi-local string comparison 72 / 132
![Page 188: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/188.jpg)
Alexander Tiskin (Warwick) Semi-local string comparison 73 / 132
![Page 189: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/189.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 74 / 132
![Page 190: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/190.jpg)
The transposition network methodTransposition networks
Comparison network: a circuit of comparators
A comparator sorts two inputs and outputs them in prescribed order
Comparison networks traditionally used for non-branching merging/sorting
Classical comparison networks
# comparators
merging O(n log n) [Batcher: 1968]
sorting O(n log2 n) [Batcher: 1968]O(n log n) [Ajtai+: 1983]
Comparison networks are visualised by wire diagrams
Transposition network: all comparisons are between adjacent wires
Alexander Tiskin (Warwick) Semi-local string comparison 75 / 132
![Page 191: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/191.jpg)
The transposition network methodTransposition networks
Comparison network: a circuit of comparators
A comparator sorts two inputs and outputs them in prescribed order
Comparison networks traditionally used for non-branching merging/sorting
Classical comparison networks
# comparators
merging O(n log n) [Batcher: 1968]
sorting O(n log2 n) [Batcher: 1968]O(n log n) [Ajtai+: 1983]
Comparison networks are visualised by wire diagrams
Transposition network: all comparisons are between adjacent wires
Alexander Tiskin (Warwick) Semi-local string comparison 75 / 132
![Page 192: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/192.jpg)
The transposition network methodTransposition networks
Seaweed combing as a transposition network
A
C
B
C
A B C A
+7
+1
+5
+5
+3
+7
+1
−5
−1
−3
−3
+3
−5
−1
−7
−7
Character mismatches correspond to comparators
Inputs anti-sorted (sorted in reverse); each value traces a seaweed
Alexander Tiskin (Warwick) Semi-local string comparison 76 / 132
![Page 193: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/193.jpg)
The transposition network methodTransposition networks
Global LCS: transposition network with binary input
A
C
B
C
A B C A
1
1
1
1
1
1
1
0
0
0
0
1
0
0
0
0
Inputs still anti-sorted, but may not be distinct
Comparison between equal values is indeterminate
Alexander Tiskin (Warwick) Semi-local string comparison 77 / 132
![Page 194: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/194.jpg)
The transposition network methodParameterised string comparison
Parameterised string comparison
String comparison sensitive e.g. to
low similarity: small λ = LCS(a, b)
high similarity: small κ = distLCS(a, b) = m + n − 2λ
Can also use weighted alignment score or edit distance
Assume m = n, therefore κ = 2(n − λ)
Alexander Tiskin (Warwick) Semi-local string comparison 78 / 132
![Page 195: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/195.jpg)
The transposition network methodParameterised string comparison
Low-similarity comparison: small λ
sparse set of matches, may need to look at them all
preprocess matches for fast searching, time O(n log σ)
High-similarity comparison: small κ
set of matches may be dense, but only need to look at small subset
no need to preprocess, linear search is OK
Flexible comparison: sensitive to both high and low similarity, e.g. by bothcomparison types running alongside each other
Alexander Tiskin (Warwick) Semi-local string comparison 79 / 132
![Page 196: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/196.jpg)
The transposition network methodParameterised string comparison
Parameterised string comparison: running time
Low-similarity, after preprocessing in O(n log σ)
O(nλ) [Hirschberg: 1977][Apostolico, Guerra: 1985]
[Apostolico+: 1992]
High-similarity, no preprocessing
O(n · κ) [Ukkonen: 1985][Myers: 1986]
Flexible
O(λ · κ · log n) no preproc [Myers: 1986; Wu+: 1990]O(λ · κ) after preproc [Rick: 1995]
Alexander Tiskin (Warwick) Semi-local string comparison 80 / 132
![Page 197: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/197.jpg)
The transposition network methodParameterised string comparison
Parameterised string comparison: the waterfall algorithm
Low-similarity: O(n · λ)0 0 0 0 0 0 0 0
0 0 1 1 0 0 1 0
1
1
1
1
1
1
1
1
0
1
0
1
1
0
1
1
High-similarity: O(n · κ)0 0 0 0 0 0 0 0
1 1 1 0 1 1 0 0
1
1
1
1
1
1
1
1
0
0
0
0
1
1
1
0
Trace 0s through network in contiguous blocks and gaps
Alexander Tiskin (Warwick) Semi-local string comparison 81 / 132
![Page 198: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/198.jpg)
The transposition network methodDynamic string comparison
The dynamic LCS problem
Maintain current LCS score under updates to one or both input strings
Both input strings are streams, updated on-line:
appending characters at left or right
deleting characters at left or right
Assume for simplicity m ≈ n, i.e. m = Θ(n)
Goal: linear time per update
O(n) per update of a (n = |b|)O(m) per update of b (m = |a|)
Alexander Tiskin (Warwick) Semi-local string comparison 82 / 132
![Page 199: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/199.jpg)
The transposition network methodDynamic string comparison
Dynamic LCS in linear time: update models
left right
– app+del standard DP [Wagner, Fischer: 1974]app app a fixed [Landau+: 1998], [Kim, Park: 2004]app app [Ishida+: 2005]app+del app+del [T: NEW]
Main idea:
for append only, maintain seaweed matrix Pa,b
for append+delete, maintain partial seaweed layout by tracing atransposition network
Alexander Tiskin (Warwick) Semi-local string comparison 83 / 132
![Page 200: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/200.jpg)
The transposition network methodBit-parallel string comparison
Bit-parallel string comparison
String comparison using standard instructions on words of size w
Bit-parallel string comparison: running time
O(mn/w) [Allison, Dix: 1986; Myers: 1999; Crochemore+: 2001]
Alexander Tiskin (Warwick) Semi-local string comparison 84 / 132
![Page 201: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/201.jpg)
The transposition network methodBit-parallel string comparison
Bit-parallel string comparison: binary transposition network
In every cell: input bits s, c ; output bits s ′, c ′; match/mismatch flag µ
µ
s
c
s′
c ′
s 0 1 0 1 0 1 0 1c 0 0 1 1 0 0 1 1µ 0 0 0 0 1 1 1 1
s′ 0 1 1 1 0 0 1 1c ′ 0 0 0 1 0 1 0 1
+
∧µ
s
c
s′
c ′
s 0 1 0 1 0 1 0 1c 0 0 1 1 0 0 1 1µ 0 0 0 0 1 1 1 1
s′ 0 1 1 0 0 0 1 1c ′ 0 0 0 1 0 1 0 1
2c + s ← (s + (s ∧ µ) + c) ∨ (s ∧ ¬µ)
S ← (S + (S ∧M)) ∨ (S ∧ ¬M), where S , M are words of bits s, µ
Alexander Tiskin (Warwick) Semi-local string comparison 85 / 132
![Page 202: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/202.jpg)
The transposition network methodBit-parallel string comparison
Bit-parallel string comparison: binary transposition network
In every cell: input bits s, c ; output bits s ′, c ′; match/mismatch flag µ
µ
s
c
s′
c ′
s 0 1 0 1 0 1 0 1c 0 0 1 1 0 0 1 1µ 0 0 0 0 1 1 1 1
s′ 0 1 1 1 0 0 1 1c ′ 0 0 0 1 0 1 0 1
+
∧µ
s
c
s′
c ′
s 0 1 0 1 0 1 0 1c 0 0 1 1 0 0 1 1µ 0 0 0 0 1 1 1 1
s′ 0 1 1 0 0 0 1 1c ′ 0 0 0 1 0 1 0 1
2c + s ← (s + (s ∧ µ) + c) ∨ (s ∧ ¬µ)
S ← (S + (S ∧M)) ∨ (S ∧ ¬M), where S , M are words of bits s, µ
Alexander Tiskin (Warwick) Semi-local string comparison 85 / 132
![Page 203: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/203.jpg)
Alexander Tiskin (Warwick) Semi-local string comparison 86 / 132
![Page 204: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/204.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 87 / 132
![Page 205: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/205.jpg)
Sparse string comparisonSemi-local LCS between permutations
The LCS problem on permutation strings
Give LCS score for a vs bIn each of a, b all characters distinct: total m = n matchesEquivalent to
longest increasing subsequence (LIS) in a string
maximum clique in a permutation graph
maximum planar matching in an embedded bipartite graph
LCS on permutation strings: running time
O(n log n) implicit in [Erdos, Szekeres: 1935][Robinson: 1938; Knuth: 1970; Dijkstra: 1980]
O(n log log n) unit-RAM [Chang, Wang: 1992][Bespamyatnikh, Segal: 2000]
Alexander Tiskin (Warwick) Semi-local string comparison 88 / 132
![Page 206: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/206.jpg)
Sparse string comparisonSemi-local LCS between permutations
The semi-local LCS problem on permutation strings
Give semi-local LCS scores of a vs bIn each of a, b all characters distinct: total m = n matchesEquivalent to
longest increasing subsequence (LIS) in every substring of a string
Semi-local LCS on permutation strings: running time
O(n2 log n) naiveO(n2) restricted [Albert+: 2003; Chen+: 2005]O(n1.5 log n) randomised, restricted [Albert+: 2007]O(n1.5) [T: 2006]O(n log2 n) [T: NEW]
Alexander Tiskin (Warwick) Semi-local string comparison 89 / 132
![Page 207: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/207.jpg)
Sparse string comparisonSemi-local LCS between permutations
C
F
A
E
D
H
G
B
D E H C B A F G
Alexander Tiskin (Warwick) Semi-local string comparison 90 / 132
![Page 208: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/208.jpg)
Sparse string comparisonSemi-local LCS between permutations
C
F
A
E
D
H
G
B
D E H C B A F G
Alexander Tiskin (Warwick) Semi-local string comparison 90 / 132
![Page 209: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/209.jpg)
Sparse string comparisonSemi-local LCS between permutations
C
F
A
E
D
H
G
B
D E H C B A F G
Alexander Tiskin (Warwick) Semi-local string comparison 91 / 132
![Page 210: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/210.jpg)
Sparse string comparisonSemi-local LCS between permutations
C
F
A
E
D
H
G
B
D E H C B A F G
Alexander Tiskin (Warwick) Semi-local string comparison 91 / 132
![Page 211: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/211.jpg)
Sparse string comparisonSemi-local LCS between permutations
C
F
A
E
D
H
G
B
D E H C B A F G
Alexander Tiskin (Warwick) Semi-local string comparison 92 / 132
![Page 212: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/212.jpg)
Sparse string comparisonSemi-local LCS between permutations
C
F
A
E
D
H
G
B
D E H C B A F G
Alexander Tiskin (Warwick) Semi-local string comparison 92 / 132
![Page 213: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/213.jpg)
Sparse string comparisonSemi-local LCS between permutations
Semi-local LCS on permutation strings: the algorithm
Divide-and-conquer on the alignment graph
Divide graph (say) horizontally; two subproblems of effective size n/2
Conquer: seaweed matrix �-multiplication, time O(n log n)
Overall time O(n log2 n)
Alexander Tiskin (Warwick) Semi-local string comparison 93 / 132
![Page 214: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/214.jpg)
Sparse string comparisonLongest piecewise monotone subsequences
A k-increasing sequence: a concatenation of k increasing sequences
A k-modal sequence: a concatenation of k alternating increasing anddecreasing sequences
The longest k-increasing (k-modal) subsequence problem
Give the longest k-increasing (k-modal) subsequence of string b
Longest k-increasing (k-modal) subsequence: running time
O(nk log n) k-modal [Demange+: 2007]O(nk log n) via [Hunt, Szymanski: 1977]O(n log2 n) [T: NEW]
Main idea: LCS for idk (respectively, (id id)k/2) vs b
Alexander Tiskin (Warwick) Semi-local string comparison 94 / 132
![Page 215: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/215.jpg)
Sparse string comparisonLongest piecewise monotone subsequences
Longest k-increasing subsequence: algorithm A
Sparse LCS for idk vs b: time O(nk log n)
Longest k-increasing subsequence: algorithm B
Compute seaweed matrix for id vs b: time O(n log2 n)
Extract three-way seaweed submatrix
Compute three-way seaweed submatrix for idk vs b by log k instances ofseaweed matrix �-square/multiply: time log k · O(n log n) = O(n log2 n)
Query the LCS score for idk vs b: time negligible
Overall time O(n log2 n)
Algorithm B faster than Algorithm A for k ≥ log n
Alexander Tiskin (Warwick) Semi-local string comparison 95 / 132
![Page 216: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/216.jpg)
Sparse string comparisonLongest piecewise monotone subsequences
Longest k-increasing subsequence: algorithm A
Sparse LCS for idk vs b: time O(nk log n)
Longest k-increasing subsequence: algorithm B
Compute seaweed matrix for id vs b: time O(n log2 n)
Extract three-way seaweed submatrix
Compute three-way seaweed submatrix for idk vs b by log k instances ofseaweed matrix �-square/multiply: time log k · O(n log n) = O(n log2 n)
Query the LCS score for idk vs b: time negligible
Overall time O(n log2 n)
Algorithm B faster than Algorithm A for k ≥ log n
Alexander Tiskin (Warwick) Semi-local string comparison 95 / 132
![Page 217: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/217.jpg)
Sparse string comparisonMaximum clique in a circle graph
The maximum clique problem in a circle graph
Given a circle with n chords, find the maximum-size subset of pairwiseintersecting chords
S
Standard reduction to an interval model: cut the circle and lay it out onthe line; chords become intervals (here drawn as square diagonals)
Chords intersect iff intervals overlap, i.e. intersect without containment
Alexander Tiskin (Warwick) Semi-local string comparison 96 / 132
![Page 218: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/218.jpg)
Sparse string comparisonMaximum clique in a circle graph
The maximum clique problem in a circle graph
Given a circle with n chords, find the maximum-size subset of pairwiseintersecting chords
S
Standard reduction to an interval model: cut the circle and lay it out onthe line; chords become intervals (here drawn as square diagonals)
Chords intersect iff intervals overlap, i.e. intersect without containment
Alexander Tiskin (Warwick) Semi-local string comparison 96 / 132
![Page 219: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/219.jpg)
Sparse string comparisonMaximum clique in a circle graph
Maximum clique in a circle graph: running time
exp(n) naiveO(n3) [Gavril: 1973]O(n2) [Rotem, Urrutia: 1981; Hsu: 1985]
[Masuda+: 1990; Apostolico+: 1992]O(n1.5) [T: 2006]O(n log2 n) [T: NEW]
Alexander Tiskin (Warwick) Semi-local string comparison 97 / 132
![Page 220: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/220.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 221: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/221.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 222: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/222.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 223: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/223.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 224: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/224.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 225: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/225.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 226: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/226.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 227: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/227.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 228: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/228.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 229: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/229.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 230: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/230.jpg)
Sparse string comparisonMaximum clique in a circle graph
Alexander Tiskin (Warwick) Semi-local string comparison 98 / 132
![Page 231: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/231.jpg)
Sparse string comparisonMaximum clique in a circle graph
Maximum clique in a circle graph: the algorithm
Helly property: if any set of intervals intersect pairwise, then they allintersect at a common point
Compute seaweed matrix, build range tree: time O(n log2 n)
Run through all 2n + 1 possible common intersection points
For each point, find a maximum subset of covering overlapping segmentsby a prefix-suffix LCS query: time (2n + 1) · O(log2 n) = O(n log2 n)
Overall time O(n log2 n) + O(n log2 n) = O(n log2 n)
Alexander Tiskin (Warwick) Semi-local string comparison 99 / 132
![Page 232: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/232.jpg)
Sparse string comparisonMaximum clique in a circle graph
Parameterised maximum clique in a circle graph
The maximum clique problem in a circle graph, sensitive e.g. to
the number e of edges
the size l of maximum clique
the thickness d of interval model (the maximum number of intervalscovering a point)
We have l ≤ d ≤ n, e ≤ n2
Parameterised maximum clique in a circle graph: running time
O(n log n + e) [Apostolico+: 1992]
O(n log n + nl log(n/l)) [Apostolico+: 1992]
O(n log n + n log2 d) NEW
Alexander Tiskin (Warwick) Semi-local string comparison 100 / 132
![Page 233: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/233.jpg)
Sparse string comparisonMaximum clique in a circle graph
Parameterised maximum clique in a circle graph: the algorithm
For each diagonal block of size d , compute seaweed matrix, build rangetree: time n/d · O(d log2 d) = O(n log2 d)
Extend each diagonal block to a quadrant: time O(n log2 d)
Run through all 2n + 1 possible common intersection points
For each point, find a maximum subset of covering overlapping segmentsby a prefix-suffix LCS query: time O(n log2 d)
Overall time O(n log2 d)
Alexander Tiskin (Warwick) Semi-local string comparison 101 / 132
![Page 234: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/234.jpg)
Alexander Tiskin (Warwick) Semi-local string comparison 102 / 132
![Page 235: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/235.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 103 / 132
![Page 236: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/236.jpg)
Compressed string comparisonGrammar compression
Notation: pattern p of length m; text t of length n
A GC-string (grammar-compressed string) t is a straight-line program(context-free grammar) generating t = tn by n assignments of the form
tk = α, where α is an alphabet character
tk = ti tj , where i , j < k
In general, n = O(2n)
Example: Fibonacci string “ABAABABAABAAB”
t1 = ‘B’ t2 = ‘A’
t3 = t2t1 t4 = t3t2 t5 = t4t3 t6 = t5t4 t7 = t6t5
Alexander Tiskin (Warwick) Semi-local string comparison 104 / 132
![Page 237: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/237.jpg)
Compressed string comparisonGrammar compression
Grammar-compression covers various compression types, e.g. LZ78, LZW(not LZ77 directly)
Simplifying assumption: arithmetic up to n runs in O(1)
This assumption can be removed by careful index remapping
Alexander Tiskin (Warwick) Semi-local string comparison 105 / 132
![Page 238: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/238.jpg)
Compressed string comparisonExtended substring-string LCS on GC-strings
LCS: running time (r = m + n, r = m + n)
p t
plain plain O(mn) [Wagner, Fischer: 1974]O(
mnlog2 m
)[Masek, Paterson: 1980]
[Crochemore+: 2003]
plain GC O(m3n + . . .) gen. CFG [Myers: 1995]O(m1.5n) ext subs-s [T: 2008]O(m logm · n) ext subs-s [T: 2010]
GC GC NP-hard [Lifshits: 2005]O(r1.2r1.4) R weights [Hermelin+: 2009]O(r log r · r) [T: 2010]O(r log(r/r) · r
)[Hermelin+: 2010]
O(r log1/2(r/r) · r
)[Gawrychowski: NEW]
Alexander Tiskin (Warwick) Semi-local string comparison 106 / 132
![Page 239: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/239.jpg)
Compressed string comparisonExtended substring-string LCS on GC-strings
Extended substring-string LCS (plain pattern, GC text): the algorithm
For every k, compute by recursion the appropriate part of seaweed matrixPp,tk , using matrix �-multiplication: time O(m logm · n)
Overall time O(m logm · n)
Alexander Tiskin (Warwick) Semi-local string comparison 107 / 132
![Page 240: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/240.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
The global subsequence recognition problem
Does pattern p appear in text t as a subsequence?
Global subsequence recognition: running time
p t
plain plain O(n) greedy
plain GC O(mn) greedy
GC GC NP-hard [Lifshits: 2005]
Alexander Tiskin (Warwick) Semi-local string comparison 108 / 132
![Page 241: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/241.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
The local subsequence recognition problem
Find all minimally matching substrings of t with respect to p
Substring of t is matching, if p is a subsequence of t
Matching substring of t is minimally matching, if none of its propersubstrings are matching
Alexander Tiskin (Warwick) Semi-local string comparison 109 / 132
![Page 242: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/242.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
Local subsequence recognition: running time ( + output)
p t
plain plain O(mn) [Mannila+: 1995]O(
mnlog m
)[Das+: 1997]
O(cm + n) [Boasson+: 2001]O(m + nσ) [Tronicek: 2001]
plain GC O(m2 logmn) [Cegielski+: 2006]O(m1.5n) [T: 2008]O(m logm · n) [T: NEW]
GC GC NP-hard [Lifshits: 2005]
Alexander Tiskin (Warwick) Semi-local string comparison 110 / 132
![Page 243: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/243.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
ı0+
Oı1+
Oı2+
O
M0+
M1+
M2+
0H
n′
HnH
b〈i : j〉 matching iff box [i : j ] not pierced left-to-right
≷-maximal seaweeds: �-chain(ı0+ , 0+
)�(ı1+ , 1+
)� · · · �
(ıs− , s−
)b〈i : j〉 minimally matching iff (i , j) is in the interleaved �-chain(ı−1+ ,
+1−
)�(ı−2+ ,
+2−
)� · · · �
(ı−(s−1)+ ,
+(s−1)−
)Alexander Tiskin (Warwick) Semi-local string comparison 111 / 132
![Page 244: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/244.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
•
•
•
•
•
×
×
×
×
0+ 1+ 2+n′ n m+n
−m
0
n′
ı0+
ı1+
ı2+
Alexander Tiskin (Warwick) Semi-local string comparison 112 / 132
![Page 245: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/245.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
Local subsequence recognition (plain pattern, GC text): the algorithm
For every k, compute by recursion the appropriate part of seaweed matrixPp,tk , using matrix �-multiplication: time O(m logm · n)
Given an assignment t = t ′t ′′, count by recursion
minimally matching substrings in t ′
minimally matching substrings in t ′′
Then, find �-chain of ≷-maximal seaweeds in time n · O(m) = O(mn)
The interleaved �-chain defines minimally matching substrings in toverlapping both t ′ and t ′′
Overall time O(m logm · n) + O(mn) = O(m logm · n)
Alexander Tiskin (Warwick) Semi-local string comparison 113 / 132
![Page 246: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/246.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
Local subsequence recognition (plain pattern, GC text): the algorithm
For every k, compute by recursion the appropriate part of seaweed matrixPp,tk , using matrix �-multiplication: time O(m logm · n)
Given an assignment t = t ′t ′′, count by recursion
minimally matching substrings in t ′
minimally matching substrings in t ′′
Then, find �-chain of ≷-maximal seaweeds in time n · O(m) = O(mn)
The interleaved �-chain defines minimally matching substrings in toverlapping both t ′ and t ′′
Overall time O(m logm · n) + O(mn) = O(m logm · n)
Alexander Tiskin (Warwick) Semi-local string comparison 113 / 132
![Page 247: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/247.jpg)
Compressed string comparisonSubsequence recognition on GC-strings
Local subsequence recognition (plain pattern, GC text): the algorithm
For every k, compute by recursion the appropriate part of seaweed matrixPp,tk , using matrix �-multiplication: time O(m logm · n)
Given an assignment t = t ′t ′′, count by recursion
minimally matching substrings in t ′
minimally matching substrings in t ′′
Then, find �-chain of ≷-maximal seaweeds in time n · O(m) = O(mn)
The interleaved �-chain defines minimally matching substrings in toverlapping both t ′ and t ′′
Overall time O(m logm · n) + O(mn) = O(m logm · n)
Alexander Tiskin (Warwick) Semi-local string comparison 113 / 132
![Page 248: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/248.jpg)
Compressed string comparisonThreshold approximate matching
The threshold approximate matching problem
Find all matching substrings of t with respect to p, according to athreshold k
Substring of t is matching, if the edit distance for p vs t is at most k
Alexander Tiskin (Warwick) Semi-local string comparison 114 / 132
![Page 249: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/249.jpg)
Compressed string comparisonThreshold approximate matching
Threshold approximate matching: running time ( + output)
p t
plain plain O(mn) [Sellers: 1980]O(mk) [Landau, Vishkin: 1989]
O(m + n + nk4
m ) [Cole, Hariharan: 2002]
plain GC O(mnk2) [Karkkainen+: 2003]O(mnk + n log n) [LV: 1989] via [Bille+: 2010]O(mn + nk4 + n log n) [CH: 2002] via [Bille+: 2010]O(m logm · n) [T: NEW]
GC GC NP-hard [Lifshits: 2005]
(Also many specialised variants for LZ compression)
Alexander Tiskin (Warwick) Semi-local string comparison 115 / 132
![Page 250: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/250.jpg)
Compressed string comparisonThreshold approximate matching
Threshold approx matching (plain pattern, GC text): the algorithm
Algorithm structure similar to local subsequence recognition by matrix�-multiplication and seaweed �-chains
Extra ingredients:
the blow-up technique: reduction of edit distances to LCS scores
implicit matrix searching, replaces �-chain interleaving
Monge row minima: “SMAWK” O(m) [Aggarwal+: 1987]
Implicit unit-Monge row minima O(m log logm) [T: NEW]O(m) [Gawrychowski: NEW]
replaces �-chain interleaving
Overall time O(m logm · n) + O(mn) = O(m logm · n)
Alexander Tiskin (Warwick) Semi-local string comparison 116 / 132
![Page 251: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/251.jpg)
Compressed string comparisonThreshold approximate matching
ı0+
Oı1+
Oı2+
Oı3+
Oı4+
O
M0+
M1+
M2+
M3+
M4+
0H
n′
HnH
Blow up: weighted alignment on strings p, t of size m, n equivalent toLCS on strings p, t of size m = νm, n = νn
Alexander Tiskin (Warwick) Semi-local string comparison 117 / 132
![Page 252: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/252.jpg)
Compressed string comparisonThreshold approximate matching
0+ 1+ 2+ 3+ 4+n′ n m+n
ı0+
ı1+
ı2+
ı3+
ı4+
−m
0
n′
•
•
•
•
•
×
×
×
×
×
×× × × ×× ×
× × ×× ×
× × ×× ×
× × ×× ×
× × ×× ×
× × ×× ×
Alexander Tiskin (Warwick) Semi-local string comparison 118 / 132
![Page 253: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/253.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 119 / 132
![Page 254: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/254.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
Given window length w
The window-local LCS problem
Give semi-local LCS score for every w -substring of a vs whole b
Window-local LCS: running time
O(mn3w) naiveO(mnw) multiple seaweedO(mn) [Krusche, T: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 120 / 132
![Page 255: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/255.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
Given window length w
The window-local LCS problem
Give semi-local LCS score for every w -substring of a vs whole b
Window-local LCS: running time
O(mn3w) naiveO(mnw) multiple seaweedO(mn) [Krusche, T: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 120 / 132
![Page 256: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/256.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
Quasi-local LCS: the algorithm
Compute seaweed matrices for canonical substrings of a against b
Build seaweed matrices for windows of a against b recursively
Both stages use matrix �-multiplication, overall time O(mn)
Alexander Tiskin (Warwick) Semi-local string comparison 121 / 132
![Page 257: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/257.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
The window-window LCS problem
Give semi-local LCS score for every w -substring of a vs every w -substring b
Provides an alignment plot: loss-free local comparison of genomesequences, important for studying conservation in the genome
Window-window LCS: running time
O(mnw2) naiveO(mnw) multiple seaweedO(mn) [Krusche, T: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 122 / 132
![Page 258: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/258.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
The window-window LCS problem
Give semi-local LCS score for every w -substring of a vs every w -substring b
Provides an alignment plot: loss-free local comparison of genomesequences, important for studying conservation in the genome
Window-window LCS: running time
O(mnw2) naiveO(mnw) multiple seaweedO(mn) [Krusche, T: 2010]
Alexander Tiskin (Warwick) Semi-local string comparison 122 / 132
![Page 259: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/259.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
An implementation of alignment plots [Krusche: 2010]
Based on the O(mnw) multiple seaweed combing, using some ideas fromthe optimal O(mn) algorithm
Resulting theoretical complexity O(mnw0.5)
Alignment weights (match 1, mismatch 0, gap −0.5): ×4 slowdown
C++, Intel assembly (x86, x86 64, MMX/SSE2 data parallelism)
SMP parallelism (currently two processors)
Alexander Tiskin (Warwick) Semi-local string comparison 123 / 132
![Page 260: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/260.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
Krusche’s implementation used at Warwick Systems Biology Centre tostudy weak conservation in plant genomes (BLAST too inaccurate)
Speedup on one processor:
×10 over heavily optimised, bit-parallel naive algorithm
×7 over biologists’ heuristics
Speedup on two processors: extra ×2, near-perfect parallelism
Alexander Tiskin (Warwick) Semi-local string comparison 124 / 132
![Page 261: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/261.jpg)
Beyond semi-localityWindow-local LCS and alignment plots
Publications:
[Picot+: 2010] Evolutionary Analysis of Regulatory Sequences (EARS) inPlants. Plant Journal.
[Baxter+: accepted] Conserved Noncoding Sequences Highlight SharedComponents of Regulatory Networks in Dicotyledonous Plants. Plant Cell.
Web service: http://wsbc.warwick.ac.uk/ears
Future work: genomic repeats; genome-scale comparison
Alexander Tiskin (Warwick) Semi-local string comparison 125 / 132
![Page 262: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/262.jpg)
Beyond semi-localityQuasi-local LCS and sparse spliced alignment
Given m (possibly overlapping) prescribed substrings in a
The quasi-local LCS problem
Give semi-local LCS score for every prescribed substring of a vs whole b
Quasi-local LCS: running time
O(m2n) multiple seaweedO(mn log2 m) [T: unpublished]
Alexander Tiskin (Warwick) Semi-local string comparison 126 / 132
![Page 263: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/263.jpg)
Beyond semi-localityQuasi-local LCS and sparse spliced alignment
Given m (possibly overlapping) prescribed substrings in a
The quasi-local LCS problem
Give semi-local LCS score for every prescribed substring of a vs whole b
Quasi-local LCS: running time
O(m2n) multiple seaweedO(mn log2 m) [T: unpublished]
Alexander Tiskin (Warwick) Semi-local string comparison 126 / 132
![Page 264: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/264.jpg)
Beyond semi-localityQuasi-local LCS and sparse spliced alignment
Window-local LCS: the algorithm
Compute seaweed matrices for canonical substrings of a against b
Build seaweed matrices for prescribed substrings of a against b recursively
Both stages use matrix �-multiplication, time O(mn log2 m)
Alexander Tiskin (Warwick) Semi-local string comparison 127 / 132
![Page 265: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/265.jpg)
Beyond semi-localityQuasi-local LCS and sparse spliced alignment
The sparse spliced alignment problem
Give the chain of non-overlapping prescribed substrings in a, closest to bby alignment score
Describes gene assembly: unknown gene from candidate exons, given aknown similar gene
Assume m = n; let s = total size of prescribed substrings
Sparse spliced alignment: running time
O(ns) = O(n3) [Gelfand+: 1996]O(n2.5) [Kent+: 2006]O(n2 log2 n) [T: unpublished]O(n2 log n) [Sakai: 2009]
Alexander Tiskin (Warwick) Semi-local string comparison 128 / 132
![Page 266: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/266.jpg)
Beyond semi-localityQuasi-local LCS and sparse spliced alignment
The sparse spliced alignment problem
Give the chain of non-overlapping prescribed substrings in a, closest to bby alignment score
Describes gene assembly: unknown gene from candidate exons, given aknown similar gene
Assume m = n; let s = total size of prescribed substrings
Sparse spliced alignment: running time
O(ns) = O(n3) [Gelfand+: 1996]O(n2.5) [Kent+: 2006]O(n2 log2 n) [T: unpublished]O(n2 log n) [Sakai: 2009]
Alexander Tiskin (Warwick) Semi-local string comparison 128 / 132
![Page 267: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/267.jpg)
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition networkmethod
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
Alexander Tiskin (Warwick) Semi-local string comparison 129 / 132
![Page 268: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/268.jpg)
Conclusions and future work
Implicit unit-Monge matrices:
the seaweed monoid
distance multiplication in time O(n log n)
next: lower bound?
Semi-local LCS problem:
representation by implicit unit-Monge matrices
generalisation to rational alignment scores
next: real alignment scores?
Seaweed combing and micro-block speedup:
a simple algorithm for semi-local LCS
semi-local LCS in time o(mn)
improvements on related problems
Alexander Tiskin (Warwick) Semi-local string comparison 130 / 132
![Page 269: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/269.jpg)
Conclusions and future work
Transposition networks:
simple interpretation of high-similarity and dissimilarity LCS
dynamic LCS
simple interpretation of bit-parallel LCS
Sparse string comparison:
fast LCS on permutation strings
fast max-clique in a circle graph
next: lower bound? or even faster max-clique
Sparse string comparison:
semi-local LCS on permutations in time O(n log2 n)
maximum clique in a circle graph in time O(n log2 n)
improvements on related problems
Alexander Tiskin (Warwick) Semi-local string comparison 131 / 132
![Page 270: 20121020 semi local-string_comparison_tiskin](https://reader031.vdocument.in/reader031/viewer/2022013121/54631ffab1af9f936c8b530a/html5/thumbnails/270.jpg)
Conclusions and future work
Compressed string comparison:
three-way semi-local LCS on GC text against plain pattern
subsequence recognition in GC-strings
next: approximate matching in GC-strings
Beyond semi-locality:
quasi-local string comparison
sparse spliced alignment
next: full locality
Alexander Tiskin (Warwick) Semi-local string comparison 132 / 132