23 Jan, 2008, SOFSEM 2008
A New Model to Solve the Swap Matching Problem and Efficient Algorithms for Short Patterns
Costas Iliopoulos, M. Sohel Rahman
Classic Pattern Matching
Input: a string T of length n (the text) and a string P of length m (the pattern), both from an alphabet Σ.
Output:
Existence query: whether P occurs in T.
Computation of the occurrence set: Occ = {i | P = T[i..i + m − 1]}.
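The occurrence set defined above can be computed directly from its definition. The following is a minimal sketch of that naive check, not an efficient matcher; the function name `occurrences` and the sample text are assumptions made for illustration only.

```python
def occurrences(text: str, pattern: str) -> list[int]:
    """Return Occ = {i | P = T[i..i+m-1]} with 1-based positions."""
    n, m = len(text), len(pattern)
    # Compare the pattern against every window of length m.
    return [i + 1 for i in range(n - m + 1) if text[i:i + m] == pattern]


# Hypothetical text, chosen only for illustration.
print(occurrences("TTGACCGTAGTGACA", "GAC"))
```

The existence query is then just a test of whether the returned list is non-empty.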
Example
P = GAC
We have GAC at positions 3 and 12: Occ = {3, 12}.
Occ = {5, 14}.
Swap Matching
P = ACGCT (positions 1..5)
[Figure: a text T of length 13 over {A, C, G, T}, with P aligned below it at each position where it swap-matches.]
Occ = {1, 5, 6}
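As a point of reference, swap matching can be checked naively. The sketch below assumes the usual definition: P swap-matches T at position i if some version of P, obtained by swapping disjoint pairs of adjacent characters, equals T[i..i + m − 1]. The function names and the sample text are hypothetical; the text from the slide figure is not reproduced here.

```python
def swap_matches_at(text: str, pattern: str, start: int) -> bool:
    """Check whether pattern swap-matches text at 0-based offset start."""
    m = len(pattern)
    k = 0
    while k < m:
        if text[start + k] == pattern[k]:
            k += 1  # character matches in place
        elif (k + 1 < m
              and text[start + k] == pattern[k + 1]
              and text[start + k + 1] == pattern[k]):
            k += 2  # positions k and k+1 of the pattern are swapped
        else:
            return False
    return True


def swap_occurrences(text: str, pattern: str) -> list[int]:
    """Return all 1-based positions where pattern swap-matches text."""
    n, m = len(text), len(pattern)
    return [i + 1 for i in range(n - m + 1)
            if swap_matches_at(text, pattern, i)]


# Hypothetical text, chosen only for illustration.
print(swap_occurrences("ACGTCAGCTCCTT", "ACGCT"))
```

The left-to-right scan is safe because a swap is only ever needed where the in-place comparison fails. This takes O(nm) time in the worst case and serves only as a baseline.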
Motivation
Swap errors are common in typing.
The phenomenon of swaps also occurs in gene mutations and duplications.
Existing results
2000: Amir, Aumann, Landau, Lewenstein, Lewenstein: O(n m^(1/3) log m log σ)
1998: Amir, Landau, Lewenstein, Lewenstein: O(n log^2 m) (some very special cases)
2003: Amir, Cole, Hariharan, Lewenstein, Porat: O(n log m log σ)
Here σ = min(m, |Σ|).
All of these results use FFT.
Existing results
Some related variants are also investigated in the literature:
Approximate version: Amir, Lewenstein, Porat (2002)
Weighted version: Zhang, Guo, Iliopoulos (2004)
Our Contribution
A new graph-theoretic model.
O((m/w) n log m) time, where w is the machine word size.
For word-size patterns: O(n log m).
The first efficient non-FFT algorithm for swap matching.
The new Model
T-Graph
[Figure: a text T of length 15 over {a, b, c} (positions 1..15) and its T-graph.]
P-Graph
P = acbab (positions 1..5)
[Figure: the P-graph of P.]
P-Graph
P = accab (positions 1..5)
[Figure: the P-graph of P.]
So…
P swap matches T ⇔ P-Graph swap matches T-Graph
An Efficient Algorithm
Degenerate strings
Let Σ = {A, C, G, T}. Then we can get 2^4 − 1 = 15 non-empty sets of letters.
At each position of a degenerate string we have one of those sets.
Degenerate strings…
The 15 non-empty sets over {A, C, G, T}:
{A, C, G, T}
{A, C, G} {A, C, T} {A, G, T} {C, G, T}
{A, C} {A, G} {A, T} {C, G} {C, T} {G, T}
{A} {C} {G} {T}
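The count of 15 can be confirmed by enumerating the sets. This is a small sketch; the helper name `nonempty_subsets` is an assumption made for illustration.

```python
from itertools import combinations


def nonempty_subsets(alphabet: str) -> list[frozenset]:
    """Enumerate every non-empty subset of the alphabet."""
    return [frozenset(combo)
            for size in range(1, len(alphabet) + 1)
            for combo in combinations(alphabet, size)]


subsets = nonempty_subsets("ACGT")
print(len(subsets))  # 2**4 - 1 = 15
```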
Degenerate strings…
[Figure: a degenerate string X of length 7 (positions 1..7) over {A, C, T}.]
Degenerate strings: Equality/Match
[Figure: the degenerate string X of length 7 and a degenerate string Y of length 3.]
X[3] =d Y[1]. Why?
Because X[3] ∩ Y[1] = {A}, which is non-empty.
Y =d X[1..3]
Y =d X[3..5]
Y =d X[4..6]
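Degenerate matching as used above compares sets position by position: two positions match when their sets intersect. A minimal sketch follows; the function names and the strings X and Y are hypothetical and are not the ones in the slide figure.

```python
def deg_equal(x: list[set], y: list[set]) -> bool:
    """x =d y: same length and a non-empty intersection at every position."""
    return len(x) == len(y) and all(a & b for a, b in zip(x, y))


def deg_occurrences(x: list[set], y: list[set]) -> list[int]:
    """1-based positions i such that y =d x[i..i+len(y)-1]."""
    k = len(y)
    return [i + 1 for i in range(len(x) - k + 1) if deg_equal(x[i:i + k], y)]


# Hypothetical degenerate strings, chosen only for illustration.
X = [{"A"}, {"C", "T"}, {"A", "C"}, {"T"}, {"C"}, {"A"}, {"C"}]
Y = [{"A", "T"}, {"C", "T"}, {"C"}]
print(deg_occurrences(X, Y))
```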
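P-Graph => Degenerate String
P = acbab (positions 1..5)
[Figure: the P-graph of P, collapsed column by column into sets of characters.]
Degenerate string: {a, c} {a, b, c} {a, b, c} {a, b} {a, b}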
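The degenerate string on this slide is consistent with taking, at each position i, the set {P[i − 1], P[i], P[i + 1]}, that is, the character itself plus the two neighbours a swap could bring there. The sketch below assumes that reading; `pattern_to_degenerate` is a hypothetical helper name.

```python
def pattern_to_degenerate(pattern: str) -> list[set]:
    """Position i gets {P[i-1], P[i], P[i+1]}, clipped at both ends."""
    m = len(pattern)
    return [set(pattern[max(0, i - 1):min(m, i + 2)]) for i in range(m)]


print(pattern_to_degenerate("acbab"))
```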
Swap Match vs Deg. Match
P => {a, c} {a, b, c} {a, b, c} {a, b} {a, b}
[Figure: a text T of length 10 over {a, b, c}, with the degenerate string aligned against it.]
According to degenerate matching: OK!
According to swap matching: NOT OK!
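The gap between the two notions can be reproduced on a single window. In this sketch P = acbab is compared against the window accab, which passes the degenerate test but is not a swapped version of P, because the single c of P would have to be used at two positions. The helper names are assumptions made for illustration.

```python
def degenerate_sets(pattern: str) -> list[set]:
    """Position i gets {P[i-1], P[i], P[i+1]}, clipped at both ends."""
    m = len(pattern)
    return [set(pattern[max(0, i - 1):min(m, i + 2)]) for i in range(m)]


def deg_match(window: str, sets: list[set]) -> bool:
    """Each window character must belong to the set at its position."""
    return len(window) == len(sets) and all(c in s for c, s in zip(window, sets))


def swap_match(window: str, pattern: str) -> bool:
    """True if window equals pattern after disjoint adjacent swaps."""
    m, k = len(pattern), 0
    if len(window) != m:
        return False
    while k < m:
        if window[k] == pattern[k]:
            k += 1
        elif (k + 1 < m and window[k] == pattern[k + 1]
              and window[k + 1] == pattern[k]):
            k += 2
        else:
            return False
    return True


P, W = "acbab", "accab"
print(deg_match(W, degenerate_sets(P)), swap_match(W, P))  # True False
```

Degenerate matching is therefore only a filter: every swap match is a degenerate match, but not the other way around.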
Why Doesn't It Work?
[Figure: the text T of length 10, the degenerate string {a, c} {a, b, c} {a, b, c} {a, b} {a, b}, the P-graph of P = acbab, and the string accab (positions 1..5).]
Forbidden Graph
[Figure: the forbidden graph of the pattern.]
Our Algorithm
Shift-Or Algorithm
The concept of the Forbidden Graph
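For reference, the classic Shift-Or algorithm for exact matching can be sketched as below. This is the textbook bit-parallel scheme only, not the swap-matching algorithm of the talk, and it contains no forbidden-graph handling. The function name is an assumption, and Python integers stand in for machine words.

```python
def shift_or(text: str, pattern: str) -> list[int]:
    """Classic Shift-Or exact matching; returns 1-based occurrences."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)
    state = all_ones  # bit i == 0 means pattern[0..i] matches a suffix of the text read so far
    occ = []
    for j, c in enumerate(text):
        # Shift the previous state, then Or in the mask of the current character.
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:
            occ.append(j - m + 2)  # convert end index j to a 1-based start
    return occ


# Hypothetical text, chosen only for illustration.
print(shift_or("TTGACCGTAGTGACA", "GAC"))
```

Each text character costs one shift and one Or per machine word, which is where the m/w factor in bit-parallel running times comes from.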
D-Mask
P = accab => {a, c} {a, c} {a, c} {a, b, c} {a, b}

Position  Set        D_a  D_b  D_c  D_X
1         {a, c}     0    1    0    1
2         {a, c}     0    1    0    1
3         {a, c}     0    1    0    1
4         {a, b, c}  0    0    0    1
5         {a, b}     0    0    1    1
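The D-mask table can be derived mechanically from the pattern. The sketch below assumes the Shift-Or convention that 0 means "the character belongs to the set at this position", and that X stands for any character that does not occur in P; the helper name `d_masks` is hypothetical.

```python
def d_masks(pattern: str) -> dict[str, list[int]]:
    """Build D-mask columns for the degenerate string of the pattern."""
    m = len(pattern)
    # Position i gets {P[i-1], P[i], P[i+1]}, clipped at both ends.
    sets = [set(pattern[max(0, i - 1):min(m, i + 2)]) for i in range(m)]
    table = {c: [0 if c in s else 1 for s in sets]
             for c in sorted(set(pattern))}
    # "X" is a placeholder key for every character outside the pattern;
    # this assumes the pattern itself does not contain the letter X.
    table["X"] = [1] * m
    return table


for char, column in d_masks("accab").items():
    print(char, column)
```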
F-Mask
[Figure: the forbidden graph of the pattern.]
[Table: F-mask bit vectors, one row per pattern position 1..5 and one column per character pair (a,a), (a,b), (b,b), (c,c), (c,a), (X,X).]
Computing R matrix
[Figure, built up over five slides: the R matrix for P = accab (rows 1..5, labelled a, c, c, a, b) against the text T of length 15 (columns 1..15). Each new column is obtained by shifting the previous column and Or-ing the result with a D-mask and an F-mask:
Shift, then Or with D_a and F(X,a)
Shift, then Or with D_c and F(a,c)
Shift, then Or with D_a and F(c,a)
Shift, then Or with D_b and F(c,b)
The last slide shows the completed matrix.]
Running Time
Computing D-masks: O((m/w)(m + |Σ|))
Computing F-masks: O((m/w) m log m)
Computing R values: O((m/w) n log m)
Total: O((m/w) n log m)
For short patterns (m ~ w): O(n log m)
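The total follows by adding the three phases. The worked sum below assumes m ≤ n and |Σ| = O(n log m); neither assumption is stated on the slide.

```latex
\[
O\!\left(\tfrac{m}{w}\,(m + |\Sigma|)\right)
+ O\!\left(\tfrac{m}{w}\, m \log m\right)
+ O\!\left(\tfrac{m}{w}\, n \log m\right)
= O\!\left(\tfrac{m}{w}\, n \log m\right)
\]
% For short patterns, m = O(w), so m/w = O(1) and the bound becomes O(n log m).
```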
Future Work
Explore the possibilities of using graph pattern matching.
Experimental work: a forthcoming paper contains experiments using biological examples.
The End
Thank you very much