smith algorithm experiments with a very fast substring search algorithm, smith p.d., software -...
Posted on 21-Dec-2015
![Page 1: Smith Algorithm Experiments with a very fast substring search algorithm, SMITH P.D., Software - Practice & Experience 21(10), 1991, pp. 1065-1074. Adviser:](https://reader036.vdocument.in/reader036/viewer/2022062421/56649d575503460f94a35447/html5/thumbnails/1.jpg)
Smith Algorithm
Experiments with a very fast substring search algorithm, SMITH P.D., Software - Practice & Experience 21(10), 1991, pp. 1065-1074.
Adviser: R. C. T. Lee
Speaker: C. W. Cheng
National Chi Nan University
Problem Definition
Input: a text string T with length n and a pattern string P with length m.
Output: all occurrences of P in T.
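To pin down what "all occurrences" means, here is a brute-force reference in Python. It is a hedged sketch, not part of Smith's method: the name `all_occurrences` is hypothetical, and reporting positions counted from 0 is an assumption (the slides do not fix an indexing convention). It tries every alignment and also reports overlapping occurrences, so it can serve as a baseline when checking a faster matcher.

```python
def all_occurrences(text: str, pattern: str) -> list[int]:
    """Brute-force reference: every 0-based position where pattern starts in text."""
    n, m = len(text), len(pattern)
    # Try each of the n - m + 1 alignments; overlapping occurrences are included.
    return [j for j in range(n - m + 1) if text[j:j + m] == pattern]


print(all_occurrences("GCGCAGAGAGTAGAGAGTACG", "CAGAGAG"))  # [3]
print(all_occurrences("aaaa", "aa"))  # [0, 1, 2]
```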
Definition
• Ts: the first character of the part of T that is aligned with P.
• Pl: the first character of P that is aligned with T.
• Tj: the character at the jth position of T.
• Pi: the character at the ith position of P.
• Pf: the last character of P.
• n: the length of T.
• m: the length of P.
Rule 2-2: 1-Suffix Rule (A Special Version of Rule 2)
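• Consider the 1-suffix x, i.e. the last character of the current window of T. Rule 2-2 shifts P so that the rightmost occurrence of x in P (excluding the last position of P) is aligned with this x; if x does not occur there, P is shifted completely past it.
[Figure: T and P, with the 1-suffix x of the window in T aligned with an x in P.]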
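The 1-suffix rule can be precomputed as Horspool's bad-character table. The Python sketch below is a hedged illustration rather than code from the paper: the function name `horspool_table` and the explicit `alphabet` argument are assumptions. Applied to P = CAGAGAG over {A, C, G, T}, it should reproduce the hpBC row used in the example.

```python
def horspool_table(pattern: str, alphabet: str) -> dict[str, int]:
    """Shift table for the 1-suffix rule (Horspool's bad-character rule).

    hp[c] is the distance from the rightmost occurrence of c in P[0..m-2]
    to the end of P; a character that does not occur there gets m, which
    moves P completely past the last character of the window.
    """
    m = len(pattern)
    hp = {c: m for c in alphabet}
    # Scanning left to right lets later (more rightward) occurrences win.
    # The last pattern character is skipped, so every shift is at least 1.
    for i in range(m - 1):
        hp[pattern[i]] = m - 1 - i
    return hp


print(horspool_table("CAGAGAG", "ACGT"))  # {'A': 1, 'C': 6, 'G': 2, 'T': 7}
```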
Introduction
• It takes the maximum of the Horspool shift function and the Quick Search shift function.
• It uses Rule 2-2: the 1-Suffix Rule.
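The Quick Search shift function differs from the Horspool one in two ways: it looks at the text character just after the window, and it includes the last pattern character. A minimal sketch follows, with the hypothetical name `quick_search_table` and the alphabet passed explicitly as an assumption. Given both tables and a window starting at text position j (counted from 0), Smith's shift is max(hpBC[T[j+m-1]], qsBC[T[j+m]]).

```python
def quick_search_table(pattern: str, alphabet: str) -> dict[str, int]:
    """Quick Search (Sunday) shift table.

    qs[c] = m - i for the rightmost i with P[i] == c, which aligns that
    occurrence with the text character just past the window; a character
    absent from P gets m + 1, so P jumps completely over it.
    """
    m = len(pattern)
    qs = {c: m + 1 for c in alphabet}
    # Unlike the Horspool table, the last pattern character is included.
    for i in range(m):
        qs[pattern[i]] = m - i
    return qs


print(quick_search_table("CAGAGAG", "ACGT"))  # {'A': 2, 'C': 7, 'G': 1, 'T': 8}
```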
Smith Algorithm
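• This algorithm is almost the same as the Quick Search algorithm, except that the last character of the window is also considered.
[Figure: T and P, with the last character x of the window in T aligned with an x in P.]
If the shift obtained from this last character (the Horspool shift) is larger than the Quick Search shift, it is used; otherwise the Quick Search shift is used.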
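Putting the two shift functions together gives a complete search loop. This is a hedged Python sketch, not Smith's original code: the name `smith_search` is hypothetical, positions are counted from 0, and returning no matches for an empty pattern is an assumption the slides do not address. Each window is checked with a plain string comparison, and characters missing from P fall back to the default shifts m and m + 1.

```python
def smith_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:  # empty-pattern convention is an assumption
        return []
    # Horspool table (1-suffix rule): rightmost occurrence in P[0..m-2].
    hp: dict[str, int] = {}
    for i in range(m - 1):
        hp[pattern[i]] = m - 1 - i
    # Quick Search table: rightmost occurrence in P[0..m-1].
    qs: dict[str, int] = {}
    for i in range(m):
        qs[pattern[i]] = m - i
    matches: list[int] = []
    j = 0
    while j <= n - m:
        if text[j:j + m] == pattern:
            matches.append(j)
        # Horspool looks at the last window character T[j+m-1].
        hp_shift = hp.get(text[j + m - 1], m)
        # Quick Search looks at T[j+m]; at the end of T there is no next window.
        qs_shift = qs.get(text[j + m], m + 1) if j + m < n else m + 1
        # Both shifts are safe on their own, so their maximum is safe too.
        j += max(hp_shift, qs_shift)
    return matches


print(smith_search("GCGCAGAGAGTAGAGAGTACG", "CAGAGAG"))  # [3]
```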
Example
• Text string T=GCGCAGAGAGTAGAGAGTACG
• Pattern string P=CAGAGAG
G C G C A G A G A G T A G A G A G T A C G
C A G A G A G
Shift tables (hpBC: Horspool bad-character shift, qsBC: Quick Search bad-character shift):
c     A C G T
hpBC  1 6 2 7
qsBC  2 7 1 8
Step 1: window T[0..6] = GCGCAGA (positions counted from 0), mismatch.
hpBC[A]=1 (last character of the window), qsBC[G]=1 (character after the window), shift=max(1, 1)=1
Step 2: window T[1..7] = CGCAGAG, mismatch.
hpBC[G]=2 (last character of the window), qsBC[A]=2 (character after the window), shift=max(2, 2)=2
Step 3: window T[3..9] = CAGAGAG, exact match at position 3.
hpBC[G]=2 (last character of the window), qsBC[T]=8 (character after the window), shift=max(2, 8)=8
Step 4: window T[11..17] = AGAGAGT, mismatch.
hpBC[T]=7 (last character of the window), qsBC[A]=2 (character after the window), shift=max(7, 2)=7
Step 5: the next window would start at position 18 and run past the end of T (18 + 7 > 21 = n), so the search stops. The only occurrence of P in T starts at position 3.
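The trace in this example can be replayed mechanically. This hedged sketch (the function `smith_trace` is hypothetical) takes the hpBC and qsBC tables of the example as literal dictionaries and records, for every window, its start position counted from 0, whether it matched, the two candidate shifts and the shift actually taken.

```python
def smith_trace(text: str, pattern: str,
                hp: dict[str, int], qs: dict[str, int]) -> list[tuple]:
    """List (start, matched, hp_shift, qs_shift, shift) for every window tried."""
    n, m = len(text), len(pattern)
    steps = []
    j = 0
    while j <= n - m:
        matched = text[j:j + m] == pattern
        hp_shift = hp.get(text[j + m - 1], m)  # last character of the window
        qs_shift = qs.get(text[j + m], m + 1) if j + m < n else m + 1  # next character
        shift = max(hp_shift, qs_shift)
        steps.append((j, matched, hp_shift, qs_shift, shift))
        j += shift
    return steps


# Tables copied from the example slide.
HP_BC = {"A": 1, "C": 6, "G": 2, "T": 7}
QS_BC = {"A": 2, "C": 7, "G": 1, "T": 8}

for step in smith_trace("GCGCAGAGAGTAGAGAGTACG", "CAGAGAG", HP_BC, QS_BC):
    print(step)
# (0, False, 1, 1, 1)
# (1, False, 2, 2, 2)
# (3, True, 2, 8, 8)
# (11, False, 7, 2, 7)
```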
Time complexity
• The preprocessing phase takes O(m + σ) time and O(σ) space, where σ is the size of the alphabet.
• The searching phase takes O(mn) time in the worst case.
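The O(mn) bound is reached, for instance, when every window matches and both shift functions give only 1. The sketch below counts character comparisons under the assumption that each window is compared left to right, one character at a time; the function `smith_count_comparisons` is hypothetical. For T = a^n and P = a^m it performs (n - m + 1) * m comparisons, e.g. 16 * 5 = 80 for n = 20, m = 5.

```python
def smith_count_comparisons(text: str, pattern: str) -> tuple[list[int], int]:
    """Run the Smith search and return (match positions, character comparisons)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    hp: dict[str, int] = {}
    for i in range(m - 1):
        hp[pattern[i]] = m - 1 - i
    qs: dict[str, int] = {}
    for i in range(m):
        qs[pattern[i]] = m - i
    matches: list[int] = []
    comparisons = 0
    j = 0
    while j <= n - m:
        k = 0
        # Compare the window left to right, counting every character test.
        while k < m:
            comparisons += 1
            if text[j + k] != pattern[k]:
                break
            k += 1
        if k == m:
            matches.append(j)
        hp_shift = hp.get(text[j + m - 1], m)
        qs_shift = qs.get(text[j + m], m + 1) if j + m < n else m + 1
        j += max(hp_shift, qs_shift)
    return matches, comparisons


found, comps = smith_count_comparisons("a" * 20, "a" * 5)
print(len(found), comps)  # 16 80
```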
References
[KMP77] Fast pattern matching in strings, D. E. Knuth, J. H. Morris, Jr and V. B. Pratt, SIAM J. Computing, 6, 1977, pp. 323-350.
[BM77] A fast string search algorithm, R. S. Boyer and J. S. Moore, Comm. ACM, 20, 1977, pp. 762-772.
[S90] A very fast substring search algorithm, D. M. Sunday, Comm. ACM, 33, 1990, pp. 132-142.
[RR89] The Rand MH Message Handling System: User's Manual (UCI Version), M. T. Rose and J. L. Romine, University of California, Irvine, 1989.
[S82] A comparison of three string matching algorithms, G. De V. Smith, Software - Practice & Experience, 12, 1982, pp. 57-66.
[HS91] Fast string searching, A. Hume and D. M. Sunday, Software - Practice & Experience, 21(11), 1991, pp. 1221-1248.
[S94] String Searching Algorithms, G. A. Stephen, World Scientific, 1994.
[ZT87] On improving the average case of the Boyer-Moore string matching algorithm, R. F. Zhu and T. Takaoka, Journal of Information Processing, 10(3), 1987, pp. 173-177.
[R92] Tuning the Boyer-Moore-Horspool string searching algorithm, T. Raita, Software - Practice & Experience, 22(10), 1992, pp. 879-884.
[S94] On tuning the Boyer-Moore-Horspool string searching algorithms, P. D. Smith, Software - Practice & Experience, 24(4), 1994, pp. 435-436.
[BR92] Average running time of the Boyer-Moore-Horspool algorithm, R. A. Baeza-Yates and M. Régnier, Theoretical Computer Science, 92(1), 1992, pp. 19-31.
[H80] Practical fast searching in strings, R. N. Horspool, Software - Practice & Experience, 10(6), 1980, pp. 501-506.
[L95] Experimental results on string matching algorithms, T. Lecroq, Software - Practice & Experience, 25(7), 1995, pp. 727-765.
Thank you for listening.