bom algorithm

17
Recuperació de la informació rn Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Net ible Pattern Matching in Strings (2002) Gonzalo Navarro and Mathieu Raffinot rithms on strings (2001) M. Crochemore, C. Hancart and T. Lecroq ://www-igm.univ-mlv.fr/~lecroq/string/index

Upload: nguyentruc

Post on 06-Jan-2017

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: BOM algorithm

Recuperació de la informació•Modern Information Retrieval (1999)

Ricardo-Baeza Yates and Berthier Ribeiro-Neto

•Flexible Pattern Matching in Strings (2002)Gonzalo Navarro and Mathieu Raffinot

•Algorithms on strings (2001)M. Crochemore, C. Hancart and T. Lecroq

•http://www-igm.univ-mlv.fr/~lecroq/string/index.html

Page 2: BOM algorithm

String Matching

String matching: definition of the problem (text,pattern) depends on what we have: text or patterns• Exact matching:

• Approximate matching:

• 1 pattern ---> The algorithm depends on |p| and || • k patterns ---> The algorithm depends on k, |p| and ||

• The text ----> Data structure for the text (suffix tree, ...)

• The patterns ---> Data structures for the patterns

• Dynamic programming • Sequence alignment (pairwise and multiple)

• Extensions • Regular Expressions

• Probabilistic search:

• Sequence assembly: hash algorithm

Hidden Markov Models

Page 3: BOM algorithm

String matching: one pattern

There is a sliding window along the text against which the pattern is compared:

How does the matching algorithms made the search?

Pattern :

Text :

Which are the facts that differentiate the algorithms?

1. How the comparison is made.2. The length of the shift.

At each step the comparison is made and the window is shifted to the right.

Page 4: BOM algorithm

Alg. Cerca exacta d’un patró (text on-line)

Algorismes més eficients (Navarro & Raffinot)

2 4 8 16 32 64 128 256

64

32

16

8

4

2

| |

Long. patró

Horspool

BNDMBOM

BNDM : Backward Nondeterministic Dawg Matching

BOM : Backward Oracle Matching

w

Page 5: BOM algorithm

Automaton

Given a pattern p, we find for an automaton A that verifies the properties:

1. A is acyclic.2. A recognizes at least the factors of p.3. A has the fewer states as possible.4. A has a linear number of transitions according to

the length of p.

… and at the end of the last century …

Page 6: BOM algorithm

Automaton Factor Oracle: properties

Given the word G T A T G T A

GG AT T ATTA

G

All states are accepting states ==> Recognize all the factors …. and more

GG AT T ATTA

G

Hip: recognize all factors of GTA

This state recognizes all the factors that ends in the fourth letter that have not beenaccepted before: GTAT, TAT, AT (note that T had been recognized before).

All the factors of the first four letters have been recognized.

Page 7: BOM algorithm

Automaton Factor Oracle: algorithm

Algorithm: for i=1 to p do Add those transitions that recognize the new factors

that end in letter i;

?

Page 8: BOM algorithm

Automaton Factor Oracle: algorithm

What happens if the transition is in the automaton?

TT

Page 9: BOM algorithm

Automaton Factor Oracle: algorithm

What happens if transition isn’t in the automaton?

TT

Page 10: BOM algorithm

Automaton Factor Oracle and BOM.

GG AT T ATTA

G

This automaton recognizes words that are not factors of GTATGTA like GTGTA => the affirmative answer is not informative, …. but

The negative answer ==> the word isn’t a factor!

Is the strategy of the BOM algorithm.

Page 11: BOM algorithm

Algorithm BOM (Backward Oracle Matching)

• Which is the next position of the window?

• How the comparison is made?

Text :

Pattern : Autòmaton Factor Oracle of the reverse pattern

Check if the suffix is a factor of the pattern.

a

• If some letter is not found

• If the pattern is found:

Page 12: BOM algorithm

BOM algorithm

• Given the pattern ATGTATG we construct the automaton of the reverse pattern:

• And the search phase : G T A C T A G A A T G T G T A G A C A T G T A T G G T G A...A T G T A T G

• How the comparison is made?

GG AT T ATTA

G

Page 13: BOM algorithm

BOM algorithm

• Given the pattern ATGTATG we construct the automaton of the reverse pattern

•And the search phase : G T A C T A G A A T G T G T A G A C A T G T A T G G T GA T G T A T G

• How the comparison is made?

GG AT T ATTA

G

A T G T A T G

Page 14: BOM algorithm

BOM algorithm

• Given the pattern ATGTATG we construct the automaton of the reverse pattern

• And the search phase : G T A C T A G A A T G T G T A G A C A T G T A T G G T G A T G T A T G

• How the comparison is made?

GG AT T ATTA

G

A T G T A T G A T G T A T G

Page 15: BOM algorithm

BOM algorithm

• Given the pattern ATGTATG we construct the automaton of the reverse pattern

• And the search phase : G T A C T A G A A T G T G T A G A C A T G T A T G G T GA T G T A T G

• How the comparison is made?

GG AT T ATTA

G

A T G T A T G A T G T A T G

A T G T A T G

Page 16: BOM algorithm

BOM algorithm

• Given the pattern ATGTATG we construct the automaton of the reverse pattern

•And the search phase : G T A C T A G A A T G T G T A G A C A T G T A T G G T G ...A T G T A T G

• How the comparison is made?

GG AT T ATTA

G

A T G T A T G A T G T A T G

A T G T A T G A T G T A T G

Page 17: BOM algorithm

BOM algorithm

• Given the pattern ATGTATG we construct the automaton of the reverse pattern

• And the search phase : G T A C T A G A A T G T G T A G A C A T G T A T G G T G ...A T G T A T G

• How the comparison is made?

GG AT T ATTA

G

A T G T A T G A T G T A T G

A T G T A T G A T G T A T G

A T G T A T G