Constant-Delay Enumeration for Nondeterministic Document Spanners
Antoine Amarilli¹, Pierre Bourhis², Stefan Mengel³, Matthias Niewerth⁴
March 27th, 2019
¹Télécom ParisTech
²CNRS CRIStAL
³CNRS CRIL
⁴Universität Bayreuth
Problem: Finding Patterns in Text
• We have a long text T:
  Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
• We want to find a pattern P in the text T:
  → Example: find email addresses
• Write the pattern as a regular expression:
  P := [a-z0-9.]∗ @ [a-z0-9.]∗
→ How to find the pattern P efficiently in the text T?
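As a minimal sketch of this setting, the pattern P can be written with Python's standard `re` module. The short sample text and the helper name `occurs` are assumptions made for illustration, and the spaces around `@` in P are assumed to be typographic only.

```python
import re

# Pattern P from the slide, written without the spaces around "@"
# (assumption: those spaces are only typographic).
P = re.compile(r"[a-z0-9.]*@[a-z0-9.]*")

# Short sample text (an assumption, chosen to keep the example small).
T = "Email a3nm@a3nm.net Affiliation"


def occurs(pattern: re.Pattern, text: str) -> bool:
    """Return True if some substring of `text` matches `pattern`."""
    return pattern.search(text) is not None


# Leftmost match found by the regex engine, as a match object.
first = P.search(T)
```

Note that Python's `re` is a backtracking engine, so this only illustrates the problem interface, not the automaton-based evaluation discussed in the talk.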
Solution: Automata
• Convert the regular expression P to an automaton A
  P := [a-z0-9.]∗ @ [a-z0-9.]∗
  [Figure: automaton A with states 1 (start), 2, 3, 4 and transitions labeled •, [a-z0-9.], and @]
• Then, evaluate the automaton on the text T
  Email a3nm@a3nm.net Affiliation
• The complexity is O(|A| × |T|), i.e., linear in T and polynomial in P
  → This is very efficient in T and reasonably efficient in P
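The O(|A| × |T|) bound comes from tracking the set of states reachable after each character. The sketch below illustrates this with a hypothetical 4-state automaton. Its transitions are an assumption (the slide's figure is not fully recoverable): it accepts texts containing a space, then characters of [a-z0-9.], then `@`, then characters of [a-z0-9.], then a space.

```python
from typing import Callable, List, Set, Tuple

# A transition is (source state, predicate on the character, target state).
Transition = Tuple[int, Callable[[str], bool], int]


def in_class(c: str) -> bool:
    """Character class [a-z0-9.] from the pattern P."""
    return ("a" <= c <= "z") or ("0" <= c <= "9") or c == "."


def any_char(c: str) -> bool:
    """Wildcard transition: accepts every character."""
    return True


# Hypothetical automaton (assumption, not necessarily the slide's figure).
TRANSITIONS: List[Transition] = [
    (1, any_char, 1),
    (1, lambda c: c == " ", 2),
    (2, in_class, 2),
    (2, lambda c: c == "@", 3),
    (3, in_class, 3),
    (3, lambda c: c == " ", 4),
    (4, any_char, 4),
]
INITIAL, FINAL = 1, 4


def accepts(text: str) -> bool:
    """Simulate the nondeterministic automaton by tracking reachable states."""
    current: Set[int] = {INITIAL}
    for c in text:
        # One pass over all transitions per character: O(|A|) work per step.
        current = {dst for (src, pred, dst) in TRANSITIONS
                   if src in current and pred(c)}
    return FINAL in current
```

Each of the |T| characters triggers one scan of the transition list, which gives the O(|A| × |T|) total cost.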
Actual Problem: Extracting all Patterns
• This only tests if the pattern occurs in the text!
  → “YES”
• Goal: find all substrings in the text T which match the pattern P
  Email a3nm@a3nm.net Affiliation (positions 0 to 30 are marked along the text)
  → One match: [5, 20〉
Formal Problem Statement
• Problem description:
  • Input:
    • A text T
      Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. [email protected] Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
    • A pattern P given as a regular expression
      P := [a-z0-9.]∗ @ [a-z0-9.]∗
  • Output: the list of substrings of T that match P:
    [186, 200〉, [483, 500〉, . . .
• Goal: be very efficient in T and reasonably efficient in P
Formal Problem Statement
• Problem description:
  • Input:
    • The same text T as before
    • A sequential document spanner P given as a regular expression
      P := x⊢ [a-z0-9.]∗ @ [a-z0-9.]∗ ⊣x
      (x⊢ and ⊣x mark where the span captured by the variable x opens and closes)
  • Output: the list of tuples of substrings of T that match P:
    [186, 200〉, [483, 500〉, . . .
• Goal: be very efficient in T and reasonably efficient in P
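As a loose analogy, a named group in Python's `re` module can play the role of the capture variable x. This is only a sketch under assumptions: the sample text in the usage note and the names `SPANNER` and `spans_of_x` are hypothetical, and `finditer` reports only non-overlapping leftmost matches, whereas a spanner outputs every matching tuple.

```python
import re
from typing import List, Tuple

# The named group "x" stands in for the capture variable x of the spanner.
# Assumption: the spaces around "@" in P are only typographic.
SPANNER = re.compile(r"(?P<x>[a-z0-9.]*@[a-z0-9.]*)")


def spans_of_x(text: str) -> List[Tuple[int, int]]:
    """Return the half-open spans [i, j) captured by x, left to right."""
    return [m.span("x") for m in SPANNER.finditer(text)]
```

For instance, `spans_of_x("mail a3nm@a3nm.net or b@c.d now")` yields one span per address in the text.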
Measuring the Complexity
• Naive algorithm: Run the automaton A on each substring of T
  [Figure: the brackets [ and 〉 slide over the letters l, o, l to mark each candidate substring in turn]
  → Complexity is O(|T|² × |A| × |T|)
  → Can be optimized to O(|T|² × |A|)
• Problem: We may need to output Ω(|T|²) matching substrings:
  • Consider the text T:
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  • Consider the pattern P := a∗
  • The number of matches is Ω(|T|²)
→ We need a different way to measure complexity
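The naive algorithm and the quadratic blow-up can be sketched as follows. This is an illustration under assumptions: it uses Python's `re` engine in place of the automaton A, and the function name `naive_matches` is hypothetical. Spans are half-open pairs (i, j), and empty spans count as matches when the pattern accepts the empty string.

```python
import re
from typing import List, Tuple


def naive_matches(pattern: str, text: str) -> List[Tuple[int, int]]:
    """Return every span (i, j) of `text` whose substring matches `pattern`.

    Tries all O(|T|^2) spans and runs the matcher on each one.
    """
    compiled = re.compile(pattern)
    n = len(text)
    return [
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        # fullmatch with pos/endpos tests exactly the substring text[i:j].
        if compiled.fullmatch(text, i, j) is not None
    ]
```

With the pattern `a*` on a text of n letters `a`, every one of the (n + 1)(n + 2)/2 spans matches, so the output alone is quadratic in |T|.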
Enumeration Algorithms
Idea: In real life, we do not want to compute all the matches; we just need to be able to enumerate matches quickly
→ Formalization: enumeration algorithms
7/16
![Page 53: Constant-Delay Enumeration for Nondeterministic Document ... · Constant-Delay Enumeration for Nondeterministic Document Spanners Antoine Amarilli1, Pierre Bourhis2, Stefan Mengel3,](https://reader034.vdocument.in/reader034/viewer/2022051607/603099190277f607c70a4465/html5/thumbnails/53.jpg)
Formalizing Enumeration Algorithms
Text T: Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP [email protected] Associate professor ...
Pattern P: [a-z0-9.]∗@[a-z0-9.]∗
Phase 1: Preprocessing (text T and pattern P → index structure)
Phase 2: Enumeration (index structure → results [42, 57〉, [1337, 1351〉)
Two performance criteria:
• Total time for phase 1
• Delay between two results in phase 2
... as a function of the text and pattern
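To make the two phases concrete, here is a toy sketch for a deliberately trivial pattern, a single fixed letter. This is an illustrative assumption, not the algorithm of the talk, and the names `preprocess` and `enumerate_matches` are hypothetical.

```python
def preprocess(text, letter):
    """Phase 1: one linear scan of the text builds the index structure."""
    return [i for i, c in enumerate(text) if c == letter]

def enumerate_matches(index):
    """Phase 2: each result costs O(1) work once the index exists."""
    for i in index:
        yield (i, i + 1)  # the span [i, i+1) of one matching letter

index = preprocess("lol", "l")
for span in enumerate_matches(index):
    print(span)
```

Here the total time for phase 1 is linear in the text, and the delay between two results in phase 2 does not depend on the text at all.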
8/16
![Page 54: Constant-Delay Enumeration for Nondeterministic Document ... · Constant-Delay Enumeration for Nondeterministic Document Spanners Antoine Amarilli1, Pierre Bourhis2, Stefan Mengel3,](https://reader034.vdocument.in/reader034/viewer/2022051607/603099190277f607c70a4465/html5/thumbnails/54.jpg)
Complexity of Enumeration Algorithms
• Recall the inputs to our problem:
  • A text T
Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and [email protected] Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
  • A pattern P given as a regular expression
P := [a-z0-9.]∗ @ [a-z0-9.]∗
• What is the delay of the naive algorithm?
→ it is the maximal time to find the next matching substring
→ i.e. O(|T|² × |A|), e.g., if only the beginning and end match
→ Can we do better?
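The bad case for the delay can be observed with a small sketch. It assumes Python's `re` module in place of the automaton A and counts the spans tested between two consecutive results as a proxy for time. The name `naive_with_delays` is hypothetical.

```python
import re

def naive_with_delays(text, pattern):
    """Yield (span, number of spans tested since the previous result)."""
    compiled = re.compile(pattern)
    tested = 0
    n = len(text)
    for i in range(n + 1):
        for j in range(i, n + 1):
            tested += 1
            if compiled.fullmatch(text, i, j):
                yield (i, j), tested
                tested = 0

# Only the first and the last letter match the pattern "a".
text = "a" + "b" * 100 + "a"
for span, delay in naive_with_delays(text, "a"):
    print(span, delay)
```

The second result arrives only after thousands of fruitless tests, even though the text has just 102 letters: the delay grows quadratically with the text.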
9/16
![Page 64: Constant-Delay Enumeration for Nondeterministic Document ... · Constant-Delay Enumeration for Nondeterministic Document Spanners Antoine Amarilli1, Pierre Bourhis2, Stefan Mengel3,](https://reader034.vdocument.in/reader034/viewer/2022051607/603099190277f607c70a4465/html5/thumbnails/64.jpg)
Results for Enumerating Pattern Matches
• Existing work has shown the best possible bounds in T:
Theorem [Florenzano et al., 2018]
We can enumerate all matches of a pattern P on a text T with:
• Preprocessing linear in T and exponential in P
• Delay constant (independent from T) and exponential in P
→ Problem: Only efficient for deterministic automata!
• Our contribution is:
Theorem
We can enumerate all matches of a pattern P on a text T with:
• Preprocessing in O(|T| × Poly(P))
• Delay polynomial in P and independent from T
10/16
![Page 65: Constant-Delay Enumeration for Nondeterministic Document ... · Constant-Delay Enumeration for Nondeterministic Document Spanners Antoine Amarilli1, Pierre Bourhis2, Stefan Mengel3,](https://reader034.vdocument.in/reader034/viewer/2022051607/603099190277f607c70a4465/html5/thumbnails/65.jpg)
Automaton Formalism
• We use automata that read letters and capture variables
→ Example: P := •∗ α a∗ β •∗ (α stands for x⊢ and β for ⊣x)
[Figure: automaton with states 1, 2, 3: a •-loop on state 1, an α-transition from 1 to 2, an a-loop on state 2, a β-transition from 2 to 3, and a •-loop on state 3]
• Semantics of the automaton A:
  • Reads letters from the text
  • Guesses variables at positions in the text
  → Output: tuples 〈α : i, β : j〉 such that A has an accepting run reading α at position i and β at j
• Assumption: There is no run for which A reads the same capture variable twice at the same position
• Challenge: Because of nondeterminism we can have many different runs of A producing the same tuple!
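The semantics and the duplicate-run challenge can be sketched as follows. The transition encoding, the names `runs`, `ANY`, `alpha` and `beta`, and the extra state 4 are all assumptions made for illustration. State 4 is a hypothetical redundant copy of state 2, added only to create nondeterminism.

```python
from collections import Counter

ANY = None  # wildcard letter, written • on the slide

def runs(text, letters, markers, initial, finals):
    """Yield one (alpha, beta) tuple per accepting run; duplicates are possible."""
    def explore(state, pos, assigned):
        if pos == len(text) and state in finals:
            yield (assigned.get("alpha"), assigned.get("beta"))
        # Marker transitions record the current position without reading a letter.
        # For simplicity, each variable is used at most once per run.
        for src, var, dst in markers:
            if src == state and var not in assigned:
                yield from explore(dst, pos, {**assigned, var: pos})
        # Letter transitions consume text[pos].
        if pos < len(text):
            for src, letter, dst in letters:
                if src == state and letter in (ANY, text[pos]):
                    yield from explore(dst, pos + 1, assigned)
    yield from explore(initial, 0, {})

# The automaton of the example: states 1, 2, 3.
letters = [(1, ANY, 1), (2, "a", 2), (3, ANY, 3)]
markers = [(1, "alpha", 2), (2, "beta", 3)]
found = Counter(runs("aaaba", letters, markers, 1, {3}))

# Hypothetical variant: state 4 duplicates state 2.
dup = Counter(runs("aaaba", letters + [(4, "a", 4)],
                   markers + [(1, "alpha", 4), (4, "beta", 3)], 1, {3}))
print(len(found), sum(found.values()))  # distinct tuples, accepting runs
print(len(dup), sum(dup.values()))      # same tuples, twice as many runs
```

Enumerating runs one by one would therefore output some tuples several times, which the enumeration algorithm must avoid.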
11/16
![Page 74: Constant-Delay Enumeration for Nondeterministic Document ... · Constant-Delay Enumeration for Nondeterministic Document Spanners Antoine Amarilli1, Pierre Bourhis2, Stefan Mengel3,](https://reader034.vdocument.in/reader034/viewer/2022051607/603099190277f607c70a4465/html5/thumbnails/74.jpg)
Proof idea: Product DAG
Compute a product DAG of the text T and of the automaton A
Example: Text T := aaaba and P := •∗ α a∗ β •∗, match 〈α : 0, β : 3〉
[Figure: the automaton (states 1, 2, 3) next to the product DAG, which has one copy of the states 1, 2, 3 per position of the text a a a b a, with α and β edges inside each copy]
→ Each path in the product DAG corresponds to a match
→ Challenge: Enumerate paths but avoid duplicate matches and do not waste time to ensure constant delay
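A minimal sketch of the product construction for the example automaton follows. It assumes nodes of the form (state, position), and the names `product_dag` and `matches` are hypothetical. The traversal is a plain depth-first walk, so it neither avoids dead ends nor guarantees any bound on the delay.

```python
ANY = None  # wildcard letter, written • on the slide
LETTERS = [(1, ANY, 1), (2, "a", 2), (3, ANY, 3)]
MARKERS = [(1, "alpha", 2), (2, "beta", 3)]

def product_dag(text):
    """Map each node (state, position) to its outgoing (label, node) edges."""
    edges = {}
    for pos in range(len(text) + 1):
        for state in (1, 2, 3):
            # Marker edges stay at the same position of the text.
            out = [(var, (dst, pos)) for src, var, dst in MARKERS if src == state]
            # Letter edges move to the next position.
            if pos < len(text):
                out += [(text[pos], (dst, pos + 1))
                        for src, letter, dst in LETTERS
                        if src == state and letter in (ANY, text[pos])]
            edges[(state, pos)] = out
    return edges

def matches(text):
    """Yield the tuple of each path from (1, 0) to (3, len(text))."""
    edges = product_dag(text)
    target = (3, len(text))
    def walk(node, found):
        if node == target:
            yield (found["alpha"], found["beta"])
        for label, nxt in edges[node]:
            if label in ("alpha", "beta"):
                yield from walk(nxt, {**found, label: nxt[1]})
            else:
                yield from walk(nxt, found)
    yield from walk((1, 0), {})

print(sorted(matches("aaaba")))
```

On this text the walk enters state 2 at positions 0 to 2 and sometimes runs into the letter b without ever reaching the target; such wasted steps are what the proof ingredients are meant to eliminate.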
12/16
![Page 82: Constant-Delay Enumeration for Nondeterministic Document ... · Constant-Delay Enumeration for Nondeterministic Document Spanners Antoine Amarilli1, Pierre Bourhis2, Stefan Mengel3,](https://reader034.vdocument.in/reader034/viewer/2022051607/603099190277f607c70a4465/html5/thumbnails/82.jpg)
Proof ingredients
Several ingredients make this efficient:
• Prune non-accepting paths
• Use shortcuts (pointers) to skip long paths
• Flashlight search
Proof ingredient: jump pointers to save time
• Issue: When we can’t assign variables, we do not make progress
[Figure: product-DAG paths with long elided stretches (· · ·) and only a few edges labelled α.]
• Idea: Directly jump to the reachable states at the next position where we can assign a variable
• Challenge: Preprocessing in linear time in T and polynomial in A:
→ Compute for each state the next position where we can reach some state that can assign a variable
→ Compute at each position i the transitive closure to all positions j such that j is the next position of some state at i (there are ≤ |A|)
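The first preprocessing step can be sketched as a backward scan over the text. This is a simplified illustration, not the construction from the paper: the encoding (`letter_edges` as triples, `marker_sources` as the set of states with an outgoing variable transition) and the function name are assumptions, and the transitive-closure step is left out.

```python
ANY = "•"  # wildcard letter


def next_assign_position(text, states, letter_edges, marker_sources):
    """nxt[i][q]: smallest position j >= i at which some state reachable from
    (i, q) through letter transitions can assign a variable, or None if none."""
    n = len(text)
    nxt = [dict() for _ in range(n + 1)]
    for i in range(n, -1, -1):  # scan backwards so nxt[i + 1] is already known
        for q in states:
            if q in marker_sources:
                nxt[i][q] = i  # a variable can be assigned right here
                continue
            best = None
            if i < n:
                for p, letter, r in letter_edges:
                    if p == q and letter in (ANY, text[i]):
                        cand = nxt[i + 1][r]
                        if cand is not None and (best is None or cand < best):
                            best = cand
            nxt[i][q] = best
    return nxt
```

The scan does a bounded amount of work per position and per transition, so it is linear in T for a fixed automaton. For instance, with the hypothetical transitions `[(1, "a", 1), (1, "b", 2), (3, ANY, 3)]` and `marker_sources = {2}`, state 1 at position 0 of `"aaaba"` gets jump target 4.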
Proof ingredient: flashlight search
• Issue: Which variable sets can we assign at position i?
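[Figure: transitions from position i to position i+1 labelled with the variables α, β, γ, and a binary decision tree that tests α?, β?, γ? in turn, with branches 0 and 1.]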
• Idea: Explore a decision tree on the variables (built on the fly)
• At each decision tree node, find the reachable states which have all required variables (1) and no forbidden variables (0)
→ Assumption: we don’t see the same variable twice on a path
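The decision-tree exploration can be sketched as follows. This is a minimal illustration under simplifying assumptions: the candidate variable sets are given explicitly as a list (in the talk they come from reachable states of the product DAG), and the name `flashlight` and its signature are hypothetical.

```python
def flashlight(candidates, variables):
    """Yield each distinct variable set among `candidates` exactly once."""

    def explore(k, required, alive):
        if not alive:
            return  # no candidate fits the choices made so far: prune this branch
        if k == len(variables):
            yield required  # every variable is decided and a candidate still fits
            return
        v = variables[k]
        # Branch 0: v is forbidden.
        yield from explore(k + 1, required, [c for c in alive if v not in c])
        # Branch 1: v is required.
        yield from explore(k + 1, required | {v}, [c for c in alive if v in c])

    yield from explore(0, frozenset(), list(candidates))
```

Every node that is not pruned leads to at least one output, so the work between two consecutive outputs is bounded by a polynomial in the number of variables and candidates. Duplicates in the input collapse because each output corresponds to a distinct root-to-leaf path of the tree.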
Summary and Future Work
Main Result and Future Work
Theorem: Given a sequential document spanner P and a text T, we can enumerate the matches with:
• Preprocessing $O(|P|^{\omega+1} \times |T|)$
• Delay $O(|V|^3 \times |P|^2)$
where V is the set of variables and ω is the exponent for Boolean matrix multiplication.
Extensions and future work:
• Extending the results from text to trees (PODS 2019)
• Supporting updates on the input data
• Enumerating results in a relevant order?
• Testing how well our methods perform in practice (Rémi Dupré)
Thanks for your attention!
References
Florenzano, F., Riveros, C., Ugarte, M., Vansummeren, S., and Vrgoč, D. (2018). Constant delay algorithms for regular document spanners. In PODS.