reducing the size of nfas by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_ilie.pdf ·...

29
Reducing the size of NFAs by using equivalences and preorders L UCIAN I LIE ROBERTO S OLIS -O BA S HENG Y U Department of Computer Science, University of Western Ontario N6A 5B7, London, Ontario, CANADA e-mails: ilie|solis|[email protected] 1

Upload: others

Post on 24-Feb-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

Reducing the size of NFAs by using equivalences andpreorders

LUCIAN ILIE ROBERTO SOLIS-OBA SHENG YU

Department of Computer Science, University of Western OntarioN6A 5B7, London, Ontario, CANADA

e-mails: ilie|solis|[email protected]

1

Page 2: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 2

Regular expressions� powerful tool for describing sets of words

– programming languages – Perl, Awk, Python– editors – emacs, vi– lexical analyzers – lex, flex– pattern matching tools – egrep

Applications� pattern recognition� text retrieval� computational biology� linguistics

Page 3: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 3

Regular expressions – Examples

� identifiers:

[A-Za-z]| {z }letter [A-Za-z0-9]*| {z }letters and digits� unsigned numbers:

[0-9]+| {z }digits (.[0-9]+)?| {z }optional fra tion (Eopt signz }| {

(+|-)? [0-9]+)?| {z }optional exponent� C comments: /? ... text not including “?/ ” ... ?/

/\*([ˆ*]|\*+[ˆ/*])*\*+/

Page 4: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 3

Regular expressions – Examples

� identifiers:

[A-Za-z]| {z }letter [A-Za-z0-9]*| {z }letters and digits� unsigned numbers:

[0-9]+| {z }digits (.[0-9]+)?| {z }optional fra tion (Eopt signz }| {

(+|-)? [0-9]+)?| {z }optional exponent� C comments: /? ... text not including “?/ ” ... ?/

/ ?�(A� f?g) + ?+(A� f?; / g)��?+=

Page 5: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 4

Regular expressions – Implementation

regular expression+

nondeterministic finite automaton (NFA)

+

deterministic finite automaton (DFA)

Page 6: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 4

Regular expressions – Implementation

regular expression+

nondeterministic finite automaton (NFA)

+

PROBLEM – exponential blow-up here

deterministic finite automaton (DFA)

Page 7: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 4

Regular expressions – Implementation

regular expression+

nondeterministic finite automaton (NFA)

+ SOLUTION – reduce NFA firstPROBLEM – exponential blow-up here

deterministic finite automaton (DFA)

Page 8: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 5

Reducing NFAs

� NFA state minimization problem is hard (PSPACE-complete)

– Jiang, Ravikumar, 1993

Page 9: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 5

Reducing NFAs

� NFA state minimization problem is hard (PSPACE-complete)

– Jiang, Ravikumar, 1993� we reduce NFAs and do not minimize

Page 10: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 5

Reducing NFAs

� NFA state minimization problem is hard (PSPACE-complete)

– Jiang, Ravikumar, 1993� we reduce NFAs and do not minimize� idea – state merging

Page 11: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 6

Reducing NFAs by merging states

� using equivalences – Ilie, Yu, 2002� M = (Q;A; Æ; I; F ) is an NFA� �R – coarsest equivalence relation over Q such that:(P1) �R \(F � (Q� F )) = ;(P2) 8p; q2Q;8a2A,�p �R q ) 8q02Æ(q; a); 9p02Æ(p; a); q0 �R p0�p�Rq a�! q0 ) p a�! p0�R �Rq a�! q0

Page 12: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 7

Reducing NFAs by equivalences

� �R – coarsest equivalence over Q right-invariant with respect to M� NFA reduction with �R is trivial

– merge all states in the same equivalence class� �L – defined symmetrically (using the reversed automaton)

Page 13: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 8

Reducing NFAs by equivalences – example

�R: f0gf1; 2; 3gf4g, f5gf6g, f7g�L: f0g, f1gf2g, f3gf4; 5; 6gf7g 70(a)

(c)

(b)

4, 5, 6

ac,d,e

cb4

70

e

d

b

b5

6

a

ad

c

a

2

1

c,d,eb

70

e

d

b

b5

6

4

1, 2, 3

c,d,e a1, 2, 3

cb

e 3

(a) An NFA(b) Its reduced version using �R

(c) Its reduced version using �R and �L

Page 14: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 9

Reducing NFAs by equivalences

� computing �R – Ilie, Navarro, Yu, 2004

– uses a classical algorithm of Paige, Tarjan, 1987

– automaton with n states and m transitions

– time O(m log n) and space O(m + n)� regular expression matching (experiments)

– significant reduction of NFAs (10% – 40%)

– very big reduction of resulting DFAs (by a factor of up to 10�6)– little overhead

– faster regular expression matching

– better than Glushkov property (Navarro, Raffinot, 2001)

Page 15: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 10

Reducing NFAs by equivalences

� problem – how to use �R and �L optimally� example – best reduction is not unique

– an NFA and its reduced versions using �R and �L

c

da

a,bd

ca

b c

a 44

a 1,2

4

3

0

2

0

1,3

1

3

0 2

c

c,d

b

Page 16: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 11

Reducing NFAs by merging states

� using preorders – Champarnaud, Coulon, 2003� same axioms, preorders instead of equivalences� �R – largest (w.r.t. inclusion) preorder over Q with (P1) and (P2)� �L – defined symmetrically� preorders are stronger than equivalences

– p �R q ) p �R q, q �R p– the converse need not be true� computing �R, �L – time O(mn), space O(n2)– Ilie, Navarro, Yu, 2004– Champarnaud, Coulon, 2004

Page 17: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 12

Reducing NFAs with preorders

� reduction is complicated

– after merging two states, we recompute �R and �L� Champarnaud, Coulon, 2004: p and q can be merged iff

(i) p �R q, q �R p or

(ii) p �L q, q �L p or

(iii) p �R q, p �L q

Page 18: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 12

Reducing NFAs with preorders

� reduction is complicated

– after merging two states, we recompute �R and �L� Champarnaud, Coulon, 2004: p and q can be merged iff

(i) p �R q, q �R p or

(ii) p �L q, q �L p or

(iii) p �R q, p �L q� (iii) is incorrect – Champarnaud, Coulon, 2005+

Page 19: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 12

Reducing NFAs with preorders

� reduction is complicated

– after merging two states, we recompute �R and �L� Champarnaud, Coulon, 2004: p and q can be merged iff

(i) p �R q, q �R p or

(ii) p �L q, q �L p or

(iii) p �R q, p �L q� (iii) is incorrect – Champarnaud, Coulon, 2005+� correct (iii)p �R q, p �L q, and L(p; p) = f"g (no cycle on p)

Page 20: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 13

Efficient merging of states

� equivalences – how to use �R and �L “optimally”

– “optimally” – to be defined later

– use as much information as possible from �R and �L

� preorders – how to use �R and �L “optimally”

– “optimally” – to be defined later

– use as much information as possible from �R and �L

Page 21: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 14

Optimal use of equivalences

� �R and �L define the partitions�R = fX1; X2; : : : ; Xrg�L = fXr+1;Xr+2; : : : ; Xr+sg;� reduction: X� = fX�1 ; X�2 ; : : : ; X�` g– X�i – set of states merged together– the reduced NFA has ` states� each X�i is a subset of some X�(i), 1 � �(i) � r + s� fX�(1); X�(2); : : : ; X�(`)g – set cover for Q from �R [ �L.� “optimal” use of equivalences – optimal solution for the instancehQ;�R [ �Li of the set covering problem

Page 22: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 15

Optimal use of equivalences

Theorem 1

Optimal use of equivalences (EQ-reduced NFA) can be computed inpolynomial time.

Page 23: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 15

Optimal use of equivalences

Theorem 1

Optimal use of equivalences (EQ-reduced NFA) can be computed inpolynomial time.� time O(n3=2 +m log n)� proof idea

– set covering

– minimum vertex cover in bipartite graphs

– maximum matching in bipartite graphs

– O(epv) (Hopcroft, Karp, 1973)

Page 24: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 16

Optimal use of equivalences

�L: f0gf1; 2; 3gf4g, f5gf6g, f7g2

1,2,3

4

5

6

0

7 7

0

4,5,6

3

1

associated bipartite graph�R: f0g, f1gf2g, f3gf4; 5; 6gf7g

Page 25: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 17

Optimal use of preorders

� p �=R q iff p �R q and q �R p� p �=L q iff p �L q and q �L p– �=R and �=L – equivalence relations (coarser than �R and �L)– induce the partitions �R and �L� p � q iff p �R q; p �L q and L(p; p) = f"g– � – partial order– induces a family �P = fP1; P2; : : : ; Pkg (need not be partition)– each Pi has a unique maximal element mi– p, q belong to the same set Pi iff p � mi and q � mi� “optimal” use of preorders – optimal solution for the instancehQ; �R [ �L [ �Pi of the set covering problem

Page 26: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 18

Optimal use of preorders

Theorem 2

Optimal use of preorders (PRE-reduced NFA) is NP-hard.

Page 27: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 18

Optimal use of preorders

Theorem 2

Optimal use of preorders (PRE-reduced NFA) is NP-hard.

� proof idea

– consider only the instances for which �P is a partition

– vertex covering problem on 3-partite hypergraphs

– 3-SAT

Page 28: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 19

Conclusions and further research� equivalences

– useful, easy to compute– easy to combine “optimally”– iterate our algorithm for equivalences– merge only some equivalent states (and then iterate)– stronger definition of “optimal”� preorders

– potentially more powerful than equivalences– too expensive to combine “optimally”– weaker definition of “optimal”– efficient approximation algorithms

Page 29: Reducing the size of NFAs by using equivalences and preordersstelo/cpm/cpm05/cpm05_8_2_Ilie.pdf · LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equiv alences and

LUCIAN ILIE – CPM’05 – Reducing the size of NFAs by using equivalences and preorders 20

Conclusions and further research

� experiments – compare:

– single equivalences

– combined equivalences

– preorders (random merging)

– iterated versions