Japan, June 2003
The correction of XML data
Université Paris II & LRI
Michel de Rougemont
http://www.lri.fr/~mdr
1. Approximation and Edit Distance
2. Testers and Correctors
3. Correcting regular binary trees
4. Applications to XML: practical corrector
5. Relative value of documents
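Approximation
1. Relations: $\mathrm{Dist}(R,S) = \#\{x : R(x) \neq S(x)\}$; R and S are $\varepsilon$-close if $\mathrm{Dist}(R,S) < \varepsilon \cdot n^{\mathrm{arity}(R)}$
2. Edit-distance
3. Trees: Tree-Edit-Distance, the minimum number of deletions and insertions
[Figure: a left-deletion and a left-insertion on a tree.]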
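As a small illustration of an edit distance that counts only deletions and insertions, here is a sketch for words (not for trees). The function name `indel_distance` is an assumption made for this example. It relies on the standard identity that this distance equals |u| + |v| minus twice the length of a longest common subsequence.

```python
def indel_distance(u, v):
    """Minimum number of single-symbol insertions and deletions turning u into v."""
    # prev[j] holds the LCS length of the processed prefix of u and v[:j].
    prev = [0] * (len(v) + 1)
    for x in u:
        cur = [0]
        for j, y in enumerate(v, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return len(u) + len(v) - 2 * prev[-1]
```

The tree-edit-distance of item 3 needs a more involved algorithm; this sketch only covers the word case.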
Classical Tree-Edit-Distance
Binary trees: p-Distance allows permutation.
Dist(T1,T2) = 2, p-Dist(T1,T2) = 1
$\mathrm{Dist}(T, L) = \min_{T' \in L} \mathrm{Dist}(T, T')$
[Figure: three trees over the node labels a to f, related by one deletion and one insertion.]
Approximate satisfiability
1. Satisfiability: Tree |= F
2. Approximate satisfiability: Tree $\models_\varepsilon$ F
[Figure: image on a class K of trees, showing the trees that satisfy F and the trees that are $\varepsilon$-far from F.]
Logic, testers, correctors
A Tester decides approximate satisfiability for a formula F.
A Corrector takes a tree T close to a language L and finds T’ in L close to T.
This is possible if F follows a simple logic.
Theorem. There is a linear-time corrector for regular binary trees and a constant distance.
Given a tree T that is k-close to a regular language L, we find in linear time a tree T’ in L that is c.k-close to T.
General problem: given a language L defined in some logic, find a corrector.
Theorem (implicit in Alon et al., FOCS 2000). There is a linear-time corrector for regular words and distance $\varepsilon n$.
Application to Model-Checking (LICS 2002).
Simple example
Tester for 0+ 1* 0+
000000011111110000010000: probably accepted
011110000000110111: rejected with high probability
Types of segments: [Figure: the word 0000011111000111110001100 with segments labelled 0, 01, 0, 0.]
Corrector for 0+ 1* 0+: 00000001111110000100000 *
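To make the idea of a corrector concrete, the sketch below searches for a nearest word of the language by brute force. This is not the linear-time corrector of the theorem: it is a breadth-first search over single-symbol insertions and deletions, exponential in the distance bound k. The function name `correct` and its parameters are assumptions made for this example.

```python
import re
from collections import deque

def correct(word, pattern, k, alphabet="01"):
    """Return (w, d): a word w matching pattern at minimal indel distance d <= k, or None."""
    lang = re.compile(pattern)
    seen = {word}
    queue = deque([(word, 0)])
    while queue:
        w, d = queue.popleft()
        if lang.fullmatch(w):
            return w, d          # BFS order guarantees d is minimal
        if d == k:
            continue
        # All words reachable by one deletion or one insertion.
        neighbours = [w[:i] + w[i + 1:] for i in range(len(w))]
        neighbours += [w[:i] + a + w[i:] for i in range(len(w) + 1) for a in alphabet]
        for n in neighbours:
            if n not in seen:
                seen.add(n)
                queue.append((n, d + 1))
    return None
```

On the word from the slide, `correct("00000001111110000100000", r"0+1*0+", 2)` deletes the isolated 1 and returns a word of the language at distance 1.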
Regular Trees
• Tree-automata
• Logical definability on trees
• Tree grammar
• Regular expression
Examples:
r(a,b(a,b(a,b(a,b(a,b(a,b)....)
r(a(a,b(a,b(a(a,b),b)....),b)
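Tree automata
$A = (Q, q_0, \delta, q_1)$, with transitions:
• (q0,q0) → q1
• (q0,q1) → q1
• (q1,q1) → q2
• (q1,q0) → q2
• (q2,-) → q2
• (-,q2) → q2
[Figure: runs of the automaton on binary trees, with nodes labelled by the states q0, q1 and q2.]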
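The transitions above can be run bottom-up on a binary tree. The sketch below assumes that every leaf starts in state q0, that q1 is the accepting state and that q2 is a sink; the slide does not state these roles explicitly. The encoding of trees as nested pairs (with `None` for a leaf) and the names `DELTA`, `run` and `accepts` are choices made for this example only.

```python
# Transition table taken from the slide; q2 is treated as a sink state.
DELTA = {
    ("q0", "q0"): "q1",
    ("q0", "q1"): "q1",
    ("q1", "q1"): "q2",
    ("q1", "q0"): "q2",
}
SINK = "q2"

def run(tree):
    """Return the state reached at the root; a leaf is None, an inner node is (left, right)."""
    if tree is None:
        return "q0"
    left, right = run(tree[0]), run(tree[1])
    if SINK in (left, right):
        return SINK              # rules (q2,-) -> q2 and (-,q2) -> q2
    return DELTA[(left, right)]

def accepts(tree):
    return run(tree) == "q1"
```

Under these assumptions the automaton accepts exactly the right-leaning combs, which matches the shape of the first regular-tree example, r(a,b(a,b(...))).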
Feasible and infeasible subtrees
Definition: a subtree t is feasible for L if there are subtrees (for its leaves) which reach states (q1...ql) such that the state of the root, q = t(q1...ql), can reach an accepting state (in the automaton for L).
A subtree is infeasible if it is not feasible.
[Figure: one feasible and one infeasible subtree.]
Infeasible subtrees
Fact. If $\mathrm{Distance}(T,L) \geq \varepsilon n$, then the number of infeasible subtrees of length a is O(n).
Fact. If the distance is small, there are few infeasible subtrees.
Intuition: make local corrections at the roots of the infeasible subtrees.
Structure of the corrector
Phase 1 (bottom-up): marking of the *-nodes, the roots of infeasible subtrees.
Phase 2 (top-down): recursive analysis of the *-subtrees to make the root accept.
Phase 3: local corrections.
[Figure: a tree with nodes in states q0 and q1.]
Phase 1 : bottom-up marking
Definitions:
1. A terminal *-node is the first sink node of a run.
2. A *-subtree of a node v is the subtree whose root is v, reaching leaves or *-nodes.
3. A node v is a *-node if its state is a sink node when all possible reachable states replace the *-nodes of its *-subtree.
4. Compute the size of the subtrees.
[Figure: runs with all possible reachable states (q,q’) reach a sink; the corresponding nodes are marked *.]
O(n) procedure.
Lemma 1: If Dist(T,L)<k, there are at most k *-nodes.
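A minimal sketch of the first definition only (terminal *-nodes), on the small automaton of the earlier slide. It assumes leaves start in q0 and q2 is the sink. The nested-pair tree encoding, the path representation and the name `mark_terminal_star_nodes` are hypothetical, and the full Phase 1 (definition 3, with all reachable states substituted for the *-nodes) is not implemented here.

```python
DELTA = {
    ("q0", "q0"): "q1",
    ("q0", "q1"): "q1",
    ("q1", "q1"): "q2",
    ("q1", "q0"): "q2",
}
SINK = "q2"

def mark_terminal_star_nodes(tree):
    """Return (root_state, marked); marked lists the paths of the first sink nodes.

    A leaf is None, an inner node is (left, right); a path is a tuple of 0/1 choices.
    """
    marked = []

    def visit(node, path):
        if node is None:
            return "q0"
        left = visit(node[0], path + (0,))
        right = visit(node[1], path + (1,))
        if SINK in (left, right):
            return SINK                  # sink inherited from below: not a first sink
        state = DELTA[(left, right)]
        if state == SINK:
            marked.append(path)          # first sink node on this branch
        return state

    return visit(tree, ()), marked
```

Each node is visited once, so the marking is linear in the size of the tree, in line with the O(n) claim.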
Phase 2 : top-down possible states
Let (q,q’) be a possible choice at the top *-subtree.
Let q’’ be a possible state for the *-node of the left *-subtree.
[Figure: a *-node with children in states q1 and q2; using q’’ instead of * requires a correction.]
Case analysis of the automaton
Case 1: one essentially-connected component.
Case 2: general case, many components.
Case 1: one component
Lemma: if q1, q2 and q’’ are in the same connected component C, there is a finite subtree t which can correct.
Case a: there is a transition (q,q’) to q’’ with both q and q’ in C. There is a finite tree t1 from q1 to q and a finite tree t2 from q2 to q’, and the correction is:
[Figure: the node with children q1 and q2 becomes a q’’ node with children q and q’, where t1 is inserted above q1 and t2 above q2.]
Case 1: b and c
Case b: there is a transition (q,q’) to q’’ with one of q or q’ being q0. Suppose q = q0. The correction uses t2 and cuts the left branch.
Case c: there is a transition (q0,q0) to q’’. The correction cuts both branches.
[Figure: in case b, the left branch is cut to q0 and t2 is inserted above q2; in case c, both branches are cut to q0.]
Correction rules
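| q1 | q2 | q | q’ | q’’ | Action |
|----|----|---|----|-----|--------|
|    |    | q in C | q’ in C | q’’ | Insert, Insert |
|    |    | q0 | q’ | q’’ | Cut, Insert |

[Figure: a node with children q1 and q2, with q’’ instead of *.]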
Case 2: many components
Hypothesis: q1 in Ci, q2 in Cj, q’’ in Ck.
Case a: P such that Ci < Ck and Cj < Ck. Find t1 and t2 as in case 1.a.
[Figure: as in case 1.a, the node with children q1 and q2 becomes a q’’ node with children q and q’, using t1 and t2.]
Case 2: b and c
Case b, c: P such that Ci > Ck and Cj < Ck. Find t2 and let Cp = inf(Ci,Ck). Cut the left branch until Cp.
Case d: P such that Ci > Ck and Cj > Ck. Let Cp = inf(Ci,Ck). Cut the left branch until Cp. Let Cq = inf(Cj,Ck). Cut the right branch until Cq.
[Figure: the corrections for these cases on a node with children q1 and q2.]
Correction rules
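| q1 in C1 | q2 in C2 | q in C | q’ in C’ | q’’ in C’’ | Action |
|----------|----------|--------|----------|------------|--------|
| C1 < C’’ | C2 < C’’ | C1 < C | C2 < C’ | q’’ | Insert, Insert |
| … | … | … | … | … | … |

[Figure: a node with children q1 and q2, with q’’ instead of *.]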
Analysis of the corrector
Fact 1: finitely many insertions.
Fact 2: deletions are less predictable.
Lemma: if the cut is large, then the distance must be large.
General Corrector:
1. Do the inductive marking bottom-up.
2. Apply the recursive analysis of compatible states top-down.
3. For each transition (q,q’) -> q’’, apply the correction, compute the distance, and select the rule with the smallest distance.
4. Select the * states with minimum Dist.
The procedure is O(n), exponential in k and size(Q).
General result
Theorem: If Dist(T,L) < k, the general corrector finds T’ such that Dist(T,T’) < c.k.
Proof: the number of *-nodes is less than k.
Case 1: no *-node: no correction.
Case 2: at least one *-node. Looking at all possible k-variations will correct the errors in the *-subtree and diminish the number of *-nodes.
Unranked trees: XML
Labelled trees of large degree. Structure given by a « grammar », or DTD.
Generalization of automata:
1. Unranked tree automaton
2. Tree-walking automaton
Method: code an unranked labelled tree with a binary labelled tree.
Advantage: the correction table is FINITE.
Theorem: If Dist(T,L) < k, the general corrector finds T’ such that Dist(T,T’) < c.k.
Applications to XML
DTD
<?xml version='1.0' ?>
<!ELEMENT book (chapter*,title,author)>
<!ELEMENT chapter (title,para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
<!ELEMENT author (#PCDATA)>
Binary Normal Form
l -> l1, a
l1 -> c1, t
c1 -> c, c1
c1 -> -
c -> t, p1
p1 -> p, p1
p1 -> -
a -> data
t -> data
p -> data
XML tree decomposition
XML file transformed into a binary labelled tree.
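One common way to code an unranked tree as a binary tree is the first-child / next-sibling encoding, sketched below with Python's standard `xml.etree.ElementTree`. This is an assumption for illustration: the talk derives its binary form from the DTD grammar, and the exact encoding it uses may differ. The function name `encode` and the triple representation are hypothetical.

```python
import xml.etree.ElementTree as ET

def encode(siblings):
    """Encode a list of sibling elements as (label, first_child, next_sibling), or None."""
    if not siblings:
        return None
    head, rest = siblings[0], siblings[1:]
    # Left branch: the children of head; right branch: the remaining siblings.
    return (head.tag, encode(list(head)), encode(rest))

doc = ET.fromstring(
    "<book><chapter><title/><para/></chapter><title/><author/></book>"
)
binary_tree = encode([doc])
```

Every node of the result has at most two successors, whatever the degree of the original XML node.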
XML file with errors
Corrected XML file
No ambiguities on the possible states of q’’
Immediate correction!
XML Correction rules
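| q1 | q2 | q | q’ | q’’ | Action |
|----|----|---|----|-----|--------|
| - | p1 | t | p1 | c | Insert, Link |
| … | … | - | - | - | Delete, Delete |

[Figure: a node with children q1 and q2, with q’’ instead of *.]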
Java Implementation
Parser: Xerces. Tree structure: DOM.
DTD: d -> (c,b,a) or (f,b,a); c -> (a,b,b); f -> (a,b,b,a)
[Figure: a tree with root d and children *, b, a, where the *-node has children a, b, c.]
Phase 1: look at the parent node of the *-node. Propose tags for * (c or f).
Phase 2: for each proposal, compute the distance.
*=c: distance = 1, replacing c with b.
*=f: distance = 2, replacing c with b and adding an a leaf.
Choose the first solution.
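The selection step of this example can be sketched as follows (in Python rather than the Java of the actual implementation). The sketch assumes each candidate tag has a fixed child sequence, as in the toy DTD above, and uses the usual unit-cost edit distance with insertion, deletion and replacement. The names `CONTENT`, `edit_distance` and `propose` are hypothetical.

```python
# Content models of the toy DTD on the slide, as fixed child sequences.
CONTENT = {"c": ["a", "b", "b"], "f": ["a", "b", "b", "a"]}

def edit_distance(xs, ys):
    """Unit-cost edit distance (insert, delete, replace) between two sequences."""
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y)))  # replace x by y, or keep
        prev = cur
    return prev[-1]

def propose(children, candidates):
    """Pick the candidate tag whose content model is closest to the actual children."""
    return min(candidates, key=lambda tag: edit_distance(children, CONTENT[tag]))
```

For the children a, b, c of the *-node, the distances are 1 for c and 2 for f, so `propose(["a", "b", "c"], ["c", "f"])` selects c, as on the slide.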
Relative value of documents
• Given a DTD, mark the Web documents as follows:
– Infinity if they are far
– Dist(Document,DTD) = i
• Provides a relative valued landscape. Works for boolean combinations.
• Generalize to:
– Min{ Dist(D,DTD’) : DTD’ [...] DTD }
Distance on words and trees
• On words, how can one compute:
– Dist(w,w’), a P-problem. Is it possible in less than O(n)? Yes, STOC 2003.
– Dist(w,L) and Dist(L,L’)
• Given two trees, how can one compute:
– Dist(T,T’): P on ordered trees and NP-complete on unordered trees
– p-Dist(T,T’): NP-complete
Conclusion
• Testers and Correctors
– Testers for approximate verification
– Correctors
• Trees
– Regular trees are testable
– If T is at distance less than k, then we can correct it.
• Theoretical algorithms
• Practical algorithms
Testers, Correctors and formal verification
Two different views of logical verification:
1. Formal verification. How can we check if a program satisfies a specification? Logical proof: theorem proving, model checking.
2. Design a tester for the specification (closer to practice: Windows 95 to XP!) (Blum & Kannan).
3. Combine the two approaches to approximately verify a specification (LICS 2002, Sylvain’s thesis).
Testers
Self-testers and correctors for Linear Algebra: Blum & Kannan, 1985s
Testers for graph properties, k-colorability: Goldreich et al., 1995s
Sigma 2 graph properties have testers: Alon et al., 1999
Regular languages have testers: Alon et al., 2000s
Testers for regular tree languages (Mdr and Magniez)
Corrector for regular trees!
[Figure: the region satisfying F and the regions far from F.]
Blum’s Checker and Tester
Checker for f (Blum, Kannan, ~1990)
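[Figure: the checker C receives the input x and the output y of the program P, and answers Correct or Buggy.]
A checker is a probabilistic program C with an oracle P such that for all x, k:
– if P = f, then C(x,k) = Correct
– if P(x) != f(x), then Prob[C(x,k) = Buggy] > 1 - (1/2)^k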
Self-testing
• Distance: d(f,g) = |{x : f(x) != g(x)}| / |D|
• A self-tester for f is a probabilistic program T(P, $\varepsilon$) such that:
– If d(P,f) = 0, then T(P, $\varepsilon$) = Correct
– If d(P,f) > $\varepsilon$, then T(P, $\varepsilon$) = Buggy
• Corrector. Division(x,y): Majority { x.r / y.r : r random }.
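The division corrector can be sketched as follows: since (x.r)/(y.r) = x/y, a program that is wrong on few inputs can be corrected by calling it on randomly scaled inputs and taking the majority answer. The buggy program `buggy_divide`, the function name `corrected_divide`, the number of trials and the range of r are all assumptions made for this illustration.

```python
import random
from collections import Counter
from fractions import Fraction

def buggy_divide(x, y):
    """Hypothetical program P: exact division, wrong on a single input pair."""
    if (x, y) == (6, 3):
        return Fraction(0)
    return Fraction(x, y)

def corrected_divide(program, x, y, trials=15, rng=None):
    """Majority vote of program(x*r, y*r) over random multipliers r."""
    rng = rng or random.Random(0)
    votes = Counter(
        program(x * r, y * r)
        for r in (rng.randint(2, 10**6) for _ in range(trials))
    )
    return votes.most_common(1)[0][0]
```

Here `buggy_divide(6, 3)` returns 0, while `corrected_divide(buggy_divide, 6, 3)` returns 2, because none of the scaled inputs hits the faulty pair.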
Property testing on graphs
2-Colorability (G bipartite)
K is the set of graphs G = (D,E).
H is a random subgraph of G.
G bipartite $\Rightarrow$ Prob[H is bipartite] = 1
G is $\varepsilon$-far from bipartite $\Rightarrow$ Prob[H is non-bipartite] > 2/3
Property testing on graphs
3-Colorability
H is a random subgraph of G.
G 3-colorable $\Rightarrow$ Prob[H is 3-colorable] = 1
G is $\varepsilon$-far from 3-colorable $\Rightarrow$ Prob[H is non 3-colorable] > 2/3
Generalization to k-colorability.
Property testing and descriptive complexity
• Which graph (and matrix) properties have testers?
– Alon et al., STOC 99: Sigma 2 testers
• Compression.
[Figure: two questions, one on U and one on V, marked as $\varepsilon$-equivalent.]
Property testing on words
F : 0*1*
K is the set of words W = (D, U, ...).
H is a random subword of the word W.
W |= F $\Rightarrow$ Prob[H |= F’] = 1
W is $\varepsilon$-far from F $\Rightarrow$ Prob[H |= not F’] > 2/3
A testable regular property
How can we verify F : 0*1* ?
distance(w,w’) = Hamming distance
H is a random subword of the word W, for example 000011110111 ....., on which F’ is checked.
W |= F $\Rightarrow$ Prob[H |= F’] = 1
W is $\varepsilon$-far from F $\Rightarrow$ Prob[H |= not F’] > 2/3
Many 10 patterns appear in W. Repeating the test will detect it with high probability.
Regular properties are testable
Theorem. Regular languages are testable.
N. Alon, M. Krivelevich, I. Newman, M. Szegedy, FOCS 99.
General idea: if a word is far from a regular language, it contains many subwords which are infeasible and can be detected.
Theorem. Dyck languages are not testable.