xml research for formal language theorists · 2019. 9. 21. · xml schemas ˇtree automata xpath...
TRANSCRIPT
![Page 1: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/1.jpg)
XML Research for Formal Language Theorists
Wim Martens
TU Dortmund
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 1 / 65
![Page 2: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/2.jpg)
Goal of this talk
XML Research vs Formal Languages
XML benefits from Formal Language Theory
XML schemas ≈ tree automataXPath patterns ≈ regular expressionsFormal Language Theory has a nice algorithmic toolbox
Formal Language Theory benefits from XML
XML motivates interesting Formal Language problems
Warning
Rather informal strongly biased survey
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 2 / 65
![Page 3: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/3.jpg)
Goal of this talk
XML Research vs Formal Languages
XML benefits from Formal Language Theory
XML schemas ≈ tree automataXPath patterns ≈ regular expressionsFormal Language Theory has a nice algorithmic toolbox
Formal Language Theory benefits from XML
XML motivates interesting Formal Language problems
Warning
Rather informal strongly biased survey
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 2 / 65
![Page 4: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/4.jpg)
Goal of this talk
XML Research vs Formal Languages
XML benefits from Formal Language Theory
XML schemas ≈ tree automataXPath patterns ≈ regular expressionsFormal Language Theory has a nice algorithmic toolbox
Formal Language Theory benefits from XML
XML motivates interesting Formal Language problems
Warning
Rather informal strongly biased survey
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 2 / 65
![Page 5: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/5.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 3 / 65
![Page 6: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/6.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 4 / 65
![Page 7: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/7.jpg)
Searching the Internet
Enough with these sissy keyword searches!
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 5 / 65
![Page 8: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/8.jpg)
Searching the Internet
A real search
Where can I buy a flatscreen-TV, in a store at most 20km from Dresden,that is open tomorrow until 18:00?
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 5 / 65
![Page 9: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/9.jpg)
An Example
Internet
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 6 / 65
![Page 10: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/10.jpg)
An Example
<time>
XML
</time>
1800
<time><time>
<time>
1800
</time></time>
</time>
XMLXML
XML1200
1700
Internet
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 6 / 65
![Page 11: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/11.jpg)
An Example
<time>
XML
</time>
1800
<time><time>
<time>
1800
</time></time>
</time>
XMLXML
XML1200
1700
Internet
XML Schema
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 6 / 65
![Page 12: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/12.jpg)
A self-describing data format
<store>
<normal>
<guitar type="electric">
<maker> Tandler </maker>
<price> 3500 </price>
</guitar>
<guitar type="electric">
<maker> Fender </maker>
<price> 1000 </price>
</guitar>
</normal>
<discount>
<guitar type="electric">
<maker> Gibson </maker>
<price> 2500 </price>
<discount> 10% </discount>
</guitar>
</discount>
</store>
element: <title>...</title>
start tag: <title>
end tag: </title>
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 7 / 65
![Page 13: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/13.jpg)
XML as a hierarchical structure
Example
store
normal
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Fender”
price
“1000”
discount
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
(type=“electric”) (type=“electric”) (type=“electric”)
Abstraction: ordered, unranked, labeled tree (with data-values)
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 8 / 65
![Page 14: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/14.jpg)
XML schema languages
Schema
A schema defines the set of allowable labels and the way they can bestructured.
Advantages
automatic validation
automatic integration of data
automatic translation
query optimization
provides a user with a concrete semantics of the document
aids in the specification of meaningful queries over XML data
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 9 / 65
![Page 15: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/15.jpg)
XML schema languages
In formal language theoretic terms
A schema defines a tree language.
Example
DTDs (W3C) CFGs with REs
XML Schema (W3C) 6≈ tree automata
Relax NG (Clark, Murata) ≈ tree automata
several dozen others (DSD, Schematron, . . . )
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 10 / 65
![Page 16: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/16.jpg)
Summary slide
What to remember?
XML is an international standard for data exchange
XML documents or XML data are simply ordered unranked labeledtrees with data values
a schema defines a tree language (no data values — in this talk)
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 11 / 65
![Page 17: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/17.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 12 / 65
![Page 18: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/18.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 13 / 65
![Page 19: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/19.jpg)
Document Type Definitions (DTDs)
Example<!DOCTYPE store [
<!ELEMENT store (normal,discount)>
<!ELEMENT normal (guitar*)>
<!ELEMENT discount (guitar+)>
<!ELEMENT guitar (maker,price,discount?)>
<!ELEMENT maker (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT discount (#PCDATA)>
]>
Corresponding grammar (start symbol store)
store → normal discountnormal → guitar∗
discount → guitar+
guitar → maker price discount?maker → DATAprice → DATAdiscount → DATA
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 14 / 65
![Page 20: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/20.jpg)
Document Type Definitions (DTDs)
XML Document
store
normal
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Fender”
price
“1000”
discount
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
Corresponding grammar (start symbol store)
store → normal discountnormal → guitar∗
discount → guitar+
guitar → maker price discount?maker → DATAprice → DATAdiscount → DATA
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 15 / 65
![Page 21: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/21.jpg)
Extended Context-free grammars as a formal abstraction
Definition
A DTD is a triple (Σ,d ,sd) where
Σ is a finite alphabet
sd ∈ Σ is the start symbol
d : Σ→ RE(Σ) maps every Σ-symbol to a regular expression over Σ
Definition
A tree t satisfies d (is valid) iff
the root of t is labeled sd
for every node v labeled a the string formed by the children of vbelongs to d(a).
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 16 / 65
![Page 22: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/22.jpg)
Optimization questions: from FLT to XML
Schema containment (⊆)
Given: Schemas d1, d2
Question: Is L(d1)⊆ L(d2)?
DTD containment reduces to containment of regular expressions
d1 ⊆ d2 iff d1(a)⊆ d2(a), ∀a ∈ Σ
(when d1 and d2 are reduced).
Theorem (Meyer, Stockmeyer, 1973)
Containment of regular expressions is PSPACE-complete.
Corollary
DTD containment is PSPACE-complete.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 17 / 65
![Page 23: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/23.jpg)
Optimization questions: from FLT to XML
Schema containment (⊆)
Given: Schemas d1, d2
Question: Is L(d1)⊆ L(d2)?
DTD containment reduces to containment of regular expressions
d1 ⊆ d2 iff d1(a)⊆ d2(a), ∀a ∈ Σ
(when d1 and d2 are reduced).
Theorem (Meyer, Stockmeyer, 1973)
Containment of regular expressions is PSPACE-complete.
Corollary
DTD containment is PSPACE-complete.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 17 / 65
![Page 24: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/24.jpg)
Optimization questions: from FLT to XML
Schema containment (⊆)
Given: Schemas d1, d2
Question: Is L(d1)⊆ L(d2)?
DTD containment reduces to containment of regular expressions
d1 ⊆ d2 iff d1(a)⊆ d2(a), ∀a ∈ Σ
(when d1 and d2 are reduced).
Theorem (Meyer, Stockmeyer, 1973)
Containment of regular expressions is PSPACE-complete.
Corollary
DTD containment is PSPACE-complete.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 17 / 65
![Page 25: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/25.jpg)
Optimization questions: from FLT to XML
Schema containment (⊆)
Given: Schemas d1, d2
Question: Is L(d1)⊆ L(d2)?
DTD containment reduces to containment of regular expressions
d1 ⊆ d2 iff d1(a)⊆ d2(a), ∀a ∈ Σ
(when d1 and d2 are reduced).
Theorem (Meyer, Stockmeyer, 1973)
Containment of regular expressions is PSPACE-complete.
Corollary
DTD containment is PSPACE-complete.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 17 / 65
![Page 26: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/26.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 18 / 65
![Page 27: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/27.jpg)
Queries for XMLConjunctive Queries over Trees
XPath
Tree: Pattern:
a
b
e
d
c
d
a
b
d
c
d
Pattern Matching
Tree matches Pattern if there is a homomorphism h : Pattern → Tree
Homomorphism doesn’t have to be injective
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 19 / 65
![Page 28: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/28.jpg)
Queries for XMLConjunctive Queries over Trees
XPath
Tree: Pattern:
a
b
e
c
d
a
b
d
c
d
Pattern Matching
Tree matches Pattern if there is a homomorphism h : Pattern → Tree
Homomorphism doesn’t have to be injective
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 19 / 65
![Page 29: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/29.jpg)
Queries for XMLConjunctive Queries over Trees
Conjunctive Queries over Trees
Tree: Pattern:
a
b
e
d
c
d
a
b
d
c
d
Pattern Matching
Tree matches Pattern if there is a homomorphism h : Pattern → Tree
Homomorphism doesn’t have to be injective
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 19 / 65
![Page 30: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/30.jpg)
Queries for XMLConjunctive Queries over Trees
Conjunctive Queries over Trees
Tree: Pattern:
a
b
e
d
c
d
a
b c
d
Pattern Matching
Tree matches Pattern if there is a homomorphism h : Pattern → Tree
Homomorphism doesn’t have to be injective
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 19 / 65
![Page 31: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/31.jpg)
Queries for XMLConjunctive Queries over Trees
Conjunctive Queries over Trees
Tree: Pattern:
a
b
e
d
c
d
a
b c
d
Pattern Matching
Tree matches Pattern if there is a homomorphism h : Pattern → Tree
Homomorphism doesn’t have to be injective
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 19 / 65
![Page 32: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/32.jpg)
Queries for XMLConjunctive Queries over Trees
Conjunctive Queries over Trees
Tree: Pattern:
a
b
e
c
d
a
b c
d
Pattern Matching
Tree matches Pattern if there is a homomorphism h : Pattern → Tree
Homomorphism doesn’t have to be injective
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 19 / 65
![Page 33: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/33.jpg)
Query Optimization
L(Q): the set of trees that match query Q
Query Containment
Given two queries Q1 and Q2, is L(Q1)⊆ L(Q2)?
Query Containment w.r.t. a DTD
Given Q1, Q2, and a DTD d , is L(Q1)∩L(d)⊆ L(Q2)?
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 20 / 65
![Page 34: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/34.jpg)
XPath Query OptimizationFormal Language Theory to the Rescue!
XPath Query
a
b
d
c
d
Lemma
For each XPath query Q there is an Alternating Tree Automaton A s.t.
L(Q) = L(A)
Moreover, |A| is polynomial in |Q|, even if Q uses disjunction and negation
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 21 / 65
![Page 35: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/35.jpg)
XPath Query OptimizationFormal Language Theory to the Rescue!
XPath Query
a
b
d
c
d
Lemma
For each XPath query Q there is an Alternating Tree Automaton A s.t.
L(Q) = L(A)
Moreover, |A| is polynomial in |Q|
, even if Q uses disjunction and negation
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 21 / 65
![Page 36: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/36.jpg)
XPath Query OptimizationFormal Language Theory to the Rescue!
XPath Query
a
b
d
c
d
Lemma
For each XPath query Q there is an Alternating Tree Automaton A s.t.
L(Q) = L(A)
Moreover, |A| is polynomial in |Q|, even if Q uses disjunction and negation
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 21 / 65
![Page 37: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/37.jpg)
XPath Query OptimizationFormal Language Theory to the Rescue!
Lemma
For each XPath query Q there is an Alternating Tree Automaton A s.t.
L(Q) = L(A)
Moreover, |A| is polynomial in |Q|, even if Q uses disjunction and negation
Theorem
XPath Containment is in EXPTIME
XPath Containment w.r.t. DTDs is in EXPTIME
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 22 / 65
![Page 38: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/38.jpg)
XPath Query OptimizationFormal Language Theory to the Rescue!
Lemma
For each XPath query Q there is an Alternating Tree Automaton A s.t.
L(Q) = L(A)
Moreover, |A| is polynomial in |Q|, even if Q uses disjunction and negation
Theorem
XPath Containment (tree pattern fragment) is NP-complete [Miklau, Suciu 2002]
XPath Containment (with ¬ and ∨) is EXPTIME-complete [Marx 2004]
XPath Containment w.r.t. DTDs is EXPTIME-complete [Neven, Schwentick 2003]
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 22 / 65
![Page 39: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/39.jpg)
Conjunctive Query OptimizationFormal Language Theory to the Rescue!
Conjunctive Query
a
b c
d
Lemma (Bjorklund, Mar., Schwentick 2008)
For each Conjunctive Query Q there is an Alternating Tree Automaton As.t.
L(Q) = L(A)
But, |A| is exponential in |Q| and this is optimal
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 23 / 65
![Page 40: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/40.jpg)
Conjunctive Query OptimizationFormal Language Theory to the Rescue!
Conjunctive Query
a
b c
d
Lemma (Bjorklund, Mar., Schwentick 2008)
For each Conjunctive Query Q there is an Alternating Tree Automaton As.t.
L(Q) = L(A)
But, |A| is exponential in |Q|
and this is optimal
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 23 / 65
![Page 41: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/41.jpg)
Conjunctive Query OptimizationFormal Language Theory to the Rescue!
Conjunctive Query
a
b c
d
Lemma (Bjorklund, Mar., Schwentick 2008)
For each Conjunctive Query Q there is an Alternating Tree Automaton As.t.
L(Q) = L(A)
But, |A| is exponential in |Q| and this is optimal
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 23 / 65
![Page 42: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/42.jpg)
Conjunctive Query OptimizationFormal Language Theory to the Rescue!
Lemma (Bjorklund, Mar., Schwentick 2008)
For each Conjunctive Query Q there is an Alternating Tree Automaton As.t.
L(Q) = L(A)
But, |A| is exponential in |Q| and this is optimal
Theorem
CQ Containment is ΠP2 -complete [Bjorklund, Mar., Schwentick 2007]
CQ Containment w.r.t. DTDs is 2EXPTIME-complete[Bjorklund, Mar., Schwentick 2008]
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 24 / 65
![Page 43: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/43.jpg)
Conjunctive Query OptimizationFormal Language Theory to the Rescue!
Lemma (Bjorklund, Mar., Schwentick 2008)
For each Conjunctive Query Q there is an Alternating Tree Automaton As.t.
L(Q) = L(A)
But, |A| is exponential in |Q| and this is optimal
Theorem
CQ Containment is ΠP2 -complete [Bjorklund, Mar., Schwentick 2007]
CQ Containment w.r.t. DTDs is 2EXPTIME-complete[Bjorklund, Mar., Schwentick 2008]
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 24 / 65
![Page 44: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/44.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 25 / 65
![Page 45: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/45.jpg)
Extended DTDsGrammar based approach to unranked regular tree languages
Typed
tree t
′
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Fender”
price
“1000”
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
1 1 2
Example
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 26 / 65
![Page 46: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/46.jpg)
Extended DTDsGrammar based approach to unranked regular tree languages
Typed tree t ′
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Fender”
price
“1000”
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
1 1 2
Example
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 26 / 65
![Page 47: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/47.jpg)
Extended DTDsGrammar based approach to unranked regular tree languages
Definition (Papakonstantinou, Vianu, 2000)
Let ΣN := σn | σ ∈ Σ,n ∈ N be the alphabet of types.
An extended DTD (EDTD) is a tuple D = (Σ,d ,sd), where (d ,sd) is a(finite) DTD over Σ∪ΣN.
A tree t is valid w.r.t. D if there is an assignment of types such that thetyped tree is a derivation tree of d .
Example
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 27 / 65
![Page 48: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/48.jpg)
EDTDs versus Tree Automata
Theorem (Papakonstantinou, Vianu, 2000, BMW)
Non-deterministic (unranked) tree automata and EDTDs define preciselythe class of (homogeneous) regular unranked tree languages.
Example
EDTD
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
NTA
δ (store,store) = (guitar1)∗ (guitar2)+
δ (guitar1,guitar) = maker priceδ (guitar2,guitar) = maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 28 / 65
![Page 49: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/49.jpg)
EDTDs versus Tree Automata
Theorem (Papakonstantinou, Vianu, 2000, BMW)
Non-deterministic (unranked) tree automata and EDTDs define preciselythe class of (homogeneous) regular unranked tree languages.
Example
EDTD
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
NTA
δ (store,store) = (guitar1)∗ (guitar2)+
δ (guitar1,guitar) = maker priceδ (guitar2,guitar) = maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 28 / 65
![Page 50: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/50.jpg)
Does XML Schema correspond to EDTDs?
<xs:element name="store"><xs:complexType><xs:sequence><xs:element name="guitar" type="1"
minOccurs="0"maxOccurs="unbounded"/>
<xs:element name="guitar" type="2"minOccurs="1"maxOccurs="unbounded"/>
</xs:sequence></xs:complexType>
</xs:element>
Rejected by XML Schema validator
Violates the Element Declarations Consistent Constraint.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 29 / 65
![Page 51: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/51.jpg)
Does XML Schema correspond to EDTDs?
<xs:element name="store"><xs:complexType><xs:sequence><xs:element name="guitar" type="1"
minOccurs="0"maxOccurs="unbounded"/>
<xs:element name="guitar" type="2"minOccurs="1"maxOccurs="unbounded"/>
</xs:sequence></xs:complexType>
</xs:element>
Rejected by XML Schema validator
Violates the Element Declarations Consistent Constraint.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 29 / 65
![Page 52: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/52.jpg)
A formalization of XML Schema: single-type EDTDs
XML Schema 1: Element Declarations Consistent constraint (Section3.8.6)
It is illegal to have two elements of the same name [. . . ] but differenttypes in a content model [. . . ].
Definition (Murata, Lee, Mani, 2001)
A single-type EDTD is an EDTD for which in no regular expression twotypes bi and bj with i 6= j occur.
Not single-type
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 30 / 65
![Page 53: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/53.jpg)
A formalization of XML Schema: single-type EDTDs
XML Schema 1: Element Declarations Consistent constraint (Section3.8.6)
It is illegal to have two elements of the same name [. . . ] but differenttypes in a content model [. . . ].
Definition (Murata, Lee, Mani, 2001)
A single-type EDTD is an EDTD for which in no regular expression twotypes bi and bj with i 6= j occur.
Not single-type
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 30 / 65
![Page 54: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/54.jpg)
A formalization of XML Schema: single-type EDTDs
XML Schema 1: Element Declarations Consistent constraint (Section3.8.6)
It is illegal to have two elements of the same name [. . . ] but differenttypes in a content model [. . . ].
Definition (Murata, Lee, Mani, 2001)
A single-type EDTD is an EDTD for which in no regular expression twotypes bi and bj with i 6= j occur.
Not single-type
store → (guitar1)∗ (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 30 / 65
![Page 55: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/55.jpg)
A formalization of XML Schema: single-type EDTDs
Definition (Murata, Lee, Mani, 2001)
A single-type EDTD is an EDTD in which in no regular expression twotypes bi and bj with i 6= j occur.
Example
store → normal discountnormal → (guitar1)∗
discount → (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 31 / 65
![Page 56: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/56.jpg)
A formalization of XML Schema: single-type EDTDs
Formal abstraction
XML Schema ≈ single-type EDTDs
Immediate Questions
What kind of languages can be defined by single-type EDTDs?
Is it decidable whether an EDTD rewritten to an equivalentsingle-type EDTD?
smart XML Schema validator
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 32 / 65
![Page 57: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/57.jpg)
A formalization of XML Schema: single-type EDTDs
Formal abstraction
XML Schema ≈ single-type EDTDs
Immediate Questions
What kind of languages can be defined by single-type EDTDs?
Is it decidable whether an EDTD rewritten to an equivalentsingle-type EDTD?
smart XML Schema validator
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 32 / 65
![Page 58: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/58.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 33 / 65
![Page 59: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/59.jpg)
Properties of single-type EDTDs
Three properties
1 Single-type EDTDs admit unique top-down typing
2 Closure under a certain form of subtree exchange
3 Characterization as a pattern-based language
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 34 / 65
![Page 60: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/60.jpg)
(1) Single-type EDTDs: simple top-down typing
store
normal
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Fender”
price
“1000”
discount
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
1 1 2
Example
store → normal discountnormal → (guitar1)∗
discount → (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 35 / 65
![Page 61: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/61.jpg)
(1) Single-type EDTDs: simple top-down typing
store
normal
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Fender”
price
“1000”
discount
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
1 1
2
Example
store → normal discountnormal → (guitar1)∗
discount → (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 35 / 65
![Page 62: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/62.jpg)
(1) Single-type EDTDs: simple top-down typing
store
normal
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Fender”
price
“1000”
discount
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
1 1 2
Example
store → normal discountnormal → (guitar1)∗
discount → (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 35 / 65
![Page 63: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/63.jpg)
(1) Single-type EDTDs: simple top-down typing
Algorithm to validate and type a tree (Murata et al., 2001)
Given: tree t and single-type EDTD D = (Σ,d ,a0)
1 Check if root of t is labeled with a, assign type a0
2 for every interior node u with type bi , test whether the children of umatch µ(d(bi )). If so, assign unique type to every child. Else fail.
µ(a1 + b1c2) = a + bc
Corollary
Single-typedness implies unique top-down typing.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 36 / 65
![Page 64: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/64.jpg)
(1) Single-type EDTDs: simple top-down typing
Algorithm to validate and type a tree (Murata et al., 2001)
Given: tree t and single-type EDTD D = (Σ,d ,a0)
1 Check if root of t is labeled with a, assign type a0
2 for every interior node u with type bi , test whether the children of umatch µ(d(bi )). If so, assign unique type to every child. Else fail.
µ(a1 + b1c2) = a + bc
Corollary
Single-typedness implies unique top-down typing.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 36 / 65
![Page 65: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/65.jpg)
(2) An exchange property of single-type EDTDs
The Ancestor-String
a
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 37 / 65
![Page 66: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/66.jpg)
(2) An exchange property for single-type EDTDs
Ancestor-Guarded Subtree Exchange
T is a regular tree language
∈ T
∈ T
⇒
∈ T
∈ T
Theorem (Mar., Neven, Schwentick 2005)
A regular tree language is definable by a single-type EDTD iff it is closedunder ancestor-guarded subtree exchange.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 38 / 65
![Page 67: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/67.jpg)
(2) Tool for proving inexpressibility
“At least one discount guitar” is not single-type
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
store
guitar
maker
“Fender”
price
“1000”
discount
“10%”
guitar
maker
“Gibson”
price
“2500”
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Gibson”
price
“2500”
Single-type EDTDs are not closed under union or complement.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 39 / 65
![Page 68: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/68.jpg)
(2) Tool for proving inexpressibility
“At least one discount guitar” is not single-type
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
store
guitar
maker
“Fender”
price
“1000”
discount
“10%”
guitar
maker
“Gibson”
price
“2500”
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Gibson”
price
“2500”
Single-type EDTDs are not closed under union or complement.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 39 / 65
![Page 69: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/69.jpg)
(2) Tool for proving inexpressibility
“At least one discount guitar” is not single-type
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Gibson”
price
“2500”
discount
“10%”
store
guitar
maker
“Fender”
price
“1000”
discount
“10%”
guitar
maker
“Gibson”
price
“2500”
store
guitar
maker
“Tandler”
price
“3500”
guitar
maker
“Gibson”
price
“2500”
Single-type EDTDs are not closed under union or complement.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 39 / 65
![Page 70: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/70.jpg)
(3) Pattern-based LanguageMaking dependencies explicit
Definition
An ancestor-based DTD A is a set of rules r → s where r and s are regularexpressions over Σ.
∈ L(r)
a
∈ L(s)
Definition
A tree t is valid w.r.t. A iff for every vertex v there is some r → s suchthat v ’s ancestor string matches r and the children of v match s.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 40 / 65
![Page 71: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/71.jpg)
(3) Pattern-based LanguageMaking dependencies explicit
single-type EDTD
store → normal discountnormal → (guitar1)∗
discount → (guitar2)+
guitar1 → maker priceguitar2 → maker price discount
Ancestor-guarded DTD
store → normal discountnormal → guitar∗
discount → guitar+
∗· normal · guitar → maker price∗· discount · guitar → maker price discount
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 41 / 65
![Page 72: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/72.jpg)
Smart XML Schema validator
Theorem (Mar., Neven, Schwentick, 2005)
Deciding whether an EDTD is equivalent to a single-type EDTD or a DTDis EXPTIME-complete.
Upper bound
Compute single-type closure D ′ of given EDTD D:E.g, a1→ b1b2, b1→ c1, b2→ c2 becomes
a1→ b1,2b1,2
b1,2→ c1+ c2
L(D ′) = L(D) iff L(D) is single-type.We know that L(D)⊆ L(D ′).So, only need to test L(D ′)⊆ L(D): D ′∩¬D = /0.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 42 / 65
![Page 73: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/73.jpg)
Smart XML Schema validator
Theorem (Mar., Neven, Schwentick, 2005)
Deciding whether an EDTD is equivalent to a single-type EDTD or a DTDis EXPTIME-complete.
Upper bound
Compute single-type closure D ′ of given EDTD D:E.g, a1→ b1b2, b1→ c1, b2→ c2 becomes
a1→ b1,2b1,2
b1,2→ c1,2+ c1,2
L(D ′) = L(D) iff L(D) is single-type.We know that L(D)⊆ L(D ′).So, only need to test L(D ′)⊆ L(D): D ′∩¬D = /0.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 42 / 65
![Page 74: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/74.jpg)
Summary slide
What to remember?
XML Schema ≈ single-type EDTDs ( regular tree languages
single-type EDTDs admit top-down unique typing
XML Schema can be simply characterized without using types
Relax NG corresponds to unranked regular tree languages (EDTDs)
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 43 / 65
![Page 75: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/75.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 44 / 65
![Page 76: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/76.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 45 / 65
![Page 77: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/77.jpg)
Complexity of basic decision problems
Theorem (Mar., Neven, Schwentick 2004)
Let R be a class of regular expressions and C a complexity class. Then thefollowing are equivalent:
CONTAINMENT for R is in C ;
CONTAINMENT for DTD(R) is in C ;
CONTAINMENT for single-type EDTD(R) is in C ;
Theorem (Seidl 1990, 1994)
CONTAINMENT and EQUIVALENCE are EXPTIME-complete forEDTDs (even with deterministic REs).
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 46 / 65
![Page 78: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/78.jpg)
Complexity of basic decision problems
INTERSECTION: Given a number of schemas S1, . . . ,Sn, decide if⋂ni=1 L(Si ) 6= /0.
Theorem (Mar., Neven, Schwentick 2004)
Let R be a class of regular expressions and C a complexity class. Then thefollowing are equivalent:
INTERSECTION for R is in C ;
INTERSECTION for DTD(R) is in C .
Theorem (Mar., Neven, Schwentick 2004)
There is a class of regular expressions X such that
INTERSECTION for X is NP-complete;
INTERSECTION for single-type EDTD(X ) isEXPTIME-complete.
Remark: INTERSECTION for deterministic REs is PSPACE-complete.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 47 / 65
![Page 79: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/79.jpg)
Complexity of basic decision problems
INTERSECTION: Given a number of schemas S1, . . . ,Sn, decide if⋂ni=1 L(Si ) 6= /0.
Theorem (Mar., Neven, Schwentick 2004)
Let R be a class of regular expressions and C a complexity class. Then thefollowing are equivalent:
INTERSECTION for R is in C ;
INTERSECTION for DTD(R) is in C .
Theorem (Mar., Neven, Schwentick 2004)
There is a class of regular expressions X such that
INTERSECTION for X is NP-complete;
INTERSECTION for single-type EDTD(X ) isEXPTIME-complete.
Remark: INTERSECTION for deterministic REs is PSPACE-complete.Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 47 / 65
![Page 80: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/80.jpg)
Focus on Regular Expressions
What to remember?
Decision problems for XML Schema translate to decision problems forregular expressions.
What regular expression classes are interesting?
Regular expressions that occur in schemas!
A base symbol is a regular expression w , w?, or w∗ where w is anon-empty string;
A factor is of the form e, e?, e+, or e∗ where e is a disjunction ofbase symbols.
A CHAin Regular Expression (CHARE) is ε, /0, or a sequence f1 · · · fkof factors.
[Bex,Neven,Van den Bussche 2004]: > 90% of expressions in practicalDTDs or XSDs are CHAREs
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 48 / 65
![Page 81: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/81.jpg)
Focus on Regular Expressions
What to remember?
Decision problems for XML Schema translate to decision problems forregular expressions.
What regular expression classes are interesting?
Regular expressions that occur in schemas!
A base symbol is a regular expression w , w?, or w∗ where w is anon-empty string;
A factor is of the form e, e?, e+, or e∗ where e is a disjunction ofbase symbols.
A CHAin Regular Expression (CHARE) is ε, /0, or a sequence f1 · · · fkof factors.
[Bex,Neven,Van den Bussche 2004]: > 90% of expressions in practicalDTDs or XSDs are CHAREs
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 48 / 65
![Page 82: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/82.jpg)
Regular Expression Analysis Revisited
Fragment CONTAINMENT EQUIVALENCE INTERSECTIONa,a+ in PTIME (DFA!) in PTIME in PTIMEa,a∗ coNP in PTIME NPa,a? coNP in PTIME NP
a,(+a)∗ PSPACE in PSPACE NPall−(+w)∗,(+w)+ PSPACE in PSPACE NP
a,(+w)∗ PSPACE in PSPACE PSPACE [Bala 2002]RE PSPACE PSPACE PSPACE
Observation
Not many PTIME results. . .
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 49 / 65
![Page 83: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/83.jpg)
Regular Expression Analysis Revisited
Fragment CONTAINMENT EQUIVALENCE INTERSECTIONa,a+ in PTIME (DFA!) in PTIME in PTIMEa,a∗ coNP in PTIME NPa,a? coNP in PTIME NP
a,(+a)∗ PSPACE in PSPACE NPall−(+w)∗,(+w)+ PSPACE in PSPACE NP
a,(+w)∗ PSPACE in PSPACE PSPACE [Bala 2002]RE PSPACE PSPACE PSPACE
Observation
Not many PTIME results. . .
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 49 / 65
![Page 84: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/84.jpg)
What Regular Expressions are Allowed in Schemas?
Counting and shuffle
Numerical occurrence operator (#): (a[4,5](b + c∗)7)
shuffle operator (a&b = ab,ba)
Theorem (Mayer, Stockmeyer 1994)
CONTAINMENT and EQUIVALENCE for RE(&) isEXPSPACE-complete
Theorem (Gelade, Mar., Neven 2007)
CONTAINMENT and EQUIVALENCE is EXPSPACE-complete for
RE(#) and
RE(#,&)
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 50 / 65
![Page 85: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/85.jpg)
What Regular Expressions are Allowed in Schemas?
Counting and shuffle
Numerical occurrence operator (#): (a[4,5](b + c∗)7)
shuffle operator (a&b = ab,ba)
Theorem (Mayer, Stockmeyer 1994)
CONTAINMENT and EQUIVALENCE for RE(&) isEXPSPACE-complete
Theorem (Gelade, Mar., Neven 2007)
CONTAINMENT and EQUIVALENCE is EXPSPACE-complete for
RE(#) and
RE(#,&)
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 50 / 65
![Page 86: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/86.jpg)
On the Search for more PTIME fragments
Theorem (Ghelli, Colazzo, Sartiani 2007)
CONTAINMENT is in PTIME for conflict-free regular expressions
Conflict-free
counting and interleaving allowed!
single occurrence
Kleene star only applied to disjunctions single symbols
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 51 / 65
![Page 87: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/87.jpg)
On the Search for more PTIME fragments
Theorem (Ghelli, Colazzo, Sartiani 2007)
CONTAINMENT is in PTIME for conflict-free regular expressions
Conflict-free
counting and interleaving allowed!
single occurrence
Kleene star only applied to disjunctions single symbols
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 51 / 65
![Page 88: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/88.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 52 / 65
![Page 89: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/89.jpg)
Complementing schemas
Schema Complementation
I have a schema S which I update to S ′
What are the documents I admitted in S , but not in S ′ anymore?
This should be L(S)−L(S ′) = L(S)∩L(S ′)
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 53 / 65
![Page 90: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/90.jpg)
Complementing regular expressions
Given a regular expression r , define a regexp for L(r).
Naive approach: transform to an NFA, determinize, complement, andtransform again to a regular expression (2EXPTIME)
Lemma [Gelade and Neven 2008]
For every n, there is a regular expression r of size O(n), such that anyregular expression defining L(r) must be of size Ω(22n
)
Idea
Ehrenfeucht, Zeiger (1974): There is a class of DFAs Kn whosesmallest equivalent regular expression is at least 2n. (States =1, . . . ,n, edges between i and j labeled with ai ,j)
Generalize this theorem to four-letter alphabets
Construct r of size O(n) for K2n
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 54 / 65
![Page 91: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/91.jpg)
Complementing regular expressions
Given a regular expression r , define a regexp for L(r).
Naive approach: transform to an NFA, determinize, complement, andtransform again to a regular expression (2EXPTIME)
Lemma [Gelade and Neven 2008]
For every n, there is a regular expression r of size O(n), such that anyregular expression defining L(r) must be of size Ω(22n
)
Idea
Ehrenfeucht, Zeiger (1974): There is a class of DFAs Kn whosesmallest equivalent regular expression is at least 2n. (States =1, . . . ,n, edges between i and j labeled with ai ,j)
Generalize this theorem to four-letter alphabets
Construct r of size O(n) for K2n
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 54 / 65
![Page 92: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/92.jpg)
Complementing regular expressions
Given a regular expression r , define a regexp for L(r).
Naive approach: transform to an NFA, determinize, complement, andtransform again to a regular expression (2EXPTIME)
Lemma [Gelade and Neven 2008]
For every n, there is a regular expression r of size O(n), such that anyregular expression defining L(r) must be of size Ω(22n
)
Idea
Ehrenfeucht, Zeiger (1974): There is a class of DFAs Kn whosesmallest equivalent regular expression is at least 2n. (States =1, . . . ,n, edges between i and j labeled with ai ,j)
Generalize this theorem to four-letter alphabets
Construct r of size O(n) for K2n
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 54 / 65
![Page 93: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/93.jpg)
Outline
1 Introduction to XML
2 An FLT Approach to XML ResearchDocument Type DefinitionsXML QueriesExtended Document Type Definitions and XML SchemaCharacterizations of single-type EDTDs
3 From XML to Formal Language TheoryComplexity of Regular ExpressionsConstructions on Regular ExpressionsAutomata Minimization
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 55 / 65
![Page 94: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/94.jpg)
Schema Minimization
Schema Minimization
Given a schema D, compute the smallest equivalent schema D ′
Why relevant?
Recall: Query Optimization
Input: Queries Q1, Q2, and a schema D
Smaller schema improves the run-time of the query optimization problems!
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 56 / 65
![Page 95: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/95.jpg)
Schema Minimization
Minimization is typically studied on automata models
and the results look prettier on deterministic automata
Question
What’s the deterministic automata model for XML?
single-type EDTDs with DFAs?
≈ top-down det.
deterministic unranked tree automata?
≈ bottom-up det.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 57 / 65
![Page 96: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/96.jpg)
Schema Minimization
Minimization is typically studied on automata modelsand the results look prettier on deterministic automata
Question
What’s the deterministic automata model for XML?
single-type EDTDs with DFAs?
≈ top-down det.
deterministic unranked tree automata?
≈ bottom-up det.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 57 / 65
![Page 97: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/97.jpg)
Schema Minimization
Minimization is typically studied on automata modelsand the results look prettier on deterministic automata
Question
What’s the deterministic automata model for XML?
single-type EDTDs with DFAs?
≈ top-down det.
deterministic unranked tree automata?
≈ bottom-up det.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 57 / 65
![Page 98: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/98.jpg)
Schema Minimization
Minimization is typically studied on automata modelsand the results look prettier on deterministic automata
Question
What’s the deterministic automata model for XML?
single-type EDTDs with DFAs? ≈ top-down det.
deterministic unranked tree automata? ≈ bottom-up det.
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 57 / 65
![Page 99: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/99.jpg)
Single-type EDTD Minimization
Theorem (Mar., Niehren 2005)
Single-type EDTD with DFA Minimization is in PTIME
Minimal models are unique
Minimization Algorithm
Reduce the input single-type EDTDFor every pair of states q1, q2, decide equivalenceIf equivalent, merge q1 and q2
In the resulting EDTD, minimize each DFA
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 58 / 65
![Page 100: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/100.jpg)
Unranked Tree Automaton Minimization
(Bruggemann-Klein, Murata, Wood 2001)
A bottom-up unranked tree automaton is deterministic if for every pair ofrules a(L1)→ q1 and a(L2)→ q2,
L1∩L2 = /0
Additional requirement: L1, L2 represented by DFAs
Theorem (Mar., Niehren 2005)
MINIMIZATION is NP-complete for deterministic unranked treeautomata
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 59 / 65
![Page 101: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/101.jpg)
Unranked Tree Automaton Minimization
For the right definition of bottom-up determinism:
Theorem (Mar., Niehren 2005)
MINIMIZATION is in P for bottom-up deterministic tree automata
the Myhill-Nerode theorem for unranked tree languages holds
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 60 / 65
![Page 102: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/102.jpg)
Myhill-Nerode for Unranked Tree Automata
For tree language L, define relation ≡L on trees
Definition
t1≡L t2
if
∀E
:E
t1
∈ L iffE
t2
∈ L
≡L is an equivalence relation on unranked trees
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 61 / 65
![Page 103: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/103.jpg)
Myhill-Nerode for Unranked Tree Automata
Theorem (Myhill-Nerode for Unranked Trees (Mar., Niehren 2005))
Let L be an unranked tree language.The following are equivalent:
L is regular
≡L has finitely many equivalence classes
Moreover, the equivalence classes of ≡L correspond to states of minimal(new) bottom-up deterministic unranked TA for L
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 62 / 65
![Page 104: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/104.jpg)
Back to the BasicsNFA Minimization
Question
How much non-determinism can be admitted for PTIME minimization?
Theorem (Jiang, Ravikumar 1993)
DFA → unambiguous FA MINIMIZATION is NP-complete
Theorem (Malcher 2003)
MINIMIZATION is NP-complete for
NFAs with fixed branching (≥ 3)
NFAs with at least two start states
Question Revisited
Can there be any non-determinism at all?
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 63 / 65
![Page 105: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/105.jpg)
Back to the BasicsNFA Minimization
Question
How much non-determinism can be admitted for PTIME minimization?
Theorem (Jiang, Ravikumar 1993)
DFA → unambiguous FA MINIMIZATION is NP-complete
Theorem (Malcher 2003)
MINIMIZATION is NP-complete for
NFAs with fixed branching (≥ 3)
NFAs with at least two start states
Question Revisited
Can there be any non-determinism at all?
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 63 / 65
![Page 106: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/106.jpg)
Back to the BasicsNFA Minimization
Question
How much non-determinism can be admitted for PTIME minimization?
Theorem (Jiang, Ravikumar 1993)
DFA → unambiguous FA MINIMIZATION is NP-complete
Theorem (Malcher 2003)
MINIMIZATION is NP-complete for
NFAs with fixed branching (≥ 3)
NFAs with at least two start states
Question Revisited
Can there be any non-determinism at all?
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 63 / 65
![Page 107: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/107.jpg)
Back to the BasicsNFA Minimization
Question
How much non-determinism can be admitted for PTIME minimization?
Theorem (Jiang, Ravikumar 1993)
DFA → unambiguous FA MINIMIZATION is NP-complete
Theorem (Malcher 2003)
MINIMIZATION is NP-complete for
NFAs with fixed branching (≥ 3)
NFAs with at least two start states
Question Revisited
Can there be any non-determinism at all?
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 63 / 65
![Page 108: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/108.jpg)
Back to the BasicsFinite State Automata Minimization
Definition (δ NFA)
The class of NFAs that
have at most one pair (q,a) such that (q,a)→ q1 and (q,a)→ q2
are unambiguous
do not loop
Theorem (Bjorklund, Mar., ICALP 2008)
For every class C of NFAs such that δ NFA ⊆ C :
DFA → C MINIMIZATION is NP-hard
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 64 / 65
![Page 109: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/109.jpg)
Back to the BasicsFinite State Automata Minimization
Definition (δ NFA)
The class of NFAs that
have at most one pair (q,a) such that (q,a)→ q1 and (q,a)→ q2
are unambiguous
do not loop
Theorem (Bjorklund, Mar., ICALP 2008)
For every class C of NFAs such that δ NFA ⊆ C :
DFA → C MINIMIZATION is NP-hard
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 64 / 65
![Page 110: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/110.jpg)
Conclusion and Outlook
XML and Formal Languages are great for cross-fertilization
Many problems in XML research are solved through FLT techniques
XML research poses interesting questions for FLT
So, . . .
if you like formal language theory, but also want a PODS/ICDT paper
have a look at XML
if you like formal language theory, and you want more formallanguage theory
have a look at XML
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 65 / 65
![Page 111: XML Research for Formal Language Theorists · 2019. 9. 21. · XML schemas ˇtree automata XPath patterns ˇregular expressions Formal Language Theory has a nice algorithmic toolbox](https://reader033.vdocument.in/reader033/viewer/2022060904/609f9c2518dccd52c75bd10e/html5/thumbnails/111.jpg)
Conclusion and Outlook
XML and Formal Languages are great for cross-fertilization
Many problems in XML research are solved through FLT techniques
XML research poses interesting questions for FLT
So, . . .
if you like formal language theory, but also want a PODS/ICDT paper
have a look at XML
if you like formal language theory, and you want more formallanguage theory
have a look at XML
Wim Martens (TU Dortmund) XML for Formal Language Theorists May 14, 2008 65 / 65