University of Kaiserslautern
On the Use of Query-driven XML Auto-Indexing
Karsten Schmidt and Theo Härder
SMDB'10 (ICDE), Long Beach – March, 1
Motivation – Self-Tuning '10
● The last 10+ years
– Index tuning
   What-if
   Wizards, Guides, Druids
– Monitoring
– Workload analysis
– Static resources
– Single-user optimization
– …
● Today: dynamic environments (queries, data, resources)
– Self-tuning has to be done permanently and online
Motivation – Index Tuning
● Which column(s) to index?

ID  Name     Age  DepID  Location
1   Norman   28   12A    North
2   Charles  58   98     West

– PK?
– Query-supporting candidate?
– Multiple candidates?
– Multi-column index or multiple indexes?
– Redundancy and maintenance!
● Index selection problem (ISP): 10 tables, each having 8 columns, and max. 3 columns per index → ~4,000 indexes
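The ~4,000 figure can be checked with a little arithmetic. The sketch below assumes (the slide does not say so) that column order matters in a multi-column index, so every index is an ordered selection of 1 to 3 of a table's 8 columns. The function name is illustrative.

```python
from math import perm


def count_index_candidates(tables: int, columns: int, max_width: int) -> int:
    """Count ordered column selections of width 1..max_width, summed over all tables."""
    per_table = sum(perm(columns, width) for width in range(1, max_width + 1))
    return tables * per_table


# Per table: 8 + 8*7 + 8*7*6 = 400 candidates; 10 tables give 4000.
print(count_index_candidates(tables=10, columns=8, max_width=3))
```

Under this assumption the count is exactly 4,000, which matches the order of magnitude quoted on the slide.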
● XML specifics on top of the ISP
– Semi-structured and schema changes
– Indexing may focus on structure and/or content
– Flexible path expressions (axes, wildcard, name)
→ containment problem and generalized indexes
→ the descendant axis is evil, and so are wildcards
– Node types (elements, attributes, text, …)
Outline
● XML Index Types
● Statistics for Cost-based Index Selection
● Index Configuration Management
● Evaluation
● Summary & Outlook
XML Index Types - Storage
● XML Mapping
– Native
– No Shredding or Blob
● Fundamentals
– Node labeling
– B-tree
XML Index Types - Storage
● XML Mapping: Elementless
Sample document (each node with its DeweyID; "→" gives the text node's DeweyID and content):
1 bib
  1.3 publication
    1.3.3 book
      1.3.3.1 year → 1.3.3.1.3 "1994"
      1.3.3.2.1 id → 1.3.3.2.1.3 "1"
      1.3.3.3 title → 1.3.3.3.3 "TCP/IP"
      1.3.3.5 author
        1.3.3.5.3 last → 1.3.3.5.3.3 "Stevens"
        1.3.3.5.5 first → 1.3.3.5.5.3 "W."
      1.3.3.7 price → 1.3.3.7.3 "65.95"
    1.3.5 book . . .
    1.3.7 book . . .
– Structure: the inner nodes (bib down to year, id, title, last, first, price)
– Content: the text values (1994, 1, TCP/IP, Stevens, W., 65.95)
● Path Synopsis
– 18 numbered path classes over the names bib, publication, book, year, id, title, price, author, last, first, type, publisher, name, loc, journal, paper, title, name (e.g., bib = 1)
● Path Synopsis + B-tree layout
– Document container: entries of DeweyID, PCR (path class reference), and content
– Document index: DeweyID keys (e.g., 1.3.9, 1.3.11)
● Primary target: space efficiency
● Secondary: more indexing options for free!
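To illustrate why prefix-based node labels such as the DeweyIDs in the storage example are useful, the following sketch derives ancestor relationships and document order from the labels alone. It assumes labels are plain dot-separated integers and ignores any special component conventions of the actual labeling scheme; the function names are illustrative.

```python
from typing import List, Tuple


def parse_label(label: str) -> Tuple[int, ...]:
    """Turn '1.3.3.5' into (1, 3, 3, 5) so components compare numerically."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(ancestor: str, descendant: str) -> bool:
    """A node is an ancestor if its label is a proper prefix of the other label."""
    a, d = parse_label(ancestor), parse_label(descendant)
    return len(a) < len(d) and d[:len(a)] == a


def document_order(labels: List[str]) -> List[str]:
    """Sorting the parsed labels component-wise yields document order."""
    return sorted(labels, key=parse_label)


print(is_ancestor("1.3.3", "1.3.3.5.3.3"))
print(document_order(["1.3.3.7.3", "1.3.3.1.3", "1.3.3.5.3.3", "1.3.3.2.1.3"]))
```

Comparing parsed components rather than raw strings avoids treating "1.3" as a prefix of "1.30.1".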
XML Index Types
● Content
● Path
● Element
● Content And Structure (CAS)
Example: Path: /bib//title → PCRs: 5, 11, 33
Properties noted for the index types:
– Easy to define, generic, large, maintenance costs
– Easy to define, generic, false positive filtering, large, maintenance costs
– Specific, clustering, small, hard to define
– Specific, clustering, medium size, no document order, type support, hard to define
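The mapping from a path expression such as /bib//title to a set of PCRs can be sketched as a match of the expression against every path class of a path synopsis. The synopsis below and its numbering are hypothetical (they do not reproduce the PCRs 5, 11, 33 from the slide), and the matcher only handles child steps, "//", and the "*" wildcard.

```python
import re

# Hypothetical path synopsis: PCR -> root-to-node path (numbering is illustrative).
PATH_SYNOPSIS = {
    1: "/bib",
    2: "/bib/publication",
    3: "/bib/publication/book",
    4: "/bib/publication/book/year",
    5: "/bib/publication/book/title",
    6: "/bib/publication/book/author",
    7: "/bib/publication/book/author/last",
    8: "/bib/publication/journal",
    9: "/bib/publication/journal/title",
}


def to_regex(expr: str) -> re.Pattern:
    """Translate a simple path expression into a regex over path strings."""
    pattern = ""
    for token in re.findall(r"//|/|[^/]+", expr):
        if token == "//":
            pattern += r"/(?:[^/]+/)*"   # any number of intermediate steps
        elif token == "/":
            pattern += "/"
        elif token == "*":
            pattern += r"[^/]+"          # exactly one step with any name
        else:
            pattern += re.escape(token)
    return re.compile(pattern)


def pcr_set(expr: str) -> set:
    """Return the PCRs of all path classes matched by the expression."""
    regex = to_regex(expr)
    return {pcr for pcr, path in PATH_SYNOPSIS.items() if regex.fullmatch(path)}


print(sorted(pcr_set("/bib//title")))
```

Because the synopsis holds one entry per distinct path, the match touches a handful of entries instead of the document itself.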
XML Index Types – Usage Sample
● Simplified Query Graph Model for XMark Q1
Query:
let $auction := doc("auction.xml") return
for $b in $auction/site/people/person[@id = "person0"]
return $b/name/text()
● No indexes: 3 × document scans, 1 × document navigational access
● Besides element and content index: more than 280 path or CAS indexes possible by varying depth, axes ("/", "//"), clustering, and wildcards
Statistics
● Extended Path Synopsis
– Each node is extended with: instance counter, average (ø) content length, IUD counter
– Examples of numbered nodes: bib = 1, journal = 12
● Index Statistics
– Gathered during storage: B-tree height, # leaf pages, cardinality, index size
● Overhead
– Extended PS typically <0.1% of XML document
– No optimization yet (e.g., recursion)
– Processing overhead less than 5-6%
Statistics
● Cost Estimation for Index Candidates
1. Evaluate index expression on path synopsis → PCR set
2. For each PCR, add the node's statistics (cardinality, width, IUD counter)
3. Index type-dependent estimation of height, no. of leaves, and size
● Estimation Accuracy
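The estimation steps for an index candidate can be sketched as follows, starting from a PCR set that has already been derived. Everything beyond the three statistics named on the slide is an assumption made for illustration: the page size, fill factor, label size, fan-out, and the `with_content` flag that stands in for the index type-dependent part.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class NodeStats:
    cardinality: int    # instance counter of the path class
    avg_width: float    # average content length in bytes
    iud_counter: int    # inserts/updates/deletes observed


def estimate_index(pcrs: Iterable[int], stats: Dict[int, NodeStats],
                   with_content: bool = True, page_bytes: int = 8192,
                   fill: float = 0.7, label_bytes: int = 12,
                   fanout: int = 200) -> Dict[str, int]:
    """Estimate leaves, height, and size of a B-tree over the given path classes."""
    pcrs = list(pcrs)
    # Step 2: add up the per-node statistics of all matched path classes.
    entries = sum(stats[p].cardinality for p in pcrs)
    updates = sum(stats[p].iud_counter for p in pcrs)
    total_bytes = sum(
        stats[p].cardinality
        * (label_bytes + (stats[p].avg_width if with_content else 0))
        for p in pcrs
    )
    # Step 3: assumed B-tree model; content keys are stored only if requested.
    leaves = max(1, math.ceil(total_bytes / (page_bytes * fill)))
    height, level = 1, leaves
    while level > 1:                 # add inner levels until one root page remains
        level = math.ceil(level / fanout)
        height += 1
    return {"entries": entries, "updates": updates, "leaves": leaves,
            "height": height, "size_bytes": leaves * page_bytes}


stats = {5: NodeStats(10000, 20.0, 0), 9: NodeStats(5000, 30.0, 10)}
print(estimate_index({5, 9}, stats))
```

The IUD counter is only summed here; a fuller model would presumably use it to estimate maintenance costs.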
Index Configuration Management
● Integrated into XTC backend
● Asynchronous jobs for
– Candidate search
– Index building
● Flexible configuration
– Type support
– Pruning thresholds
– Size limits
– Job schedule
Index Configuration Management
1. Record query processing costs
2. If cost > threshold do Auto Indexing
3. Feedback Auto Indexing costs
Auto Indexing
1. Traverse query plan bottom-up (access operators)
2. Generate index candidates
3. Rerun optimization including candidates
4. Analyze plan(s) for selected candidates
5. Update candidate set
Cost/benefit calculation → schedule index materialization jobs
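The outer three-step loop (record costs, trigger auto indexing above a threshold, feed the auto-indexing costs back) could look roughly like the sketch below. The feedback policy, doubling the threshold whenever the analysis overhead exceeds a budget, is purely an assumption for illustration, as are all names.

```python
from typing import Callable, Iterable, List, Tuple


def tuning_loop(observed: Iterable[Tuple[str, float]],
                threshold: float,
                auto_index: Callable[[str], float],
                overhead_budget: float) -> Tuple[List[str], float]:
    """Run auto indexing for expensive queries and adapt the trigger threshold."""
    analyzed = []
    for query, cost in observed:              # 1. record query processing costs
        if cost > threshold:                  # 2. expensive query: auto indexing
            overhead = auto_index(query)      #    returns the cost of the analysis
            analyzed.append(query)
            if overhead > overhead_budget:    # 3. feedback (assumed policy)
                threshold *= 2                #    analyze less often from now on
    return analyzed, threshold


queries = [("q1", 5.0), ("q2", 50.0), ("q3", 30.0)]
print(tuning_loop(queries, threshold=10.0,
                  auto_index=lambda q: 3.0, overhead_budget=1.0))
```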
Candidate Generation
● Auto Indexing – Traverse (QGM representation)
Query:
let $doc := doc("sample.xml") return
for $book in $doc//book[@xtc:id="book1"]
return $book/title
● Candidates:
1. //
2. book
3. @id
4. "book1"
…
● Over 18 enumeration rules and optimization rules
Candidate Selection
Query:
let $doc := doc("sample.xml") return
for $book in $doc//book[@xtc:id="book1"]
return $book/title
Cost-Benefit Calculation
● Rank candidates by benefit
● Observe space limitation (Greedy algorithm)
● Search space pruning (merge indexes)
Containment Problem
● Comparing PCR sets, but index types and semantics may differ
Reconfigure Index Set
● Schedule materialization and deletion jobs
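A minimal sketch of the greedy step, assuming each candidate already carries an estimated benefit and size: candidates are ranked by benefit and taken as long as they fit into the space limit. The candidate names, numbers, and the `Candidate` type are made up for illustration; pruning by merging indexes is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    name: str
    benefit: float   # estimated cost saving for the workload
    size: int        # estimated index size


def select_greedy(candidates: Iterable[Candidate], space_limit: int) -> List[Candidate]:
    """Pick candidates in descending benefit order while the space limit holds."""
    chosen, used = [], 0
    for cand in sorted(candidates, key=lambda c: c.benefit, reverse=True):
        if used + cand.size <= space_limit:
            chosen.append(cand)
            used += cand.size
    return chosen


pool = [Candidate("cas_book_id", 90.0, 60),
        Candidate("path_title", 70.0, 50),
        Candidate("path_book", 40.0, 30)]
print([c.name for c in select_greedy(pool, space_limit=100)])
```

Ranking by benefit per unit of size is a common alternative; the slide only says "by benefit", so the sketch follows that wording.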
Evaluation
● Workload Documents
– XMark documents (12 MB and 112 MB)
– TPoX collection (~250 MB)
– Standard XML documents (dblp, treebank, psd, …)
● Benchmark Scenarios
– No Indexes
– Manual (element + content index)
– Manual + Self-tuning
– Self-tuning
Evaluation
(Result charts not preserved in this transcript; surviving labels: (2), (2 + 23), (23))
Evaluation
● Impact of parallel indexing
● Impact of aggressive indexing
(Charts not preserved in this transcript)
Evaluation
● Overhead / pruning effects
● Workload shifts
(Charts not preserved in this transcript; axis labels: overhead in %, % of workload)
Summary
● Self-tuning of XML indexes adds new challenges to the ISP
● The path synopsis is used for storage, indexing, and management
● Overhead of statistics and management (mostly) pays off
● Self-tuning of the tuning "frequency" and pruning are effective
Outlook
● Integrate update workload
● Analyze workload shift reaction (stability vs. effectiveness)
● Evaluate XML warehouse scenarios
● Integrate data placement decision
● Combine with buffer tuning