DB2 for Linux, Unix, and Windows
© 2009 IBM Corporation
pureXML Indexing Overview DB2 9, 9.5, and 9.7
for Linux, Unix, and Windows
Christina (Tina) Lee IBM Silicon Valley Laboratory
DB2 for Linux, Unix, and Windows
September 2009
DB2 for Linux, Unix, and Windows
pureXML Indexing Overview © 2009 IBM Corporation
Agenda
• pureXML Basics
• Regions and Paths Indexes
• Index on XML column
– DB2 9.5 Reject Invalid Values option
• Common User Errors
• Queries and XML Indexes
• Catalog Changes
• DB2 9.7
– XML indexes on Range Partitioned Tables
– Online Index Create and Online Index Reorg
– Index Compression
pureXML Basics
create table dept (deptID char(8),…, deptdoc xml);
• XML stored in a parsed hierarchical format
• No fixed XML schema per XML column required
• XML Schema validation is optional, per document
• XML indexes for specific elements/attributes
• XQuery and SQL/XML Integration
[Figure: DB2 Storage, made up of Relational Storage and pureXML Storage pages]
Storing the XML Document
XML Document Tree:
Person
  @gender = "Female"
  Name
    First
      text() = "Snow"
    Last
      text() = "White"
  Age
    @unit = "years"
    text() = "17"
CREATE TABLE T1 (docID int, XMLDoc xml);
insert into t1 values (10, xmlparse(document
'<?xml version="1.0"?>
<Person gender="Female">
  <Name>
    <First>Snow</First>
    <Last>White</Last>
  </Name>
  <Age unit="years">17</Age>
</Person>'));
[Figure: Relational Table T1 (DocID 10, XMLDoc) in the DAT object, the XML document tree in the XDA object, and the XML Regions Index in the INX object]
XML Regions and Column Path Indexes
CREATE TABLE t1 (docID int, XMLDoc1 xml, XMLDoc2 xml);
[Figure: one XML Regions Index for the table, plus one XML Column Path Index on XMLDoc1 and one on XMLDoc2. Sample path index entries (path, path id): /dept 1, /dept/emp 2, /person 3, /person/age 4]
• XML Regions Index
– System generated when first XML column created or added to table
– Nodes and subtrees in a data page form regions in a document
– Provides logical mapping of regions to retrieve document data
• XML Column Path Index
– System generated for each XML column created or added to table
– Maps unique paths to path ids for each XML column
– Used to improve performance during queries
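
As a rough illustration of "maps unique paths to path ids", the Python sketch below walks a parsed document and hands out one id per distinct path. It is a toy model only, not DB2's implementation: the function name `build_path_index` and the id numbering are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def build_path_index(xml_text, path_ids=None):
    """Assign one id to each distinct element or attribute path (toy model)."""
    path_ids = {} if path_ids is None else path_ids

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        # setdefault keeps the existing id when the path was already seen.
        path_ids.setdefault(path, len(path_ids) + 1)
        for attr in elem.attrib:
            path_ids.setdefault(path + "/@" + attr, len(path_ids) + 1)
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(xml_text), "")
    return path_ids

doc = ('<Person gender="Female"><Name><First>Snow</First><Last>White</Last></Name>'
       '<Age unit="years">17</Age></Person>')
paths = build_path_index(doc)
for path, path_id in paths.items():
    print(path_id, path)
```

Passing the same dictionary back in for a second document with the same structure adds no new entries, which is the point of storing each unique path only once.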
Index on an XML Column
create table T1 (docID int, XMLDoc xml);
create index AgeIndex on T1(XMLDoc) generate key using xmlpattern '/Person/Age' as sql varchar(10);
XQUERY for $i in db2-fn:xmlcolumn('T1.XMLDOC')/Person[Age='17'] return $i/Name
[Figure: the XMLDoc descriptor in row DocID 10 of Relational Table T1 points, through the XML Regions Index, to the XML document tree in the XDA (the Person tree from "Storing the XML Document"). The Path Index on XMLDoc maps path /Person/Age to path ID 4. AgeIndex, the Index on XML Column, holds the entry (pathID 4, Value 17).]
The Big XML Indexing Picture
[Figure: SQL Table with XML Column (Relational Column 1, Relational Column 2, XML Column), a Relational Index, XML Storage in the .XDA file, and the Catalog Path Table]
• XML Column Paths Index: Maps paths to path ids for each XML column. Subset of paths stored in global catalog path table.
• XML Regions Index: Logical mapping of regions in an XML document, used to retrieve the document data
• Index on XML Column: Created by users to improve performance during queries on XML documents
CREATE INDEX for Index on XML Column
create index AgeUnitIndex on t1(XMLDoc) generate key using xmlpattern '/Person/Age/@unit' as sql varchar(16);
• Index created on single XML column
• Composite keys not supported
• Only indexes document nodes that satisfy XML pattern
• XML index specification
– GENERATE KEY USING XMLPATTERN
– XML pattern expression
– Data type
Key Generation
• Relational index inserts one key per table row
• Index on XML Column may insert multiple keys per table row
• Multiple parts of document may satisfy XML pattern

create index AgeUnitIndex on t1(XMLDoc) generate key using xmlpattern '/Person/Age/@unit' as sql varchar(16);

XML Document:
<?xml version="1.0"?>
<Person gender="Female">
  <Name>
    <Last>White</Last>
    <First>Snow</First>
  </Name>
  <Age unit="years">17</Age>
  <Age unit="days">6322</Age>
</Person>

AgeUnitIndex (Index on XML Column):
/Person/Age/@unit   "days"
/Person/Age/@unit   "years"
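
The "multiple keys per table row" behavior can be mimicked with a small Python sketch. This is a toy model, not how DB2 generates keys: `generate_keys` is a hypothetical helper that handles only a plain child path ending in one attribute.

```python
import xml.etree.ElementTree as ET

def generate_keys(xml_text, element_path, attribute):
    """Return one (path, value) key per node matching element_path/@attribute."""
    root = ET.fromstring(xml_text)
    steps = element_path.strip("/").split("/")
    if root.tag != steps[0]:
        return []
    # ElementTree paths are relative to the root element, so drop the first step.
    rest = "/".join(steps[1:]) or "."
    return [(element_path + "/@" + attribute, elem.get(attribute))
            for elem in root.findall(rest)
            if elem.get(attribute) is not None]

doc = ('<Person gender="Female"><Name><Last>White</Last><First>Snow</First></Name>'
       '<Age unit="years">17</Age><Age unit="days">6322</Age></Person>')
keys = generate_keys(doc, "/Person/Age", "unit")
print(keys)  # two keys come from this single document (one table row)
```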
XML Documents for Examples
XML Document 1
<company name="Empire">
  <emp id="31201" salary="60000" gender="Male">
    <name><first>Darth</first><last>Vapor</last></name>
    <dept id="M25">Security</dept>
    <birthdate>1974-08-27</birthdate>
  </emp>
</company>

XML Document 2
<company name="Alliance">
  <emp id="31664" salary="60000" gender="Male">
    <name><first>Luke</first><last>Moonwalker</last></name>
    <dept id="M55">Marketing</dept>
    <birthdate>1960-07-21</birthdate>
  </emp>
  <emp id="42366" salary="50000" gender="Female">
    <name><first>Laura</first><last>Organa</last></name>
    <dept id="K55">Sales</dept>
    <birthdate>1960-07-21</birthdate>
  </emp>
</company>
XML Pattern: Path Expression Steps
• Supports subset of XQuery path expressions
• Path expression steps separated by forward slash (/)
• Double forward slash (//) is abbreviated syntax for /descendant-or-self::node()/
1. CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE
2. CREATE INDEX idindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//@id' AS SQL DOUBLE
Qualifying Paths and Nodes
• Set of nodes may qualify if single path specified
• Set of paths and nodes may qualify if wildcard, descendant axis, or descendant-or-self axis specified

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(10)
Set of nodes qualifies

empindex (Index on XML Column):
/company/emp/@id        31201
/company/emp/@id        31664
/company/emp/@id        42366

CREATE INDEX idindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//@id' AS SQL VARCHAR(10)
Set of paths and nodes qualifies

idindex (Index on XML Column):
/company/emp/@id        31201
/company/emp/@id        31664
/company/emp/@id        42366
/company/emp/dept/@id   M25
/company/emp/dept/@id   M55
/company/emp/dept/@id   K55
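
To see why '//@id' qualifies two paths while '/company/emp/@id' qualifies one, the Python sketch below lists every attribute node with its full path and filters by pattern. It is a simplified model under stated assumptions: it handles attribute nodes only, the helpers `attribute_nodes`, `pattern_to_regex`, and `index_keys` are hypothetical names, and the documents are trimmed versions of XML Document 1 and 2.

```python
import re
import xml.etree.ElementTree as ET

def attribute_nodes(xml_text):
    """Yield (path, value) for every attribute node in the document."""
    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        for name, value in elem.attrib.items():
            yield path + "/@" + name, value
        for child in elem:
            yield from visit(child, path)
    yield from visit(ET.fromstring(xml_text), "")

def pattern_to_regex(pattern):
    """'//' spans any number of steps; '*' stands for exactly one step."""
    parts = pattern.split("//")
    escaped = [re.escape(p).replace(re.escape("*"), "[^/]+") for p in parts]
    return re.compile("^" + "/(?:[^/]+/)*".join(escaped) + "$")

def index_keys(docs, pattern):
    regex = pattern_to_regex(pattern)
    return [(path, value) for doc in docs
            for path, value in attribute_nodes(doc) if regex.match(path)]

doc1 = ('<company name="Empire"><emp id="31201" salary="60000" gender="Male">'
        '<dept id="M25">Security</dept></emp></company>')
doc2 = ('<company name="Alliance"><emp id="31664" salary="60000" gender="Male">'
        '<dept id="M55">Marketing</dept></emp>'
        '<emp id="42366" salary="50000" gender="Female">'
        '<dept id="K55">Sales</dept></emp></company>')

empindex = index_keys([doc1, doc2], "/company/emp/@id")
idindex = index_keys([doc1, doc2], "//@id")
print(len(empindex), len(idindex))
```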
Specifying text()
CREATE INDEX nameix1 on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/emp/name/text()' AS SQL VARCHAR(30)

nameix1: No index entries generated. The "name" element doesn't contain text. Text is only found in its child elements, "first" and "last".

XML Document Tree:
company
  @name = "Empire"
  emp
    @id = "31201"
    @salary = "60000"
    @gender = "Male"
    name
      first
        text() = "Darth"
      last
        text() = "Vapor"
    dept
      @id = "M25"
      text() = "Security"
    birthdate
      text() = "1974-08-27"
If text() not specified
CREATE INDEX nameix2 on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/emp/name' AS SQL VARCHAR(30)

nameix2 entry: /company/emp/name   DarthVapor
Text from the "first" and "last" child elements is concatenated together.
[Figure: same XML document tree as on the "Specifying text()" slide]
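
The difference between the two patterns can be reproduced outside DB2 with Python's standard XML parser. This is only an analogy for the node semantics: `direct_text` and `string_value` are hypothetical helper names, and the document is a trimmed version of XML Document 1.

```python
import xml.etree.ElementTree as ET

def direct_text(elem):
    """Text nodes that are immediate children of elem (what /text() selects)."""
    pieces = [elem.text] + [child.tail for child in elem]
    return [p for p in pieces if p]

def string_value(elem):
    """Concatenation of all descendant text, in document order."""
    return "".join(elem.itertext())

emp = ET.fromstring('<emp><name><first>Darth</first><last>Vapor</last></name></emp>')
name = emp.find("name")

print(direct_text(name))    # nothing to index for /company/emp/name/text()
print(string_value(name))   # the single value indexed for /company/emp/name
```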
Data Types
Four SQL data types are supported
– VARCHAR
– DOUBLE
– DATE
– TIMESTAMP

CREATE INDEX empindex1 on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(10)
@id indexed as a character string such as '31201'

CREATE INDEX empindex2 on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE
@id indexed as a number such as 3.120100e+04
VARCHAR(n)
• Values longer than specified length (n) are not indexed
– Document insertion or index creation will fail
• Index can support both range scans and equality look-ups
• Trailing blanks are significant during string comparisons

CREATE INDEX empindex1 on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(10)

Page size   Maximum value for length "n" in bytes
4K          817
8K          1,841
16K         3,889
32K         7,985
VARCHAR HASHED
• Has no length limit and can index arbitrary length character strings
• System generates an 8-byte hash code over entire string
• Only used for equality look-ups and not range scans

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/name/last' AS SQL VARCHAR HASHED
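
Why a hashed index serves equality look-ups but not range scans can be sketched in Python. The hash function here (BLAKE2b truncated to 8 bytes) is an assumption chosen for illustration; the slides do not say which hash DB2 uses, and `hashed_key` and `equality_lookup` are hypothetical names.

```python
import hashlib

def hashed_key(value):
    """8-byte hash over the whole string (BLAKE2b is an assumption, not DB2's hash)."""
    return hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()

# Toy index: hash code -> row ids. Keys have a fixed size whatever the string length.
index = {}
for row_id, last_name in [(1, "Vapor"), (2, "Moonwalker"), (3, "Organa")]:
    index.setdefault(hashed_key(last_name), []).append(row_id)

def equality_lookup(value):
    """Equality works: hash the search value and probe the index."""
    return index.get(hashed_key(value), [])

print(equality_lookup("Organa"))
# A range predicate such as last > "M" cannot be answered this way,
# because the order of the hash codes is unrelated to the order of the strings.
```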
DOUBLE
All numeric values will be converted and stored in the index as the DOUBLE data type
Special numeric values (NaN, INF, -INF, +0, -0) indexed even though not supported by SQL DOUBLE data type
CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE
DATE and TIMESTAMP
• If timezone not specified, original value stored in index
• If timezone specified, DATE and TIMESTAMP data type values are normalized to UTC (Coordinated Universal Time) before storing in index
CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/birthdate' AS SQL DATE
CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/birthdate' AS SQL TIMESTAMP
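
The UTC normalization rule can be illustrated with Python's datetime module. This is a sketch of the rule as stated above, not DB2 code; `timestamp_key` is a hypothetical helper and the sample timestamps are made up for the example.

```python
from datetime import datetime, timezone

def timestamp_key(value):
    """Return the value to store: unchanged if no timezone, else shifted to UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        return parsed
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)

print(timestamp_key("2009-09-15T10:00:00"))        # no timezone: stored as given
print(timestamp_key("2009-09-15T10:00:00-07:00"))  # with timezone: stored as 17:00 UTC
```

Normalizing before storing means two values that denote the same instant in different timezones end up as the same key.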
Document Rejection or CREATE INDEX Failures
Errors causing document rejection for INSERT or UPDATE statements, and CREATE INDEX failure if table already populated:
– VARCHAR(n): Value length exceeds length constraint
– Conversion Errors: Valid XML value but can't convert to DB2's representation for the data type because of DB2 limitations

Some Examples of DB2 Limitations

XML Type      XML Schema                                              DB2 XML Range (min : max)
xs:date       No maximum limit for years.                             0001-01-01 : 9999-12-31
              Negative dates supported.
xs:dateTime   No maximum limit for years.                             0001-01-01T00:00:00.000000Z : 9999-12-31T23:59:59.999999Z
              Negative dates supported.
              Arbitrary precision supported for fractional seconds.
xs:integer    No limit on minimum or maximum range                    -9223372036854775808 : 9223372036854775807
Invalid XML Values
• For DOUBLE, DATE, and TIMESTAMP indexes
– XML values without a valid lexical form for the target index XML data type are invalid
• DB2 9 XML indexes always ignore invalid XML values
• In DB2 9.5, a new CREATE INDEX option lets invalid XML values be either rejected or ignored
DB2 9.5 Reject Invalid Values
• New REJECT INVALID VALUES option for DB2 9.5
• If XML value can't be cast to index XML data type, error returned
• If index does not exist, index is not created
• XML data not inserted or updated in the table if index exists

CREATE INDEX empindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '//@id' AS SQL DOUBLE REJECT INVALID VALUES

/company/emp/dept/@id = "M25" is not a valid DOUBLE value: CREATE INDEX Error
DB2 9.5 Ignore Invalid Values
• Invalid values for index XML data type ignored and not indexed
• No error or warning is issued
• Default option

CREATE INDEX empindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '//@id' AS SQL DOUBLE IGNORE INVALID VALUES

/company/emp/@id = "31201" is a valid DOUBLE: indexed in empindex as /company/emp/@id 3.120100e+04
/company/emp/dept/@id = "M25" is not a valid DOUBLE: not indexed
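
The contrast between ignoring and rejecting invalid values can be modelled in a few lines of Python. This is a toy sketch: the regular expression is a simplified stand-in for the xs:double lexical form, and `build_double_index` with its `reject_invalid` flag is a hypothetical name, not a DB2 interface.

```python
import re

# Simplified lexical form for xs:double: decimal with optional exponent, or INF/-INF/NaN.
DOUBLE_LEXICAL = re.compile(
    r"^\s*([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?|-?INF|NaN)\s*$")

def build_double_index(nodes, reject_invalid=False):
    """Index (path, value) pairs as DOUBLE; skip or reject values that do not cast."""
    keys = []
    for path, value in nodes:
        if DOUBLE_LEXICAL.match(value):
            keys.append((path, float(value)))
        elif reject_invalid:
            raise ValueError(f"{path} = {value!r} is not a valid DOUBLE")
        # Otherwise the value is silently left out of the index.
    return keys

nodes = [("/company/emp/@id", "31201"), ("/company/emp/dept/@id", "M25")]
print(build_double_index(nodes))  # only the numeric @id is indexed
try:
    build_double_index(nodes, reject_invalid=True)
except ValueError as err:
    print("index creation fails:", err)
```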
Unique Keyword
• Uniqueness enforced across all documents within a single XML column
• Enforced within index data type, XML path to node, and value of node after value cast to index data type

CREATE UNIQUE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/name/last' AS SQL VARCHAR(100)

XML Pattern must specify a single complete path and may not contain:
– descendant axis
– descendant-or-self axis
– //
– wildcards for XML name test
– node() or processing-instruction() for XML kind test

Not allowed, because the pattern contains //:
CREATE UNIQUE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//last' AS SQL VARCHAR(100)
Query Operators for XML
• XSCAN (XML Document Scan)
– Traverses XML document trees and may evaluate predicates and extract document values
• XISCAN (XML Index Scan)
– Performs probes and scans on XML indexes and can evaluate predicates
• XANDOR (XML Index ANDing and ORing)
– Evaluates two or more equality predicates by driving multiple XISCANs

CREATE INDEX AgeIndex on t1(XMLDoc) GENERATE KEY USING XMLPATTERN '/Person/Age' AS SQL DOUBLE;
XQUERY for $i in db2-fn:xmlcolumn('T1.XMLDOC')/Person where $i/Age = 17 return $i;
Index Eligibility
Requirements for an XML index to be used for a query:
1. Index "contains" the query predicate, i.e. the index is equally or less restrictive than the predicate
2. Query predicate matches the index data type
3. /text() is used consistently in query predicate and index definition: either both specify /text() or neither does
Even if these requirements are satisfied, the optimizer can still decide NOT to use an eligible index!
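
The three requirements can be expressed as a small Python check. This is a deliberately simplified sketch, not the optimizer's logic: it assumes the query predicate uses a fully specified path without wildcards, it reduces type matching to comparing two type names, and `pattern_to_regex` and `is_eligible` are hypothetical helpers.

```python
import re

def pattern_to_regex(pattern):
    """'//' spans any number of steps; '*' stands for exactly one step."""
    parts = pattern.split("//")
    escaped = [re.escape(p).replace(re.escape("*"), "[^/]+") for p in parts]
    return re.compile("^" + "/(?:[^/]+/)*".join(escaped) + "$")

def is_eligible(index_pattern, index_type, query_path, predicate_type):
    """Check the three listed requirements for a fully specified query path."""
    # 1. The index pattern is equally or less restrictive than the query path.
    contains = bool(pattern_to_regex(index_pattern).match(query_path))
    # 2. The predicate's data type matches the index data type.
    same_type = index_type == predicate_type
    # 3. /text() appears in both or in neither.
    text_consistent = (index_pattern.endswith("/text()")
                       == query_path.endswith("/text()"))
    return contains and same_type and text_consistent

print(is_eligible("//@salary", "DOUBLE", "/company/emp/@salary", "DOUBLE"))
print(is_eligible("//@salary", "VARCHAR", "/company/emp/@salary", "DOUBLE"))
print(is_eligible("/company/emp/name", "VARCHAR", "/company/emp/name/text()", "VARCHAR"))
```

Passing this check only makes an index a candidate; as noted above, the optimizer can still choose not to use it.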
Queries using an Index on an XML Column
Some sample queries using equality and range predicates
Return the salary of the employee with id = '42366'
CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(5)
XQUERY for $i in db2-fn:xmlcolumn('COMPANY.COMPANYDOCS')/company/emp[@id='42366'] return $i/@salary
Return the ids of the employees with salary > 50000
CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//@salary' AS SQL DOUBLE
XQUERY for $i in db2-fn:xmlcolumn('COMPANY.COMPANYDOCS')/company/emp[@salary >50000] return $i/@id
Catalog view externalizes information on the XML pattern specified for an Index on an XML Column
CREATE INDEX empindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE;
CREATE INDEX deptindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/dept/name' AS SQL VARCHAR HASHED;

SYSCAT.INDEXXMLPATTERNS
INDSCHEMA   INDNAME     DATATYPE   PATTERN
LEECM       EMPINDEX    DOUBLE     /company/emp/@id
LEECM       DEPTINDEX   VARCHAR    /company/dept/name
• Index on an XML Column has a logical and physical index
• Logical index contains XML pattern created by user
• Physical index contains index values
– DB2 system generated key columns

SYSCAT.INDEXES
INDNAME              IID   TABNAME   INDEXTYPE
SQL060414133259940   1     COMPANY   XRGN
SQL060414133300150   2     COMPANY   XPTH
NAMEIX1              3     COMPANY   XVIL
SQL060414134408390   4     COMPANY   XVIP
NAMEIX2              5     COMPANY   XVIL
SQL060414134408620   6     COMPANY   XVIP
DB2 9.7 XML Indexes on Range Partitioned Tables
• Relational indexes may be nonpartitioned or partitioned in DB2 9.7
• User-defined XML indexes may be nonpartitioned (DB2 9.7 GA) or partitioned (DB2 9.7 FP1)
• System generated XML Paths Indexes are always nonpartitioned
• System generated XML Regions Indexes are always partitioned

CREATE INDEX zipcode ON sales(customer_info)
GENERATE KEY USING XMLPATTERN
'/Customer/Address/Zipcode' AS SQL varchar(10)
NOT PARTITIONED;

CREATE INDEX zipcode ON sales(customer_info)
GENERATE KEY USING XMLPATTERN
'/Customer/Address/Zipcode' AS SQL varchar(10)
PARTITIONED;
DB2 9.7 XML Indexes on Range Partitioned Tables (continued)
[Figure: Base Table Partitions 1, 2, and 3, each with its own XDA, its own Partitioned Regions Index, and its own Partitioned Relational Index or Index on XML Column. A Non-Partitioned Relational Index or Index on XML Column and the Non-Partitioned XML Path Index cover all partitions.]
DB2 9.7 Online XML Index Create and Reorg
Insert/Update/Delete transactions no longer need to wait until the CREATE INDEX/REORG INDEXES/REORG INDEX statement completes
Results in increased throughput and faster response time for concurrent transactions.
Transaction 1: create table employee (empid integer, info XML);
Transaction 1: create index empidx on employee(info) generate key using xmlpattern '/employeeinfo/addr' as sql varchar(50);
Transaction 2 (while the create index runs): Delete from employee where empid = 31201
– Delete will not wait for the Create Index and will complete successfully
– Create Index will delete the index entry before it completes

Transaction 1: reorg indexes all for table employee allow write access
Transaction 2 (while the reorg runs): Delete from employee where empid = 31664
– Delete will not wait for the Reorg Indexes and will complete successfully
– Reorg Indexes command will delete the index entry before it completes
DB2 9.7 Index Compression
Default for relational and XML indexes enables compression if data row compression enabled
New COMPRESS keyword on CREATE/ALTER INDEX can override default behavior
–Index can be compressed even if data rows not compressed
MDC Block Indexes and XML Paths Indexes not compressed
CREATE INDEX deptindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/dept' AS SQL VARCHAR(20) COMPRESS YES
ALTER INDEX deptindex COMPRESS NO
reorg indexes all for table T1 allow write access
What Did You Learn Today?
• What the difference is between XML and relational indexes
• How to create an index on an XML column
• How to avoid common user errors
• What the requirements are for queries to use XML indexes
• How the XML indexes are defined in the catalog
• DB2 9.5 and DB2 9.7 XML index enhancements