DB2 for Linux, Unix, and Windows

pureXML Indexing Overview: DB2 9, 9.5, and 9.7 for Linux, Unix, and Windows. Christina (Tina) Lee, IBM Silicon Valley Laboratory. September 2009. © 2009 IBM Corporation


Page 1: DB2 for Linux, Unix, and Windows

© 2009 IBM Corporation

pureXML Indexing Overview DB2 9, 9.5, and 9.7

for Linux, Unix, and Windows

Christina (Tina) Lee IBM Silicon Valley Laboratory

DB2 for Linux, Unix, and Windows

September 2009

Page 2: DB2 for Linux, Unix, and Windows


Agenda

• pureXML Basics
• Regions and Paths Indexes
• Index on XML column
  – DB2 9.5 Reject Invalid Values option
• Common User Errors
• Queries and XML Indexes
• Catalog Changes
• DB2 9.7
  – XML indexes on Range Partitioned Tables
  – Online Index Create and Online Index Reorg
  – Index Compression

Page 3: DB2 for Linux, Unix, and Windows


pureXML Basics

create table dept (deptID char(8),…, deptdoc xml);

• XML stored in a parsed hierarchical format
• No fixed XML schema per XML column required
• XML Schema validation is optional, per document
• XML indexes for specific elements/attributes
• XQuery and SQL/XML Integration

[Figure: DB2 Storage, showing Relational Storage and pureXML Storage pages]

Page 4: DB2 for Linux, Unix, and Windows


Storing the XML Document

CREATE TABLE T1 (docID int, XMLDoc xml);

insert into t1 values (10, xmlparse(document
'<?xml version="1.0"?>
<Person gender="Female">
  <Name> <First>Snow</First> <Last>White</Last> </Name>
  <Age unit="years">17</Age>
</Person>'));

[Figure: XML Document Tree for the inserted document: Person (@gender "Female") with children Name (First "Snow", Last "White") and Age (@unit "years", text "17")]

[Figure: Relational Table T1 (DocID, XMLDoc; row with DocID 10), the XML Regions Index, and the DAT, INX, and XDA storage objects]

Page 5: DB2 for Linux, Unix, and Windows


XML Regions and Column Path Indexes

CREATE TABLE t1 (docID int, XMLDoc1 xml, XMLDoc2 xml);

[Figure: one XML Regions Index for the table and one XML Column Path Index per XML column (XMLDoc1, XMLDoc2); sample path-to-path-ID entries: /dept 1, /dept/emp 2, /person 3, /person/age 4]

• XML Regions Index
  – System generated when first XML column created or added to table
  – Nodes and subtrees in a data page form regions in a document
  – Provides logical mapping of regions to retrieve document data
• XML Column Path Index
  – System generated for each XML column created or added to table
  – Maps unique paths to path ids for each XML column
  – Used to improve performance during queries

Page 6: DB2 for Linux, Unix, and Windows


Index on an XML Column

create table T1 (docID int, XMLDoc xml);

create index AgeIndex on T1(XMLDoc) generate key using xmlpattern '/Person/Age' as sql varchar(10);

XQUERY for $i in db2-fn:xmlcolumn('T1.XMLDOC')/Person[Age='17'] return $i/Name

XML names are case-sensitive, so the pattern and the query use the same capitalization as the stored document (Person, Age, Name).

[Figure: Relational Table T1 (DocID 10, XMLDoc descriptor) and the XML Document Tree in the XDA, reached through the XML Regions Index. The Path Index on XMLDoc maps the path /Person/Age to path ID 4, and AgeIndex (Index on XML Column) holds the entry pathID 4, value 17.]

Page 7: DB2 for Linux, Unix, and Windows


The Big XML Indexing Picture

[Figure: SQL Table with XML Column (Relational Column 1, Relational Column 2, XML Column), together with a Relational Index, an Index on XML Column, the XML Column Paths Index, the XML Regions Index, XML Storage (.XDA file), and the Catalog Path Table]

• XML Column Paths Index: maps paths to path ids for each XML column. A subset of paths is stored in the global catalog path table.
• XML Regions Index: logical mapping of regions in an XML document, used to retrieve the document data.
• Index on XML Column: created by users to improve performance during queries on XML documents.

Page 8: DB2 for Linux, Unix, and Windows


CREATE INDEX for Index on XML Column

create index AgeUnitIndex on t1(XMLDoc) generate key using xmlpattern '/Person/Age/@unit' as sql varchar(16);

• Index created on single XML column
• Composite keys not supported
• Only indexes document nodes that satisfy XML pattern
• XML index specification
  – GENERATE KEY USING XMLPATTERN
  – XML pattern expression
  – Data type

Page 9: DB2 for Linux, Unix, and Windows


Key Generation

• Relational index inserts one key per table row
• Index on XML Column may insert multiple keys per table row
• Multiple parts of document may satisfy XML pattern

create index AgeUnitIndex on t1(XMLDoc) generate key using xmlpattern '/Person/Age/@unit' as sql varchar(16);

XML Document

<?xml version="1.0"?>
<Person gender="Female">
  <Name> <Last>White</Last> <First>Snow</First> </Name>
  <Age unit="years">17</Age>
  <Age unit="days">6322</Age>
</Person>

AgeUnitIndex (Index on XML Column)

/Person/Age/@unit   "days"
/Person/Age/@unit   "years"
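The one-row, many-keys behavior can be sketched outside DB2. The following is a minimal Python model, not DB2 code: it uses the standard xml.etree.ElementTree module, the function name generate_keys is hypothetical, and the matching is hard-coded for the single pattern /Person/Age/@unit.

```python
import xml.etree.ElementTree as ET

DOC = """<Person gender="Female">
  <Name><Last>White</Last><First>Snow</First></Name>
  <Age unit="years">17</Age>
  <Age unit="days">6322</Age>
</Person>"""


def generate_keys(xml_text):
    """Return one (path, value) key per node matching /Person/Age/@unit."""
    root = ET.fromstring(xml_text)
    if root.tag != "Person":
        return []
    keys = []
    for age in root.findall("Age"):  # direct children named Age
        unit = age.get("unit")
        if unit is not None:
            keys.append(("/Person/Age/@unit", unit))
    return keys


# One document (one table row) yields two index keys.
keys = generate_keys(DOC)
print(keys)
```

The model returns keys in document order; a real index would keep them sorted by value, as in the AgeUnitIndex listing on the slide.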

Page 10: DB2 for Linux, Unix, and Windows


XML Documents for Examples

XML Document 1

<company name="Empire">
  <emp id="31201" salary="60000" gender="Male">
    <name><first>Darth</first><last>Vapor</last></name>
    <dept id="M25">Security</dept>
    <birthdate>1974-08-27</birthdate>
  </emp>
</company>

XML Document 2

<company name="Alliance">
  <emp id="31664" salary="60000" gender="Male">
    <name><first>Luke</first><last>Moonwalker</last></name>
    <dept id="M55">Marketing</dept>
    <birthdate>1960-07-21</birthdate>
  </emp>
  <emp id="42366" salary="50000" gender="Female">
    <name><first>Laura</first><last>Organa</last></name>
    <dept id="K55">Sales</dept>
    <birthdate>1960-07-21</birthdate>
  </emp>
</company>

Page 11: DB2 for Linux, Unix, and Windows


XML Pattern: Path Expression Steps

• Supports subset of XQuery path expressions
• Path expression steps separated by forward slash (/)
• Double forward slash (//) is abbreviated syntax for /descendant-or-self::node()/

1. CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE

2. CREATE INDEX idindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//@id' AS SQL DOUBLE

Page 12: DB2 for Linux, Unix, and Windows


Qualifying Paths and Nodes

• Set of nodes may qualify if single path specified
• Set of paths and nodes may qualify if wildcard, descendant axis, or descendant-or-self axis specified

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(10)

Set of nodes qualifies. empindex (Index on XML Column):

/company/emp/@id   31201
/company/emp/@id   31664
/company/emp/@id   42366

CREATE INDEX idindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//@id' AS SQL VARCHAR(10)

Set of paths and nodes qualifies. idindex (Index on XML Column):

/company/emp/@id        31201
/company/emp/@id        31664
/company/emp/@id        42366
/company/emp/dept/@id   M25
/company/emp/dept/@id   M55
/company/emp/dept/@id   K55
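The difference between the two patterns can be reproduced with a small Python model (a sketch, not DB2 code). It walks XML Document 2 with xml.etree.ElementTree and collects every attribute with its full path; the helper name attribute_paths and the string-based matching are illustrative assumptions that only cover these two patterns.

```python
import xml.etree.ElementTree as ET

DOC2 = """<company name="Alliance">
  <emp id="31664" salary="60000" gender="Male">
    <name><first>Luke</first><last>Moonwalker</last></name>
    <dept id="M55">Marketing</dept>
    <birthdate>1960-07-21</birthdate>
  </emp>
  <emp id="42366" salary="50000" gender="Female">
    <name><first>Laura</first><last>Organa</last></name>
    <dept id="K55">Sales</dept>
    <birthdate>1960-07-21</birthdate>
  </emp>
</company>"""


def attribute_paths(elem, prefix=""):
    """Yield (path, value) for every attribute, in document order."""
    path = f"{prefix}/{elem.tag}"
    for name, value in elem.attrib.items():
        yield f"{path}/@{name}", value
    for child in elem:
        yield from attribute_paths(child, path)


all_attrs = list(attribute_paths(ET.fromstring(DOC2)))

# '/company/emp/@id': a single complete path, so only nodes on that path qualify.
empindex = [(p, v) for p, v in all_attrs if p == "/company/emp/@id"]

# '//@id': any attribute named id qualifies, whatever path leads to it.
idindex = [(p, v) for p, v in all_attrs if p.endswith("/@id")]

print(empindex)
print(idindex)
```

For this one document the first list has two entries on one path, while the second has four entries spread over two paths.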

Page 13: DB2 for Linux, Unix, and Windows


Specifying text()

CREATE INDEX nameix1 on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/emp/name/text()' AS SQL VARCHAR(30)

nameix1: No index entries generated. The "name" element doesn't contain text. Text is found only in its child elements, "first" and "last".

[Figure: document tree for XML Document 1: company (@name "Empire") > emp (@id "31201", @salary "60000", @gender "Male") with children name (first "Darth", last "Vapor"), dept (@id "M25", text "Security"), and birthdate (text "1974-08-27")]

Page 14: DB2 for Linux, Unix, and Windows


If text() not specified

CREATE INDEX nameix2 on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/emp/name' AS SQL VARCHAR(30)

nameix2 (Index on XML Column):

/company/emp/name   DarthVapor

Text from the "first" and "last" child elements is concatenated.

[Figure: document tree for XML Document 1: company (@name "Empire") > emp (@id "31201", @salary "60000", @gender "Male") with children name (first "Darth", last "Vapor"), dept (@id "M25", text "Security"), and birthdate (text "1974-08-27")]
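The two outcomes (no entries with /text(), "DarthVapor" without it) follow from the difference between an element's own text nodes and its string value. A minimal Python illustration, assuming xml.etree.ElementTree as a stand-in for the stored document tree (this is a model, not how DB2 stores or indexes XML):

```python
import xml.etree.ElementTree as ET

EMP = ET.fromstring(
    "<emp><name><first>Darth</first><last>Vapor</last></name></emp>"
)
name = EMP.find("name")

# Pattern ending in /text(): only text nodes that are direct children of <name>.
# ElementTree exposes them as .text (before the first child) and each child's .tail.
direct_text = [t for t in [name.text] + [child.tail for child in name] if t]

# Pattern ending at the element: the element's string value, that is, all
# descendant text concatenated in document order.
string_value = "".join(name.itertext())

print(direct_text)   # nothing to index for /company/emp/name/text()
print(string_value)  # value indexed for /company/emp/name
```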

Page 15: DB2 for Linux, Unix, and Windows


Data Types

Four SQL data types are supported:
  – VARCHAR
  – DOUBLE
  – DATE
  – TIMESTAMP

CREATE INDEX empindex1 on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(10)

@id indexed as a character string such as '31201'

CREATE INDEX empindex2 on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE

@id indexed as a number such as 3.120100e+04

Page 16: DB2 for Linux, Unix, and Windows


VARCHAR(n)

• Values longer than specified length (n) are not indexed
  – Document insertion or index creation will fail
• Index can support both range scans and equality look-ups
• Trailing blanks are significant during string comparisons

CREATE INDEX empindex1 on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(10)

Page size   Maximum value for length "n" in bytes
4K          817
8K          1,841
16K         3,889
32K         7,985
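A toy Python model of the VARCHAR(n) rules above may help: an over-long value is an error rather than a silently skipped key, and trailing blanks are part of the key. The names varchar_key and ValueTooLong are hypothetical, and measuring length as UTF-8 bytes is an assumption made for this sketch.

```python
# Maximum "n" per index page size, copied from the table on the slide.
MAX_VARCHAR_LENGTH = {"4K": 817, "8K": 1841, "16K": 3889, "32K": 7985}


class ValueTooLong(Exception):
    """Stand-in for the error DB2 returns; the real error code is not modeled."""


def varchar_key(value, n):
    """Return the key for a VARCHAR(n) index, or raise if the value is too long."""
    if len(value.encode("utf-8")) > n:  # assumption: length counted in UTF-8 bytes
        raise ValueTooLong(f"{value!r} exceeds VARCHAR({n})")
    return value  # trailing blanks are kept, so 'M25' and 'M25 ' are different keys


print(varchar_key("31201", 10))
print(varchar_key("M25", 10) == varchar_key("M25 ", 10))
```

In this model, raising the exception corresponds to the failed INSERT, UPDATE, or CREATE INDEX described on the slide.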

Page 17: DB2 for Linux, Unix, and Windows


VARCHAR HASHED

• Has no length limit and can index arbitrary length character strings
• System generates an 8-byte hash code over entire string
• Only used for equality look-ups and not range scans

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/name/last' AS SQL VARCHAR HASHED

Page 18: DB2 for Linux, Unix, and Windows


DOUBLE

All numeric values will be converted and stored in the index as the DOUBLE data type

Special numeric values (NaN, INF, -INF, +0, -0) indexed even though not supported by SQL DOUBLE data type

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE

Page 19: DB2 for Linux, Unix, and Windows


DATE and TIMESTAMP

• If timezone not specified, original value stored in index
• If timezone specified, DATE and TIMESTAMP data type values are normalized to UTC (Coordinated Universal Time) before storing in index

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/birthdate' AS SQL DATE

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/birthdate' AS SQL TIMESTAMP
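The timezone rule can be illustrated with Python's datetime module. This is a sketch of the normalization idea only: the function name timestamp_key is hypothetical, and dropping the offset after the shift to UTC is a modeling choice, not a statement about DB2's internal key format.

```python
from datetime import datetime, timezone


def timestamp_key(lexical):
    """Return the value a TIMESTAMP index would store for an xs:dateTime string."""
    value = datetime.fromisoformat(lexical)
    if value.tzinfo is None:
        return value  # no timezone: the original value is stored
    # timezone given: shift to UTC, then drop the offset
    return value.astimezone(timezone.utc).replace(tzinfo=None)


print(timestamp_key("2009-09-15T10:30:00"))
print(timestamp_key("2009-09-15T10:30:00-07:00"))
```

Two documents that carry the same instant in different timezones therefore produce the same key.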

Page 20: DB2 for Linux, Unix, and Windows


Document Rejection or CREATE INDEX Failures

Errors causing document rejection for INSERT or UPDATE statements, and CREATE INDEX failure if the table is already populated:
  – VARCHAR(n): Value length exceeds length constraint
  – Conversion Errors: Valid XML value but can't convert to DB2's representation for the data type because of DB2 limitations

Some Examples of DB2 Limitations

xs:date
  XML Schema: No maximum limit for years. Negative dates supported.
  DB2 XML Range (min : max): 0001-01-01 : 9999-12-31

xs:dateTime
  XML Schema: No maximum limit for years. Negative dates supported. Arbitrary precision supported for fractional seconds.
  DB2 XML Range (min : max): 0001-01-01T00:00:00.000000Z : 9999-12-31T23:59:59.999999Z

xs:integer
  XML Schema: No limit on minimum or maximum range
  DB2 XML Range (min : max): -9223372036854775808 : 9223372036854775807

Page 21: DB2 for Linux, Unix, and Windows


Invalid XML Values

• For DOUBLE, DATE, and TIMESTAMP indexes
  – XML values without a valid lexical form for the target index XML data type are invalid
• DB2 9 XML indexes always ignore invalid XML values
• In DB2 9.5, a new CREATE INDEX option lets invalid XML values be either rejected or ignored

Page 22: DB2 for Linux, Unix, and Windows


DB2 9.5 Reject Invalid Values

• New REJECT INVALID VALUES option for DB2 9.5
• If XML value can't be cast to index XML data type, error returned
• If the index does not exist yet, CREATE INDEX fails and the index is not created
• If the index exists, XML data is not inserted or updated in the table

CREATE INDEX empindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '//@id' AS SQL DOUBLE REJECT INVALID VALUES

/company/emp/dept/@id = "M25" is not a valid DOUBLE value, so the statement fails with a CREATE INDEX error.

Page 23: DB2 for Linux, Unix, and Windows


DB2 9.5 Ignore Invalid Values

• Invalid values for index XML data type ignored and not indexed
• No error or warning is issued
• Default option

CREATE INDEX empindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '//@id' AS SQL DOUBLE IGNORE INVALID VALUES

  – /company/emp/@id = "31201" is a valid DOUBLE
  – /company/emp/dept/@id = "M25" is not a valid DOUBLE

empindex (Index on XML Column):

/company/emp/@id   3.120100e+04

The value M25 is not indexed.
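The two options differ only in what happens when a value fails the cast. A small Python model of that decision follows (a sketch, not DB2 code): the regular expression is a rough approximation of the xs:double lexical form, and the names double_keys and InvalidValue are hypothetical.

```python
import re

# Rough approximation of the xs:double lexical form (assumption, not DB2's parser).
XS_DOUBLE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?|-?INF|NaN")


class InvalidValue(Exception):
    """Stand-in for the error returned under REJECT INVALID VALUES."""


def double_keys(values, reject_invalid=False):
    """Cast each value to a DOUBLE key, ignoring or rejecting invalid values."""
    keys = []
    for raw in values:
        text = raw.strip()
        if XS_DOUBLE.fullmatch(text):
            keys.append(float(text))
        elif reject_invalid:
            raise InvalidValue(f"{raw!r} is not a valid DOUBLE")
        # otherwise: IGNORE INVALID VALUES, so the value is skipped silently
    return keys


# Values matched by '//@id' in XML Document 1.
ids = ["31201", "M25"]
print(double_keys(ids))  # default behavior: M25 is skipped
try:
    double_keys(ids, reject_invalid=True)
except InvalidValue as err:
    print("rejected:", err)
```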

Page 24: DB2 for Linux, Unix, and Windows


Unique Keyword

• Uniqueness enforced across all documents within a single XML column
• Enforced within index data type, XML path to node, and value of node after value cast to index data type

CREATE UNIQUE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/name/last' AS SQL VARCHAR(100)

XML Pattern must specify a single complete path and may not contain:
  – descendant axis
  – descendant-or-self axis
  – //
  – wildcards for XML name test
  – node() or processing-instruction() for XML kind test

For example, the following pattern contains // and therefore does not specify a single complete path:

CREATE UNIQUE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//last' AS SQL VARCHAR(100)
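"Value of node after value cast to index data type" means that two lexically different values can still collide. The following toy Python class models a unique index AS SQL DOUBLE to show this; UniqueDoubleIndex and DuplicateKey are hypothetical names, and float() stands in for the cast to DOUBLE.

```python
class DuplicateKey(Exception):
    """Stand-in for DB2's duplicate-key error."""


class UniqueDoubleIndex:
    """Toy unique index AS SQL DOUBLE over one XML column."""

    def __init__(self):
        self.keys = set()

    def insert(self, path, lexical):
        key = (path, float(lexical))  # uniqueness is checked after the cast
        if key in self.keys:
            raise DuplicateKey(f"{path} = {lexical!r} is already indexed")
        self.keys.add(key)


index = UniqueDoubleIndex()
index.insert("/company/emp/@id", "31201")
index.insert("/company/emp/@id", "31664")
try:
    # Different string, same DOUBLE value: rejected as a duplicate.
    index.insert("/company/emp/@id", "31201.0")
except DuplicateKey as err:
    print("rejected:", err)
```

Because the set is shared by every insert, the check spans all documents in the column rather than a single document.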

Page 25: DB2 for Linux, Unix, and Windows


Query Operators for XML

• XSCAN (XML Document Scan)
  – Traverses XML document trees and may evaluate predicates and extract document values
• XISCAN (XML Index Scan)
  – Performs probes and scans on XML indexes and can evaluate predicates
• XANDOR (XML Index ANDing and ORing)
  – Evaluates two or more equality predicates by driving multiple XISCANs

CREATE INDEX AgeIndex on t1(XMLDoc) GENERATE KEY USING XMLPATTERN '/Person/Age' AS SQL DOUBLE;

XQUERY for $i in db2-fn:xmlcolumn('T1.XMLDOC')/Person where $i/Age = 17 return $i;

Page 26: DB2 for Linux, Unix, and Windows


Index Eligibility

Requirements for an XML index to be used for a query:

1. Index "contains" the query predicate, i.e. it is equally or less restrictive than the predicate

2. Query predicate matches the index data type

3. /text() is used consistently in query predicate and index definition: either both specify /text() or neither does

Even if these requirements are satisfied, the optimizer can still decide NOT to use an eligible index!
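The three requirements can be written down as a simplified checker. This Python sketch is a deliberate simplification and not the optimizer's logic: it only handles query predicates with a fully specified path, supports just /, // and * in the index pattern, and the function names pattern_to_regex and index_is_eligible are hypothetical.

```python
import re


def pattern_to_regex(xmlpattern):
    """Translate a simple XML pattern (/, //, *) into a regex over concrete paths."""
    out = ""
    for token in re.split(r"(//|/)", xmlpattern):
        if token == "//":
            out += "/(?:[^/]+/)*"  # any number of intermediate steps
        elif token == "/":
            out += "/"
        elif token == "*":
            out += "[^/]+"  # any single element name
        else:
            out += re.escape(token)
    return re.compile(out)


def index_is_eligible(index_pattern, index_type, query_path, query_type):
    """Check the three requirements for a fully specified query path."""
    # 1. The index pattern is equally or less restrictive than the query path.
    contains = pattern_to_regex(index_pattern).fullmatch(query_path) is not None
    # 2. The predicate's comparison type matches the index data type.
    same_type = index_type == query_type
    # 3. /text() appears in both or in neither.
    text_consistent = index_pattern.endswith("/text()") == query_path.endswith("/text()")
    return contains and same_type and text_consistent


print(index_is_eligible("//@id", "DOUBLE", "/company/emp/@id", "DOUBLE"))
print(index_is_eligible("/company/emp/@id", "DOUBLE", "/company/emp/@id", "VARCHAR"))
```

As the slide notes, a result of True only means the index is eligible; the optimizer can still choose not to use it.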

Page 27: DB2 for Linux, Unix, and Windows


Queries using an Index on an XML Column

Some sample queries using equality and range predicates

Return the salary of the employee with id = '42366'

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL VARCHAR(5)

XQUERY for $i in db2-fn:xmlcolumn('COMPANY.COMPANYDOCS')/company/emp[@id='42366'] return $i/@salary

Return the ids of the employees with salary > 50000

CREATE INDEX empindex on company(companydocs) GENERATE KEY USING XMLPATTERN '//@salary' AS SQL DOUBLE

XQUERY for $i in db2-fn:xmlcolumn('COMPANY.COMPANYDOCS')/company/emp[@salary >50000] return $i/@id

Page 28: DB2 for Linux, Unix, and Windows


Catalog view externalizes information on the XML pattern specified for an Index on an XML Column

CREATE INDEX deptindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/dept/name' AS SQL VARCHAR HASHED;

CREATE INDEX empindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/emp/@id' AS SQL DOUBLE;

SYSCAT.INDEXXMLPATTERNS

INDSCHEMA   INDNAME     DATATYPE   PATTERN
LEECM       EMPINDEX    DOUBLE     /company/emp/@id
LEECM       DEPTINDEX   VARCHAR    /company/dept/name

Page 29: DB2 for Linux, Unix, and Windows


• Index on an XML Column has a logical and physical index
• Logical index contains XML pattern created by user
• Physical index contains index values
  – DB2 system generated key columns

SYSCAT.INDEXES

INDNAME              IID   TABNAME   INDEXTYPE
SQL060414133259940   1     COMPANY   XRGN
SQL060414133300150   2     COMPANY   XPTH
NAMEIX1              3     COMPANY   XVIL
SQL060414134408390   4     COMPANY   XVIP
NAMEIX2              5     COMPANY   XVIL
SQL060414134408620   6     COMPANY   XVIP

Page 30: DB2 for Linux, Unix, and Windows


DB2 9.7 XML Indexes on Range Partitioned Tables

• Relational Indexes may be either not partitioned or partitioned in DB2 9.7
• User-defined XML Indexes may be not partitioned (DB2 9.7 GA) or partitioned (DB2 9.7 FP1)
• System generated XML Paths Indexes are always not partitioned
• System generated XML Regions Indexes are always partitioned

CREATE INDEX zipcode ON sales(customer_info)
GENERATE KEY USING XMLPATTERN
'/Customer/Address/Zipcode' AS SQL varchar(10)
NOT PARTITIONED;

CREATE INDEX zipcode ON sales(customer_info)
GENERATE KEY USING XMLPATTERN
'/Customer/Address/Zipcode' AS SQL varchar(10)
PARTITIONED;

Page 31: DB2 for Linux, Unix, and Windows


DB2 9.7 XML Indexes on Range Partitioned Tables

[Figure: Base Table Partitions 1, 2, and 3, each with its own XDA, Partitioned Regions Index, and Partitioned Relational Index or Index on XML Column; a Non-Partitioned Relational Index or Index on XML Column and a Non-Partitioned XML Path Index span all partitions]

Page 32: DB2 for Linux, Unix, and Windows


DB2 9.7 Online XML Index Create and Reorg

• Insert/Update/Delete transactions no longer need to wait until the CREATE INDEX, REORG INDEXES, or REORG INDEX statement completes
• Results in increased throughput and faster response time for concurrent transactions

create table employee (empid integer, info XML);

Online index create

Transaction 1: create index empidx on employee(info) generate key using xmlpattern '/employeeinfo/addr' as sql varchar(50);
Transaction 2: Delete from employee where empid = 31201
  – Delete will not wait for the Create Index and will complete successfully
  – Create Index will delete the index entry before it completes

Online index reorg

Transaction 1: reorg indexes all for table employee allow write access
Transaction 2: Delete from employee where empid = 31664
  – Delete will not wait for the Reorg Indexes and will complete successfully
  – Reorg Indexes command will delete the index entry before it completes

Page 33: DB2 for Linux, Unix, and Windows


DB2 9.7 Index Compression

• By default, relational and XML indexes are compressed if data row compression is enabled
• New COMPRESS keyword on CREATE/ALTER INDEX can override default behavior
  – Index can be compressed even if data rows not compressed
• MDC Block Indexes and XML Paths Indexes not compressed

CREATE INDEX deptindex on T1(XMLCol) GENERATE KEY USING XMLPATTERN '/company/dept' AS SQL VARCHAR(20) COMPRESS YES

ALTER INDEX deptindex COMPRESS NO

reorg indexes all for table T1 allow write access

Page 34: DB2 for Linux, Unix, and Windows


What Did You Learn Today?

• What the difference is between XML and relational indexes
• How to create an index on an XML column
• How to avoid common user errors
• What the requirements are for queries to use XML indexes
• How the XML indexes are defined in the catalog
• DB2 9.5 and DB2 9.7 XML index enhancements