IBM Software Group

Native XML Support in DB2 Universal Database

Matthias Nicola, Bert van der Linden
IBM Silicon Valley Lab
mnicola@us.ibm.com

Sept 1, 2005

IBM Software Group | DB2 Information Management Software


Agenda

Why “native XML” and what does it mean?

Native XML in the forthcoming version of DB2
  Native XML Storage
  XML Schema Support
  XML Indexes
  XQuery, and the Integration with SQL

Summary


XML Databases

XML-enabled Databases
  The core data model is not XML (but e.g. relational)
  Mapping between XML data model and DB’s data model is required, or XML is stored as text
  E.g.: DB2 XML Extender (V7, V8)

Native XML Databases
  Use the hierarchical XML data model to store and process XML internally
  No mapping, no storage as text
  Storage format = processing format
  E.g.: Forthcoming version of DB2


Problems of XML-enabled Databases

CLOB storage:
  Query evaluation & sub-document level access require costly XML parsing – too slow!

Shredding:
  Mapping from XML to relational is often too complex
  Often requires dozens or hundreds of tables
  Complex multi-way joins to reconstruct documents
  XML schema changes break the mapping
    • No schema flexibility!
    • For example: changing an element from single- to multi-occurrence requires normalization of relational schema & data


Integration of XML & Relational Capabilities in DB2

[Figure: a client application (DB2 client / customer client) talks to the server through a relational interface (SQL/X) and an XML interface (XQuery). Both interfaces feed a hybrid DB2 engine, and DB2 storage holds relational and XML data side by side.]

Native XML data type
  • (not Varchar, not CLOB, not object-relational)

XML capabilities in all DB2 components

Applications combine XML & relational data


Native XML Storage

DB2 stores XML in parsed hierarchical format (similar to the DOM representation of the XML infoset)

create table dept (deptID char(8),…, deptdoc xml);

Relational columns are stored in relational format (tables)

XML is stored natively as type-annotated trees (representing the XQuery Data Model).

DB2 Storage:

  deptID   …   deptdoc
  "PR27"   …   <dept> … <emp>…</emp></dept>
  …        …   …


Efficient Document Tree Storage

<dept>
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>

[Figure: the document above drawn as a tree twice, first with tag names (dept, employee, id, name, phone, office) on the nodes, then with every tag name replaced by its string ID.]

String table (SYSIBM.SYSXMLSTRINGS)

  0  dept
  4  employee
  1  name
  5  id
  2  phone
  3  office

1 String table per database
  Database wide dictionary for all tags in all XML columns

Reduces storage
Fast comparisons & navigation
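The dictionary encoding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DB2's implementation: the `StringTable` class and `encode` function are hypothetical names, and the IDs it assigns follow first-seen order, so they differ from the IDs shown on the slide.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<dept>'
    '<employee id="901"><name>John Doe</name><phone>408 555 1212</phone><office>344</office></employee>'
    '<employee id="902"><name>Peter Pan</name><phone>408 555 9918</phone><office>216</office></employee>'
    '</dept>'
)


class StringTable:
    """Hypothetical stand-in for a database-wide dictionary of tag names."""

    def __init__(self):
        self.ids = {}    # name -> string ID
        self.names = []  # string ID -> name

    def intern(self, name):
        # Assign the next free integer the first time a name is seen.
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def encode(elem, table):
    """Return (tag_id, attributes, text, children) with all names replaced by IDs."""
    tag_id = table.intern(elem.tag)
    attrs = [(table.intern(key), value) for key, value in elem.attrib.items()]
    children = [encode(child, table) for child in elem]
    return (tag_id, attrs, (elem.text or "").strip(), children)


table = StringTable()
tree = encode(ET.fromstring(DOC), table)

# Tag comparisons during navigation are now integer comparisons.
employee_id = table.ids["employee"]
employees = [child for child in tree[3] if child[0] == employee_id]
print(table.ids)
print(len(employees))
```

Each tag name is stored once in the table no matter how often it occurs in the documents, which is where the storage saving comes from.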


Information for Every Node

Tag name, encoded as unique StringID

A nodeID

Node kind (e.g. element, attribute, etc.)

Namespace / Namespace prefix

Type annotation

Pointer to parent

Array of child pointers

Hints to the kind & name of child nodes (for early-out navigation)

For text/attribute nodes: the data itself
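As a rough picture of the per-node information listed above, the following Python sketch models one node record. The field names, the tuple-shaped node IDs, and the way child hints are kept (a set of kind/StringID pairs) are all assumptions made for illustration; the slides do not describe the actual on-page layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(eq=False)
class Node:
    string_id: int                      # tag name, encoded as a StringID
    node_id: Tuple[int, ...]            # hypothetical path-style node ID
    kind: str                           # "element", "attribute", "text", ...
    namespace: Optional[str] = None
    type_annotation: Optional[str] = None
    data: Optional[str] = None          # only for text/attribute nodes
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    child_hints: Set[Tuple[str, int]] = field(default_factory=set)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        self.child_hints.add((child.kind, child.string_id))
        return child

    def children_named(self, kind: str, string_id: int) -> List["Node"]:
        # Early-out: the hints answer "no such child" without visiting children.
        if (kind, string_id) not in self.child_hints:
            return []
        return [c for c in self.children
                if c.kind == kind and c.string_id == string_id]


# String IDs taken from the string table on the previous slide.
NAME, PHONE, OFFICE, EMPLOYEE, ID = 1, 2, 3, 4, 5

employee = Node(EMPLOYEE, (1, 1), "element")
employee.add_child(Node(ID, (1, 1, 1), "attribute", data="901"))
employee.add_child(Node(NAME, (1, 1, 2), "element"))
employee.add_child(Node(PHONE, (1, 1, 3), "element"))

print([c.node_id for c in employee.children_named("element", NAME)])
print(employee.children_named("element", OFFICE))
```

The lookup for an `office` child returns immediately because no hint for it was recorded, which is the kind of shortcut "early-out navigation" suggests.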


XML Node Storage Layout

Node hierarchy of an XML document stored on DB2 pages

Documents that don’t fit on 1 page: split into regions/pages

Docs < 1 page: 1 region, multiple docs/regions per page

Example:

Document split into 3 regions, stored on 3 pages

Split can be at any level of the document


XML Storage: “Regions Index”

maps nodeIDs to regions & pages

allows fetching only the required regions instead of full documents

allows intelligent prefetching

[Figure: the regions index pointing to the pages that hold a document's regions]

not user defined, default component of XML storage layer
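One way to picture a regions index is as a sorted map from the first node ID of each region to the page that stores it. The Python sketch below makes two simplifying assumptions that the slides do not state: node IDs are path-style tuples that sort in document order, and each region covers a contiguous run of node IDs. The class and method names are hypothetical.

```python
import bisect


class RegionsIndex:
    """Hypothetical map from (document, node ID) to the page holding that node."""

    def __init__(self):
        self._starts = []  # sorted list of (doc_id, first_node_id_in_region)
        self._pages = []   # page number for each entry in _starts

    def add_region(self, doc_id, first_node_id, page):
        key = (doc_id, first_node_id)
        pos = bisect.bisect_left(self._starts, key)
        self._starts.insert(pos, key)
        self._pages.insert(pos, page)

    def page_for(self, doc_id, node_id):
        # The region containing node_id is the last one starting at or before it.
        pos = bisect.bisect_right(self._starts, (doc_id, node_id)) - 1
        if pos < 0 or self._starts[pos][0] != doc_id:
            raise KeyError((doc_id, node_id))
        return self._pages[pos]


index = RegionsIndex()
# Document 7 is split into 3 regions stored on 3 pages.
index.add_region(7, (1,), 10)
index.add_region(7, (1, 2), 11)
index.add_region(7, (1, 3, 4), 12)

print(index.page_for(7, (1, 1, 5)))
print(index.page_for(7, (1, 3, 1)))
print(index.page_for(7, (1, 3, 9)))
```

A query that needs only node `(1, 3, 9)` would read a single page here rather than all three, which is the point of fetching regions instead of full documents.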


XML Schema Support & Validation in DB2

Use of XML Schemas is optional, on a per-document basis

No need for a fixed schema per XML column

Validation per document (i.e. per row)

Zero, one, or many schemas per XML column
  For example: different versions of the same schema, or schemas with conflicting definitions

Mix validated & non-validated documents in 1 XML column

Schemas are registered & stored in the DB2 XML Schema Repository (XSR) for fast and stable access.


Validation using Schemas

Validate XML from a parameter marker using xsi:schemaLocation:

insert into dept(deptdoc) values (xmlvalidate(?))

Identify schema for a given document:

select deptid, xmlxsrobjectid(deptdoc) from dept where deptid = 'PR27'

Override schema location by referencing a schema ID or URI:

insert into dept(deptdoc) values (xmlvalidate(? according to xmlschema id departments.deptschema))

insert into dept(deptdoc) values (xmlvalidate(? according to xmlschema uri 'http://my.dept.com'))


XML Indexes for High Query Performance

Define 0, 1 or multiple XML Value Indexes per XML column

XML index maps: (pathID, value) → (nodeID, rowID)

Index any elements or attributes, incl. mixed content

Index definition uses an XML pattern to specify which elements/attributes to index (and which not to)

Can index all elements/attributes, but not forced to do so

Can index repeating elements → 0, 1 or multiple index entries per document

New XML-specific join and query evaluation methods, evaluate multiple predicates concurrently with minimal index I/O

xmlpattern = XPath without predicates, only child axis (/) and descendant-or-self axis (//)
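The mapping from values to (nodeID, rowID) pairs can be illustrated with a small Python sketch. It is a toy model under several assumptions: one dictionary per pattern stands in for the pathID, the position of a node among the matches stands in for a real node ID, only child-axis patterns are handled, and values that cannot be cast to the index type simply produce no entry. The function name `build_index` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(rows, pattern, cast=str):
    """Index rows of (row_id, xml_text) on a child-axis pattern such as
    '/dept/employee/name' or '/dept/employee/@id'."""
    steps = pattern.strip("/").split("/")
    attr = steps.pop()[1:] if steps[-1].startswith("@") else None
    index = defaultdict(list)  # value -> [(node position, row_id), ...]
    for row_id, xml_text in rows:
        root = ET.fromstring(xml_text)
        nodes = [root] if root.tag == steps[0] else []
        for step in steps[1:]:
            nodes = [child for node in nodes for child in node if child.tag == step]
        for position, node in enumerate(nodes):
            raw = node.get(attr) if attr else node.text
            if raw is None:
                continue
            try:
                key = cast(raw)
            except ValueError:
                continue  # assumption: uncastable values get no index entry
            index[key].append((position, row_id))
    return index


rows = [
    ("PR27", '<dept bldg="101"><employee id="901"><name>John Doe</name></employee>'
             '<employee id="902"><name>Peter Pan</name></employee></dept>'),
    ("PR28", '<dept><employee id="abc"><name>Peter Pan</name></employee></dept>'),
]

by_bldg = build_index(rows, "/dept/@bldg", float)
by_id = build_index(rows, "/dept/employee/@id", float)
by_name = build_index(rows, "/dept/employee/name")
print(dict(by_name))
```

Row PR27 contributes two entries to the name index (a repeating element), while row PR28 contributes none to the building index because it has no `bldg` attribute.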


XML Indexing: Examples

create table dept(deptID char(8) primary key, deptdoc xml);

create index idx1 on dept(deptdoc) generate key using xmlpattern '/dept/@bldg' as sql double;

create unique index idx2 on dept(deptdoc) generate key using xmlpattern '/dept/employee/@id' as sql double;

create index idx3 on dept(deptdoc) generate key using xmlpattern '/dept/employee/name' as sql varchar(35);

<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>

…xmlpattern '//name' as sql varchar(35);   (Index on ALL “name” elements)
…xmlpattern '//@*' as sql double;   (Index on ALL numeric attributes)
…xmlpattern '//text()' as sql varchar(hashed);   (Index on ALL text nodes, hash code)
…xmlpattern '/dept//name' as sql varchar(35);
…xmlpattern '/dept/employee//text()' as sql varchar(128);   (All text nodes under employee)
…xmlpattern 'declare namespace m="http://www.myself.com/"; /m:dept/m:employee/m:name' as sql varchar(45);


Querying XML Data in DB2

The following options are supported:

XQuery/XPath as a stand-alone language

SQL embedded in XQuery

XQuery/XPath embedded in SQL/XML

Plain SQL for full-document retrieval

Compiler/Optimizer Details: Beyer et al. “System RX”, SIGMOD 2005 [2]


Example: XQuery as a stand-alone Language in DB2

create table dept(deptID char(8) primary key, deptdoc xml);

for $d in db2-fn:xmlcolumn('dept.deptdoc')/dept
let $emp := $d//employee/name
where $d/@bldg > 95
order by $d/@bldg
return <EmpList> {$d/@bldg, $emp} </EmpList>

db2-fn:xmlcolumn returns the sequence of all documents in the specified XML column

<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
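For readers less familiar with XQuery, the FLWOR expression on this slide can be read clause by clause as the Python below. This is a paraphrase of the query's logic over an in-memory list of documents, not how DB2 evaluates it; it assumes the comparison in the where clause is `> 95`, and the function name `emp_lists` and the extra sample departments are made up for the example.

```python
import xml.etree.ElementTree as ET


def emp_lists(documents, min_bldg=95):
    # for $d in db2-fn:xmlcolumn(...)/dept
    depts = [ET.fromstring(text) for text in documents]
    depts = [d for d in depts if d.tag == "dept"]
    # where $d/@bldg > 95  (a missing attribute never satisfies the comparison)
    selected = [d for d in depts if float(d.get("bldg", "nan")) > min_bldg]
    # order by $d/@bldg
    selected.sort(key=lambda d: float(d.get("bldg")))
    results = []
    for d in selected:
        # return <EmpList> {$d/@bldg, $emp} </EmpList>
        out = ET.Element("EmpList", {"bldg": d.get("bldg")})
        # let $emp := $d//employee/name
        out.extend(d.findall(".//employee/name"))
        results.append(ET.tostring(out, encoding="unicode"))
    return results


documents = [
    '<dept bldg="120"><employee id="903"><name>Jane Roe</name></employee></dept>',
    '<dept bldg="90"><employee id="904"><name>Ann Lee</name></employee></dept>',
    '<dept bldg="101"><employee id="901"><name>John Doe</name></employee>'
    '<employee id="902"><name>Peter Pan</name></employee></dept>',
]

for line in emp_lists(documents):
    print(line)
```

The department in building 90 is filtered out, and the remaining two come back ordered by building number with the `bldg` attribute copied onto each constructed `EmpList` element.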


Examples: SQL embedded in XQuery

create table dept(deptID char(8) primary key, deptdoc xml);

Identify XML data by a SELECT statement
Leverage predicates/indexes on relational columns

(single document)
for $d in db2-fn:sqlquery("select deptdoc from dept where deptID = 'PR27'")…

(some documents)
for $d in db2-fn:sqlquery("select deptdoc from dept where deptID LIKE 'PR%'")…

(some documents)
for $d in db2-fn:sqlquery("select dept.deptdoc from dept, unit where dept.deptID=unit.ID and unit.headcount > 200")…

(constructed documents) (join & combine XML and relational data)
for $d in db2-fn:xmlcolumn('dept.deptdoc')/dept,
    $e in db2-fn:sqlquery("select xmlforest(name, desc) from unit u")…


Example: XQuery embedded in SQL/XML

SQL/XML Standard Functions: xmlexists, xmlquery, xmltable

create table dept(deptID char(8) primary key, deptdoc xml);

select deptID,
       xmlquery('for $i in $d/dept
                 let $j := $i//name
                 return $j' passing deptdoc as "d")
from dept
where deptID LIKE 'PR%'
  and xmlexists('$d/dept[@bldg = 101]' passing deptdoc as "d")
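The way a single statement mixes a relational predicate with XML predicates can be imitated with SQLite and two user-defined functions. This is an analogy only: SQLite has no XML type, the documents are stored as text and re-parsed on every call (exactly the CLOB-style cost that native storage avoids), and `xml_exists` and `xml_query` are hypothetical helpers that take ElementTree path syntax rather than XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET


def _wrap(doc):
    # Put the document under a holder node so paths can start at the root tag.
    holder = ET.Element("holder")
    holder.append(ET.fromstring(doc))
    return holder


def xml_exists(path, doc):
    return 1 if _wrap(doc).findall(path) else 0


def xml_query(path, doc):
    return "".join(ET.tostring(e, encoding="unicode") for e in _wrap(doc).findall(path))


conn = sqlite3.connect(":memory:")
conn.create_function("xml_exists", 2, xml_exists)
conn.create_function("xml_query", 2, xml_query)
conn.execute("create table dept (deptID text primary key, deptdoc text)")
conn.executemany("insert into dept values (?, ?)", [
    ("PR27", '<dept bldg="101"><employee id="901"><name>John Doe</name></employee>'
             '<employee id="902"><name>Peter Pan</name></employee></dept>'),
    ("PR28", '<dept bldg="205"><employee id="903"><name>Jane Roe</name></employee></dept>'),
    ("HR01", '<dept bldg="101"><employee id="904"><name>Ann Lee</name></employee></dept>'),
])

rows = conn.execute(
    "select deptID, xml_query(?, deptdoc) from dept "
    "where deptID like 'PR%' and xml_exists(?, deptdoc) = 1",
    ("dept//name", "dept[@bldg='101']"),
).fetchall()
print(rows)
```

Only PR27 survives both filters: PR28 fails the XML predicate on `bldg`, and HR01 fails the relational `LIKE` predicate.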


Other Features in DB2 native XML

XML Text Search Support
XML Import/Export
XML Type in Stored Procedures
API Extensions (JDBC, CLI, .NET, etc.)
XML Schema Repository
Full SQL/XML support
Visual XQuery Builder
Annotated schema shredding
…and more


Summary

CLOB and shredded XML storage restrict performance and flexibility

New native XML support in DB2:

Better performance through:
  Hierarchical & parsed XML representation at all layers
  Path-specific XML indexing
  New XML join and query methods

Higher flexibility through:
  Integration of SQL and XQuery
  Schemas are optional, per document, not per column
  Zero, one, or many XML schemas per XML column


Questions?

db2 =>

mnicola@us.ibm.com


BACKUP CHARTS


Shredding: A simple case

<DEPARTMENT deptid="15" deptname="Sales">
  <EMPLOYEE>
    <EMPNO>10</EMPNO>
    <FIRSTNAME>CHRISTINE</FIRSTNAME>
    <LASTNAME>SMITH</LASTNAME>
    <PHONE>408-463-4963</PHONE>
    <SALARY>52750.00</SALARY>
  </EMPLOYEE>
  <EMPLOYEE>
    <EMPNO>27</EMPNO>
    <FIRSTNAME>MICHAEL</FIRSTNAME>
    <LASTNAME>THOMPSON</LASTNAME>
    <PHONE>406-463-1234</PHONE>
    <SALARY>41250.00</SALARY>
  </EMPLOYEE>
</DEPARTMENT>

Department

  DEPTID  DEPTNAME
  15      Sales

Employee

  DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
  15      27     MICHAEL    THOMPSON  406-463-1234  41250
  15      10     CHRISTINE  SMITH     408-463-4963  52750


Shredding: A schema change…

“Employees are now allowed to have multiple phone numbers…”

<DEPARTMENT deptid="15" deptname="Sales">
  <EMPLOYEE>
    <EMPNO>10</EMPNO>
    <FIRSTNAME>CHRISTINE</FIRSTNAME>
    <LASTNAME>SMITH</LASTNAME>
    <PHONE>408-463-4963</PHONE>
    <PHONE>415-010-1234</PHONE>
    <SALARY>52750.00</SALARY>
  </EMPLOYEE>
  <EMPLOYEE>
    <EMPNO>27</EMPNO>
    <FIRSTNAME>MICHAEL</FIRSTNAME>
    <LASTNAME>THOMPSON</LASTNAME>
    <PHONE>406-463-1234</PHONE>
    <SALARY>41250.00</SALARY>
  </EMPLOYEE>
</DEPARTMENT>

Phone

  EMPNO  PHONE
  27     406-463-1234
  10     415-010-1234
  10     408-463-4963

Requires:
  • Normalization of existing data!
  • Modification of the mapping
  • Change of applications

Costly!

Department

  DEPTID  DEPTNAME
  15      Sales

Employee

  DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
  15      27     MICHAEL    THOMPSON  406-463-1234  41250
  15      10     CHRISTINE  SMITH     408-463-4963  52750
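The three costs listed on this slide can be made concrete with a short Python sketch that uses SQLite for the relational side and ElementTree for the XML side. It is an illustration with the slide's sample data, not DB2 code; the table layout is a simplified assumption, and the old `phone` column is left in place rather than dropped.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (deptid int, empno int primary key, firstname text,
                           lastname text, phone text, salary real);
    insert into employee values (15, 10, 'CHRISTINE', 'SMITH', '408-463-4963', 52750);
    insert into employee values (15, 27, 'MICHAEL', 'THOMPSON', '406-463-1234', 41250);
""")

# Schema change: PHONE becomes multi-occurrence.
# 1. Normalization of existing data: move phone numbers into their own table.
conn.executescript("""
    create table phone (empno int references employee(empno), phone text);
    insert into phone select empno, phone from employee;
""")
# 2. Modification of the mapping: new <PHONE> elements now go to the phone table.
conn.execute("insert into phone values (10, '415-010-1234')")
# 3. Change of applications: every query for phone numbers now needs a join.
shredded = conn.execute(
    "select e.lastname, p.phone from employee e "
    "join phone p on p.empno = e.empno order by e.empno, p.phone"
).fetchall()

# With the document kept as XML, the same path works before and after the change.
emp = ET.fromstring(
    "<EMPLOYEE><EMPNO>10</EMPNO><PHONE>408-463-4963</PHONE>"
    "<PHONE>415-010-1234</PHONE></EMPLOYEE>"
)
phones = [p.text for p in emp.findall("PHONE")]

print(shredded)
print(phones)
```

The relational side needed a new table, a data migration, and a rewritten query, while the XML path expression returns one or several phone numbers without any change.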


Native XML in DB2: Key Themes

Standards compliant + driving the standards
  XML, XQuery, SQL/XML, XML Schema…

Flexibility, because that is what XML is all about
  Zero, one, or many XML schemas per XML column

Native (hierarchical) Storage & Sophisticated XML Indexes
  New “pivot join” to evaluate many predicates concurrently

Integrated in DB2
  Leveraging scalability, reliability, performance, availability…

Integrated with application APIs:
  JDBC, ODBC, .NET, embedded SQL, CLI,…

Integrated with SQL
  Access relational data and XML data in same statement


XML Index DDL

create index idx1 on T(xmlcol) generate key using xmlpattern '/a/b/@c' as sql date;

Declaration & use of namespace prefix supported (not shown above)

CREATE [UNIQUE] INDEX index-name
  ON table-name (xml-column-name)
  GENERATE KEY USING xmlpattern
  AS SQL { VARCHAR (integer) | VARCHAR (HASHED) | DOUBLE | DATE | TIMESTAMP }

xmlpattern:
  one or more steps, each { / | // } followed by { element-tag | * | @attribute-tag | @* | text() }

xmlpattern = XPath without predicates, only child axis (/) and descendant-or-self axis (//)
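The restricted pattern syntax can be approximated with a regular expression, shown here as a Python sketch. It reflects one reading of the syntax diagram and is not DB2's parser: it assumes attribute and text() steps may appear only as the last step, it accepts prefixed names such as `m:dept`, and it does not handle the namespace declaration prolog. The name `is_xmlpattern` is hypothetical.

```python
import re

# An XML name, optionally with a namespace prefix (simplified to ASCII-style names).
NAME = r"[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?"
# Inner steps: child (/) or descendant-or-self (//) axis, then an element test.
ELEMENT_STEP = rf"(?://?)(?:{NAME}|\*)"
# The final step may also select an attribute or text nodes.
LAST_STEP = rf"(?://?)(?:{NAME}|\*|@{NAME}|@\*|text\(\))"
XMLPATTERN = re.compile(rf"(?:{ELEMENT_STEP})*{LAST_STEP}")


def is_xmlpattern(pattern):
    """Return True if the string fits the restricted pattern syntax sketched above."""
    return XMLPATTERN.fullmatch(pattern) is not None


for candidate in ["/a/b/@c", "//text()", "/dept/employee[@id=901]/name", "/dept/../name"]:
    print(candidate, is_xmlpattern(candidate))
```

Patterns with predicates or other axes are rejected, which matches the rule that an xmlpattern is XPath without predicates and with only the two axes named on the slide.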
