© 2006 IBM Corporation
IBM Information Management
A next generation hybrid data server
Holger Seubert
DB2 Information Management Development, IBM Laboratory Boeblingen
LUW
DB2 Version 9 – the Viper Release
122nd Datenbankstammtisch (database roundtable) of HTW Dresden, Department of Computer Science/Mathematics, 13 December 2006
DB2 9 – The next generation hybrid data server
IBM Software Group - Information Management
"There are 68 patents alone in Viper, and it involved 750 developers over five years,"
Bob Picciano, VP WW Information Management Sales said.
"This is something no one else has and will take years to get here."
There's a lot of innovation in Viper. Let's go and explore …
Explore for yourself, free of charge, with DB2 Express-C 9:
→ http://www-306.ibm.com/software/data/db2/express/
Agenda – Start the hybrid engine
• < Overview />
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary
History of XML: How it all began …
Timeline:
– early 1980s: the relational data model becomes popular
– 1983: SGML standardization (ISO)
– 1993: HTML, 1st version
– 1998: XML 1.0 W3C Recommendation
– 2000: XML 1.0, 2nd Edition
– 2004: XML 1.0, 3rd Edition & XML 1.1
The "Standard Generalized Markup Language" (SGML) is a metalanguage in which you can define markup languages for documents. SGML was originally designed for sharing machine-readable documents. It has also been used extensively in the printing and publishing industries.
The Extensible Markup Language (XML) is a general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. It is a simplified subset of SGML.
The "HyperText Markup Language" (HTML) is a markup language (an application of SGML) designed for the creation of web pages and other information viewable in a browser. HTML is used to structure information and can be used to describe the appearance and semantics of a document.
… and where we are today – the hype
Technologies placed along the XML timeline (1983 to 2006):
XHTML, JAX-RPC, DOM, SAX, Windows Installer XML, XML Schema, XQuery, XUpdate, XPath, XSLT, Ajax, UDDI, SOAP, WSDL, SQL/XML, XPointer, XLink, RSS, XSL-FO, CSS, XML Infoset, XForms, native XML databases, XML-enabled databases
Please note: the order in which the technologies are listed does not exactly reflect the order of their invention or appearance.
The evolution to a new database technology
Last generation database technology needed data types like CLOB or text to store XML data:
✗ no way to query XML directly on the database tier
✗ XML is read as strings and passed to the "middle tier", which then runs the queries against the XML data
Next generation database technology interacts directly with the XML data:
✓ a new XML data type stores XML natively inside the database
✓ run queries against XML with XQuery or XPath
✓ embed XQuery statements directly in SQL statements
✓ special XML indexes boost performance
✓ assign a schema to XML data to ensure that the XML data is valid
A New Model is Emerging – a hybrid system
XML Developer: "I see a sophisticated XML repository that also supports SQL."
SQL Developer: "I see a sophisticated RDBMS that also supports XML."
– Familiar programming models
– Optimized storage models
– Mature services
– Familiar tooling
– Optimized performance & scale
Agenda – Start the hybrid engine
• Overview
• < pureXML Storage />
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary
pureXML in DB2 9
§ Standards compliant & driving the standards – XML, XQuery, SQL/XML, XML Schema …
§ 100% integrated in DB2 – leveraging performance, scalability, reliability, availability …
§ 100% integrated with SQL – XML is a new SQL type
– Access relational and XML data in the same statement
§ 100% integrated with application APIs: JDBC, ODBC, .NET, embedded SQL, PHP
pureXML in DB2 9
§ What does pureXML® support mean?
– Storage, compiler, optimizer, indexing, tools, utilities, APIs, …
→ XML capabilities in all DB2 components
§ pureXML® storage
– XML is stored in parsed, annotated, DOM-like trees; the XQuery Data Model is persisted
→ NOT shredded, NOT stored as a LOB
– XML data is formatted into buffered data pages (LOB pages are not buffered!)
– XML data can be placed in a separate table space
→ shared with the LOB data of that table
– New XDA data object on disk (for the new data type)
§ Customer benefits
– Faster navigation and queries
– Simpler indexing
– Natural XML user paradigm
Figure: the XDA object
Integration of XML & Relational Capabilities
§ Native XML data type (server & client side): not VARCHAR, not CLOB, not object-relational!
§ XML capabilities in all DB2 components
§ Applications can combine XML & relational data
Figure (DB2 hybrid data server): client application (DB2 client / customer client) → SQL/XML and XQuery → DB2 engine with relational interface, XML interface and compiler → DB2 storage (relational and XML)
The XML Data Type
§ A column of type XML can hold one well-formed XML document for every row of the table:
create table dept (deptID char(8),…, deptdoc xml);
§ XML and relational columns are stored differently:
– Relational columns are stored in relational format (tables)
– XML values are stored natively in the XQuery Data Model
– A descriptor pointing to the XML storage is stored in the row
§ No limit on the size of an XML document (the XML data type has no associated length; at the moment the client-server protocol limits the document size to 2 GB)
§ Parse-once paradigm: no XML parsing at query time!
Figure (DB2 storage): table dept with columns deptID and deptdoc; the row with deptID "PR27" points to the stored document <dept><emp>…</emp></dept>
DB2 XML Storage – XML to the core
Store XML intact, with full DBMS knowledge of the document's internal structure
§ Document is stored in a parsed, hierarchical representation
– This is similar to a DOM representation of the XML Infoset
– IBM's version of the open-source Xerces parser is used
– The XQuery Data Model is persisted
§ All XML nodes are type-annotated according to the XQuery specification (W3C)
– XML Schema types if validated
– Default types otherwise
§ All data is stored in UTF-8
– Regardless of the document encoding
– Regardless of the locale
– Regardless of the code page of the database
DB2 XML Storage – XML to the core
Information stored with every node:
§ Name (e.g. element name, encoded as a StringID from the string table)
§ A nodeID
§ Type of node (e.g. element, attribute, etc.)
§ Namespace
§ Namespace prefix
§ Data type
§ Pointer to parent
§ Array of child pointers
§ For text/attribute nodes: the data itself
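As a rough illustration of the per-node information listed on this slide, the following Python sketch builds such records from a parsed document. It is not DB2's on-disk format: the class `StoredNode`, the `string_table` dictionary and the node numbering are assumptions made for illustration, and namespaces and data types are left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredNode:
    node_id: int                    # hypothetical nodeID: position in creation order
    kind: str                       # "element", "attribute" or "text"
    name_id: Optional[int]          # StringID from the string table (None for text nodes)
    parent: Optional[int]           # node_id of the parent, None for the root
    children: list = field(default_factory=list)  # node_ids of the children
    data: Optional[str] = None      # only set for text and attribute nodes


def build_nodes(xml_text):
    """Parse once and return (list of StoredNode, string table)."""
    string_table = {}               # name -> StringID
    nodes = []

    def intern(name):
        # Reuse the StringID if the name is already known, else assign the next one.
        return string_table.setdefault(name, len(string_table))

    def add(kind, name, parent, data=None):
        name_id = intern(name) if name is not None else None
        node = StoredNode(len(nodes), kind, name_id, parent, data=data)
        nodes.append(node)
        if parent is not None:
            nodes[parent].children.append(node.node_id)
        return node.node_id

    def walk(elem, parent):
        me = add("element", elem.tag, parent)
        for attr, value in elem.attrib.items():
            add("attribute", attr, me, value)
        if elem.text and elem.text.strip():
            add("text", None, me, elem.text.strip())
        for child in elem:
            walk(child, me)

    walk(ET.fromstring(xml_text), None)
    return nodes, string_table


nodes, string_table = build_nodes(
    '<dept bldg="101"><employee id="901"><name>John Doe</name></employee></dept>'
)
```

Because element and attribute names are replaced by small integers, a repeated name such as `employee` is stored only once in the string table, which is the point of the StringID encoding mentioned above.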
DB2 XML storage – XML to the core
§ Node hierarchy of an XML document is stored on DB2 pages
– Large documents are split into pages/regions
§ Nodes are physically connected
– Query performance
§ Regions are logically connected
– The regions index is a system component
Figure: a large XML document split into regions spread over several pages, tied together by the regions index
New System Indexes
§ Entries in SYSCAT.INDEXES with the following INDEXTYPE:
§ XRGN: XML Region Index
– Created once for a table with XML column(s)
– Maps logical pointers to XML data pages
§ XPTH: XML Path Index
– Created for each XML column
– Holds the local subset of the global path/pathID mapping information (path table)
– Can be used for wildcard resolution
DB2 XML Storage
Figure: a table with the columns ID and DEPTDOC (rows PR27, ACC, PR28) is kept in the DAT object; the region and path indexes, with path entries such as /dept, /dept/employee, /dept/employee/@id, …, are kept in the INX object; the XML documents are kept in the XDA object
Agenda – Start the hybrid engine
• Overview
• pureXML Storage
• < XML Indexes />
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary
XML Indexes for High Query Performance
§ Index elements and attributes inside the document
§ Uses an XML pattern expression to index paths and values in XML documents stored in a single XML column
§ Specify the type to index
– Should be the same as the type used in the query
– The query /Person[Age = 5] needs a numeric index on Age
§ 0, 1 or multiple index entries per document
create table t1 (docID int, XMLDoc xml);
create index AgeIndex on t1(XMLDoc)
  generate key using xmlpattern '/Person/Age' as sql double;
NOTE: Declaration & use of namespace prefixes is supported (not shown above)
XML Index DDL
CREATE [UNIQUE] INDEX index-name
  ON table-name (xml-column-name)
  GENERATE KEY USING XMLPATTERN xmlpattern
  AS SQL { VARCHAR (integer) | VARCHAR HASHED | DOUBLE | DATE | TIMESTAMP }
xmlpattern:
  one or more steps, each introduced by / or //, where a step is one of: element-tag, *, @attribute-tag, @*, text()
xmlpattern = XPath without predicates, only child axis (/) and descendant-or-self axis (//)
XML Index Examples
create table dept(deptID char(8) primary key, deptdoc xml);
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name><phone>408 555 1212</phone><office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>
  </employee>
</dept>
§ create unique index idx1 on dept(deptdoc) generate key using xmlpattern '/dept/@bldg' as sql double;
§ create unique index idx2 on dept(deptdoc) generate key using xmlpattern '/dept/employee/@id' as sql double;
§ create index idx3 on dept(deptdoc) generate key using xmlpattern '/dept/employee/name' as sql varchar(35);
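To make "0, 1 or multiple index entries per document" concrete, here is a small Python sketch that derives the keys one document would contribute to each of the three indexes. It only mimics the idea with the standard library and is not how DB2 builds its indexes: the function `index_keys` is hypothetical, it supports child steps and a final attribute step only, and silently skipping values that cannot be cast is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<dept bldg="101">
  <employee id="901"><name>John Doe</name><phone>408 555 1212</phone><office>344</office></employee>
  <employee id="902"><name>Peter Pan</name><phone>408 555 9918</phone><office>216</office></employee>
</dept>"""


def index_keys(xml_text, pattern, as_type):
    """Return the keys one document contributes for an absolute pattern
    such as '/dept/employee/@id' (child axis only, optional final @attribute)."""
    steps = pattern.strip("/").split("/")
    root = ET.fromstring(xml_text)
    if steps[0] != root.tag:
        return []                       # pattern does not match: 0 entries
    # A trailing '@name' step selects an attribute instead of element text.
    attr = steps.pop()[1:] if steps[-1].startswith("@") else None
    nodes = [root]
    for step in steps[1:]:
        nodes = [child for node in nodes for child in node.findall(step)]
    values = [n.get(attr) for n in nodes] if attr else [n.text for n in nodes]
    keys = []
    for value in values:
        if value is None:
            continue
        try:
            keys.append(as_type(value))  # cast to the declared index type
        except ValueError:
            pass                         # assumption: uncastable values get no entry
    return keys


bldg_keys = index_keys(DOC, "/dept/@bldg", float)            # like idx1
id_keys = index_keys(DOC, "/dept/employee/@id", float)       # like idx2
name_keys = index_keys(DOC, "/dept/employee/name", str)      # like idx3
age_keys = index_keys(DOC, "/Person/Age", float)             # pattern from the earlier slide
```

The sample document yields one key for the building, two keys each for the employee ids and names, and none for `/Person/Age`, since that path does not occur in it.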
XML Index Wizard (DB2 Control Center)
Create a value index on XML elements or XML attributes by right-clicking in the document structure
XML Index Wizard (DB2 Control Center)
A pop-up menu shows the options for creating an XML value index on the selected XML node
The Big Indexing Picture
Figure: an SQL table with relational column 1, relational column 2 and an XML column, together with its indexes, the catalog path table and the XML storage (.XDA file):
– Relational index: on the relational columns
– XML column paths index: maps paths to path IDs for each XML column; holds a subset of the paths stored in the global catalog path table
– XML regions index: logical mapping of the regions of an XML document, used to retrieve the document data
– Index on XML column: created by users to improve the performance of queries on XML documents
A full text index for XML
§ XML is less like the traditional data stored in a database
§ Applications on XML documents often rely on a full-text index
§ DB2 9 offers both
– Traditional-behaving database indexes
– Full-text indexing
§ The existing Net Search Extender is used for the full-text index
– XML aware: limit the search to specific elements or attributes
– Proximity searches
– Wildcard searches
– and a lot more …
Agenda – Start the hybrid engine
• Overview
• pureXML Storage
• XML Indexes
• < XQuery & SQL/XML support />
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary
XQuery and SQL/XML
§ DB2 treats both SQL and XQuery as primary query languages (hybrid system)
§ SQL and XQuery operate independently on their respective data models
§ DB2 also allows relational and XML data to be combined and correlated
Two ways to query XML data:
This section:
– Querying XML data with XQuery
– Optional: SQL embedded in XQuery
Next section:
– Querying XML data with SQL
– Optional: XQuery embedded in SQL
What is XQuery?
A query language designed for XML data … and supported in DB2 9.
Figure: the building blocks of XQuery and their W3C specifications:
– XQuery (expressions): www.w3.org/TR/xquery
– XPath 2.0: www.w3.org/TR/xpath20/
– Functions & Operators: www.w3.org/TR/xquery-operators/
– XQuery 1.0 & XPath 2.0 Data Model: www.w3.org/TR/query-datamodel/
– XML Schema: www.w3.org/XML/Schema
The FLWOR Expression
§ FOR: iterates through a sequence, binds a variable to each item
§ LET: binds a variable to a sequence
§ WHERE: eliminates items of the iteration
§ ORDER: reorders items of the iteration
§ RETURN: constructs query results
for $movie in db2-fn:xmlcolumn('movies.doc')
let $actors := $movie//actor
where $movie/duration > 90
order by $movie/@year
return <movie>
  {$movie/title, $actors}
</movie>
Result:
<movie>
  <title>Chicago</title>
  <actor>Renee Zellweger</actor>
  <actor>Richard Gere</actor>
  <actor>Catherine Zeta-Jones</actor>
</movie>
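For readers who think in procedural terms, the five FLWOR clauses can be sketched as an ordinary loop. The Python below is only an analogy using the standard library, not an XQuery engine: the sample documents in `MOVIES` (including durations and years) and the function `flwor` are made up for illustration.

```python
import xml.etree.ElementTree as ET

MOVIES = [  # hypothetical sample documents, one per row of the XML column
    '<movie year="2002"><title>Chicago</title><duration>113</duration>'
    '<actor>Renee Zellweger</actor><actor>Richard Gere</actor>'
    '<actor>Catherine Zeta-Jones</actor></movie>',
    '<movie year="1999"><title>Short Film</title><duration>12</duration>'
    '<actor>Jane Roe</actor></movie>',
    '<movie year="1994"><title>Long Film</title><duration>142</duration>'
    '<actor>John Roe</actor></movie>',
]


def flwor(documents):
    selected = []
    for text in documents:                              # FOR: iterate over the sequence
        movie = ET.fromstring(text)
        actors = list(movie.iter("actor"))              # LET: bind $actors to $movie//actor
        if float(movie.findtext("duration")) > 90:      # WHERE: drop short movies
            selected.append((movie, actors))
    selected.sort(key=lambda pair: pair[0].get("year")) # ORDER BY: $movie/@year
    results = []
    for movie, actors in selected:                      # RETURN: construct new elements
        out = ET.Element("movie")
        out.append(movie.find("title"))
        out.extend(actors)
        results.append(ET.tostring(out, encoding="unicode"))
    return results


results = flwor(MOVIES)
```

With these sample documents the short film is filtered out and the two remaining movies come back ordered by year, each reduced to its title and actors.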
Which data does XQuery (as a primary language) work on?
§ All XML data is in XML-typed columns in tables
§ The XQuery standard defines a "collection" function
– Very abstract, implementation dependent
§ DB2 XQuery uses 2 XQuery functions to get data:
– db2-fn:xmlcolumn()
– db2-fn:sqlquery()
Querying XML data – with XQuery
1) Identifying XML data by column: db2-fn:xmlcolumn()
Operates on the entire XML column:
§ for $d in db2-fn:xmlcolumn('dept.deptdoc')/dept/employee
2) Identifying XML data via a select statement: db2-fn:sqlquery()
Leverages predicates/indexes on relational columns:
§ for $d in db2-fn:sqlquery("select deptdoc from dept")/dept/employee   … entire column
§ for $d in db2-fn:sqlquery("select deptdoc from dept where deptID = 'PR27'")   … single document
§ for $d in db2-fn:sqlquery("select deptdoc from dept where contains(deptdoc, SECTION(/dept/employee/) 'John')=1")   … some documents
Querying XML data – with XQuery
This query returns all customerinfo elements in documents in the CUSTOMER.INFO column where the value of the attribute Cid is greater than 1000.
Prefix each XQuery query with the keyword 'XQuery' to tell the DB2 parser to use the XQuery language.
Querying XML data – with XQuery
Visual XQuery Builder integrated in DB2 Developer Workbench (Eclipse IDE)
XQuery and SQL/XML
§ DB2 treats both SQL and XQuery as primary query languages (hybrid system)
§ SQL and XQuery operate independently on their respective data models
§ DB2 also allows relational and XML data to be combined and correlated
Two ways to query XML data:
Previous section:
– Querying XML data with XQuery
– Optional: SQL embedded in XQuery
This section:
– Querying XML data with SQL
– Optional: XQuery embedded in SQL
Querying XML data – with SQL/XML
SQL/XML Publishing Functions since DB2 V8.2
Several functions are available to enable XML values to be constructed, or published, from SQL values.
Function        Type        Description
XMLELEMENT      Scalar      generates an XML element
XMLATTRIBUTES   Scalar      used within XMLELEMENT, specifies attributes
XMLFOREST       Scalar      produces a forest of XML elements from SQL values
XMLCONCAT       Scalar      concatenates a variable number of XML values
XMLNAMESPACES   Scalar      produces a namespace declaration
XMLAGG          Aggregate   groups or aggregates XML data
XML2CLOB        Cast        converts the XML data type into a CLOB
XMLSERIALIZE    Cast        converts the XML data type to serialized XML as a char/varchar/clob/blob
Querying XML data – new SQL/XML functions in DB2 9
XMLPARSE: parses character/BLOB data and produces an XML value.
  INSERT INTO T1(XMLDOC) VALUES (XMLPARSE(DOCUMENT '<a>some XML doc</a>'))
XMLVALIDATE: validates an XML value against an XML schema and type-annotates the XML value.
  INSERT INTO T1(XMLDOC) VALUES (XMLVALIDATE(XMLPARSE(DOCUMENT '<a>...</a>')
    ACCORDING TO XMLSCHEMA ID ibm.invoice))
XMLEXISTS: determines whether an XQuery returns a result (i.e. a non-empty sequence of one or more items).
  SELECT ID FROM T1 WHERE
    XMLEXISTS('$d/dept[@bldg = 101]' PASSING xmldoc AS "d")
XMLQUERY: executes an XQuery and returns the result sequence.
  SELECT ID, XMLQUERY('for $i in $d/dept
                       let $j := $i//name
                       return $j' PASSING xmldoc AS "d")
  FROM T1
XMLTABLE: executes an XQuery and returns the result sequence as a relational table (if possible).
  Refer to the following slides.
Querying XML data – with SQL/XML
Examples
-- Create a new table with an XML data type column
CREATE TABLE dept(deptID char(8) primary key, deptdoc xml)
-- Plain SQL to get full XML document(s)
SELECT deptID, deptdoc FROM dept WHERE deptID = 'PR37'
-- SQL with embedded XPath or XQuery expression
SELECT deptID,
  XMLQUERY('for $i in $d/dept
            let $j := $i//name
            return $j' passing deptdoc as "d")
FROM dept
WHERE deptID LIKE 'PR%'
  AND XMLEXISTS('$d/dept[@bldg = 101]' passing deptdoc as "d")
Querying XML data – with SQL/XML
XMLTABLE() generates a table from XML data
<dept bldg="101">
  <employee id="901">
    <name><first>John</first><last>Doe</last></name>
    <office>344</office>
  </employee>
  <employee id="902">
    <name><first>Peter</first><last>Pan</last></name>
    <office>216</office>
  </employee>
</dept>
SELECT X.* FROM dept,
  XMLTABLE('$d/dept/employee' passing deptdoc as "d"
    COLUMNS
      "empID"     INTEGER     PATH '@id',
      "firstname" VARCHAR(30) PATH 'name/first',
      "lastname"  VARCHAR(30) PATH 'name/last',
      "office"    INTEGER     PATH 'office') AS "X"
Result:
empID   firstname   lastname   office
901     John        Doe        344
902     Peter       Pan        216
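The row/column mapping that XMLTABLE performs can be mimicked outside the database in a few lines. The following Python sketch is only an analogy built on the standard library: `xml_table` and `COLUMNS` are hypothetical names, the row path is written relative to the root element (so `$d/dept/employee` becomes `employee`), and only child paths and `@attribute` are supported.

```python
import xml.etree.ElementTree as ET

DEPT_DOC = """<dept bldg="101">
  <employee id="901"><name><first>John</first><last>Doe</last></name><office>344</office></employee>
  <employee id="902"><name><first>Peter</first><last>Pan</last></name><office>216</office></employee>
</dept>"""

COLUMNS = [  # (column name, Python type standing in for the SQL type, path relative to the row node)
    ("empID", int, "@id"),
    ("firstname", str, "name/first"),
    ("lastname", str, "name/last"),
    ("office", int, "office"),
]


def xml_table(xml_text, row_path, columns):
    """Produce one dict per node matched by row_path, one entry per column."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(row_path):          # the row-generating expression
        row = {}
        for name, cast, path in columns:         # the COLUMNS ... PATH clauses
            raw = node.get(path[1:]) if path.startswith("@") else node.findtext(path)
            row[name] = cast(raw) if raw is not None else None  # missing value -> NULL
        rows.append(row)
    return rows


rows = xml_table(DEPT_DOC, "employee", COLUMNS)
```

Each employee element becomes one row, and every column value is found by evaluating its path relative to that element, which mirrors how the PATH expressions in the SQL statement are relative to `$d/dept/employee`.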
Querying XML data – with SQL/XML
Visual SQL Builder integrated in DB2 Developer Workbench (Eclipse IDE)
Querying XML data – with SQL/XML
Graphically create SQL and SQL/XML queries with the support of an Expression Builder
XML Data Modification
§ The Data Manipulation Language (DML) only supports full-document replacement (no XUpdate standard yet):
update dept set deptdoc = ? where …
§ DB2 provides a stored procedure for sub-document-level updates:
– Value updates of text nodes or attributes
– Replace elements or document subtrees
– Delete any node or subtree
– Insert (append) any element or subtree
– Document to update: identified by SQL or XQuery
– New values or elements can be static, or produced on the fly by SQL or XQuery
– One or multiple updates in 1 stored procedure call
XML Data Modification
"Update the phone number of employee 301"
Call DB2XMLFUNCTIONS.XMLUPDATE (
'<updates>
<update action="replace" col="1"
path="/dept/employee[@id=301]/phone">
<phone>408-463-4963</phone>
</update>
(…)
</updates>',
'Select deptdoc from dept where deptid=1006','',?,?);
Annotations:
– <updates>: 1 or more updates
– action: type of update, action = replace | append | delete
– path: what to update
– <phone>…</phone>: new value (static)
– Select statement: which doc to update
XML Data Modification
"Update the phone number of employee 301"
Call DB2XMLFUNCTIONS.XMLUPDATE (
'<updates>
<update using="XQUERY" action="replace" col="1"
path="/dept/employee[@id=301]/phone">
for $i in db2-fn:xmlcolumn("T.col")/Phone
where $i/change/emp/@id = 301
return $i/phone
</update>
(…)
</updates>',
'Select deptdoc from dept where deptid=1006','',?,?);
Annotations:
– using = XQUERY | SQL
– Body of <update>: new value, produced by an XQuery
Agenda – Start the hybrid engine
• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• < XML Query Execution />
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary
DB2 Query Operators (Explain)
Extended hybrid optimizer
§ Base access methods: TBSCAN, IXSCAN, FETCH
§ Joins: NLJOIN, MSJOIN, HSJOIN
§ Aggregation: GRPBY
§ Temping: TEMP
§ Sorting: SORT
§ Index ANDing, dynamic bitmap indexing: IXAND
§ Index ORing, list prefetch: RIDSCN
§ XML scan and navigation: XSCAN (new!)
§ XML index access: XISCAN (new!)
§ XML index ANDing: XANDOR (new!)
§ Table queues (xTQ)
Matthias Nicola, IBM SVL; Tom Eliaz
XSCAN – XML Scan Operator
Query: /customerinfo[name="Matt Foreman" and phone="905-555-4789"]
<customerinfo>
  <name>Matt Foreman</name><phone>905-555-4789</phone>
</customerinfo>
No index. XSCAN = XML Document Scan
• Navigates 1 document at a time
• Evaluates the expression /customerinfo[…]
• Returns XML nodes that satisfy the expression
• Takes input via sideways passing NLJOIN
Access plan:
        RETURN
          |
        NLJOIN
          |
        /-+-\
       /     \
   TBSCAN   XSCAN
     |
 TABLE: MNICOLA.MYTEST
XISCAN – XML Index Scan Operator
Query: /customerinfo[name="Matt Foreman" and phone="905-555-4789"]
<customerinfo>
  <name>Matt Foreman</name><phone>905-555-4789</phone>
</customerinfo>
1 index, on name. Find matching rows efficiently using XML indexes
• Evaluates the expression /customerinfo[name="Matt Foreman"]
• A VARCHAR HASHED index may produce false positives → eliminated by XSCAN
• Only for value comparisons, not for "structural" predicates (element existence)
Access plan:
        RETURN
          |
        NLJOIN
          |
        /-+-\
       /     \
   FETCH    XSCAN
     |
  /--+--\
 /       \
RIDSCN   TABLE: MNICOLA.MYTEST
  |
 SORT
  |
XISCAN
Matthias Nicola, IBM SVL; Tom Eliaz
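Why the XSCAN still sits on top of the index access can be shown with a toy model. The Python sketch below is an illustration only, not DB2's implementation: the table contents, the function names and above all the deliberately weak hash (`len`) are assumptions, chosen so that a collision is guaranteed and the false positive is visible.

```python
import xml.etree.ElementTree as ET

TABLE = {  # hypothetical rows: row id -> XML document
    1: "<customerinfo><name>Matt Foreman</name><phone>905-555-4789</phone></customerinfo>",
    2: "<customerinfo><name>Anna Schmidt</name><phone>905-555-0000</phone></customerinfo>",
    3: "<customerinfo><name>Bob Lee</name><phone>905-555-4789</phone></customerinfo>",
    4: "<customerinfo><name>Matt Foreman</name><phone>416-555-1358</phone></customerinfo>",
}


def weak_hash(value):
    # Stand-in for the hash of a hashed varchar index; 'Matt Foreman' and
    # 'Anna Schmidt' are both 12 characters long, so they collide on purpose.
    return len(value)


def build_index(table):
    """Index on /customerinfo/name: hash of the value -> row ids."""
    index = {}
    for row_id, doc in table.items():
        for name in ET.fromstring(doc).findall("name"):
            index.setdefault(weak_hash(name.text), []).append(row_id)
    return index


def xiscan(index, name):
    """Index lookup; the sorted row ids stand in for XISCAN -> SORT -> RIDSCN."""
    return sorted(set(index.get(weak_hash(name), [])))


def xscan(doc, name, phone):
    """Navigate one document and evaluate the full predicate."""
    root = ET.fromstring(doc)
    return (root.tag == "customerinfo"
            and any(n.text == name for n in root.findall("name"))
            and any(p.text == phone for p in root.findall("phone")))


def run_query(table, index, name, phone):
    candidates = xiscan(index, name)
    # FETCH each candidate row, then let XSCAN re-check it.
    matches = [rid for rid in candidates if xscan(table[rid], name, phone)]
    return candidates, matches


candidates, matches = run_query(TABLE, build_index(TABLE), "Matt Foreman", "905-555-4789")
```

The index narrows four rows down to three candidates; the final scan then removes both the hash collision (row 2) and the row whose phone number does not match (row 4), because the phone predicate is not covered by the index on name.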
XANDOR – Pivot XML ANDing (and ORing)
Query: /customerinfo[name="Matt Foreman" and phone="905-555-4789"]
2 indexes, on name and phone. Efficient XML index ANDing using a pivot algorithm
• Combines the results of 2 or more XISCANs
• Only for equality predicates without wildcards; the traditional IXAND is used otherwise
Access plan:
        RETURN
          |
        NLJOIN
          |
        /-+-\
       /     \
   FETCH    XSCAN
     |
  /--+--\
 /       \
RIDSCN   TABLE: MNICOLA.MYTEST
  |
 SORT
  |
XANDOR
  |
 /-+-\
/     \
XISCAN XISCAN
Matthias Nicola, IBM SVL; Tom Eliaz
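The slide does not spell out the pivot algorithm, so the following Python sketch shows one common way such an AND over several index scans can work, under the assumption that every scan delivers its row ids in ascending order without duplicates. The function `pivot_and` is hypothetical and is not taken from DB2.

```python
from bisect import bisect_left


def pivot_and(*scans):
    """Intersect sorted, duplicate-free lists of integer row ids.

    The largest id seen so far is the pivot; every scan skips ahead to the
    first id >= pivot. If all scans land on the pivot it is a match, otherwise
    the scan that overshot supplies the new, larger pivot.
    """
    if not scans or any(not scan for scan in scans):
        return []
    pos = [0] * len(scans)
    result = []
    pivot = max(scan[0] for scan in scans)
    while True:
        agreed = True
        for i, scan in enumerate(scans):
            pos[i] = bisect_left(scan, pivot, pos[i])   # skip ids below the pivot
            if pos[i] == len(scan):
                return result                           # one scan is exhausted
            if scan[pos[i]] > pivot:
                pivot = scan[pos[i]]                    # overshoot: new pivot
                agreed = False
        if agreed:
            result.append(pivot)
            pivot += 1                                  # continue after the match


name_hits = [1, 2, 4, 7, 9]      # hypothetical row ids from the index on name
phone_hits = [2, 3, 4, 9, 12]    # hypothetical row ids from the index on phone
both = pivot_and(name_hits, phone_hits)
```

The appeal of this style of ANDing is that each scan can jump over long runs of ids that cannot match, instead of materializing and comparing complete result lists.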
Agenda – Start the hybrid engine
• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• < XML Schema support (XSR) />
• Utilities, Tools & APIs
• Summary
DB2 XML Schema Repository (XSR)
§ Database needs a schema repository
– Stable & high-performance access to schemas for validation at XML insert/update time
– Support for XML Schema management
§ DB2 XML Schema Repository (XSR)
– XML Schemas are registered
  • Consistent set of .xsd documents
– Registered schema identification
  • An SQL 2-part name
  • The URL the schema is externally known as (e.g. used in schemaLocation attributes)
  • The "primary namespace"
– Also used by shred
  • Stores the annotated schema
  • Internal formats to make shredding efficient
– Also DTDs and external entities
  • Used for entity reference resolution and defaults
  • NOT used for validation
Register a Schema – via DB2 Control Center
Already registered XML Schema documents
Register a new XML Schema via wizard
Register a new Schema – via DB2 Developer Workbench
Invoke the schema registration wizard
Browse registered schemas
XMLVALIDATE
create table dept(deptID char(8) primary key, deptdoc xml);
Validation is optional.
insert into dept(deptdoc) values (xmlvalidate(?))
The schema location found in the document can be overridden by referencing a schema from DB2's schema repository:
insert into dept(deptdoc) values (xmlvalidate(? according to xmlschema id ibm.invoice))
insert into dept(deptdoc) values (xmlvalidate(? according to xmlschema uri 'http://my.world.com'))
Schema evolution with DB2 9
§ There is no agreement on how to evolve schemas, because the general problem is very complex
§ Applications do it anyway, because there are point solutions
§ Enable schema evolution (don't prevent it)
§ The DB2 XML Schema Repository is very flexible
– Register conflicting schemas
– Register schemas with the same namespace
– Register schemas with the same URL
Shredding into relational tables
§ There are still reasons to shred XML:
– Co-existence with legacy applications
– Relational processing is faster than XML
– Analytics/cubes work over non-XML data
§ Mapping from XML to relational:
– Annotate the XML schema
– Register XML schemas in the schema repository
– Shred via CLP commands or stored procedure calls
§ Replaces XML Extender shred (XML collection)
– Faster; using XML Schema
Annotation example:
<xsd:element name="phone" type="xsd:string"
  db2-xdb:rowSet="employee_tab"
  db2-xdb:column="phone_col"/>
Define XML mapping rules – DB2 Developer Workbench
Invoke the annotated XSD mapping editor
Define XML mapping rules – DB2 Developer Workbench
Graphically define mapping rules from XML to a relational schema
Agenda – Start the hybrid engine
• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• < Utilities, Tools & APIs />
• Summary
XML Utilities & Tools
§ XML Import & Export
§ XML Runstats
§ XML type support in stored procedures
§ XML type supported by HADR replication
§ Control Center extensions (e.g. Index creation wizard)
§ DB2 Developer Workbench
§ and more…
Enhancements for the new XML data type
XML API enhancements
§ New XML type support added to APIs:
– JDBC, .NET, ODBC/CLI, Embedded SQL
§ SQL/XML supported by all APIs
§ XQuery supported by all APIs
– Result sequence will be treated as a resultset
– Each item will be treated as a row
Agenda – Start the hybrid engine
• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• < Summary />
Summary
§ Standards compliant & driving the standards – XML, XQuery, SQL/XML, XML Schema …
§ 100% integrated in DB2 – leveraging performance, scalability, reliability, availability …
§ 100% integrated with SQL – XML is a new SQL type
– Access relational and XML data in the same statement
§ 100% integrated with application APIs: JDBC, ODBC, .NET, embedded SQL
Summary
§ Flexibility, because that is what XML is all about …
– any document, any schema, not just the ones that are mapped to relational tables
§ pureXML storage
– XML is parsed and stored hierarchically
– shredded: using an annotated schema
– CLOB/BLOB
§ Sophisticated XML indexing
§ Broad XQuery support
– both embedded in SQL and as a primary language
§ Supports digital signatures
– signatures can be validated on retrieved documents
Users will see
1. New XML data type for columns
create table s1.t1 (c1 int, c2 xml)
2. Language bindings for the new XML type
Java, .NET, C, COBOL, embedded SQL
3. New XML indexes
create index ix1 on t1(c2) generate key using xmlpattern '/dept/emp/@empno' as sql double
4. An XML Schema/DTD repository
5. Support for XQuery as a primary language as well as:
– Support for SQL within XQuery
– Support for XQuery within SQL
– Support for new SQL/XML functions
6. Performance, scale, and everything else expected from a DBMS
References
DB2 9 on the net: www.ibm.com/db2/viper
Articles @ IBM developerWorks: http://www-128.ibm.com/developerworks/db2/
Thank you for your attention
Holger Seubert
Software Engineer
DB2 Information Management