D15 - What's New for pureXML in DB2 9.7


  • 7/31/2019 D15 - What's New for PureXML in DB2 9.7


    May 15, 2009, 12:30 pm to 1:30 pm
    Platform: DB2 for Linux, UNIX, and Windows

    Matthias Nicola, IBM Silicon Valley Lab

    Session: D15, What's new for pureXML in DB2 9.7


    Key Points

    Learn about new pureXML capabilities planned for DB2 and how you can benefit from them.

    Learn new ways of managing, partitioning, and scaling the XML data that you accumulate.

    Learn how to support and exploit XML in data warehouses.

    Learn how you can query XML data with plain SQL queries, without any XPath or XQuery involved!

    Learn how IBM might respond to specific feature requests from DBAs and application developers.


    Agenda

    - Recap: pureXML in DB2 9 and 9.5
    - Admin functions for the DBA
    - Compressing XML Data and Indexes
    - SQL Access to XML Data
    - Partitioning and Clustering with XML Data
      - XML in Range-Partitioned Tables
      - XML in MDC Tables
      - XML in Partitioned Databases (DPF)
    - XML in User-Defined Functions
    - Online CREATE and REORG of XML Indexes
    - Bulk Decomposition


    Recap: pureXML in DB2 9.1 and 9.5

    create table customer (cid integer, info XML)

    insert into customer (cid, info) values (?,?)

    select cid, info from customer

    select xmlquery('$INFO/customer/name')
    from customer
    where cid > 1234
      and xmlexists('$INFO/customer/addr[zip = 95123]')

    xquery
    for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customer
    where $i/addr/zip = 95123
    return $i/name
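As a rough illustration of what the SQL/XML query above computes, here is a Python sketch (using Python's `xml.etree.ElementTree`, not DB2) that filters hypothetical CUSTOMER rows by `cid > 1234` and `addr/zip = 95123` and returns the customer names. The sample documents and the helper names `zip_matches` and `names_in_zip` are assumptions for illustration only; DB2 evaluates such queries with its own XQuery engine and can use XML indexes for them.

```python
import xml.etree.ElementTree as ET

# Hypothetical CUSTOMER rows: (cid, info), with info held as XML text.
ROWS = [
    (1200, "<customer><name>Ann</name><addr><zip>95123</zip></addr></customer>"),
    (1300, "<customer><name>Bob</name><addr><zip>95141</zip></addr></customer>"),
    (1400, "<customer><name>Cy</name><addr><zip>95123</zip></addr></customer>"),
]


def zip_matches(customer: ET.Element, wanted: int) -> bool:
    """Rough analogue of xmlexists('$INFO/customer/addr[zip = 95123]').

    Comparing untyped XML text with a number in XQuery is a numeric
    comparison, so each zip is converted to a number; non-numeric values
    are simply treated as non-matching in this sketch.
    """
    for z in customer.findall("addr/zip"):
        try:
            if float(z.text) == wanted:
                return True
        except (TypeError, ValueError):
            continue
    return False


def names_in_zip(rows, min_cid: int, wanted_zip: int) -> list:
    """Return /customer/name values for rows with cid > min_cid and a matching zip."""
    result = []
    for cid, info in rows:
        customer = ET.fromstring(info)  # root element is <customer>
        if cid > min_cid and zip_matches(customer, wanted_zip):
            result.extend(n.text for n in customer.findall("name"))
    return result


print(names_in_zip(ROWS, 1234, 95123))  # ['Cy']
```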


    Recap: pureXML in DB2 9.1 and 9.5 (continued)

    create index idx1 on customer(info) generate key using
      xmlpattern '/customer/addr/zip' as sql varchar(5)

    update customer set info = ? where ...

    update customer set info = xmlquery('copy $new := $INFO
                                         modify do replace value of $new/customer/addr/zip
                                                with 95141
                                         return $new')
    where ...;

    Plus: XML Schema support, utilities, shredding, XSLT, etc.
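The XQuery transform expression in the second UPDATE follows a copy, modify, return pattern: the stored document is copied, the copy is changed, and the changed copy is returned. The Python sketch below mimics that pattern on one hypothetical document using `copy.deepcopy` and `xml.etree.ElementTree`; `replace_zip` is a hypothetical helper, not a DB2 API, and the surrounding UPDATE that writes the result back into the row is not modeled.

```python
import copy
import xml.etree.ElementTree as ET


def replace_zip(customer: ET.Element, new_zip: str) -> ET.Element:
    """Rough analogue of the XQuery transform expression
         copy $new := $INFO
         modify do replace value of $new/customer/addr/zip with 95141
         return $new
    The input element is never modified; a changed deep copy is returned.
    """
    new = copy.deepcopy(customer)      # copy $new := $INFO
    targets = new.findall("addr/zip")  # $new/customer/addr/zip (root is <customer>)
    if len(targets) != 1:
        # "replace value of" needs exactly one target node; XQuery raises an error otherwise.
        raise ValueError(f"expected exactly one zip element, found {len(targets)}")
    targets[0].text = new_zip          # do replace value of ... with ...
    return new                         # return $new


info = ET.fromstring("<customer><name>Ann</name><addr><zip>95123</zip></addr></customer>")
updated = replace_zip(info, "95141")
print(ET.tostring(updated, encoding="unicode"))
# <customer><name>Ann</name><addr><zip>95141</zip></addr></customer>
```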


    Recap: XML Storage in DB2 9.1

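    [Figure: a table with an XML column DOC. Its relational rows (e.g. PR27, PR28, ACC) are in the DAT object (relational data), the regions index is in the INX object (indexes), and the XML documents are in the XDA object (XML data).]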

    The regions index facilitates access to document regions in the XML data area.

    Like LOBs, XML data is stored separately from the base table.

    Unlike LOBs, XML data is buffered in the buffer pool.


    DB2 9.5: Base Table Inlining for small docs

    [Figure: the same storage layout as on the previous slide, with small documents stored inline in the base table rows of the DAT object instead of in the XDA object.]

    Documents that are small enough can be stored in the base table, where they can be compressed!

    The XDA object cannot be compressed in DB2 9.5.



    Admin Functions for Inlining

    ADMIN_IS_INLINED(xmlcol)
      1 if the document in the current row is inlined
      0 if the document in the current row is not inlined

    ADMIN_EST_INLINE_LENGTH(xmlcol)
      Inline length that would allow the XML document in the current row to be inlined (estimated value)
      -1 if the document is too large to be inlined
      -2 for documents inserted in previous versions of DB2

    Both functions return NULL if the XML column is NULL.

    CREATE TABLE customer(id int, xmlcol XML INLINE LENGTH 1000);


    Admin Functions for Inlining

    SELECT count(xmlcol) AS total,
           sum(ADMIN_IS_INLINED(xmlcol)) AS inlined
    FROM customer;

    TOTAL       INLINED
    ----------- -----------
              6           2

      1 record(s) selected.

    SELECT id, ADMIN_IS_INLINED(xmlcol) AS inlined
    FROM customer;

    ID          INLINED
    ----------- -----------
           1000           1
           1001           0
           1002           1
           1003           0
           1004           0
           1005           0

      6 record(s) selected.

    2 out of 6 documents are inlined.


    Admin Functions for Inlining

    SELECT id, ADMIN_IS_INLINED(xmlcol) AS inlined,
           ADMIN_EST_INLINE_LENGTH(xmlcol) AS inline_length
    FROM customer;

    ID          INLINED     INLINE_LENGTH
    ----------- ----------- -------------
           1000           1           770
           1001           0          2345
           1002           1           796
           1003           0          1489
           1004           0          1910
           1005           0            -1

      6 record(s) selected.

    ID 1000: inlined, uses 770 bytes.
    ID 1003: not inlined; requires an inline length of at least 1489 (estimated).
    ID 1005: too large to be inlined for the given page size.
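A DBA might use these estimates to choose an inline length. The Python sketch below works on the six estimates from the output above, assuming a document fits when its estimate is no larger than the INLINE LENGTH; `rows_that_fit` and `smallest_inline_length` are hypothetical helpers, not DB2 functions, and the largest usable inline length also depends on the page size.

```python
import math

# (ID, ADMIN_EST_INLINE_LENGTH) pairs copied from the query output above.
ESTIMATES = [(1000, 770), (1001, 2345), (1002, 796), (1003, 1489), (1004, 1910), (1005, -1)]


def rows_that_fit(estimates, inline_length: int) -> list:
    """IDs whose estimated inline length is within inline_length.

    Negative results (-1: too large for the page size, -2: no estimate for
    data inserted by an earlier DB2 version) never count as fitting.
    """
    return [rid for rid, est in estimates if 0 <= est <= inline_length]


def smallest_inline_length(estimates, fraction: float):
    """Smallest candidate INLINE LENGTH that covers at least `fraction` of all
    rows, or None if that many rows can never be inlined."""
    fitting = sorted(est for _, est in estimates if est >= 0)
    needed = math.ceil(fraction * len(estimates))
    if needed == 0:
        return 0
    if needed > len(fitting):
        return None
    return fitting[needed - 1]


print(rows_that_fit(ESTIMATES, 1000))          # [1000, 1002]
print(smallest_inline_length(ESTIMATES, 0.5))  # 1489
print(smallest_inline_length(ESTIMATES, 1.0))  # None
```

With an inline length of 1000, the sketch reports IDs 1000 and 1002, which agrees with the 2 out of 6 inlined documents for the `INLINE LENGTH 1000` table.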



    XML Data & Index Compression

    - Compression of the DAT object
    - Compression of the XDA object*
    - Separate compression dictionaries for DAT and XDA*
    - Compression of any user-defined index*
    - No compression of the regions index or MDC block indexes

    CREATE TABLE customer(id int, xmlcol XML) COMPRESS YES;

    *See next slide for details.



    XML Data & Index Compression (continued)

    - XDA compression is not available for XML columns that were created in a previous version of DB2.
      Move the XML data to a new XML column first, e.g. with an online table move (SYSPROC.ADMIN_MOVE_TABLE).
    - Default: an index is compressed if its table is compressed. But index compression can be controlled separately:

        ALTER INDEX myxmlidx COMPRESS YES;
        CREATE INDEX myxmlidx2 ON ... COMPRESS NO;

    - REORG RESETDICTIONARY rebuilds the dictionary for relational data only.
    - REORG LONGLOBDATA RESETDICTIONARY rebuilds the dictionaries for XML and relational data.


    XDA Compression Saves 60% to 80% of the Storage Space

    XDA Compression Ratio for Various XML Data Sets

    [Chart: XDA compression ratios for six data sets: XBench, TPoX, DITA, and three data sets from DB2 customers (Customer A, Customer B, Customer C). Bar values: 61%, 63%, 63%, 74%, 77%, 77%.]


    Document sizes in these data sets range from 2 KB to 100 MB (annotated ranges: 2 KB to 20 KB, 10 KB to 100 KB, 20 KB to 10 MB, and 10 MB to 100 MB).
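The savings percentage behind these numbers is 100 * (uncompressed - compressed) / uncompressed. The short Python sketch below applies that formula and its inverse; the page counts and the 500 GB starting size are hypothetical figures chosen for illustration, not measurements from the chart.

```python
def savings_pct(uncompressed: float, compressed: float) -> float:
    """Percentage of space saved by compression (e.g. XDA pages before vs. after)."""
    return 100 * (uncompressed - compressed) / uncompressed


def remaining_size(uncompressed: float, saved_pct: float) -> float:
    """Space still used after compression saves saved_pct percent."""
    return uncompressed * (100 - saved_pct) / 100


print(savings_pct(10_000, 2_600))  # 74.0
for pct in (60, 80):
    print(pct, remaining_size(500, pct))  # 60 200.0, then 80 100.0
```

In other words, at 60% to 80% savings a hypothetical 500 GB XDA object would shrink to roughly 100 GB to 200 GB.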



    SQL Access to Relational Data

    [Figure: SQL queries access a relational XMLTABLE view (e.g. a row John, Doe, 344) that is defined over XML data.]

    Create a relational view over XML data, then use plain old SQL queries against that view.

    Problem in DB2 9.5: SQL predicates on such a view cannot use XML indexes on the source data, which leads to table scans.

    Now (DB2 9.7) the problem is solved!


    SQL Access to XML Data

    [Figure: an XML document in table dept with two employees: John Doe (office 344) and Peter Pan (office 216).]

    create table dept (doc XML);

    CREATE VIEW empview(empid, firstname, lastname, office)
    AS SELECT X.* FROM dept,
       XMLTABLE('$DOC/dept/employee' COLUMNS
          empid     INTEGER     PATH '@id',
          firstname VARCHAR(30) PATH 'name/first',
          lastname  VARCHAR(30) PATH 'name/last',
          office    INTEGER     PATH 'office') AS X

    EMPID  FIRSTNAME  LASTNAME  OFFICE
    901    John       Doe       344
    902    Peter      Pan       216

    select lastname, office from empview where empid = 901;

    create index idx1 on dept(doc) generate keys using
       xmlpattern '/dept/employee/@id' as sql double;
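To make the mapping from XML to view rows concrete, here is a Python sketch that produces the same rows as the XMLTABLE view for a hypothetical dept document, reconstructed from the view's XPath expressions and the view contents shown above. It mirrors only the row and column mapping (`empview_rows` is a hypothetical helper); it says nothing about how DB2 9.7 uses the XML index idx1 for the `empid = 901` predicate, which is the point of this slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical dept document, reconstructed from the view definition and contents.
DEPT_DOC = (
    "<dept>"
    "<employee id='901'><name><first>John</first><last>Doe</last></name><office>344</office></employee>"
    "<employee id='902'><name><first>Peter</first><last>Pan</last></name><office>216</office></employee>"
    "</dept>"
)


def _int_or_none(text):
    # XMLTABLE produces NULL when a column path finds nothing; model NULL as None.
    return None if text is None else int(text)


def empview_rows(doc_text: str) -> list:
    """One tuple per /dept/employee element, following the view's COLUMNS clause."""
    dept = ET.fromstring(doc_text)  # root element is <dept>
    return [
        (
            _int_or_none(emp.get("id")),          # empid     INTEGER     PATH '@id'
            emp.findtext("name/first"),           # firstname VARCHAR(30) PATH 'name/first'
            emp.findtext("name/last"),            # lastname  VARCHAR(30) PATH 'name/last'
            _int_or_none(emp.findtext("office")), # office    INTEGER     PATH 'office'
        )
        for emp in dept.findall("employee")       # row-generating expression /dept/employee
    ]


# select lastname, office from empview where empid = 901
print([(last, office) for empid, _, last, office in empview_rows(DEPT_DOC) if empid == 901])
# [('Doe', 344)]
```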


    Data Warehousing and XML

    1. Accumulating large amounts of XML in operational systems?
       You eventually need to analyze and warehouse that data, even without shredding.

    2. Looking for ways to improve the flexibility of existing warehouses?
       Use XML columns for flexible dimensions.

    3. Need to ingest XML data into a relational warehouse more efficiently?

    All pureXML features are available with DPF.


    A Relational Star Schema

    STORE: STOREKEY, STORE_NUMBER, CITY, STATE, DISTRICT, REGION
    PERIOD: PERKEY, CALENDAR_DATE, DAY_OF_WEEK, WEEK, PERIOD, YEAR, HOLIDAY_FLAG, WEEK_ENDING_DATE, MONTHNAME
    CUSTOMER: CUSTKEY, ADDRESS, C_CITY, C_STATE, ZIP, PHONE, AGE_LEVEL, AGE_LEVEL_DESC, INCOME_LEVEL, INCOME_LEVEL_DESC, MARITAL_STATUS, GENDER, DISCOUNT
    PRODUCT: PRODKEY, UPC_NUMBER, PACKAGE_TYPE, FLAVOR, FORM, CATEGORY, SUB_CATEGORY, CASE_PACK, PACKAGE_SIZE, ITEM_DESC, P_PRICE, CATEGORY_DESC, P_COST, SUB_CATEGORY_DESC
    PROMOTION: PROMOKEY, PROMOTYPE, PROMODESC, PROMOVALUE, PROMOVALUE2, PROMO_COST
    DAILY_SALES: PERKEY, PRODKEY, STOREKEY, CUSTKEY, PROMOKEY, QUANTITY_SOLD, EXTENDED_PRICE, EXTENDED_COST, SHELF_LOCATION, SHELF_NUMBER, START_SHELF_DATE, SHELF_HEIGHT, SHELF_WIDTH, (more)
    DAILY_FORECAST: PERKEY, STOREKEY, PRODKEY, QUANTITY_FORECAST, EXTENDED_PRICE_FORECAST, EXTENDED_COST_FORECAST


    A Relational Star Schema, Extended with XML

    The same star schema, with an additional DOC column of type XML in six of its tables.


    XML in DPF, RP, MDC Tables

    All of the following have to be relational columns:
    - DPF distribution key
    - Range partitioning key
    - MDC clustering columns

    The XML column is payload in a DPF, RP, or MDC table:
    - You cannot distribute, partition, or organize by XML values.
    - You can extract XML values into relational columns, then use those to distribute, partition, or organize.
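The last point can be sketched in Python: the application (an assumption for this sketch) copies a value out of each XML document into a relational column, and only that relational column decides the range partition. The `<order>`/`<date>` structure, the ORDERDATE column, the two yearly ranges, and the helpers `extract_order_date` and `partition_for` are all hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical range partitions on a relational ORDERDATE column: (name, low, high), inclusive.
PARTITIONS = [
    ("p2008", datetime.date(2008, 1, 1), datetime.date(2008, 12, 31)),
    ("p2009", datetime.date(2009, 1, 1), datetime.date(2009, 12, 31)),
]


def extract_order_date(doc_text: str) -> datetime.date:
    """Copy /order/date out of the XML so it can be stored in a relational column."""
    return datetime.date.fromisoformat(ET.fromstring(doc_text).findtext("date"))


def partition_for(orderdate: datetime.date) -> str:
    """Pick the range that holds the relational key; the XML itself plays no part."""
    for name, low, high in PARTITIONS:
        if low <= orderdate <= high:
            return name
    raise ValueError(f"{orderdate} is outside all defined ranges")


doc = "<order><date>2009-05-15</date><qty>3</qty></order>"
row = (extract_order_date(doc), doc)  # (ORDERDATE, DOC): relational key plus XML payload
print(row[0], partition_for(row[0]))  # 2009-05-15 p2009
```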


    Indexes in MDC and RP Tables

    Both MDC block indexes and XML indexes can be used in the same query: index ANDing plans that combine block indexes and XML indexes!

    Range-partitioned tables:
    - Relational indexes can be local (partitioned) or global (non-partitioned) indexes.
    - The XML regions index is always a local index.
    - User-defined XML indexes are (for now) global indexes.
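Logically, index ANDing keeps only the rows that every participating index qualifies. The Python sketch below shows that effect with hypothetical (block, slot) row identifiers: the block index contributes qualifying blocks, the XML index contributes qualifying rows, and the result is their overlap. The predicates in the comments and the `index_and` helper are hypothetical, and DB2's internal ANDing implementation differs; only the logical result is modeled.

```python
# Hypothetical row IDs modeled as (block, slot) pairs. In an MDC table a block
# index identifies whole blocks, while an XML index identifies individual rows.


def index_and(qualifying_blocks: set, xml_index_rids: set) -> list:
    """Keep only the rows found by the XML index that also lie in a block
    found by the block index, i.e. rows that satisfy both predicates."""
    return sorted(rid for rid in xml_index_rids if rid[0] in qualifying_blocks)


blocks_for_region = {1, 3}                       # e.g. block index on an MDC column region = 'West'
rows_for_zip = {(1, 4), (2, 7), (3, 0), (3, 9)}  # e.g. XML index on /customer/addr/zip = 95123
print(index_and(blocks_for_region, rows_for_zip))  # [(1, 4), (3, 0), (3, 9)]
```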


    XML in User Defined Functions

    - The XML data type is allowed for parameters and variables in UDFs.
    - UDFs can manipulate XML data without XML parsing.
    - You can encapsulate XML operations in a UDF, for example to:
      - extract XML element or attribute values
      - update selected elements or attributes
      - use table UDFs to produce relational tables from XML documents
      - etc.


    Scalar UDF with XML

    CREATE FUNCTION getname(doc XML)
    RETURNS VARCHAR(25)
    BEGIN ATOMIC
      RETURN XMLCAST(XMLQUERY('$d/customerinfo/name' PASSING doc AS "d")
                     AS VARCHAR(25));
    END#

    SELECT getname(info) AS name FROM customer
    WHERE cid = 1002#

    NAME
    -------------------------
    Jim Noodle

      1 record(s) selected.
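For comparison, here is a Python sketch of what `getname` computes for a single document. The document is hypothetical (only the name Jim Noodle comes from the output above), the Python function is not the DB2 UDF, and the VARCHAR(25) length limit is not modeled.

```python
import xml.etree.ElementTree as ET


def getname(doc_text: str):
    """Rough analogue of getname(): XMLCAST of /customerinfo/name to a string.

    XMLCAST yields NULL (None here) for an empty sequence and an error for
    more than one item; the string value of an element is its descendant text.
    """
    names = ET.fromstring(doc_text).findall("name")  # root is <customerinfo>
    if not names:
        return None
    if len(names) > 1:
        raise ValueError("XMLCAST needs at most one item")
    return "".join(names[0].itertext())


# Hypothetical document for customer 1002.
print(getname("<customerinfo><name>Jim Noodle</name></customerinfo>"))  # Jim Noodle
```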


    Table UDF with XML

    CREATE FUNCTION getphone(doc XML)
    RETURNS TABLE(type VARCHAR(10), number VARCHAR(20))
    BEGIN ATOMIC
      RETURN
      SELECT type, number
      FROM XMLTABLE('$d/customerinfo/phone' PASSING doc AS "d"
             COLUMNS
                type   VARCHAR(10) PATH '@type',
                number VARCHAR(20) PATH '.');
    END#

    SELECT cid, p.type, p.number
    FROM customer, TABLE(getphone(info)) p
    WHERE cid = 1004#

    CID         TYPE       NUMBER
    ----------- ---------- --------------------
           1004 work       905-555-4789
           1004 home       416-555-3376

      2 record(s) selected.
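The table UDF turns each phone element into one relational row. A Python sketch of the same mapping follows; the document for customer 1004 is hypothetical apart from the phone types and numbers taken from the output above, and `getphone` here is a plain Python function, not the DB2 UDF.

```python
import xml.etree.ElementTree as ET


def getphone(doc_text: str) -> list:
    """Rough analogue of the getphone() table UDF: one (type, number) row per
    /customerinfo/phone element, from PATH '@type' and PATH '.'."""
    root = ET.fromstring(doc_text)  # root is <customerinfo>
    return [(p.get("type"), "".join(p.itertext())) for p in root.findall("phone")]


# Hypothetical document for customer 1004, using the phone numbers shown above.
doc_1004 = (
    "<customerinfo>"
    "<phone type='work'>905-555-4789</phone>"
    "<phone type='home'>416-555-3376</phone>"
    "</customerinfo>"
)

# SELECT cid, p.type, p.number FROM customer, TABLE(getphone(info)) p WHERE cid = 1004
for ptype, number in getphone(doc_1004):
    print(1004, ptype, number)
# 1004 work 905-555-4789
# 1004 home 416-555-3376
```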


    Online CREATE and REORG of XML Indexes

    create table customer (cid integer, info XML);

    -- Writes are not blocked:
    create index idx1 on customer(info) generate key using
      xmlpattern '/customerinfo/addr/zip' as sql varchar(5);

    -- Writes are not blocked:
    reorg indexes all for table customer allow write access;


    Bulk Decomposition

    New: shred all documents identified by a query. LOAD first, then use a CLP command or a stored procedure call to shred:

    DECOMPOSE XML DOCUMENTS IN 'SELECT cid, info FROM customer'
      XMLSCHEMA db2admin.customerxsd
      MESSAGES /home/matthias/errorreport.xml;

    CALL XDB_DECOMP_XML_FROM_QUERY('DB2ADMIN', 'CUSTOMERXSD',
      'SELECT cid, info FROM customer',
      0, 0, 0, NULL, NULL, 1,
      :numInput, :numDecomposed, :errorreportBuf);
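The bookkeeping behind a bulk decomposition (count the input documents, count the ones that were shredded, report the failures) can be sketched in Python. The mapping from `/customerinfo/name` to a `(cid, name)` row, the sample rows, and the function `decompose_all` are hypothetical; DB2 derives the real mapping from the annotations in the registered XML schema (`db2admin.customerxsd` above), and this sketch does not model the procedure's other parameters. It simply records a failed document and moves on to the next one.

```python
import xml.etree.ElementTree as ET


def decompose_all(rows):
    """Shred every (cid, info) row of a query result into hypothetical target rows.

    Mapping assumed for illustration: /customerinfo/name -> (cid, name).
    Returns (num_input, num_decomposed, target_rows, errors), loosely echoing
    the numInput, numDecomposed, and error-report outputs shown above.
    """
    num_input = num_decomposed = 0
    target_rows, errors = [], []
    for cid, info in rows:
        num_input += 1
        try:
            root = ET.fromstring(info)
            name = root.findtext("name")
            if root.tag != "customerinfo" or name is None:
                raise ValueError("document does not match the mapping")
            target_rows.append((cid, name))
            num_decomposed += 1
        except (ET.ParseError, ValueError) as exc:
            errors.append((cid, str(exc)))  # record the failure, continue with the next document
    return num_input, num_decomposed, target_rows, errors


rows = [
    (1001, "<customerinfo><name>Ann</name></customerinfo>"),
    (1002, "<customerinfo><name>Jim Noodle</name></customerinfo>"),
    (1003, "<customerinfo><name>unterminated"),  # malformed XML
]
n_in, n_ok, shredded, errs = decompose_all(rows)
print(n_in, n_ok, shredded)  # 3 2 [(1001, 'Ann'), (1002, 'Jim Noodle')]
```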


    Summary

    - New admin functions to check inlining
    - Compress all your XML data and indexes
    - SQL access to XML data via XMLTABLE views
    - XML in the warehouse
    - XML in DPF, MDC, and range-partitioned tables
    - XML in user-defined functions
    - Online CREATE and REORG of XML indexes
    - Bulk decomposition


    Comprehensive coverage of pureXML in DB2 for Linux, UNIX, and Windows and DB2 for z/OS

    Available in July

    Available for pre-order now:

    http://tinyurl.com/pureXML


    Further Reading

    - "pureXML in DB2 9: Which way to query your XML data?"
      http://www.ibm.com/developerworks/db2/library/techarticle/dm-0606nicola/
    - "Query XML data that contains namespaces"
      http://www.ibm.com/developerworks/db2/library/techarticle/dm-0611saracco/
    - "XMLTABLE by Example", Parts 1 and 2
      http://www.ibm.com/developerworks/db2/library/techarticle/dm-0708nicola/
      http://www.ibm.com/developerworks/db2/library/techarticle/dm-0709nicola/
    - Update XML in DB2 9.5
      http://www.ibm.com/developerworks/db2/library/techarticle/dm-0710nicola/
    - DB2 documentation and resources
      http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp
      http://www.ibm.com/developerworks/wikis/display/db2xml/Technical+Papers+and+Articles
    - "15 best practices for pureXML performance in DB2 9"
      http://www.ibm.com/developerworks/db2/library/techarticle/dm-0610nicola/
    - "Performance of DB2 9 pureXML vs. CLOB and shredded XML storage"
      http://www.ibm.com/developerworks/db2/library/techarticle/dm-0612nicola/
    - XML Database Benchmark: Transaction Processing over XML (TPoX)
      http://tpox.sourceforge.net/
      http://tpox.sourceforge.net/Sigmod2007_TPoX.pdf


    Matthias Nicola, IBM Silicon Valley Lab


    Session: D15

    What's new for DB2 pureXML