IDUG - 07 Programming Support for PureXML

Upload: manoj-k-sardana

Post on 07-Apr-2018


  • 8/6/2019 IDUG - 07 Programming Support for PureXML

    1/77

    IBM Software Group

    Leveraging pureXML in your application

    IDUG India

    Manoj K [email protected]

    May, 2007


    IBM Software Group | DB2 Information Management Software


    Agenda

    XML data management: different ways.

    DB2 pureXML: an innovative way.

    XML data storage.

    Query language support.

    Application programming support.

    Use cases.


    XML Data Management: Traditional Approach


    XML Data Management


    DB2 pureXML: An Innovative Way


    XML Data Management

    Let's start with an example...

    Here is some data from an ordinary delimited flat file:

    47; John Doe; 58; Peter Pan; Database systems; 29; SQL; relational

    What does this data mean?


    What is XML?

    John Doe

    Peter Pan

    Database systems

    29

    SQL

    relational

    XML = eXtensible Markup Language

    XML is "self-describing data"

    XML: Describes data

    HTML: Describes display
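As a rough illustration of "self-describing data", the sketch below wraps the values of the flat-file record in XML using Python's standard library. The element and attribute names (`book`, `author`, `id`, `title`, `price`, `keyword`) are assumptions made for this example; the markup of the original slide is not preserved in the transcript.

```python
import xml.etree.ElementTree as ET

# The delimited record from the flat-file example; on its own it carries no meaning.
flat = "47; John Doe; 58; Peter Pan; Database systems; 29; SQL; relational"
fields = [f.strip() for f in flat.split(";")]

# Hypothetical interpretation of the fields (assumed, not taken from the slides).
book = ET.Element("book")
authors = ET.SubElement(book, "authors")
ET.SubElement(authors, "author", id=fields[0]).text = fields[1]
ET.SubElement(authors, "author", id=fields[2]).text = fields[3]
ET.SubElement(book, "title").text = fields[4]
ET.SubElement(book, "price").text = fields[5]
keywords = ET.SubElement(book, "keywords")
for word in fields[6:]:
    ET.SubElement(keywords, "keyword").text = word

xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)
```

Once the values are tagged, a reader (or a parser) no longer has to guess that 29 is a price and 47 an author identifier.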


    XML vs. Relational

    10 CHRISTINE SMITH 408-463-4963 52750.00

    27 MICHAEL THOMPSON 41250.00

    Department
    DEPTID  DEPTNAME
    15      Sales

    Employee
    DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
    15      27     MICHAEL    THOMPSON  NULL          41250
    15      10     CHRISTINE  SMITH     408-463-4963  52750

    Relational          XML
    Set oriented        Sequences (ordered!)
    Structured          Semi-structured
    Strong schema       Schema-chaos
    Strongly typed      Optionally typed
    Tabular data model  XML data model
    Flat                Nested, hierarchical
    3 value logic       2 value logic
    "Null"              Not there at all
    ANSI/ISO            W3C


    Why XML?

    Flexibility, Flexibility, Flexibility!

    XML is a very flexible data model:

    for structured data, semi-structured data, schema-less data

    Easy to extend: define new tags as needed

    XML is self-describing: any XML parser can "understand" it!

    Easy to transform XML documents into other formats (HTML, etc.)

    XML is vendor and platform independent

    Easy to share XML between applications, businesses, processes, ...

    Easy to "validate" XML, i.e. to check compliance with a schema: any XML parser can do it!


    XML can be a better choice than relational for...

    Data that's inherently hierarchical or nested in nature

    Example: Medical data, Bill-of-materials, etc., OO & Multi-value

    Data sets with sparsely populated attributes

    Example: FIXML, FpML, Customer profiles

    Schema evolution

    Example: Frequently changing services/products/processes

    Variable schemas, many schemas

    Example: Data integration, consolidation of diverse data sources

    Combining structured & unstructured data

    Example: CM, Life Sciences, News & Media


    Importance of XML data?

    More and more XML data generated every day

    XML is pervasive in all kinds of organizations

    Almost every sector has XML-based standards


    Where is your XML?

    In files: storage not managed and not secure

    In LOBs: content and business value locked up

    Shred to tables: complex and fragile mapping

    In XML DB: scalability & integration concerns


    XML Data Needs Relational Maturity

    XML Data Needs Protection

    Backup and recovery features to ensure continuity

    Data is protected using database security

    Simplified XML Data Access

    Centrally store and access difficult-to-retrieve data

    SQL or XQuery can be used to retrieve data

    Join XML data with its related relational data

    Search Speed

    Search documents quickly and efficiently using the proven search optimization engine of a mature database

    Optimize Existing Investments

    Use existing technology infrastructure and skills to store and manage both relational and XML


    DB2

    DB2 9 pureXML


    DB2 9: Hybrid Data Server, pureXML and Relational Storage

    [Diagram: a client application uses the DB2 client to send SQL + SQL/XML through the relational interface and XQuery through the XML interface to the DB2 9 server; its hybrid engine sits on DB2 storage for relational and XML data]

    Relational and XML data are stored differently, but closely linked

    Seamlessly join relational and XML data


    DB2 9 Summary of pureXML Support

    XML as a native data type

    PureXML storage and indexing

    XQuery and SQL/XML support

    XML Schema Repository

    Schema validation

    Application Support

    Java, C/C++, .NET, PHP, etc.

    Visual Tooling, Control Center Enhancements

    Annotated schema shredding

    DB2 Utilities: Import/Export, HADR, etc.

    and more

    DB2 9: Secure and Resilient Infrastructure for a New Breed of Agile Applications


    pureXML Usage Scenarios


    1. Industry standards and data exchange applications

    2. Web services, SOA data transport and message persistence

    3. Business object / transaction record

    4. Integration of diverse data sources

    5. Forms and workflow processing

    6. Document storage and querying

    7. XML Feeds and Web 2.0 Syndication

    8. Mapping XML in relational applications

    9. Better data model for certain types of data

    10. Rapid application prototyping and development

    and many more!


    1. Industry standards and data exchange applications

    Banking: IFX, OFX, SWIFT, SPARCS, MISMO +++

    Financial Markets: FIXML, MDDL, RIXML, FpML +++

    Insurance: ACORD XML for P&C, Life +++

    Chemical & Petroleum: Chemical eStandards, CyberSecurity, PDX Standard +++

    Healthcare: HL7, DICOM, SNOMED, LOINC, SCRIPT +++

    Life Sciences: MIAME, MAGE, LSID, HL7, DICOM, CDISC, LAB, ADaM +++

    Retail: IXRetail, UCCNET, EAN-UCC, ePC Network +++

    Electronics: PIPs, RNIF, Business Directory, Open Access Standards +++

    Automotive: ebXML, other B2B Stds.

    Telecommunications: eTOM, NGOSS, etc., Parlay Specification +++

    Energy & Utilities: IEC Working Group 14, Multiple Standards, CIM, Multispeak

    Cross Industry: PDES/STEPml, SMPI Standards, RFID, DOD XML +++


    Case 1: Industry Standard FIXML

    Buying 1000 shares of IBM stock...

    Old FIX protocol:

    8=FIX.4.2^9=251^35=D^49=AFUNDMGR^56=ABROKER^34=2^52=20030615-01:14:49^11=12345^1=111111^63=0^64=20030621^21=3^110=1000^111=50000^55=IBM^48=459200101^22=1^54=1^60=20030615-01:14:49^38=5000^40=1^44=15.75^15=USD^59=0^10=127

    New FIXML protocol: extensible, lower application development & maintenance cost
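To make the contrast between the two protocols concrete, here is a minimal sketch that parses a shortened tag=value order and re-emits a few fields as labeled XML. The `^` delimiter follows the slide. The tag numbers used (55 Symbol, 54 Side, 38 OrderQty, 44 Price, 15 Currency) are standard FIX fields, but the `Order` element and its attribute layout are illustrative assumptions, not the real FIXML schema.

```python
import xml.etree.ElementTree as ET

# A shortened FIX-style order; '^' stands in for the field delimiter as on the slide.
fix_message = "8=FIX.4.2^35=D^55=IBM^54=1^38=5000^40=1^44=15.75^15=USD"

def parse_fix(message, delimiter="^"):
    """Split a tag=value message into a dict keyed by the numeric tag."""
    pairs = (field.split("=", 1) for field in message.split(delimiter) if field)
    return {tag: value for tag, value in pairs}

# A few standard FIX tag numbers and their field names.
TAG_NAMES = {"55": "Symbol", "54": "Side", "38": "OrderQty", "44": "Price", "15": "Currency"}

order = parse_fix(fix_message)

# Hypothetical XML rendering: the element and attribute layout is illustrative, not real FIXML.
element = ET.Element("Order")
for tag, name in TAG_NAMES.items():
    element.set(name, order[tag])

print(ET.tostring(element, encoding="unicode"))
```

The positional tag numbers need a lookup table to be understood, while the XML form carries its field names with it and can gain new attributes without breaking existing readers.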


    Case 2: FpML (Derivative Trading)

    Financial Products Markup Language

    XML vocabulary for describing derivatives, their trades, and their risks

    Derivatives: risk-shifting agreement, based on any tradable instrument (interest rate, stock, index, currency, ...)

    OTC (over-the-counter) derivatives: privately negotiated, no standards, customized contracts

    Large variety & rapid changes in derivative products & transactions

    Not manageable in a relational database schema


    Derivatives Trading before FpML:

    Highly manual, i.e. error-prone and of poor timeliness

    No automated system

    Fear: system not able to handle variety & rapid change

    Fear: too costly to build automated trading system

    Fear: system is obsolete by the time it's implemented

    Solution: XML-based trading system, automated, able to evolve rapidly -> FpML


    Benefits of FpML

    Integration of trading services across diverse systems and applications

    HW & SW independence

    Lower system implementation & maintenance cost

    Higher trading volumes with higher accuracy

    Increased business opportunities

    Reduced operational risks


    Case 3: ACORD, Insurance Industry

    ACORD = Agent-Company Organization for Research and Development

    Non-profit standards body for insurance data exchange

    1970: Forms development for property & casualty insurance

    1980: EDI standards for P&C industry

    1996: Standards for Life Insurance

    2000+:

    XML-based standards for P&C, Life, Reinsurance

    Data and application integration

    Real-time information exchange for B2B and B2C


    Why is ACORD moving to XML?

    eBusiness and Internet-based business: connecting back offices, agents, brokers, consumers, etc.

    Diversified & multi-channel distribution

    Streamlined & simplified data transfer

    Straight-through processing of applications & claims

    Cross platform and cross-system data exchange

    Integration of diverse data sources

    Extensible: for hybrid & aggregate insurance products


    DB2 pureXML Quick Start Samples

    Each Quick Start Sample provides instructions for:

    Creating a sample database.

    Registering standard industry schemas.

    Inserting and validating sample XML messages.

    Building XML indexes and querying the stored data.

    Two sample packages are available right now at

    http://www.alphaworks.ibm.com/tech/purexml/download

    Acord.zip: Insurance Industry

    Cdisc.zip: Clinical Data

    Fixml.zip: Financial Trading

    FpML.zip: Financial Derivatives

    Mismo.zip: Mortgages

    ZosAcord.zip: Financial Trading on zOS platform

    ZosFixml.zip: Financial Derivatives on zOS platform

    And many more to come soon...


    2. XML - the foundation for SOA and Web Services

    XML is the transport for messages and data in SOA

    XML DBs can provide SOA data services

    [Diagram labels: Service Requestor, XML, Service Provider]

    SOA messages/data often need to be persisted

    Temporary Cache

    Audit Logs

    Compliance Records

    Insight


    3. XML Transaction Records / Business Objects

    Transactions being conducted as XML

    Within SOA environments

    Between value chain members

    Need to store the transaction record and query later

    Many business objects being represented as XML

    Purchase orders

    Invoices

    Insurance policies

    Need to store XML business objects intact


    4. Integration of Diverse Data Sources

    XML database as integration hub

    XML schema flexibility: integrate data with differing formats

    XQuery language: excellent for joining different data sources

    Integration using SOA environments

    Services Oriented Integration (SOI)

    [Diagram labels: DB2 9, Applications, Services, Employee/Customer Portals, Suppliers, Distributors, Partners, Agencies]


    5. Forms and their processing

    Forms exist for virtually all types of goods and services

    Insurance applications, bank loans, tax filings, ...

    Paper forms being replaced by electronic forms

    Online forms are becoming XML based, e.g. XForms

    Store entire form (XML document) as a whole in XML database rather than shred into relational columns

    [Diagram labels: Broker, Application Form, DB2 9, Status, Audit]


    6. Document storage and querying

    Document-centric XML mostly has unstructured data

    Can contain some structured elements

    E.g. Legal Contracts, Manuals, etc.

    Application managed document processing

    [Diagram: Contract Performance Management on DB2 9. Structured and unstructured content: Contract Dates, Prices, Liabilities, Milestones, Quantities, Certificates. Procurement / Sales / Legal / Finance: Create, Update, Manage. Business Exec / Analyst: Search, Report, Analyze.]


    7. XML Feeds and Syndication

    Syndication is heartbeat of Web 2.0

    RSS/ATOM Feeds encapsulated as XML

    Use XML database for serving and storing feeds

    E.g. Stock ticker feeds, inventory feeds, etc.

    [Diagram labels: DB2 9, Web Server with ATOM/RSS Provider, Web Server with ATOM/RSS Reader]


    8. Mapping XML for relational applications

    Shredding may be ok if:

    Simple data / Schema not complicated

    XML is merely a transport, i.e. XML structure not relevant

    Existing SQL Apps have only relational APIs

    E.g. BI apps, reporting tools

    DB2 9 Annotated Schema Shredding

    XML document values: Acme, 12.99

    Resulting row in DB2 9:

    ID   Name  Price
    129  Acme  12.99

    Insight
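The shredding idea on this slide can be sketched in a few lines: a fixed mapping pulls values out of the XML document and writes them into relational columns. This is only an approximation under stated assumptions. SQLite stands in for the relational target, the `product` element and its children are hypothetical names, and DB2's annotated schema decomposition is driven by annotations in an XML Schema rather than by hand-written code like this.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical product document; only the values 129, Acme and 12.99 come from the slide.
doc = '<product id="129"><name>Acme</name><price>12.99</price></product>'

# Fixed mapping from XML paths to relational columns (the role of the "annotation").
root = ET.fromstring(doc)
row = (int(root.get("id")), root.findtext("name"), float(root.findtext("price")))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, name TEXT, price REAL)")
conn.execute("INSERT INTO product (id, name, price) VALUES (?, ?, ?)", row)

rows = conn.execute("SELECT id, name, price FROM product").fetchall()
print(rows)
```

After shredding, existing SQL-only tools can query the `product` table, but the original document structure is gone, which is why the slide limits this approach to cases where XML is merely a transport.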


    9. XML as a better data model

    XML provides a better data model for many new apps

    Flexibility, schema versatility, hierarchical nature

    Semi-structured or unstructured data

    E.g. healthcare records, biological data, contracts, insurance claims, etc.

    Inherently hierarchical, nested or complex data

    E.g. manuals, books, catalogs, bills of materials, land records, etc.

    Data with changing or evolving schemas

    E.g. Forms, changing industry standard documents, new product versions, etc.

    Data with Null, Multiple or Unknown values

    E.g., Phone numbers (home, office, mobile), in patient records, etc.

    pureXML database: a natural choice for XML data


    10. pureXML for Rapid Application Prototyping and Development

    Represent multiple elements as a single object, e.g.: Purchase Order

    Relational:

    Many tables: Customer, Product, Shipping, ...

    Normalization

    Foreign key relationships

    Insert involves many columns

    Complex queries with joins

    Conform to column definition

    XML:

    Single Purchase Order column

    Easily access individual elements

    Write less code with pureXML!


    Who uses DB2 pureXML?


    Profile: U.S. State Tax office

    Challenge: Need to store/manage thousands of different tax forms in a database, changing every year. Today they use 640 generic columns in RDBMS.

    Have 3600 different tax forms: Schema Diversity

    Typically not every field in a form is used: Sparse Data

    Many forms change every year: Schema Evolution

    A case for XML!

    Status: Chose DB2 pureXML:

    Much simpler storage and processing of tax forms in XML format

    Handles schema diversity, schema changes, and sparsity

    On AIX: reduces cost and dependency on mainframe/Cobol skills


    Typical Current Usage: Relational Database

    Each form has a different set of fields (schema)

    Solution 1: Thousands of tables, i.e. one per form?

    Considered not feasible

    Too many tables to maintain

    Relational schema would deteriorate over time

    Not sufficiently flexible and extensible

    Solution 2: Single table whose rows can store any form

    100s of generic columns: Ouch!


    Generic columns vs. XML

    Current relational storage: inefficient, anonymous columns, requires complex mappings in the application

    col1  col2  col3      col4    col5  ...  col1000
    134   NULL  11/23/05  NULL    NULL  ...  NULL
    NULL  276   NULL      NULL    Yes   ...  NULL
    12    NULL  NULL      99.99   NULL  ...  NULL
    NULL  NULL  NULL      123.23  NULL  ...  No

    New XML format (values of the first row): 134, 11/23/05

    XML: Avoids sparsity. Proper data labeling. 2 columns, not 1000. Transformable. Extensible. Simplifies mapping.
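The move from generic columns to XML can be illustrated with a small sketch: only non-NULL values are emitted, each under a meaningful name. The field names `formId` and `filingDate`, the `form` wrapper element, and the column-to-name mapping are all hypothetical; the transcript does not preserve the tags used on the slide.

```python
import xml.etree.ElementTree as ET

# One row of the generic-column table; None plays the role of NULL.
# Columns col6 to col999 are left out here and are treated as NULL.
row = {"col1": 134, "col2": None, "col3": "11/23/05", "col4": None, "col5": None, "col1000": None}

# Hypothetical mapping from anonymous columns to meaningful field names.
FIELD_NAMES = {"col1": "formId", "col3": "filingDate"}

def row_to_xml(row):
    """Emit one element per non-NULL value; absent data is simply not there."""
    form = ET.Element("form")
    for column, value in row.items():
        if value is not None:
            ET.SubElement(form, FIELD_NAMES.get(column, column)).text = str(value)
    return form

form = row_to_xml(row)
print(ET.tostring(form, encoding="unicode"))
```

A row with a thousand mostly empty columns collapses to a document with two labeled elements, which is the "avoids sparsity, proper data labeling" point of the slide.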


    Storebrand's Service Oriented Architecture (SOA)

    [Diagram: the Storebrand Integration Architecture exchanges XML messages with Life Insurance, YTP Pensions, ITP Pensions, Investments, Banking, Mortgage, Customer, Business Services, Archive, Data Warehouse, Process Management, and the channels Internet, WAP, Financial adviser, Call Center]

    Business goals: improve customer focus, manage costs, access data 24/7, speed time to market for new products, increase product customization, combine products into packages


    Profile: NA Bank

    Moving to a Service Oriented Architecture

    Creating a flexible, on demand information architecture.

    Challenge: Send data from operational system as XML message and store in repository. Downstream applications retrieve data as needed via web services.

    Requirements:

    1. Load 500,000 XML documents per day

    Achieved in less than 1 hour on DB2 9 pureXML

    One major vendor unable to even load all the data

    2. Queries:

    Retrieve XML doc for any specific trade (by trade number)

    Retrieve all trades for a counterparty

    Retrieve all trades by trade create time

    Retrieve all trades by maturity date range

    Retrieve trades for a given acquire day range, and trade number range

    All transactional queries completed sub-second per record

    Status: Held PoC with DB2 pureXML:

    Met all criteria

    Performance was significantly faster than customer expected

    Win in competitive environment


    Some of our partners


    Profile: Nextance, IBM business partner, provides XML-based enterprise contract management software.

    Challenge: Need to store/manage contract fragments and allow users to compose contract documents from these.

    Status:

    Prior experience with XML in other database

    Successful migration to DB2 9 pureXML

    Major production deployments using DB2/XML under way.

    Simplified application development

    Better schema evolution

    Better scalability

    Very enthusiastic about DB2 9

    "It is exciting to see IBM evolve to embrace XML."

    -- Tiffany Riley, VP

    "Approximately 90% of the valuable data in contractual agreements are unstructured. XML technology is essential in tapping into this hidden reservoir of information, enabling companies to actively manage and maximize customer, supplier and partner relationships."

    -- Nextance press release, 11/05


    Profile: Skytide, IBM business partner providing OLAP-style analytics for XML.

    Challenge: Need to integrate structured, semi-structured and unstructured data for business analytics.

    Status:

    Enabled for Viper XML

    Schema flexibility

    First-class support for analytics over XML data

    Offering free Skytide / DB2 PoC

    "The combination of industrial strength database management for native XML by DB2 Viper and Skytide's ability to provide direct multidimensional analysis of XML data, removes two key barriers to widespread adoption of XML and the transformation of this data into actionable business information."

    -- Joseph Rozenfeld, VP of Skytide

    "With DB2 9, we've seen a 5 to 10 times performance improvement over a non-database environment."

    -- Keith Feingold, CEO, Skytide


    Profile: Inxight Software, IBM business partner providing entity and fact extraction from text data.

    Challenge: Need to store large amounts of XML data at a very high rate, and it needs to be immediately query-ready.

    Status:

    Enabling SDX for Viper XML

    Very impressed with DB2's XML performance, especially insert and indexing throughput

    Estimates a 10x development productivity improvement by avoiding shredding and schema mapping

    "Inxight has found IBM DB2's pureXML to be highly performant and an excellent complement to our extreme high-throughput SmartDiscovery Extraction Server (SDX). [...] Viper allows us to quickly and efficiently store information in a rich XML representation. Inxight's integration process with DB2/XML has been a smooth one, requiring minimal effort."

    -- Renzo Lazzarato, VP of Advanced Development, Inxight Software


    More about DB2 9 pureXML


    XML Databases

    XML-enabled Databases

    The core data model is not XML (but e.g. relational)

    Mapping between XML data model and DB's data model is required, or XML is stored as text

    E.g.: DB2 XML Extender (V7, V8)

    pureXML Databases

    Use the hierarchical XML data model to store and process XML internally

    No mapping, no storage as text

    Storage format = processing format

    E.g.: Viper


    XML-Enabled Databases: Two Main Options

    Option 1: CLOB/Varchar. The XML document is stored in a varchar or CLOB column; selected elements/attributes are extracted into side tables (regular tables for faster lookup).

    Option 2: Shredding (decomposition). A shredder with a fixed mapping decomposes the XML document into regular relational tables.


    Problems of XML-enabled Databases

    CLOB storage:

    Query evaluation & sub-document level access requires costly XML parsing: too slow!

    Shredding:

    Mapping from XML to relational often too complex

    Often requires dozens or hundreds of tables

    Complex multi-way joins to reconstruct documents

    XML schema changes break the mapping: no schema flexibility!

    For example: changing an element from single- to multi-occurrence requires normalization of relational schema & data


    Shredding: A simple case

    10 CHRISTINE SMITH 408-463-4963 52750.00

    27 MICHAEL THOMPSON 406-463-1234 41250.00

    Department
    DEPTID  DEPTNAME
    15      Sales

    Employee
    DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
    15      27     MICHAEL    THOMPSON  406-463-1234  41250
    15      10     CHRISTINE  SMITH     408-463-4963  52750


    Shredding: A schema change

    Employees are now allowed to have multiple phone numbers

    10 CHRISTINE SMITH 408-463-4963 415-010-1234 52750.00

    27 MICHAEL THOMPSON 406-463-1234 41250.00

    Department
    DEPTID  DEPTNAME
    15      Sales

    Employee
    DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
    15      27     MICHAEL    THOMPSON  406-463-1234  41250
    15      10     CHRISTINE  SMITH     408-463-4963  52750

    Phone
    EMPNO  PHONE
    27     406-463-1234
    10     415-010-1234
    10     408-463-4963

    Requires:

    Normalization of existing data!

    Modification of the mapping

    Change of applications

    Costly!
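For comparison, the same schema change on the XML side can be sketched as follows. The tag names (`dept`, `employee`, `name`, `phone`) are assumptions; only the values follow the slide. The point of the sketch is that a second phone number is just one more element, and the lookup code does not change.

```python
import xml.etree.ElementTree as ET

# Hypothetical department document; tag names are assumed, values follow the slide.
dept = ET.fromstring(
    "<dept id='15'>"
    "<employee id='10'><name>CHRISTINE SMITH</name><phone>408-463-4963</phone></employee>"
    "<employee id='27'><name>MICHAEL THOMPSON</name><phone>406-463-1234</phone></employee>"
    "</dept>"
)

def phones(dept, emp_id):
    """Return every phone number of one employee, however many there are."""
    employee = dept.find(f"employee[@id='{emp_id}']")
    return [p.text for p in employee.findall("phone")]

before = phones(dept, "10")

# The "schema change": employee 10 gets a second phone number.
# No new table and no data migration; the document just gains an element.
ET.SubElement(dept.find("employee[@id='10']"), "phone").text = "415-010-1234"

after = phones(dept, "10")
print(before, after)
```

In the shredded design the same change needs a new Phone table, a data migration and a new mapping, as listed on the slide.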


    XML: A First Class Citizen

    SQL as the primary language

    Data Definition

    create table dept(deptID char(8), deptdoc xml);

    Insert

    insert into dept(deptID, deptdoc) values (?,?)

    Retrieve

    select deptdoc from dept where deptID = ?

    Query

    select deptID, xmlquery('$d/dept/name' passing deptdoc as "d") from dept where deptID = 'PR27';


    Native XML Storage

    XML data is stored in XML-typed columns in tables

    create table dept (deptID char(8), ..., deptdoc xml);

    XML is stored in a parsed hierarchical format

    Relational columns are stored in relational format

    [Diagram: DB2 storage holding a dept row with deptID PR27 and its deptdoc]


    XML Storage: Regions Index

    Regions index: maps nodeIDs to regions & pages, allows intelligent prefetching

    System defined, default component of XML storage layer

    Reuse RDBMS features: Pages, Buffer Pools, Tablespaces, Prefetching, Locking


    What is SQL/XML?

    Extension of the SQL language standard (ANSI/ISO)

    XML Data Type

    XML publishing functions (relational data -> XML)

    Conversion function: XML type <-> char/varchar/clob

    Integration of SQL and XQuery languages

    Other functions, e.g. validation, parsing, serialization


    SQL/XML: Use SQL to produce XML from Relational Data

    SELECT
      XMLELEMENT (NAME "Department",
        XMLATTRIBUTES (e.dept AS "name"),
        XMLAGG ( XMLELEMENT (NAME "emp", e.firstname) )
      ) AS "dept_list"
    FROM employee e
    WHERE ..
    GROUP BY e.dept;

    Start with:

    dept  lastname  firstname
    A00   LEE       SEAN
    B01   JOHNSON   MICHAEL
    A00   BARELLI   VINCENZO
    A00   SMITH     CHRISTINE

    Produce:

    dept_list
    <Department name="A00"><emp>CHRISTINE</emp><emp>VINCENZO</emp><emp>SEAN</emp></Department>
    <Department name="B01"><emp>MICHAEL</emp></Department>
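The publishing query can be mimicked outside the database to show what each function contributes. This is a sketch in plain Python, not DB2 behavior: `groupby` plays the role of GROUP BY, the outer element corresponds to XMLELEMENT with XMLATTRIBUTES, and the inner loop to XMLAGG. The order of `emp` elements here follows the input rows; in SQL, XMLAGG makes no ordering promise without an ORDER BY.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Rows of the employee table from the slide: (dept, lastname, firstname).
employees = [
    ("A00", "LEE", "SEAN"),
    ("B01", "JOHNSON", "MICHAEL"),
    ("A00", "BARELLI", "VINCENZO"),
    ("A00", "SMITH", "CHRISTINE"),
]

def publish(rows):
    """Build one <Department> element per dept, like GROUP BY plus XMLAGG."""
    result = []
    by_dept = sorted(rows, key=lambda r: r[0])  # groupby needs sorted input
    for dept, members in groupby(by_dept, key=lambda r: r[0]):
        department = ET.Element("Department", name=dept)  # XMLELEMENT + XMLATTRIBUTES
        for _, _, firstname in members:
            ET.SubElement(department, "emp").text = firstname  # nested XMLELEMENT
        result.append(ET.tostring(department, encoding="unicode"))
    return result

dept_list = publish(employees)
for line in dept_list:
    print(line)
```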


    XMLTABLE: Return XML in tabular format

    Input document values: John, Doe, 344; Peter, Pan, 216

    SELECT X.* FROM dept,
      XMLTABLE ('$d/dept/employee' passing deptdoc as "d"
        COLUMNS
          empID     INTEGER     PATH '@id',
          firstname VARCHAR(30) PATH 'name/first',
          lastname  VARCHAR(30) PATH 'name/last',
          office    INTEGER     PATH 'office') AS X

    Result:

    empID  firstname  lastname  office
    901    John       Doe       344
    902    Peter      Pan       216

    SQL/XML  XQuery
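What XMLTABLE does can be approximated in a few lines: one row-generating path selects the `employee` elements, and one path per column extracts a value from each. The document below is a hypothetical reconstruction that merely matches the paths in the query (`@id`, `name/first`, `name/last`, `office`); the original markup is not preserved in the transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical dept document matching the paths used in the XMLTABLE call.
deptdoc = ET.fromstring(
    "<dept>"
    "<employee id='901'><name><first>John</first><last>Doe</last></name><office>344</office></employee>"
    "<employee id='902'><name><first>Peter</first><last>Pan</last></name><office>216</office></employee>"
    "</dept>"
)

def xml_table(doc):
    """Return one tuple per <employee>, one field per COLUMNS entry."""
    rows = []
    for emp in doc.findall("employee"):           # row-generating expression
        rows.append((
            int(emp.get("id")),                   # empID     PATH '@id'
            emp.findtext("name/first"),           # firstname PATH 'name/first'
            emp.findtext("name/last"),            # lastname  PATH 'name/last'
            int(emp.findtext("office")),          # office    PATH 'office'
        ))
    return rows

table = xml_table(deptdoc)
print(table)
```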


    XQuery: The FLWOR Expression

    FOR: iterates through a sequence, binds a variable to each item

    LET: binds a variable to a sequence

    WHERE: eliminates items of the iteration

    ORDER: reorders items of the iteration

    RETURN: constructs query results

    XQUERY
    for $movie in db2-fn:xmlcolumn('MOVIES.DOC')
    let $actors := $movie//actor
    where $movie/duration > 90
    order by $movie/@year
    return <movie>{$movie/title, $actors}</movie>

    Result:

    <movie>
      <title>Chicago</title>
      <actor>Renee Zellweger</actor>
      <actor>Richard Gere</actor>
      <actor>Catherine Zeta-Jones</actor>
    </movie>
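For readers who think in application code, the five FLWOR clauses can be read as a loop, a binding, a filter, a sort, and a constructor. The sketch below emulates that reading in plain Java over a DOM tree; it is not XQuery and does not involve DB2. The class name `FlworSketch`, the years, the durations, the two extra movies, and the `movie` wrapper element in the result are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlworSketch {
    // Illustrative data only: years, durations, and the two extra movies are made up.
    static final String MOVIES =
        "<movies>"
      + "<movie year=\"2002\"><title>Chicago</title><duration>113</duration>"
      + "<actor>Renee Zellweger</actor><actor>Richard Gere</actor>"
      + "<actor>Catherine Zeta-Jones</actor></movie>"
      + "<movie year=\"1999\"><title>Short Film</title><duration>45</duration>"
      + "<actor>Jane Roe</actor></movie>"
      + "<movie year=\"1995\"><title>Long Film</title><duration>120</duration>"
      + "<actor>John Roe</actor></movie>"
      + "</movies>";

    // Text of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    static List<String> run(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        NodeList movies = doc.getElementsByTagName("movie");
        List<Element> kept = new ArrayList<>();
        for (int i = 0; i < movies.getLength(); i++) {               // FOR
            Element movie = (Element) movies.item(i);
            if (Double.parseDouble(text(movie, "duration")) > 90) {  // WHERE
                kept.add(movie);
            }
        }
        // ORDER: untyped attribute values are compared as strings
        kept.sort(Comparator.comparing((Element m) -> m.getAttribute("year")));
        List<String> results = new ArrayList<>();
        for (Element movie : kept) {
            // LET: bound here because only RETURN uses the actors
            NodeList actors = movie.getElementsByTagName("actor");
            StringBuilder sb = new StringBuilder("<movie><title>")
                .append(text(movie, "title")).append("</title>");
            for (int i = 0; i < actors.getLength(); i++) {
                sb.append("<actor>").append(actors.item(i).getTextContent()).append("</actor>");
            }
            results.add(sb.append("</movie>").toString());           // RETURN
        }
        return results;
    }

    public static void main(String[] args) {
        run(MOVIES).forEach(System.out::println);
    }
}
```

The movie shorter than 90 minutes is dropped by the filter, and the remaining two come back ordered by year, which mirrors what the WHERE and ORDER clauses do in the query.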


    Mixing SQL and XQuery: Calling SQL from XQuery

    Identifying XML data by column (always operates on the entire column!):

    for $d in db2-fn:xmlcolumn('DEPT.DEPTDOC')

    Identifying XML data via a select statement (leverages predicates/indexes on relational columns):

    for $d in db2-fn:sqlquery("select deptdoc from dept")

    for $d in db2-fn:sqlquery("select deptdoc from dept where deptID = 'PR27'")

    for $d in db2-fn:sqlquery("select deptdoc from dept where deptID LIKE 'PR%'")

    for $d in db2-fn:sqlquery("select dept.deptdoc from dept, unit
                               where dept.deptID = unit.ID and unit.headcount > 200") ..


    XML Indexing: Examples

    create table dept(deptID char(8) primary key, deptdoc xml);

    create unique index idx2 on dept(deptdoc)
      generate key using xmlpattern '/dept/employee/@id' as sql double;

    create index idx3 on dept(deptdoc)
      generate key using xmlpattern '/dept/employee/name' as sql varchar(35);

    Sample document:

    <dept>
      <employee id="901">
        <name>John Doe</name>
        <phone>408 555 1212</phone>
        <office>344</office>
      </employee>
      <employee id="902">
        <name>Peter Pan</name>
        <phone>408 555 9918</phone>
        <office>216</office>
      </employee>
    </dept>

    More xmlpattern examples:

    xmlpattern '//name' as sql varchar(35);   (index on ALL name elements)

    xmlpattern '//@*' as sql double;   (index on ALL numeric attributes)

    xmlpattern '/dept/employee//text()' as sql varchar(128);   (all text nodes under employee)

    xmlpattern 'declare namespace m="http://www.myself.com/"; /m:dept/m:employee/m:name' as sql varchar(45);
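The note "index on ALL numeric attributes" for the `'//@*' as sql double` pattern implies that only nodes whose value can be read as a number contribute index keys. The following sketch imitates that selection in plain Java with the JDK's XPath API; it is an illustration of the idea, not DB2's index builder. The class name `XmlIndexKeySketch`, the `code` attribute on `dept`, and the regular expression used as a stand-in for the database's numeric cast are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlIndexKeySketch {
    // Assumed sample document; the non-numeric "code" attribute is hypothetical.
    static final String DEPT_DOC =
        "<dept code=\"PR27\">"
      + "<employee id=\"901\"><name>John Doe</name><office>344</office></employee>"
      + "<employee id=\"902\"><name>Peter Pan</name><office>216</office></employee>"
      + "</dept>";

    // Rough approximation of a decimal or exponent number; not the database's exact cast rules.
    private static final Pattern NUMERIC =
        Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

    // Collects the values of all nodes matching the pattern that look numeric.
    static List<Double> doubleKeys(String xml, String pattern) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(
                pattern, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<Double> keys = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String value = nodes.item(i).getTextContent().trim();
                if (NUMERIC.matcher(value).matches()) {
                    keys.add(Double.parseDouble(value));   // numeric value: becomes a key
                }
                // non-numeric values are skipped and produce no key
            }
            return keys;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(doubleKeys(DEPT_DOC, "//@*"));
    }
}
```

With this document, the pattern `//@*` yields keys for the two `id` attributes only; the `code` attribute matches the pattern but contributes nothing because its value is not numeric.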


    XML Schema Repository (XSR)

    XSR = new DB2 catalog tables + command line and stored procedure interfaces

    The XSR:

    Stores XML Schema documents and assigns identifiers

    Keeps track of relationships between schema documents

    Keeps precompiled schema grammars

    Provides mapping from schema location to schema identifier

    Catalog tables:

    SYSCAT.XSROBJECTS

    SYSCAT.XSROBJECTCOMPONENTS

    SYSCAT.XSROBJECTAUTH

    SYSCAT.XSROBJECTHIERARCHIES

    Stored procedures:

    XSR_REGISTER (dbschema, identifier, schemalocation, xsd, docproperty)

    XSR_ADDSCHEMADOC (dbschema, identifier, schemalocation, xsd, docproperty)

    XSR_COMPLETE (dbschema, identifier, schemaproperties, isusedforshred)


    Validation using XML Schemas

    create table dept(deptID char(8), deptdoc xml);

    Validation is optional, and per document (per row):

    No validation:

    insert into dept values (?, ?)

    With validation:

    insert into dept values (?, xmlvalidate(?))

    Can override the schema location found in the document and reference a schema from DB2's schema repository:

    Schema referenced by identifier:

    insert into dept values (?, xmlvalidate(? according to xmlschema id dept.schema1))

    Schema referenced by namespace URI:

    insert into dept values (?, xmlvalidate(? according to xmlschema uri 'http://my.dept.com'))
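The idea that validation is a per-document choice can be tried without a database. The sketch below uses the JDK's own `javax.xml.validation` API on the client side; it is not the server-side XMLVALIDATE function, only an analogy for it. The class name `ValidateSketch`, the `accept` helper, and the small schema (an `employee` element with a required integer `id`) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidateSketch {
    // Hypothetical schema: dept contains employee elements with a required integer id.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"dept\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"employee\" maxOccurs=\"unbounded\"><xs:complexType>"
      + "<xs:sequence><xs:element name=\"name\" type=\"xs:string\"/></xs:sequence>"
      + "<xs:attribute name=\"id\" type=\"xs:int\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // True if the document conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // schema violation or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mirrors the per-row choice: the document is checked only when validate is true.
    static boolean accept(String xml, boolean validate) {
        return !validate || isValid(xml);
    }

    public static void main(String[] args) {
        String noId = "<dept><employee><name>John Doe</name></employee></dept>";
        System.out.println(accept(noId, false));  // accepted without validation
        System.out.println(accept(noId, true));   // rejected when validation is requested
    }
}
```

The same document is accepted when no validation is requested and rejected when it is, which is the behavior the two insert statements contrast.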


    API Support for XML in V9.1

    New XML type support added to APIs including:

    JDBC, SQLJ, .NET, CLI, Embedded SQL, PHP

    SQL/XML supported by all APIs

    XQuery supported by all APIs

    Result sequence will be treated as a result set

    Each item will be treated as a row


    Language Bindings

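    Java Example

    import java.io.InputStream;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // con is an open java.sql.Connection to the DB2 database
    PreparedStatement stmt1 = con.prepareStatement("Select deptdoc from dept where id = '001'");
    ResultSet rs = stmt1.executeQuery();
    rs.next();
    // The three getters below are alternative ways to read the same XML column.
    // Get the first returned document as a string
    String xmlString = rs.getString(1);
    // As a binary stream
    InputStream is = rs.getBinaryStream(1);
    // As an XML object
    com.ibm.db2.jcc.DB2Xml xml = (com.ibm.db2.jcc.DB2Xml) rs.getObject(1);
    rs.close();
    stmt1.close();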


    Utilities & Tools for XML in DB2

    XML Import & Export

    Runstats collects stats for XML data

    XML type support in SQL stored procedures

    XML columns supported by HADR

    XML columns supported by backup/restore

    XQuery Builder GUI

    GUI for XML Schema annotations for shredding

    XML Index Definition GUI

    Control Center extensions for XML


    XML Schema Flexibility in DB2 9

    Document validation for zero, one, or many schemas per XML column:

    (a) No schema

    (b) One schema

    (c) Schema V1 & Schema V2

    (d) Documents w/ and w/o schema

    (e) Any mix you want!

    Most databases only support (a) and (b). DB2 9 allows (a) through (e).


    pureXML Insert vs. Shredding Performance

    [Figure: bar chart "XML Insert vs. Shred Performance", elapsed time in seconds, comparing DB2 XML Extender shred, DB2 new shred, and DB2 pureXML insert; bar labels 14.2% and 3.4%.]

    Shredding: fixed mapping of each XML document to regular relational tables; documents shredded to 87 columns in 12 tables.

    pureXML insert: 1 XML column (with XML index), 1 table.


    pureXML vs CLOB Query Performance

    [Figure: "Query 2: XML vs CLOB column (10 concurrent users)"; x-axis: # documents; y-axis: elapsed time in seconds. Query response time: lower is better.]

    Query 2: Retrieves one document based on a single search condition. No index is used.

    The larger the table, the bigger the performance benefit of pureXML!


    Constantly High XML Insert Throughput

    [Figure: "XML Document Insert Rate over Time"; y-axis: inserts per second (0 to 6000); x-axis: number of documents already inserted, in millions (0 to 30).]

    100 concurrent clients inserting FIXML order documents, with XML index building, at ~30GB/hour.

    AIX 5.3, P-Series P5-560Q, 8 CPUs, TotalStorage DS8100

    http://www.ibm.com/developerworks/db2/library/techarticle/dm-0606schiefer


    Challenges Solved by DB2 V9 with pureXML

    Data Query

    Challenge: Need the ability to query any element in the XML document; need to quickly retrieve sets of data

    Solution: Native storage defines each field; SQL or XQuery to retrieve sets of data

    Shredding

    Challenge: Need to remove the complexity around shredding

    Solution: No more shredding

    Flexibility

    Challenge: Need the ability to change any data element at any time

    Solution: Schema evolution allows multiple schemas

    Standard XML Technology (XQuery and XPath)

    Solution: Easy to learn; use same technology for application and database


    Summary

    XML is widely accepted and increasingly used in customer applications and solutions

    Integration, SOA, Document Management

    Driving business processes and workflows

    XML-based applications need robust, scalable, enterprise-class database capabilities

    Structure clash between XML and relational data models

    DB2 Viper provides pureXML support in the database engine


    Resources

    DB2 pureXML Enablement web site: http://www-03.ibm.com/developerworks/wikis/display/db2xml/Home

    News and Success stories

    Books and magazine issues

    Technical papers and articles

    Webcasts and demonstrations

    Free software downloads

    Education

    Contacts:

    Thuan Bui

    Mallarswami Nonvinkere