Post on 03-Feb-2022


Creating Structure in Unstructured Data

What is possible, today…?

Marco Gralike

Page 2: Creating Structure in Unstructured Data
Page 3: Creating Structure in Unstructured Data
Page 4: Creating Structure in Unstructured Data
Page 5: Creating Structure in Unstructured Data

“Big Data” = XML?

Page 6: Creating Structure in Unstructured Data

The challenges are... ahem, the problems are!

Page 7: Creating Structure in Unstructured Data

Wikipedia

• One string of XML data with structured and unstructured data sections

• Language: English

• Size: 42.15 GB

• Pages: 12,961,997

• Date: 21 Dec 2012

Page 8: Creating Structure in Unstructured Data

Adventures into the unknown…?

Page 9: Creating Structure in Unstructured Data

Setup

• VirtualBox VM

– Oracle Enterprise Linux 5 Update 8 (64-bit)

– 8 GB RAM

• LaCie Little Big Disk

– RAID 0

– Thunderbolt

• Database

– SGA 4GB

– PGA 2GB

Page 10: Creating Structure in Unstructured Data

My new LaCie LBD is really fast

Page 11: Creating Structure in Unstructured Data

Defeat?! Only 1,000,000 pages

Page 12: Creating Structure in Unstructured Data

Status of Technology used

Page 13: Creating Structure in Unstructured Data

XML - Where are we…?

Gartner

Page 14: Creating Structure in Unstructured Data

Achieved…?

Page 15: Creating Structure in Unstructured Data

On the Horizon!

• JSONiq

• Zorba

Page 16: Creating Structure in Unstructured Data

Building (streaming) Bridges

Page 17: Creating Structure in Unstructured Data

Oracle XML DB

• NO cost option

• (XQuery) Standards

• C (native / embedded kernel)

• Code maintained by Oracle

Page 18: Creating Structure in Unstructured Data

[Diagram: Oracle XML DB architecture. Labels: XQuery; XMLType Abstraction; XVM (use “no query rewrite”); Pushdown XQuery Rewrite; Procedural XQuery; DB XQuery; SQL Execution; Relational Access Methods; DOM Tree Model; Streaming XPath Evaluation; XMLIndex; Object-Relational (Relational Storage); Binary XML (SecureFiles).]

Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text

Page 19: Creating Structure in Unstructured Data

So what are we talking about?

Page 20: Creating Structure in Unstructured Data
Page 21: Creating Structure in Unstructured Data
Page 22: Creating Structure in Unstructured Data
Page 23: Creating Structure in Unstructured Data
Page 24: Creating Structure in Unstructured Data
Page 25: Creating Structure in Unstructured Data
Page 26: Creating Structure in Unstructured Data
Page 27: Creating Structure in Unstructured Data

Wikipedia

• Structured & Unstructured bits and pieces

• A lot of “unbounded” elements

• Not a lot of restrictions

• The bit with value is in element “text”
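
To make the mix of structured and unstructured sections concrete, here is a minimal Python sketch. It assumes a tiny, made-up sample page (no namespaces, far fewer elements than a real dump) and pulls the structured fields and the free-text body out of each page element one at a time. The sample content and the helper name iter_pages are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; a real dump carries namespaces and more elements.
SAMPLE = """<mediawiki>
  <page>
    <title>Oracle</title>
    <id>1</id>
    <revision>
      <timestamp>2012-12-21T00:00:00Z</timestamp>
      <text>'''Oracle''' is a [[database]] vendor.</text>
    </revision>
  </page>
</mediawiki>"""


def iter_pages(source):
    """Yield one dict per <page>, releasing each subtree after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            yield {
                "title": elem.findtext("title"),        # structured
                "id": int(elem.findtext("id")),         # structured
                "text": elem.findtext("revision/text"),  # unstructured wiki markup
            }
            elem.clear()  # drop the page's children so memory stays flat


pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
```

The title and id behave like ordinary columns, while the text element holds free-form wiki markup, which is where the structuring problem starts.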

Page 28: Creating Structure in Unstructured Data

How do we get this Structured?

Page 29: Creating Structure in Unstructured Data
Page 30: Creating Structure in Unstructured Data
Page 31: Creating Structure in Unstructured Data

Strings = small & defined (12c?)

Ename pointer += 100;

Page 32: Creating Structure in Unstructured Data

<string1/><string2/><string3/>

Page 33: Creating Structure in Unstructured Data

Flexible, Humans No Design Patterns

Page 34: Creating Structure in Unstructured Data

<small/><verybigggr/><bigger/>

Page 35: Creating Structure in Unstructured Data

<small/><verybigggr/><bigger/>

<verybigggr> <empno>1</empno><ename>Marco</ename> <empno>2</empno> </verybigggr>

Page 36: Creating Structure in Unstructured Data
Page 37: Creating Structure in Unstructured Data
Page 38: Creating Structure in Unstructured Data
Page 39: Creating Structure in Unstructured Data
Page 40: Creating Structure in Unstructured Data

We need options!

Page 41: Creating Structure in Unstructured Data

“XMLType” Container

In Memory (document)

CLOB (document)

Object Relational (data)

Binary XML (data)

Page 42: Creating Structure in Unstructured Data

XMLType

XOB XML Schema

In Memory (document)

Page 43: Creating Structure in Unstructured Data

XMLType

Post Parse LOB Index

Binary XML Securefile (document/content)

Page 44: Creating Structure in Unstructured Data

XMLType

Fully Shredded Indexes

Object Relational (content)

Page 45: Creating Structure in Unstructured Data

Something else to realize!

Page 46: Creating Structure in Unstructured Data

“What is the fastest way to get this stuff in the database…?”

Page 47: Creating Structure in Unstructured Data

“…it depends…”

Page 48: Creating Structure in Unstructured Data

“So what is the fastest way to get XML in the database…?”

Page 49: Creating Structure in Unstructured Data

“…it depends…”

Page 50: Creating Structure in Unstructured Data

“So what is the fastest way to get XML in the database… and useful in my case…?”

Page 51: Creating Structure in Unstructured Data

Garbage IN – Garbage OUT

Page 52: Creating Structure in Unstructured Data

Wikipedia

• SQL*Loader

• Parallel or Direct

• Securefile LOB Column

• 2.5 hours

And no (performant) way to get the details out…

a.k.a “completely useless”

Page 53: Creating Structure in Unstructured Data

Wikipedia

• SQL*Loader

• Parallel or Direct

• Securefile Binary XML

• …2.5 hours ???

Page 54: Creating Structure in Unstructured Data

XML Parsing

• SAX - Simple API for XML

• DOM - Document Object Model
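
The practical difference between the two parsing models can be sketched in Python with the standard library. This is only an illustration on a made-up miniature document (SAMPLE and the function names are assumptions, not part of the original setup): the DOM variant builds the whole tree before answering, while the SAX variant counts elements as events stream past and keeps no tree.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical miniature stand-in for a large dump; element names are illustrative.
SAMPLE = (
    "<mediawiki>"
    "<page><title>A</title><text>alpha</text></page>"
    "<page><title>B</title><text>beta</text></page>"
    "</mediawiki>"
)


def count_pages_dom(xml_text):
    """DOM: parse the entire document into a tree, then query it."""
    doc = minidom.parseString(xml_text)
    return len(doc.getElementsByTagName("page"))


class PageCounter(xml.sax.ContentHandler):
    """SAX: react to events as they stream by; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.pages = 0

    def startElement(self, name, attrs):
        if name == "page":
            self.pages += 1


def count_pages_sax(xml_text):
    handler = PageCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.pages
```

Both functions return the same count on the sample; on a multi-gigabyte document the DOM approach needs memory proportional to the document, which is why streaming matters for the load.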

Page 55: Creating Structure in Unstructured Data

[Chart: insert performance versus select performance for CLOB, XMLType CLOB, XMLType Binary XML, and XMLType Object Relational storage, annotated with “(domain) indexes”; each axis is marked “fast” at one end.]

Page 56: Creating Structure in Unstructured Data
Page 57: Creating Structure in Unstructured Data

XML Partitioning

• Object Relational Partitioning

– Equi-partitioning since Oracle 11.1.0.7.0

• Binary XML Partitioning

– Range, List, Hash

• Local partitioned XMLIndex

– LOCAL keyword in XMLIndex create syntax

• Partition Key on virtual column (Binary XML)

• Partition Key on column (Object Relational)

Page 58: Creating Structure in Unstructured Data

XMLType

Post Parse LOB Index

Binary XML Securefile (document/content)

Page 59: Creating Structure in Unstructured Data

Driving access on CONTENT

[Diagram: a sample “bookstore” document with book (title, author, author), whitepaper (title, author, id), and chapter/paragraph content, split into structured content and unstructured content. Labels: Structured XMLIndex; Unstructured XMLIndex; BTree Index; Function based Index (XPath); Oracle XML Text Index.]

Page 60: Creating Structure in Unstructured Data

Structured Data

Page 61: Creating Structure in Unstructured Data

Unstructured XMLIndex (UXI)

• PATH TABLE

• Use Path Subsetting

– Full Blown XMLIndex can be BIG

• Token Tables (XDB.X$......)

– Query re-write on Tokens

– Fuzzy Searches, //

– Optimizer Statistics

• Can be maintained manually

– Recorded in Pending Table

• Secondary indexes possible

[Diagram: Unstructured XMLIndex, f(x), Path Table]
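
The idea behind a path table and path subsetting can be illustrated with a small Python sketch. This is a conceptual stand-in only: it shreds a made-up document into (path, value) rows and then keeps just the paths of interest. It does not reproduce the column layout, token tables, or maintenance behaviour of the real XMLIndex path table, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def path_rows(elem, prefix=""):
    """Yield (path, value) for every element, depth first."""
    path = f"{prefix}/{elem.tag}"
    text = (elem.text or "").strip()
    yield path, text or None
    for child in elem:
        yield from path_rows(child, path)


def subset(rows, wanted_paths):
    """Path subsetting: keep only the paths that queries actually use."""
    return [row for row in rows if row[0] in wanted_paths]


# Made-up sample document for illustration.
doc = ET.fromstring(
    "<book><title>XML DB</title><author>Marco</author><author>Anon</author></book>"
)
rows = list(path_rows(doc))
author_rows = subset(rows, {"/book/author"})
```

Every element becomes a row, so indexing all paths of a large document produces a very large table; restricting the rows to a handful of paths is what keeps the index manageable.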

Page 62: Creating Structure in Unstructured Data

Describe PATH TABLE

Page 63: Creating Structure in Unstructured Data

What’s hidden…

Page 64: Creating Structure in Unstructured Data

PATH TABLE

[Diagram: Unstructured XMLIndex, f(x), Path Table]

Page 65: Creating Structure in Unstructured Data

Structured XMLIndex (SXI)

• CONTENT TABLE(s)

• Based on XMLTABLE syntax

• XMLTable construct can be nested:

– VIRTUAL column alias

• Can be maintained manually

• Secondary indexes possible

[Diagram: Structured XMLIndex, f(x), Content Tables]

Page 66: Creating Structure in Unstructured Data

Describe CONTENT TABLE

• A “regular” heap table with columns…

• Ideal for secondary indexes, if needed.
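
As a rough analogy for a content table, the following Python sketch projects fixed fields out of a few made-up XML documents into an ordinary relational table and then adds a secondary index on one of the projected columns. SQLite stands in for the database purely for illustration; the table, column, and index names are hypothetical, and none of this is Oracle syntax.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample documents for illustration.
DOCS = [
    "<book><id>1</id><title>XML DB</title><author>Marco</author></book>",
    "<book><id>2</id><title>Oracle Text</title><author>Anon</author></book>",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book_content (doc_no INTEGER, id INTEGER, title TEXT, author TEXT)"
)

# Project the structured parts of each document into plain columns.
for doc_no, xml_text in enumerate(DOCS):
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO book_content VALUES (?, ?, ?, ?)",
        (doc_no, int(root.findtext("id")), root.findtext("title"), root.findtext("author")),
    )

# Because the result is a regular table, a secondary index is just a normal index.
conn.execute("CREATE INDEX book_content_author_ix ON book_content (author)")

titles = [
    row[0]
    for row in conn.execute("SELECT title FROM book_content WHERE author = ?", ("Marco",))
]
```

Once the values live in plain columns, lookups on them no longer need to touch the XML at all, which is the appeal of driving access through the structured part of the data.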

Page 67: Creating Structure in Unstructured Data

CONTENT TABLE(s)

[Diagram: Structured XMLIndex, f(x), Content Tables]

Page 68: Creating Structure in Unstructured Data

Binary XML – No Index

Page 69: Creating Structure in Unstructured Data

Binary XML + XMLIndex (SXI)

Page 70: Creating Structure in Unstructured Data

Binary XML + XMLIndex + Sec.Ind.

Page 71: Creating Structure in Unstructured Data

Binary XML + XMLIndex + Sec.Ind.

Page 72: Creating Structure in Unstructured Data

Un-Structured Data

Page 73: Creating Structure in Unstructured Data

XML Full Text Index

• Based on Oracle Text Index, XQuery Full Text

• XML Namespace Aware

• XML Semantic aware full text search

– Full-Text Selection Expression – contains text

– Logical Full Text Operators – ftor, ftand, ftMildNot

– Context Aware full text search

Page 74: Creating Structure in Unstructured Data
Page 75: Creating Structure in Unstructured Data
Page 76: Creating Structure in Unstructured Data
Page 77: Creating Structure in Unstructured Data
Page 78: Creating Structure in Unstructured Data
Page 79: Creating Structure in Unstructured Data
Page 80: Creating Structure in Unstructured Data
Page 81: Creating Structure in Unstructured Data
Page 82: Creating Structure in Unstructured Data
Page 83: Creating Structure in Unstructured Data
Page 84: Creating Structure in Unstructured Data

Balanced Design

• Inserts, Updates & Deletes

– XML Future Changes

– Index Maintenance

• Selects

– In Memory

– Via Indexes

• XML Validation

– Strict, Lazy

– Client Side Possibilities

[Diagram labels: In Memory, On Disk]

Page 85: Creating Structure in Unstructured Data

Reward

• Optimal performance

• Outperforming XML

• Proper design will give a performance increase over XML handling…

…proper design is still key…

Page 86: Creating Structure in Unstructured Data
Page 87: Creating Structure in Unstructured Data

References

Oracle XML DB

– http://www.oracle.com/pls/db112/homepage

XML DB FAQ Thread

– http://forums.oracle.com/forums/thread.jspa?threadID=410714

Personal Blog

– http://www.xmldb.nl

– http://technology.amis.nl

Page 88: Creating Structure in Unstructured Data

References

– Daniela Florescu, Oracle Corporation: Advances in XML and XQuery

– Sam Idicula, Oracle XML DB Development Team: Binary XML Storage and Query Processing in Oracle

– Jinyu Wang, Scott Brewton: Making XML Technology Easier to Use

– Joel Spolsky, Joel on Software: Back to Basics