the connection factory jeroen van rotterdam, cto may 19th, www9

The Connection Factory

Jeroen van Rotterdam, CTO

May 19th, WWW9

Contents

- Xhive setup

- XPath

- XPath performance issues within XML collections

Xhive

- OO-XML database
- Highly scalable
- High granularity
- W3C DOM L2 compliant
- XPath 1.0 compliant

Architecture

[Architecture diagram. Recoverable labels:]

- Xhive Core
- OODB
- XPath
- DOM Core L2
- Extended DOM
- DOM Traversal
- DB Administrator
- RMI Layer (EJB / CORBA / SOAP)
- RMI Layer
- Client
- Schema
- SQL loader

Why XPath

Competing solutions:

- XML-QL: Where-In constructs
- XQL: limited
- SQL: no alternative

XPath is a complete pattern-matching language.

XPath

Advantages:

- fairly complete
- multiple axes
- supported by W3C
- base for XPointer, XLink
- base for XML Query WG
- user-defined functions

Disadvantages:

- document oriented
- slightly different tree model
- no updates

Extending DOM

Collection setup:

Every document is a “Bastard Node”

- Library Node: getFirstChild() and getLastChild() return Document Nodes
- Document Node: getParentNode() returns null
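The collection setup above can be sketched in a few lines of Java. This is a minimal illustration of the relationship the diagram suggests, not Xhive's implementation: the class names `LibraryNode` and `DocumentNode` and the `appendChild` signature are hypothetical, and plain in-memory objects stand in for persistent DOM nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a stored document; the name is illustrative only. */
final class DocumentNode {
    private final String name;

    DocumentNode(String name) { this.name = name; }

    String getName() { return name; }

    /** As in DOM, a document reports no parent, even when it lives in a library. */
    Object getParentNode() { return null; }
}

/** Hypothetical library node: it knows its documents, they do not point back. */
final class LibraryNode {
    private final List<DocumentNode> documents = new ArrayList<>();

    void appendChild(DocumentNode doc) { documents.add(doc); }

    DocumentNode getFirstChild() {
        return documents.isEmpty() ? null : documents.get(0);
    }

    DocumentNode getLastChild() {
        return documents.isEmpty() ? null : documents.get(documents.size() - 1);
    }
}
```

The one-way link keeps each document a valid DOM tree on its own while still letting the library be navigated with ordinary child accessors.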

Library Node

Advantages

- natural extension of DOM
- extensible
- closely related to directory structures
- searchable with XPath

Library Node

Disadvantages

- potential bottleneck

XPath

- XPath in a large PDOM collection environment:

1. Address memory issues
2. Solve differences in specs
3. Address performance issues

Memory issues

- Avoid recursion
- Make sub-results capable of being persisted
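To illustrate the first point, here is a minimal sketch of a document-order walk that uses an explicit stack instead of recursive calls, so that a deep tree cannot exhaust the call stack. It runs against the standard `org.w3c.dom` API shipped with the JDK; the class `IterativeWalk` and its methods are hypothetical names, and nothing here is taken from Xhive.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class IterativeWalk {
    /** Returns the names of all elements at or below root, in document order. */
    static List<String> elementNames(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(node.getNodeName());
            }
            // Push children last-to-first so the first child is popped next.
            for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
                stack.push(c);
            }
        }
        return names;
    }

    /** Test helper: parses an XML string into an in-memory DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

In a persistent setting the stack itself is an ordinary data structure, so it could in principle be written out, which is one way to read the second bullet.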

Solve differences

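Differences between the specs include, for instance:

- getParent on attributes vs. ownerElement: XPath treats the element as the parent of its attributes, while in DOM getParentNode() on an attribute returns null and the element is reached through ownerElement
- namespace nodes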
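The attribute case can be bridged with a small adapter. The sketch below uses only standard DOM Level 2 calls (`Attr.getOwnerElement()`, `Node.getParentNode()`); the class `XPathParent` and its method names are hypothetical and only illustrate the idea.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathParent {
    /** Parent of a node as the XPath data model sees it, computed over a DOM tree. */
    static Node xpathParent(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            // DOM: getParentNode() on an Attr is null; the element is its "owner".
            return ((Attr) node).getOwnerElement();
        }
        return node.getParentNode();
    }

    /** Test helper: parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```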

Performance

Increase Xpath performance:

- Query analysis
- Avoid reparsing
- Lazy evaluation
- Index structures
- Cache strategy
- DTD analysis
- Statistical data

Performance

1. Query analysis:

a. Can the query be simplified?

For instance, /child::chapter[5+5] can be rewritten as /child::chapter[10].
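One common form of such simplification is constant folding on the parsed query. The sketch below is a toy under stated assumptions: the expression classes `Num`, `Add` and `NodeTest` are hypothetical and cover only what is needed to fold the predicate 5+5 into 10; a real XPath engine would have a much richer tree.

```java
final class ConstantFolding {
    /** Marker for nodes of a (hypothetical, very small) expression tree. */
    interface Expr {}

    static final class Num implements Expr {
        final double value;
        Num(double value) { this.value = value; }
    }

    /** Placeholder for any subexpression whose value depends on the document. */
    static final class NodeTest implements Expr {
        final String name;
        NodeTest(String name) { this.name = name; }
    }

    static final class Add implements Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Folds additions whose operands are both constants, working bottom-up. */
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Expr l = fold(((Add) e).left);
            Expr r = fold(((Add) e).right);
            if (l instanceof Num && r instanceof Num) {
                return new Num(((Num) l).value + ((Num) r).value);
            }
            return new Add(l, r);
        }
        return e; // constants and document-dependent parts are left untouched
    }
}
```

Folding happens once, at analysis time, so the addition is not repeated for every candidate node.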

Performance

1. Query analysis:

b. Does the query depend on the context node?

Absolute queries are context independent:

“Give me all chapters where the title is the same as the book title”

//chapter[title=string(/book/title)]

Evaluate string(/book/title) only once.
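The effect of hoisting can be shown by hand with the JDK's `javax.xml.xpath` API. This is a sketch of the idea only, assuming a document shaped like the example; the class `HoistedSubquery` and its methods are hypothetical, and an optimizer would perform this rewrite internally rather than in application code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HoistedSubquery {
    /** Computes //chapter[title=string(/book/title)] with the absolute part hoisted. */
    static List<Node> chaptersNamedAfterBook(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Context independent: evaluated once, not once per chapter.
            String bookTitle = xpath.evaluate("string(/book/title)", doc);
            NodeList chapters =
                    (NodeList) xpath.evaluate("//chapter", doc, XPathConstants.NODESET);
            List<Node> result = new ArrayList<>();
            for (int i = 0; i < chapters.getLength(); i++) {
                if (hasChildTitle(chapters.item(i), bookTitle)) {
                    result.add(chapters.item(i));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** True if any child element named title has the given string-value. */
    static boolean hasChildTitle(Node chapter, String wanted) {
        for (Node c = chapter.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && "title".equals(c.getNodeName())
                    && wanted.equals(c.getTextContent())) {
                return true;
            }
        }
        return false;
    }

    /** Test helper: parses an XML string into an in-memory DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```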

Performance

2. Storing parsed queries:

“Compile” and optimize each query only once.
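A minimal sketch of the same idea with the JDK's `javax.xml.xpath` API: compiled expressions are kept in a map keyed by the query string. The class `QueryCache` and its counter are hypothetical, and the sketch assumes single-threaded use, since `XPath` and `XPathExpression` objects are not guaranteed to be thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

final class QueryCache {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> compiled = new HashMap<>();
    private int compilations = 0;

    /** Returns the compiled form of the query, compiling it on first use only. */
    XPathExpression get(String query) {
        XPathExpression expr = compiled.get(query);
        if (expr == null) {
            try {
                expr = xpath.compile(query);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Invalid XPath: " + query, e);
            }
            compilations++;
            compiled.put(query, expr);
        }
        return expr;
    }

    /** Number of times a query string actually had to be compiled. */
    int compilations() { return compilations; }
}
```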

Performance

3. Lazy evaluation:

For instance, operations on node-sets:

- booleans (evaluate first node)
- strings (first in doc order)
- numbers (string to number)

Example: “give me all chapters which have paragraphs”

/chapter[paragraph]

Finding one paragraph per chapter is enough.
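The predicate in this example only needs the boolean value of a node-set, so evaluation can stop at the first match. A minimal sketch over the standard DOM API follows; the class `LazyPredicate` and its methods are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class LazyPredicate {
    /** Boolean value of child::name: true as soon as one matching child is seen. */
    static boolean hasChild(Node context, String name) {
        for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && name.equals(c.getNodeName())) {
                return true; // remaining siblings are never visited
            }
        }
        return false;
    }

    /** Test helper: parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

An eager evaluator would first collect every paragraph of the chapter and only then test whether the set is empty.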

Performance

4. Indexing:

- getFirstChildElementByName(String name)
- getNextSiblingElementBySameName()
- getFirstChildByType( short type )
- getNextSiblingByType( short type )
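To show what the first two calls have to compute, here is a naive version written as static helpers over the standard DOM API. It scans siblings one by one; the point of an index is that a database could answer the same question without that scan. The helper class `NamedChildren` and the static signatures are assumptions for illustration, not the Xhive API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NamedChildren {
    /** First child element of parent with the given name, or null. */
    static Element getFirstChildElementByName(Node parent, String name) {
        return firstNamedFrom(parent.getFirstChild(), name);
    }

    /** Next sibling element carrying the same name as e, or null. */
    static Element getNextSiblingElementBySameName(Element e) {
        return firstNamedFrom(e.getNextSibling(), e.getNodeName());
    }

    /** Linear scan along the sibling chain; this is the work an index would avoid. */
    private static Element firstNamedFrom(Node start, String name) {
        for (Node n = start; n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Test helper: parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```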

Performance

5. Caching strategy:

Top-level paging/cluster strategy

[Diagram: Library Node, with Document Nodes below it and Root elements below those]

Performance

6. Use DTD information:

For instance: /child::chapter/child::book[4]

Might be known to return nothing, without touching the data, if you have information on the DTDs used.
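A minimal sketch of this kind of check, assuming the DTD has already been summarized as a map from element name to the set of child element names it allows. That summary format, the class `SchemaPruning` and the method `provablyEmpty` are all hypothetical; the sketch handles only paths made of child-axis name steps.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SchemaPruning {
    /**
     * True if the summary rules out every result of a path made of child-axis
     * name steps, e.g. ["chapter", "book"] for /child::chapter/child::book.
     */
    static boolean provablyEmpty(Map<String, Set<String>> allowedChildren,
                                 List<String> steps) {
        for (int i = 0; i + 1 < steps.size(); i++) {
            Set<String> allowed = allowedChildren.get(steps.get(i));
            // An element missing from the summary is unknown, so nothing is ruled out.
            if (allowed != null && !allowed.contains(steps.get(i + 1))) {
                return true;
            }
        }
        return false;
    }
}
```

When the check succeeds the query can be answered with an empty result immediately; when it fails, normal evaluation is still required.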

Performance

7. Gather statistical info:

DTDs or XML Schemas specify structures that may occur, not what is actually in your collection.
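The simplest statistic of this kind is a count of how often each element name actually occurs. The sketch below gathers it for one document with the standard DOM call `getElementsByTagName("*")`; the class `ElementStatistics` is a hypothetical name, and how such counts would be stored or combined across a collection is left open.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ElementStatistics {
    /** Counts occurrences of each element name in the document. */
    static Map<String, Integer> countElements(Document doc) {
        Map<String, Integer> counts = new TreeMap<>();
        NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
        for (int i = 0; i < all.getLength(); i++) {
            counts.merge(all.item(i).getNodeName(), 1, Integer::sum);
        }
        return counts;
    }

    /** Test helper: parses an XML string into an in-memory DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

A name the schema permits but that never shows up in the counts is a candidate for answering queries without touching the data.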

Conclusion

- DOM within database environments
- XPath on top of a PDOM
- XPath is fairly complete
- Focus on performance

WWW9

Beta testers, Developers wanted.

Email: [email protected]

Have fun...
