The Connection Factory
Jeroen van Rotterdam, CTO
May 19th, WWW9
Contents
- Xhive setup
- XPath
- XPath performance issues within XML collections
Xhive
- OO-XML database
- Highly scalable
- High granularity
- W3C DOM L2 compliant
- XPath 1.0 compliant
Architecture
[Diagram. Components shown: Xhive Core, OODB, XPath, DOM Core L2, Extended DOM, DOM Traversal, DB Administrator, Schema, SQL loader, RMI Layer ( EJB / CORBA / SOAP ), RMI Layer, Client]
Why XPath
Competing solutions:
- XML-QL: Where-In constructs
- XQL: limited
- SQL: no alternative
XPath is a complete pattern-matching language.
XPath
Advantages:
- fairly complete
- multiple axes
- supported by W3C
- basis for XPointer, XLink
- basis for the XML Query WG
- user-defined functions
Disadvantages:
- document oriented
- slightly different tree model
- no updates
Extending DOM
Collection setup:
Every document is a “Bastard Node”
[Diagram: a Library Node with Document Nodes beneath it. getFirstChild() and getLastChild() on the Library Node lead to Document Nodes; getParentNode() on a Document Node returns null.]
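The collection setup can be sketched roughly as follows. This is only an illustration: `LibraryNode` and its method names are hypothetical (they are not the Xhive API), and Python's `xml.dom.minidom` stands in for the persistent DOM. The point it shows is that the library knows its documents, while each document still reports no parent, as standard DOM requires.

```python
from xml.dom.minidom import parseString


class LibraryNode:
    """Hypothetical container for documents; the name and methods are
    illustrative assumptions, not the Xhive API."""

    def __init__(self):
        self._documents = []

    def append_document(self, document):
        # The library records the document, but the document itself is
        # left untouched, so its parent stays null as in standard DOM.
        self._documents.append(document)

    def get_first_child(self):
        return self._documents[0] if self._documents else None

    def get_last_child(self):
        return self._documents[-1] if self._documents else None


library = LibraryNode()
library.append_document(parseString("<book><title>A</title></book>"))
library.append_document(parseString("<book><title>B</title></book>"))

first = library.get_first_child()
last = library.get_last_child()
print(first.documentElement.toxml())
print(last.parentNode)  # None: the document does not know its library
```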
Library Node
Advantages
- Natural extension of DOM
- extensible
- closely related to directory structures
- searchable with XPath
Library Node
Disadvantages
- potential bottleneck
XPath
- XPath in a large PDOM collection environment:
1. Address memory issues
2. Solve differences in specs
3. Address performance issues
Memory issues
- Avoid recursion
- Make subresults capable of being persisted
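As a rough illustration of avoiding recursion, the sketch below walks a tree in document order with an explicit stack instead of recursive calls, so call depth does not grow with tree depth. It uses Python's `xml.dom.minidom` as a stand-in DOM; the function name `iter_document_order` is an assumption made for this example.

```python
from xml.dom.minidom import parseString


def iter_document_order(root):
    """Yield root and its descendants in document order without recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the first child is popped next.
        stack.extend(reversed(node.childNodes))


doc = parseString("<book><chapter><title/></chapter><chapter/></book>")
names = [
    node.tagName
    for node in iter_document_order(doc.documentElement)
    if node.nodeType == node.ELEMENT_NODE
]
print(names)
```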
Solve differences
Differences in specs include, for instance:
- getParent on attributes vs. ownerElement
- namespace nodes
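The attribute difference can be shown with a small sketch. In DOM, an attribute node has a null parent and exposes its element through `ownerElement`; in the XPath data model the element is the attribute's parent. The helper name `xpath_parent` is hypothetical, and `xml.dom.minidom` is used only as a convenient DOM implementation.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def xpath_parent(node):
    """Return the parent as the XPath data model sees it."""
    if node.nodeType == Node.ATTRIBUTE_NODE:
        # DOM: Attr.parentNode is null; the element is in ownerElement.
        return node.ownerElement
    return node.parentNode


doc = parseString('<chapter id="c1"><title>Intro</title></chapter>')
chapter = doc.documentElement
attr = chapter.getAttributeNode("id")

print(attr.parentNode)                # None under DOM rules
print(xpath_parent(attr) is chapter)  # True under XPath rules
```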
Performance
Increase XPath performance:
- Query analysis
- Avoid reparsing
- Lazy evaluation
- Index structures
- Cache strategy
- DTD analysis
- Statistical data
Performance
1. Query analysis:
a. Can I simplify my query?
e.g. /child::chapter[5+5], which can be simplified to /child::chapter[10]
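A toy version of this simplification step might look like the sketch below. It only folds predicates made of two integer literals joined by `+` or `-`; a real optimizer would work on the parsed expression tree. The function name `fold_constant_predicates` is an assumption made for this example.

```python
import re

# Matches a predicate made of two integer literals joined by + or -,
# such as [5+5] or [12 - 2]. Anything else is left untouched.
_CONSTANT_PREDICATE = re.compile(r"\[\s*(\d+)\s*([+-])\s*(\d+)\s*\]")


def fold_constant_predicates(query):
    """Replace constant integer predicates with their computed value."""

    def fold(match):
        left = int(match.group(1))
        right = int(match.group(3))
        value = left + right if match.group(2) == "+" else left - right
        return f"[{value}]"

    return _CONSTANT_PREDICATE.sub(fold, query)


print(fold_constant_predicates("/child::chapter[5+5]"))  # /child::chapter[10]
```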
Performance
1. Query analysis:
b. Does your query depend on the context node?
Absolute queries are context independent:
“Give me all chapters where the title is the same as the book title”
//chapter[title=string(/book/title)]
Evaluate string(/book/title) only once.
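The idea of evaluating the context-independent part once can be sketched by hand, outside any XPath engine. The example below uses Python's `xml.etree.ElementTree` and an invented sample document; it hoists the book title out of the per-chapter test rather than recomputing it for every chapter.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book><title>XPath</title>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>XPath</title></chapter>"
    "</book>"
)

# Context-independent part: evaluated once, outside the loop.
book_title = book.findtext("title")

# Context-dependent part: evaluated once per chapter.
matches = [
    chapter
    for chapter in book.iter("chapter")
    if any(title.text == book_title for title in chapter.findall("title"))
]
print(len(matches))
```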
Performance
2. Storing parsed queries:
“Compile” and optimize each query only once
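One simple way to avoid reparsing is to cache the compiled form keyed by the query string. The sketch below assumes a toy "compiler" that merely splits a path into steps; `compile_query` and the `compile_calls` counter are hypothetical names used only to show that the second request is served from the cache.

```python
from functools import lru_cache

compile_calls = 0


@lru_cache(maxsize=128)
def compile_query(query):
    """Toy 'compiler': split a simple path into a tuple of step names."""
    global compile_calls
    compile_calls += 1
    return tuple(step for step in query.split("/") if step)


first = compile_query("/book/chapter/title")
second = compile_query("/book/chapter/title")  # served from the cache
print(first, compile_calls)
```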
Performance
3. Lazy evaluation:
e.g. operations on node-sets
- booleans (evaluate first node)
- strings (first in doc order)
- number (string to number)
Example: “give me all chapters which have paragraphs”
/chapter[paragraph]
Finding one paragraph will do.
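A minimal sketch of the boolean case: the existence test stops at the first matching child instead of building the full node-set. The helper `has_child` and the `visited` list (which only records how much work was done) are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def has_child(element, name, visited):
    """Boolean test on a node-set: stop at the first matching child."""
    for child in element:
        visited.append(child.tag)  # record work done, for illustration
        if child.tag == name:
            return True
    return False


chapter = ET.fromstring(
    "<chapter><paragraph/><paragraph/><paragraph/></chapter>"
)
visited = []
print(has_child(chapter, "paragraph", visited), len(visited))
```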
Performance
4. Indexing:
- getFirstChildElementByName(String name)
- getNextSiblingElementBySameName()
- getFirstChildByType( short type )
- getNextSiblingByType( short type )
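How such name-based accessors could be backed by an index is sketched below. This is an assumption about one possible design, not a description of Xhive's implementation: `ChildNameIndex` is a hypothetical class that groups an element's children by name once, so that later lookups skip unrelated siblings.

```python
from collections import defaultdict
from xml.dom import Node
from xml.dom.minidom import parseString


class ChildNameIndex:
    """Hypothetical per-parent index from element name to child elements."""

    def __init__(self, parent):
        self._by_name = defaultdict(list)
        self._position = {}
        for child in parent.childNodes:
            if child.nodeType == Node.ELEMENT_NODE:
                siblings = self._by_name[child.tagName]
                # Remember where this child sits among same-named siblings.
                self._position[id(child)] = len(siblings)
                siblings.append(child)

    def first_child_element_by_name(self, name):
        siblings = self._by_name.get(name)
        return siblings[0] if siblings else None

    def next_sibling_element_by_same_name(self, element):
        siblings = self._by_name.get(element.tagName, [])
        following = self._position[id(element)] + 1
        return siblings[following] if following < len(siblings) else None


doc = parseString(
    '<book><title/><chapter n="1"/><note/><chapter n="2"/></book>'
)
index = ChildNameIndex(doc.documentElement)
first = index.first_child_element_by_name("chapter")
second = index.next_sibling_element_by_same_name(first)
print(first.getAttribute("n"), second.getAttribute("n"))
```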
Performance
5. Caching strategy:
top level paging/cluster strategy
[Diagram: Library Node; Document Nodes; Root elements]
Performance
6. Use DTD information:
e.g. /child::chapter/child::book[4]
Might return null right away, without searching, if you have info on the DTDs used.
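A sketch of how DTD knowledge could rule a query out before any data is touched. The content model below is invented for illustration (it is not taken from a real DTD), and `can_match` is a hypothetical helper that checks only consecutive child steps; it does not validate the first step or handle other axes.

```python
# Hypothetical content model, as it might be derived from a DTD:
# each element name maps to the child element names it may contain.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "paragraph"},
    "title": set(),
    "paragraph": set(),
}


def can_match(steps, allowed=ALLOWED_CHILDREN):
    """Return False when a child step is ruled out by the content model."""
    for parent, child in zip(steps, steps[1:]):
        if child not in allowed.get(parent, set()):
            return False
    return True


print(can_match(["chapter", "book"]))  # False: a chapter never holds a book
print(can_match(["book", "chapter"]))  # True: the data must be searched
```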
Performance
7. Gather statistical info:
DTDs or XML Schemas specify structures that may occur, not what’s actually in your collection.
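Gathering such statistics can be as simple as counting which parent/child element pairs actually occur. The sketch below uses an invented two-document collection and Python's `collections.Counter`; the name `pair_counts` is an assumption made for this example.

```python
from collections import Counter
import xml.etree.ElementTree as ET

collection = [
    ET.fromstring("<book><chapter><paragraph/></chapter></book>"),
    ET.fromstring("<book><chapter/><chapter><paragraph/></chapter></book>"),
]

# Count how often each (parent, child) element pair actually occurs.
pair_counts = Counter(
    (parent.tag, child.tag)
    for document in collection
    for parent in document.iter()
    for child in parent
)

print(pair_counts[("book", "chapter")])
print(pair_counts[("chapter", "table")])  # 0: never stored in this collection
```

A query planner could consult counts like these to skip steps that never occur in the stored data, or to start with the rarest step.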
Conclusion
- DOM within database environments
- XPath on top of a PDOM
- XPath is fairly complete
- Focus on performance