Scalable XQuery Processing with Zorba on Top of MongoDB
DESCRIPTION
Over the past few years, the NoSQL movement has produced a variety of open-source document stores. Most of them focus on high availability and horizontal scalability and are designed to run on commodity hardware. These products have gained considerable traction in the industry for storing large amounts of flexible data (mostly JSON). Meanwhile, XQuery has evolved into a standardized, full-fledged programming language for XML with native support for complex queries, indexes, updates, full-text search, and scripting. Moreover, JSON has recently been added to the language as a first-class data type. As of today, we consider it the most robust and productive technology for processing flexible data. The aim of this talk is to showcase the benefits of integrating the Zorba XQuery Processor with MongoDB. We will introduce the 28msec platform, which seamlessly stores, indexes, and manages flexible data entirely in XQuery. The data itself is stored in MongoDB. The platform leverages MongoDB's indexes, sharding, and consistency guarantees to scale out horizontally. The talk will conclude by presenting a benchmark of the platform and discussing the prospects of the outlined approach.
TRANSCRIPT
Scalable XQuery Processing
Zorba Meets MongoDB
William Candillon
28msec
Two Drivers
Flexible Data
Scalability
                                 MongoDB, CouchBase    BaseX, eXist-db
Flexible Data
  Standardized Query Language            X                    ✔
  Modern Query Processing                X                    ✔
  Typing                                 X                    ✔
Scalability
  High Availability                      ✔                    X
  Sharding                               ✔                    X
  Available as a Service                 ✔                    X
What can XML contribute to JSON Datastores?
A Standardized, Rock Solid Query Language
JSONiq - The SQL of NoSQL
JSONiq
• Open Specification: jsoniq.org
• Extension of the mature XQuery for JSON
  - Joins, Group-by, Filters, Search...
• Leverage the complete XQuery Family
  - Scripting, Updates, Full-Text
• Standardized Query Language
  - Run the same code across multiple JSON stores
What can JSON datastores contribute to XML?
A Distributed and Scalable Store
The Goal
[Chart. Axes: Depth of functionality; Scalability & Performance. Plotted: RDBMS, memcached, key/value, MongoDB, XML DB]
The Goal
[Chart. Axes: Depth of functionality; Scalability & Performance. Plotted: RDBMS, memcached, key/value, MongoDB, XML DB, 28msec]
28msec - XQuery on top of MongoDB
Meet Zorba
• Open Source XQuery Processor
  - Apache 2 License
  - Contributors: Oracle, 28msec, FLWOR Foundation
• The Complete Family
  - XQuery 3.0, Updates, Full-Text, Scripting, JSONiq
  - XQuery Data Definition Facility
• Pluggable Store API
  - Run Zorba on your own persistency layer
Zorba Architecture
Meet MongoDB
• Open Source JSON Document Store
  - License: AGPL 3.0
• Focus on scalability
  - Replication across multiple availability zones
  - Sharding
  - Atomic updates on documents
• Available as a service
  - MongoHQ, MongoLab
MongoDB Deployment Example
[Diagram labels: App Server and MongoS (two of each); Config Servers C1, C2, C3 (MongoD); Shard1, Shard2, Shard3; MongoD; Replica set]
The Goal
[Diagram labels: Zorba (Runtime, Collections, XDM, Indexes); MongoDB (MongoS, Collections, BSON, Indexes)]
The Goal
• Seamless XQuery Integration into MongoDB
[Diagram labels: Zorba (Runtime, Collections, XDM, Indexes); MongoDB (MongoS, Collections, BSON, Indexes)]
Application Example
• Fetching sports news from XMLTeam.com
• Stored and indexed on MongoDB
• 1 million documents and counting
• Entirely built in XQuery from backend to frontend
• 1k LOC, 1 developer, 1 week of work
Collection Declarations
declare collection sports:docs as document-node();
Collection Declarations
[Diagram: Zorba (Compiler, Runtime, Store API) and MongoDB, processing declare collection ...]
1. Compile Query
2. createCollection(QName)
3. Create Collection
Index Declarations
declare %an:value-range index sports:by-datetime on nodes db:collection(xs:QName('sports:docs')) by ./sports-content/sports-metadata/@date-time;
Index Declarations
[Diagram: Zorba (Compiler, Runtime, Store API) and MongoDB, processing declare index ...]
1. Compile Query
2. createIndex(qname, ordpath, keys)
3. Create Index
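To give a feel for what a value-range index such as sports:by-datetime makes possible, here is a minimal Python sketch that keeps (key, document id) pairs sorted and answers range probes with binary search. It is a toy in-memory model, not how Zorba or MongoDB implement indexes. The class `ValueRangeIndex`, its methods, and the sample keys are hypothetical, and it assumes keys are date-time strings in one fixed ISO 8601 format so that string order matches chronological order.

```python
import bisect


class ValueRangeIndex:
    """Toy value-range index (hypothetical model, not Zorba's implementation)."""

    def __init__(self):
        # (key, doc_id) pairs, always kept in sorted order.
        self._entries = []

    def insert(self, key, doc_id):
        # insort keeps the list sorted on every insertion.
        bisect.insort(self._entries, (key, doc_id))

    def probe_range(self, low, high):
        """Return doc ids whose key lies in [low, high], in key order."""
        # (low,) sorts before every (low, doc_id), so this finds the first key >= low.
        start = bisect.bisect_left(self._entries, (low,))
        end = start
        while end < len(self._entries) and self._entries[end][0] <= high:
            end += 1
        return [doc_id for _, doc_id in self._entries[start:end]]


# Made-up sample data, inserted out of order on purpose.
index = ValueRangeIndex()
index.insert("2012-05-01T10:00:00", "doc-a")
index.insert("2012-05-03T09:30:00", "doc-c")
index.insert("2012-05-02T18:45:00", "doc-b")
hits = index.probe_range("2012-05-02T00:00:00", "2012-05-03T23:59:59")
```

A range probe touches only the matching slice of the sorted entries instead of scanning every document, which is the property a query that filters on @date-time would rely on.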
Insert Nodes
let $uri := 'http://xmlteam.com/...'
let $doc := http:get($uri)
return db:insert-nodes($sports:docs, $doc)
Insert Nodes
[Diagram: Zorba (Compiler, Runtime, Store API) and MongoDB, processing db:insert-nodes(...)]
1. Process Query
2. insertNode(qname, xdm)
3. Insert BSON
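The three calls named on these slides (createCollection, createIndex, insertNode) suggest a narrow store interface that a backend has to implement. The Python sketch below illustrates that idea only. Zorba's actual store API is not shown in the slides, so the class names, method signatures, and the dict-backed `InMemoryStore` standing in for MongoDB are all assumptions. In particular, `create_index` takes a single top-level field name rather than the (qname, ordpath, keys) arguments named on the slide.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical pluggable store interface, loosely modeled on the slide labels."""

    @abstractmethod
    def create_collection(self, qname): ...

    @abstractmethod
    def create_index(self, qname, collection, key_field): ...

    @abstractmethod
    def insert_node(self, qname, node): ...


class InMemoryStore(Store):
    """Dict-backed store used here in place of a real MongoDB backend."""

    def __init__(self):
        self.collections = {}  # collection qname -> list of documents (dicts)
        self.indexes = {}      # index qname -> (collection qname, key field, entries)

    def create_collection(self, qname):
        self.collections.setdefault(qname, [])

    def create_index(self, qname, collection, key_field):
        entries = {}
        # Index documents that are already in the collection.
        for doc in self.collections[collection]:
            if key_field in doc:
                entries.setdefault(doc[key_field], []).append(doc)
        self.indexes[qname] = (collection, key_field, entries)

    def insert_node(self, qname, node):
        self.collections[qname].append(node)
        # Keep every index declared on this collection up to date.
        for collection, key_field, entries in self.indexes.values():
            if collection == qname and key_field in node:
                entries.setdefault(node[key_field], []).append(node)


# Made-up sample document.
store = InMemoryStore()
store.create_collection("sports:docs")
store.create_index("sports:by-datetime", "sports:docs", "date-time")
store.insert_node("sports:docs", {"date-time": "2012-05-02T18:45:00", "title": "Match report"})
```

Keeping the interface this small is what would let the same compiler and runtime sit on different persistency layers: only the concrete `Store` subclass changes.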
MongoDB Store Layer
• Direct XQuery to MongoDB mapping
  - Collections
  - Indexes
• Converts XDM to BSON
• Inherits MongoDB consistency model
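The slides do not spell out how XDM nodes are encoded as BSON. As one possible encoding, the Python sketch below maps an XML element tree to a nested dictionary that a MongoDB driver could then serialize. The function name `element_to_document` and the field names ("name", "attributes", "text", "children") are hypothetical, and the sketch ignores namespaces, mixed content, and node identity, all of which a real XDM mapping would have to handle.

```python
import xml.etree.ElementTree as ET


def element_to_document(elem):
    """Map an XML element to a nested dict (an assumed encoding, not 28msec's)."""
    doc = {"name": elem.tag}
    if elem.attrib:
        doc["attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        doc["text"] = text
    # Recurse into child elements, preserving document order.
    children = [element_to_document(child) for child in elem]
    if children:
        doc["children"] = children
    return doc


# Made-up sample input that reuses the element names from the index declaration.
xml = '<sports-content><sports-metadata date-time="2012-05-02T18:45:00"/></sports-content>'
document = element_to_document(ET.fromstring(xml))
```

Because the result contains only dicts, lists, and strings, it is the kind of value a document store can hold directly, with one stored document per inserted node tree.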
Request Processing on 28msec
[Diagram labels: HTTP Client; ELB; Availability Zone 1; Sausalito; Request Handler; Zorba; Processor; Store; MongoDB; Data; Compiled Code; R (three occurrences); steps numbered 1 to 9]
Scaling Out
[Chart: Avg Response Time in ms (axis 0 to 1000) versus Number of concurrent requests (10 to 150), with series for 2 App Servers and 4 App Servers]
XQuery on Top of MongoDB
• Seamless Integration of XQuery with MongoDB
  - XDM to BSON
  - Collections and indexes mapping
  - Atomicity per document
• 28msec
  - XQuery Platform on top of MongoDB
  - Deploy your XQuery apps in 1 click
  - Scale up & down automatically
Take Away
• Two Drivers
  - Flexible Data
  - Scalability
• Two Champions
  - XQuery for Flexible Data
  - JSON Stores for Scalability
• Two Contributions
  - JSONiq: The SQL of NoSQL
  - XQuery Platform on top of MongoDB
Thank You!
Questions?