ToroDB: a bridge between the NoSQL and Relational worlds
DESCRIPTION
In recent years, NoSQL databases have been gaining a lot of traction. Most of them have been designed and written from scratch. Building on the principles of schema-less design and high scalability, they offer a distinct approach to that of relational databases. But rather than re-using what the industry has learned in the last three decades of database development, most of these databases are re-inventing the wheel and designing the data storage layer (one of the toughest parts of building a database) from scratch.
ToroDB instead uses relational databases, which are well-known, durable, scalable and fast (despite what many would say), as the storage layer on which to build a schema-less, document-oriented, scalable database. This project has recently been published as open-source software. It will effectively be the very first general-purpose database ever built in Spain.
Document databases store documents, which are basically hierarchical, nested data structures of sets of key-value pairs. Current approaches to storing them in relational databases are limited to storing documents in some form of binary serialization. What we found is a set of algorithms to transform a document into a set of document-parts that can individually be stored in relational tables. This includes dynamic creation of tables, when needed, to match a table's structure to that of the information to be stored. This means there is no engineering effort required in building the storage subsystem, which must handle durability, isolation and concurrency, all of which are tough properties to implement.
But even more importantly, there are very significant performance advantages, both in query time and storage savings. Query time improves because queries targeting subsets of the documents (which are most queries) need only address a subset of the data, as it is partitioned into tables, rather than reading the whole database. Storage savings are achieved by avoiding repetition of the schema in every document: many documents share the same schema (“structure”), but each of them needs to repeat it. Our benchmarks show that JSON documents in ToroDB require 29% to 68% of the storage required for the same data in a MongoDB database. This means significantly less I/O, significantly lower cost, and greater (vertical) scalability.
This presentation shows how ToroDB works, how the JSON documents are split into tables, and why current document-oriented databases fail to maximize performance for Big Data requirements: ToroDB also includes a mechanism for storing parts of the documents in columnar format to improve aggregate-type queries, obtaining impressive performance benefits. And, finally, it shows how this can all be done in a way compatible with existing systems: ToroDB includes a layer that natively speaks the MongoDB protocol, hence becoming a drop-in replacement for MongoDB installations, but running on top of existing relational databases.
TRANSCRIPT
![Page 2: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/2.jpg)
About *8Kdata*
● Research & Development in databases
● Consulting, Training and Support in PostgreSQL
● Founders of PostgreSQL España, 3rd largest PUG in the world (322 members as of today)
● About myself: CEO at 8Kdata: @ahachete, http://linkd.in/1jhvzQ3
www.8kdata.com
![Page 3: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/3.jpg)
How big is “NoSQL”?
Source: 451 Research
![Page 4: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/4.jpg)
Why do people want “NoSQL”?
● Schema-less
● High availability
● It's cool
![Page 5: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/5.jpg)
The schema-less fallacy
{
  "name": "Álvaro",
  "surname": "Hernández",
  "height": 200,
  "hobbies": ["PostgreSQL", "triathlon"]
}
![Page 6: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/6.jpg)
The schema-less fallacy
{
  "name": "Álvaro",
  "surname": "Hernández",
  "height": 200,
  "hobbies": ["PostgreSQL", "triathlon"]
}
metadata → Isn't that... schema?
![Page 7: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/7.jpg)
The schema-less fallacy: BSON
metadata → Isn't that... schema?
{
  "name": (string) "Álvaro",
  "surname": (string) "Hernández",
  "height": (number) 200,
  "hobbies": {
    "0": (string) "PostgreSQL",
    "1": (string) "triathlon"
  }
}
![Page 8: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/8.jpg)
The schema-less fallacy
● It's not schema-less
● It is “attached-schema”
● It carries an overhead which is not 0
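The overhead of an "attached schema" can be estimated roughly. The following is a minimal sketch, assuming a compact UTF-8 JSON encoding rather than BSON (whose per-field layout differs); the helper name `key_overhead` is hypothetical and not part of any ToroDB or MongoDB API. It counts how many bytes of a serialized document are spent on key names alone.

```python
import json


def key_overhead(doc):
    """Return (key_bytes, total_bytes) for the compact JSON encoding of doc.

    key_bytes counts every key name, including its surrounding quotes,
    at every nesting level. This is an illustration, not a BSON size model.
    """
    def key_bytes(value):
        if isinstance(value, dict):
            return sum(
                len(json.dumps(k, ensure_ascii=False).encode("utf-8")) + key_bytes(v)
                for k, v in value.items()
            )
        if isinstance(value, list):
            return sum(key_bytes(v) for v in value)
        return 0  # scalars carry no key names

    encoded = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return key_bytes(doc), len(encoded.encode("utf-8"))


# The example document from the slides.
person = {
    "name": "Álvaro",
    "surname": "Hernández",
    "height": 200,
    "hobbies": ["PostgreSQL", "triathlon"],
}
keys, total = key_overhead(person)
print(f"{keys} of {total} bytes ({keys / total:.0%}) are key names")
```

Every document with the same keys pays this cost again, which is the repetition the later slides aim to avoid.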
![Page 9: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/9.jpg)
High availability: at what cost?
MongoDB:
➔ Unacknowledged: 42% data loss
➔ Safe: 37% data loss
➔ Only majority is safe
http://aphyr.com/posts/284-call-me-maybe-mongodb
Jepsen!!! :)
![Page 10: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/10.jpg)
More NoSQL struggle
● Durability is sometimes not guaranteed on a single node
● Programming for AP systems may be a big burden
● Most (all?) NoSQL databases wrote their storage from scratch. Journaling and concurrency are really hard
![Page 11: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/11.jpg)
Can we do a better “NoSQL”?
● Document model is very appealing to many. Let's offer it
● DRY: why not use relational databases? They are proven, durable, concurrent and flexible
● Why not base it on relational databases, like PostgreSQL?
![Page 12: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/12.jpg)
Schema-attached repetition
{ "a": 1, "b": 2 }
{ "a": 3 }
{ "a": 4, "c": 5 }
{ "a": 6, "b": 7 }
{ "b": 8 }
{ "a": 9, "b": 10 }
{ "a": 11, "b": 12, "j": 13 }
{ "a": 14, "c": 15 }
Counting “document types” in collections of millions: at most, 1000s of different types
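Counting "document types" can be sketched as follows, under the simplifying assumption that a type is just the sorted set of top-level keys (value types and nesting are ignored here, which the real system may well take into account). The function name `doc_type` is hypothetical.

```python
from collections import Counter


def doc_type(doc):
    """A document's 'type' in this sketch: its sorted tuple of top-level keys."""
    return tuple(sorted(doc))


# The eight example documents from the slide.
docs = [
    {"a": 1, "b": 2},
    {"a": 3},
    {"a": 4, "c": 5},
    {"a": 6, "b": 7},
    {"b": 8},
    {"a": 9, "b": 10},
    {"a": 11, "b": 12, "j": 13},
    {"a": 14, "c": 15},
]

type_counts = Counter(doc_type(d) for d in docs)
for keys, count in type_counts.most_common():
    print(keys, count)
```

Eight documents collapse into five types here; the slide's point is that this ratio becomes far more favorable in collections of millions of documents.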
![Page 13: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/13.jpg)
Schema-attached repetition
How data is stored in schema-less
![Page 14: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/14.jpg)
Pettus and BTP inspired us
https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf
http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013
![Page 15: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/15.jpg)
ToroDB – Teaser https://flic.kr/p/9HzWhT
ToroDB
![Page 16: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/16.jpg)
What is ToroDB
● Open source, document-oriented, JSON database that runs on top of PostgreSQL
● JSON documents are stored relationally, not as a blob: significant storage and I/O savings
● Wire-protocol compatibility with Mongo
![Page 17: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/17.jpg)
ToroDB benefits
● 100% durable database
● High concurrency and performance
● Compatible with existing Mongo API programs and clients
● Full set of JSON operations (MongoDB's “SELECT” API)
![Page 18: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/18.jpg)
ToroDB storage
● Data is stored in tables
● JSON documents are split by hierarchy levels, and each (plain) level goes to a different table
● Subdocuments are classified by “type”, which maps to tables
![Page 19: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/19.jpg)
ToroDB storage (II)
● A “structure” table keeps the subdocument “schema”
● Keys in JSON are mapped to attributes, which retain the original name
● Tables are created dynamically and transparently to match the exact types of the documents
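As an illustration of dynamic table creation, here is a minimal sketch that derives a `CREATE TABLE` statement from one flat subdocument. The column names `did` and `index` follow the table listings shown later in the slides; the type mapping `PG_TYPES`, the function name `create_table_sql`, and the exact DDL are assumptions for illustration, not ToroDB's actual statements.

```python
# Hypothetical mapping from Python scalar types to PostgreSQL column types.
PG_TYPES = {bool: "boolean", int: "integer", float: "double precision", str: "text"}


def quote_ident(name):
    """Quote an SQL identifier so JSON keys keep their original name."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(schema, table, subdoc):
    """Build DDL for a table holding subdocuments shaped like subdoc.

    Nested documents and arrays are skipped: in this sketch they are
    assumed to live in other tables.
    """
    columns = ["did integer NOT NULL", '"index" integer']
    for key, value in subdoc.items():
        if isinstance(value, (dict, list)):
            continue
        columns.append(f"{quote_ident(key)} {PG_TYPES[type(value)]}")
    return f"CREATE TABLE {schema}.{table} (" + ", ".join(columns) + ")"


print(create_table_sql("demo", "t_1", {"a": 42, "b": "hello world!"}))
```

A real implementation would also need to handle nulls, type conflicts between documents, and concurrent creation of the same table.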
![Page 20: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/20.jpg)
ToroDB storage (III)
How data is stored in ToroDB
![Page 21: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/21.jpg)
ToroDB storage internals
{
  "name": "ToroDB",
  "data": { "a": 42, "b": "hello world!" },
  "nested": {
    "j": 42,
    "deeper": { "a": 21, "b": "hello" }
  }
}
![Page 22: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/22.jpg)
ToroDB storage internals
The document is split into the following subdocuments:
{ "name": "ToroDB", "data": {}, "nested": {} }
{ "a": 42, "b": "hello world!"}
{ "j": 42, "deeper": {}}
{ "a": 21, "b": "hello"}
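The split shown above can be reproduced with a short recursive walk. This is a minimal sketch, assuming documents contain only nested objects and scalars (arrays are left untouched), and the function name `split_document` is hypothetical; it is not ToroDB's actual algorithm.

```python
def split_document(doc):
    """Split a nested document into flat subdocuments, one per hierarchy level.

    Each nested object is replaced by an empty placeholder {} in its parent
    and emitted as its own subdocument, in pre-order.
    """
    parts = []

    def visit(node):
        # Keep scalars; replace nested objects with an empty placeholder.
        parts.append({k: ({} if isinstance(v, dict) else v) for k, v in node.items()})
        for value in node.values():
            if isinstance(value, dict):
                visit(value)

    visit(doc)
    return parts


doc = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world!"},
    "nested": {"j": 42, "deeper": {"a": 21, "b": "hello"}},
}

for part in split_document(doc):
    print(part)
```

Note that the second and fourth subdocuments have the same keys and value types, so they can share one table.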
![Page 23: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/23.jpg)
ToroDB storage internals
select * from demo.t_3
┌─────┬───────┬────────────────────────────┬────────┐
│ did │ index │ _id                        │ name   │
├─────┼───────┼────────────────────────────┼────────┤
│   0 │     ¤ │ \x5451a07de7032d23a908576d │ ToroDB │
└─────┴───────┴────────────────────────────┴────────┘
select * from demo.t_1
┌─────┬───────┬────┬──────────────┐
│ did │ index │ a  │ b            │
├─────┼───────┼────┼──────────────┤
│   0 │     ¤ │ 42 │ hello world! │
│   0 │     1 │ 21 │ hello        │
└─────┴───────┴────┴──────────────┘
select * from demo.t_2
┌─────┬───────┬────┐
│ did │ index │ j  │
├─────┼───────┼────┤
│   0 │     ¤ │ 42 │
└─────┴───────┴────┘
![Page 24: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/24.jpg)
ToroDB storage internals
select * from demo.structures
┌─────┬────────────────────────────────────────────────────────────────────────────┐
│ sid │ _structure                                                                 │
├─────┼────────────────────────────────────────────────────────────────────────────┤
│   0 │ {"t": 2, "data": {"t": 1}, "nested": {"t": 3, "deeper": {"i": 1, "t": 1}}} │
└─────┴────────────────────────────────────────────────────────────────────────────┘
select * from demo.root;
┌─────┬─────┐
│ did │ sid │
├─────┼─────┤
│   0 │   0 │
└─────┴─────┘
![Page 25: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/25.jpg)
ToroDB storage and I/O savings
29% - 68% of the storage required, compared to Mongo 2.6
![Page 26: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/26.jpg)
ToroDB performance
![Page 27: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/27.jpg)
ToroDB performance (II)
![Page 28: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/28.jpg)
ToroDB: query “by structure”
● ToroDB is effectively partitioning by type
● Structures (schemas, partitioning types) are cached in ToroDB memory
● Queries only scan a subset of the data.
● Negative queries are served directly from memory.
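The idea of querying "by structure" can be sketched with an in-memory toy. Assumptions: documents are flat, a structure is the sorted tuple of keys, and each structure maps to one "table" (a Python list). The class `TypePartitionedStore` is hypothetical and only illustrates why a query touches a subset of the data and why a query on an unknown key needs no scan at all.

```python
class TypePartitionedStore:
    """Toy store that partitions flat documents by their set of keys."""

    def __init__(self):
        # Cached structures: key tuple -> rows of the matching "table".
        self.tables = {}

    def insert(self, doc):
        self.tables.setdefault(tuple(sorted(doc)), []).append(doc)

    def find(self, key, value):
        """Return (matching documents, number of rows scanned)."""
        hits, scanned = [], 0
        for structure, rows in self.tables.items():
            if key not in structure:
                continue  # this table cannot contain the key: skip it entirely
            scanned += len(rows)
            hits.extend(row for row in rows if row[key] == value)
        return hits, scanned


store = TypePartitionedStore()
for d in [
    {"a": 1, "b": 2}, {"a": 3}, {"a": 4, "c": 5}, {"a": 6, "b": 7},
    {"b": 8}, {"a": 9, "b": 10}, {"a": 11, "b": 12, "j": 13}, {"a": 14, "c": 15},
]:
    store.insert(d)

print(store.find("c", 5))   # only the {a, c} table is scanned
print(store.find("z", 1))   # negative query: answered from the structure cache
```

In this toy, the query on `"z"` returns without reading a single row, because no cached structure contains that key.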
![Page 29: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/29.jpg)
ToroDB: Developer Preview
● ToroDB launched in October 2014, as a Developer Preview. Support for CRUD and most of the SELECT API
● github.com/torodb
● RERO policy. Comments, feedback, patches... greatly appreciated
● AGPLv3
![Page 30: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/30.jpg)
ToroDB: Developer Preview
● Clone the repo, build with Maven
● Or download the JAR: http://maven.torodb.com/release/com/torodb/torodb/0.11/torodb-0.11-jar-with-dependencies.jar
● Usage:
java -jar torodb-version.jar --help
java -jar torodb/target/torodb-version.jar -d dbname -u dbuser -P 27017
Connect with normal mongo console!
![Page 31: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/31.jpg)
ToroDB: Community Response
![Page 32: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/32.jpg)
ToroDB: Community Response
![Page 33: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/33.jpg)
ToroDB: Roadmap
● Current Developer Preview is single-node
● Version 1.0:
➔ Expected Q1 2015
➔ Production-ready
➔ MongoDB Replication support (Paxos-based replication protocol?)
➔ Very high compatibility with Mongo API
![Page 34: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/34.jpg)
Big Data speaking mongo: Vertical ToroDB
What if we use CitusData's cstore to store the JSON documents?
![Page 35: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/35.jpg)
Big Data speaking mongo: Vertical ToroDB
1.17% - 20.26% of the storage required, compared to Mongo 2.6
![Page 36: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/36.jpg)
“Software acknowledgements”
● PostgreSQL!
● The Netty framework
● jOOQ
● Guava, guice, findbugs
● Hikari CP
![Page 37: ToroDB: a bridge between the NoSQL and Relational worlds](https://reader034.vdocument.in/reader034/viewer/2022052622/559445761a28ab02738b4583/html5/thumbnails/37.jpg)