Posted on 11-Jul-2020
Agile web development with RethinkDB
Ilya Verbitskiy
Ilya Verbitskiy
• Distributed systems, application security, fintech
• ilya@verbitskiy.co
• @ilich_x86
• https://github.com/ilich
What is RethinkDB?
• Open-source database for building realtime web applications.
• NoSQL database that stores schemaless JSON documents.
• Distributed database that is easy to scale.
• High availability database with automatic failover and robust fault tolerance.
• At the time of writing, the second most popular database on GitHub.
The Good
• Features:
  • Changefeeds.
  • Map-reduce.
  • Geospatial queries.
• Well suited for:
  • Collaborative web and mobile apps.
  • Streaming analytics apps.
  • Multiplayer games.
  • Realtime marketplaces.
  • Connected devices.
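Map-reduce, listed among the strengths above, has one property worth knowing. RethinkDB's `reduce` documentation notes that the command is distributed and parallelized across shards and CPU cores, so the reduction function may be applied to two partial results rather than strictly left to right. It should therefore be associative and commutative. The in-memory sketch below only simulates that behavior; the shard contents and the `mapReduceSharded` helper are hypothetical, not part of the driver API.

```javascript
// Hypothetical in-memory simulation of a sharded map-reduce.
// Each shard maps and reduces its own documents; the per-shard partial
// results are then reduced again, as RethinkDB may combine results from
// different shards.
function mapReduceSharded(shards, mapFn, reduceFn) {
  const partials = shards
    .filter((docs) => docs.length > 0)
    .map((docs) => docs.map((doc) => mapFn(doc)).reduce((a, b) => reduceFn(a, b)));
  // Combine the per-shard partial results into a single value.
  return partials.reduce((a, b) => reduceFn(a, b));
}

// Example data: posts spread across two simulated shards.
const shards = [
  [{ id: 1, comments: 3 }, { id: 2, comments: 5 }],
  [{ id: 3, comments: 2 }],
];

// Addition is associative and commutative, so the grouping does not matter.
const totalComments = mapReduceSharded(
  shards,
  (post) => post.comments,
  (a, b) => a + b
);
console.log(totalComments); // 10
```

In ReQL the same computation would look roughly like `r.table('posts').map(function(post) { return post('comments'); }).reduce(function(left, right) { return left.add(right); })`, where the `posts` table is an assumption. A non-associative function such as subtraction could produce results that depend on how documents happen to be distributed across shards.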
The Bad
• RethinkDB is not a good choice if you need full ACID support or strong schema enforcement.
• If you are doing deep, computationally intensive analytics, you are better off using a system like Hadoop.
• In some cases RethinkDB trades off write availability in favor of data consistency.
RethinkDB vs. …
• MongoDB
• Firebase
Can I use my programming language?
• JavaScript/Node.js
• Python
• Ruby
• Java
• C#/.NET
• C++
• Go
• PHP
• … and many more at https://rethinkdb.com/docs/install-drivers/
RethinkDB Structure
• Database → Table → Document
• A document is a schemaless JSON object.
Introduction to ReQL
• ReQL is the RethinkDB query language.
• ReQL key principles:
  • ReQL embeds into your programming language.
  • All ReQL queries are chainable.
  • All queries execute on the server.
• Good starting point if you already know SQL: https://www.rethinkdb.com/docs/sql-to-reql/javascript/
Understanding ReQL
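• The client driver translates ReQL queries into the RethinkDB protocol and sends them to the server for execution.
• An anonymous function passed to a ReQL command (for example, filter or map) must return a valid ReQL expression. The driver calls it once on the client to build the query, so native if statements and comparison operators run only at build time, not per document.
• In JavaScript, use the lt and gt commands instead of the < and > operators (likewise eq, ne, le, and ge instead of ==, !=, <=, and >=).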
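Why `<` and `>` do not work: JavaScript has no operator overloading, so a native comparison is evaluated immediately by JavaScript and collapses to a plain boolean, while a chained method call such as `.gt()` records a query term that can be sent to the server. The toy builder below is a hypothetical stand-in for the driver; the names `Term` and `row` and the `[op, args]` tree format are assumptions, not RethinkDB's actual wire format.

```javascript
// A toy, hypothetical ReQL-like expression builder. Each method returns a new
// Term, so calls can be chained; nothing is evaluated on the client.
class Term {
  constructor(op, args) {
    this.op = op;
    this.args = args;
  }
  gt(value) {
    return new Term("gt", [this, value]);
  }
  lt(value) {
    return new Term("lt", [this, value]);
  }
  and(other) {
    return new Term("and", [this, other]);
  }
  // Serialize the expression tree (a stand-in for what a driver would send).
  toJSON() {
    return [this.op, this.args];
  }
}

// row(field) refers to a field of the document currently being evaluated.
const row = (field) => new Term("field", [field]);

// Built with methods: a query tree the server could evaluate per document.
const good = row("age").gt(18).and(row("age").lt(65));

// Built with operators: JavaScript compares the object right away
// (it converts to NaN), so the "query" is just the boolean false.
const bad = row("age") > 18;

console.log(JSON.stringify(good));
console.log(typeof bad, bad); // boolean false
```

The same reasoning explains why native `if` statements inside a ReQL lambda do not branch per document: they run once, while the query is being built.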
Supported data types
• Number
• String (UTF-8)
• Boolean
• Null
• Object
• Array (by default, up to 100,000 elements)
• Dates and times
• Binary objects
• Geometry objects (points, lines, and polygons) with geospatial indexes and GeoJSON support
Data modeling in RethinkDB
• Embedded arrays
  • Similar to MongoDB.
  • Queries are simpler.
  • The data is often colocated on disk. If your dataset doesn’t fit into RAM, data is loaded from disk faster.
  • Any update to the main document atomically updates both the main data and the linked data.
  • Arrays are limited to 100,000 elements by default.
  • Deleting, adding, or updating an array element requires loading the entire array, modifying it, and writing the entire document back to disk.
  • Because of this limitation, it’s best to keep the array to no more than a few hundred elements.
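To make the embedded-array trade-off concrete, here is a sketch using a hypothetical blog schema in which an author document embeds its posts. Adding one post yields a new version of the whole author document, mirroring the point that changing one array element means rewriting the entire document. In ReQL the update would look something like `r.table('authors').get('a1').update({posts: r.row('posts').append(post)})`; the plain-JavaScript version below runs without a server, and the schema and `addPost` helper are illustrative only.

```javascript
// Hypothetical embedded-array model: an author document owns its posts.
const author = {
  id: "a1",
  name: "Jane Doe",
  posts: [{ title: "Hello", content: "First post" }],
};

// RethinkDB's default array limit (it can be raised per query).
const DEFAULT_ARRAY_LIMIT = 100000;

// Adding one post produces a whole new author document: the entire array
// is copied, extended, and written back together with the rest of the data.
function addPost(authorDoc, post) {
  if (authorDoc.posts.length + 1 > DEFAULT_ARRAY_LIMIT) {
    throw new RangeError("posts array would exceed the default array limit");
  }
  return { ...authorDoc, posts: [...authorDoc.posts, post] };
}

const updated = addPost(author, { title: "Changefeeds", content: "Second post" });
console.log(updated.posts.length); // 2
console.log(author.posts.length); // 1: the original object is untouched
```

Because the whole document is rewritten on every change, the cost of each update grows with the array, which is why the slide recommends keeping embedded arrays to a few hundred elements.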
Data modeling in RethinkDB
• Multiple tables
  • Similar to SQL.
  • Operations on a parent document don’t require loading every child document for that parent into memory.
  • There is no limit on the number of child documents, so this approach is more suitable for large amounts of data.
  • The queries linking the data tend to be more complicated.
  • You cannot atomically update both the parent data and the child data.
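With multiple tables, posts move into their own table and point to their author through a field such as `authorId` (a hypothetical schema). In ReQL the linking query would typically be an `eqJoin` against the authors table's primary key, e.g. `r.table('posts').eqJoin('authorId', r.table('authors'))`, which returns `{left, right}` pairs and drops posts with no matching author. The sketch below imitates that join in memory with a lookup map; it shows the shape of the result, not the driver API.

```javascript
// Hypothetical multiple-tables model: posts reference their author by id.
const authors = [
  { id: "a1", name: "Jane Doe" },
  { id: "a2", name: "John Smith" },
];
const posts = [
  { id: "p1", authorId: "a1", title: "Hello" },
  { id: "p2", authorId: "a1", title: "Changefeeds" },
  { id: "p3", authorId: "a2", title: "Sharding" },
];

// In-memory imitation of eqJoin: an inner join of `left` documents with
// `right` documents whose primary key "id" equals left[field].
// Rows without a match are dropped; each result is { left, right }.
function eqJoin(left, field, right) {
  const byId = new Map(right.map((doc) => [doc.id, doc]));
  return left
    .filter((doc) => byId.has(doc[field]))
    .map((doc) => ({ left: doc, right: byId.get(doc[field]) }));
}

const joined = eqJoin(posts, "authorId", authors);
for (const { left: post, right: writer } of joined) {
  console.log(`${post.title} by ${writer.name}`);
}
```

The extra join step is the "more complicated query" the slide mentions; in exchange, adding a post touches only one small document in the posts table.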
Changefeeds
• Changefeeds allow clients to receive changes on a table.
• The changes command returns a cursor that receives updates.
• Each update includes the old and new values of the modified document (old_val and new_val).
• Changefeeds cannot guarantee delivery, since they are unidirectional with no acknowledgement returned from clients.
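// Node.js
const r = require('rethinkdb');

r.connect({ host: 'localhost', port: 28015 }, function(err, conn) {
  if (err) throw err;
  r.table('users').changes().run(conn, function(err, cursor) {
    if (err) throw err;
    // Use the cursor to process changes as they arrive
    cursor.each(function(err, change) {
      if (err) throw err;
      console.log(change);
    });
  });
});

// Sample change (an insert, so old_val is null)
{
  old_val: null,
  new_val: { city: "MINNEAPOLIS", state: "MN", id: "55311" }
}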
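The cursor example needs a running server, so here is a hypothetical in-memory table that emits change objects in the same `{old_val, new_val}` shape: `old_val` is null for an insert and `new_val` is null for a delete. The class and method names are illustrative only, not the driver API, and the sketch does not model the lack of delivery guarantees described above.

```javascript
// Hypothetical in-memory table that notifies subscribers with
// RethinkDB-style change objects. It only mimics the change format.
class ChangefeedTable {
  constructor() {
    this.docs = new Map(); // primary key "id" -> document
    this.listeners = [];
  }
  changes(listener) {
    this.listeners.push(listener);
  }
  emit(oldVal, newVal) {
    for (const listener of this.listeners) {
      listener({ old_val: oldVal, new_val: newVal });
    }
  }
  insert(doc) {
    this.docs.set(doc.id, doc);
    this.emit(null, doc); // inserts have no previous value
  }
  update(id, patch) {
    const oldVal = this.docs.get(id);
    if (oldVal === undefined) return;
    const newVal = { ...oldVal, ...patch }; // merge fields into the document
    this.docs.set(id, newVal);
    this.emit(oldVal, newVal);
  }
  delete(id) {
    const oldVal = this.docs.get(id);
    if (oldVal === undefined) return;
    this.docs.delete(id);
    this.emit(oldVal, null); // deletes have no new value
  }
}

const users = new ChangefeedTable();
const received = [];
users.changes((change) => received.push(change));

users.insert({ id: "55311", city: "MINNEAPOLIS", state: "MN" });
users.update("55311", { state: "Minnesota" });
users.delete("55311");

for (const change of received) console.log(JSON.stringify(change));
```

Comparing `old_val` and `new_val` is enough for a client to tell inserts, updates, and deletes apart without any extra query.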
Commands supporting changefeeds
• filter
• getAll
• map
• pluck
• between
• union
• min
• max
• orderBy.limit
Sharding and replication
• RethinkDB is designed for clustering and easy scalability.
• To add a new server to the cluster, just launch it with the --join parameter.
• Configure sharding and replication per table.
• Any feature that works with a single database will work in a sharded cluster.
Sharding and replication
• There is a hard limit of 64 shards.
• All sharding is currently done based on the table’s primary key only.
• RethinkDB uses system statistics for the table to find the optimal set of split points to break up the table evenly.
• Sharding and replication are configured through table configurations:
  • Number of shards
  • Number of replicas
• Replicas can be associated with servers using server tags.
  • Tags are assigned to a server using the --server-tag parameter.
• Use the rebalance command to rebalance the shards of a table.
• Use the reconfigure command to set up a table’s sharding and replication.
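RethinkDB shards a table by ranges of its primary key, choosing split points so the ranges hold similar amounts of data. The sketch below imitates that idea under a simplifying assumption (split points taken as evenly spaced quantiles of a sorted sample of primary keys, capped at the 64-shard limit) and routes each key to a shard by binary search. The function names and the quantile strategy are illustrative, not RethinkDB's actual algorithm.

```javascript
const MAX_SHARDS = 64; // RethinkDB's hard limit on shards per table

// Pick split points so that each key range holds roughly the same number
// of keys. Assumption: evenly spaced quantiles of a sorted key sample.
function computeSplitPoints(sampleKeys, shardCount) {
  if (shardCount < 1 || shardCount > MAX_SHARDS) {
    throw new RangeError(`shard count must be between 1 and ${MAX_SHARDS}`);
  }
  const sorted = [...sampleKeys].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  const splits = [];
  for (let i = 1; i < shardCount; i++) {
    splits.push(sorted[Math.floor((i * sorted.length) / shardCount)]);
  }
  return splits;
}

// Route a primary key to a shard: shard 0 holds keys below splits[0],
// shard i holds keys in [splits[i - 1], splits[i]).
function shardFor(key, splits) {
  let lo = 0;
  let hi = splits.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (key < splits[mid]) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

const keys = ["a", "b", "c", "d", "e", "f", "g", "h"];
const splits = computeSplitPoints(keys, 4);
console.log(splits); // [ 'c', 'e', 'g' ]
console.log(keys.map((k) => shardFor(k, splits))); // [ 0, 0, 1, 1, 2, 2, 3, 3 ]
```

On a real cluster you would not compute split points yourself: `r.table('users').reconfigure({shards: 4, replicas: 2})` asks the server to shard and replicate the table, and `r.table('users').rebalance()` redistributes existing shards (the `users` table here is illustrative).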
RethinkDB Security
• Injection attacks against RethinkDB are unlikely because ReQL embeds into your programming language: queries are built with driver methods rather than by concatenating strings.
  • Make sure that you use the latest database drivers!
• Be careful with the .match() command: passing unescaped user input as the pattern allows regular expression injection.
• Do not use r.js('…') to execute JavaScript code on the server: code built from user input is vulnerable to JavaScript injection attacks.
• Use TLS encryption.
• By default, the admin account has no password. Always start your primary server with the --initial-password parameter.
• The administrative web interface cannot be password-protected. Make sure it is behind a firewall or bound to localhost (--bind-http parameter).
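For the `.match()` risk, one common mitigation is to escape regular-expression metacharacters in user input before building the pattern, so the input is matched literally. ReQL's `match` uses RE2 syntax; the escaping below covers the usual metacharacters and is a sketch, not an exhaustive RE2 escaper. The helper names `escapeRegex` and `startsWithPattern` are hypothetical.

```javascript
// Escape regular-expression metacharacters so user input is matched
// literally. Covers . * + ? ^ $ { } ( ) | [ ] \ which are special in both
// JavaScript and RE2 patterns.
function escapeRegex(input) {
  return String(input).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical use: a case-insensitive "starts with" pattern. With the
// driver, the resulting string would be passed to something like
// r.row('name').match(pattern).
function startsWithPattern(userInput) {
  return "(?i)^" + escapeRegex(userInput);
}

// Unescaped, ".*" would match every name; escaped, it matches only a
// literal ".*" prefix.
console.log(startsWithPattern(".*")); // (?i)^\.\*
```

Escaping keeps the user in control of the search text but not of the pattern's structure, which is the property that prevents regular expression injection.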
Additional Resources
• RethinkDB: https://www.rethinkdb.com/
• RethinkDB installation: https://www.rethinkdb.com/docs/install/
• Thirty-second quickstart: https://www.rethinkdb.com/docs/quickstart/
• Ten-minute guide: https://www.rethinkdb.com/docs/guide/javascript/
• Cookbook: https://www.rethinkdb.com/docs/cookbook/javascript/
• Cheat sheet: https://www.rethinkdb.com/docs/sql-to-reql/javascript/
Questions?