scalable databases - from relational databases to polyglot persistence

Sergio Bossa – [email protected] IV – Roma – 30 January 2010

SCALABLE DATABASES
From Relational Databases To Polyglot Persistence

Sergio Bossa
[email protected]
http://twitter.com/sbtourist


About Me
● Software architect and engineer
  ● Gioco Digitale (online gambling and casinos)
● Open Source enthusiast
  ● Terracotta Messaging (http://forge.terracotta.org)
  ● Terrastore (http://code.google.com/p/terrastore)
  ● Actorom (http://code.google.com/p/actorom)
● (Micro-)Blogger
  ● http://twitter.com/sbtourist
  ● http://sbtourist.blogspot.com


Five fallacies of data-centric systems

Data model is static.
Data volume is predictable.
Data access load is predictable.
Database topology doesn't change.
Database never fails.


Scalable databases in action
● Scaling your database as a way to address the fallacies above.
● Scale to handle heterogeneous data.
● Scale to handle more data.
● Scale to handle more load.
● Scale to handle topology changes due to:
  ● Unplanned growth.
  ● Unpredictable failures.


Scaling Relational Databases


Master-Slave replication
● Master-Slave replication.
  ● One (and only one) master database.
  ● One or more slaves.
  ● All writes go to the master.
    ● Replicated to slaves.
  ● Reads are balanced among master and slaves.
● Major issues:
  ● Single point of failure.
  ● Single point of bottleneck.
  ● Static topology.
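The routing rule on this slide can be sketched in a few lines. This is only an illustration: the class name `MasterSlaveRouter` and the node names are hypothetical, and strings stand in for real database connections.

```python
import itertools


class MasterSlaveRouter:
    """Sends every write to the single master and balances reads round-robin."""

    def __init__(self, master, slaves):
        self.master = master
        # Reads are balanced among the master and all slaves.
        self._read_cycle = itertools.cycle([master] + list(slaves))

    def node_for_write(self):
        # All writes go to the one and only master.
        return self.master

    def node_for_read(self):
        return next(self._read_cycle)


# Hypothetical node names standing in for real connections.
router = MasterSlaveRouter("db-master", ["db-slave-1", "db-slave-2"])
```

The node list is fixed when the router is built and every write goes through `self.master`, which is where the static topology and the single point of failure/bottleneck show up.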


Master-Master replication
● Master-Master replication.
  ● One or more masters.
  ● Writes and reads can go to any master node.
  ● Writes are replicated among masters.
● Major issues:
  ● Limited performance and scalability (typically due to 2PC).
  ● Complexity.
  ● Static topology.


Vertical partitioning
● Vertical partitioning.
  ● Put tables belonging to different functional areas on different database nodes.
  ● Scale your data and load by function.
  ● Move joins to the application level.
● Major issues:
  ● No longer truly relational.
  ● What if a functional area grows too much?
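"Move joins to the application level" could look roughly like the sketch below. The data and names (`users_node`, `orders_node`, `orders_with_user_names`) are hypothetical; each structure stands in for a table living on a different database node.

```python
# Hypothetical data: each structure stands in for a table on a different node.
users_node = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders_node = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 2, "total": 40.0},
    {"order_id": 12, "user_id": 1, "total": 5.0},
]


def orders_with_user_names(orders, users):
    """Join orders to users in application code instead of in SQL."""
    joined = []
    for order in orders:
        user = users.get(order["user_id"])
        if user is not None:  # behaves like an inner join
            joined.append({**order, "user_name": user["name"]})
    return joined
```

The database no longer enforces or optimizes the join, which is one sense in which the setup is no longer truly relational.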


Horizontal partitioning
● Horizontal partitioning.
  ● Split tables by key and put partitions (shards) on different nodes.
  ● Scale your data and load by key.
  ● Move joins to the application level.
  ● Needs some kind of routing.
● Major issues:
  ● No longer truly relational.
  ● What if your partition grows too much?
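One simple form of the routing mentioned above is hash-based: hash the key and take it modulo the number of shards. The sketch below assumes a fixed, hypothetical shard list; it is not taken from any specific product.

```python
import hashlib

# Hypothetical shard names standing in for database nodes.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key, so the same key always lands on the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

With plain modulo routing, changing the number of shards remaps most keys, which is part of why a partition that grows too much is painful to split.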


Caching
● Put a cache in front of your database.
  ● Distribute.
  ● Write-through for scaling reads.
  ● Write-behind for scaling reads and writes.
● Saves you a lot of pain, but ...
  ● “Only” scales read/write load.
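The difference between write-through and write-behind can be sketched as follows. The class names are hypothetical and a plain dict stands in for the database; a real write-behind cache would flush asynchronously rather than on an explicit `flush()` call.

```python
class WriteThroughCache:
    """Every write hits the database synchronously; reads are served from the cache."""

    def __init__(self, store):
        self.store = store  # dict standing in for the database
        self.cache = {}

    def put(self, key, value):
        self.store[key] = value  # synchronous write to the database
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.store[key]  # load on cache miss
        return self.cache[key]


class WriteBehindCache(WriteThroughCache):
    """Writes are acknowledged from the cache and pushed to the database later."""

    def __init__(self, store):
        super().__init__(store)
        self.pending = {}  # writes not yet persisted

    def put(self, key, value):
        self.cache[key] = value
        self.pending[key] = value  # database write is deferred

    def flush(self):
        # Push all deferred writes in one batch.
        self.store.update(self.pending)
        self.pending.clear()
```

Write-through keeps the database on the write path, so it only relieves reads; write-behind takes the database off the write path as well, at the cost of a window in which unflushed writes can be lost.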


Did we solve our fallacies?
● We tried, but ...
  ● Still bound to the relational model.
  ● Replication only covers a few use cases.
  ● Partitioning is hard.
  ● Caching is good, but not definitive.
  ● ...
● Can we do any better?


It's Not Only SQL


NOSQL Characteristics
● Main traits of characterization:
  ● Data Model.
  ● Data Processing.
  ● Consistency Model.
  ● Scale Out.


Data Model (1)
● Column-family based.
● Structure:
  ● Key-identified rows with a sparse number of columns.
  ● Columns grouped in families.
  ● Multiple families for the same key.
● Highlights:
  ● Dynamically add and remove columns.
  ● Efficiently access columns in the same group (column family).
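As a rough mental model, a column-family store can be pictured as nested maps: row key, then family, then column. The sketch below uses plain dicts and hypothetical names; it does not reflect how any particular product lays out data on disk.

```python
# row key -> column family -> column name -> value (all names are hypothetical)
rows = {
    "user:42": {
        "profile": {"name": "Ada"},
        "activity": {"last_login": "2010-01-30"},
    }
}


def add_column(table, key, family, column, value):
    """Columns can be added at any time; rows need not share the same columns."""
    table.setdefault(key, {}).setdefault(family, {})[column] = value


def read_family(table, key, family):
    """Fetch all columns of one family for a row in a single access."""
    return dict(table.get(key, {}).get(family, {}))


add_column(rows, "user:42", "profile", "city", "Roma")
```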


Data Model (2)
● Document based.
● Structure:
  ● Key-identified documents.
  ● Schema-less (but optionally constrained).
    – JSON, XML ...
● Highlights:
  ● Dynamically change the inner structure of documents.
  ● Efficiently access documents as a unit.
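A minimal sketch of the document model, assuming JSON documents kept in an in-memory dict (the function names are hypothetical): the document is read and written as a unit, and its inner structure can change without any schema migration.

```python
import json

documents = {}  # key -> JSON text; stands in for a document store


def put_document(key, doc):
    documents[key] = json.dumps(doc)


def get_document(key):
    return json.loads(documents[key])


put_document("user:42", {"name": "Ada"})

# Change the inner structure: fetch the whole document, add a nested field, store it back.
doc = get_document("user:42")
doc["address"] = {"city": "Roma"}
put_document("user:42", doc)
```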


Data Model (3)
● Graph based.
● Structure:
  ● Nodes to represent your data.
  ● Relations as meaningful links between nodes.
  ● Properties to enrich both.
● Highlights:
  ● Rich data model.
  ● Efficient, fast traversal of nodes and relations.
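The node/relation/property structure and a typed traversal can be sketched like this. The data and the `traverse` helper are hypothetical, and scanning a list of relations is only for illustration; a real graph database would follow stored links instead.

```python
from collections import deque

# Nodes and relations both carry properties (hypothetical sample data).
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "roma": {"kind": "city"},
}
relations = [
    ("alice", "KNOWS", "bob", {"since": 2008}),
    ("bob", "KNOWS", "carol", {"since": 2009}),
    ("carol", "LIVES_IN", "roma", {}),
]


def traverse(start, relation_type):
    """Breadth-first traversal that follows only relations of the given type."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source, rtype, target, _props in relations:
            if source == current and rtype == relation_type and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen
```

For example, `traverse("alice", "KNOWS")` visits alice, bob and carol, but not the city node, which is reachable only through a `LIVES_IN` relation.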


Data Model (4)
● Key-Value based.
● Structure:
  ● Key-identified opaque values.
● Highlights:
  ● Great flexibility.
  ● Fast reads/writes for single entries.


Data Processing
● Several options:
  ● Map/Reduce.
  ● Predicates.
  ● Range Queries.
  ● ...
● One common principle:
  ● Move processing toward related data.
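The "move processing toward related data" principle can be illustrated with a tiny Map/Reduce sketch: each node runs the map step over the values it holds locally, and only the small partial results are combined. The partition contents and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: the values each node holds locally.
partitions = {
    "node-1": ["error", "info", "error"],
    "node-2": ["info", "warn"],
}


def map_phase(local_values):
    """Runs where the data lives: count occurrences within one partition."""
    return Counter(local_values)


def reduce_phase(left, right):
    """Merge two partial counts into one."""
    return left + right


partials = [map_phase(values) for values in partitions.values()]
totals = reduce(reduce_phase, partials, Counter())
```

Only the per-node counters cross the network in this model, not the raw values.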


Consistency Model (1)
● Strict Consistency.
  ● All nodes ...
  ● At every point in time ...
  ● See a consistent view of the stored data.
    – Per-key consistency.
    – Multi-key consistency.


Consistency Model (2)
● Eventual Consistency.
  ● Only a subset of all nodes ...
  ● At a specific point in time ...
  ● See a consistent view of the stored data.
    – Other nodes will serve stale data.
    – Other nodes will eventually get the updates.
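A toy model of eventual consistency, under the simplifying assumption that a write lands on one replica immediately and reaches the others only when `propagate()` runs. All names are hypothetical; real systems propagate in the background.

```python
class EventuallyConsistentStore:
    """Writes reach one replica first and the other replicas only later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.backlog = []  # updates not yet applied to the other replicas

    def write(self, key, value):
        # Only the first replica sees the write immediately.
        self.replicas[0][key] = value
        self.backlog.append((key, value))

    def read(self, replica_index, key):
        # May return stale (or missing) data, depending on the replica asked.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Apply the pending updates, in order, to every other replica.
        for key, value in self.backlog:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.backlog.clear()
```

Between `write()` and `propagate()`, a read from replica 0 sees the new value while the other replicas still serve the old state.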


Scale Out (1)
● Master-based.
  ● Membership managed and broadcast by masters.
  ● Data consistency guaranteed by masters.
  ● No SPOF with active/passive masters.
  ● No SPOB with active/active masters or cluster-cluster replication.
  ● Prone to partitioning failures.


Scale Out (2)
● Peer-to-peer.
  ● Membership is maintained through multicast or gossip-based protocols.
  ● Data consistency is maintained through quorum protocols.
  ● Easier to scale.
  ● Harder to maintain consistency.
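A quorum protocol can be sketched with three numbers: N replicas, a write quorum W and a read quorum R chosen so that R + W > N, which guarantees that every read set overlaps the latest successful write set. The values, the explicit version numbers and the `reachable` lists below are assumptions made for illustration only.

```python
N, W, R = 3, 2, 2  # R + W > N, so read and write sets always overlap

replicas = [{} for _ in range(N)]  # each maps key -> (version, value)


def quorum_write(key, value, version, reachable):
    """Write to the reachable replicas; succeed only if at least W are available."""
    if len(reachable) < W:
        return False
    for index in reachable:
        replicas[index][key] = (version, value)
    return True


def quorum_read(key, reachable):
    """Ask the reachable replicas and return the value with the highest version."""
    if len(reachable) < R:
        raise RuntimeError("read quorum not reached")
    answers = [replicas[i][key] for i in reachable if key in replicas[i]]
    if not answers:
        return None
    return max(answers, key=lambda answer: answer[0])[1]
```

If a write reaches only replicas 0 and 1, a later read from replicas 1 and 2 still returns the new value, because replica 1 belongs to both sets and carries the higher version.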


NOSQL Use Cases
● Use cases evolve along the following kinds of data:
  ● Rich.
  ● Runtime.
  ● Hot Spot.
  ● Massive.
  ● Computational.
● Do not use the same product for all cases.
● Pick multiple products for different use cases.


NOSQL Products - Cassandra
● Cassandra (http://incubator.apache.org/cassandra)
● Data Model:
  ● Column-family based.
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.


NOSQL Products - Mongo DB
● Mongo DB (http://www.mongodb.org)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Map/Reduce, SQL-like queries.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Replication, partitioning (alpha).


NOSQL Products - Neo4j
● Neo4j (http://neo4j.org)
● Data Model:
  ● Graph based.
● Data Processing:
  ● Path traversal, Index-based search.
● Consistency:
  ● Strict consistency.
● Scalability:
  ● Replication.


NOSQL Products - Riak
● Riak (http://riak.basho.com)
● Data Model:
● Data Processing:
  ● Map/Reduce.
● Consistency:




NOSQL Products - Terrastore
● Terrastore (http://code.google.com/p/terrastore)
● Data Model:
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Master-based.


NOSQL Products - Voldemort
● Voldemort (http://project-voldemort.com)
● Data Model:
  ● Key-Value.
● Data Processing:
  ● None.
● Consistency:




NOSQL Products and Use Cases


Final words
● A New World.
  ● New paradigms.
  ● New use cases.
  ● New products.
● Don't dismiss the old stuff.
  ● Relational databases still have their place.
● Embrace change.
  ● May the NOSQL power be with you.
● Let the Polyglot Persistence era begin!
