Non-Relational Databases
Pelle Jakovits
25 October 2017
Outline
• Background
– Relational model
– Database scaling
– The NoSQL Movement
– CAP Theorem
• Non-relational data models
– Key-value
– Document-oriented
– Column family
– Graph
• Example databases
The Relational Model
• Data is stored in tables
• Strict relationships between tables
– Foreign key references between columns
• The expected format of the data is specified with a restrictive schema
• Data is typically accessed using Structured Query Language (SQL)
Database Scaling
• Vertical scaling – on one machine
• Horizontal scaling – across a cluster of machines
• Relational model does not scale well horizontally
– Because there are too many dependencies between tables in the relational model
– Database sharding is one approach
Sharding
Relational database
https://dev.mysql.com/doc/sakila/en/
The NoSQL Movement
• Emergence of persistence solutions using non-relational data models
• Driven by the rise of "Big Data" and Cloud Computing
• Non-relational data models are based on the key-value structure
• Simpler schema-less key-value based data models scale better than the relational model
• NoSQL is a broad term with no clear boundaries
– The term NoSQL itself is very misleading
– No SQL?
– Not only SQL?
CAP Theorem
• It is impossible for a distributed computer system to simultaneously provide all three of the following:
– Consistency - every read receives the most recent write or an error
– Availability - every request receives a response
– Partition/Fault tolerance - the system continues to operate despite arbitrary partitioning (network failures, dropped messages)
• When a network partition occurs, the system has to choose between consistency and availability
• NoSQL solutions that are more focused on availability try to achieve eventual consistency
• When trying to provide both consistency and availability
– Will have high latency
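To make "eventual consistency" concrete, here is a minimal sketch of one common conflict-resolution strategy, last-writer-wins (LWW); the slides do not prescribe it. Two hypothetical replicas keep accepting writes while partitioned (availability) and converge once they exchange state. The `Replica` class, its methods and the plain numeric timestamps are illustrative assumptions; real systems must also cope with clock skew.

```javascript
// Minimal last-writer-wins (LWW) replica; an illustrative assumption,
// not the mechanism of any particular database.
class Replica {
  constructor() {
    this.store = new Map(); // key -> { value, ts }
  }

  // Accept a write locally, even while partitioned from other replicas.
  write(key, value, ts) {
    this.merge(key, { value, ts });
  }

  read(key) {
    const entry = this.store.get(key);
    return entry === undefined ? undefined : entry.value;
  }

  // Keep the version with the higher timestamp; ties are broken by value
  // so that every replica deterministically picks the same winner.
  merge(key, incoming) {
    const current = this.store.get(key);
    if (
      current === undefined ||
      incoming.ts > current.ts ||
      (incoming.ts === current.ts && String(incoming.value) > String(current.value))
    ) {
      this.store.set(key, incoming);
    }
  }

  // Anti-entropy: pull every entry from another replica.
  syncFrom(other) {
    for (const [key, entry] of other.store) {
      this.merge(key, entry);
    }
  }
}

// During a partition, both replicas accept conflicting writes.
const a = new Replica();
const b = new Replica();
a.write("cart", "book", 1);
b.write("cart", "book,pen", 2);
console.log(a.read("cart"), b.read("cart")); // book book,pen (inconsistent)

// After the partition heals, the replicas exchange state and converge.
a.syncFrom(b);
b.syncFrom(a);
console.log(a.read("cart"), b.read("cart")); // book,pen book,pen
```

The price of staying available is that the write with timestamp 1 is silently discarded; that trade-off is exactly what the CAP choice above describes.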
CAP Theorem
Benefits of the Key-value Model
• Horizontal scalability
– Data with the same Key stored close to each other
– Suitable for cloud computing
• Flexible schema-less models suitable for unstructured data
• Fetching data by key can be very fast
Indexing and Partitioning
• In Key-value model, Key acts as an index
• Secondary indexes can be created in some solutions
– Implementations vary from partitioning to local distributed indexes
• Data is partitioned between different machines in the cluster
• Usually data is partitioned by rows and/or column families
• Users can often specify partitioning parameters
– Gives control over how data is distributed
– Very important for optimizing query speed
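A minimal sketch of key-based partitioning, assuming simple hash-modulo placement. The hash function (32-bit FNV-1a), the `partitionFor` helper and the node names are hypothetical; real databases ship their own partitioners. Choosing the partition key (here the customer id) is the kind of control the bullet above refers to.

```javascript
// FNV-1a 32-bit hash of a string key (illustrative choice of hash).
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // unsigned 32-bit result
}

// Map a partition key to one of the cluster nodes (hash modulo N).
function partitionFor(partitionKey, nodes) {
  return nodes[fnv1a(partitionKey) % nodes.length];
}

const nodes = ["node-0", "node-1", "node-2"];

// Rows sharing a partition key (the customer id) land on the same node,
// so a query for one customer touches a single machine.
const rows = [
  { customerId: "c42", orderId: 1 },
  { customerId: "c42", orderId: 2 },
  { customerId: "c7", orderId: 3 },
];
for (const row of rows) {
  console.log(row.orderId, "->", partitionFor(row.customerId, nodes));
}
```

Hash-modulo placement spreads keys evenly, but changing the number of nodes remaps most keys; consistent hashing, mentioned later for Riak, addresses that.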
Aggregate-oriented DB
• Non-relational data models aim to store aggregated data together
• Aggregate is a collection of data that is treated as a unit
– E.g. a customer and all of their orders
• In normalized relational databases, aggregates have to be assembled at query time using JOIN and GROUP BY operations
• Keyed aggregates make for a natural unit of data sharding
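A minimal sketch of the difference: normalized `customers` and `orders` records (hypothetical sample data) are grouped into one aggregate per customer, keyed by customer id. A relational database performs this grouping at query time; an aggregate-oriented store keeps each resulting object as a single unit, which is why the customer id is a natural shard key.

```javascript
// Normalized data, as it might be stored in two relational tables.
const customers = [
  { id: "c1", name: "Ann" },
  { id: "c2", name: "Bob" },
];
const orders = [
  { id: 101, customerId: "c1", total: 30 },
  { id: 102, customerId: "c2", total: 12 },
  { id: 103, customerId: "c1", total: 8 },
];

// Group each customer's orders into a single aggregate keyed by customer id.
function buildAggregates(customers, orders) {
  const aggregates = new Map();
  for (const c of customers) {
    aggregates.set(c.id, { ...c, orders: [] });
  }
  for (const o of orders) {
    const agg = aggregates.get(o.customerId);
    if (agg) agg.orders.push({ id: o.id, total: o.total });
  }
  return aggregates;
}

const aggregates = buildAggregates(customers, orders);
// One key lookup now returns the customer together with all of their orders.
console.log(JSON.stringify(aggregates.get("c1")));
// {"id":"c1","name":"Ann","orders":[{"id":101,"total":30},{"id":103,"total":8}]}
```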
Non-relational Data Models
The Key-value Model
• Data stored as key-value pairs
• The value is an opaque blob to the database
• Examples: Dynamo, Riak
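A minimal in-memory sketch of the key-value model, assuming values are stored as raw bytes (`Buffer`) that the store never inspects. The `KeyValueStore` class and its `put`/`get`/`delete` methods are hypothetical, loosely mirroring the basic operations such stores expose; interpreting the bytes (here as JSON) is entirely the client's job.

```javascript
// In-memory key-value store: the value is an opaque blob of bytes.
class KeyValueStore {
  constructor() {
    this.data = new Map(); // key (string) -> Buffer
  }
  put(key, bytes) {
    this.data.set(key, Buffer.from(bytes)); // store a copy of the bytes
  }
  get(key) {
    return this.data.get(key); // undefined if the key is missing
  }
  delete(key) {
    return this.data.delete(key); // true if the key existed
  }
}

const kv = new KeyValueStore();
// The client decides how to encode the value; the store only sees bytes.
kv.put("user:a001", Buffer.from(JSON.stringify({ username: "jsmith" })));
const user = JSON.parse(kv.get("user:a001").toString("utf8"));
console.log(user.username); // jsmith
```

Because the store cannot see inside the value, the only efficient query is a lookup by key.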
The Document-oriented Model
• Data is also stored as key-value pairs
• The value is a "document" and has further internal structure
• No strict schema
• Examples: CouchDB, MongoDB
Example
• Aggregates described in JSON using map and array data structures
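A minimal sketch of such an aggregate, using hypothetical field names, together with a naive `findByPath` helper (also hypothetical). Unlike an opaque key-value blob, the document's inner structure can be queried, and two documents need not share a schema.

```javascript
// Documents: key -> JSON value built from nested maps and arrays.
const documents = new Map([
  ["a001", {
    username: "jsmith",
    name: { first: "John", last: "Smith" },
    orders: [
      { item: "book", qty: 1 },
      { item: "pen", qty: 10 },
    ],
  }],
  ["b014", { username: "pauljones", name: { first: "Paul" } }], // no strict schema
]);

// Read a dotted path such as "name.first" from a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Return the keys of documents whose field at `path` equals `value`.
function findByPath(docs, path, value) {
  const keys = [];
  for (const [key, doc] of docs) {
    if (getPath(doc, path) === value) keys.push(key);
  }
  return keys;
}

console.log(findByPath(documents, "name.first", "Paul")); // [ 'b014' ]
```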
The Column Family Model
• Data stored in large sparse tabular structures
• Columns are grouped into column families
– Column family is a meaningful group of columns
– A similar concept to a table in a relational database
• A record can be thought of as a two-level map
• New columns can be added at any time
• Examples: BigTable, Cassandra, HBase
Column Family Example
• Row key a001
– Names: username = jsmith, firstname = John, lastname = Smith
– Contacts: phone = 5 550 001, email = …
– Messages: item1 = Message1, item2 = Message2, …, itemN = MessageN
• Row key b014
– Names: username = pauljones
– Contacts: (no columns)
– Messages: item1 = new Message
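The column family example can be modelled as a two-level map: row key → column family → column → value. A minimal sketch with hypothetical helper names (`putCell`, `getCell`) and one made-up extra message; it shows that rows are sparse and that a new column can be added to a single row without any schema change.

```javascript
// row key -> column family -> column -> value
const table = new Map();

function putCell(rowKey, family, column, value) {
  if (!table.has(rowKey)) table.set(rowKey, new Map());
  const row = table.get(rowKey);
  if (!row.has(family)) row.set(family, new Map());
  row.get(family).set(column, value);
}

function getCell(rowKey, family, column) {
  const row = table.get(rowKey);
  const fam = row && row.get(family);
  return fam ? fam.get(column) : undefined;
}

putCell("a001", "Names", "username", "jsmith");
putCell("a001", "Names", "firstname", "John");
putCell("a001", "Names", "lastname", "Smith");
putCell("a001", "Contacts", "phone", "5 550 001");
putCell("a001", "Messages", "item1", "Message1");
putCell("b014", "Names", "username", "pauljones");
putCell("b014", "Messages", "item1", "new Message");

// Sparse rows: b014 simply has no "firstname" or "Contacts" cells.
console.log(getCell("b014", "Names", "firstname")); // undefined

// New columns can be added at any time, to a single row.
putCell("b014", "Messages", "item2", "another message");
console.log(table.get("b014").get("Messages").size); // 2
```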
https://neo4j.com/developer/guide-build-a-recommendation-engine/
Graph Databases
• Data stored as nodes and edges of a graph
• The nodes and edges can have fields and values
• Aimed at graph traversal queries in connected data
• Examples: Neo4J, FlockDB
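A minimal sketch of a property graph held as an edge list, with a two-hop traversal ("what do my friends like?"), loosely in the spirit of the recommendation-engine guide referenced above. The node ids, edge types (`FRIEND`, `LIKES`) and the `neighbours`/`recommend` helpers are hypothetical; graph databases store and index such structures natively rather than scanning a list.

```javascript
// Nodes and edges both carry properties.
const nodes = new Map([
  ["u1", { label: "Person", name: "Alice" }],
  ["u2", { label: "Person", name: "Bob" }],
  ["u3", { label: "Person", name: "Carol" }],
  ["p1", { label: "Product", name: "Keyboard" }],
  ["p2", { label: "Product", name: "Monitor" }],
]);
const edges = [
  { from: "u1", to: "u2", type: "FRIEND", since: 2015 },
  { from: "u1", to: "u3", type: "FRIEND", since: 2017 },
  { from: "u2", to: "p1", type: "LIKES" },
  { from: "u3", to: "p1", type: "LIKES" },
  { from: "u3", to: "p2", type: "LIKES" },
];

// Outgoing neighbours of a node over edges of a given type.
function neighbours(nodeId, type) {
  return edges.filter(e => e.from === nodeId && e.type === type).map(e => e.to);
}

// Two-hop traversal: products liked by the friends of `personId`,
// ranked by how many friends like each one.
function recommend(personId) {
  const counts = new Map();
  for (const friend of neighbours(personId, "FRIEND")) {
    for (const product of neighbours(friend, "LIKES")) {
      counts.set(product, (counts.get(product) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, n]) => `${nodes.get(id).name} (${n})`);
}

console.log(recommend("u1")); // [ 'Keyboard (2)', 'Monitor (1)' ]
```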
Non-relational Data Models
• In non-relational data stores data is denormalized
• It is common to also store JSON documents in key-value stores
• The key-value, document-oriented and column family models are aggregate oriented models
• The other models are based on the key-value model
• In reality, the classification of databases into different models is not as straightforward
• Multi-paradigm databases also exist (e.g. ArangoDB)
Non Relational Database Examples
Riak
• Based on Amazon's Dynamo specification
• Key-value model
• Distributed decentralized persistent hash table
– Keyspace is partitioned between nodes
• Consistent hashing
– Avoid hash-to-node remapping when a node is added or removed
• Eventual Consistency using Multiversion Concurrency Control (MVCC)
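A minimal sketch of consistent hashing, assuming a ring of 32-bit hash positions with a few virtual nodes per physical node. The hash (FNV-1a), the `HashRing` class and the virtual-node count are illustrative assumptions, not Riak's actual implementation. It checks the property named above: adding a node only moves the keys that now fall on that node's positions.

```javascript
// FNV-1a 32-bit hash (illustrative choice).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class HashRing {
  constructor(vnodes = 8) {
    this.vnodes = vnodes; // virtual nodes per physical node
    this.ring = [];       // array of { pos, node }, sorted by pos
  }

  addNode(node) {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ pos: fnv1a(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.pos - b.pos);
  }

  removeNode(node) {
    this.ring = this.ring.filter(p => p.node !== node);
  }

  // A key belongs to the first ring position clockwise from its hash.
  lookup(key) {
    const h = fnv1a(key);
    // Linear scan keeps the sketch short; real code would binary-search.
    for (const p of this.ring) {
      if (p.pos >= h) return p.node;
    }
    return this.ring[0].node; // wrap around the ring
  }
}

const ring = new HashRing();
["node-a", "node-b", "node-c"].forEach(n => ring.addNode(n));

const keys = Array.from({ length: 1000 }, (_, i) => `key${i}`);
const before = new Map(keys.map(k => [k, ring.lookup(k)]));

// Only keys that now land on node-d's positions change owner.
ring.addNode("node-d");
const moved = keys.filter(k => ring.lookup(k) !== before.get(k));
console.log(`${moved.length} of ${keys.length} keys moved`);
console.log(moved.every(k => ring.lookup(k) === "node-d")); // true
```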
Riak
• RESTful query interface
– Basic PUT, GET, POST, and DELETE functions
• Links and link walking
– One way relationships between data objects
– Turns Key-Value store into a simple Graph database
• Higher level querying on-top of Key-Value structure:
– Riak search is based on Apache Solr search engine
– MapReduce in Erlang and JavaScript
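Link walking can be pictured with a minimal in-memory sketch: each stored object carries a list of one-way links `{ bucket, key, tag }`, and a walk follows the links with a given tag, one tag per step. The data, tag names and the `walk` function are hypothetical; this is not Riak's HTTP link-walking syntax.

```javascript
// "bucket/key" -> { value, links: [{ bucket, key, tag }] }
const objects = new Map([
  ["people/alice", {
    value: { name: "Alice" },
    links: [
      { bucket: "people", key: "bob", tag: "friend" },
      { bucket: "albums", key: "a1", tag: "likes" },
    ],
  }],
  ["people/bob", {
    value: { name: "Bob" },
    links: [{ bucket: "people", key: "carol", tag: "friend" }],
  }],
  ["people/carol", { value: { name: "Carol" }, links: [] }],
  ["albums/a1", { value: { title: "Blue" }, links: [] }],
]);

// Follow links with the given tags, one tag per step; links are one-way.
function walk(startId, tags) {
  let current = [startId];
  for (const tag of tags) {
    const next = [];
    for (const id of current) {
      const obj = objects.get(id);
      if (!obj) continue;
      for (const link of obj.links) {
        if (link.tag === tag) next.push(`${link.bucket}/${link.key}`);
      }
    }
    current = next;
  }
  // Skip dangling links whose target object does not exist.
  return current.filter(id => objects.has(id)).map(id => objects.get(id).value);
}

// Friends of friends of Alice.
console.log(walk("people/alice", ["friend", "friend"])); // [ { name: 'Carol' } ]
```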
MongoDB
• Document oriented (BSON)
• Query language based on JavaScript
• GridFS for BLOB file storage
• Master-slave architecture
• Linking between documents
• Supports MapReduce in JavaScript
Query example
db.inventory.find({
status: "A",
$or: [ {qty: { $lt: 30 }}, { item: /^p/ }]
})
• Matches the following SQL query:
SELECT * FROM inventory WHERE status = "A" AND ( qty < 30 OR item LIKE "p%")
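To spell out what `$or` and `$lt` match, the same filter can be written as a plain JavaScript predicate over an array of hypothetical inventory documents. This only illustrates the query's semantics, not how MongoDB evaluates it.

```javascript
const inventory = [
  { item: "journal", qty: 25, status: "A" },
  { item: "notebook", qty: 50, status: "A" },
  { item: "paper", qty: 100, status: "D" },
  { item: "planner", qty: 75, status: "A" },
  { item: "postcard", qty: 45, status: "A" },
];

// status: "A" AND (qty < 30 OR item matches /^p/)
const matches = doc => doc.status === "A" && (doc.qty < 30 || /^p/.test(doc.item));

console.log(inventory.filter(matches).map(d => d.item));
// [ 'journal', 'planner', 'postcard' ]
```

The top-level fields of the query object are implicitly combined with AND, which is why `status` sits next to `$or` rather than inside an explicit `$and`.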
CouchDB
• Document oriented (JSON)
• RESTful query interface
• Built-in web server
• Web-based query front-end (Futon)
CouchDB
• MapReduce is the primary query method (JavaScript and Erlang)
• Materialized views as results of incremental MapReduce jobs
• CouchApps: JavaScript-heavy web applications built entirely on top of CouchDB, without a separate web server for the logic layer
Query example
Documents:
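{ "id": 2, "position": "programmer", "first_name": "James", "salary": 100 }
{ "id": 7, "position": "support", "first_name": "John", "salary": 23 }
Let's extract the average salary for each unique position
Map:
function(doc) {
  if (doc.position && doc.salary) { emit(doc.position, doc.salary); }
}
Reduce (returns [sum, count] so that partial results combine correctly when they are rereduced; the average is sum / count):
function(keys, values, rereduce) {
  if (rereduce) {
    // values are [sum, count] pairs produced by earlier reduce calls
    return values.reduce(function(a, b) { return [a[0] + b[0], a[1] + b[1]]; }, [0, 0]);
  }
  // values are the salaries emitted by the map function
  return [sum(values), values.length];
}
CouchDB also provides built-in reduce functions: _sum, _count and _stats (_stats returns sum, count, min, max and sumsqr, from which the average can be derived)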
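Why the reduce has to carry (sum, count) pairs rather than partial averages can be checked with a small local simulation. This hypothetical harness mimics a view server that reduces values in chunks and then rereduces the partial results; it is not CouchDB itself, and the chunking is an assumption.

```javascript
// Reduce that returns partial averages: wrong once results are rereduced.
function naiveReduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Reduce that carries [sum, count] pairs: safe under rereduce.
function pairReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((a, b) => [a[0] + b[0], a[1] + b[1]], [0, 0]);
  }
  return [values.reduce((a, b) => a + b, 0), values.length];
}

// Reduce each chunk of values separately, then rereduce the partial results.
function reduceInChunks(reduceFn, values, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(null, values.slice(i, i + chunkSize), false));
  }
  return reduceFn(null, partials, true);
}

// Salaries emitted for one key, e.g. "programmer".
const salaries = [100, 50, 30];

console.log(reduceInChunks(naiveReduce, salaries, 2)); // 52.5, an average of averages
const [sum, count] = reduceInChunks(pairReduce, salaries, 2);
console.log(sum / count); // 60, the true average
```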
Cassandra
• Column family model
• Data model from BigTable, distribution model from Dynamo (decentralized)
• Uses Cassandra Query Language (CQL)
– SQL-like querying language
Cassandra
• Provides Availability & Partition-Tolerance from the CAP theorem
• Static and dynamic column families
• Dynamic column families as materialized views on data
• The concept of super columns
– A super column groups a set of sub-columns under one name (deprecated in later Cassandra versions)
Neo4J
• Open source NoSQL Graph Database
• Uses the Cypher Query Language for querying graph data
• Graph consists of Nodes, Edges and Attributes
Neo4J Query example
MATCH (you {name:"You"})
MATCH (expert)-[:WORKED_WITH]->
(db:Database {name:"Neo4j"})
MATCH path = shortestPath(
(you)-[:FRIEND*..5]-(expert)
)
RETURN db,expert,path
https://neo4j.com/developer/cypher-query-language/
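What the Cypher query computes can be sketched as a breadth-first search over undirected `FRIEND` edges, capped at 5 hops, from "You" to each person who `WORKED_WITH` the Neo4j database. The tiny graph and the `shortestFriendPaths` function are hypothetical; Neo4j's `shortestPath` runs natively inside the database.

```javascript
// Undirected FRIEND edges between people (hypothetical data).
const friends = [
  ["You", "Ann"], ["Ann", "Ben"], ["Ben", "Cleo"],
  ["You", "Dan"], ["Dan", "Cleo"], ["Cleo", "Eve"],
];
// People with a WORKED_WITH relationship to the Neo4j database node.
const experts = ["Cleo", "Eve", "Zed"];

const adjacency = new Map();
for (const [a, b] of friends) {
  if (!adjacency.has(a)) adjacency.set(a, []);
  if (!adjacency.has(b)) adjacency.set(b, []);
  adjacency.get(a).push(b);
  adjacency.get(b).push(a);
}

// BFS from `start` over at most `maxHops` edges; returns one shortest path
// per reachable target, similar to one result row per expert.
function shortestFriendPaths(start, targets, maxHops) {
  const parent = new Map([[start, null]]);
  let frontier = [start];
  for (let depth = 0; depth < maxHops && frontier.length > 0; depth++) {
    const next = [];
    for (const person of frontier) {
      for (const friend of adjacency.get(person) || []) {
        if (!parent.has(friend)) {
          parent.set(friend, person);
          next.push(friend);
        }
      }
    }
    frontier = next;
  }
  const paths = {};
  for (const t of targets) {
    if (!parent.has(t) || t === start) continue;
    const path = [];
    for (let p = t; p !== null; p = parent.get(p)) path.unshift(p);
    paths[t] = path;
  }
  return paths;
}

// Cleo is reached via You-Dan-Cleo, Eve via You-Dan-Cleo-Eve; Zed is unreachable.
console.log(shortestFriendPaths("You", experts, 5));
```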
Why use Non-Relational DB?
• Volume of Data
• Elasticity
• Flexible Schemas
• Cost
http://martinfowler.com/bliki/PolyglotPersistence.html
Polyglot Persistence
Conclusions
• In recent years there has been a rise in non-relational (NoSQL) data stores
• This is related to the rise of cloud computing: key-value models offer better scalability
• The NoSQL landscape is extremely varied
• The four basic non-relational data models are:
– The key-value model
– The document-oriented model
– The column family model
– Graph databases
That is All
• Next week's practice session
– Using NoSQL databases
• Exam times
– Wednesday - 01 November 2017
– Thursday - 02 November 2017
• NB! A set of example exam questions is available on the course website