NoSQL - Università degli Studi di Milano-Bicocca
NoSQL
• Definition of and reasons for NoSQL
• Document-based models
• Graph models
• Key/value models
• Wide-column models
• Comparisons
Outline
In the beginning there was…
http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
Storage!
• The relational model is very strict
– Closed-world assumption
– Value minimization
• RDBMS (the software able to manage the relational model)
– More than 35 years of R&D (security, optimization, standardization)
– ACID properties
• Very well known
• A large amount of data is still stored in DBMSs
– Porting data is a nightmare!
• For a large number of tasks it is still the best option
Positive aspects of the relational model
• ACID properties
– Atomicity
– Consistency
– Isolation
– Durability
Positive aspects of RDBMS
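Atomicity is the easiest ACID property to see in code. The following is a minimal sketch using Python's built-in sqlite3 module; the account table, the names and the transfer helper are hypothetical and not part of the slides. The credit is applied first, so when the debit violates the CHECK constraint the rollback also undoes the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails with IntegrityError if src would go below zero.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer that would overdraw the source account returns False and leaves both balances exactly as they were.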
• The relational model is very strict
– Closed-world assumption
– Value minimization
– One attribute → one value
– Not compatible with modern programming languages
– Not able to support loops in data (see later)
• RDBMS (the software able to manage the relational model)
– Hard to modify tables
– Not scalable
Limitations of the relational model
Application development
Relational Database
Object Relational Mapping
Application
Code XML Config DB Schema
©Massimo Brignoli, MongoDB
And Even Harder To Iterate
[Figure: a table with columns Name, Pet, Phone, Email; 3 months later the schema has grown with new tables and new columns]
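• RDBMSs scale up easily, but scale out less well
– Scale up: add resources (CPU, RAM, storage) to a single server
– Scale out: distribute data and load across several servers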
Performance
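Scaling out usually means splitting the data across servers. A minimal sketch of hash-based sharding, assuming each shard is just an in-memory dictionary standing in for a server; shard_for and ShardedStore are hypothetical names, not taken from any product.

```python
import hashlib


def shard_for(key: str, n_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


class ShardedStore:
    """Each dict plays the role of one server in a scaled-out deployment."""

    def __init__(self, n_shards: int):
        self.shards = [dict() for _ in range(n_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
```

Plain modulo placement moves most keys when the number of shards changes, which is one reason systems such as Dynamo use consistent hashing instead.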
• Up to a limit…
Performance
• MultiValue databases are developed at TRW in 1965.
• DBM is released by AT&T in 1979.
• Lotus Domino released in 1989.
• Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
• Graph database Neo4j is started in 2000.
• Google BigTable is started in 2004. Paper published in 2006.
• CouchDB is started in 2005.
• The research paper on Amazon Dynamo is released in 2007.
• The document database MongoDB is started in 2007 as part of an open source cloud computing stack, with the first standalone release in 2009.
• Facebook open sources the Cassandra project in 2008.
• Project Voldemort started in 2008.
• The term NoSQL was reintroduced in early 2009.
History
• Not only SQL
• A set of data representation models and the related management software
• Schema free (or schemaless)
• CAP theorem
• BASE
– Basically Available, Soft state, Eventual consistency
NoSQL
• In the relational model, usually
– First define the model (the set of attributes that describe the data and their relations)
– Then populate the data
• If there is a need to add a new attribute (or change an existing one)
– First modify the model, then (if possible) change the data
• In most NoSQL models there is no strict model; it is based on the data you insert (see later)
• All NoSQL models adopt the open-world assumption
Schema free
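The difference between the two workflows can be sketched in a few lines. This is an illustrative example only: the person table, the sample names and the people list are hypothetical, with sqlite3 standing in for an RDBMS and a list of dictionaries standing in for a schema-free collection.

```python
import sqlite3

# Schema first: the model must exist before the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")
conn.execute("INSERT INTO person VALUES (?, ?)", ("Paul", "555-0100"))

try:
    # The attribute "email" is not in the model yet, so this insert fails.
    conn.execute(
        "INSERT INTO person (name, email) VALUES (?, ?)",
        ("Anna", "anna@example.com"),
    )
    insert_without_schema_change = True
except sqlite3.OperationalError:
    insert_without_schema_change = False

# First modify the model, then insert the data.
conn.execute("ALTER TABLE person ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO person (name, email) VALUES (?, ?)",
    ("Anna", "anna@example.com"),
)
rows = conn.execute("SELECT name, phone, email FROM person ORDER BY name").fetchall()

# Schema free: the "model" is whatever the inserted documents contain.
people = []
people.append({"name": "Paul", "phone": "555-0100"})
people.append({"name": "Anna", "email": "anna@example.com"})
attributes = sorted({key for doc in people for key in doc})
```

In the relational case the missing values show up as NULLs; in the schema-free case the attribute is simply absent from the document.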
CAP theorem
• RDBMSs are basically CA
– In some cases it is possible to have a CAP-based RDBMS
• NoSQL systems are mainly CP or AP
– CP: data are consistent, but the DBMS cannot be available 24/7
– AP: data may sometimes be inconsistent
CAP theorem
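The CP/AP trade-off can be illustrated with a toy model. The sketch below assumes only two replicas and a manually toggled partition flag; TwoReplicaStore and Unavailable are hypothetical names, and real systems rely on quorums and failure detection rather than a boolean.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a request during a partition."""


class TwoReplicaStore:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # True when the replicas cannot talk

    def write(self, replica: int, key, value) -> None:
        if not self.partitioned:
            # Normal operation: the write reaches both replicas.
            for data in self.replicas:
                data[key] = value
        elif self.mode == "CP":
            # Stay consistent by giving up availability.
            raise Unavailable("cannot reach the other replica")
        else:
            # Stay available by accepting a write only one replica sees.
            self.replicas[replica][key] = value

    def read(self, replica: int, key):
        return self.replicas[replica].get(key)
```

During a partition the CP store rejects the write and both replicas keep agreeing, while the AP store accepts it and the two replicas return different values until they are reconciled.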
https://www.mysoftkey.com/architecture/understanding-of-cap-theorem/
• Basic Availability: requests are fulfilled, even with partial consistency.
• Soft State: the consistency requirements of the ACID model are abandoned pretty much completely.
• Eventual Consistency: at some point in the future, data will converge to a consistent state; delayed consistency, as opposed to the immediate consistency of the ACID properties.
– purely a liveness guarantee (reads eventually return the requested value); but
– does not make safety guarantees, i.e.,
– an eventually consistent system can return any value before it converges
BASE principle
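Eventual consistency can be sketched with two replicas that exchange their state from time to time. This is a simplified illustration, assuming caller-supplied version numbers and a last-write-wins rule; Replica and anti_entropy are hypothetical names, and production systems typically use vector clocks or similar mechanisms.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version: int) -> None:
        """Keep the value only if it is newer than what is stored."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange state in both directions; the highest version wins."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], version=1)
stale_read = r2.read("cart")  # r2 has not seen the write yet
r2.write("cart", ["pen"], version=2)
anti_entropy(r1, r2)
```

Before the exchange a read on r2 returns a stale value (no safety guarantee); after it both replicas return the same value (liveness).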
• Key-Value Stores
• Column Family Stores
• Document Databases
• Graph Databases
• RDF databases as well as Tuple stores
NoSQL models
• Dynamo, Voldemort, Redis, Riak...
– DeCandia et al. "Dynamo: Amazon’s Highly Available Key-value Store", 2007
• Key-value stores are hash tables in which the key points to a particular value
• The key-value mapping is supported by hashing mechanisms to maximize performance
Key value
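To make the hashing mechanism concrete, here is a minimal sketch of a key-value store built as a hash table with separate chaining. TinyKV is a hypothetical name; it shows the idea only and does not reflect how Dynamo, Redis or the other systems are implemented.

```python
class TinyKV:
    def __init__(self, n_buckets: int = 8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash of the key selects the bucket, so a lookup only
        # scans the few entries that share that bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the old value
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def delete(self, key) -> None:
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]


kv = TinyKV()
kv.put("session:42", {"user": "paul"})
kv.put("session:42", {"user": "paul", "cart": 3})
```

The value is opaque to the store: it can only be fetched, replaced or deleted through its key.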
• BigTable, Cassandra, HBase,...
– Chang et al. "Bigtable: A Distributed Storage System for Structured Data", 2006
• The key points to multiple columns
Wide column
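A row key pointing to multiple columns can be sketched as nested dictionaries. The example below is an assumption-laden illustration: WideColumnTable, the column families "info" and "cars" and the sample rows are hypothetical, and real wide-column stores add timestamps, sorted storage and distribution.

```python
class WideColumnTable:
    def __init__(self, families):
        self.families = tuple(families)
        self.rows = {}  # row key -> {family -> {column -> value}}

    def put(self, row_key, family, column, value) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {f: {} for f in self.families})
        row[family][column] = value

    def get(self, row_key, family, column, default=None):
        return self.rows.get(row_key, {}).get(family, {}).get(column, default)

    def columns(self, row_key, family):
        """Column names present for this row; rows may differ freely."""
        return sorted(self.rows.get(row_key, {}).get(family, {}))


table = WideColumnTable(["info", "cars"])
table.put("paul", "info", "city", "London")
table.put("paul", "cars", "Bentley", 1973)
table.put("paul", "cars", "Rolls Royce", 1965)
table.put("anna", "info", "city", "Milan")
```

Column families are fixed up front, while the columns inside a family can differ from row to row.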
• CouchDB, MongoDB,...
• Documents are addressed in the DB through a unique key
• Search within documents
Document Based
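Both ideas, addressing by unique key and searching inside documents, fit in a short sketch. Collection is a hypothetical in-memory class and is not the CouchDB or MongoDB API; the second document is invented for the example.

```python
import uuid


class Collection:
    def __init__(self):
        self.docs = {}  # unique key -> document

    def insert(self, doc: dict) -> str:
        """Store a document and return the unique key that addresses it."""
        doc_id = str(uuid.uuid4())
        self.docs[doc_id] = dict(doc)
        return doc_id

    def get(self, doc_id: str):
        return self.docs.get(doc_id)

    def find(self, predicate):
        """Search inside the documents, including nested fields."""
        return [doc for doc in self.docs.values() if predicate(doc)]


people = Collection()
paul_id = people.insert({
    "first_name": "Paul",
    "surname": "Miller",
    "city": "London",
    "cars": [{"model": "Bentley", "year": 1973}],
})
people.insert({"first_name": "Anna", "city": "Milan"})

bentley_owners = people.find(
    lambda doc: any(car["model"] == "Bentley" for car in doc.get("cars", []))
)
```

Unlike a key-value store, the content of the document is visible to the database, so queries can look at any field.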
• Neo4j, FlockDB, GraphBase, InfoGrip, ...
• Graph databases are built from nodes and relationships between nodes (edges).
• Nodes have properties
– Nodes represent entities (e.g. "Bob" or "Alice").
– Properties are information pertaining to nodes (e.g. age: 18).
• Graph DBs do not scale well
Graph store
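A property graph can be sketched with two dictionaries, one for nodes and one for outgoing edges. The Graph class, the KNOWS relation, the node "Carol" and the ages other than 18 are hypothetical additions for illustration; this is not the Neo4j API.

```python
from collections import deque


class Graph:
    def __init__(self):
        self.nodes = {}  # node name -> properties
        self.edges = {}  # node name -> list of (relation, target)

    def add_node(self, name, **properties) -> None:
        self.nodes[name] = properties
        self.edges.setdefault(name, [])

    def add_edge(self, source, relation, target) -> None:
        self.edges[source].append((relation, target))

    def neighbors(self, name, relation=None):
        return [t for r, t in self.edges[name] if relation is None or r == relation]

    def within(self, start, max_hops: int) -> dict:
        """Breadth-first traversal: nodes reachable in at most max_hops."""
        hops = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if hops[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for target in self.neighbors(node):
                if target not in hops:
                    hops[target] = hops[node] + 1
                    queue.append(target)
        del hops[start]
        return hops


g = Graph()
g.add_node("Bob", age=18)
g.add_node("Alice", age=21)
g.add_node("Carol", age=30)
g.add_edge("Bob", "KNOWS", "Alice")
g.add_edge("Alice", "KNOWS", "Carol")
```

Traversals such as "who is within two hops of Bob" follow edges directly instead of joining tables.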
Comparison
Volume and complexity
• In all data models, connecting data is a key issue with a great impact on performance and analysis
Data is singular or plural?
Data Model
Relational Document based
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Relations are included in data
ID  Name  Surname  DateofBirth
1   Tom   Hanks    …
Model Comparison
Id  Title              Director
1   The Da Vinci Code  2
2   The Green Mile     3
3   That thing you do  1
..
Movie  Actor
1      1
2      1
3      1
{ "Name": "Tom", "Surname": "Hanks", "Works_on": [
  {"Title": "The Da Vinci Code", "role": "Actor"},
  {"Title": "That thing you do", "role": ["Actor", "Director"]}
]}
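The mapping between the two models can be sketched by assembling the document from the relational rows. The dictionaries below mirror the tables above; person_document is a hypothetical helper, and for simplicity it always stores the role as a list, whereas the slide uses a plain string when there is a single role.

```python
# Relational side: one structure per table, linked by ids.
people = {1: ("Tom", "Hanks")}
movies = {  # id -> (title, director id)
    1: ("The Da Vinci Code", 2),
    2: ("The Green Mile", 3),
    3: ("That thing you do", 1),
}
movie_actor = [(1, 1), (2, 1), (3, 1)]  # (movie id, actor id)


def person_document(person_id: int) -> dict:
    """Resolve the joins once and embed the relations in the document."""
    name, surname = people[person_id]
    roles = {}  # title -> list of roles
    for movie_id, actor_id in movie_actor:
        if actor_id == person_id:
            roles.setdefault(movies[movie_id][0], []).append("Actor")
    for title, director_id in movies.values():
        if director_id == person_id:
            roles.setdefault(title, []).append("Director")
    return {
        "Name": name,
        "Surname": surname,
        "Works_on": [{"Title": t, "role": r} for t, r in roles.items()],
    }


doc = person_document(1)
```

Reading the document afterwards needs no join, at the cost of repeating movie titles in every person document that refers to them.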