(c) Luca Garulli. Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.
www.orientechnologies.com
Luca Garulli, Founder and CEO @ NuvolaBase Ltd
Author of OrientDB Doc/Graph DB
Oct 6th 2012 in Barcelona
Switching from the Relational to the Graph model
One of the main reasons RDBMS users resist moving to a NoSQL product
is the complexity of the model:
OK, NoSQL products are great for Big Data and big scale
but...
...but what about the model?
What is the NoSQL answer to managing complex domains?
Key-Value stores
Column-Based stores
Document databases
Graph databases
CAUTION! This presentation will not use a
social-like domain with the classic paradigm of
(friend-of-friend)^N,
where graph databases are already widely used...
...but rather we will explore how to think «graphically» with one of the
most common domains in the enterprise world:
the good old CRM* domain
* today an RDBMS is used in 99% of cases
Every developer knows the Relational Model, but who knows the
Graph one?
Back to school: Graph Theory crash course
Basic Graph
[Diagram: vertex «Luca» --Likes--> vertex «NoSQL Matters conference»]
Property Graph Model*
[Diagram: vertex «Luca» (name: Luca, surname: Garulli, company: NuvolaBase) --Likes (since: 2012)--> vertex «NoSQL Matters conference» (editions: [Cologne, Barcelona])]
Vertices and Edges can have properties
Edges are directed
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
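A minimal in-memory sketch of the property graph model, in Java since the deck's own examples use it. The `PropertyGraph` class and its `Vertex`/`Edge` types are hypothetical, not part of Blueprints or OrientDB: vertices and edges both carry a property map, and every edge is directed, with exactly one `out` vertex and one `in` vertex.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a property graph: properties on vertices AND edges,
// and every edge is directed (it starts at `out` and ends at `in`).
class PropertyGraph {
    static class Vertex {
        final Map<String, Object> props = new HashMap<>();
        final List<Edge> outEdges = new ArrayList<>(); // edges leaving this vertex
        final List<Edge> inEdges = new ArrayList<>();  // edges arriving at this vertex
    }

    static class Edge {
        final String label;
        final Vertex out; // where the edge starts
        final Vertex in;  // where the edge ends
        final Map<String, Object> props = new HashMap<>();

        Edge(String label, Vertex out, Vertex in) {
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    final List<Vertex> vertices = new ArrayList<>();

    Vertex addVertex(Map<String, Object> props) {
        Vertex v = new Vertex();
        v.props.putAll(props);
        vertices.add(v);
        return v;
    }

    // Registers the edge on both endpoints so it can be followed in either direction.
    Edge addEdge(String label, Vertex from, Vertex to, Map<String, Object> props) {
        Edge e = new Edge(label, from, to);
        e.props.putAll(props);
        from.outEdges.add(e);
        to.inEdges.add(e);
        return e;
    }

    public static void main(String[] args) {
        PropertyGraph g = new PropertyGraph();
        Vertex luca = g.addVertex(Map.of("name", "Luca", "surname", "Garulli", "company", "NuvolaBase"));
        Vertex conf = g.addVertex(Map.of("name", "NoSQL Matters", "editions", List.of("Cologne", "Barcelona")));
        Edge likes = g.addEdge("Likes", luca, conf, Map.of("since", 2012));
        // prints: Luca -Likes-> NoSQL Matters since 2012
        System.out.println(likes.out.props.get("name") + " -" + likes.label + "-> "
                + likes.in.props.get("name") + " since " + likes.props.get("since"));
    }
}
```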
Property Graph Model
[Diagram: vertex «Luca» with two edges to vertex «NoSQL Matters conference»: Likes (since: 2012) and Speaks (title: «Switching...», abstract: «This talk presents...»)]
An Edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships
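To make the 1-N and N-M point concrete, here is a hypothetical sketch (class and method names are illustrative, not an OrientDB API). Each `link` call creates exactly one edge, so Luca's two addresses are two `Lives` edges, and two customers sharing Rome are simply two edges pointing at the same vertex.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one edge always joins exactly two vertices, so 1-N and
// N-M relationships are just several edges carrying the same label.
class MultiEdgeDemo {
    static class Vertex {
        final String name;
        final List<Edge> out = new ArrayList<>();
        final List<Edge> in = new ArrayList<>();
        Vertex(String name) { this.name = name; }
    }

    static class Edge {
        final String label;
        final Vertex from, to;
        Edge(String label, Vertex from, Vertex to) { this.label = label; this.from = from; this.to = to; }
    }

    // Creates one edge, i.e. exactly one connection between two vertices.
    static void link(String label, Vertex from, Vertex to) {
        Edge e = new Edge(label, from, to);
        from.out.add(e);
        to.in.add(e);
    }

    // Names of vertices reached by following outgoing edges with the given label.
    static List<String> outgoing(Vertex v, String label) {
        List<String> names = new ArrayList<>();
        for (Edge e : v.out) if (e.label.equals(label)) names.add(e.to.name);
        return names;
    }

    // Names of vertices pointing at v through edges with the given label.
    static List<String> incoming(Vertex v, String label) {
        List<String> names = new ArrayList<>();
        for (Edge e : v.in) if (e.label.equals(label)) names.add(e.from.name);
        return names;
    }

    public static void main(String[] args) {
        Vertex luca = new Vertex("Luca"), sylvia = new Vertex("Sylvia");
        Vertex rome = new Vertex("Rome"), london = new Vertex("London");
        link("Lives", luca, rome);    // 1-N: Luca has two addresses...
        link("Lives", luca, london);
        link("Lives", sylvia, rome);  // ...and Rome has two residents: N-M overall
        System.out.println(outgoing(luca, "Lives")); // [Rome, London]
        System.out.println(incoming(rome, "Lives")); // [Luca, Sylvia]
    }
}
```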
Property Graph Model
[Diagram: vertices «Luca», «Katja», «Pere» and «NoSQL Matters conference», connected by the edges Likes, Organizes, Joins and FriendOf (twice)]
Congratulations, this is your diploma in «Graph Theory»
Now let's go back to our domain:
the CRM
Domain: minimal CRM
[Diagram: the entities Customer, Address, Order and Stock, grouped into a Registry system and an Order system]
How does a Relational DBMS
manage relationships?
Relational World: 1-1 Relationships
JOIN Customer.Address -> Address.Id
Customer
Id | Name   | Address
10 | Luca   | 34
11 | Katja  | 44
34 | Sylvia | 54
56 | Mark   | 66
88 | Steve  | 68
Address
Id | Location
34 | Rome, London
44 | Cologne
54 | Rome
66 | New Mexico
68 | Palo Alto
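For comparison with the graph model, a rough Java version of this 1-1 JOIN, assuming Address.Id is indexed. The `TreeMap` stands in for the index (it is a balanced tree); `OneToOneJoin` and its record are hypothetical names. Each customer row costs one index lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of JOIN Customer.Address -> Address.Id,
// with a TreeMap (balanced tree) playing the role of the index on Address.Id.
class OneToOneJoin {
    record Customer(int id, String name, int addressId) {}

    static List<String> join(List<Customer> customers, Map<Integer, String> addressById) {
        List<String> rows = new ArrayList<>();
        for (Customer c : customers) {
            String location = addressById.get(c.addressId()); // one index lookup per JOINed row
            rows.add(c.name() + " -> " + location);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer(10, "Luca", 34), new Customer(11, "Katja", 44),
            new Customer(34, "Sylvia", 54), new Customer(56, "Mark", 66),
            new Customer(88, "Steve", 68));
        Map<Integer, String> addressById = new TreeMap<>(Map.of(
            34, "Rome, London", 44, "Cologne", 54, "Rome", 66, "New Mexico", 68, "Palo Alto"));
        join(customers, addressById).forEach(System.out::println);
    }
}
```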
Relational World: 1-N Relationships
Inverse JOIN Address.Customer -> Customer.Id
Customer
Id | Name
10 | Luca
11 | Katja
34 | Sylvia
56 | Mark
88 | Steve
Address
Id | Customer | Location
24 | 10       | Rome
33 | 10       | London
44 | 34       | Rome
66 | 11       | Cologne
68 | 88       | Palo Alto
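The inverse 1-N JOIN needs an index on Address.Customer, which is non-unique: one customer id maps to several rows. The sketch below models that index as a `TreeMap` from customer id to locations. Names are hypothetical, and a real RDBMS keeps such an index up to date on every write rather than building it on demand as done here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the inverse JOIN Address.Customer -> Customer.Id,
// using a non-unique "index" from customer id to that customer's locations.
class OneToManyJoin {
    record Address(int id, int customerId, String location) {}

    // Touches every Address row once: the work an RDBMS pays (and keeps paying
    // on inserts, updates and deletes) to maintain an index on Address.Customer.
    static Map<Integer, List<String>> indexByCustomer(List<Address> addresses) {
        Map<Integer, List<String>> index = new TreeMap<>();
        for (Address a : addresses) {
            index.computeIfAbsent(a.customerId(), k -> new ArrayList<>()).add(a.location());
        }
        return index;
    }

    public static void main(String[] args) {
        List<Address> addresses = List.of(
            new Address(24, 10, "Rome"), new Address(33, 10, "London"),
            new Address(44, 34, "Rome"), new Address(66, 11, "Cologne"),
            new Address(68, 88, "Palo Alto"));
        Map<Integer, List<String>> byCustomer = indexByCustomer(addresses);
        // One lookup per customer; Mark (56) has no Address rows.
        System.out.println(byCustomer.getOrDefault(10, List.of())); // [Rome, London]
        System.out.println(byCustomer.getOrDefault(56, List.of())); // []
    }
}
```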
Relational World: N-M Relationships
Additional table with 2 JOINs: (1) CustomerAddress.Id -> Customer.Id and (2) CustomerAddress.Address -> Address.Id
Customer
Id | Name
10 | Luca
11 | Katja
34 | Sylvia
56 | Mark
88 | Steve
Address
Id | Location
24 | Rome
33 | London
44 | Rome
66 | Cologne
68 | Palo Alto
CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 24
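A sketch of the N-M case under the same assumptions: Customer.Id and Address.Id are both indexed (again `TreeMap`s, with hypothetical names), so every row of CustomerAddress costs two lookups, one per JOIN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the two JOINs through the CustomerAddress link table:
// (1) CustomerAddress.Id -> Customer.Id and (2) CustomerAddress.Address -> Address.Id.
class ManyToManyJoin {
    record Link(int customerId, int addressId) {}

    static List<String> join(List<Link> links, Map<Integer, String> customerById,
                             Map<Integer, String> addressById) {
        List<String> rows = new ArrayList<>();
        for (Link l : links) {
            String name = customerById.get(l.customerId());   // JOIN (1): one index lookup
            String location = addressById.get(l.addressId()); // JOIN (2): another lookup
            rows.add(name + " lives in " + location);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = new TreeMap<>(Map.of(
            10, "Luca", 11, "Katja", 34, "Sylvia", 56, "Mark", 88, "Steve"));
        Map<Integer, String> addresses = new TreeMap<>(Map.of(
            24, "Rome", 33, "London", 44, "Rome", 66, "Cologne", 68, "Palo Alto"));
        List<Link> customerAddress = List.of(new Link(10, 24), new Link(10, 33), new Link(34, 24));
        join(customerAddress, customers, addresses).forEach(System.out::println);
    }
}
```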
What’s wrong with the Relational Model?
The JOIN is evil!
[The Customer, Address and CustomerAddress tables of the N-M example, with every link between them marked:]
These are all JOINs executed every time you traverse a relationship!
A JOIN means searching for a key in another table
The first rule to improve performance is to index all the keys
An index speeds up searches but slows down inserts, updates and deletes
So in the best case a JOIN is a lookup in an index
This is done for every single JOIN!
If you traverse hundreds of relationships, you’re executing hundreds of JOINs
Index lookup: is it really that fast?
Index Lookup: how does it work?
Think of an Address Book where we have to find Luca’s phone number.
Index algorithms are all similar and based on balanced trees:
A-Z splits into A-L and M-Z
A-L splits into A-D and E-L; M-Z splits into M-R and S-Z
A-D splits into A-B and C-D; E-L splits into E-G and H-L
E-G splits into E-F and G; H-L splits into H-J and K-L
Searching «Luca»: A-Z -> A-L -> E-L -> H-L -> K-L -> Luca
Found! Each lookup takes X steps, where X grows with the index size!
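The address-book search can be approximated by a binary search that counts its steps. This is a simplification, assuming one comparison per tree level as in the slides; real database indexes usually use B-trees with many keys per node, which need fewer steps, but the count still grows with the index size. `IndexLookupSteps` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch of the address-book lookup: each step halves the remaining
// range (A-Z, then A-L, then E-L, ...), like descending a balanced binary tree.
class IndexLookupSteps {
    // Number of comparisons binary search needs to find `key` (or to give up).
    static int lookupSteps(int[] sortedKeys, int key) {
        int lo = 0, hi = sortedKeys.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] == key) return steps;
            if (sortedKeys[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            int[] keys = new int[n];
            Arrays.setAll(keys, i -> i); // keys 0..n-1, already sorted
            int worst = 0;
            for (int k = 0; k < n; k++) worst = Math.max(worst, lookupSteps(keys, k));
            System.out.println(n + " entries -> up to " + worst + " steps per lookup");
        }
    }
}
```

With keys 0..n-1 the worst case is floor(log2 n) + 1 steps, so going from a thousand to a million entries roughly doubles the steps per lookup (10 to 20).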
An index lookup is executed for each JOIN
Querying more tables can easily produce millions of JOINs/lookups!
Here is the rule: more entries = more lookup steps = slower JOIN
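A back-of-the-envelope version of this rule, assuming a balanced binary tree with N entries as in the address-book slides and J JOINs that each do one lookup (B-trees have a higher fan-out, which lowers the constant but keeps the logarithm):

```latex
\begin{aligned}
\text{steps per lookup} &\approx \lceil \log_2 (N+1) \rceil \\
\text{cost of } J \text{ JOINs} &\approx J \cdot \lceil \log_2 (N+1) \rceil \\
N = 10^{6} &: \lceil \log_2 (10^{6}+1) \rceil = 20, \quad J = 10^{6} \Rightarrow \approx 2 \times 10^{7} \text{ steps} \\
N = 10^{9} &: \lceil \log_2 (10^{9}+1) \rceil = 30
\end{aligned}
```

In this model a thousandfold larger table adds 10 steps to every JOIN (20 to 30), and the total grows linearly with the number of JOINs.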
Is there a better way to manage relationships?
“A graph database is any storage system that provides
index-free adjacency”
- Marko Rodriguez
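A minimal sketch of what index-free adjacency means, with hypothetical names: each vertex keeps direct references to its neighbours, so a hop is a field read plus a list access, with no index search, however many other vertices exist. This is a toy in-memory model, not how OrientDB actually stores records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of index-free adjacency: neighbours are stored as
// direct references, so following a relationship never consults an index.
class IndexFreeAdjacency {
    static class Vertex {
        final String name;
        final List<Vertex> neighbours = new ArrayList<>();
        Vertex(String name) { this.name = name; }
    }

    // Follows up to `hops` relationships, always taking the first neighbour.
    // Each hop is O(1), independent of the total number of vertices.
    static Vertex walk(Vertex start, int hops) {
        Vertex current = start;
        for (int i = 0; i < hops && !current.neighbours.isEmpty(); i++) {
            current = current.neighbours.get(0);
        }
        return current;
    }

    public static void main(String[] args) {
        // A chain v0 -> v1 -> ... -> v999.
        List<Vertex> all = new ArrayList<>();
        for (int i = 0; i < 1000; i++) all.add(new Vertex("v" + i));
        for (int i = 0; i + 1 < all.size(); i++) all.get(i).neighbours.add(all.get(i + 1));
        System.out.println(walk(all.get(0), 3).name); // v3
    }
}
```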
How does a GraphDB manage index-free relationships?
OrientDB: an Open Source (Apache 2) document-graph NoSQL DBMS
supports transactions, extended SQL, Multi-Master replication, etc.
OrientDB: traverse a relationship
[Diagram: Luca --Lives--> Rome]
Vertex Luca, RID = #13:35: label: ‘Customer’, name: ‘Luca’, out: [#14:54]
Edge Lives, RID = #14:54: label: ‘Lives’, out: [#13:35], in: [#13:100]
Vertex Rome, RID = #13:100: label: ‘Address’, name: ‘Rome’, in: [#14:54]
The Record ID (RID) is a physical position
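OrientDB RIDs have the form #cluster-id:position. The toy resolver below assumes each cluster is an array-like list, so loading a RID costs two integer parses and one indexed access. It is a hypothetical illustration of why following a stored link costs near O(1), not OrientDB's actual storage engine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified RID resolution: "#c:p" means "cluster c, slot p",
// and each cluster is modelled as an array-like list of records.
class RidResolver {
    private final List<List<String>> clusters = new ArrayList<>();

    // Appends a record to the given cluster and returns its RID.
    String store(int clusterId, String record) {
        while (clusters.size() <= clusterId) clusters.add(new ArrayList<>());
        List<String> cluster = clusters.get(clusterId);
        cluster.add(record);
        return "#" + clusterId + ":" + (cluster.size() - 1);
    }

    // Parses "#c:p" and jumps straight to that slot: no index search involved.
    String load(String rid) {
        String[] parts = rid.substring(1).split(":");
        int clusterId = Integer.parseInt(parts[0]);
        int position = Integer.parseInt(parts[1]);
        return clusters.get(clusterId).get(position);
    }

    public static void main(String[] args) {
        RidResolver db = new RidResolver();
        String luca = db.store(13, "Customer Luca");
        String rome = db.store(13, "Address Rome");
        System.out.println(luca + " " + rome); // #13:0 #13:1
        System.out.println(db.load(rome));     // Address Rome
    }
}
```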
A GraphDB handles relationships as a physical LINK to the record,
assigned when the edge is created.
On the other side,
an RDBMS computes the relationship every time you query the database.
Isn't that crazy?!
This means jumping from an O(log N) algorithm to a near O(1) one:
the traversal cost is no longer affected by the database size!
This is huge in the Big Data age
OrientDB in the Blueprints micro-benchmark, on common hardware, with a hot cache,
traverses 29.6 million records in less than 5 seconds:
about 6 million nodes traversed per second!
Do not try this at home with an RDBMS*!
*unless you live in Google’s server farm
Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.2.0-SNAPSHOT (www.orientdb.org) Type 'help' to display all the commands supported.
orientdb> create vertex V set name = 'Luca', label = 'Customer'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex V set name = 'Rome', label = 'Address'
Created vertex #13:100 in 0.02 secs
orientdb> create edge E from #13:35 to #13:100 set label = 'Lives'
Created edge #14:54 in 0.02 secs
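Create the graph in Java
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
graph.open("admin", "admin"); // assumes an existing database with the default admin user
try {
  ODocument luca = graph.createVertex();
  luca.field("name", "Luca").field("label", "Customer");
  ODocument rome = graph.createVertex();
  rome.field("name", "Rome").field("label", "Address");
  ODocument edge = graph.createEdge(luca, rome).field("label", "Lives");
  edge.save();
} finally {
  graph.close(); // always release the database
}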
Query the graph in SQL
orientdb> select in[label='Lives'].out from V where label = 'Address' and name = 'Rome'
---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|   13:35|Luca                |[#14:54]            |                    |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
orientdb> select * from V where label = 'Address' AND in[label='Lives'].size() > 0
---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|  13:100|Rome                |                    |[#14:54]            |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
Query the graph in Java
import java.util.List;
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.OCommandSQL;

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
graph.open("admin", "admin"); // assumes the default admin user
try {
  // GET ALL THE CUSTOMERS FROM ROME, ITALY
  List<ODocument> result = graph.command(new OCommandSQL(
      "select in[label='Lives'].out from V where label = 'Address' and name = ?")).execute("Rome");
  for (ODocument v : result) {
    System.out.println("Result: " + v.field("label"));
  }
} finally {
  graph.close();
}
Output:
Result: Luca
Query vs traversal
Once you have a well-connected database in the form of a Super Graph, you can cross records instead of querying them!
All you need is some root vertices from which to start traversing
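Traversal from a root vertex can be sketched as a breadth-first walk over stored links. The vertex names below come from the «Query vs traversal» diagram, but which vertex links to which is an assumption made for illustration, and `RootTraversal` is a hypothetical class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "cross records instead of querying them": start at one
// root vertex and follow the stored links breadth-first.
class RootTraversal {
    static class Vertex {
        final String name;
        final List<Vertex> out = new ArrayList<>();
        Vertex(String name) { this.name = name; }
        Vertex linkTo(Vertex... targets) {
            for (Vertex t : targets) out.add(t);
            return this;
        }
    }

    // Visits every vertex reachable from root exactly once, in breadth-first order.
    static List<String> traverse(Vertex root) {
        List<String> order = new ArrayList<>();
        Set<Vertex> seen = new HashSet<>(); // identity semantics: Vertex does not override equals
        Deque<Vertex> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            Vertex v = queue.poll();
            order.add(v.name);
            for (Vertex next : v.out) {
                if (seen.add(next)) queue.add(next); // enqueue only on first encounter
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Vertex whiteSoap = new Vertex("WhiteSoap");
        Vertex order2332 = new Vertex("Order2332").linkTo(whiteSoap);
        Vertex order8834 = new Vertex("Order8834").linkTo(whiteSoap);
        Vertex customers = new Vertex("Customers").linkTo(
            new Vertex("Luca").linkTo(order2332),
            new Vertex("John"),
            new Vertex("Sylvia").linkTo(order8834));
        // [Customers, Luca, John, Sylvia, Order2332, Order8834, WhiteSoap]
        System.out.println(traverse(customers));
    }
}
```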
Query vs traversal
[Diagram: vertex «Customers» linked to the customers Luca, John and Sylvia; orders Order2332 and Order8834; the product WhiteSoap; also shown: Stocks, SpecialCustomers]
Annotation on the diagram: «This is a root vertex»
Query the graph in SQL
Supposing that the root node #30:0 links all the Customer vertices:
Get all the customers:
orientdb> select out.in from #30:0
Get all the customers who bought at least one ‘White Soap’ product:
orientdb> select * from ( select out.in from #30:0 ) where out.in.out[label='Bought'].in.name = 'White Soap'
[Diagram: root vertex Customers, RID #30:0]
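The second query can also be read as a plain traversal. The hypothetical sketch below walks root -> customer -> order -(Bought)-> product and keeps the customers with at least one matching product, mirroring the `out.in.out[label='Bought'].in.name` path. The edge labels other than `Bought`, and the assumption that orders sit between customers and products, are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical traversal equivalent of
//   select * from ( select out.in from #30:0 )
//   where out.in.out[label='Bought'].in.name = 'White Soap'
// Assumed path shape: root -> customer -> order -(Bought)-> product.
class WhiteSoapBuyers {
    static class Edge {
        final String label;
        final Vertex in; // the vertex this edge points to
        Edge(String label, Vertex in) { this.label = label; this.in = in; }
    }

    static class Vertex {
        final String name;
        final List<Edge> out = new ArrayList<>();
        Vertex(String name) { this.name = name; }
        Vertex link(String label, Vertex to) { out.add(new Edge(label, to)); return this; }
    }

    // Customers linked from root that have at least one order containing `product`.
    static List<String> buyersOf(Vertex root, String product) {
        List<String> buyers = new ArrayList<>();
        for (Edge toCustomer : root.out) {               // select out.in from #30:0
            Vertex customer = toCustomer.in;
            if (boughtProduct(customer, product)) buyers.add(customer.name);
        }
        return buyers;
    }

    private static boolean boughtProduct(Vertex customer, String product) {
        for (Edge toOrder : customer.out) {              // out.in: the customer's orders
            for (Edge item : toOrder.in.out) {           // .out[label='Bought']
                if (item.label.equals("Bought") && item.in.name.equals(product)) {
                    return true;                         // .in.name = 'White Soap'
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Vertex whiteSoap = new Vertex("White Soap");
        Vertex order2332 = new Vertex("Order2332").link("Bought", whiteSoap);
        Vertex order8834 = new Vertex("Order8834").link("Bought", whiteSoap);
        Vertex root = new Vertex("Customers")            // plays the role of #30:0
            .link("Customer", new Vertex("Luca").link("Ordered", order2332))
            .link("Customer", new Vertex("John"))
            .link("Customer", new Vertex("Sylvia").link("Ordered", order8834));
        System.out.println(buyersOf(root, "White Soap")); // [Luca, Sylvia]
    }
}
```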
Demo time!
NuvolaBase.com
The first Graph Database
on the Cloud
always available, a few seconds to set it up
use it from apps & mobile
«Graphs change the way of modelling data»
Luca Garulli
www.twitter.com/lgarulli
CEO at NuvolaBase Ltd, London UK
Author of OrientDB, the Document-Graph NoSQL Open Source project