(c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1 www.orientechnologies.com
Luca Garulli – Founder and CEO @NuvolaBase Ltd Author of OrientDB Doc/Graph DB
Oct 6th 2012 in Barcelona
Switching from the Relational to the Graph model
One of the main reasons RDBMS users resist moving to a NoSQL product
is the complexity of the model:
OK, NoSQL products are great for BigData and BigScale
but...
...but what about the model?
What is the NoSQL answer
for managing complex domains?
Key-Value stores
Column-Based
Document database
Graph database
CAUTION! This presentation will not use a
social-like domain with
the classic paradigm of
friend-of-friendN
where graph databases
are already widely used...
...But rather we will explore how
to think «graphically» with one of the
most common domains in the
enterprise world:
The good old CRM* domain
* today an RDBMS is used in 99% of cases
Every developer knows
the Relational Model,
but who knows the
Graph one?
Back to school:
Graph Theory crash course
Basic Graph
Luca --Likes--> NoSQL Matters conference
Property Graph Model*
Vertex Luca: name: Luca, surname: Garulli, company: NuvolaBase
Edge Likes (Luca -> NoSQL Matters conference): since: 2012
Vertex NoSQL Matters conference: editions: [Cologne, Barcelona]
Vertices and Edges can have properties
Edges are directed
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
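The property graph model above can be sketched in plain Java, with no graph library. This is only an illustration: the names `PropertyGraphSketch`, `Vertex`, `Edge` and `addEdge` are assumptions made for this example and are not the Blueprints or OrientDB API, and the `name` property on the conference vertex is added here just to have something to print.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are assumptions, not Blueprints/OrientDB classes.
final class PropertyGraphSketch {

    // A vertex carries its own properties plus its outgoing and incoming edges.
    static final class Vertex {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> out = new ArrayList<>();
        final List<Edge> in = new ArrayList<>();
    }

    // An edge is directed (out -> in), has a label, and can carry properties too.
    static final class Edge {
        final String label;
        final Vertex out; // vertex the edge leaves from
        final Vertex in;  // vertex the edge points to
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex out, Vertex in) {
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    // Connects two vertices and registers the edge on both ends.
    static Edge addEdge(Vertex from, String label, Vertex to) {
        Edge edge = new Edge(label, from, to);
        from.out.add(edge);
        to.in.add(edge);
        return edge;
    }

    public static void main(String[] args) {
        Vertex luca = new Vertex();
        luca.properties.put("name", "Luca");
        luca.properties.put("surname", "Garulli");
        luca.properties.put("company", "NuvolaBase");

        Vertex conference = new Vertex();
        conference.properties.put("name", "NoSQL Matters conference"); // assumed property
        conference.properties.put("editions", List.of("Cologne", "Barcelona"));

        Edge likes = addEdge(luca, "Likes", conference);
        likes.properties.put("since", 2012);

        // Follow the directed edges that leave the Luca vertex.
        for (Edge edge : luca.out) {
            System.out.println(luca.properties.get("name") + " -" + edge.label + "-> "
                    + edge.in.properties.get("name"));
        }
    }
}
```

Keeping both `out` and `in` lists on each vertex is what lets an edge be walked in either direction without searching for it.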
Property Graph Model
[Diagram: the vertices Luca and NoSQL Matters conference]
An Edge connects only 2 vertices:
use multiple edges to
represent 1-N and N-M
relationships
Property Graph Model
[Diagram: the vertices Katja, Luca, Pere and NoSQL Matters conference, connected by edges labelled Likes, FriendOf and Joins]
Congratulations, this is your diploma in
«Graph Theory»
Now let's go back
to our domain:
the CRM
Domain: minimal CRM
Registry system: Customer, Address
Order system: Order, Stock
How does a
Relational DBMS
manage relationships?
Relational World: 1-1 Relationships
JOIN Customer.Address -> Address.Id
Customer
Id | Name   | Address
10 | Luca   | 34
11 | Katja  | 44
34 | Sylvia | 54
56 | Mark   | 66
88 | Steve  | 68
Address
Id | Location
34 | Rome, London
44 | Cologne
54 | Rome
66 | New Mexico
68 | Palo Alto
Relational World: 1-N Relationships
Inverse JOIN Address.Customer -> Customer.Id
Customer
Id | Name
10 | Luca
11 | Katja
34 | Sylvia
56 | Mark
88 | Steve
Address
Id | Customer | Location
24 | 10       | Rome
33 | 10       | London
44 | 34       | Rome
66 | 11       | Cologne
68 | 88       | Palo Alto
Relational World: N-M Relationships
Additional table with 2 JOINs
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
Customer
Id | Name
10 | Luca
11 | Katja
34 | Sylvia
56 | Mark
88 | Steve
Address
Id | Location
24 | Rome
33 | London
44 | Rome
66 | Cologne
68 | Palo Alto
CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 24
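To make the two JOINs of the N-M case concrete, here is a small sketch in plain Java that resolves a customer's addresses using the rows shown above. It is not how an RDBMS is implemented: the sorted maps merely stand in for tables with an index on their key, and the names `JoinTableSketch` and `locationsOf` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: TreeMaps stand in for tables that have an index on Id.
final class JoinTableSketch {

    static final Map<Integer, String> CUSTOMER = new TreeMap<>();
    static final Map<Integer, String> ADDRESS = new TreeMap<>();
    // CustomerAddress rows grouped by customer Id: customer Id -> address Ids.
    static final Map<Integer, List<Integer>> CUSTOMER_ADDRESS = new TreeMap<>();

    static {
        CUSTOMER.put(10, "Luca");
        CUSTOMER.put(11, "Katja");
        CUSTOMER.put(34, "Sylvia");
        CUSTOMER.put(56, "Mark");
        CUSTOMER.put(88, "Steve");

        ADDRESS.put(24, "Rome");
        ADDRESS.put(33, "London");
        ADDRESS.put(44, "Rome");
        ADDRESS.put(66, "Cologne");
        ADDRESS.put(68, "Palo Alto");

        CUSTOMER_ADDRESS.put(10, List.of(24, 33));
        CUSTOMER_ADDRESS.put(34, List.of(24));
    }

    // Resolves the N-M relationship for one customer with two keyed lookups per hop.
    static List<String> locationsOf(int customerId) {
        List<String> locations = new ArrayList<>();
        // JOIN (1): find the CustomerAddress rows for this customer Id.
        for (int addressId : CUSTOMER_ADDRESS.getOrDefault(customerId, List.of())) {
            // JOIN (2): look each Address row up by its Id.
            locations.add(ADDRESS.get(addressId));
        }
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(CUSTOMER.get(10) + " -> " + locationsOf(10)); // Luca -> [Rome, London]
    }
}
```

Every call to `get` on a `TreeMap` is a search through a balanced tree, so each hop across the relationship pays for a key search again.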
What’s wrong with the
Relational Model?
These are all JOINs, executed
every time you traverse a
relationship!
The JOIN is the evil!
[Diagram: the Customer, Address and CustomerAddress tables shown above, with each JOIN between them highlighted]
A JOIN means searching for a key in
another table
The first rule to improve performance
is indexing all the keys
Indexes speed up searches but slow down
inserts, updates and deletes
So in the best case a JOIN is a lookup
into an index
This is done for every single JOIN!
If you traverse hundreds of relationships
you’re executing hundreds of JOINs
Index Lookup:
is it really that fast?
Index Lookup: how does it work?
Think of an
Address Book
where we have to find
Luca’s phone number
Most index algorithms are
similar and based on
balanced trees
A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L  <- Luca: Found!
  M-Z
    M-R
    S-Z
Each lookup takes
X steps, where X
grows with the
index size!
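The claim that the number of steps grows with the index size can be checked with a small sketch. It uses a plain binary search over a sorted range instead of a real balanced tree index, which is a simplification assumed here; the names `IndexLookupSketch` and `lookupSteps` are made up for the example. The keys are simply the numbers 0 to size - 1, so no array needs to be allocated.

```java
// Illustrative only: binary search stands in for a balanced-tree index lookup.
final class IndexLookupSketch {

    // Searches for key among the sorted keys 0..size-1 and returns the number of probes.
    static int lookupSteps(int size, int key) {
        int low = 0;
        int high = size - 1;
        int steps = 0;
        while (low <= high) {
            steps++;
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            if (mid == key) {
                return steps;
            }
            if (mid < key) {
                low = mid + 1;  // continue in the upper half, like M-Z
            } else {
                high = mid - 1; // continue in the lower half, like A-L
            }
        }
        return steps; // key not present
    }

    public static void main(String[] args) {
        for (int size : new int[] {1_000, 1_000_000, 1_000_000_000}) {
            // Look up the last key, which needs close to the maximum number of probes.
            System.out.println(size + " entries -> " + lookupSteps(size, size - 1) + " steps");
        }
    }
}
```

Each probe halves the remaining range, just as each level of the address-book tree does, so the step count follows the logarithm of the number of entries.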
An index lookup is executed
for each JOIN
Querying more tables can easily
produce millions of JOINs/Lookups!
Here is the rule: more entries
= more lookup steps = slower JOIN
Is there a better way to
manage relationships?
“A graph database is any
storage system
that provides
index-free adjacency”
- Marko Rodriguez
How does a GraphDB manage
index-free relationships?
OrientDB: an Open Source (Apache 2)
document-graph NoSQL DBMS
supports: transactions, extended SQL, Multi-Master replication, etc.
OrientDB: traverse a relationship
Vertex Luca (RID = #13:35): label: 'Customer', name: 'Luca', out: [#14:54]
Edge Lives (RID = #14:54): label: 'Lives', out: [#13:35], in: [#13:100]
Vertex Rome (RID = #13:100): label: 'Address', name: 'Rome', in: [#14:54]
The Record ID (RID)
is a physical position
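The idea that a RID is a physical position can be sketched with a toy storage layout in plain Java. This is an assumption-laden illustration, not OrientDB's real storage format: clusters are modelled as fixed-size arrays, each link field holds a single RID instead of a list, and all names (`RidSketch`, `Rid`, `Doc`, `followOut`) are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a toy storage layout, not OrientDB's real on-disk format.
final class RidSketch {

    // A record ID: cluster number plus position inside that cluster.
    static final class Rid {
        final int cluster;
        final int position;

        Rid(int cluster, int position) {
            this.cluster = cluster;
            this.position = position;
        }
    }

    // A record is a bag of fields; link fields hold the RID of another record.
    static final class Doc {
        final Map<String, Object> fields = new HashMap<>();
    }

    // Each cluster is a plain array, so loading a RID is array indexing, not a key search.
    static final Doc[][] CLUSTERS = new Doc[16][128];

    static Doc store(int cluster, int position) {
        Doc doc = new Doc();
        CLUSTERS[cluster][position] = doc;
        return doc;
    }

    static Doc load(Rid rid) {
        return CLUSTERS[rid.cluster][rid.position];
    }

    // Vertex -> outgoing edge -> target vertex: two direct jumps, no index involved.
    static Doc followOut(Doc vertex) {
        Doc edge = load((Rid) vertex.fields.get("out"));
        return load((Rid) edge.fields.get("in"));
    }

    // Rebuilds the three records of the slide and returns the Luca vertex.
    static Doc sampleLuca() {
        Doc luca = store(13, 35);
        luca.fields.put("name", "Luca");
        luca.fields.put("out", new Rid(14, 54));

        Doc lives = store(14, 54);
        lives.fields.put("label", "Lives");
        lives.fields.put("out", new Rid(13, 35));
        lives.fields.put("in", new Rid(13, 100));

        Doc rome = store(13, 100);
        rome.fields.put("name", "Rome");
        rome.fields.put("in", new Rid(14, 54));
        return luca;
    }

    public static void main(String[] args) {
        Doc target = followOut(sampleLuca());
        System.out.println("Luca lives in " + target.fields.get("name"));
    }
}
```

The cost of `load` does not depend on how many records the clusters hold, which is the point of index-free adjacency.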
A GraphDB handles a relationship as a
physical LINK to the record,
assigned when the edge is created
An RDBMS, on the other hand, computes the
relationship every time you query the database
Isn't that crazy?!
This means jumping from an
O(log N) algorithm to a near O(1) one:
the traversal cost is no longer affected
by the database size!
This is huge in the BigData age
OrientDB in the Blueprints micro-benchmark,
on common hw, with a hot cache,
traverses 29.6 million
records in less than 5 seconds:
about 6 million nodes traversed per second!
Do not try this at home with an
RDBMS*!
* unless you live in Google’s server farm
Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.2.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.
orientdb> create vertex V set name = 'Luca', label = 'Customer'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex V set name = 'Rome', label = 'Address'
Created vertex #13:100 in 0.02 secs
orientdb> create edge E from #13:35 to #13:100 set label = 'Lives'
Created edge #14:54 in 0.02 secs
Create the graph in Java
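import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;
// Note: the database must be opened before use; that step is not shown here
OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
// Create the two vertices and set their properties
ODocument luca = graph.createVertex();
luca.field("name", "Luca");
luca.field("label", "Customer");
ODocument rome = graph.createVertex();
rome.field("name", "Rome");
rome.field("label", "Address");
// Connect Luca to Rome with a 'Lives' edge, then save it
ODocument edge = graph.createEdge(luca, rome).field("label", "Lives");
edge.save();
graph.close();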
Query the graph in SQL
orientdb> select in[label='Lives'].out from V where label = 'Address'
and name = 'Rome'
---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|   13:35|Luca                |[#14:54]            |                    |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
orientdb> select * from V where label = 'Address' AND
in[label='Lives'].size() > 0
---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|  13:100|Rome                |                    |[#14:54]            |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
Query the graph in Java
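import java.util.List;
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.OCommandSQL;
OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
// GET ALL THE CUSTOMERS FROM ROME
List<ODocument> result = graph.command( new OCommandSQL(
  "select in[label='Lives'].out from V where label = 'Address' " +
  "and name = ?")
).execute( "Rome" );
// Print one line per record returned by the query
for( ODocument v : result ) {
  System.out.println("Result: " + v.field("label") );
}
---------------------------------------------------------------------------------------
Result: Luca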
Query vs traversal
Once you have a well-connected database
in the form of a Super Graph, you can
traverse records instead of querying them!
All you need is a few root vertices
from which to start traversing
[Diagram: vertices labelled Customers, Stocks and Special Customers; customer vertices Luca, John and Sylvia; order vertices 2332 and 8834; product vertex White Soap. Callout: "This is a root vertex"]
Query the graph in SQL
Supposing that the root node #30:0 links all the
Customer vertices
Get all the customers:
orientdb> select out.in from #30:0
Get all the customers who bought at least one 'White Soap' product:
orientdb> select * from (
select out.in from #30:0
) where out.in.out[label='Bought'].in.name = 'White Soap'
[Diagram: the Customers root vertex is #30:0]
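The same "start from a root vertex and walk" idea can be sketched in plain Java, outside any database. The data is hypothetical, since the slides do not say which customer bought which product, and the model is simplified: edges are direct object references with no labels, and the order vertices are skipped. The names `RootTraversalSketch`, `Vertex` and `customersWhoBought` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: direct object references play the role of index-free links.
final class RootTraversalSketch {

    static final class Vertex {
        final String name;
        final List<Vertex> links = new ArrayList<>(); // outgoing links, followed directly

        Vertex(String name) {
            this.name = name;
        }
    }

    // Walks root -> customers -> products and keeps the customers linked to the product.
    static List<String> customersWhoBought(Vertex root, String productName) {
        List<String> result = new ArrayList<>();
        for (Vertex customer : root.links) {
            for (Vertex product : customer.links) {
                if (product.name.equals(productName)) {
                    result.add(customer.name);
                    break; // one matching product is enough
                }
            }
        }
        return result;
    }

    // Hypothetical data: who bought what is invented for this example.
    static Vertex sampleGraph() {
        Vertex root = new Vertex("Customers");
        Vertex luca = new Vertex("Luca");
        Vertex john = new Vertex("John");
        Vertex sylvia = new Vertex("Sylvia");
        Vertex whiteSoap = new Vertex("White Soap");

        root.links.add(luca);
        root.links.add(john);
        root.links.add(sylvia);
        luca.links.add(whiteSoap);
        sylvia.links.add(whiteSoap);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(customersWhoBought(sampleGraph(), "White Soap")); // [Luca, Sylvia]
    }
}
```

No lookup by key happens anywhere: the traversal only visits records reachable from the root, so its cost depends on the size of that neighbourhood rather than on the size of the whole database.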
Demo time!
NuvolaBase.com
The first
Graph Database
on the Cloud
always available
a few seconds to set it up
use it from apps & mobile
«Graphs change the way of modelling data»
Luca Garulli
www.twitter.com/lgarulli
CEO at NuvolaBase Ltd, London UK
Author of OrientDB, Document-Graph NoSQL Open Source project