(c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1 www.orientechnologies.com Luca Garulli Founder and CEO @NuvolaBase Ltd Author of OrientDB Doc/Graph DB Oct 6th 2012 in Barcelona Switching from the Relational to the Graph model

Page 1:

Switching from the Relational to the Graph model

Luca Garulli, Founder and CEO @NuvolaBase Ltd, Author of OrientDB Doc/Graph DB

Oct 6th 2012 in Barcelona

(c) Luca Garulli. Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License. www.orientechnologies.com

Page 2:

One of the main reasons RDBMS users resist moving to a NoSQL product is the complexity of the model:

OK, NoSQL products are great for BigData and BigScale, but...

Page 5:

CAUTION! This presentation will not use a social-like domain with the classic friend-of-friend^N paradigm, where graph databases are already widely used...

Page 6:

...but rather we will explore how to think «graphically» in one of the most common domains in the enterprise world:

The old classic CRM* domain

* today an RDBMS is used in 99% of cases

Page 10:

Property Graph Model*

Vertex "Luca"
    name: Luca
    surname: Garulli
    company: NuvolaBase

Edge "Likes" (Luca -> NoSQL Matters conference)
    since: 2012

Vertex "NoSQL Matters conference"
    editions: [Cologne, Barcelona]

Vertices and Edges can have properties

Edges are directed

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
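The model on this slide can be sketched as a tiny in-memory data structure. This is only an illustration in plain Java, not the Blueprints or OrientDB API: the class names `Vertex`, `Edge` and `connect` are made up for the example, and the `name` property on the conference vertex is an assumption (the slide shows it only as the vertex caption).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory property graph: a sketch, not the Blueprints or OrientDB API.
class PropertyGraphSketch {

    // A vertex carries a property map plus its outgoing and incoming edges.
    static class Vertex {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> out = new ArrayList<>();
        final List<Edge> in = new ArrayList<>();
    }

    // An edge is directed (from -> to), has a label and its own properties.
    static class Edge {
        final String label;
        final Vertex from;
        final Vertex to;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    // Creates the edge and registers it on both endpoints.
    static Edge connect(Vertex from, String label, Vertex to) {
        Edge edge = new Edge(label, from, to);
        from.out.add(edge);
        to.in.add(edge);
        return edge;
    }

    // Builds the example from the slide and returns the "Likes" edge.
    static Edge buildExample() {
        Vertex luca = new Vertex();
        luca.properties.put("name", "Luca");
        luca.properties.put("surname", "Garulli");
        luca.properties.put("company", "NuvolaBase");

        Vertex conference = new Vertex();
        conference.properties.put("name", "NoSQL Matters conference"); // assumed property
        conference.properties.put("editions", Arrays.asList("Cologne", "Barcelona"));

        Edge likes = connect(luca, "Likes", conference);
        likes.properties.put("since", 2012);
        return likes;
    }

    public static void main(String[] args) {
        Edge likes = buildExample();
        System.out.println(likes.from.properties.get("name")
                + " -[" + likes.label + "]-> "
                + likes.to.properties.get("name"));
    }
}
```

Keeping both `out` and `in` lists on each vertex is what lets an edge be followed in either direction without searching for it.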

Page 11:

Property Graph Model

Luca -> NoSQL Matters conference

An Edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships

Page 17:

Relational World: 1-1 Relationships

JOIN Customer.Address -> Address.Id

Customer
Id  Name    Address
10  Luca    34
11  Katja   44
34  Sylvia  54
56  Mark    66
88  Steve   68

Address
Id  Location
34  Rome, London
44  Cologne
54  Rome
66  New Mexico
68  Palo Alto

Page 18:

Relational World: 1-N Relationships

Inverse JOIN Address.Customer -> Customer.Id

Customer
Id  Name
10  Luca
11  Katja
34  Sylvia
56  Mark
88  Steve

Address
Id  Customer  Location
24  10        Rome
33  10        London
44  34        Rome
66  11        Cologne
68  88        Palo Alto

Page 19:

Relational World: N-M Relationships

Additional table with 2 JOINs:
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id

Customer
Id  Name
10  Luca
11  Katja
34  Sylvia
56  Mark
88  Steve

Address
Id  Location
24  Rome
33  London
44  Rome
66  Cologne
68  Palo Alto

CustomerAddress
Id  Address
10  24
10  33
34  24
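To make the two JOINs concrete, here is a rough sketch in plain Java that resolves the N-M relationship by hand over the three tables above. It assumes no indexes at all, so each hop is a full scan for a matching key; the class and method names are invented for the example and nothing here reflects how a particular RDBMS executes the query.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of resolving an N-M relationship through a join table without indexes:
// every hop is a key search in another table.
class JoinTableSketch {

    // Rows copied from the slide: {id, name}, {id, location}, {customerId, addressId}.
    static final Object[][] CUSTOMER = {
        {10, "Luca"}, {11, "Katja"}, {34, "Sylvia"}, {56, "Mark"}, {88, "Steve"}
    };
    static final Object[][] ADDRESS = {
        {24, "Rome"}, {33, "London"}, {44, "Rome"}, {66, "Cologne"}, {68, "Palo Alto"}
    };
    static final int[][] CUSTOMER_ADDRESS = { {10, 24}, {10, 33}, {34, 24} };

    // Returns the locations linked to the customer with the given name.
    static List<String> locationsOf(String customerName) {
        List<String> result = new ArrayList<>();
        for (Object[] customer : CUSTOMER) {
            if (!customer[1].equals(customerName)) {
                continue;
            }
            int customerId = (Integer) customer[0];
            // JOIN (1): search CustomerAddress for rows pointing at this customer.
            for (int[] link : CUSTOMER_ADDRESS) {
                if (link[0] != customerId) {
                    continue;
                }
                // JOIN (2): search Address for the row with the linked id.
                for (Object[] address : ADDRESS) {
                    if ((Integer) address[0] == link[1]) {
                        result.add((String) address[1]);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Luca: " + locationsOf("Luca"));
        System.out.println("Sylvia: " + locationsOf("Sylvia"));
    }
}
```

The nested loops are the point: nothing in a row says where the related row is stored, so the relationship has to be recomputed by matching keys each time it is followed.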


Page 22:

These are all JOINs executed every time you traverse a relationship!

The JOIN is the evil!

Page 23:

A JOIN means searching for a key in another table

The first rule to improve performance is indexing all the keys

An index speeds up searches but slows down inserts, updates and deletes

Page 24:

So in the best case a JOIN is a lookup in an index

This is done for every single JOIN!

If you traverse hundreds of relationships you're executing hundreds of JOINs

Page 27:

Index Lookup: how does it work?

A-Z
    A-L
        A-D
        E-L
    M-Z
        M-R
        S-Z

Index algorithms are all similar and based on balanced trees

Page 30:

Index Lookup: how does it work?

A-Z
    A-L
        A-D
            A-B
            C-D
        E-L
            E-G
                E-F
                G
            H-L
                H-J
                K-L -> Luca: Found!
    M-Z
        M-R
        S-Z

Each lookup takes X steps, where X grows with the index size!
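How X grows can be seen by counting the steps of a lookup. The sketch below is a stand-in, not the index of any real database: it uses a binary search over a sorted array in place of a balanced tree (both halve the remaining range at each step), and the sequential keys are made up for the example.

```java
// Sketch only: binary search over a sorted array stands in for a balanced-tree index.
class IndexLookupSketch {

    // Returns how many keys were inspected before `key` was found (or ruled out).
    static int lookupSteps(long[] sortedKeys, long key) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] == key) {
                return steps;
            }
            if (sortedKeys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps;
    }

    // Hypothetical index content: the keys 0 .. n-1 in sorted order.
    static long[] sequentialKeys(int n) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i;
        }
        return keys;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            long[] keys = sequentialKeys(n);
            // Looking up the last key walks the full depth of the search.
            System.out.println(n + " entries -> " + lookupSteps(keys, n - 1) + " steps");
        }
    }
}
```

The step count is bounded by about log2(N): roughly 10 steps for a thousand entries and roughly 20 for a million. The growth is slow, but it is paid again on every single lookup.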

Page 31:

An index lookup is executed for each JOIN

Querying more tables can easily produce millions of JOINs/lookups!

Here is the rule: more entries = more lookup steps = slower JOIN

Page 36:

OrientDB: traverse a relationship

Vertex "Luca" (RID = #13:35)
    label: 'Customer'
    name: 'Luca'
    out: [#14:54]

Edge "Lives" (RID = #14:54)
    label: 'Lives'
    out: [#13:35]
    in: [#13:100]

Vertex "Rome" (RID = #13:100)
    label: 'Address'
    name: 'Rome'
    in: [#14:54]

The Record ID (RID) is a physical position

Page 37:

A GraphDB handles a relationship as a physical LINK to the record, assigned when the edge is created

On the other side, an RDBMS computes the relationship every time you query the database

Isn't that crazy?!
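The idea of a physical link can be sketched in plain Java by treating a Record ID as a (cluster, position) pair that indexes straight into storage. This is an illustration under strong simplifying assumptions, not OrientDB's storage code: the `Rid`, `Rec` and `CLUSTERS` names are invented, storage is a plain two-dimensional array, and each record holds a single link per side where the records in the OrientDB example hold lists.

```java
// Sketch only: models a Record ID as a (cluster, position) pair that indexes
// straight into storage. This is not OrientDB's actual storage code.
class RidLinkSketch {

    static class Rid {
        final int cluster;
        final int position;

        Rid(int cluster, int position) {
            this.cluster = cluster;
            this.position = position;
        }
    }

    // Simplified record: one link per side instead of a list of links.
    static class Rec {
        final String label;
        final String name;
        Rid out;
        Rid in;

        Rec(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    // Hypothetical storage: one array per cluster, sized for the RIDs in the example.
    static final Rec[][] CLUSTERS = new Rec[15][101];

    static {
        Rec luca = new Rec("Customer", "Luca");
        Rec lives = new Rec("Lives", null);
        Rec rome = new Rec("Address", "Rome");
        luca.out = new Rid(14, 54);   // vertex -> outgoing edge
        lives.out = new Rid(13, 35);  // edge -> source vertex
        lives.in = new Rid(13, 100);  // edge -> destination vertex
        rome.in = new Rid(14, 54);    // vertex -> incoming edge
        CLUSTERS[13][35] = luca;
        CLUSTERS[14][54] = lives;
        CLUSTERS[13][100] = rome;
    }

    // Following a link is a direct positional access: no key search, no index.
    static Rec load(Rid rid) {
        return CLUSTERS[rid.cluster][rid.position];
    }

    // Customer -> outgoing edge -> destination vertex: two direct hops.
    static String addressOf(Rid customer) {
        Rec edge = load(load(customer).out);
        return load(edge.in).name;
    }

    public static void main(String[] args) {
        System.out.println(addressOf(new Rid(13, 35)));
    }
}
```

Each hop costs the same however many records the clusters hold, because the link already says where the target record is.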

Page 38:

This means jumping from an O(log N) algorithm to a near O(1) one: traversal cost is no longer affected by database size!

This is huge in the BigData age

Page 39:

OrientDB in the Blueprints micro-benchmark, on common hardware, with a hot cache, traverses 29.6 million records in less than 5 seconds: about 6 million nodes traversed per second!

Do not try this at home with an RDBMS*!

* unless you live in Google's server farm

Page 40:

Create the graph in SQL

$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.2.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.

orientdb> create vertex V set name = 'Luca', label = 'Customer'
Created vertex #13:35 in 0.03 secs

orientdb> create vertex V set name = 'Rome', label = 'Address'
Created vertex #13:100 in 0.02 secs

orientdb> create edge E from #13:35 to #13:100 set label = 'Lives'
Created edge #14:54 in 0.02 secs

Page 41:

Create the graph in Java

import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");

// Customer vertex
ODocument luca = graph.createVertex();
luca.field("name", "Luca");
luca.field("label", "Customer");

// Address vertex
ODocument rome = graph.createVertex();
rome.field("name", "Rome");
rome.field("label", "Address");

// Edge from the customer to the address
ODocument edge = graph.createEdge(luca, rome).field("label", "Lives");
edge.save();

graph.close();

Page 42:

Query the graph in SQL

orientdb> select in[label='Lives'].out from V where label = 'Address' and name = 'Rome'

---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|   13:35|Luca                |[#14:54]            |                    |
---+--------+--------------------+--------------------+--------------------+

1 item(s) found. Query executed in 0.007 sec(s).

orientdb> select * from V where label = 'Address' AND in[label='Lives'].size() > 0

---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|  13:100|Rome                |                    |[#14:54]            |
---+--------+--------------------+--------------------+--------------------+

1 item(s) found. Query executed in 0.007 sec(s).

Page 43:

Query the graph in Java

import java.util.List;
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.OCommandSQL;

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");

// GET ALL THE CUSTOMERS FROM ROME, ITALY
List<ODocument> result = graph.command( new OCommandSQL(
    "select in[label='Lives'].out from V where label = 'Address' and name = ?")
).execute("Rome");

for (ODocument v : result) {
    System.out.println("Result: " + v.field("label"));
}

---------------------------------------------------------------------------------------

Result: Luca

Page 44:

Query vs traversal

Once you have a well-connected database in the form of a Super Graph, you can cross records instead of querying them!

All you need is some root vertices from which to start traversing

Page 46:

Query the graph in SQL

Supposing that the root node "Customers" (#30:0) links all the Customer vertices

Get all the customers:

orientdb> select out.in from #30:0

Get all the customers who bought at least one 'White Soap' product:

orientdb> select * from (
            select out.in from #30:0
          ) where out.in.out[label='Bought'].in.name = 'White Soap'
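The same traversal idea can be sketched in plain Java, starting from a root vertex and following links instead of searching tables. This is only an illustration, not the OrientDB API or the exact semantics of the SQL above: the classes, the "Has" edge label from the root, the second customer and the 'Green Tea' product are all made up for the example. As on the slides, an edge's `in` field points at its destination vertex.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: traversal from a root vertex over in-memory links.
class RootTraversalSketch {

    static class Vertex {
        final String name;
        final List<Edge> out = new ArrayList<>();

        Vertex(String name) {
            this.name = name;
        }
    }

    // As on the slides, `in` is the vertex the edge points to.
    static class Edge {
        final String label;
        final Vertex in;

        Edge(String label, Vertex in) {
            this.label = label;
            this.in = in;
        }
    }

    static void link(Vertex from, String label, Vertex to) {
        from.out.add(new Edge(label, to));
    }

    // Comparable to "select out.in from #30:0": follow every outgoing edge of the root.
    static List<Vertex> customers(Vertex root) {
        List<Vertex> result = new ArrayList<>();
        for (Edge edge : root.out) {
            result.add(edge.in);
        }
        return result;
    }

    // Customers with at least one 'Bought' edge leading to the given product.
    static List<String> buyersOf(Vertex root, String product) {
        List<String> result = new ArrayList<>();
        for (Vertex customer : customers(root)) {
            for (Edge edge : customer.out) {
                if ("Bought".equals(edge.label) && product.equals(edge.in.name)) {
                    result.add(customer.name);
                    break; // one matching purchase is enough
                }
            }
        }
        return result;
    }

    // Hypothetical data set: two customers, two products.
    static Vertex buildExample() {
        Vertex root = new Vertex("Customers");
        Vertex luca = new Vertex("Luca");
        Vertex katja = new Vertex("Katja");
        Vertex whiteSoap = new Vertex("White Soap");
        Vertex greenTea = new Vertex("Green Tea");
        link(root, "Has", luca);
        link(root, "Has", katja);
        link(luca, "Bought", whiteSoap);
        link(luca, "Bought", greenTea);
        link(katja, "Bought", greenTea);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(buyersOf(buildExample(), "White Soap"));
    }
}
```

No lookup by key happens anywhere: every step starts from a record already in hand and follows one of its links.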

Page 49:

«Graphs change the way of modelling data»

Luca Garulli
www.twitter.com/lgarulli

CEO at NuvolaBase Ltd, London UK
Author of OrientDB, Document-Graph NoSQL Open Source project