Switching from the Relational to the Graph Model

Post on 10-May-2015

DESCRIPTION

One of the main reasons RDBMS users resist moving to a NoSQL product is the perceived complexity of the model: OK, NoSQL products are great for Big Data and big scale, but what about the model?

TRANSCRIPT

(c) Luca Garulli. Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License. www.orientechnologies.com

Luca Garulli – Founder and CEO @NuvolaBase Ltd
Author of OrientDB Doc/Graph DB

Oct 6th 2012 in Barcelona

Switching from the Relational to the Graph model

One of the main reasons RDBMS users resist moving to a NoSQL product is the perceived complexity of the model:

OK, NoSQL products are great for Big Data and big scale, but...

...but what about the model?

What is the NoSQL answer to managing complex domains?

Key-Value stores
Column-based stores
Document databases
Graph databases

CAUTION! This presentation will not use a social-like domain with the classic friend-of-friend^N paradigm, where graph databases are already widely used...

...Rather, we will explore how to think «graphically» about one of the most common domains in the enterprise world:

The classic CRM* domain

* today an RDBMS is used in 99% of cases

Every developer knows the Relational Model, but who knows the Graph one?

Back to school: Graph Theory crash course

Basic Graph

[Figure: vertex «Luca» connected by a «Likes» edge to vertex «NoSQL Matters conference»]

Property Graph Model*

[Figure: vertex «Luca» (name: Luca, surname: Garulli, company: NuvolaBase) connected by a «Likes» edge (since: 2012) to vertex «NoSQL Matters conference» (editions: [Cologne, Barcelona])]

Vertices and Edges can have properties

Edges are directed

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

Property Graph Model

[Figure: vertex «Luca» connected to vertex «NoSQL Matters conference» by two edges: «Likes» (since: 2012) and «Speaks» (title: «Switching...», abstract: «This talk presents...»)]

An Edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships
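
The property graph model described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the classes `PropertyGraphSketch`, `Vertex` and `Edge` are hypothetical names invented here, not the Blueprints or OrientDB API. Each edge is directed, connects exactly two vertices, and both vertices and edges carry their own properties.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph; not the Blueprints or OrientDB API.
class PropertyGraphSketch {

  static class Vertex {
    final Map<String, Object> properties = new LinkedHashMap<>();
    final List<Edge> out = new ArrayList<>(); // edges leaving this vertex
    final List<Edge> in = new ArrayList<>();  // edges entering this vertex

    Vertex set(String key, Object value) {
      properties.put(key, value);
      return this;
    }
  }

  static class Edge {
    final String label;
    final Vertex from; // tail of the directed edge
    final Vertex to;   // head of the directed edge
    final Map<String, Object> properties = new LinkedHashMap<>();

    Edge(String label, Vertex from, Vertex to) {
      this.label = label;
      this.from = from;
      this.to = to;
      from.out.add(this); // register the edge on both endpoints
      to.in.add(this);
    }

    Edge set(String key, Object value) {
      properties.put(key, value);
      return this;
    }
  }

  // Builds the example from the slides and returns the "Luca" vertex.
  static Vertex buildExample() {
    Vertex luca = new Vertex().set("name", "Luca").set("surname", "Garulli");
    Vertex conference = new Vertex().set("name", "NoSQL Matters conference");
    new Edge("Likes", luca, conference).set("since", 2012);
    new Edge("Speaks", luca, conference).set("title", "Switching...");
    return luca;
  }

  public static void main(String[] args) {
    Vertex luca = buildExample();
    for (Edge e : luca.out) {
      System.out.println(luca.properties.get("name") + " -" + e.label + "-> "
          + e.to.properties.get("name") + " " + e.properties);
    }
  }
}
```

Two edges between the same pair of vertices are enough to express two different relationships; no extra table or intermediate record is needed.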

Property Graph Model

[Figure: vertices «Katja», «Luca», «Pere» and «NoSQL Matters conference», connected by the edges «Likes», «Organizes», «Joins» and two «FriendOf» edges]

Congratulations, this is your diploma in «Graph Theory»

Now back to our domain: the CRM

Domain: minimal CRM

[Figure: four entities, Customer, Address, Order and Stock, grouped into a Registry system and an Order system]

How does a Relational DBMS manage relationships?

Relational World: 1-1 Relationships

JOIN Customer.Address -> Address.Id

Customer

Id  Name    Address
10  Luca    34
11  Katja   44
34  Sylvia  54
56  Mark    66
88  Steve   68

Address

Id  Location
34  Rome, London
44  Cologne
54  Rome
66  New Mexico
68  Palo Alto

Relational World: 1-N Relationships

Inverse JOIN Address.Customer -> Customer.Id

Customer

Id  Name
10  Luca
11  Katja
34  Sylvia
56  Mark
88  Steve

Address

Id  Customer  Location
24  10        Rome
33  10        London
44  34        Rome
66  11        Cologne
68  88        Palo Alto

Relational World: N-M Relationships

Additional table with 2 JOINs:
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id

Customer

Id  Name
10  Luca
11  Katja
34  Sylvia
56  Mark
88  Steve

Address

Id  Location
24  Rome
33  London
44  Rome
66  Cologne
68  Palo Alto

CustomerAddress

Id  Address
10  24
10  33
34  24
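
To make the two JOINs concrete, here is a minimal Java sketch that resolves the N-M relationship by hand. It is an illustration only: the class `JoinTableSketch` and its methods are hypothetical names, and plain in-memory maps stand in for the three tables, loaded with the rows shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory tables mirroring the slide; not a real RDBMS.
class JoinTableSketch {
  static final Map<Integer, String> CUSTOMER = new LinkedHashMap<>(); // Id -> Name
  static final Map<Integer, String> ADDRESS = new LinkedHashMap<>();  // Id -> Location
  // Each row is {CustomerAddress.Id, CustomerAddress.Address}.
  static final int[][] CUSTOMER_ADDRESS = {{10, 24}, {10, 33}, {34, 24}};

  static {
    CUSTOMER.put(10, "Luca");
    CUSTOMER.put(11, "Katja");
    CUSTOMER.put(34, "Sylvia");
    CUSTOMER.put(56, "Mark");
    CUSTOMER.put(88, "Steve");
    ADDRESS.put(24, "Rome");
    ADDRESS.put(33, "London");
    ADDRESS.put(44, "Rome");
    ADDRESS.put(66, "Cologne");
    ADDRESS.put(68, "Palo Alto");
  }

  // Locations of one customer: filter the join table, then look up each Address key.
  static List<String> locationsOf(int customerId) {
    List<String> locations = new ArrayList<>();
    for (int[] row : CUSTOMER_ADDRESS) {
      if (row[0] == customerId) {
        locations.add(ADDRESS.get(row[1])); // JOIN (2): CustomerAddress.Address -> Address.Id
      }
    }
    return locations;
  }

  // Customers at one address: the same join table read in the other direction.
  static List<String> customersAt(int addressId) {
    List<String> names = new ArrayList<>();
    for (int[] row : CUSTOMER_ADDRESS) {
      if (row[1] == addressId) {
        names.add(CUSTOMER.get(row[0])); // JOIN (1): CustomerAddress.Id -> Customer.Id
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println("Luca's locations: " + locationsOf(10));
    System.out.println("Customers at address 24: " + customersAt(24));
  }
}
```

Every call repeats the key searches from scratch, which is exactly the work a JOIN performs each time the relationship is traversed.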

What's wrong with the Relational Model?

The JOIN is the evil!

[Figure: the Customer, Address and CustomerAddress tables from the previous slide, with every link between them marked as a JOIN]

These are all JOINs, executed every time you traverse a relationship!

A JOIN means searching for a key in another table

The first rule for improving performance is to index all the keys

An index speeds up searches but slows down inserts, updates and deletes

So in the best case a JOIN is a lookup in an index

This is done for every single JOIN!

If you traverse hundreds of relationships, you are executing hundreds of JOINs

Index Lookup: is it really that fast?

Index Lookup: how does it work?

Think of an address book where we have to find Luca's phone number

A-Z
  A-L
  M-Z

Most index algorithms are similar and based on balanced trees

A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L  <- Luca
  M-Z
    M-R
    S-Z

Found! Each lookup takes X steps, where X grows with the index size!

An index lookup is executed for each JOIN

Querying several tables can easily produce millions of JOINs/lookups!

Here is the rule: more entries = more lookup steps = slower JOIN
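
The rule can be checked with a small experiment. The sketch below is an assumption-laden stand-in: it uses a binary search over a sorted array instead of a real database index, and `IndexLookupSketch`, `lookupSteps` and `keys` are hypothetical names. What it shows is only the trend: the number of steps per lookup grows roughly with log2 of the number of entries.

```java
// Hypothetical stand-in for an index: binary search over sorted keys.
class IndexLookupSketch {

  // Returns how many keys were compared before the search ended.
  static int lookupSteps(int[] sortedKeys, int key) {
    int lo = 0;
    int hi = sortedKeys.length - 1;
    int steps = 0;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      steps++;
      if (sortedKeys[mid] == key) {
        return steps;
      }
      if (sortedKeys[mid] < key) {
        lo = mid + 1; // continue in the upper half
      } else {
        hi = mid - 1; // continue in the lower half
      }
    }
    return steps; // key not present
  }

  // Builds the sorted key set 0, 1, ..., n-1.
  static int[] keys(int n) {
    int[] result = new int[n];
    for (int i = 0; i < n; i++) {
      result[i] = i;
    }
    return result;
  }

  public static void main(String[] args) {
    // Look up the largest key in indexes of growing size.
    for (int n = 8; n <= (1 << 20); n *= 8) {
      System.out.println(n + " entries -> " + lookupSteps(keys(n), n - 1) + " steps");
    }
  }
}
```

Multiply the steps of one lookup by the number of JOINs in a query and the cost of traversing many relationships becomes visible.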

Is there a better way to manage relationships?

“A graph database is any storage system that provides index-free adjacency”

- Marko Rodriguez

How does a GraphDB manage index-free relationships?

OrientDB: an Open Source (Apache 2) document-graph NoSQL DBMS

Supports: transactions, extended SQL, Multi-Master replication, etc.

OrientDB: traverse a relationship

Vertex «Luca», RID = #13:35
  out: [#14:54]
  label: 'Customer'
  name: 'Luca'

Edge «Lives», RID = #14:54
  out: [#13:35]
  in: [#13:100]
  label: 'Lives'

Vertex «Rome», RID = #13:100
  in: [#14:54]
  label: 'Address'
  name: 'Rome'

The Record ID (RID) is a physical position

A GraphDB handles a relationship as a physical LINK to the record, assigned when the edge is created

An RDBMS, on the other hand, computes the relationship every time you query the database

Isn't that crazy?!

This means jumping from an O(log N) algorithm to a near O(1) one

The traversal cost is no longer affected by the database size!

This is huge in the Big Data age
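
The difference between the two approaches can be sketched side by side in Java. This is a simplified analogy, not how OrientDB is implemented: a Java object reference stands in for the physical RID link, a `TreeMap` stands in for the RDBMS index, and all class and method names (`AdjacencySketch`, `locationViaIndex`, `locationViaLink`) are hypothetical.

```java
import java.util.TreeMap;

// Hypothetical sketch: Java references stand in for OrientDB's physical RIDs.
class AdjacencySketch {

  static class Address {
    final String location;

    Address(String location) {
      this.location = location;
    }
  }

  // Relational style: the row keeps a foreign key that must be looked up.
  static class CustomerRow {
    final String name;
    final int addressId;

    CustomerRow(String name, int addressId) {
      this.name = name;
      this.addressId = addressId;
    }
  }

  // Graph style: the vertex keeps a direct link, assigned once when the edge is created.
  static class CustomerVertex {
    final String name;
    Address lives;

    CustomerVertex(String name) {
      this.name = name;
    }
  }

  // Every traversal searches the index: O(log N) for a balanced tree.
  static String locationViaIndex(CustomerRow customer, TreeMap<Integer, Address> index) {
    return index.get(customer.addressId).location;
  }

  // Every traversal follows the stored link: no search, whatever the database size.
  static String locationViaLink(CustomerVertex customer) {
    return customer.lives.location;
  }

  public static void main(String[] args) {
    Address rome = new Address("Rome");

    TreeMap<Integer, Address> index = new TreeMap<>();
    index.put(54, rome);
    CustomerRow row = new CustomerRow("Sylvia", 54);

    CustomerVertex vertex = new CustomerVertex("Sylvia");
    vertex.lives = rome; // the "edge" is created here, once

    System.out.println(locationViaIndex(row, index));
    System.out.println(locationViaLink(vertex));
  }
}
```

Both calls return the same location; the difference is that the first repeats a search on every call, while the second only dereferences a link that was resolved when the relationship was created.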

OrientDB in the Blueprints micro-benchmark, on common hardware with a hot cache, traverses 29.6 million records in less than 5 seconds: about 6 million nodes traversed per second!

Do not try this at home with an RDBMS*!

* unless you live in Google's server farm

Create the graph in SQL

$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.2.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.

orientdb> create vertex V set name = 'Luca', label = 'Customer'
Created vertex #13:35 in 0.03 secs

orientdb> create vertex V set name = 'Rome', label = 'Address'
Created vertex #13:100 in 0.02 secs

orientdb> create edge E from #13:35 to #13:100 set label = 'Lives'
Created edge #14:54 in 0.02 secs

Create the graph in Java

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");

ODocument luca = graph.createVertex();
luca.field("name", "Luca");
luca.field("label", "Customer");

ODocument rome = graph.createVertex();
rome.field("name", "Rome");
rome.field("label", "Address");

ODocument edge = graph.createEdge(luca, rome).field("label", "Lives");
edge.save();

graph.close();

Query the graph in SQL

orientdb> select in[label='Lives'].out from V where label = 'Address' and name = 'Rome'

---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|   13:35|Luca                |[#14:54]            |                    |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).

orientdb> select * from V where label = 'Address' AND in[label='Lives'].size() > 0

---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|  13:100|Rome                |                    |[#14:54]            |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).

Query the graph in Java

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");

// Get all the customers from Rome, Italy
List<ODocument> result = graph.command(
    new OCommandSQL(
        "select in[label='Lives'].out from V where label = 'Address' and name = ?"))
    .execute("Rome");

for (ODocument v : result) {
  System.out.println("Result: " + v.field("label"));
}

---------------------------------------------------------------------------------------
Result: Luca

Query vs traversal

Once you have a well-connected database in the form of a Super Graph, you can traverse records instead of querying them!

All you need is some root vertices from which to start traversing
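
Traversing from a root vertex can be sketched as a plain breadth-first walk. The example below is illustrative only: `TraversalSketch` and `Node` are hypothetical names, and the links between the vertices (a «Customers» root pointing to customers, a customer pointing to an order, an order pointing to a product) are assumptions made for the sake of the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a traversal; the links built below are assumed, not taken from the deck.
class TraversalSketch {

  static class Node {
    final String name;
    final List<Node> out = new ArrayList<>(); // direct links to other records

    Node(String name) {
      this.name = name;
    }

    Node link(Node target) {
      out.add(target);
      return this;
    }
  }

  // Breadth-first walk from a root vertex, following outgoing links only.
  static List<String> traverse(Node root) {
    List<String> visitedNames = new ArrayList<>();
    Set<Node> seen = new HashSet<>();
    Deque<Node> queue = new ArrayDeque<>();
    queue.add(root);
    seen.add(root);
    while (!queue.isEmpty()) {
      Node current = queue.poll();
      visitedNames.add(current.name);
      for (Node next : current.out) {
        if (seen.add(next)) { // add() returns false if the node was already seen
          queue.add(next);
        }
      }
    }
    return visitedNames;
  }

  static Node buildExample() {
    Node customers = new Node("Customers"); // the root vertex
    Node luca = new Node("Luca");
    Node john = new Node("John");
    Node order = new Node("Order2332");
    Node soap = new Node("WhiteSoap");
    customers.link(luca).link(john);
    luca.link(order);
    order.link(soap);
    return customers;
  }

  public static void main(String[] args) {
    System.out.println(traverse(buildExample()));
  }
}
```

No index and no query is involved: every record is reached by following links from the root.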

Query vs traversal

[Figure: a graph with the vertices «Customers», «SpecialCustomers», «Stocks», «Luca», «John», «Sylvia», «Order2332», «Order8834» and «WhiteSoap»; a callout reads «This is a root vertex»]

Query the graph in SQL

Supposing that the root node #30:0 («Customers») links all the Customer vertices

Get all the customers:

orientdb> select out.in from #30:0

Get all the customers who bought at least one 'White Soap' product:

orientdb> select * from ( select out.in from #30:0 ) where out.in.out[label='Bought'].in.name = 'White Soap'

Demo time!

NuvolaBase.com

The first Graph Database on the Cloud

Always available
A few seconds to set it up
Use it from app & mobile

«Graphs change the way of modelling data»

Luca Garulli

www.twitter.com/lgarulli

Author of OrientDB, the Document-Graph NoSQL Open Source project
CEO at NuvolaBase Ltd, London UK
