How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015



“90% of the data in the world today has been created in the last two years alone.”

- IBM

Welcome to Big Data

Just Data

Order #134 (Order)
Luca (Provider)
Commodore Amiga 1200 (Product)
Jill (Customer)
Monitor 40” (Product)
Mouse (Product)
Bruno (Provider)


Data by itself has little value; it is the relationships between data that give it incredible value

Relationships give data “meaning”

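Jill (Customer) --Makes--> Order #134 (Order)
Order #134 (Order) --Has--> Commodore Amiga 1200 (Product)
Order #134 (Order) --Has--> Monitor 40” (Product)
Order #134 (Order) --Has--> Mouse (Product)
Luca (Provider), Bruno (Provider) --Sells--> Products (three Sells edges in the diagram)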

Top NoSQL categories

Key/Value Databases
Document Databases
Graph Databases
Column Databases


Why do most NoSQL products avoid managing relationships?

Joins are evil

Customer
ID   Name
10   John
11   John
24   Mike
28   Mike

CustomerAddress
ID   Address
10   24
10   33
32   44

Address
ID   Location
24   Milan
33   London
18   Paris
18   Madrid
44   Moscow

Is this familiar?

Why is the join so slow?

Imagine an Address Book where we want to find Luca’s phone number

Index Lookup: how does it work?

Index algorithms are all similar and based on balanced trees

A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L
          Luca
  M-Z
    M-R
    S-Z

Found! This lookup took 5 steps. With millions of indexed records the tree gets deeper (the number of steps grows as O(log N)), and a join repeats this lookup for every record it crosses.
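To make the step count concrete, here is a minimal Python sketch. It assumes the index is a sorted list searched by binary search; real databases typically use B-trees with a much higher fan-out, so their trees are shallower. The function name `lookup_steps` and the sample address book are illustrative and not part of the talk.

```python
def lookup_steps(sorted_keys, target):
    """Binary-search sorted_keys for target; return (found, comparisons made)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, steps

# Hypothetical address book, kept sorted like an index.
address_book = sorted(["Anna", "Bruno", "Jill", "John", "Luca", "Mike", "Sara"])
found, steps = lookup_steps(address_book, "Luca")
print(found, steps)

# The step count grows with log2(N), not with N.
for n in (1_000, 1_000_000, 1_000_000_000):
    keys = range(n)                      # range is indexable, nothing is materialized
    _, worst = lookup_steps(keys, n)     # a key larger than all others forces a full descent
    print(n, worst)
```

The point of the sketch is only the growth rate: every extra lookup costs a handful of steps, and a join pays that cost again for each record it has to resolve.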

Joins Kill Performance

(The Customer, CustomerAddress and Address tables are shown again.)

Joins are executed every time you cross relationships

Querying millions of records while joining 3-4 tables could generate billions of combinations

This is why database query performance suffers as the database increases in size
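As a rough illustration of what crossing relationships costs, the sketch below joins the Customer, CustomerAddress and Address tables from the slide in Python. It assumes each table is indexed on its ID by a sorted list searched with the standard `bisect` module; the helper names (`build_index`, `index_lookup`, `customer_locations`) are illustrative.

```python
from bisect import bisect_left

# The three tables from the slide, as (key, value) rows.
customer = [(10, "John"), (11, "John"), (24, "Mike"), (28, "Mike")]
customer_address = [(10, 24), (10, 33), (32, 44)]
address = [(24, "Milan"), (33, "London"), (18, "Paris"), (18, "Madrid"), (44, "Moscow")]

def build_index(rows):
    """Sort rows by key so they can be binary-searched, like a simple index."""
    ordered = sorted(rows)
    return [key for key, _ in ordered], ordered

def index_lookup(index, key):
    """Return all values stored under key, found with an O(log N) binary search."""
    keys, rows = index
    i = bisect_left(keys, key)
    values = []
    while i < len(keys) and keys[i] == key:
        values.append(rows[i][1])
        i += 1
    return values

customer_idx = build_index(customer)
address_idx = build_index(address)

def customer_locations():
    """Join Customer -> CustomerAddress -> Address, one index lookup per hop."""
    result = []
    for customer_id, address_id in customer_address:
        for name in index_lookup(customer_idx, customer_id):
            for location in index_lookup(address_idx, address_id):
                result.append((name, location))
    return result

print(customer_locations())
```

Every row of the link table triggers two index searches, and nothing is remembered between queries: the same searches run again the next time the relationship is crossed.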

[Chart: RDBMS performance on traversal. Y axis: PERFORMANCE; X axis: DATABASE SIZE; curve labelled O(Log N)]

In a world that’s becoming more connected, we need a better way to store data and manage relationships

Read: Data is important, but relationships are even more fundamental today

“A graph database is any storage system that provides index-free adjacency”

- Marko Rodriguez (author of TinkerPop Blueprints)

Every developer knows the Relational Model, but who knows the Graph one?

Back to school: Graph Theory crash course

Basic Graph

Luca --Visited--> Sao Paulo

Edges are directed

Property Graph Model*

Luca (company: OrientTechnologies) --Visited (on: 2015)--> Sao Paulo (people: 12,000,000)

Vertices and Edges can have properties

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

1-N and N-M Relationships

Luca --Visited (on: 2015)--> Sao Paulo
A second edge in the diagram: Worked (on: 2015)

An Edge connects only 2 vertices

Use multiple edges to represent 1-N and N-M relationships
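The property graph model described above can be sketched in a few lines of Python. This is only an illustration: the `Vertex`, `Edge`, `connect` and `neighbours` names are made up for the example, and the target of the "Worked" edge is assumed to be Sao Paulo because the slide does not say.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare by identity; vertices and edges reference each other
class Vertex:
    name: str
    properties: dict = field(default_factory=dict)
    out_edges: list = field(default_factory=list)
    in_edges: list = field(default_factory=list)

@dataclass(eq=False)
class Edge:
    label: str
    source: Vertex   # an edge is directed: source -> target
    target: Vertex   # and connects exactly two vertices
    properties: dict = field(default_factory=dict)

def connect(source, label, target, **properties):
    """Create one directed edge and register it on both endpoints."""
    edge = Edge(label, source, target, properties)
    source.out_edges.append(edge)
    target.in_edges.append(edge)
    return edge

luca = Vertex("Luca", {"company": "OrientTechnologies"})
sao_paulo = Vertex("Sao Paulo", {"people": 12_000_000})

# Several relationships are expressed as several edges.
visited = connect(luca, "Visited", sao_paulo, on=2015)
worked = connect(luca, "Worked", sao_paulo, on=2015)  # target is an assumption

def neighbours(vertex, label):
    """Names of vertices reachable through outgoing edges with the given label."""
    return [edge.target.name for edge in vertex.out_edges if edge.label == label]

print(neighbours(luca, "Visited"))
```

A 1-N or N-M relationship needs no link table here: it is simply more `Edge` objects attached to the same vertices.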

Congrats! This is your diploma in «Graph Theory»

Graph theory is so simple, yet so powerful

How does a true* Graph Database manage relationships?

*a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB

Each element in the Graph has its own immutable Record ID

Luca (Vertex): #13:55
Sao Paulo (Vertex): #15:99
Visited on: 2015 (Edge): #22:11

Connections use persistent pointers

Luca (Vertex, #13:55): out = #22:11
Visited on: 2015 (Edge, #22:11): out = #13:55, in = #15:99
Sao Paulo (Vertex, #15:99): in = #22:11

A Graph Database creates the relationship just once (when the edge is created)

VS

An RDBMS computes the relationship every time you query the database

When you move from an RDBMS to a Graph Database, the cost of crossing a relationship drops from O(log N) to near O(1)

With a Graph Database, the time to traverse a relationship is not affected by database size!

This is huge in the Big Data age
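The persistent-pointer idea can be illustrated with a toy record store in Python. This is a sketch under assumptions, not OrientDB's actual storage engine: a Record ID such as `#13:55` is treated as (cluster, position), and a cluster is a plain list so that following a pointer is a direct positional access. The `Storage` class and the list-valued `out` and `in` fields on vertices are hypothetical.

```python
def parse_rid(rid):
    """Split a Record ID such as '#13:55' into (cluster, position)."""
    cluster, position = rid.lstrip("#").split(":")
    return int(cluster), int(position)

class Storage:
    """Toy record store: each cluster is a list addressed by position."""

    def __init__(self):
        self.clusters = {}

    def save(self, rid, record):
        cluster, position = parse_rid(rid)
        slots = self.clusters.setdefault(cluster, [])
        if len(slots) <= position:
            slots.extend([None] * (position + 1 - len(slots)))
        slots[position] = record

    def load(self, rid):
        """Follow a pointer: direct positional access, no index search."""
        cluster, position = parse_rid(rid)
        return self.clusters[cluster][position]

db = Storage()
db.save("#13:55", {"name": "Luca", "out": ["#22:11"]})
db.save("#15:99", {"name": "Sao Paulo", "in": ["#22:11"]})
db.save("#22:11", {"label": "Visited", "on": 2015, "out": "#13:55", "in": "#15:99"})

def visited_places(db, person_rid):
    """Traverse person -> outgoing edges -> target vertices via stored pointers."""
    person = db.load(person_rid)
    edges = [db.load(edge_rid) for edge_rid in person["out"]]
    return [db.load(edge["in"])["name"] for edge in edges if edge["label"] == "Visited"]

print(visited_places(db, "#13:55"))
```

Each hop costs one `load`, whatever the number of records stored, which is the contrast with the per-hop index search of a join.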

Graph Databases Easily Manage Complex Relationships

No costs to traverse relationships:
• Recommendation engines
• Social Applications
• Spatial Apps
• Master Data Management
• Information Clustering

[Diagram: John, connected by Lives in and Likes edges to a graph containing Thriller, Comedy, Pulp Fiction, Mr Bean, Theater A, Theater B, Theater C, NYC and San José]

GraphDB Database Quadrant

[Chart. Y axis: Relationships Complexity >; X axis: Data Complexity >; plotted categories: Relational, Key Value, Column, Graph, Document]


These were 1st generation NoSQL products, where each tool was only good at a few use cases

1st Generation NoSQL: Scenario

[Diagram: Application; Oracle (RDBMS) as Primary DB; ETL; Redis or Memcached (Key/Value); MongoDB (DocDB); Neo4j (GraphDB)]

1st Generation NoSQL: Fact

In > 90% of use cases, NoSQL products are used as a second DBMS


1st Generation NoSQL: Problems

- No standard between NoSQL products
- Multiple vendors = multiple skills
- ETL + synchronization code is costly to write and maintain
- Performance and Reliability are hard to predict

2nd Generation NoSQL is Multi-Model

What’s a Multi-Model DBMS?

Graph
Document
Object
Key/Value

Multi-Model represents the intersection of multiple models in just one product


- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and Reliability are easy to test from the beginning

Relationships give data “meaning”

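Jill (Customer) --Makes--> Order #134 (Order)
Order #134 (Order) --Has--> Commodore Amiga 1200 (Product)
Order #134 (Order) --Has--> Monitor 40” (Product)
Order #134 (Order) --Has--> 3 Wheel Mouse (Product)
Luca (Provider), Bruno (Provider) --Sells--> Products (three Sells edges in the diagram)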

Multi-Model domain schema

Legend: V = Vertex; arrows mark Edge and Inherits

Vertices:
Actor (name: string, surname: string)
Customer (inherits from Actor)
Provider (inherits from Actor)
Product (name: string, qty: int)
Order (number: int, date: datetime)

Edges:
Sells (price: decimal)
Has (price: decimal)
Makes

Vertices and Edges are Documents

{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}

Jill --Makes--> Order

General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and querying
• Developer friendly
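One way to picture the schema-less, schema-full and schema-hybrid options is the Python sketch below. It parses the Jill document from the slide and checks it against a hypothetical hybrid rule: a few fields are mandatory and typed, everything else is accepted as-is. `CUSTOMER_SCHEMA` and `validate` are assumptions made for the example, not an OrientDB API.

```python
import json

raw = """
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
"""

# Hypothetical schema-hybrid rule: these fields are mandatory and typed,
# any other field (phone, details, ...) is accepted without declaration.
CUSTOMER_SCHEMA = {"name": str, "surname": str}

def validate(document, schema):
    """Return a list of schema violations; undeclared fields are allowed."""
    errors = []
    for field_name, expected_type in schema.items():
        if field_name not in document:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(document[field_name], expected_type):
            errors.append(f"wrong type for field: {field_name}")
    return errors

jill = json.loads(raw)
print(validate(jill, CUSTOMER_SCHEMA))   # declared fields are checked
print(jill["details"]["city"])           # nested documents are reached by key
```

With an empty schema the same function behaves as schema-less, and declaring every field makes it schema-full.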

Polymorphic queries

SELECT * FROM Customer
  Jill (Customer)

SELECT * FROM Provider
  Luca (Provider)
  Bruno (Provider)

SELECT * FROM Actor
  Luca (Provider)
  Jill (Customer)
  Bruno (Provider)
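The behaviour of polymorphic queries maps closely onto class inheritance in an object-oriented language. The Python sketch below is an analogy only: `select_from` is a made-up helper that mimics `SELECT * FROM <class>` by also returning instances of subclasses, assuming Customer and Provider both inherit from Actor as in the domain schema.

```python
class Actor:
    def __init__(self, name):
        self.name = name

class Customer(Actor):
    pass

class Provider(Actor):
    pass

# The three records from the slide.
records = [Provider("Luca"), Customer("Jill"), Provider("Bruno")]

def select_from(cls):
    """Rough analogue of SELECT * FROM <class>: subclass instances are included."""
    return [record.name for record in records if isinstance(record, cls)]

print(select_from(Customer))
print(select_from(Provider))
print(select_from(Actor))
```

Querying the superclass returns every Customer and Provider without a UNION or a type column.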

Multi-Model complex domains schema

[Schema diagram. Vertices: Band, Genre, Account, MusicTaste, Location. Edges: Likes, Performs, Plays. Legend: V = Vertex; arrows mark Edge and Inherits]

Multi-Model complex domains

[Diagram. Vertices: Snow Patrol (Band), Indie (Genre), Rock (Genre), Luca (Account), Jill (Account), 123, 1st Street Austin, TX (Location). Edges: Performs (April 7, 2015, 9pm-11.30pm), Plays, and four Likes edges]

Multi-Model Database Quadrant

[Chart. Y axis: Relationships Complexity >; X axis: Data Complexity >; plotted categories: Relational, Key Value, Column, Graph, Document, Multi-Model]

Multi-Model Solutions

There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine.

The “Graph” is only a layer on top of the engine.

Under the hood they do JOINs, which means traversal time is affected by database size.

Meet OrientDB

The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs

With a true Graph, Document, Key/Value and Object Oriented engine

OrientDB features

FEATURES compared across: ORIENTDB, MONGODB, NEO4J, MYSQL (RDBMS)

[Table: the per-product columns were flattened; each row keeps its X marks as extracted]

Operational Database: X X X
Graph Database: X X
Document Database: X X
Object-Oriented Concepts: X
Schema-full, Schema-less, Schema mix: X
User and Role & Record Level Security: X
Record Level Locking: X X X
SQL: X X
ACID Transaction: X X X
Relationships (Linked Documents): X X X
Custom Data Types: X X X
Embedded Documents: X X
Multi-Master Zero Configuration Replication: X
Sharding: X X
Server Side Functions: X X X
Native HTTP Rest/JSON: X X
Embeddable with No Restrictions: X

DEMO

API & Standards

• Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API
• SQL + extensions for graphs
• JDBC driver to connect any BI tool
• HTTP/JSON support
• Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more

Availability and Integrity

• Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions

[Diagram: Multi-master Replication. Two Master Nodes, each connected to several nodes labelled C]

Scalability and Performance

• Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops

• +200k Tps on Commodity Hardware

[Diagram: two Master Nodes, each connected to several nodes labelled C, plus an Auto-Discovered Node]

Some numbers

50,000 Downloads per Month from 200+ countries.

70+ Committers contributing to the product

1000s Users from SMBs to Fortune 10 Companies.

17+ Years of Research have been put in the product

A Bright Future

Graph DBMS increased their popularity by 500% within the last 2 years

Document DBMS are the 3rd fastest growing category

Some of Our Customers

Get Started for Free

OrientDB Community Edition is FREE for any purpose (Apache 2 license)

Udemy Getting Started Training is ★★★★★ and Free
http://www.orientechnologies.com/getting-started

OrientDB Enterprise is Free for Development

Thank you.

Ask your questions on Twitter for the Big Data Panel using #QCONBIGDATA

Luca [email protected]