raph databases with neo4j – emil eifrem

44
1 Neo4j is Teh Awesome Graph Database 101 1 @emileifrem #neo4j

Upload: buildacloud

Post on 27-Jan-2015

122 views

Category:

Technology


1 download

DESCRIPTION

Graph Databases with Neo4j

TRANSCRIPT

Page 1: raph Databases with Neo4j – Emil Eifrem

1

Neo4j is Teh Awesome

Graph Database 101

1

@emileifrem#neo4j

Page 2: raph Databases with Neo4j – Emil Eifrem

2

“So what’s a graph database?”

2

Page 3: raph Databases with Neo4j – Emil Eifrem

33

“A traditional relational database may tell you the

average age of everyone in this room...”

Page 4: raph Databases with Neo4j – Emil Eifrem

44

“... but a graph database will tell you who is most

likely tobuy you a beer!”

Page 5: raph Databases with Neo4j – Emil Eifrem

5

No. Srsly.

๏ Nodes

๏ Relationships

๏ Properties

5

Page 6: raph Databases with Neo4j – Emil Eifrem

6

How fast is it?

6

Page 7: raph Databases with Neo4j – Emil Eifrem

7

How fast is it?๏ a sample social graph

•with ~1,000 persons

๏ average 50 friends per person

๏ pathExists(a,b) limited to depth 4

๏ caches warmed up to eliminate disk I/O

# persons query time

Relational database 1,000 2000ms

Neo4j 1,000 2ms

Neo4j 1,000,000

Page 8: raph Databases with Neo4j – Emil Eifrem

8

How fast is it?๏ a sample social graph

•with ~1,000 persons

๏ average 50 friends per person

๏ pathExists(a,b) limited to depth 4

๏ caches warmed up to eliminate disk I/O

# persons query time

Relational database 1,000 2000ms

Neo4j 1,000 2ms

Neo4j 1,000,000 2ms

Page 9: raph Databases with Neo4j – Emil Eifrem

9

So how do you query it?

9

Page 10: raph Databases with Neo4j – Emil Eifrem

1010

Cypher

(A) -[:LOVES]-> (B)

LOVESAA BB

Graph Patterns

START A=node:person(name=“A”)MATC

HRETURN B as lover

ASCII art

Page 11: raph Databases with Neo4j – Emil Eifrem

11

// step 1: find starting pointSTART andreas=node:persons(name = ‘Andreas’)// step 2: describe pattern and resultsSTART andreas=node:persons(name = ‘Andreas’)MATCH (andreas)-->()-->(foaf) RETURN foaf

Example: Finding Friends of Friends

11

(andreas)

Page 12: raph Databases with Neo4j – Emil Eifrem

12

Neo4j is a Graph Database๏ A Graph Database:

•a Property Graph with Nodes, Relationships

•and Properties on both

•perfect for complex, highly connected data

๏ A Graph Database:

•reliable with real ACID Transactions

•scalable: high availability clustering in Neo4j Enterprise

•server with HTTP API, or embeddable on the JVM

•high-performance with High-Availability (read scaling)

12

Page 13: raph Databases with Neo4j – Emil Eifrem

13

Who’s using graphs today?

13

Accenture

Page 14: raph Databases with Neo4j – Emil Eifrem

14

So how do you get your hands on Neo4j?

14

๏ Option A: Download and install locally...

•go to http://neo4j.org

•click the shiny “Download Neo4j Now” button

•expand the archive, read the readmes

๏ Or B...

Page 15: raph Databases with Neo4j – Emil Eifrem

15

Graphs In The Cloud (BETA)

// new to heroku? get help$ heroku help

// create a new application$ heroku create intro-to-neo4j

// add Neo4jheroku addons:add neo4j --app intro-to-neo4j

// find out about the applicationheroku info --app intro-to-neo4j

// find the Neo4j Webadminheroku config --app intro-to-neo4j

// done trying it out? remove the applicationheroku destroy --app intro-to-neo4j

15

Page 16: raph Databases with Neo4j – Emil Eifrem

16

I needs thy help๏ Cloud Devops Engineer (San Mateo, London or

Malmö, SE)

•Come help us build a world class graph database cloud platform!

๏ Director of Community North America (San Mateo)

•Head up our developer outreach and evangelism in NA

๏ Developer Evangelist (San Mateo)

•Preach graphs to silicon valley and the world 16

Page 17: raph Databases with Neo4j – Emil Eifrem

1717

thank you!

stay connected

Page 18: raph Databases with Neo4j – Emil Eifrem

Core Industries Core Industries & Use Cases:& Use Cases: Web / ISVWeb / ISV Finance & Finance &

InsuranceInsuranceDatacom / Datacom / TelecomTelecom

Network /Cloud Network /Cloud MgmtMgmt

MDMMDM

SocialSocial

GeoGeo

Early Adopter Graph Database Segments

Page 19: raph Databases with Neo4j – Emil Eifrem

Core Industries Core Industries & Use Cases:& Use Cases: Web / ISVWeb / ISV Finance & Finance &

InsuranceInsuranceDatacom / Datacom / TelecomTelecom

Network /Cloud Network /Cloud MgmtMgmt

MDMMDM

SocialSocial

GeoGeo

Core Core Industries Industries

& Use Cases:& Use Cases:Web / ISVWeb / ISV

Finance Finance & &

InsurancInsurancee

Datacom Datacom / Telecom/ Telecom LogisticsLogistics Life Life

SciencesSciences

Media & Media & PublishinPublishin

gg

Education, Education, Not-for-Not-for-ProfitProfit

Government, Government, Aerospace, Aerospace, Gaming, ...Gaming, ...

Network Network /Cloud Mgmt/Cloud Mgmt

MDMMDM

SocialSocial

GeoGeo

Resource Auth Resource Auth & Access & Access ControlControl

Content Content ManagementManagement

Recommend-Recommend-ationsations

Data Center Data Center ManagementManagement

Fraud Fraud Detection, ...Detection, ...

Early Adopter Graph Database Segments

Early Adopters Going Mainstream

Page 20: raph Databases with Neo4j – Emil Eifrem

21

TelenorResource authorization & Access Control in Telecommunications

21

Background•10th largest Telco provider in the world, leading in the Nordics

•Online self-serve system where large business customers manage employee subscriptions and plans

•24/7 availability critical to customer satisfaction

Business problem

Solution & Benefits

Page 21: raph Databases with Neo4j – Emil Eifrem

22

TelenorResource authorization & Access Control in Telecommunications

22

Page 22: raph Databases with Neo4j – Emil Eifrem

23

TelenorResource authorization & Access Control in Telecommunications

23

Background•10th largest Telco provider in the world, leading in the Nordics

•Online self-serve system where large business customers manage employee subscriptions and plans

•24/7 availability critical to customer satisfaction

Business problem

Solution & Benefits•Resource authorization and access control across millions of plans, customers, administrators and groups, all interconnected, becomes a challenge

•Used Sybase RDBMS for pre-computing access rights daily

•Pre-computation time projected to reach 9 hours in 2014

•Users cannot log in until their rights are computed

•Resource graph easily modeled and queried in Neo4j

•1500 lines of stored procedures => 10s of lines of Neo4j code

•All requests computed in real time in milliseconds

•Changes to customer resources reflected immediately

•Customer retention risks mitigated

Page 23: raph Databases with Neo4j – Emil Eifrem

24

SFRNetwork Management in Telecommunications

24

Background•Second largest Telco in France

•Part of Vivendi Group, partnering with Vodaphone

Business problem

Solution & Benefits

RouterRouter

Service

Service

DEPENDS_ON

SwitchSwitch SwitchSwitch

RouterRouter

Fiber Link

Fiber Link Fiber

LinkFiber Link

Fiber Link

Fiber Link

Oceanfloor

Cable

Oceanfloor

Cable

DEP

END

S_O

N

DEPEN

DS_O

N

DEPENDS_ON

DEPE

ND

S_O

N

DEPENDS_ON

DEPENDS_ON

DEPENDS_ONDEPENDS_ON

DEPEN

DS

_ON

LINKED

LINKED

LINKE

D

DEPENDS_ON

Page 24: raph Databases with Neo4j – Emil Eifrem

25

SFRNetwork Management in Telecommunications

25

RouterRouter

ServiceService

DEPENDS_ON

SwitchSwitch SwitchSwitch

RouterRouter

Fiber LinkFiber LinkFiber LinkFiber Link

Fiber LinkFiber Link

Oceanfloor Cable

Oceanfloor Cable

DEP

END

S_O

N

DEPEN

DS_O

N

DEPENDS_ON

DEPE

ND

S_O

N

DEPENDS_ON

DEPENDS_ON

DEPENDS_ON

DEPENDS_ON

DEPEN

DS

_ON

LINKED

LINKED

LINKE

D

DEPENDS_ON

Page 25: raph Databases with Neo4j – Emil Eifrem

26

SFRNetwork Management in Telecommunications

26

Background•Second largest Telco in France

•Part of Vivendi Group, partnering with Vodaphone

Business problem

Solution & Benefits•Need for flexible network inventory management, aggregation, and troubleshooting

•Impact analysis of planned and unplanned network outages, so that affected services can be notified or receive increased redundancy

•Highly volatile network structure changing daily, with business requirements changing as well

•Neo4j Enterprise with a highly available cluster

•Dynamic system allowing for new applications to tie into network structure data

•Near 1:1 mapping of real world to graph, greatly reducing modeling work

•High adaptability to changing business requirements

RouterRouter

Service

Service

DEPENDS_ON

SwitchSwitch SwitchSwitch

RouterRouter

Fiber Link

Fiber Link Fiber

LinkFiber Link

Fiber Link

Fiber Link

Oceanfloor

Cable

Oceanfloor

Cable

DEP

END

S_O

N

DEPEN

DS_O

N

DEPENDS_ON

DEPE

ND

S_O

N

DEPENDS_ON

DEPENDS_ON

DEPENDS_ONDEPENDS_ON

DEPEN

DS

_ON

LINKED

LINKED

LINKE

D

DEPENDS_ON

Page 26: raph Databases with Neo4j – Emil Eifrem

27

AdobeContent Management, Access Control & Collaboration

27

Background•Creative Cloud, announced 2011, is a

cloud-based offering for professional users of Adobe’s creative suiteCollaborative Cloud is the social element of the Creative Cloud, connecting professional users around the world

Business problem

Solution & Benefits

Page 27: raph Databases with Neo4j – Emil Eifrem

2828

Domain GraphDeployment Architecture

AdobeContent Management, Access Control & Collaboration

Page 28: raph Databases with Neo4j – Emil Eifrem

29

AdobeContent Management, Access Control & Collaboration

29

Background•Creative Cloud, announced 2011, is a

cloud-based offering for professional users of Adobe’s creative suiteCollaborative Cloud is the social element of the Creative Cloud, connecting professional users around the world

Business problem

Solution & Benefits•Identifies which collections a user has

access toFinds third-party assets that are like a user’s assetsInfers professional relations based on user-generated content

•Fit:

•Graph model is a natural fit for social network

•Collaborative user experience adds competitive advantage to Adobe offering

•Flexibility: Data model can be easily evolved to support permissions and more sophisticated recommendation strategies

•Performance: Sub-second results for large, densely-connected datasets

Page 29: raph Databases with Neo4j – Emil Eifrem

30

ViadeoRecommendations in Social

๏ Customer: a professional social network

• 35 millions users, adding 30,000+ each day

๏ Goal: up-to-date recommendations

• Scalable solution with real-time end-user experience

• Low maintenance and reliable architecture

• 8-week implementation

30

๏ Problem:

• Real-time recommendation imperative to attract new users and maintain positive user retentionClustered MySQL solution not scalable or fast enough to support real-time requirementsUpgrade from running a batch job

๏ initial hour-long batch job

• but then success happened, and it became a day

• then two days

• With Neo4j, real time recommendations

Page 30: raph Databases with Neo4j – Emil Eifrem

32

First off: the name

๏WE ALL HATES IT, M’KAY?

32

Page 31: raph Databases with Neo4j – Emil Eifrem

NOSQL is NOT...

๏ NO to SQL

๏ NEVER SQL

Page 32: raph Databases with Neo4j – Emil Eifrem

Not Only SQL

NOSQL is simply

Page 33: raph Databases with Neo4j – Emil Eifrem

But why now?

Page 34: raph Databases with Neo4j – Emil Eifrem

36

Trends in BigData & NOSQL

36

๏ 1. increasing data size (big data)

•“Every 2 days we create as much information as we did up to 2003” - Eric Schmidt

๏ 2. increasingly connected data (graph data)

•for example, text documents to html

๏ 3. semi-structured data

•individualization of data, with common sub-set

๏ 4. architecture - a facade over multiple services

•from monolithic to modular, distributed applications

Page 35: raph Databases with Neo4j – Emil Eifrem

37

4 Categories of NOSQL

37

Page 36: raph Databases with Neo4j – Emil Eifrem

38

Key-Value Category๏ “Dynamo: Amazon’s Highly Available Key-Value Store”

(2007)

๏ Data model:

•Global key-value mapping

•Big scalable HashMap

•Highly fault tolerant (typically)

๏ Examples:

•Riak, Redis, Voldemort

38

Page 37: raph Databases with Neo4j – Emil Eifrem

39

Key-Value: Pros & Cons๏ Strengths

•Simple data model

•Great at scaling out horizontally

•Scalable

•Available

๏Weaknesses:

•Simplistic data model

•Poor for complex data

39

Page 38: raph Databases with Neo4j – Emil Eifrem

40

Column-Family Category๏ Google’s “Bigtable: A Distributed Storage System for

Structured Data” (2006)

•Column-Family are essentially Big Table clones

๏ Data model:

•A big table, with column families

•Map-reduce for querying/processing

๏ Examples:

•HBase, HyperTable, Cassandra

40

Page 39: raph Databases with Neo4j – Emil Eifrem

41

Column-Family: Pros & Cons๏ Strengths

•Data model supports semi-structured data

•Naturally indexed (columns)

•Good at scaling out horizontally

๏Weaknesses:

•Unsuited for connected data

41

Page 40: raph Databases with Neo4j – Emil Eifrem

42

Document Database Category๏ Data model

•Collections of documents

•A document is a key-value collection

•Index-centric, lots of map-reduce

๏ Examples

•CouchDB, MongoDB

42

Page 41: raph Databases with Neo4j – Emil Eifrem

43

Document Database: Pros & Cons๏ Strengths

•Simple, powerful data model

•Good scaling (especially if sharding supported)

๏Weaknesses:

•Unsuited for connected data

•Query model limited to keys (and indexes)

43

Page 42: raph Databases with Neo4j – Emil Eifrem

44

Graph Database Category๏ Data model:

•Nodes & Relationships

•Hypergraph, sometimes (edges with multiple endpoints)

๏ Examples:

•Neo4j (of course), OrientDB, InfiniteGraph, AllegroGraph

44

Page 43: raph Databases with Neo4j – Emil Eifrem

45

Graph Database: Pros & Cons๏ Strengths

•Powerful data model, as general as RDBMS

•Fast, for connected data

•Easy to query

๏Weaknesses:

•Requires conceptual shift

‣though graph-like thinking becomes addictive

45

Page 44: raph Databases with Neo4j – Emil Eifrem

46

Scaling to Size

Scaling to Complexity

Key/Value stores

ColumnFamily stores

Document databases

Graph databases

My subjective view: > 90% of use cases

100+ billion of nodesand relationships

The NOSQL Space