data modeling with neo4j

187
Data Modeling with Neo4j 1 Michael Hunger, Neo Technology @neo4j | [email protected] Thanks to: Ian Robinson, Mark Needham, Alistair Jones Samstag, 31. August 13

Upload: neo4j-the-fastest-and-most-scalable-native-graph-database

Post on 16-Apr-2017

33.169 views

Category:

Technology


0 download

TRANSCRIPT

Data Modeling with Neo4j

1

Michael Hunger, Neo Technology@neo4j | [email protected]

Thanks to: Ian Robinson, Mark Needham, Alistair Jones

Samstag, 31. August 13

Please ask questionsin the chat

I‘ll answer at the end.

Follow up email with missing answers, video and slides.

2

Samstag, 31. August 13

(Michael) -[:WORKS_ON]-> (Neo4j)

ME

Spring Cloud

Community

Cypher

console

community graph

Server

3

Samstag, 31. August 13

This Webinar

๏Graphs are everywhere

๏Graph Model Building Blocks

๏(NOSQL) Data Models

๏Designing a Data Model

๏Embrace the Paradigm

4

Samstag, 31. August 13

Addressing Data Complexity With Graphs

Samstag, 31. August 13

complexity = f(size, semi-structure, connectedness)

Data Complexity

Samstag, 31. August 13

Are Graphs Everywhere ?

Samstag, 31. August 13

Social Network

Samstag, 31. August 13

(Network) Impact Analysis

Samstag, 31. August 13

Route Finding

Samstag, 31. August 13

Recommendations

Samstag, 31. August 13

Logistics

Samstag, 31. August 13

Access Control

Samstag, 31. August 13

Fraud Analysis

Samstag, 31. August 13

Securities and Debt

Image: orgnet.com

Samstag, 31. August 13

Graphs Are Everywhere !!

Samstag, 31. August 13

Graph Model Building Blocks

Samstag, 31. August 13

Property Graph Data Model

Samstag, 31. August 13

Four Building Blocks

1.Nodes2.Relationships3.Properties4.Labels

Samstag, 31. August 13

Nodes

Samstag, 31. August 13

Nodes๏ Used to represent entities in your domain๏ Can contain properties

• Used to represent entity attributes and/or metadata (e.g. timestamps, version)

• Key-value pairs‣Java primitives‣Arrays‣null is not a valid value

• Every node can have different properties

Samstag, 31. August 13

Relationships

Samstag, 31. August 13

Relationships๏ Every relationship has a name and a direction

• Add structure to the graph• Provide semantic context for nodes

๏ Can contain properties• Used to represent quality or weight of relationship,

or metadata๏ Every relationship must have a start node and end node

• No dangling relationships

Samstag, 31. August 13

Relationships (continued)

Nodes can have more than one relationship

Self relationships are allowed

Nodes can be connected by more than one relationship

Samstag, 31. August 13

Variable Structure๏ Relationships are defined with regard to node

instances, not classes of nodes• Different nodes can be connected in different ways• Allows for structural variation in the domain• Contrast with relational schemas, where foreign key

relationships apply to all rows in a table

Samstag, 31. August 13

Labels

Samstag, 31. August 13

Labels๏ Every node can have zero or more labels attached๏ Used to represent roles (e.g. user, product, company)

• Group nodes• Allow us to associate indexes and constraints with

groups of nodes

Samstag, 31. August 13

Four Building Blocks๏ Nodes

• Entities๏ Relationships

• Connect entities and structure domain๏ Properties

• Attributes and metadata๏ Labels

• Group nodes by role

Samstag, 31. August 13

Aggregate vs. Connected Data-Model

24

Samstag, 31. August 13

NOSQL

RelationalGraph

Document

KeyValue

Riak

Column oriented

25

Redis

Cassandra

Mongo

Couch

Neo4j

MySQL Postgres

NOSQL Databases

Samstag, 31. August 13

26

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences.

This is why aggregate-oriented stores talk so much about map-reduce.”

Martin Fowler

Aggregate Oriented Model

Samstag, 31. August 13

27

The connected data model is based on fine grained elements that are richly connected, the emphasis is on extracting many

dimensions and attributes as elements. Connections are cheap and can be used not only for the

domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine

grained model requires a external scope for mutating operations that ensures Atomicity, Consistency, Isolation and

Durability - ACID also known as Transactions.

Michael Hunger

Connected Data Model

Samstag, 31. August 13

28

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

foo

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

foo bar

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

foo barfoo_bar

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

foo barfoo_bar

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

foo barfoo_bar

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

foo barfoo_bar

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

... now consider a graph

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

... now consider a graph

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

... now consider a graph

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

... now consider a graph

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

... now consider a graph

Relational vs. Graph

Samstag, 31. August 13

You know relational

28

... now consider a graph

Relational vs. Graph

Samstag, 31. August 13

28

Relational vs. Graph

Samstag, 31. August 13

Designing a Graph Modelan Example

Samstag, 31. August 13

Models

Images: en.wikipedia.org

Purposeful abstraction of a domain designed to satisfy particular application/end-user goals

Samstag, 31. August 13

Design for Queryability

Model

Samstag, 31. August 13

Query

Design for Queryability

Samstag, 31. August 13

Design for Queryability

Model

Samstag, 31. August 13

Method1. Identify application/end-user goals2. Figure out what questions to ask of the domain3. Identify entities in each question4. Identify relationships between entities in each

question5. Convert entities and relationships to paths

These become the basis of the data model6. Express questions as graph patterns

These become the basis for queries

Samstag, 31. August 13

From User Story to Model and Query

1. User story

4. Paths

3. Entities and relationships

?2. Questions we want

to ask5.

Data model

6. Query

Samstag, 31. August 13

1. Application/End-User Goals

As an employeeI want to know who in the company has similar skills to meSo that we can exchange knowledge

Samstag, 31. August 13

2. Questions To Ask of the Domain

Which people, who work for the same company as me, have similar skills to me?

As an employeeI want to know who in the company has similar skills to me

So that we can exchange knowledge

Samstag, 31. August 13

Which people, who work for the same company as me, have similar skills to me?

PersonCompanySkill

3. Identify Entities

Samstag, 31. August 13

Which people, who work for the same company as me, have similar skills to me?

Person WORKS_FOR CompanyPerson HAS_SKILL Skill

4. Identify Relationships Between Entities

Samstag, 31. August 13

5. Convert to Cypher Paths

Person WORKS_FOR CompanyPerson HAS_SKILL Skill

Samstag, 31. August 13

5. Convert to Cypher Paths

Person WORKS_FOR CompanyPerson HAS_SKILL Skill

Relationship

Label

Samstag, 31. August 13

5. Convert to Cypher Paths

Person WORKS_FOR CompanyPerson HAS_SKILL Skill

Relationship

Label

(:Person)-[:WORKS_FOR]->(:Company),(:Person)-[:HAS_SKILL]->(:Skill)

Samstag, 31. August 13

Consolidate Paths

(:Person)-[:WORKS_FOR]->(:Company),(:Person)-[:HAS_SKILL]->(:Skill)

Samstag, 31. August 13

Consolidate Paths

(:Person)-[:WORKS_FOR]->(:Company),(:Person)-[:HAS_SKILL]->(:Skill)

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

Samstag, 31. August 13

Consolidate Paths

(:Person)-[:WORKS_FOR]->(:Company),(:Person)-[:HAS_SKILL]->(:Skill)

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

Samstag, 31. August 13

Candidate Data Model

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

Samstag, 31. August 13

6. Express Question as Graph PatternWhich people, who work for the same company as me, have similar skills to me?

Samstag, 31. August 13

Cypher Query

Which people, who work for the same company as me, have similar skills to me?

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill), (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)WHERE me.name = {name}RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skillsORDER BY score DESC

Samstag, 31. August 13

Which people, who work for the same company as me, have similar skills to me?

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill), (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)WHERE me.name = {name}RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skillsORDER BY score DESC

Graph Pattern

Samstag, 31. August 13

Which people, who work for the same company as me, have similar skills to me?

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill), (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)WHERE me.name = {name}RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skillsORDER BY score DESC

Anchor Pattern in Graph

Samstag, 31. August 13

Which people, who work for the same company as me, have similar skills to me?

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill), (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)WHERE me.name = {name}RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skillsORDER BY score DESC

Create Projection of Results

Samstag, 31. August 13

First Match

Samstag, 31. August 13

Second Match

Samstag, 31. August 13

Third Match

Samstag, 31. August 13

Running the Query+-----------------------------------+| name | score | skills |+-----------------------------------+| "Lucy" | 2 | ["Java","Neo4j"] || "Bill" | 1 | ["Neo4j"] |+-----------------------------------+2 rows

Samstag, 31. August 13

From User Story to Model and Query

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill), (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)WHERE me.name = {name}RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skillsORDER BY score DESC

As an employeeI want to know who in the company has similar skills to me

So that we can exchange knowledge

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

Person WORKS_FOR CompanyPerson HAS_SKILL Skill

?Which people, who work for the same company as me, have similar skills to me?

Samstag, 31. August 13

Embrace the paradigm

Samstag, 31. August 13

Use the building blocks๏ Nodes

๏ Relationships

๏ Properties name: value

RELATIONSHIP_NAME

Samstag, 31. August 13

Anti-pattern: rich properties

name: “Canada”languages_spoken: “[ ‘English’, ‘French’ ]”

Samstag, 31. August 13

Normalize Nodes

Samstag, 31. August 13

Anti-Pattern: Node represents multiple concepts

nameagepositioncompanydepartmentprojectskills

Person

Samstag, 31. August 13

HAS_SKILL

Normalize into separate concepts

nameage

Person

namenumber_of_employees

Company

WORKS_FOR

Skillname

Samstag, 31. August 13

Challenge: Property or Relationship?

๏ Can every property be replaced by a relationship?• Hint: triple stores. Are they easy to use?

๏ Should every entity with the same property values be connected?

Samstag, 31. August 13

Object Mapping๏ Similar to how you would map objects to a relational

database, using an ORM such as Hibernate๏ Generally simpler and easier to reason about๏ Examples

• Java: Spring Data Neo4j• Ruby: Active Model

๏ Why Map?• Do you use mapping because you are scared of SQL?• Following DDD, could you write your repositories

directly against the graph API?

Samstag, 31. August 13

CONNECT for fast accessIn-Graph Indices

Samstag, 31. August 13

Relationships for querying๏ like in other databases

• same structure for different use-cases (OLTP and OLAP) doesn‘t work

• graph allows: add more structures๏ Relationships should the primary means to access

nodes in the database๏ Traversing relationships is cheap – that’s the whole

design goal of a graph database๏ Use lookups only to find starting nodes for a query

Data Modeling examples in Manual

Samstag, 31. August 13

Anti-pattern: unconnected graph

name: “Jones” name: “Jones”

name: “Jones”

name: “Jones”name: “Jones”

name: “Jones”

name: “Jones” name: “Jones”

name: “Jones”

name: “Jones”

name: “Jones”

Samstag, 31. August 13

Pattern: Linked List

61

Samstag, 31. August 13

Pattern: Multiple Relationships

62

Samstag, 31. August 13

Pattern-Trees: Tags and Categories

63

Samstag, 31. August 13

Pattern-Tree: Multi-Level-Tree

64

Samstag, 31. August 13

Pattern-Trees: R-Tree (spatial)

65

Samstag, 31. August 13

Example: Activity Stream

66

Samstag, 31. August 13

Graph Evolution

67

Samstag, 31. August 13

Combine multiple Domains in a Graph๏ you start with a single domain๏ add more connected domains as your system evolves๏ more domains allow to ask different queries๏ one domain „indexes“ the other๏ Example Facebook Graph Search

• social graph• location graph• activity graph• favorite graph• ...

Samstag, 31. August 13

Notes on the Graph Data Model๏Schema free, but constraints

๏Model your graph with a whiteboard and a wise man

๏Nodes as main entities but useless without connections

๏Relationships are first level citizens in the model and database

๏Normalize more than in a relational database

๏use meaningful relationship-types, not generic ones like IS_

๏use in-graph structures to allow different access paths

๏evolve your graph to your needs, incremental growth

70

Samstag, 31. August 13

71

Samstag, 31. August 13

๏ Worldwide one-day Neo4j Tutorials

๏ Books

• Graph Databases, Neo4j in Action

๏ neo4j.org

• http://neo4j.org/develop/modeling

๏ docs.neo4j.org

• Data Modeling Examples

๏ http://console.neo4j.org

๏ http://gist.neo4j.org

๏ Get Neo4j

• http://neo4j.org/download

๏ Participate

• http://groups.google.com/group/neo4j

How to get started?

72

Samstag, 31. August 13

Thank You!Questions ?

73

Samstag, 31. August 13

An Example

74

Samstag, 31. August 13

What language do they speak here?

Language Country

Samstag, 31. August 13

What language do they speak here?

Language Country

Samstag, 31. August 13

What language do they speak here?

Language Country

Samstag, 31. August 13

Tables

language_codelanguage_nameword_count

Languagecountry_codecountry_nameflag_uri

Country

Samstag, 31. August 13

Need to model the relationship

language_codelanguage_nameword_count

Languagecountry_codecountry_nameflag_urilanguage_code

Country

Samstag, 31. August 13

What if the cardinality changes?

language_codelanguage_nameword_countcountry_code

Languagecountry_codecountry_nameflag_uri

Country

Samstag, 31. August 13

Or we go many-to-many?

language_codelanguage_nameword_count

Languagecountry_codecountry_nameflag_uri

Countrylanguage_codecountry_code

LanguageCountry

Samstag, 31. August 13

Or we want to qualify the relationship?

language_codelanguage_nameword_count

Languagecountry_codecountry_nameflag_uri

Countrylanguage_codecountry_codeprimary

LanguageCountry

Samstag, 31. August 13

Start talking about Graphs

Samstag, 31. August 13

Explicit Relationship

nameword_count

Languagenameflag_uri

Country

IS_SPOKEN_IN

Samstag, 31. August 13

Relationship Properties

nameword_count

Languagenameflag_uri

Country

IS_SPOKEN_INas_primary

Samstag, 31. August 13

What’s different?

language_codelanguage_nameword_count

Languagecountry_codecountry_nameflag_uri

Countrylanguage_codecountry_codeprimary

LanguageCountryIS_SPOKEN_IN

Samstag, 31. August 13

What’s different?๏ Implementation of maintaining relationships is left up

to the database๏ Artificial keys disappear or are unnecessary๏ Relationships get an explicit name

• can be navigated in both directions

Samstag, 31. August 13

Relationship specialisation

nameword_count

Languagenameflag_uri

Country

IS_SPOKEN_INas_primary

Samstag, 31. August 13

Bidirectional relationships

nameword_count

Languagenameflag_uri

Country

IS_SPOKEN_IN

PRIMARY_LANGUAGE

Samstag, 31. August 13

Weighted relationships

nameword_count

Languagenameflag_uri

Country

POPULATION_SPEAKSpopulation_fraction

Samstag, 31. August 13

Keep on adding relationships

nameword_count

Languagenameflag_uri

Country

POPULATION_SPEAKSpopulation_fraction

SIMILAR_TO ADJACENT_TO

Samstag, 31. August 13

Realworld Examples

92

Samstag, 31. August 13

93

Samstag, 31. August 13

93

Real World Use Cases:

Samstag, 31. August 13

93

Real World Use Cases:

•[A] ACL from Hell

Samstag, 31. August 13

93

Real World Use Cases:

•[A] ACL from Hell

•[B] Timely recommendations

Samstag, 31. August 13

93

Real World Use Cases:

•[A] ACL from Hell

•[B] Timely recommendations

•[C] Global collaboration

Samstag, 31. August 13

94

Samstag, 31. August 13

94

Real World Use Cases:

Samstag, 31. August 13

94

Real World Use Cases:

•[A] ACL from Hell

Samstag, 31. August 13

94

Real World Use Cases:

•[A] ACL from Hell

•[B] Timely recommendations

Samstag, 31. August 13

94

Real World Use Cases:

•[A] ACL from Hell

•[B] Timely recommendations

•[C] Global collaboration

Samstag, 31. August 13

[A] ACL from Hell

95

Samstag, 31. August 13

[A] ACL from Hell๏ Customer:

• leading consumer utility company with tons and tons of users

๏ Goal:

• comprehensive access control administration for customers

๏ Benefits:

• Flexible and dynamic architecture

• Exceptional performance

• Extensible data model supports new applications and features

• Low cost

95

Samstag, 31. August 13

[A] ACL from Hell๏ Customer:

• leading consumer utility company with tons and tons of users

๏ Goal:

• comprehensive access control administration for customers

๏ Benefits:

• Flexible and dynamic architecture

• Exceptional performance

• Extensible data model supports new applications and features

• Low cost

95

• A Reliable access control administration system for

5 million customers, subscriptions and agreements

• Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements

• Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements)

Samstag, 31. August 13

[A] ACL from Hell๏ Customer:

• leading consumer utility company with tons and tons of users

๏ Goal:

• comprehensive access control administration for customers

๏ Benefits:

• Flexible and dynamic architecture

• Exceptional performance

• Extensible data model supports new applications and features

• Low cost

95

• A Reliable access control administration system for

5 million customers, subscriptions and agreements

• Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements

• Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements)

name: Andreas

subscription: sports

service: NFL

account: 9758352794

agreement: ultimate

owns

subscribes to

has plan

includes

provides group: graphistas

promotion: fall

member of

offered

discounts

company: Neo Technologyworks with

gets discount on

subscription: local

subscribes to

provides service: Ravens

includes

Samstag, 31. August 13

[B] Timely Recommendations

96

Samstag, 31. August 13

[B] Timely Recommendations๏ Customer:

• a professional social network

• 35 millions users, adding 30,000+ each day

๏ Goal: up-to-date recommendations

• Scalable solution with real-time end-user experience

• Low maintenance and reliable architecture

• 8-week implementation

96

Samstag, 31. August 13

[B] Timely Recommendations๏ Customer:

• a professional social network

• 35 millions users, adding 30,000+ each day

๏ Goal: up-to-date recommendations

• Scalable solution with real-time end-user experience

• Low maintenance and reliable architecture

• 8-week implementation

96

๏ Problem:

• Real-time recommendation imperative to attract new users and maintain positive user retention

• Clustered MySQL solution not scalable or fast enough to support real-time requirements

๏ Upgrade from running a batch job

• initial hour-long batch job

• but then success happened, and it became a day

• then two days

๏ With Neo4j, real time recommendations

Samstag, 31. August 13

[B] Timely Recommendations๏ Customer:

• a professional social network

• 35 millions users, adding 30,000+ each day

๏ Goal: up-to-date recommendations

• Scalable solution with real-time end-user experience

• Low maintenance and reliable architecture

• 8-week implementation

96

๏ Problem:

• Real-time recommendation imperative to attract new users and maintain positive user retention

• Clustered MySQL solution not scalable or fast enough to support real-time requirements

๏ Upgrade from running a batch job

• initial hour-long batch job

• but then success happened, and it became a day

• then two days

๏ With Neo4j, real time recommendations

name:Andreasjob: talking

name: Allisonjob: plumber

name: Tobiasjob: coding

knows

knows

name: Peterjob: building

name: Emiljob: plumber

knows

name: Stephenjob: DJ

knows

knows

name: Deliajob: barking

knows

knows

name: Tiberiusjob: dancer

knows

knows

knows

knows

Samstag, 31. August 13

[C] Collaboration on Global Scale

97

Samstag, 31. August 13

[C] Collaboration on Global Scale๏ Customer: a worldwide software leader

• highly collaborative end-users

๏ Goal: offer an online platform for global collaboration

• Highly flexible data analysis

• Sub-second results for large, densely-connected data

• User experience - competitive advantage

97

Samstag, 31. August 13

[C] Collaboration on Global Scale๏ Customer: a worldwide software leader

• highly collaborative end-users

๏ Goal: offer an online platform for global collaboration

• Highly flexible data analysis

• Sub-second results for large, densely-connected data

• User experience - competitive advantage

97

• Massive amounts of data tied to members, user groups, member content, etc. all interconnected

• Infer collaborative relationships through user-generated content

• Worldwide Availability

Samstag, 31. August 13

[C] Collaboration on Global Scale๏ Customer: a worldwide software leader

• highly collaborative end-users

๏ Goal: offer an online platform for global collaboration

• Highly flexible data analysis

• Sub-second results for large, densely-connected data

• User experience - competitive advantage

97

• Massive amounts of data tied to members, user groups, member content, etc. all interconnected

• Infer collaborative relationships through user-generated content

• Worldwide Availability

Asia North America Europe

Samstag, 31. August 13

[C] Collaboration on Global Scale๏ Customer: a worldwide software leader

• highly collaborative end-users

๏ Goal: offer an online platform for global collaboration

• Highly flexible data analysis

• Sub-second results for large, densely-connected data

• User experience - competitive advantage

97

• Massive amounts of data tied to members, user groups, member content, etc. all interconnected

• Infer collaborative relationships through user-generated content

• Worldwide Availability

Asia North America Europe

Asia North America Europe

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

San Jose, CA

Cisco.com

Industry: CommunicationsUse case: Recommendations

• Call center volumes needed to be lowered by improving the efficacy of online self service

• Leverage large amounts of knowledge stored in service cases, solutions, articles, forums, etc.

• Problem resolution times, as well as support costs, needed to be lowered

• Cisco.com serves customer and business customers with Support Services

• Needed real-time recommendations, to encourage use of online knowledge base

• Cisco had been successfully using Neo4j for its internal master data management solution.

• Identified a strong fit for online recommendations

• Cases, solutions, articles, etc. continuously scraped for cross-reference links, and represented in Neo4j

• Real-time reading recommendations via Neo4j

• Neo4j Enterprise with HA cluster

• The result: customers obtain help faster, with decreased reliance on customer support

Support

Support

Solution

Message

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

San Jose, CA

Cisco HMP

Industry: CommunicationsUse case: Master Data Management

• Sales compensation system had become unable to meet Cisco’s needs

• Existing Oracle RAC system had reached its limits:

• Insufficient flexibility for handling complex organizational hierarchies and mappings

• “Real-time” queries were taking > 1 minute!

• Business-critical “P1” system needs to be continually available, with zero downtime

• One of the world’s largest communications equipment manufacturers

• #91 Global 2000. $44B in annual sales.

• Needed a system that could accommodate its master data hierarchies in a performant way

• HMP is a Master Data Management system at whose heart is Neo4j. Data access services available 24x7 to applications companywide

• Cisco created a new system: the Hierarchy Management Platform (HMP)

• Allows Cisco to manage master data centrally, and centralize data access and business rules

• Neo4j provided “Minutes to Milliseconds” performance over Oracle RAC, serving master data in real time

• The graph database model provided exactly the flexibility needed to support Cisco’s business rules

• HMP so successful that it has expanded toinclude product hierarchy

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Industry: LogisticsUse case: Parcel Routing

• 24x7 availability, year round

• Peak loads of 2500+ parcels per second

• Complex and diverse software stack

• Need predictable performance & linear scalability

• Daily changes to logistics network: route from any point, to any point

• One of the world’s largest logistics carriers

• Projected to outgrow capacity of old system

• New parcel routing system

• Single source of truth for entire network

• B2C & B2B parcel tracking • Real-time routing: up to 5M parcels per day

• Neo4j provides the ideal domain fit:

• a logistics network is a graph

• Extreme availability & performance with Neo4j clustering

• Hugely simplified queries, vs. relational for complex routing

• Flexible data model can reflect real-world data variance much better than relational

• “Whiteboard friendly” model easy to understand

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Sausalito, CA

GlassDoor

Industry: Online Job SearchUse case: Social / Recommendations

• Wanted to leverage known fact that most jobs are found through personal & professional connections

• Needed to rely on an existing source of social network data. Facebook was the ideal choice.

• End users needed to get instant gratification• Aiming to have the best job search service, in a very

competitive market

• Online jobs and career community, providing anonymized inside information to job seekers

• First-to-market with a product that let users find jobs through their network of Facebook friends

• Job recommendations served real-time from Neo4j

• Individual Facebook graphs imported real-time into Neo4j

• Glassdoor now stores > 50% of the entire Facebook social graph

• Neo4j cluster has grown seamlessly, with new instances being brought online as graph size and load have increased

KNO

WS

KNOWS

KN

OW

S

WORKS_AT

WORKS_AT

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

San Jose, CA

Adobe

Industry: Web/ISVUse case: Content Management, Social, Access Control

• Adobe needed a highly robust and available, 24x7 distributed global system, supporting collaboration for users of its highest revenue product line

• Storing creative artifacts in the cloud meant managing access rights for (eventually) millions of users, groups, collections, and pieces of content

• Complex access control rules controlling who was connected to whom, and who could see or edit what, proved a significant technical challenge

• One of the ten largest software companies globally

• $4B+ in revenue. Over 11,000 employees.

• Launched Creative Cloud in 2012, allowing its Creative Suite users to collaborate via the Cloud

• Selected Neo4j to meet very aggressive project deadlines. The flexibility of the graph model, and performance, were the two major selection factors.

• Easily evolve the system to meet tomorrow’s needs

• Extremely high availability and transactional performance requirements. 24x7 with no downtime.

• Neo4j allows consistently fast response times with complex queries, even as the system grows

• First (and possibly still only) database cluster to run across three Amazon EC2 regions: U.S., Europe, Asia

User-Content-Access

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Paris, France

SFR

Industry: CommunicationsUse case: Network Management

• Infrastructure maintenance took one full week to plan, because of the need to model network impacts

• Needed rapid, automated “what if” analysis to ensure resilience during unplanned network outages

• Identify weaknesses in the network to uncover the need for additional redundancy

• Network information spread across > 30 systems, with daily changes to network infrastructure

• Business needs sometimes changed very rapidly

• Second largest communications company in France

• Part of Vivendi Group, partnering with Vodafone

• Flexible network inventory management system, to support modeling, aggregation & troubleshooting

• Single source of truth (Neo4j) representing the entire network

• Dynamic system loads data from 30+ systems, and allows new applications to access network data

• Modeling efforts greatly reduced because of the near 1:1 mapping between the real world and the graph

• Flexible schema highly adaptable to changing business requirements

Router

DEPEN

DS_ON

Switch Switch

Router

Fiber Fiber

Fiber

DEP

END

S_O

N

DEPEN

DS_O

N

DEPENDS_ON

DEPEN

DS_O

NDEPENDS_ON

DEPENDS_ON

DEPENDS_ON

DEPENDS_ON

DEP

END

S_O

N

LINKED

LINKED

LINKE

D

DEPENDS_ON

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Frankfurt, Germany

Deutsche Telecom

Industry: CommunicationsUse case: Social gaming

• The Fanorakel application allows fans to have an interactive experience while watching sports

• Fans can vote for referee decisions and interact with other fans watching the game

• Highly connected dataset with real-time updates• Queries need to be served real-time on rapidly changing

data

• One technical challenge is to handle the very high spikes of activity during popular games

• Europe’s largest communications company

• Provider of mobile & land telephone lines to consumers and businesses, as well as internet services, television, and other services

• Interactive, social offering gives fans a way to experience the game more closely

• Increased customer stickiness for Deutsche Telekom

• A completely new channel for reaching customers with information, promotions, and ads

• Clear competitive advantage

Interactive Television

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Global (U.S., France)

Hewlett Packard

Industry: Web/ISV, CommunicationsUse case: Network Management

• Use network topology information to identify root problems causes on the network

• Simplify alarm handling by human operators

• Automate handling of certain types of alarms Help operators respond rapidly to network issues

• Filter/group/eliminate redundant Network Management System alarms by event correlation

• World’s largest provider of IT infrastructure, software & services

• HP’s Unified Correlation Analyzer (UCA) application is a key application inside HP’s OSS Assurance portfolio

• Carrier-class resource & service management, problem determination, root cause & service impact analysis

• Helps communications operators manage large, complex and fast changing networks

• Accelerated product development time

• Extremely fast querying of network topology

• Graph representation a perfect domain fit

• 24x7 carrier-grade reliability with Neo4j HA clustering

• Met objective in under 6 months

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Oslo, Norway

Telenor

Industry: CommunicationsUse case: Resource Authorization & Access Control

• Degrading relational performance. User login taking minutes while system retrieved access rights

• Millions of plans, customers, admins, groups. Highly interconnected data set w/massive joins

• Nightly batch workaround solved the performance problem, but meant data was no longer current

• Primary system was Sybase. Batch pre-compute workaround projected to reach 9 hours by 2014: longer than the nightly batch window

• 10th largest Telco provider in the world, leading in the Nordics

• Online self-serve system where large business admins manage employee subscriptions and plans

• Mission-critical system whose availability and responsiveness is critical to customer satisfaction

• Moved authorization functionality from Sybase to Neo4j

• Modeling the resource graph in Neo4j was straightforward, as the domain is inherently a graph

• Able to retire the batch process, and move to real-time responses: measured in milliseconds

• Users able to see fresh data, not yesterday’s snapshot

• Customer retention risks fully mitigated

SUBSCRIBED_BY

CONTROLLED_B

PART_O

User

USER_ACCESS

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Silicon Valley & France

Viadeo

Industry: Professional Social NetworkUse case: Social, Recommendations

• Business imperative for real-time recommendations: to attract new users and retain existing ones

• Key differentiator: show members how they are connected to any other member

• Real-time traversals of social graph not feasible with MySQL cluster. Batch precompute meant stale data.

• Process taking longer & longer: > 1 week!

• World’s second-largest professional network(after LinkedIn)

• 50M members. 30K+ new members daily.

• Over 400 staff with offices in 12 countries

• Neo4j solution implemented in 8 weeks with 3 part-time programmers

• Able to move from batch to real-time: improved responsiveness with up-to-date data.

• Viadeo (at the time) had 8M members and 35M relationships. • Neo4j cluster now sits at the heart of Viadeo’s professional

network, connecting 50M+ professionals

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Hong Kong

Maaii

Industry: CommunicationsUse case: Social, Mobile

• Launched a new mobile communication app “Maaii” allowing consumers to communicate by voice & text(Similar to Line, Viber, Rebtel, VoxOx...)

• Needed to store & relate devices, users, and contacts

• Import phone numbers from users’ address books. Rapidly serve up contacts from central database to the mobile app

• Currently around 3M users w/200M nodes in the graph

• Hong Kong based telephony infrastructure provider(aka M800 aka Pop Media)

• Exclusive China Mobile partner for international toll-free services. SMS Hub & other offerings

• 2012 Red Herring Top 100 Global Winner

• Quick transactional performance for key operations:

• friend suggestions (“friend of friend”)

• updating contacts, blocking calls, etc.

• etc.

• High availability telephony app uses Neo4j clustering• Strong architecture fit: Scala w/Neo4j embedded

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Zürich, Switzerland

Junisphere

Industry: Web/ISV, CommunicationsUse case: Data Center Management

• “Business Service Management” requires mapping of complex graph, covering: business processes--> business services--> IT infrastructure

• Embed capability of storing and retrieving this information into OEM application

• Re-architecting outdated C++ application based on relational database, with Java

• Junisphere AG is a Zurich-based IT solutions provider

• Founded in 2001. Profitable. Self funded.

• Software & services.

• Novel approach to infrastructure monitoring: Starts with the end user, mapped to business processes and services, and dependent infrastructure

• Actively sought out a Java-based solution that could store data as a graph

• Domain model is reflected directly in the database:

• “No time lost in translation”

• “Our business and enterprise consultants now speak the same language, and can model the domain with the database on a 1:1 ratio.”

• Spring Data Neo4j strong fit for Java architecture

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

San Francisco, CA

Teachscape

Industry: EducationUse case: Resource Authorization & Access Control

• Neo4j was selected to be at the heart of a new architecture.

• The user management system, centered around Neo4j, will be used to support single sign-on, user management, contract management, and end-user access to their subscription entitlements.

• Teachscape, Inc. develops online learning tools for K-12 teachers, school principals, and other instructional leaders.

• Teachscape evaluated relational as an option, considering MySQL and Oracle.

• Neo4j was selected because the graph data model provides a more natural fit for managing organizational hierarchy and access to assets.

• Domain and technology fit

• simple domain model where the relationships are relatively complex. Secondary factors included support for transactions, strong Java support, and well-implemented Lucene indexing integration

• Speed and Flexibility

• The business depends on being able to do complex walks quickly and efficiently. This was a major factor in the decision to use Neo4j.

• Ease of Use

• accommodate efficient access for home-grown and commercial off-the-shelf applications, as well as ad-hoc use.

• Extreme availability & performance with Neo4j clustering

• Hugely simplified queries, vs. relational for complex routing

• Flexible data model can reflect real-world data variance much better than

Samstag, 31. August 13

© All Rights Reserved 2013 | Neo Technology, Inc.

Background

Business problem

Neo Technology Confidential

Solution & Benefits

Cambridge, Massachusetts

SevenBridges Genomics

Industry: Life SciencesUse case: Content Management

• Neo4j is used to store metadata about each sequenced genome (including a pointer to the sequenced genome itself, which is a binary file stored on Amazon S3), and to support search and other forms of information processing against the genomic data.

• graph database was chosen because “Our specific domain maps naturally onto graph paradigm”.

• Bioinformatics company offering gene sequencing "as a service" (over the web)

• Provider of genomic information services

• Needed a new platform to support storage & retrieval of sequenced genomes in the cloud

•Domain fit

• Domain naturally lends itself to a graph representation.

• Graph model determined to be a perfect fit.

•Agility & Performance

• Saved time with Neo4j as compared to the alternatives.• Queries “practically write themselves.”

•Solution Completeness

• “Neo4j is incomparably better than other graph databases.”

Samstag, 31. August 13

112

Samstag, 31. August 13

112

Really, once you start thinking in graphs it's hard to stop

Recommendations MDM

Systems Management

Geospatial

Social computing

Business intelligence

Biotechnology

Making Sense of all that data

your brainaccess control

linguistics

catalogs

genealogy routing

compensation market vectors

Samstag, 31. August 13

112

Really, once you start thinking in graphs it's hard to stop

Recommendations MDM

Systems Management

Geospatial

Social computing

Business intelligence

Biotechnology

Making Sense of all that data

your brainaccess control

linguistics

catalogs

genealogy routing

compensation market vectors

What will you build?

Samstag, 31. August 13

113

Samstag, 31. August 13

is a

114

Samstag, 31. August 13

115

NOSQL

Samstag, 31. August 13

Graph Database

116

Samstag, 31. August 13

A graph database...

117

Samstag, 31. August 13

A graph database...

117

NO: not for charts & diagrams, or vector artwork

Samstag, 31. August 13

A graph database...

117

NO: not for charts & diagrams, or vector artwork

YES: for storing data that is structured as a graph

Samstag, 31. August 13

A graph database...

117

NO: not for charts & diagrams, or vector artwork

YES: for storing data that is structured as a graph

remember linked lists, trees?

Samstag, 31. August 13

A graph database...

117

NO: not for charts & diagrams, or vector artwork

YES: for storing data that is structured as a graph

remember linked lists, trees?

graphs are the general-purpose data structure

Samstag, 31. August 13

A graph database...

117

NO: not for charts & diagrams, or vector artwork

YES: for storing data that is structured as a graph

remember linked lists, trees?

graphs are the general-purpose data structure

“A relational database may tell you the average age of everyone in this place,

but a graph database will tell you who is most likely to buy you a beer.”

Samstag, 31. August 13

Data Modeling

118

Samstag, 31. August 13

Why Data Modeling

119

๏What is modeling?

๏Aren‘t we schema free?

๏How does it work in a graph?

๏Where should modeling happen? DB or Application

Samstag, 31. August 13

Data Models

120

Samstag, 31. August 13

Model mis-match

Real World Model

Samstag, 31. August 13

Model mis-match

Application Model Database Model

Samstag, 31. August 13

Trinity of models

Samstag, 31. August 13

Whiteboard --> Data

124

Samstag, 31. August 13

Whiteboard --> Data

124

Andreas Peter

Emil

Allison

Samstag, 31. August 13

Whiteboard --> Data

124

Andreas Peter

Emil

Allison

knows

knows knows

knows

Samstag, 31. August 13

Whiteboard --> Data

124

Andreas Peter

Emil

Allison

knows

knows knows

knows

Samstag, 31. August 13

Whiteboard --> Data

124

Andreas Peter

Emil

Allison

knows

knows knows

knows

// Cypher query - friend of a friendstart n=node(0)match (n)-->()-[:KNOWS|LIKES]->(foaf)WHERE NOT((n)[:KNOWS]-->(foaf)) return foaf

Samstag, 31. August 13

125

Samstag, 31. August 13

You traverse the graph

125

Samstag, 31. August 13

You traverse the graph

125

Samstag, 31. August 13

// lookup starting point in an indexSTART n=node:People(name = ‘Andreas’)

Andreas

You traverse the graph

125

Samstag, 31. August 13

// lookup starting point in an indexSTART n=node:People(name = ‘Andreas’)

Andreas

You traverse the graph

125

// then traverse to find resultsSTART me=node:People(name = ‘Andreas’MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2) RETURN friend2

Samstag, 31. August 13

125

Samstag, 31. August 13

SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skill.id WHERE users.id = 1

126

START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill

Samstag, 31. August 13