Data Modeling with Neo4j
DESCRIPTION
This presentation covers several aspects of modeling data and domains with a graph database like Neo4j. The graph data model allows high-fidelity modeling. Because relationships are first-class citizens in the graph model, you can use much higher forms of normalization than you would in a relational database. Video here: https://vimeo.com/67371996
TRANSCRIPT
(Michael) -[:WORKS_ON]-> (Neo4j)
ME: Spring Cloud, Community, Cypher, console, community graph, Server
Neo4j is a NOSQL Graph Database
A graph database...
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
remember linked lists, trees?
graphs are the general-purpose data structure
“A relational database may tell you the average age of everyone in this place,
but a graph database will tell you who is most likely to buy you a beer.”
You know relational
foo, bar, foo_bar
now consider relationships...
We're talking about a Property Graph
Nodes
Relationships
Properties (each a key+value)
+ Indexes (for easy look-ups)
[Figure: people (Emil, Andrés, Lars, Johan, Allison, Peter, Michael, Tobias, Andreas, Ian, Mica, Delia) connected by "knows" relationships]
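To make the four building blocks concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's API: the class and method names (`PropertyGraph`, `add_node`, `relate`, `lookup`, `neighbours`) are hypothetical, the "index" is just a dictionary on the `name` property, and each relationship is registered on both of its nodes so it can be followed from either end.

```python
from dataclasses import dataclass, field

# Minimal in-memory property graph, for illustration only; this is not
# Neo4j's API. Nodes and relationships both carry key+value properties,
# and a dictionary on the "name" property plays the role of an index.

@dataclass
class Relationship:
    start: int
    type: str
    end: int
    props: dict = field(default_factory=dict)

class PropertyGraph:
    def __init__(self):
        self.nodes = {}       # node id -> properties
        self.adjacent = {}    # node id -> relationships touching that node
        self.name_index = {}  # name value -> node ids (the "index")

    def add_node(self, **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = props
        self.adjacent[node_id] = []
        if "name" in props:
            self.name_index.setdefault(props["name"], []).append(node_id)
        return node_id

    def relate(self, start, rel_type, end, **props):
        rel = Relationship(start, rel_type, end, props)
        # Register the relationship on both end nodes so it can be
        # followed from either side.
        self.adjacent[start].append(rel)
        self.adjacent[end].append(rel)
        return rel

    def lookup(self, name):
        # Index look-up: find starting nodes without scanning every node.
        return self.name_index.get(name, [])

    def neighbours(self, node_id, rel_type):
        # Follow relationships of one type, in either direction.
        return [r.end if r.start == node_id else r.start
                for r in self.adjacent[node_id] if r.type == rel_type]

g = PropertyGraph()
emil = g.add_node(name="Emil")
peter = g.add_node(name="Peter")
g.relate(emil, "KNOWS", peter, since=2010)  # hypothetical relationship property

start = g.lookup("Emil")[0]
names = [g.nodes[n]["name"] for n in g.neighbours(start, "KNOWS")]
back = [g.nodes[n]["name"] for n in g.neighbours(peter, "KNOWS")]
print(names, back)  # ['Peter'] ['Emil']
```

The index is used only to find the starting node; everything after that is a walk along relationships.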
Aggregate vs. Connected Data Model
NOSQL Databases
[Figure: NOSQL database families: Key-Value (Riak, Redis), Column oriented (Cassandra), Document (Mongo, Couch) and Graph (Neo4j), next to Relational (MySQL, Postgres)]
“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences.
This is why aggregate-oriented stores talk so much about map-reduce.”
Martin Fowler
Aggregate Oriented Model
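Fowler's point can be sketched in a few lines of Python. The orders, products and quantities below are made-up sample data; a plain dictionary stands in for the aggregate store, and the loop in `product_sales` plays the role of the map-reduce job he mentions.

```python
from collections import Counter

# Hypothetical order aggregates: each order is one self-contained document
# with its line items nested inside, as an aggregate-oriented store keeps it.
orders = [
    {"id": "o1", "items": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 1}]},
    {"id": "o2", "items": [{"product": "book", "qty": 1}]},
    {"id": "o3", "items": [{"product": "pen", "qty": 5}]},
]

# Access aligned with the aggregate: one order is one key look-up.
order_by_id = {o["id"]: o for o in orders}

def product_sales(orders):
    # Access across the aggregates: every order must be visited (map) and
    # the per-product quantities combined (reduce).
    totals = Counter()
    for order in orders:
        for item in order["items"]:
            totals[item["product"]] += item["qty"]
    return dict(totals)

# Connected alternative: line items become links that the product side also
# holds, so one product's sales are a local walk. Built here from the orders
# for brevity; a graph database would store these links at write time.
ordered_in = {}  # product -> [(order id, qty), ...]
for order in orders:
    for item in order["items"]:
        ordered_in.setdefault(item["product"], []).append((order["id"], item["qty"]))

book_sales = sum(qty for _, qty in ordered_in["book"])
print(product_sales(orders), book_sales)  # {'book': 3, 'pen': 6} 3
```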
The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements. Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability (ACID), also known as transactions.
Michael Hunger
Connected Data Model
Data Modeling
Why Data Modeling
๏ What is modeling?
๏ Aren't we schema free?
๏ How does it work in a graph?
๏ Where should modeling happen: in the DB or in the application?
Data Models
[Figure: Trinity of models: Real World Model, Application Model and Database Model, with model mismatches between them]
Whiteboard --> Data
[Figure: Andreas, Peter, Emil and Allison connected by "knows" relationships]
// Cypher query - friend of a friend
START n=node(0)
MATCH (n)--()--(foaf)
RETURN foaf
You traverse the graph
// lookup starting point in an index
START n=node:People(name = 'Andreas')
// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2
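The two steps (index look-up for the start node, then traversal) can be mimicked in plain Python. This is a sketch under assumptions: the people and FRIEND connections are invented, a dictionary plays the `People` index, and undirected FRIEND relationships are stored on both ends. Like the Cypher query without `DISTINCT`, it returns one row per matching path.

```python
# Hypothetical sample data; the integer ids stand in for node ids.
people = {0: "Andreas", 1: "Peter", 2: "Emil", 3: "Allison"}
people_index = {name: node_id for node_id, name in people.items()}  # plays the People index

# FRIEND relationships, stored on both end nodes because the query's
# pattern ignores direction.
friends = {node_id: set() for node_id in people}

def add_friend(a, b):
    friends[a].add(b)
    friends[b].add(a)

add_friend(0, 1)  # Andreas - Peter
add_friend(1, 2)  # Peter - Emil
add_friend(0, 3)  # Andreas - Allison
add_friend(3, 2)  # Allison - Emil

def friends_of_friends(name):
    me = people_index[name]               # 1) index look-up finds the start node
    rows = []
    for friend in friends[me]:            # 2) (me)-[:FRIEND]-(friend)
        for friend2 in friends[friend]:   # 3) (friend)-[:FRIEND]-(friend2)
            # A pattern never reuses a relationship, and this data has at most
            # one FRIEND relationship per pair, so never walk straight back to me.
            if friend2 != me:
                rows.append(people[friend2])
    return sorted(rows)

print(friends_of_friends("Andreas"))  # ['Emil', 'Emil']
```

Emil appears twice because he is reached via both Peter and Allison; adding `DISTINCT` to the Cypher `RETURN` would collapse such duplicates.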
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
START user = node(1)
MATCH (user)-[user_skill]->(skill)
RETURN skill, user_skill
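A runnable comparison using Python's built-in `sqlite3` module for the SQL side and a dictionary of per-node relationships for the graph side. The sample rows, the `level` column and the `HAS_SKILL` relationship type are hypothetical; the Cypher query above leaves the relationship type open.

```python
import sqlite3

# Hypothetical sample data: one user with two skills. The user_skill join
# table carries a "level" column, the relational stand-in for a property
# on the relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level INTEGER);
INSERT INTO users VALUES (1, 'Andreas');
INSERT INTO skills VALUES (10, 'Java'), (11, 'Cypher');
INSERT INTO user_skill VALUES (1, 10, 3), (1, 11, 5);
""")

# Relational version: two joins through the join table to reach the skills.
relational = conn.execute("""
SELECT skills.name, user_skill.level
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
ORDER BY skills.name
""").fetchall()

# Graph version: each node keeps its outgoing relationships, so finding the
# user's skills means walking those relationships; there is no join table.
relationships = {
    1: [("HAS_SKILL", {"level": 3}, "Java"),
        ("HAS_SKILL", {"level": 5}, "Cypher")],
}
graph = sorted((skill, props["level"])
               for rel_type, props, skill in relationships[1]
               if rel_type == "HAS_SKILL")

print(relational)  # [('Cypher', 5), ('Java', 3)]
print(graph)       # [('Cypher', 5), ('Java', 3)]
```

Both return the same rows; the difference is that the graph side never materializes the `user_skill` table or its foreign keys.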
An Example
What language do they speak here?
Language, Country
Tables
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
Need to model the relationship
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri, language_code
What if the cardinality changes?
Language: language_code, language_name, word_count, country_code
Country: country_code, country_name, flag_uri
Or we go many-to-many?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code
Or we want to qualify the relationship?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code, primary
Start talking about Graphs
Explicit Relationship
Language (name, word_count) -[:IS_SPOKEN_IN]-> Country (name, flag_uri)
Relationship Properties
Language (name, word_count) -[:IS_SPOKEN_IN {as_primary}]-> Country (name, flag_uri)
What's different?
[Figure: the Language, Country and LanguageCountry (language_code, country_code, primary) tables next to the graph Language -[:IS_SPOKEN_IN]-> Country]
What's different?
๏ Implementation of maintaining relationships is left up to the database
๏ Artificial keys disappear or are unnecessary
๏ Relationships get an explicit name
• can be navigated in both directions
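A small Python sketch of the difference: each `IS_SPOKEN_IN` relationship carries its own `as_primary` property and is registered with both its language and its country, so it can be navigated from either side without artificial keys. The language/country pairs are real, but the `as_primary` flags are illustrative sample values, and the two dictionaries stand in for what the database maintains itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IsSpokenIn:
    language: str
    country: str
    as_primary: bool   # the relationship property

# Illustrative data; the as_primary flags are sample values for the sketch.
rels = [
    IsSpokenIn("English", "United Kingdom", True),
    IsSpokenIn("French", "France", True),
    IsSpokenIn("English", "Canada", False),
    IsSpokenIn("French", "Canada", False),
]

# Each relationship is stored once but registered with both end nodes,
# so it can be navigated from either side without a join table or keys.
by_language, by_country = {}, {}
for r in rels:
    by_language.setdefault(r.language, []).append(r)
    by_country.setdefault(r.country, []).append(r)

def countries_speaking(language):
    # Navigate Language -> Country.
    return sorted(r.country for r in by_language.get(language, []))

def languages_of(country, primary_only=False):
    # Navigate Country -> Language, optionally filtering on the property.
    return sorted(r.language for r in by_country.get(country, [])
                  if r.as_primary or not primary_only)

print(countries_speaking("French"))  # ['Canada', 'France']
print(languages_of("Canada"))        # ['English', 'French']
```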
Relationship specialisation
Language (name, word_count) -[:IS_SPOKEN_IN {as_primary}]-> Country (name, flag_uri)
Bidirectional relationships
Language (name, word_count) -[:IS_SPOKEN_IN]-> Country (name, flag_uri)
Language -[:PRIMARY_LANGUAGE]- Country
Weighted relationships
Language (name, word_count) -[:POPULATION_SPEAKS {population_fraction}]- Country (name, flag_uri)
Keep on adding relationships
Language -[:POPULATION_SPEAKS {population_fraction}]- Country, plus SIMILAR_TO and ADJACENT_TO relationships
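Weights on relationships can drive queries directly. A sketch in Python, assuming hypothetical countries, languages and `population_fraction` values; the tuple list stands in for `POPULATION_SPEAKS` relationships and ignores their direction.

```python
# Hypothetical weighted relationships: (country, language, population_fraction).
# Fractions may sum to more than 1, since people can speak several languages.
population_speaks = [
    ("Country X", "Language A", 0.85),
    ("Country X", "Language B", 0.30),
    ("Country Y", "Language B", 0.95),
]

def most_spoken(country):
    # Pick the POPULATION_SPEAKS relationship with the largest weight.
    candidates = [(fraction, language)
                  for c, language, fraction in population_speaks if c == country]
    fraction, language = max(candidates)
    return language, fraction

def spoken_by_at_least(language, threshold):
    # Filter countries on the relationship weight.
    return sorted(c for c, lang, fraction in population_speaks
                  if lang == language and fraction >= threshold)

print(most_spoken("Country X"))                 # ('Language A', 0.85)
print(spoken_by_at_least("Language B", 0.5))    # ['Country Y']
```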
EMBRACE the paradigm
Use the building blocks
๏ Nodes
๏ Relationships (RELATIONSHIP_NAME)
๏ Properties (name: value)
Anti-pattern: rich properties
name: "Canada", languages_spoken: "[ 'English', 'French' ]"
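Why the rich property hurts: to answer "which countries speak French?" every country node's string must be parsed, while with separate language nodes and `SPEAKS` relationships the query starts at the language. A Python sketch with illustrative data, using `ast.literal_eval` to decode the packed string; the dictionaries stand in for nodes and relationships.

```python
import ast

# Anti-pattern: the languages are packed into one string property, so every
# query has to load and parse that string on every country node.
countries_rich = [
    {"name": "Canada", "languages_spoken": "[ 'English', 'French' ]"},
    {"name": "France", "languages_spoken": "[ 'French' ]"},
]

def speaks_rich(language):
    return sorted(c["name"] for c in countries_rich
                  if language in ast.literal_eval(c["languages_spoken"]))

# Graph style: each language is its own node and SPEAKS is a relationship,
# so the query starts at the language and follows its relationships.
speaks = [("Canada", "French"), ("Canada", "English"), ("France", "French")]
spoken_in = {}
for country, language in speaks:
    spoken_in.setdefault(language, []).append(country)

def speaks_graph(language):
    return sorted(spoken_in.get(language, []))

print(speaks_rich("French"), speaks_graph("French"))  # ['Canada', 'France'] ['Canada', 'France']
```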
Normalize Nodes
Anti-Pattern: Node represents multiple concepts
Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name)
Split up in separate concepts
Country (name, flag_uri) -[:SPEAKS]-> Language (name, number_of_words, yes, no)
Country -[:USES_CURRENCY]-> Currency (currency_code, currency_name)
Challenge: Property or Relationship?
๏ Can every property be replaced by a relationship?
• Hint: triple stores. Are they easy to use?
๏ Should all entities with the same property values be connected?
Object Mapping
๏ Similar to how you would map objects to a relational database, using an ORM such as Hibernate
๏ Generally simpler and easier to reason about
๏ Examples
• Java: Spring Data Neo4j
• Ruby: Active Model
๏ Why Map?
• Do you use mapping because you are scared of SQL?
• Following DDD, could you write your repositories directly against the graph API?
CONNECT for fast access
In-Graph Indices
Relationships for querying
๏ like in other databases
• the same structure for different use cases (OLTP and OLAP) doesn't work
• the graph allows you to add more structures
๏ Relationships should be the primary means to access nodes in the database
๏ Traversing relationships is cheap; that's the whole design goal of a graph database
๏ Use lookups only to find starting nodes for a query
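The slides recommend relationships as the main access path, with index look-ups only for starting nodes, and adding in-graph structures per use case. A minimal sketch of one such structure, an activity stream kept as a linked list (the Linked List and Activity Stream patterns listed in this deck), assuming hypothetical relationship names `LATEST` and `NEXT` and modeling them as Python object references:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    text: str
    next: Optional["Event"] = None   # NEXT: points at the previous (older) event

@dataclass
class User:
    name: str
    latest: Optional[Event] = None   # LATEST: points at the newest event

def post(user, text):
    # Prepend to the list: the new event takes over LATEST and links NEXT
    # to the former head, so writes stay constant-time.
    user.latest = Event(text, user.latest)

def recent(user, n):
    # Read path: follow LATEST, then NEXT, up to n times; no scan or sort.
    out, event = [], user.latest
    while event is not None and len(out) < n:
        out.append(event.text)
        event = event.next
    return out

u = User("Andreas")
for text in ["first", "second", "third"]:
    post(u, text)
print(recent(u, 2))  # ['third', 'second']
```

Reading the newest n items touches only n events, regardless of how long the stream grows.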
Data Modeling examples in Manual
Anti-pattern: unconnected graph
[Figure: many separate nodes, each with name: "Jones", and no relationships between them]
Pattern: Linked List
Pattern: Multiple Relationships
Pattern-Trees: Tags and Categories
Pattern-Tree: Multi-Level-Tree
Pattern-Trees: R-Tree (spatial)
Example: Activity Stream
Graph Evolution
Evolution: Relationship to Node
(Peter) -[:SENT_EMAIL]-> (Michael)
[Figure: the SENT_EMAIL relationship becomes an email node connected to Peter (EMAIL_FROM), Michael (EMAIL_TO), Emil (EMAIL_CC) and Community (TAGGED) . . .]
see Hyperedges
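Promoting a relationship to a node can be sketched with plain tuples in Python. The email node id `email1` and the direction of each relationship (email pointing at the person or tag) are assumptions for illustration; the relationship types are those on the slide.

```python
# Before: a single relationship can only join two nodes.
sent_email = ("Peter", "SENT_EMAIL", "Michael")

# After: the email is promoted to a node (hypothetical id "email1"), so it
# can take part in any number of relationships: a hyperedge expressed with
# ordinary nodes and relationships. Directions are chosen for the sketch.
relationships = [
    ("email1", "EMAIL_FROM", "Peter"),
    ("email1", "EMAIL_TO", "Michael"),
    ("email1", "EMAIL_CC", "Emil"),
    ("email1", "TAGGED", "Community"),
]

def related(email, *types):
    # Follow the email node's relationships of the given types.
    return sorted(end for start, t, end in relationships
                  if start == email and t in types)

def emails_involving(person):
    # Walk back from a person to every email node linked to them.
    return sorted({start for start, _, end in relationships if end == person})

print(related("email1", "EMAIL_TO", "EMAIL_CC"))  # ['Emil', 'Michael']
print(emails_involving("Emil"))                   # ['email1']
```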
Combine multiple Domains in a Graph
๏ you start with a single domain
๏ add more connected domains as your system evolves
๏ more domains allow you to ask different queries
๏ one domain "indexes" the other
๏ Example: Facebook Graph Search
• social graph
• location graph
• activity graph
• favorite graph
• ...
Notes on the Graph Data Model
๏ Schema free, but with constraints
๏ Model your graph with a whiteboard and a wise man
๏ Nodes as main entities, but useless without connections
๏ Relationships are first-class citizens in the model and the database
๏ Normalize more than in a relational database
๏ Use meaningful relationship types, not generic ones like IS_
๏ Use in-graph structures to allow different access paths
๏ Evolve your graph to your needs: incremental growth
Real-World Examples
Real-World Use Cases:
• [A] ACL from Hell
• [B] Timely recommendations
• [C] Global collaboration
[A] ACL from Hell
๏ Customer:
• leading consumer utility company with tons and tons of users
๏ Goal:
• comprehensive access control administration for customers
๏ Benefits:
• Flexible and dynamic architecture
• Exceptional performance
• Extensible data model supports new applications and features
• Low cost
• A reliable access control administration system for 5 million customers, subscriptions and agreements
• Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements
• Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements)
[Figure: example access graph with name: Andreas, account: 9758352794, subscription: sports, subscription: local, service: NFL, service: Ravens, agreement: ultimate, promotion: fall, group: graphistas and company: Neo Technology, connected by owns, subscribes to, has plan, includes, provides, member of, offered, discounts, works with and gets discount on]
[B] Timely Recommendations
๏ Customer:
• a professional social network
• 35 million users, adding 30,000+ each day
๏ Goal: up-to-date recommendations
• Scalable solution with real-time end-user experience
• Low maintenance and reliable architecture
• 8-week implementation
๏ Problem:
• Real-time recommendations imperative to attract new users and maintain positive user retention
• Clustered MySQL solution not scalable or fast enough to support real-time requirements
๏ Upgrade from running a batch job
• initial hour-long batch job
• but then success happened, and it became a day
• then two days
๏ With Neo4j, real-time recommendations
[Figure: people with jobs (Andreas: talking, Allison: plumber, Tobias: coding, Peter: building, Emil: plumber, Stephen: DJ, Delia: barking, Tiberius: dancer) connected by "knows" relationships]
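The slides do not describe the customer's recommendation algorithm. As a hedged illustration of the kind of real-time query a graph makes cheap, here is one common approach in Python: rank friends-of-friends by the number of mutual friends. The `knows` data is hypothetical and not taken from the figure.

```python
from collections import Counter

# Hypothetical "knows" graph (undirected, stored on both ends);
# not the edges from the figure.
knows = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

def recommend(person, limit=3):
    # Count, for each friend-of-a-friend, how many friends they share with
    # `person`; skip the person and anyone they already know.
    scores = Counter()
    for friend in knows[person]:
        for candidate in knows[friend]:
            if candidate != person and candidate not in knows[person]:
                scores[candidate] += 1
    # Most mutual friends first, ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]

print(recommend("ann"))  # [('dan', 2), ('eve', 1)]
```

The work done depends only on the person's two-hop neighbourhood, not on the total number of users, which is what makes this kind of query feasible per request instead of in a nightly batch.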
[C] Collaboration on a Global Scale
๏ Customer: a worldwide software leader
• highly collaborative end-users
๏ Goal: offer an online platform for global collaboration
• Highly flexible data analysis
• Sub-second results for large, densely-connected data
• User experience as a competitive advantage
• Massive amounts of data tied to members, user groups, member content, etc., all interconnected
• Infer collaborative relationships through user-generated content
• Worldwide availability
[Figure: deployment across Asia, North America and Europe]
How to get started?
๏ Documentation
• neo4j.org
‣ http://www.neo4j.org/learn/nosql
• docs.neo4j.org - tutorials+reference
‣ Data Modeling Examples
• http://console.neo4j.org
• Neo4j in Action
• Good Relationships
๏ Worldwide one-day Neo4j Trainings
๏ Get Neo4j
• http://neo4j.org/download
• http://addons.heroku.com/neo4j/
Really, once you start thinking in graphs it's hard to stop
Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, Making Sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors
What will you build?
Thank You! Questions?