Data Modeling with Neo4j

Stefan Armbruster, Neo Technology (slides from Michael Hunger)

Neo4j is a NOSQL Graph Database

A graph database...

NO: not for charts & diagrams, or vector artwork

YES: for storing data that is structured as a graph

remember linked lists, trees?

graphs are the general-purpose data structure

“A relational database may tell you the average age of everyone in this place, but a graph database will tell you who is most likely to buy you a beer.”

You know relational

[Diagram: tables foo and bar, linked through the join table foo_bar]

now consider relationships...

We're talking about a Property Graph

Properties (each a key+value)

+ Indexes (for easy look-ups)
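The building blocks named here can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not how Neo4j is implemented; the names `PropertyGraph`, `add_node`, `relate` and `lookup` are invented for the example, and the `as_primary` value is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Nodes and relationships, each carrying key+value properties."""

    def __init__(self):
        self.nodes = []
        self.relationships = []
        self.index = {}  # (key, value) -> nodes, used only for look-ups

    def add_node(self, **properties):
        node = Node(properties)
        self.nodes.append(node)
        for key, value in properties.items():
            self.index.setdefault((key, value), []).append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start, rel_type, end, properties)
        self.relationships.append(rel)
        return rel

    def lookup(self, key, value):
        return self.index.get((key, value), [])


graph = PropertyGraph()
language = graph.add_node(name="English")
country = graph.add_node(name="Canada")
# Placeholder property value, for illustration only.
rel = graph.relate(language, "IS_SPOKEN_IN", country, as_primary=True)
```

The index is used only to find a node by property value; everything else is reached by following relationships.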

Aggregate vs. Connected Data-Model

NOSQL Databases

Aggregate Oriented Model

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences. This is why aggregate-oriented stores talk so much about map-reduce.”

Martin Fowler

Connected Data Model

The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements. Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability (ACID), also known as transactions.

Michael Hunger

Data Modeling

Why Data Modeling

๏ What is modeling?

๏ Aren't we schema free?

๏ How does it work in a graph?

๏ Where should modeling happen? DB or application?

Data Models

Model mis-match

Trinity of models: Real World Model, Application Model, Database Model

Whiteboard --> Data

[Diagram: Andreas, Peter, Emil and Allison, connected by "knows" relationships]

// Cypher query - friend of a friend
START n=node(0)
MATCH (n)--()--(foaf)
RETURN foaf

You traverse the graph

// lookup starting point in an index
START n=node:People(name = 'Andreas')

// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2
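The two steps of that query, an index lookup followed by a two-hop traversal, can be mimicked with plain Python dictionaries. This is a rough sketch of the idea only; the wiring between the four people is made up (the whiteboard does not say who knows whom), and `people_index` merely stands in for the `People` index.

```python
from collections import defaultdict

# Undirected FRIEND relationships, stored as adjacency sets.
friends = defaultdict(set)


def add_friend(a, b):
    friends[a].add(b)
    friends[b].add(a)


# Hypothetical wiring, chosen only for the example.
add_friend("Andreas", "Peter")
add_friend("Peter", "Emil")
add_friend("Andreas", "Allison")

# Stand-in for the People index: name -> node key.
people_index = {name: name for name in friends}


def friends_of_friends(name):
    me = people_index[name]           # lookup: find the starting node
    result = set()
    for friend in friends[me]:        # first hop
        for friend2 in friends[friend]:   # second hop
            if friend2 != me:         # do not walk straight back to the start
                result.add(friend2)
    return result
```

With this wiring, `friends_of_friends("Andreas")` reaches Emil through Peter; the lookup touches the index once and the rest is pure traversal.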

SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1

START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
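To see that the join and the traversal answer the same question, the SQL can be run against an in-memory SQLite database and compared with a hand-rolled traversal. This is a sketch under assumptions: the sample rows, the skill names and the `level` column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level INTEGER);
    INSERT INTO users VALUES (1, 'Andreas');
    INSERT INTO skills VALUES (10, 'Cypher'), (11, 'Java');
    INSERT INTO user_skill VALUES (1, 10, 3), (1, 11, 2);
""")

# Relational view: two joins through the user_skill table.
rows = conn.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.name
""").fetchall()

# Graph view: node 1 holds its outgoing relationships directly,
# each as a (relationship properties, target node) pair.
outgoing = {
    1: [({"level": 3}, {"name": "Cypher"}), ({"level": 2}, {"name": "Java"})],
}
traversed = sorted((skill["name"], rel["level"]) for rel, skill in outgoing[1])
```

Both `rows` and `traversed` hold the same skill and level pairs; the graph version reaches them without consulting a separate join table.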

An Example

What language do they speak here?

[Diagram: Language and Country as two separate entities]

Tables

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri

Need to model the relationship

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri, language_code

What if the cardinality changes?

Language: language_code, language_name, word_count, country_code

Country: country_code, country_name, flag_uri

Or we go many-to-many?

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri

LanguageCountry: language_code, country_code

Or we want to qualify the relationship?

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri

LanguageCountry: language_code, country_code, primary
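The final relational design, a join table carrying a qualifier, can be tried out with SQLite. This is a sketch with assumptions: the column is named `is_primary` here because `PRIMARY` is an SQL keyword, the codes and flag values are placeholders, and `languages_spoken_in` is a helper invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Language (
        language_code TEXT PRIMARY KEY, language_name TEXT, word_count INTEGER);
    CREATE TABLE Country (
        country_code TEXT PRIMARY KEY, country_name TEXT, flag_uri TEXT);
    CREATE TABLE LanguageCountry (
        language_code TEXT REFERENCES Language(language_code),
        country_code TEXT REFERENCES Country(country_code),
        is_primary INTEGER,  -- the qualifier on the relationship
        PRIMARY KEY (language_code, country_code));
    INSERT INTO Language VALUES ('en', 'English', NULL), ('fr', 'French', NULL);
    INSERT INTO Country VALUES ('CA', 'Canada', NULL);
    -- Placeholder flags, for illustration only.
    INSERT INTO LanguageCountry VALUES ('en', 'CA', 1), ('fr', 'CA', 0);
""")


def languages_spoken_in(country_name):
    """Answer 'what language do they speak here?' through the join table."""
    return conn.execute("""
        SELECT l.language_name, lc.is_primary
        FROM Country c
        JOIN LanguageCountry lc ON lc.country_code = c.country_code
        JOIN Language l ON l.language_code = lc.language_code
        WHERE c.country_name = ?
        ORDER BY l.language_name
    """, (country_name,)).fetchall()
```

Every change of cardinality or qualification in the preceding steps means altering tables and keys; here it ended in a third table whose only job is to hold the relationship.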

Start talking about Graphs

Explicit Relationship

Language: name, word_count

Country: name, flag_uri

Relationship: IS_SPOKEN_IN

Relationship Properties

Language: name, word_count

Country: name, flag_uri

Relationship: IS_SPOKEN_IN (property: as_primary)

What’s different?

Language: language_code, language_name, word_count

Country: country_code, country_name, flag_uri

LanguageCountry: language_code, country_code, primary

Graph relationship: IS_SPOKEN_IN

๏ Maintaining relationships is left up to the database

๏ Artificial keys disappear or are unnecessary

๏ Relationships get an explicit name

• can be navigated in both directions
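The last point, that a named relationship is stored once and can be followed from either end, can be illustrated with a small adjacency structure. This is a sketch only; the `Graph` class, its `neighbours` method and the direction of `IS_SPOKEN_IN` (from language to country) are assumptions made for the example.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.outgoing = defaultdict(list)  # node -> [(type, end node)]
        self.incoming = defaultdict(list)  # node -> [(type, start node)]

    def relate(self, start, rel_type, end):
        # One call records the relationship for both of its end points.
        self.outgoing[start].append((rel_type, end))
        self.incoming[end].append((rel_type, start))

    def neighbours(self, node, rel_type, direction="both"):
        found = []
        if direction in ("out", "both"):
            found += [n for t, n in self.outgoing[node] if t == rel_type]
        if direction in ("in", "both"):
            found += [n for t, n in self.incoming[node] if t == rel_type]
        return found


g = Graph()
g.relate("English", "IS_SPOKEN_IN", "Canada")
g.relate("French", "IS_SPOKEN_IN", "Canada")
```

Asking from the country side (`direction="in"`) or from the language side (`direction="out"`) uses the same stored relationship; no foreign key or join table is involved.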

Relationship specialisation

Language: name, word_count

Country: name, flag_uri

Relationship: IS_SPOKEN_IN (property: as_primary)

Bidirectional relationships

Language: name, word_count

Country: name, flag_uri

Relationships: IS_SPOKEN_IN, PRIMARY_LANGUAGE

Weighted relationships

Language: name, word_count

Country: name, flag_uri

Relationship: POPULATION_SPEAKS (property: population_fraction)

Keep on adding relationships

Language: name, word_count

Country: name, flag_uri

Relationships: POPULATION_SPEAKS (property: population_fraction), SIMILAR_TO, ADJACENT_TO
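A weight on a relationship becomes useful once queries rank or filter by it. The sketch below assumes `POPULATION_SPEAKS` points from a country to a language; the country names, language names and fractions are placeholders and not real statistics.

```python
# (country, language) -> population_fraction on a POPULATION_SPEAKS relationship.
# All names and weights are placeholders.
population_speaks = {
    ("Exampleland", "LangA"): 0.7,
    ("Exampleland", "LangB"): 0.4,
    ("Otherland", "LangB"): 0.9,
}


def languages_by_share(country):
    """Languages of a country, ordered by descending population_fraction."""
    shares = [
        (language, fraction)
        for (c, language), fraction in population_speaks.items()
        if c == country
    ]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def most_spoken(country):
    ranked = languages_by_share(country)
    return ranked[0][0] if ranked else None
```

The fractions need not sum to one, since people can speak more than one language; the weight belongs to the relationship, not to either node.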

EMBRACE the paradigm

Use the building blocks

๏ Nodes

๏ Relationships (RELATIONSHIP_NAME)

๏ Properties (name: value)

Anti-pattern: rich properties

name: “Canada”

languages_spoken: “[ ‘English’, ‘French’ ]”

Normalize Nodes
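Normalizing the rich property above means turning each value hidden inside the string into its own node and connecting it. A minimal sketch, assuming the property holds a Python-style list literal and reusing the `IS_SPOKEN_IN` relationship from the earlier slides; `normalize` is a name invented for the example.

```python
import ast

# The anti-pattern: a list of languages packed into a single string property.
rich_node = {"name": "Canada", "languages_spoken": "['English', 'French']"}


def normalize(node):
    """Split the packed property into Language nodes plus relationships."""
    country = {"name": node["name"]}
    names = ast.literal_eval(node["languages_spoken"])
    languages = [{"name": name} for name in names]
    relationships = [
        (language["name"], "IS_SPOKEN_IN", country["name"])
        for language in languages
    ]
    return country, languages, relationships


country, languages, relationships = normalize(rich_node)
```

After the split, a language can be found, connected to other countries and traversed, which is impossible while it is buried inside a string.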

Anti-Pattern: Node represents multiple concepts

Country: name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name

Split up in separate concepts

Country: name, flag_uri

Language: name, number_of_words, yes, no

Currency: currency_code, currency_name

Relationships: SPEAKS, USES_CURRENCY
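The split can be written as a small transformation from the flat, multi-concept node to three nodes and two relationships. This is a sketch; the function name `split_country` and the relationship directions (from the country outwards) are assumptions.

```python
def split_country(flat):
    """Split a multi-concept Country node into Country, Language and Currency."""
    country = {"name": flat["name"], "flag_uri": flat["flag_uri"]}
    language = {
        "name": flat["language_name"],
        "number_of_words": flat["number_of_words"],
        "yes": flat["yes_in_language"],
        "no": flat["no_in_language"],
    }
    currency = {
        "currency_code": flat["currency_code"],
        "currency_name": flat["currency_name"],
    }
    relationships = [
        (country["name"], "SPEAKS", language["name"]),
        (country["name"], "USES_CURRENCY", currency["currency_code"]),
    ]
    return country, language, currency, relationships


flat = {
    "name": "Canada", "flag_uri": None,
    "language_name": "French", "number_of_words": None,
    "yes_in_language": "oui", "no_in_language": "non",
    "currency_code": "CAD", "currency_name": "Canadian dollar",
}
country, language, currency, relationships = split_country(flat)
```

Once Language and Currency are nodes of their own, a second language or a shared currency is just one more relationship instead of another set of columns.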

Challenge: Property or Relationship?

๏ Can every property be replaced by a relationship?

๏ Should all entities with the same property values be connected?

Object Mapping

๏ Similar to how you would map objects to a relational database, using an ORM such as Hibernate

๏ Generally simpler and easier to reason about

๏ Examples

• Java: Spring Data Graph

• Ruby: Active Model

๏ Why Map?

• Do you use mapping because you are scared of SQL?

• Following DDD, could you write your repositories directly against the graph API?

CONNECT for fast access

In-Graph Indices

Relationships for querying

๏ like in other databases

• same structure for different use-cases (OLTP and OLAP) doesn’t work

• graph allows: add more structures

๏ Relationships should be the primary means to access nodes in the database

๏ Traversing relationships is cheap; that’s the whole design goal of a graph database

๏ Use lookups only to find starting nodes for a query

Data Modeling examples in Manual

Anti-pattern: unconnected graph

[Diagram: many unconnected nodes, each with the property name: “Jones”]

Pattern: Linked List

Pattern: Multiple Relationships

Pattern-Trees: Tags and Categories

Pattern-Tree: Multi-Level-Tree

Pattern-Trees: R-Tree (spatial)

Example: Activity Stream

Graph Evolution

Evolution: Relationship to Node

Relationships: SENT_EMAIL, EMAIL_FROM, EMAIL_TO, EMAIL_CC, TAGGED, ...

see Hyperedges
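The evolution named here, promoting a relationship to a node, can be sketched as follows. A single `SENT_EMAIL` relationship connects exactly two people; once the email is a node, it can point to a sender, several recipients and tags. The helper names, the generated email keys and the relationship directions are assumptions made for the example.

```python
import itertools

_email_ids = itertools.count(1)


def send_email_as_relationship(sender, recipient):
    # Before: one relationship, limited to two end points.
    return [(sender, "SENT_EMAIL", recipient)]


def send_email_as_node(sender, to, cc=(), tags=()):
    # After: the email is a node of its own and carries all the connections.
    email = f"email-{next(_email_ids)}"
    rels = [(email, "EMAIL_FROM", sender)]
    rels += [(email, "EMAIL_TO", person) for person in to]
    rels += [(email, "EMAIL_CC", person) for person in cc]
    rels += [(email, "TAGGED", tag) for tag in tags]
    return email, rels


before = send_email_as_relationship("Andreas", "Peter")
email, after = send_email_as_node(
    "Andreas", to=["Peter", "Emil"], cc=["Allison"], tags=["graphs"]
)
```

The intermediate node plays the role of a hyperedge: it connects more than two participants, which a single relationship cannot do.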

Combine multiple Domains in a Graph

๏ you start with a single domain

๏ add more connected domains as your system evolves

๏ more domains allow you to ask different queries

๏ one domain "indexes" the other

๏ Example: Facebook Graph Search

• social graph

• location graph

• activity graph

• favorite graph

• ...
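How one domain can "index" another is easiest to see with two small graphs over the same people. The sketch below combines a social domain with a location domain; the relationship wiring, the city names and the helper `friends_living_in` are all invented for the example.

```python
# Social domain: who is a friend of whom (hypothetical wiring).
friend_of = {"Andreas": {"Peter", "Emil", "Allison"}}

# Location domain: where each person lives (placeholder cities).
lives_in = {"Peter": "City A", "Emil": "City B", "Allison": "City A"}


def friends_living_in(person, city):
    """A query that only makes sense once both domains share one graph."""
    return sorted(
        friend
        for friend in friend_of.get(person, ())
        if lives_in.get(friend) == city
    )
```

Neither domain alone can answer "which of my friends live in City A?"; connecting them turns the location graph into a filter over the social graph.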

Notes on the Graph Data Model

๏ Schema free, but with constraints

๏ Model your graph with a whiteboard and a wise man

๏ Nodes are the main entities, but useless without connections

๏ Relationships are first-class citizens in the model and database

๏ Normalize more than in a relational database

๏ Use meaningful relationship types, not generic ones like IS_

๏ Use in-graph structures to allow different access paths

๏ Evolve your graph to your needs, incremental growth

How to get started?

๏ Documentation

• neo4j.org

‣ http://www.neo4j.org/learn/nosql

• docs.neo4j.org - tutorials+reference

‣ Data Modeling Examples

• http://console.neo4j.org

• Neo4j in Action

• Good Relationships

๏ Worldwide one-day Neo4j Trainings

๏ Get Neo4j

• http://neo4j.org/download

• http://addons.heroku.com/neo4j/

๏ Participate

• http://groups.google.com/group/neo4j

• http://neo4j.meetup.com

• a session like this one ;)


Really, once you start thinking in graphs it's hard to stop

Recommendations, MDM, systems management, geospatial, social computing, business intelligence, biotechnology, making sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors

What will you build?

Thank You! Questions?