20141216 graph database prototyping ams meetup

Post on 12-Jul-2015


Graph Database Prototyping @ AMS GraphDB meetup

Agenda for Tonight

• Building a Graph Database Prototype
• 3 parts
  – Graph database & modeling concepts
  – Prototyping tools & import
  – Graph querying with Cypher

Data Modeling With Neo4j

Topics

• Graph model building blocks
• Quick intro to Cypher
• Example modeling process
• Modeling tips
• Recipes for common modeling scenarios
• Refactoring
• Test-driven data modeling

Graph Model Building Blocks

Property Graph Data Model

Four Building Blocks

• Nodes
• Relationships
• Properties
• Labels

Nodes

• Used to represent entities and complex value types in your domain
• Can contain properties
  – Used to represent entity attributes and/or metadata (e.g. timestamps, version)
  – Key-value pairs
    • Java primitives
    • Arrays
    • null is not a valid value
  – Every node can have different properties
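A minimal Cypher sketch of the points above. The property keys (languages, joined) are illustrative assumptions, not taken from the slides. It shows two nodes of the same kind carrying different properties, with a missing value left out rather than stored as null.

```cypher
// Two nodes of the same kind with different property sets (hypothetical data).
CREATE (ian:Person {name: 'Ian', languages: ['Java', 'C#']})
CREATE (lucy:Person {name: 'Lucy', joined: 2012})

// An absent property is simply not stored; reading it yields null.
RETURN ian.name, ian.joined, lucy.name, lucy.joined
```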

Entities and Value Types

• Entities
  – Have unique conceptual identity
  – Change attribute values, but identity remains the same
• Value types
  – No conceptual identity
  – Can substitute for each other if they have the same value
    • Simple: single value (e.g. colour, category)
    • Complex: multiple attributes (e.g. address)

Relationships

• Every relationship has a name and a direction
  – Add structure to the graph
  – Provide semantic context for nodes
• Can contain properties
  – Used to represent quality or weight of relationship, or metadata
• Every relationship must have a start node and end node
  – No dangling relationships

Relationships (continued)

Nodes can have more than one relationship

Self relationships are allowed

Nodes can be connected by more than one relationship
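The relationship rules above can be sketched in Cypher. The COLLEAGUE and TRUSTS types and the since property are hypothetical, chosen only to illustrate each rule.

```cypher
// Hypothetical nodes for illustration.
CREATE (peter:Person {name: 'Peter'})
CREATE (lucy:Person {name: 'Lucy'})

// A relationship has a name (its type), a direction, and optional properties.
CREATE (peter)-[:FRIEND {since: 2010}]->(lucy)

// Two nodes can be connected by more than one relationship.
CREATE (peter)-[:COLLEAGUE]->(lucy)

// A self relationship: the start node and end node are the same node.
CREATE (peter)-[:TRUSTS]->(peter)
```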

Variable Structure

• Relationships are defined with regard to node instances, not classes of nodes
  – Two nodes representing the same kind of “thing” can be connected in very different ways
    • Allows for structural variation in the domain
  – Contrast with relational schemas, where foreign key relationships apply to all rows in a table
    • No need to use null to represent the absence of a connection

Labels

• Every node can have zero or more labels
• Used to represent roles (e.g. user, product, company)
  – Group nodes
  – Allow us to associate indexes and constraints with groups of nodes
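A short sketch of labels in use, run as three separate statements. The Customer label is a hypothetical second role. The constraint uses the Neo4j 2.x-era syntax that matches the date of the talk; newer releases spell constraint creation differently, so check the manual for the version in use.

```cypher
// Associate a uniqueness constraint with the group of nodes labelled Company.
CREATE CONSTRAINT ON (c:Company) ASSERT c.name IS UNIQUE;

// A node with two labels: it plays both roles (Customer is hypothetical).
CREATE (acme:Company:Customer {name: 'Acme'});

// A label restricts a match to one group of nodes.
MATCH (c:Customer)
RETURN c.name;
```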

Four Building Blocks

• Nodes
  – Entities
• Relationships
  – Connect entities and structure domain
• Properties
  – Entity attributes, relationship qualities, and metadata
• Labels
  – Group nodes by role

Designing a Graph Model

Models

Images: en.wikipedia.org

Purposeful abstraction of a domain designed to satisfy particular application/end-user goals

Design for Queryability

Model Query

Method

1. Identify application/end-user goals
2. Figure out what questions to ask of the domain
3. Identify entities in each question
4. Identify relationships between entities in each question
5. Convert entities and relationships to paths
   – These become the basis of the data model
6. Express questions as graph patterns
   – These become the basis for queries

Application/End-User Goals

As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge

Questions To Ask of the Domain

Which people, who work for the same company as me, have similar skills to me?

As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge

Identify Entities

Which people, who work for the same company as me, have similar skills to me?

• Person
• Company
• Skill

Identify Relationships Between Entities

Which people, who work for the same company as me, have similar skills to me?

• Person WORKS_FOR Company
• Person HAS_SKILL Skill

Convert to Cypher Paths

Person WORKS_FOR Company
Person HAS_SKILL Skill

Relationship

Label

(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)

Consolidate Paths

(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

Create Person Subgraph

MERGE (c:Company{name:'Acme'})
MERGE (p:Person{name:'Ian'})
MERGE (s1:Skill{name:'Java'})
MERGE (s2:Skill{name:'C#'})
MERGE (s3:Skill{name:'Neo4j'})
CREATE UNIQUE (c)<-[:WORKS_FOR]-(p),
              (p)-[:HAS_SKILL]->(s1),
              (p)-[:HAS_SKILL]->(s2),
              (p)-[:HAS_SKILL]->(s3)
RETURN c, p, s1, s2, s3
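The query later in the deck returns Lucy and Bill, but the slides do not show how those colleagues were created. A sketch of that data, assuming their skills are exactly the ones listed in the result table. It uses MERGE for the relationships in place of CREATE UNIQUE, which later Neo4j releases removed.

```cypher
// Assumed colleague data, inferred from the result table; not shown in the slides.
MERGE (c:Company {name: 'Acme'})
MERGE (java:Skill {name: 'Java'})
MERGE (neo:Skill {name: 'Neo4j'})
MERGE (lucy:Person {name: 'Lucy'})
MERGE (bill:Person {name: 'Bill'})
// MERGE on a pattern between bound nodes creates the relationship only if it is missing.
MERGE (lucy)-[:WORKS_FOR]->(c)
MERGE (bill)-[:WORKS_FOR]->(c)
MERGE (lucy)-[:HAS_SKILL]->(java)
MERGE (lucy)-[:HAS_SKILL]->(neo)
MERGE (bill)-[:HAS_SKILL]->(neo)
```

Because every clause is a MERGE, running the statement twice should not duplicate nodes or relationships.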

Candidate Data Model

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

Express Question as Graph Pattern

Which people, who work for the same company as me, have similar skills to me?

Cypher Query

Which people, who work for the same company as me, have similar skills to me?

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC

Graph Pattern

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)

Anchor Pattern in Graph

WHERE me.name = {name}

If an index for Person.name exists, Cypher will use it
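A sketch of creating that index, run as two separate statements. It uses the Neo4j 2.x-era syntax that matches the date of the talk; newer releases use a different CREATE INDEX form. The literal 'Ian' stands in for the {name} parameter.

```cypher
// Schema index on the property used to anchor the pattern.
CREATE INDEX ON :Person(name);

// With the index in place, the planner can find the starting node by lookup
// instead of scanning every Person node.
MATCH (me:Person)
WHERE me.name = 'Ian'
RETURN me;
```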

Create Projection of Results

RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC

First Match

Second Match

Third Match

Running the Query

+-----------------------------------+
| name   | score | skills           |
+-----------------------------------+
| "Lucy" | 2     | ["Java","Neo4j"] |
| "Bill" | 1     | ["Neo4j"]        |
+-----------------------------------+
2 rows

From User Story to Model and Query

As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge

Which people, who work for the same company as me, have similar skills to me?

Person WORKS_FOR Company
Person HAS_SKILL Skill

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC

Modeling Tips

Properties Versus Relationships

Use Relationships When…

• You need to specify the weight, strength, or some other quality of the relationship
• AND/OR the attribute value comprises a complex value type (e.g. address)
• Examples:
  – Find all my colleagues who are expert (relationship quality) at a skill (attribute value) we have in common
  – Find all recent orders delivered to the same delivery address (complex value type)

Use Properties When…

• There’s no need to qualify the relationship
• AND the attribute value comprises a simple value type (e.g. colour)
• Examples:
  – Find those projects written by contributors to my projects that use the same language (attribute value) as my projects
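The two cases can be contrasted in a short sketch, run as two separate statements. The level property, the Project label, and the project name are hypothetical; the slides name no concrete schema for these examples.

```cypher
// Skill as a node: the relationship can carry a quality (here, level).
// Language as a simple value: a plain property on the project is enough.
MERGE (ian:Person {name: 'Ian'})
MERGE (neo:Skill {name: 'Neo4j'})
MERGE (ian)-[:HAS_SKILL {level: 'expert'}]->(neo)
MERGE (proj:Project {name: 'graph-prototype', language: 'Java'});

// "Colleagues who are expert at a skill we have in common" filters on the relationship quality.
MATCH (me:Person {name: 'Ian'})-[:HAS_SKILL]->(skill)<-[r:HAS_SKILL]-(colleague)
WHERE r.level = 'expert'
RETURN colleague.name, skill.name;
```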

If Performance is Critical…

• Small property lookup on a node will be quicker than traversing a relationship
  – But traversing a relationship is still faster than a SQL join…
• However, many small properties on a node, or a lookup on a large string or large array property will impact performance
  – Always performance test against a representative dataset

Relationship Granularity

Align With Use Cases

• Relationships are the “royal road” into the graph
• When querying, well-named relationships help discover only what is absolutely necessary
  – And eliminate unnecessary portions of the graph from consideration

General Relationships

• Qualified by property

Specific Relationships

Best of Both Worlds
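The diagrams for these three slides did not survive in the transcript. One possible reading, sketched with hypothetical ADDRESS and DELIVERY_ADDRESS relationship types and run as three separate statements:

```cypher
// General relationship, qualified by a property.
MATCH (p:Person {name: 'Ian'})-[a:ADDRESS]->(addr)
WHERE a.type = 'delivery'
RETURN addr;

// Specific relationship: the type itself carries the meaning,
// so the query never touches other kinds of address.
MATCH (p:Person {name: 'Ian'})-[:DELIVERY_ADDRESS]->(addr)
RETURN addr;

// Best of both worlds (assumed): store both types side by side and use
// the general one for "any address", the specific one for a single kind.
MATCH (p:Person {name: 'Ian'})-[:ADDRESS]->(addr)
RETURN addr;
```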

Model and Query Recipes

Events and Actions

• Often involve multiple parties
• Can include other circumstantial detail, which may be common to multiple events
• Examples
  – Patrick worked for Acme from 2001 to 2005 as a Software Developer
  – Sarah sent an email to Lucy, copying in David and Claire
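The first example can be sketched as an event node that connects all the parties. The Employment and Role labels, the relationship types, and the property keys are hypothetical; the slide's own diagram is not in the transcript.

```cypher
// "Patrick worked for Acme from 2001 to 2005 as a Software Developer"
// modeled as an event node with one relationship per party.
MERGE (patrick:Person {name: 'Patrick'})
MERGE (acme:Company {name: 'Acme'})
MERGE (role:Role {name: 'Software Developer'})
CREATE (job:Employment {startYear: 2001, endYear: 2005})
CREATE (patrick)-[:EMPLOYMENT]->(job)
CREATE (job)-[:COMPANY]->(acme)
CREATE (job)-[:ROLE]->(role)
```

Because the role and company are shared nodes, other employment events can point at the same circumstantial detail.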

Timeline Trees

• Discrete events
  – No natural relationships to other events
• You need to find events at differing levels of granularity
  – Between two days
  – Between two months
  – Between two minutes

Example Timeline Tree
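The example diagram is missing from the transcript. A minimal sketch of a timeline tree, assuming hypothetical Year, Month, Day and Event labels with a value property, run as two separate statements:

```cypher
// Build one branch of the tree (year -> month -> day) and hang an event off the day.
MERGE (y:Year {value: 2014})
MERGE (y)-[:MONTH]->(m:Month {value: 12})
MERGE (m)-[:DAY]->(d:Day {value: 16})
CREATE (e:Event {name: 'AMS GraphDB meetup'})
CREATE (e)-[:OCCURRED_ON]->(d);

// Find events at month granularity by walking down from the year.
MATCH (:Year {value: 2014})-[:MONTH]->(:Month {value: 12})-[:DAY]->(d)<-[:OCCURRED_ON]-(e)
RETURN e.name, d.value
ORDER BY d.value;
```

Finer granularity (hours, minutes) would add further levels below Day in the same way.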

Pitfalls and Anti-Patterns

Modeling Entities as Relationships

• Limits data model evolution
  – A relationship connects two things
  – Modeling an entity as a relationship prevents it from being related to more than two things
• Smells:
  – Lots of attribute-like properties
  – Heavy use of relationship indexes
• Entities hidden in verbs:
  – E.g. emailed, reviewed

Example: Movie Reviews

• Initial requirements:
  – People review films
  – Application aggregates reviews from multiple sites

Initial Model

New Requirements

• Allow users to comment on each other’s reviews
  – Can’t connect a review to a third entity

Revised model
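The model diagrams are not in the transcript. A sketch of the refactoring, assuming the initial model stored the review as a REVIEWED relationship; all labels, relationship types, and sample values here are hypothetical.

```cypher
// Initial model (assumed): the review is a relationship, so nothing else can attach to it.
//   (:Person)-[:REVIEWED {text: '...'}]->(:Film)

// Revised model: the review is a node, so a comment can be connected to it.
MERGE (sarah:Person {name: 'Sarah'})
MERGE (lucy:Person {name: 'Lucy'})
MERGE (film:Film {title: 'Example Film'})
CREATE (review:Review {text: 'Worth seeing'})
CREATE (sarah)-[:WROTE_REVIEW]->(review)
CREATE (review)-[:OF_FILM]->(film)
CREATE (comment:Comment {text: 'Agreed'})
CREATE (lucy)-[:WROTE_COMMENT]->(comment)
CREATE (comment)-[:ON_REVIEW]->(review)
```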

Model Actions in Terms of Products

Now for Some Prototyping!

Draw a Model!

E.g. using Visio, www.apcjones.com/arrows, http://graphjson.io, Omnigraffle

Creating a prototype DB out of our model?

Now for Some Queries!

Next meetup!

• January 22nd: how to create an APPLICATION on top of our newly created database

BACKUP slides: Cypher Query Language

Nodes and Relationships

()-->()

Labels and Relationship Types

(:Person)-[:FRIEND]->(:Person)

Properties

(:Person{name:'Peter'})-[:FRIEND]->(:Person{name:'Lucy'})

Identifiers

(p1:Person{name:'Peter'})-[r:FRIEND]->(p2:Person{name:'Lucy'})

Cypher

MATCH graph_pattern
WHERE binding_and_filter_criteria
RETURN results

Cypher

MATCH (p:Person)-[:FRIEND]->(friends)
WHERE p.name = 'Peter'
RETURN friends

Lookup Using Identifier + Label

MATCH (p:Person)-[:FRIEND]->(friends)
WHERE p.name = 'Peter'
RETURN friends
