Post on 16-Aug-2015


Page 1: Graph Database workshop

Graph Database Workshop

Jeremy Deane
http://jeremydeane.net

(UberConf:Conference)-[:HOSTS]->(session:Session)
(developer:Person)-[:ATTENDS]->(session)
(session)-[:PROVIDES]->(skill:Skill)
(developer)-[:LEARNS]->(skill)

Cover.cql

Page 2: Graph Database workshop

Agenda

• Environment Setup
• Introduction
• Fundamentals
• Architecture
• Advanced Concepts

Generated with Graphgen - http://bit.ly/1HkTP20

Page 3: Graph Database workshop

Environment Setup

① Download Neo4j (2.2.3) - http://neo4j.com/download/
② Install to $NEO4J_HOME
③ Start Neo4j ($NEO4J_HOME/bin/neo4j start on Linux/OS X, or %NEO4J_HOME%\bin\Neo4j.bat on Windows)
④ Launch Browser - http://localhost:7474
⑤ Default UID/PW - neo4j/neo4j

Cypher Syntax Highlighting:
• Sublime 2 Package (Sublime 3 Manual Install)
• Vim Bundle
• IntelliJ Plug-in

#Start Neo4j (Bash)
function neorun() {
  cd "$NEO4J_HOME/bin"
  ./neo4j start
  cd "$HOME"
}

#Stop Neo4j (Bash)
function neostop() {
  cd "$NEO4J_HOME/bin"
  ./neo4j stop
  cd "$HOME"
}

Page 4: Graph Database workshop

Workshop Setup

① Clone or Download Github Repo - https://github.com/jtdeane/graph-workshop
② Unpack to $HOME/$WORKSHOP_HOME
③ Open $HOME/$WORKSHOP_HOME/Data Cheat Sheet
④ Bookmark or Open - http://neo4j.com/docs/stable/cypher-refcard/
⑤ Bookmark or Open - http://neo4j.com/docs/stable/

Suggested Naming Conventions
• Labels - CamelCase
• Relationships - SNAKE_CASE_UPPER_CASE
• Properties - snake_case_lower_case
• Indexes - snake_case_lower_case
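The suggested naming conventions can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical helper named follows_convention and regular expressions of my own choosing; none of this is part of Neo4j or the workshop repository.

```python
import re

# Hypothetical patterns for the suggested conventions (not a Neo4j feature).
CONVENTIONS = {
    # CamelCase: one or more capitalized words, no separators
    "label": re.compile(r"^(?:[A-Z][a-z0-9]*)+$"),
    # SNAKE_CASE_UPPER_CASE: upper-case words joined by underscores
    "relationship": re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$"),
    # snake_case_lower_case: lower-case words joined by underscores
    "property": re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$"),
    "index": re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$"),
}


def follows_convention(kind: str, name: str) -> bool:
    """Return True when name matches the suggested convention for kind."""
    return bool(CONVENTIONS[kind].match(name))
```

For example, follows_convention("relationship", "TREATED_BY") passes, while follows_convention("relationship", "treatedBy") does not.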

Page 5: Graph Database workshop

Domain Model

[Diagram: domain model with Practitioner, Patient, and Organization nodes, connected by TREATED_BY, WORKS_FOR, and MAINTAINS relationships; the diagram also carries a LOCATION label.]

Page 6: Graph Database workshop

Explore Web Console

//Create Node
CREATE (:Practitioner {name:"Zachary Smith", specialty:"General Medicine"})

//Retrieve Node
MATCH (p:Practitioner) RETURN p

//Update Node
MATCH (p) WHERE p.name="Zachary Smith" SET p.specialty="Neurosurgery"

//Retrieve Updated Node
MATCH (p:Practitioner) WHERE p.specialty="Neurosurgery" RETURN p.name, p.specialty

//Retrieve Node by ID
MATCH (p) WHERE ID(p)=0 RETURN p

//Delete Node By ID
MATCH (p) WHERE ID(p)=0 DELETE p

//Merge Node
MERGE (p:Practitioner {name:"Zachary Smith"})
ON CREATE SET p.created=timestamp()
ON MATCH SET p.updated=timestamp()

Hello.cql  

Page 7: Graph Database workshop

Explore Web Console

//Create Node
CREATE (:Patient {name:"Tim Smith", birth_date:"1965-06-27", conditions:["Diabetes", "Epilepsy"]})

//Create Relationship Long - Requires Patient Tim Smith and Practitioner Zachary Smith
MATCH (p:Practitioner {name:"Zachary Smith"})
MATCH (m:Patient {name:"Tim Smith"})
CREATE (m)-[r:TREATED_BY]->(p)
RETURN m, r, p

//Create Relationship Medium - Requires Practitioner Zachary Smith
MATCH (p:Practitioner {name:"Zachary Smith"})
CREATE (m:Patient {name:"Holly Goodwin", birth_date:"1991-11-17"})-[r:TREATED_BY]->(p)
RETURN m, r, p

//Create Nodes and Relationship Short
CREATE (m:Patient {name:"Jackie Bonk", birth_date:"1978-12-15"})-[r:TREATED_BY]->(p:Practitioner {name:"Yuri Zhivago", specialty:"Immunology"})
RETURN m, r, p

//Clean out all Nodes and Relationships (careful!)
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r

Hello.cql  

Page 8: Graph Database workshop

Introduction

Neuron from mouse cerebellum (160x) - http://bit.ly/1Ja1VrJ

Page 9: Graph Database workshop

What are Graphs?

Graph Theory: a graph is a representation of a set of Objects where some pairs of objects are connected by Links

Seven Bridges of Königsberg - http://bit.ly/1Lv7C66

• Objects are Vertices (Nodes)
• Links are Edges (Relationships)

Property Graph: Nodes & Relationships with key-value pairs (Properties)

Neo4j Property Graph: Nodes grouped by Labels
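To make the property graph definition concrete, here is a minimal Python sketch of nodes, labels, relationships, and properties as plain data structures. The class and function names (Node, Relationship, nodes_with_label) are illustrative assumptions and say nothing about how Neo4j stores a graph internally; the pcp property value is also assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # labels group nodes, e.g. {"Patient"}
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    start: Node                                      # relationships are directional
    type: str
    end: Node
    properties: dict = field(default_factory=dict)   # relationships carry properties too


practitioner = Node({"Practitioner"}, {"name": "Zachary Smith", "specialty": "General Medicine"})
patient = Node({"Patient"}, {"name": "Tim Smith", "birth_date": "1965-06-27"})
treated_by = Relationship(patient, "TREATED_BY", practitioner, {"pcp": True})  # pcp value assumed


def nodes_with_label(nodes, label):
    """Return the nodes that carry the given label."""
    return [node for node in nodes if label in node.labels]
```

Calling nodes_with_label([practitioner, patient], "Patient") selects only the patient node, which is roughly what a label match does in a Cypher pattern.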

Page 10: Graph Database workshop

NoSQL Landscape

Sadalage/Fowler - http://amzn.to/1Lv8W8Z

• Column
• Key-Value
• Document
• Graph

Page 11: Graph Database workshop

Graph - Relational Database Comparison

Relational Databases are great for storing transactional data in tables

Graph Databases are great for storing semantically rich connected data in nodes and relationships

Depth   RDB time (ms)   GDB time (ms)   # records
2       16              10              ~2,500
3       30,267          168             ~110,000
4       1,543,505       1,359           ~600,000
5       hang            2,132           ~800,000

From "Graph Databases" by Robinson, Webber and Eifrem, 2013, page 20

Degrees of separation between you and Kevin Bacon: the Relational Database falls over as depth increases

Relational Databases require considerable up-front design (e.g. Normalization), resulting in a rigid schema

Graph Databases require no schema and support an emergent design approach

Page 12: Graph Database workshop

Graph Database Use Cases

• Social (Professional) Network
• Route Finding and Logistics
• Network and System Operations
• Security and Advanced Analytics

http://bit.ly/1fYwEOO

Page 13: Graph Database workshop

Fundamentals

Custom Circuit Board Design - http://bit.ly/1Ja4kTb

Page 14: Graph Database workshop

Domain Model

[Diagram: domain model with Practitioner, Patient, and Organization nodes, connected by TREATED_BY, WORKS_FOR, and MAINTAINS relationships; the diagram also carries a LOCATION label.]

Page 15: Graph Database workshop

Initial Data Load

① Execute the Favorite "Clean database or nodes and relationships" OR execute the Clean.cql statement at the end of this slide
② Import the new Favorite "Initial Data Load"
③ Execute "Initial Data Load" OR
④ Open data.cql and copy its contents
⑤ Paste and execute in the Web Console

//Clean out all Nodes and Relationships (careful!)
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r

Clean.cql  

Page 16: Graph Database workshop

Nodes

[Diagram: a node captioned "Smith"]

• (Node) is a thing or noun
• (Node) has :Properties, e.g. {name: "Zachary Smith", specialty: "General Medicine"}
• (:Label) groups (Node)s, e.g. :Practitioner

//Retrieve a Node with Label Practitioner and a name property equal to Zachary Smith
MATCH (p:Practitioner) WHERE p.name="Zachary Smith" RETURN p

//Retrieve all Nodes with Label Patient and order by birth date
MATCH (m:Patient) RETURN UPPER(m.name), m.birth_date ORDER BY m.birth_date

//Retrieve all Nodes with Label Patient and with diabetes
MATCH (m:Patient) WHERE "Diabetes" IN m.conditions RETURN m

//Retrieve all Nodes with Label Patient and without diabetes
MATCH (m:Patient) WHERE NOT("Diabetes" IN m.conditions) RETURN m

Fundamentals.cql  

Page 17: Graph Database workshop

Relationships

[Diagram: Patient -TREATED_BY-> Practitioner]

(:Relationship) describes how (Node)s are related

(:Relationship)s are directional and cannot exist without both (Node)s

//Retrieve all Nodes with WORKS_AT Relationship
MATCH (a)-[r:WORKS_AT]->(b) RETURN a,r,b

(:Relationship)s are verbs and can have :Properties, e.g. {pcp: true}

//Retrieve all Nodes with TREATED_BY Relationship with PCP false
MATCH (a)-[r:TREATED_BY {pcp:false}]->(b) RETURN a,r,b

//Count the distinct Nodes that MAINTAIN a Node
MATCH (a)-[:MAINTAINS]->(b) RETURN COUNT(DISTINCT a)

Fundamentals.cql  

Page 18: Graph Database workshop

Modeling

• Graphs read as natural language: Subject {Noun} - Acts Upon {Verb} -> Object {Noun}
• Graphs are modeled with Circles, Boxes and Arrows
• Graph models translate to ASCII art:

MATCH (Identifier:Label)-[Identifier:Relationship]->(Identifier:Label)

• Graph modeling is very expressive and whiteboard friendly
• Modeling Strategies - model using Domain Driven Design (DDD) or model by Questions (e.g. What do you want to do?)

http://amzn.to/1GUkNKA

Page 19: Graph Database workshop

Modeling Guidelines

• Do not replicate all entity details into Node Properties. Leverage a relational or document database as System of Record or History.

• Create semantically rich relationships, avoiding generic verbs like HAS, CONTAINS, or IS.

• When possible, qualify a relationship with additional information (e.g. weight, origin, or date range) - "Strengthen vs. Atrophy"

• Avoid duplicate relationships - (a)-[:likes]->(b)-[:likes]->(a)

• Use Linked Lists to increase performance (e.g. head, previous)

• Leverage an intermediate Node for n-ary relationships (e.g. Software, Version, Developer, Organization)
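The last guideline, an intermediate Node for n-ary relationships, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names (Dana, ExampleApp, 1.0, Acme), the relationship types, and the helper functions are all hypothetical, and relationships are modeled as plain (start, TYPE, end) triples.

```python
# Each relationship is a (start, TYPE, end) triple; nodes are plain strings here.
relationships = []


def relate(start, rel_type, end):
    """Record one directed, binary relationship."""
    relationships.append((start, rel_type, end))


# A single binary relationship cannot connect four participants at once,
# so a hypothetical intermediate node ties them together.
contribution = "Contribution#1"
relate("Dana", "MADE", contribution)
relate(contribution, "TO_SOFTWARE", "ExampleApp")
relate(contribution, "FOR_VERSION", "1.0")
relate(contribution, "ON_BEHALF_OF", "Acme")


def participants(intermediate):
    """Return every node attached to the intermediate node, in either direction."""
    found = set()
    for start, _, end in relationships:
        if start == intermediate:
            found.add(end)
        elif end == intermediate:
            found.add(start)
    return found
```

Querying participants(contribution) returns all four participants, whereas four separate pairwise relationships would lose the fact that they belong to the same event.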

Page 20: Graph Database workshop

Application Programming Interfaces

• REST Web Service API
• Java Platform Support
• Other Popular Languages (C#, Ruby, Python, PHP)

Under the covers - Java Options:
• Core API
• Traversal Framework
• Cypher Query Language (CQL)

Cypher Transactional HTTP Endpoint
POST http://localhost:7474/db/data/transaction/commit

GET http://localhost:7474/db/data

Page 21: Graph Database workshop

HTTP Interactions

① Install the Postman Chrome Plug-in: http://bit.ly/1NooOJr (or similar)
② Set the Authorization Header (HTTP Basic)
③ Issue GET http://localhost:7474/db/data and follow the links in the response
④ Explore links (e.g. GET http://localhost:7474/db/data/relationship/types)
⑤ Query via the HTTP Transactional Endpoint:

POST http://localhost:7474/db/data/transaction/commit
Accept: application/json
Content-Type: application/json

{
  "statements" : [ {
    "statement" : "MATCH (p:Practitioner) WHERE p.name={name} RETURN p",
    "parameters" : {
      "name" : "Zachary Smith"
    }
  } ]
}
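The same request can be assembled from a script. Below is a minimal Python sketch using only the standard library; the function name build_cypher_request is an assumption, and the actual send is left commented out because it needs a running server and credentials.

```python
import json
import urllib.request


def build_cypher_request(statement, parameters, base_url="http://localhost:7474"):
    """Build (but do not send) a POST to the transactional commit endpoint."""
    payload = {"statements": [{"statement": statement, "parameters": parameters}]}
    return urllib.request.Request(
        url=base_url + "/db/data/transaction/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_cypher_request(
    "MATCH (p:Practitioner) WHERE p.name={name} RETURN p",
    {"name": "Zachary Smith"},
)

# Sending requires a running Neo4j server and an Authorization header:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Passing the name as a parameter rather than concatenating it into the statement keeps the query text constant and avoids quoting problems.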

Page 22: Graph Database workshop

Testing

Options:
• Manual testing via REST Clients
• Unit Testing via Framework (e.g. JUnit)
• Functional Testing via Framework (e.g. RobotFramework or SoapUI)

① Requires - https://github.com/jtdeane/graph-workshop
② Navigate to $HOME/$WORKSHOP_HOME/testing
③ To execute the tests, enter mvn test
④ Optionally update the Java code to output results to the console
⑤ To re-execute the tests, enter mvn test

Page 23: Graph Database workshop

Architecture

http://bit.ly/1Ja2npT

Page 24: Graph Database workshop

Graph Database - Architecture

[Diagram: Neo4j, installed under $NEO4J_HOME and running on a Java Runtime Environment, with these components: Language APIs, Caches, Files, HA Support, Logging, Plug-ins and Extensions]

Community & Enterprise Edition
• Community is GPLv3
• Enterprise Edition relaxes Consistency (ACID)

Page 25: Graph Database workshop

Graph Database - Server Modes

[Diagram: two deployment stacks. Embedded mode: Java Runtime Environment, Server Libraries, Embedded Neo4j, Application. Server mode: Embedded Web Server, Java Runtime Environment, Server Libraries, Neo4j Server, Extensions & Plug-ins, External Application (Client).]

Page 26: Graph Database workshop

Graph Database - Server Extension

① Requires - https://github.com/jtdeane/graph-workshop
② Navigate to $HOME/$WORKSHOP_HOME/extension
③ Build the extension JAR - graph-extension-1.0.0.jar
④ Copy the JAR from ../target to $NEO4J_HOME/plugins
⑤ Register the extension by updating $NEO4J_HOME/conf:
   org.neo4j.server.thirdparty_jaxrs_classes=ws.cogito.graphs=/extensions/
⑥ Restart Neo4j
⑦ Using a REST Browser Client (e.g. Postman), query practitioners:
   http://localhost:7474/extensions/directory/practitioners

Page 27: Graph Database workshop

Deployment Topologies

• Single Community Server (Non-Production Environments)
• Non-Clustered Community Servers - Cold Standby
• HA Clustered Enterprise Servers (Master-Slave)

[Diagram: Enterprise Edition High Availability - three Linux VMs, each with a Java Runtime Environment, running Neo4j (Master), Neo4j (Slave), and Neo4j (Slave). Diagram labels: "Read Consistent - Write Lock" and "Read Write Consistent".]

Page 28: Graph Database workshop

Operations & Security

• Operating System and Server Process Monitoring (e.g. Zabbix)

• Log Monitoring and Alerting (e.g. Splunk or Logstash)

• Secure Communications via SSL

• Use HTTP Basic Authentication for Console and REST API Access

• Web Console and REST API are on the same Port

• HTTP Basic should only be used over HTTPS

• Graph Governance is up to you!
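The pairing of HTTP Basic with HTTPS becomes clear once the header is built by hand. This is a minimal Python sketch with an assumed helper name (basic_auth_header); it shows that the credentials are only Base64-encoded, not encrypted.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header.

    Base64 is an encoding, not encryption: anyone who can read the header
    can recover the password, so the connection itself must be encrypted.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


def decode_basic_auth(header_value):
    """Recover 'username:password' from a Basic header value (what an eavesdropper could do)."""
    encoded = header_value.split(" ", 1)[1]
    return base64.b64decode(encoded).decode("utf-8")
```

Because decode_basic_auth reverses the header without any key, sending it over plain HTTP exposes the password to anyone on the network path.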

Page 29: Graph Database workshop

Advanced Concepts

Bee Pollination - http://bit.ly/1HkAa2c

Page 30: Graph Database workshop

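Domain Model

[Diagram: extended domain model with Practitioner, Patient, Organization, and Caregiver nodes, connected by TREATED_BY and WORKS_FOR relationships (WORKS_FOR appears twice); the diagram also carries a LOCATION label.]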

Page 31: Graph Database workshop

Bulk Loads

• Batch API (transactional) - POST http://localhost:7474/db/data/batch
• Batch Inserter (bypasses Transactions) - Java Only
• Importing Comma Separated Values (CSV)

//load caregiver nodes
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/CaregiverNodes.csv" AS csvLine
CREATE (g:Caregiver {name: csvLine.name, guardian: csvLine.guardian})
RETURN *

//load caregiver patient relationships
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/PatientRelationships.csv" AS csvLine
MATCH (giver:Caregiver { name:(csvLine.giver)}), (patient:Patient { name:(csvLine.patient)})
CREATE (giver)-[:CARES_FOR { type:(csvLine.type) }]->(patient)
RETURN *

//load caregiver organization relationships
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/OrganizationRelationships.csv" AS csvLine
MATCH (giver:Caregiver { name:(csvLine.giver)}), (org:Organization { name:(csvLine.organization)})
CREATE (giver)-[:WORKS_FOR { type:(csvLine.status) }]->(org)
RETURN *

Advanced.cql  
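To see what each csvLine looks like during a LOAD CSV WITH HEADERS import, the following Python sketch reads a small CSV the same way: the first row supplies the keys and every value arrives as a string. The file contents and the second caregiver are hypothetical, since the workshop's CSV data is not shown in the slides.

```python
import csv
import io

# Hypothetical file contents; the real CaregiverNodes.csv is not shown in the slides.
CAREGIVER_CSV = """name,guardian
Florence Nightingale,false
Clara Barton,true
"""


def load_csv_with_headers(text):
    """Return one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def caregiver_parameters(rows):
    """Map each row to the properties set on a Caregiver node by the CREATE clause."""
    return [{"name": row["name"], "guardian": row["guardian"]} for row in rows]


rows = load_csv_with_headers(CAREGIVER_CSV)
```

Note that guardian comes through as the string "true" or "false" rather than a boolean; an import that needs typed values has to convert them explicitly.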

Page 32: Graph Database workshop

More Graph Queries

//Find patients who are also practitioners
MATCH (m:Patient), (p:Practitioner) WHERE m.name=p.name RETURN p

//All paths to Lovee Johnson
MATCH paths = (m:Patient)-[*]-(node) WHERE m.name="Lovee Johnson" RETURN paths

//Shortest path from Lovee Johnson to Florence Nightingale
MATCH (m:Patient {name:"Lovee Johnson"}), (g:Caregiver {name:"Florence Nightingale"}), path = shortestPath((m)-[*..10]-(g))
RETURN path

//Patients with more than one practitioner
MATCH (patient:Patient)-[:TREATED_BY]->(practitioner)
WITH patient, count(practitioner) AS practitioners
WHERE practitioners > 1
RETURN patient

//All patients with a PCP having a name ending in 'y' (REGEX)
MATCH (m:Patient)-[:TREATED_BY {pcp:true}]->(p:Practitioner) WHERE p.name =~ ".*y" RETURN m,p

Java 1.7 Regex - http://bit.ly/1LEvt3j

//Return the patients with a family caregiver and their practitioners
MATCH (g:Caregiver)-[:CARES_FOR {type:"Family"}]->(m:Patient)-[:TREATED_BY]->(p:Practitioner) RETURN m, p, g

Advanced.cql  
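The idea behind the shortest-path query, an undirected search capped at 10 hops, can be illustrated with a breadth-first search in Python. This is a sketch only and not a description of how Neo4j implements shortestPath; the edges (including "General Hospital") are hypothetical sample data.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency map that ignores relationship direction."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path(adjacency, start, goal, max_hops=10):
    """Return the first path found by breadth-first search, or None if the goal is not within max_hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop limit reached, do not expand further
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical sample data, not the workshop's data set.
graph = undirected([
    ("Lovee Johnson", "Zachary Smith"),
    ("Zachary Smith", "General Hospital"),
    ("General Hospital", "Florence Nightingale"),
    ("Lovee Johnson", "Tim Smith"),
])
```

With the default limit the search finds the three-hop path through Zachary Smith and General Hospital; with max_hops=2 it returns None, which mirrors the effect of the upper bound in [*..10].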

Page 33: Graph Database workshop

Traversals

• Depth-first search (DFS) - Default Neo4j Behavior
• Breadth-first search (BFS)

[Diagram: example tree with nodes numbered 1-8]

Visit orders shown for the example tree: 1,2,5,6,3,4,7,8 and 1,2,3,4,5,6,7,8

• Evaluators - e.g. Maximum Depth
• Filters - e.g. Uniqueness
• Path Expanders - e.g. Direction

• REST API - Executes arbitrary JavaScript code
• Java API - Requires in-depth knowledge of your Graph
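The difference between the two traversal orders can be reproduced with a short Python sketch. The tree shape below is an assumption (nodes numbered level by level), because the slide's diagram did not survive extraction; it was chosen because it yields the two visit orders listed on the slide.

```python
from collections import deque

# Assumed tree, numbered level by level: 1 -> 2, 3, 4; 2 -> 5, 6; 4 -> 7, 8.
TREE = {1: [2, 3, 4], 2: [5, 6], 3: [], 4: [7, 8], 5: [], 6: [], 7: [], 8: []}


def depth_first(tree, root):
    """Visit a node, then fully explore each child before moving to its sibling."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))  # reversed so the leftmost child is popped first
    return order


def breadth_first(tree, root):
    """Visit all nodes at one depth before descending to the next depth."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order
```

On this assumed tree, depth_first(TREE, 1) gives 1,2,5,6,3,4,7,8 and breadth_first(TREE, 1) gives 1,2,3,4,5,6,7,8. The only structural difference between the two functions is a stack versus a queue.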

Page 34: Graph Database workshop

Indexes

Automatic Indexing - $NEO4J_HOME/conf/neo4j.properties

# Enable auto-indexing for nodes, default is false.
node_auto_indexing=true

# The node property keys to be auto-indexed, if enabled.
node_keys_indexable=name

# Enable auto-indexing for relationships, default is false.
relationship_auto_indexing=true

# The relationship property keys to be auto-indexed, if enabled.
relationship_keys_indexable=pcp,type

Cypher Index Commands

//create index on Patient Label
CREATE INDEX ON :Patient(name)

//drop index on Patient Label
DROP INDEX ON :Patient(name)

GET http://localhost:7474/db/data/schema/index/Patient

Advanced.cql  

Page 35: Graph Database workshop

Constraints

Create and Drop Constraints

//create Unique Practitioner constraint
CREATE CONSTRAINT ON (practitioner:Practitioner) ASSERT practitioner.name IS UNIQUE

//attempt to create duplicate Practitioner - should fail
CREATE (McCoy:Practitioner {name:"Leonard McCoy", specialty:"General Medicine"})

//drop Unique Practitioner constraint
DROP CONSTRAINT ON (practitioner:Practitioner) ASSERT practitioner.name IS UNIQUE

No way to retrieve a list of Indexes or Constraints via Cypher (yet); use the REST API instead:

GET http://localhost:7474/db/data/schema/constraint/Practitioner

Advanced.cql  

Page 36: Graph Database workshop

Visualization

• Neo4j Web Console - http://localhost:7474/browser
• Data Driven Documents (D3.js) - http://d3js.org/
• Alchemy.js - http://bit.ly/1NwH7fB
• Linkurious.js - http://linkurio.us/toolkit/
• VivaGraph.js - https://github.com/anvaka/VivaGraphJS

Boston Hubway Graph - by Max De Marzi

Execution from Scripts (<script>) or Node.JS

Requires data transformation (e.g. into Nodes and Relationship Arrays)
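The data transformation step can be sketched as follows: query results arrive as rows, while graph-drawing libraries typically expect one array of nodes and one array of relationships. This Python sketch assumes an index-based "nodes" and "links" shape, which force-directed layouts commonly consume; the function name and the exact keys are assumptions, not the input format of any specific library listed above.

```python
def to_nodes_and_links(rows):
    """Convert (source, relationship_type, target) rows into node and link arrays.

    Each distinct name becomes one node; links refer to nodes by array index.
    """
    index, nodes, links = {}, [], []

    def node_index(name):
        # Register a node the first time its name is seen.
        if name not in index:
            index[name] = len(nodes)
            nodes.append({"name": name})
        return index[name]

    for source, rel_type, target in rows:
        links.append({
            "source": node_index(source),
            "target": node_index(target),
            "type": rel_type,
        })
    return {"nodes": nodes, "links": links}


graph_data = to_nodes_and_links([
    ("Tim Smith", "TREATED_BY", "Zachary Smith"),
    ("Holly Goodwin", "TREATED_BY", "Zachary Smith"),
])
```

The shared practitioner appears once in the node array and is referenced by both links, which is what lets the renderer draw a single node with two incoming edges.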

Page 37: Graph Database workshop

Questions & Feedback

My Contact Information:

Jeremy Deane
Director of Software Architecture, NaviNet
[email protected]
http://jeremydeane.net

https://github.com/jtdeane/graph-workshop

Page 38: Graph Database workshop

Supplemental

//Aggregate all providers
MATCH (c:Caregiver) RETURN c.name AS names
UNION
MATCH (p:Practitioner) RETURN p.name AS names

//Practitioners with patient counts
MATCH (m:Patient)-[:TREATED_BY]->(p:Practitioner)
WITH p, COUNT(m) AS patients
RETURN p.name, patients

//Patients with provider counts (Practitioner and/or Caregiver)
MATCH (m:Patient)-[:TREATED_BY|:CARES_FOR]-(r)
WITH DISTINCT (m), COUNT(r) AS providers
RETURN m.name, providers

//All Patients with Caregiver (and without = null)
MATCH (m:Patient)
OPTIONAL MATCH (m)<-[:CARES_FOR]-(c:Caregiver)
RETURN m.name, COALESCE(c.name,"INDEPENDENT")

//Profile simple query
PROFILE MATCH (p:Practitioner) WHERE p.name="Zachary Smith" RETURN p

//Profile complex query
PROFILE MATCH (m:Patient), (p:Practitioner) WHERE m.name=p.name RETURN p

Supplemental.cql