an overview of neo4j - uniroma1.itrosati/dmds-1819/neo4j.pdf · neo4j: cypher’s introduction...

27
Data Management for Data Science Master of Science in Data Science Facoltà di Ing. dell'Informazione, Informatica e Statistica Sapienza Università di Roma AA 2018/2019 Domenico Lembo Dipartimento di Ingegneria Informatica, Automatica e Gestionale A. Ruberti An Overview of Neo4j

Upload: others

Post on 25-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

Data Management for Data Science

Master of Science in Data Science

Facoltà di Ing. dell'Informazione, Informatica e Statistica Sapienza Università di Roma

AA 2018/2019

Domenico Lembo Dipartimento di Ingegneria Informatica,

Automatica e Gestionale A. Ruberti

An Overview of Neo4j

Page 2: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:Overview

Neo4j:

•  uses a graph model for data representation.

•  supports full ACID transactions.

•  comes with a powerful, human readable graph query language.

•  provides a powerful traversal framework for high-speed graph queries.

•  can be used in embedded mode (the db is incorporated in the application), or server mode, the db is a process in itself which can be accessed through REST Interface.

•  does not allow for sharding, then the entire graph must be stored in a single machine (at the moment, Neo4j supports cache sharding, which allows for directing queries to instances that only have certain parts of the cache preloaded).

Page 3: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:DataModel

Neo4j is entirely implemented in Java. Neo4j's data model is a Property Graph, consists of labeled nodes and relationships each with properties, that is characterized by the following elements: •  Nodes are just data records, usually denoting entities (e.g., individuals). •  Relationships connect two nodes. •  Properties are simple key-value pairs. Properties can be attached to both nodes

and relationships

Page 4: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NodesinNEO4J

•  Every node can have different properties

Page 5: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

RelationshipsinNEO4J

•  Every relationship has a direction

Page 6: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

PropertiesinNEO4J

Page 7: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

LabelsinNEO4J

•  Used to represent roles played by objects (said in other terms they indicate categories node objects belong to)

•  Every node can have zero or more labels

Page 8: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

PathsinNEO4J

•  It is one or more nodes with connecting relationships

Page 9: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

TraversalinNEO4J

•  A Traversal is how you query a Graph, navigating from starting nodes to related nodes according to an algorithm.

Page 10: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:ExampleofDataModel•  TomHanksisanActor.

•  RonHowardisaDirector.

•  “TheDaVinciCode”isamovie.

•  DirectorsandActorsarePersons.

•  TomHankshasanactingrolein“TheDaVinciCode”

•  “TheDaVinciCode”isdirectedbyRonHoward

•  TheroleofTomHanksin“TheDaVinciCode”isRobertLangdon

•  TomHanksknowsRonHowardsince1987.

Page 11: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

Example:Nodes

A

B C

Page 12: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

Example:Relationships

A

B C

ACTED_IN

DIRECTED

KNOWS

Page 13: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

Example:Properties

A

B C

name:TomHanks

name:RonHoward title:TheDaVinciCode

KNOWS

since:1987 roles:[RobertLangdon]

ACTED_IN

DIRECTED

Page 14: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

Example:Labels

A

B C

ACTED_IN

DIRECTED

KNOWS

name:TomHanks

since:1987 roles:[RobertLangdon]

name:RonHoward title:TheDaVinciCode

ACTOR

DIRECTOR

PERSON

PERSON

MOVIE

Page 15: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:Storage•  NEO4J uses native graph storage, which is optimized and designed for

storing and managing graphs. Coherently, it adopts a native graph processing: it leverages index-free adjacency, meaning that connected nodes physically “point” to each other in the database.

•  Neo4j integrates an indexing service based on Lucene that allows to store nodes referring to a label, and then access to the iterator of nodes. There are server plugins that allow to automatically index nodes.

•  It is finally provided with an indexing service based on the timestamp that allows to obtain the nodes corresponding to a time and a date included in a certain range

Page 16: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:Cypher’sintroduction

Cypher is a declarative, SQL inspired language for describing patterns in graphs. It allows us to describe what we want to select, insert, update or delete from a graph database without requiring us to describe exactly how to do it. Cypher uses ASCII-Art* to represent patterns. *ASCII-Art is a graphic design technique that uses computers for presentation and consists of pictures pieced together from the 95 printable (from a total of 128) characters defined by the ASCII - American Standard Code for Information Interchange (from Wikipedia)

Page 17: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NO4J:NodesinCypher

A

B C

(A)(B)(C)

Thetranslationincypheris:

Page 18: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:RelationshipsinCypher

A

B C

ACTED_IN

DIRECTED

KNOWS

(B)-[:DIRECTED]->(C)(A)-[:ACTED_IN]->(C)(A)-[:KNOWS]->(B)

Thetranslationincypheris:

Page 19: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NO4J:PropertiesinCypher

A

B C

name:TomHanks

name:RonHoward title:TheDaVinciCode

KNOWS

since:1987 roles:[RobertLangdon]

ACTED_IN

DIRECTED

Thetranslationincypheris:

(A{name:"TomHanks"})(B{name:"RonHoward"})(C{title:"TheDaVinciCode"})(A)-[:ACTED_IN{roles:["RobertLangdon"]}]->(C)(A)-[:KNOWS{since:1987}]->(B)

Page 20: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:LabelsinCypher

A

B C

ACTED_IN

DIRECTED

KNOWS

name:TomHanks

since:1987 roles:[RobertLangdon]

name:RonHoward title:TheDaVinciCode

ACTOR

DIRECTOR

PERSON

PERSON

MOVIE

Thetranslationincypheris:

(A:PERSON)(B:PERSON)(C:MOVIE)(A:ACTOR)(B:DIRECTOR)

Page 21: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

NEO4J:Cypher’squerystructureQueryingthegraph•  MATCH:Primarywayofgettingdatafromthedatabase.

WHERE:Filterstheresults.RETURN:Returnsandprojectsresultdata.ORDERBY:Sortsthequeryresult.SKIP/LIMIT:Paginatesthequeryresult.

Updatingthegraph•  CREATE:Createsnodesandrelationships.

DELETE:Removesnodes,relationships.SET:Updatespropertiesandlabels.REMOVE:Removespropertiesandlabels.FOREACH:Performsupdatingactionsonceperelementinalist,e.g.,returnedbyamatch.

Page 22: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

CYPHERSCRIPTCREATE(TheDaVinciCode:Movie{title:'TheDaVinciCode',released:2006,

tagline:'BreakTheCodes'})CREATE(TomH:Person:Actor{name:'TomHanks',born:1956})CREATE(RonH:Person:Director{name:'RonHoward',born:1954})CREATE(TomH)-[:ACTED_IN{roles:['Dr.RobertLangdon']}]->(TheDaVinciCode)CREATE(RonH)-[:DIRECTED]->(TheDaVinciCode)CREATE(TomH)-[:KNOWS{since:1987}]->(RonH)

Page 23: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

EXAMPLEQUERYINCYPHER

ReturnthetitlesofthefilmswhereTomHanksactedinanddirectedbyRonHoward

MATCH(node1)-[:ACTED_IN]->(node2)<-[:DIRECTED]-(node3)WHEREnode1.name="TomHanks"ANDnode3.name="RonHoward"RETURNnode2.titleastitle

MATCH(node1:Person{name:"TomHanks"})-[:ACTED_IN]->(node2)<-[:DIRECTED]-(node3{name:"RonHoward"})RETURNnode2.titleastitle

AlternativeFormulation

Page 24: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

WHERECLAUSE(basics)

YoucanusethebooleanoperatorsAND,OR,XORandNOT

Tofilternodesbylabel,writealabelpredicateaftertheWHEREkeywordusingWHEREn:foo.

MATCH(n)WHEREn.name='Peter'XOR(n.age<30ANDn.name='Timothy')ORNOT(n.name='Timothy'ORn.name='Peter')RETURNn.name,n.age

MATCH(n)WHEREn:SwedishRETURNn.name,n.age

Page 25: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

EXAMPLEUPDATINGinNEO4J

CreateanodePersonforTomHankswithnameattribute:CREATE(n:Person{name:"TomHanks"});

Deleteanodewithnameattribute="TomHanks"ifitexists:MATCH(n{name:"TomHanks"})DELETEn

Updateanodewithnameattribute="TomHanks"withtheattributeage=63:MATCH(n{name:"TomHanks"})SETn.age=63

Page 26: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

OthecommandsinCypher

ID:allowstoretrieveanodewithacertainneo4jassignedidentifiercount(rel/node/prop):addupthenumberofoccurrencesmin(n.prop):getthelowestvaluemax(n.prop):getthehighestvaluesum(n.prop):getthesumofnumericvaluesavg(n.prop):gettheaverageofanumericvalueDISTINCT:removeduplicatescollect(n.prop):collectsallthevaluesintoalistExamples:MATCH (s) WHERE ID(s)=100 RETURN s MATCH (n:Person) RETURN count(*) MATCH (n:Person) RETURN avg(n.age) MATCH (n:Person) RETURN collect(n.born)

Page 27: An Overview of Neo4j - uniroma1.itrosati/dmds-1819/Neo4J.pdf · NEO4J: Cypher’s introduction Cypher is a declarative, SQL inspired language for describing patterns in graphs. It

Credits

TheseSlidesforthemostareadaptedbytheoriginalslideofastudentprojectcarriedoutbyGiulioGanino.Themainbibliographicsourcesusedfortheirpreparationare:www.neo4j.org/ IanRobinson,JimWebber,andEmilEifrem,GraphDatabasesJonasPartner,AleksaVukotic,andNickiWatt.Neo4jinAction.2012