the neo4j manual v2.3.0
TRANSCRIPT
The Neo4j Manual v230
The Neo4j Team neo4jcom1
1 httpneo4jcom
The Neo4j Manual v230by The Neo4j Team neo4jcom1
Publication date 2015-10-16 232046Copyright copy 2015 Neo Technology
Starting points
bull What is the Neo4j graph databasebull Cypher Query Languagebull REST APIbull Installationbull Upgradingbull Securitybull Resources
License Creative Commons 30This book is presented in open source and licensed through Creative Commons 30 You are free to copy distribute transmit andoradapt the work This license is based upon the following conditions
Attribution You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests thatthey endorse you or your use of the work)
Share Alike If you alter transform or build upon this work you may distribute the resulting work only under the same similar ora compatible license
Any of the above conditions can be waived if you get permission from the copyright holder
In no way are any of the following rights affected by the license
bull Your fair dealing or fair use rightsbull The authorrsquos moral rightsbull Rights other persons may have either in the work itself or in how the work is used such as publicity or privacy rights
NoteFor any reuse or distribution you must make clear to the others the license terms of this work The best way to do this iswith a direct link to this page httpcreativecommonsorglicensesby-sa30 2
1 httpneo4jcom2 httpcreativecommonsorglicensesby-sa30
iv
Preface vI Introduction 1
1 Neo4j Highlights 32 Graph Database Concepts 4
II Tutorials 143 Introduction to Cypher 164 Use Cypher in an application 465 Basic Data Modeling Examples 476 Advanced Data Modeling Examples 627 Languages 96
III Cypher Query Language 1028 Introduction 1059 Syntax 11810 General Clauses 13611 Reading Clauses 15512 Writing Clauses 18613 Functions 21414 Schema 24315 Query Tuning 25316 Execution Plans 259
IV Reference 27717 Capabilities 27918 Transaction Management 28519 Data Import 29520 Graph Algorithms 29621 REST API 29722 Deprecations 433
V Operations 43423 Installation amp Deployment 43624 Configuration amp Performance 44825 High Availability 47226 Backup 49527 Security 50028 Monitoring 506
VI Tools 52729 Import tool 52930 Web Interface 54131 Neo4j Shell 542
VII Advanced Usage 55832 Extending the Neo4j Server 56033 Using Neo4j embedded in Java applications 57334 The Traversal Framework 60935 Legacy Indexing 61736 Batch Insertion 632
Terminology 636A Resources 640B Manpages 641
neo4j 642neo4j-shell 643neo4j-import 644neo4j-backup 646neo4j-arbiter 647
v
Preface
This is the reference manual for Neo4j version 230 authored by the Neo4j Team
The main parts of the manual are
bull Part I ldquoIntroductionrdquo [1] mdash introducing graph database concepts and Neo4jbull Part II ldquoTutorialsrdquo [14] mdash learn how to use Neo4jbull Part III ldquoCypher Query Languagerdquo [102] mdash details on the Cypher query languagebull Part IV ldquoReferencerdquo [277] mdash detailed information on Neo4jbull Part V ldquoOperationsrdquo [434] mdash how to install and maintain Neo4jbull Part VI ldquoToolsrdquo [527] mdash guides on toolsbull Part VII ldquoAdvanced Usagerdquo [558] mdash using Neo4j in more advanced waysbull Terminology [636] mdash terminology about graph databasesbull Appendix A Resources [640] mdash find additional documentation resourcesbull Appendix B Manpages [641] mdash command line documentation
The material is practical technical and focused on answering specific questions It addresses howthings work what to do and what to avoid to successfully run Neo4j in a production environment
The goal is to be thumb-through and rule-of-thumb friendly
Each section should stand on its own so you can hop right to whatever interests you When possiblethe sections distill ldquorules of thumbrdquo which you can keep in mind whenever you wander out of the housewithout this manual in your back pocket
The included code examples are executed when Neo4j is built and tested Also the REST API requestand response examples are captured from real interaction with a Neo4j server Thus the examples arealways in sync with how Neo4j actually works
Therersquos other documentation resources besides the manual as well see Appendix A Resources [640]
Who should read this
The topics should be relevant to architects administrators developers and operations personnel
Where to get help
You can learn a lot about Neo4j at different events To get information on upcoming Neo4j events havea look here
bull httpneo4jcomeventsbull httpneo4jmeetupcom
Get help from the Neo4j open source community here are some starting points
bull The neo4j tag at stackoverflow httpstackoverflowcomquestionstaggedneo4jbull Neo4j Discussions httpsgroupsgooglecomforumforumneo4jbull Twitter httpstwittercomneo4j
Report a bug or add a feature request
bull httpsgithubcomneo4jneo4jissues
Questions regarding the documentation The Neo4j Manual is published online with a commentfunction please use that to post any questions or comments regarding the documentation
If you want to contribute to the Neo4j open source project see httpneo4jcomdevelopercontribute
Part I IntroductionThis part gives a birdrsquos eye view of what a graph database is and also outlines some specifics of Neo4j
2
1 Neo4j Highlights 32 Graph Database Concepts 4
21 The Neo4j Graph Database 522 Comparing Database Models 11
3
Chapter 1 Neo4j Highlights
As a robust scalable and high-performance database Neo4j is suitable for full enterprise deployment
It features
bull true ACID transactionsbull high availabilitybull scales to billions of nodes and relationshipsbull high speed querying through traversalsbull declarative graph query language
Proper ACID behavior is the foundation of data reliability Neo4j enforces that all operations thatmodify data occur within a transaction guaranteeing consistent data This robustness extendsfrom single instance embedded graphs to multi-server high availability installations For details seeChapter 18 Transaction Management [285]
Reliable graph storage can easily be added to any application A graph can scale in size and complexityas the application evolves with little impact on performance Whether starting new development oraugmenting existing functionality Neo4j is only limited by physical hardware
A single server instance can handle a graph of billions of nodes and relationships When datathroughput is insufficient the graph database can be distributed among multiple servers in a highavailability configuration See Chapter 25 High Availability [472] to learn more
The graph database storage shines when storing richly-connected data Querying is performed throughtraversals which can perform millions of traversal steps per second A traversal step resembles a join ina RDBMS
4
Chapter 2 Graph Database Concepts
This chapter contains an introduction to the graph data model and also compares it to other datamodels used when persisting data
Graph Database Concepts
5
21 The Neo4j Graph DatabaseA graph database stores data in a graph the most generic of data structures capable of elegantlyrepresenting any kind of data in a highly accessible way
For terminology around graph databases see Terminology [636]
Herersquos an example graph which we will approach step by step in the following sections
Person
nam e = Tom Hanksborn = 1956
Movie
t it le = Forrest Gum preleased = 1994
ACTED_INroles = [ Forrest ]
Person
nam e = Robert Zem eckisborn = 1951
DIRECTED
Nodes
A graph records data in nodes and relationships Both can have properties This issometimes referred to as the Property Graph Model
The fundamental units that form a graph are nodes and relationships In Neo4j both nodes andrelationships can contain properties
Nodes are often used to represent entities but depending on the domain relationships may be used forthat purpose as well
Apart from properties and relationships nodes can also be labeled with zero or more labels
The simplest possible graph is a single Node A Node can have zero or more named values referred toas properties Letrsquos start out with one node that has a single property named title
t it le = Forrest Gum p
The next step is to have multiple nodes Letrsquos add two more nodes and one more property on the nodein the previous example
nam e = Tom Hanksborn = 1956
t it le = Forrest Gum preleased = 1994
nam e = Robert Zem eckisborn = 1951
Relationships
Relationships organize the nodes by connecting them A relationship connects twonodes mdash a start node and an end node Just like nodes relationships can have properties
Relationships between nodes are a key part of a graph database They allow for finding related dataJust like nodes relationships can have properties
A relationship connects two nodes and is guaranteed to have valid start and end nodes
Graph Database Concepts
6
Relationships organize nodes into arbitrary structures allowing a graph to resemble a list a treea map or a compound entity mdash any of which can be combined into yet more complex richly inter-connected structuresOur example graph will make a lot more sense once we add relationships to it
nam e = Tom Hanksborn = 1956
t it le = Forrest Gum preleased = 1994
ACTED_INroles = [ Forrest ]
nam e = Robert Zem eckisborn = 1951
DIRECTED
Our example uses ACTED_IN and DIRECTED as relationship types The roles property on the ACTED_INrelationship has an array value with a single item in it
Below is an ACTED_IN relationship with the Tom Hanks node as start node and Forrest Gump as end node
nam e = Tom Hanksborn = 1956
t it le = Forrest Gum preleased = 1994
ACTED_INroles = [ Forrest ]
You could also say that the Tom Hanks node has an outgoing relationship while the Forrest Gump node hasan incoming relationship
Relationships are equally well traversed in either directionThis means that there is no need to add duplicate relationships in the opposite direction(with regard to traversal or performance)
While relationships always have a direction you can ignore the direction where it is not useful in yourapplicationNote that a node can have relationships to itself as well
nam e = Tom Hanksborn = 1956 KNOWS
The example above would mean that Tom Hanks KNOWS himselfTo further enhance graph traversal all relationships have a relationship typeLetrsquos have a look at what can be found by simply following the relationships of a node in our examplegraph
nam e = Tom Hanksborn = 1956
t it le = Forrest Gum preleased = 1994
ACTED_INroles = [ Forrest ]
nam e = Robert Zem eckisborn = 1951
DIRECTED
Graph Database Concepts
7
Using relationship direction and typeWhat we want to know Start from Relationship type Direction
get actors in movie movie node ACTED_IN incoming
get movies with actor person node ACTED_IN outgoing
get directors of movie movie node DIRECTED incoming
get movies directed by person node DIRECTED outgoing
PropertiesBoth nodes and relationships can have properties
Properties are named values where the name is a string The supported property values are
bull Numeric valuesbull String valuesbull Boolean valuesbull Collections of any other type of value
NULL is not a valid property valueNULLs can instead be modeled by the absence of a key
For further details on supported property values see Section 333 ldquoProperty valuesrdquo [582]
LabelsLabels assign roles or types to nodes
A label is a named graph construct that is used to group nodes into sets all nodes labeled with thesame label belongs to the same set Many database queries can work with these sets instead of thewhole graph making queries easier to write and more efficient to execute A node may be labeled withany number of labels including none making labels an optional addition to the graphLabels are used when defining constraints and adding indexes for properties (see the section calledldquoSchemardquo [9])
An example would be a label named User that you label all your nodes representing users with Withthat in place you can ask Neo4j to perform operations only on your user nodes such as finding allusers with a given nameHowever you can use labels for much more For instance since labels can be added and removedduring runtime they can be used to mark temporary states for your nodes You might create an Offlinelabel for phones that are offline a Happy label for happy pets and so on
In our example wersquoll add Person and Movie labels to our graph
Person
nam e = Tom Hanksborn = 1956
Movie
t it le = Forrest Gum preleased = 1994
ACTED_INroles = [ Forrest ]
Person
nam e = Robert Zem eckisborn = 1951
DIRECTED
Graph Database Concepts
8
A node can have multiple labels letrsquos add an Actor label to the Tom Hanks node
PersonActor
nam e = Tom Hanksborn = 1956
Label namesAny non-empty Unicode string can be used as a label name In Cypher you may need to use thebacktick (`) syntax to avoid clashes with Cypher identifier rules or to allow non-alphanumeric charactersin a label By convention labels are written with CamelCase notation with the first letter in upper caseFor instance User or CarOwner
Labels have an id space of an int meaning the maximum number of labels the database can contain isroughly 2 billion
Traversal
A traversal navigates through a graph to find paths
A traversal is how you query a graph navigating from starting nodes to related nodes finding answersto questions like ldquowhat music do my friends like that I donrsquot yet ownrdquo or ldquoif this power supply goesdown what web services are affectedrdquo
Traversing a graph means visiting its nodes following relationships according to some rules In mostcases only a subgraph is visited as you already know where in the graph the interesting nodes andrelationships are found
Cypher provides a declarative way to query the graph powered by traversals and other techniques SeePart III ldquoCypher Query Languagerdquo [102] for more information
When writing server plugins or using Neo4j embedded Neo4j provides a callback based traversal APIwhich lets you specify the traversal rules At a basic level therersquos a choice between traversing breadth-or depth-first
If we want to find out which movies Tom Hanks acted in according to our tiny example database thetraversal would start from the Tom Hanks node follow any ACTED_IN relationships connected to the nodeand end up with Forrest Gump as the result (see the dashed lines)
Person
nam e = Tom Hanksborn = 1956
Movie
t it le = Forrest Gum preleased = 1994
ACTED_INroles = [ Forrest ]
Person
nam e = Robert Zem eckisborn = 1951
DIRECTED
Paths
A path is one or more nodes with connecting relationships typically retrieved as a queryor traversal result
In the previous example the traversal result could be returned as a path
Graph Database Concepts
9
Person nam e = Tom Hanksborn = 1956 Movie t it le = Forrest Gum p
released = 1994
ACTED_INroles = [ Forrest ]
The path above has length one
The shortest possible path has length zero mdash that is it contains only a single node and norelationships mdash and can look like this
Person
nam e = Tom Hanksborn = 1956
This path has length one
Person
nam e = Tom Hanksborn = 1956
KNOWS
Schema
Neo4j is a schema-optional graph database
You can use Neo4j without any schema Optionally you can introduce it in order to gain performance ormodeling benefits This allows a way of working where the schema does not get in your way until youare at a stage where you want to reap the benefits of having one
NoteSchema commands can only be applied on the master machine in a Neo4j cluster (seeChapter 25 High Availability [472]) If you apply them on a slave you will receive aNeoClientErrorTransactionInvalidType error code (see Section 212 ldquoNeo4j StatusCodesrdquo [307])
Indexes
Performance is gained by creating indexes which improve the speed of looking up nodesin the database
NoteThis feature was introduced in Neo4j 20 and is not the same as the legacy indexes (seeChapter 35 Legacy Indexing [617])
Once yoursquove specified which properties to index Neo4j will make sure your indexes are kept up to dateas your graph evolves Any operation that looks up nodes by the newly indexed properties will see asignificant performance boost
Indexes in Neo4j are eventually available That means that when you first create an index the operationreturns immediately The index is populating in the background and so is not immediately available forquerying When the index has been fully populated it will eventually come online That means that it isnow ready to be used in queries
If something should go wrong with the index it can end up in a failed state When it is failed it will notbe used to speed up queries To rebuild it you can drop and recreate the index Look at logs for cluesabout the failure
Graph Database Concepts
10
You can track the status of your index by asking for the index state through the API you are using Notehowever that this is not yet possible through Cypher
How to use indexes through the different APIs
bull Cypher Section 141 ldquoIndexesrdquo [244]bull REST API Section 2115 ldquoIndexingrdquo [367]bull Listing Indexes via Shell the section called ldquoListing Indexes and Constraintsrdquo [551]bull Java Core API Section 334 ldquoUser database with indexesrdquo [583]
Constraints
NoteThis feature was introduced in Neo4j 20
Neo4j can help you keep your data clean It does so using constraints that allow you to specify therules for what your data should look like Any changes that break these rules will be denied
In this version unique constraints is the only available constraint type
How to use constraints through the different APIs
bull Cypher Section 142 ldquoConstraintsrdquo [247]bull REST API Section 2116 ldquoConstraintsrdquo [369]bull Listing Constraints via Shell the section called ldquoListing Indexes and Constraintsrdquo [551]
Graph Database Concepts
11
22 Comparing Database ModelsA graph database stores data structured in the nodes and relationships of a graph How does thiscompare to other persistence models Because a graph is a generic structure letrsquos compare how a fewmodels would look in a graph
A Graph Database transforms a RDBMSTopple the stacks of records in a relational database while keeping all the relationships and yoursquoll seea graph Where an RDBMS is optimized for aggregated data Neo4j is optimized for highly connecteddata
Figure 21 RDBMS
A1
A2
A3
B1
B2
B3
B4
B5
B6
B7
C1
C2
C3
Figure 22 Graph Database as RDBMS
A1
B1B2
A2
B4B6
A3
B3B5 B7
C1 C2C3
A Graph Database elaborates a Key-Value StoreA Key-Value model is great for lookups of simple values or lists When the values are themselvesinterconnected yoursquove got a graph Neo4j lets you elaborate the simple data structures into morecomplex interconnected data
Graph Database Concepts
12
Figure 23 Key-Value Store
K1
K2
K3
V1
K2
V2
K1
K3
V3
K1
K represents a key V a value Note that some keys point to other keys as well as plain values
Figure 24 Graph Database as Key-Value Store
V1
V2
V3K1
K2
K3
A Graph Database relates Column-FamilyColumn Family (BigTable-style) databases are an evolution of key-value using families to allowgrouping of rows Stored in a graph the families could become hierarchical and the relationshipsamong data becomes explicit
A Graph Database navigates a Document StoreThe container hierarchy of a document database accommodates nice schema-free data that can easilybe represented as a tree Which is of course a graph Refer to other documents (or document elements)within that tree and you have a more expressive representation of the same data When in Neo4j thoserelationships are easily navigable
Graph Database Concepts
13
Figure 25 Document Store
D1
S1
D2
S2S3
V1D2S2 V2V3V4D1S1
D=Document S=Subdocument V=Value D2S2 = reference to subdocument in (other) document
Figure 26 Graph Database as Document Store
D1
S1D2 S2S3
V1
V2
V3
V4
Part II TutorialsThe tutorial part describes how use Neo4j It takes you from Hello World to advanced usage of graphs
15
3 Introduction to Cypher 1631 Background and Motivation 1732 Graphs Patterns and Cypher 1833 Patterns in Practice 2134 Getting the Results You Want 2635 How to Compose Large Statements 3036 Labels Constraints and Indexes 3237 Loading Data 3438 Utilizing Data Structures 3739 Cypher vs SQL 40
4 Use Cypher in an application 465 Basic Data Modeling Examples 47
51 Movie Database 4852 Social Movie Database 5053 Finding Paths 5254 Linked Lists 5655 TV Shows 58
6 Advanced Data Modeling Examples 6261 ACL structures in graphs 6362 Hyperedges 6763 Basic friend finding based on social neighborhood 6964 Co-favorited places 7065 Find people based on similar favorites 7266 Find people based on mutual friends and groups 7367 Find friends based on similar tagging 7468 Multirelational (social) graphs 7569 Implementing newsfeeds in a graph 76610 Boosting recommendation results 79611 Calculating the clustering coefficient of a network 80612 Pretty graphs 81613 A multilevel indexing structure (path tree) 85614 Complex similarity computations 89615 The Graphity activity stream model 90616 User roles in graphs 92
7 Languages 9671 How to use the REST API from Java 97
16
Chapter 3 Introduction to Cypher
This friendly guide will introduce you to Cypher Neo4jrsquos query language
The guide will help you
bull start thinking about graphs and patternsbull apply this knowledge to simple problemsbull learn how to write Cypher statementsbull use Cypher for loading databull transition from SQL to Cypher
If you want to keep a reference at your side while reading please see the Cypher Refcard1
Work in ProgressThere may still be unfinished parts in this chapter Please comment on it so we can make itsuit our readers better
1 httpneo4jcomdocs230cypher-refcard
Introduction to Cypher
17
31 Background and MotivationCypher provides a convenient way to express queries and other Neo4j actions Although Cypheris particularly useful for exploratory work it is fast enough to be used in production Java-basedapproaches (eg unmanaged extensions) can also be used to handle particularly demanding use cases
Query processingTo use Cypher effectively itrsquos useful to have an idea of how it works So letrsquos take a high-level look at theway Cypher processes queries
bull Parse and validate the querybull Generate the execution planbull Locate the initial node(s)bull Select and traverse relationshipsbull Change andor return values
PreparationParsing and validating the Cypher statement(s) is important but mundane However generating anoptimal search strategy can be far more challenging
The execution plan must tell the database how to locate initial node(s) select relationships for traversaletc This involves tricky optimization problems (eg which actions should happen first) but we can safelyleave the details to the Neo4j engineers So letrsquos move on to locating the initial node(s)
Locate the initial node(s)Neo4j is highly optimized for traversing property graphs Under ideal circumstances it can traversemillions of nodes and relationships per second following chains of pointers in the computerrsquos memory
However before traversal can begin Neo4j must know one or more starting nodes Unless the user (ormore likely a client program) can provide this information Neo4j will have to search for these nodes
A ldquobrute forcerdquo search of the database (eg for a specified property value) can be very time consumingEvery node must be examined first to see if it has the property then to see if the value meets thedesired criteria To avoid this effort Neo4j creates and uses indexes So Neo4j uses a separate indexfor each labelproperty combination
Traversal and actionsOnce the initial nodes are determined Neo4j can traverse portions of the graph and perform anyrequested actions The execution plan helps Neo4j to determine which nodes are relevant whichrelationships to traverse etc
Introduction to Cypher
18
32 Graphs Patterns and CypherNodes Relationships and PatternsNeo4jrsquos Property Graphs are composed of nodes and relationships either of which may have properties(ie attributes) Nodes represent entities (eg concepts events places things) relationships (which maybe directed) connect pairs of nodes
However nodes and relationships are simply low-level building blocks The real strength of the PropertyGraph lies in its ability to encode patterns of connected nodes and relationships A single node orrelationship typically encodes very little information but a pattern of nodes and relationships canencode arbitrarily complex ideas
Cypher Neo4jrsquos query language is strongly based on patterns Specifically patterns are used to matchdesired graph structures Once a matching structure has been found (or created) Neo4j can use it forfurther processing
Simple and Complex PatternsA simple pattern which has only a single relationship connects a pair of nodes (or occasionally a nodeto itself) For example a Person LIVES_IN a City or a City is PART_OF a Country
Complex patterns using multiple relationships can express arbitrarily complex concepts and supporta variety of interesting use cases For example we might want to match instances where a PersonLIVES_IN a Country The following Cypher code combines two simple patterns into a (mildly) complexpattern which performs this match
(Person) -[LIVES_IN]-gt (City) -[PART_OF]-gt (Country)
Pattern recognition is fundamental to the way that the brain works Consequently humans are verygood at working with patterns When patterns are presented visually (eg in a diagram or map) humanscan use them to recognize specify and understand concepts As a pattern-based language Cyphertakes advantage of this capability
Cypher ConceptsLike SQL2 (used in relational databases3) Cypher is a textual declarative query language It uses a formof ASCII art4 to represent graph-related patterns SQL-like clauses and keywords (eg MATCH WHERE DELETE)are used to combine these patterns and specify desired actions
This combination tells Neo4j which patterns to match and what to do with the matching items (egnodes relationships paths collections) However as a declarative5 language Cypher does not tellNeo4j how to find nodes traverse relationships etc (This level of control is available from Neo4jrsquos Java6
APIs7 see Section 322 ldquoUnmanaged Extensionsrdquo [565])
Diagrams made up of icons and arrows are commonly used to visualize graphs textual annotationsprovide labels define properties etc Cypherrsquos ASCII-art syntax formalizes this approach while adaptingit to the limitations of text
Node SyntaxCypher uses a pair of parentheses (usually containing a text string) to represent a node eg () (foo)This is reminiscent of a circle or a rectangle with rounded end caps Here are some ASCII-art encodingsfor example Neo4j nodes providing varying types and amounts of detail
()
2 httpsenwikipediaorgwikiSQL3 httpsenwikipediaorgwikiRelational_database_management_system4 httpsenwikipediaorgwikiASCII_art5 httpsenwikipediaorgwikiDeclarative_programming6 httpsenwikipediaorgwikiJava_(programming_language)7 httpsenwikipediaorgwikiApplication_programming_interface
Introduction to Cypher
19
(matrix)
(Movie)
(matrixMovie)
(matrixMovie title The Matrix)
(matrixMovie title The Matrix released 1997)
The simplest form () represents an anonymous uncharacterized node If we want to refer to thenode elsewhere we can add an identifier eg (matrix) Identifiers are restricted (ie scoped) to a singlestatement an identifier may have different (or no) meaning in another statement
The Movie label (prefixed in use with a colon) declares the nodersquos type This restricts the pattern keepingit from matching (say) a structure with an Actor node in this position Neo4jrsquos node indexes also uselabels each index is specific to the combination of a label and a property
The nodersquos properties (eg title) are represented as a list of keyvalue pairs enclosed within a pair ofbraces eg Properties can be used to store information andor restrict patterns For example wecould match nodes whose title is The Matrix
Relationship SyntaxCypher uses a pair of dashes (--) to represent an undirected relationship Directed relationships havean arrowhead at one end (eg lt-- --gt) Bracketed expressions (eg []) can be used to add detailsThis may include identifiers properties andor type information eg
--gt
-[role]-gt
-[ACTED_IN]-gt
-[roleACTED_IN]-gt
-[roleACTED_IN roles [Neo]]-gt
The syntax and semantics found within a relationshiprsquos bracket pair are very similar to those usedbetween a nodersquos parentheses An identifier (eg role) can be defined to be used elsewhere in thestatement The relationshiprsquos type (eg ACTED_IN) is analogous to the nodersquos label The properties (egroles) are entirely equivalent to node properties (Note that the value of a property may be an array)
Pattern SyntaxCombining the syntax for nodes and relationships we can express patterns The following could be asimple pattern (or fact) in this domain
(keanuPersonActor name Keanu Reeves )
-[roleACTED_IN roles [Neo] ]-gt
(matrixMovie title The Matrix )
Like with node labels the relationship type ACTED_IN is added as a symbol prefixed with a colonACTED_IN Identifiers (eg role) can be used elsewhere in the statement to refer to the relationshipNode and relationship properties use the same notation In this case we used an array property for theroles allowing multiple roles to be specified
Pattern Nodes vs Database NodesWhen a node is used in a pattern it describes zero or more nodes in the database Similarlyeach pattern describes zero or more paths of nodes and relationships
Pattern IdentifiersTo increase modularity and reduce repetition Cypher allows patterns to be assigned to identifiers Thisallow the matching paths to be inspected used in other expressions etc
acted_in = (Person)-[ACTED_IN]-gt(Movie)
The acted_in variable would contain two nodes and the connecting relationship for each path that wasfound or created There are a number of functions to access details of a path including nodes(path)rels(path) (same as relationships(path)) and length(path)
Introduction to Cypher
20
ClausesCypher statements typically have multiple clauses each of which performs a specific task eg
bull create and match patterns in the graphbull filter project sort or paginate resultsbull connectcompose partial statements
By combining Cypher clauses we can compose more complex statements that express what we wantto know or create Neo4j then figures out how to achieve the desired goal in an efficient manner
Introduction to Cypher
21
33 Patterns in PracticeCreating DataWersquoll start by looking into the clauses that allow us to create data
To add data we just use the patterns we already know By providing patterns we can specify whatgraph structures labels and properties we would like to make part of our graph
Obviously the simplest clause is called CREATE It will just go ahead and directly create the patterns thatyou specify
For the patterns wersquove looked at so far this could look like the following
CREATE (Movie titleThe Matrixreleased1997 )
If we execute this statement Cypher returns the number of changes in this case adding 1 node 1 labeland 2 properties
(empty result)
Nodes created 1Properties set 2Labels added 1
As we started out with an empty database we now have a database with a single node in it
Movie
t it le = The Matrix released = 1997
If case we also want to return the created data we can add a RETURN clause which refers to the identifierwersquove assigned to our pattern elements
CREATE (pPerson nameKeanu Reeves born1964 )
RETURN p
This is what gets returned
p
Node[1]nameKeanu Reeves born1964
1 rowNodes created 1Properties set 2Labels added 1
If we want to create more than one element we can separate the elements with commas or usemultiple CREATE statements
We can of course also create more complex structures like an ACTED_IN relationship with informationabout the character or DIRECTED ones for the director
CREATE (aPerson nameTom Hanks
born1956 )-[rACTED_IN roles [Forrest]]-gt(mMovie titleForrest Gumpreleased1994 )
CREATE (dPerson nameRobert Zemeckis born1951 )-[DIRECTED]-gt(m)
RETURN adrm
This is the part of the graph we just updated
Introduction to Cypher
22
Person
nam e = Tom Hanksborn = 1956
Movie
t it le = Forrest Gum preleased = 1994
ACTED_INroles = [ Forrest ]
Person
nam e = Robert Zem eckisborn = 1951
DIRECTED
In most cases we want to connect new data to existing structures This requires that we know how tofind existing patterns in our graph data which we will look at next
Matching PatternsMatching patterns is a task for the MATCH statement We pass the same kind of patterns wersquove used sofar to MATCH to describe what wersquore looking for It is similar to query by example only that our examplesalso include the structures
NoteA MATCH statement will search for the patterns we specify and return one row per successfulpattern match
To find the data wersquove created so far we can start looking for all nodes labeled with the Movie label
MATCH (mMovie)
RETURN m
Herersquos the result
Movie
t it le = The Matrix released = 1997
Movie
t it le = Forrest Gum preleased = 1994
This should show both The Matrix and Forrest Gump
We can also look for a specific person like Keanu Reeves
MATCH (pPerson nameKeanu Reeves )
RETURN p
This query returns the matching node
Person
nam e = Keanu Reevesborn = 1964
Note that we only provide enough information to find the nodes not all properties are required Inmost cases you have key-properties like SSN ISBN emails logins geolocation or product codes to lookfor
We can also find more interesting connections like for instance the movies titles that Tom Hanks actedin and the roles he played
MATCH (pPerson nameTom Hanks )-[rACTED_IN]-gt(mMovie)
Introduction to Cypher
23
RETURN mtitle rroles
mtitle rroles
Forrest Gump [Forrest]
1 row
In this case we only returned the properties of the nodes and relationships that we were interested inYou can access them everywhere via a dot notation identiferproperty
Of course this only lists his role as Forrest in Forrest Gump because thatrsquos all data that wersquove added
Now we know enough to connect new nodes to existing ones and can combine MATCH and CREATE toattach structures to the graph
Attaching StructuresTo extend the graph with new information we first match the existing connection points and thenattach the newly created nodes to them with relationships Adding Cloud Atlas as a new movie for TomHanks could be achieved like this
MATCH (pPerson nameTom Hanks )
CREATE (mMovie titleCloud Atlasreleased2012 )
CREATE (p)-[rACTED_IN roles [Zachry]]-gt(m)
RETURN prm
Herersquos what the structure looks like in the database
Person
nam e = Tom Hanksborn = 1956
Movie
t it le = Cloud At lasreleased = 2012
ACTED_INroles = [ Zachry]
TipIt is important to remember that we can assign identifiers to both nodes and relationshipsand use them later on no matter if they were created or matched
It is possible to attach both node and relationship in a single CREATE clause For readability it helps tosplit them up though
ImportantA tricky aspect of the combination of MATCH and CREATE is that we get one row per matchedpattern This causes subsequent CREATE statements to be executed once for each row Inmany cases this is what you want If thatrsquos not intended please move the CREATE statementbefore the MATCH or change the cardinality of the query with means discussed later or usethe get or create semantics of the next clause MERGE
Completing PatternsWhenever we get data from external systems or are not sure if certain information already exists inthe graph we want to be able to express a repeatable (idempotent) update operation In Cypher MERGE
Introduction to Cypher
24
has this function It acts like a combination of MATCH or CREATE which checks for the existence of datafirst before creating it With MERGE you define a pattern to be found or created Usually as with MATCHyou only want to include the key property to look for in your core pattern MERGE allows you to provideadditional properties you want to set ON CREATE
If we wouldnrsquot know if our graph already contained Cloud Atlas we could merge it in again
MERGE (mMovie titleCloud Atlas )
ON CREATE SET mreleased = 2012
RETURN m
m
Node[5]titleCloud Atlas released2012
1 row
We get a result in any both cases either the data (potentially more than one row) that was already inthe graph or a single newly created Movie node
NoteA MERGE clause without any previously assigned identifiers in it either matches the full patternor creates the full pattern It never produces a partial mix of matching and creating within apattern To achieve a partial matchcreate make sure to use already defined identifiers forthe parts that shouldnrsquot be affected
So foremost MERGE makes sure that you canrsquot create duplicate information or structures but it comeswith the cost of needing to check for existing matches first Especially on large graphs it can be costlyto scan a large set of labeled nodes for a certain property You can alleviate some of that by creatingsupporting indexes or constraints which wersquoll discuss later But itrsquos still not for free so whenever yoursquoresure to not create duplicate data use CREATE over MERGE
TipMERGE can also assert that a relationship is only created once For that to work you have topass in both nodes from a previous pattern match
MATCH (mMovie titleCloud Atlas )
MATCH (pPerson nameTom Hanks )
MERGE (p)-[rACTED_IN]-gt(m)
ON CREATE SET rroles =[Zachry]
RETURN prm
Person
nam e = Tom Hanksborn = 1956
Movie
t it le = Cloud At lasreleased = 2012
ACTED_INroles = [ Zachry]
In case the direction of a relationship is arbitrary you can leave off the arrowhead MERGE will thencheck for the relationship in either direction and create a new directed relationship if no matchingrelationship was found
Introduction to Cypher
25
If you choose to pass in only one node from a preceding clause MERGE offers an interesting functionalityIt will then only match within the direct neighborhood of the provided node for the given pattern andif not found create it This can come in very handy for creating for example tree structures
CREATE (yYear year2014 )
MERGE (y)lt-[IN_YEAR]-(m10Month month10 )
MERGE (y)lt-[IN_YEAR]-(m11Month month11 )
RETURN ym10m11
This is the graph structure that gets created
Year
year = 2014
Month
m onth = 11
IN_YEAR
Month
m onth = 10
IN_YEAR
Here there is no global search for the two Month nodes they are only searched for in the context of the2014 Year node
Introduction to Cypher
26
34 Getting the Results You WantLetrsquos first get some data in to retrieve results from
CREATE (matrixMovie titleThe Matrixreleased1997 )
CREATE (cloudAtlasMovie titleCloud Atlasreleased2012 )
CREATE (forrestGumpMovie titleForrest Gumpreleased1994 )
CREATE (keanuPerson nameKeanu Reeves born1964 )
CREATE (robertPerson nameRobert Zemeckis born1951 )
CREATE (tomPerson nameTom Hanks born1956 )
CREATE (tom)-[ACTED_IN roles [Forrest]]-gt(forrestGump)
CREATE (tom)-[ACTED_IN roles [Zachry]]-gt(cloudAtlas)
CREATE (robert)-[DIRECTED]-gt(forrestGump)
This is the data we will start out with
Movie
t it le = The Matrix released = 1997
Movie
t it le = Cloud At lasreleased = 2012
Movie
t it le = Forrest Gum preleased = 1994
Person
nam e = Keanu Reevesborn = 1964
Person
nam e = Robert Zem eckisborn = 1951
DIRECTED
Person
nam e = Tom Hanksborn = 1956
ACTED_INroles = [ Zachry]
ACTED_INroles = [ Forrest ]
Filtering ResultsSo far wersquove matched patterns in the graph and always returned all results we found Quite often thereare conditions in play for what we want to see Similar to in SQL those filter conditions are expressedin a WHERE clause This clause allows to use any number of boolean expressions (predicates) combinedwith AND OR XOR and NOT The simplest predicates are comparisons especially equality
MATCH (mMovie)
WHERE mtitle = The Matrix
RETURN m
m
Node[0]titleThe Matrix released1997
1 row
For equality on one or more properties a more compact syntax can be used as well
MATCH (mMovie title The Matrix )
RETURN m
Other options are numeric comparisons matching regular expressions and checking the existence ofvalues within a collection
The WHERE clause below includes a regular expression match a greater than comparison and a test tosee if a value exists in a collection
MATCH (pPerson)-[rACTED_IN]-gt(mMovie)
WHERE pname =~ K+ OR mreleased gt 2000 OR Neo IN rroles
RETURN prm
p r m
Node[5]nameTom Hanks
born1956
ACTED_IN[1]roles[Zachry] Node[1]titleCloud Atlas
released2012
1 row
Introduction to Cypher
27
One aspect that might be a little surprising is that you can even use patterns as predicates Where MATCHexpands the number and shape of patterns matched a pattern predicate restricts the current resultset It only allows the paths to pass that satisfy the additional patterns as well (or NOT)
MATCH (pPerson)-[ACTED_IN]-gt(m)
WHERE NOT (p)-[DIRECTED]-gt()
RETURN pm
p m
Node[5]nameTom Hanks born1956 Node[1]titleCloud Atlas released2012
Node[5]nameTom Hanks born1956 Node[2]titleForrest Gump released1994
2 rows
Here we find actors because they sport an ACTED_IN relationship but then skip those that ever DIRECTEDany movie
There are also more advanced ways of filtering like collection-predicates which we will look at later on
Returning ResultsSo far wersquove returned only nodes relationships or paths directly via their identifiers But the RETURNclause can actually return any number of expressions But what are actually expressions in Cypher
The simplest expressions are literal values like numbers strings and arrays as [123] and maps likenameTom Hanks born1964 movies[Forrest Gump ] count13 You can access individualproperties of any node relationship or map with a dot-syntax like nname Individual elements or slicesof arrays can be retrieved with subscripts like names[0] or movies[1-1] Each function evaluationlike length(array) toInt(12) substring(2014-07-0104) or coalesce(pnicknamena) is also anexpression
Predicates that yoursquod use in WHERE count as boolean expressions
Of course simpler expressions can be composed and concatenated to form more complex expressions
By default the expression itself will be used as label for the column in many cases you want to aliasthat with a more understandable name using expression AS alias You can later on refer to that columnusing its alias
MATCH (pPerson)
RETURN p pname AS name upper(pname) coalesce(pnicknamena) AS nickname name pname
labelhead(labels(p)) AS person
p name upper(pname) nickname person
Node[3]nameKeanu
Reeves born1964
Keanu Reeves KEANU REEVES na name -gt Keanu
Reeves label -gt
Person
Node[4]
nameRobert
Zemeckis
born1951
Robert Zemeckis ROBERT ZEMECKIS na name -gt Robert
Zemeckis label -gt
Person
Node[5]nameTom
Hanks born1956
Tom Hanks TOM HANKS na name -gt Tom
Hanks label -gt
Person
3 rows
If yoursquore interested in unique results you can use the DISTINCT keyword after RETURN to indicate that
Aggregating InformationIn many cases you want to aggregate or group the data that you encounter while traversing patternsin your graph In Cypher aggregation happens in the RETURN clause while computing your final results
Introduction to Cypher
28
Many common aggregation functions are supported eg count sum avg min and max but there areseveral more
Counting the number of people in your database could be achieved by this
MATCH (Person)
RETURN count() AS people
people
3
1 row
Please note that NULL values are skipped during aggregation For aggregating only unique values useDISTINCT like in count(DISTINCT role)
Aggregation in Cypher just works You specify which result columns you want to aggregate and Cypherwill use all non-aggregated columns as grouping keys
Aggregation affects which data is still visible in ordering or later query parts
To find out how often an actor and director worked together yoursquod run this statement
MATCH (actorPerson)-[ACTED_IN]-gt(movieMovie)lt-[DIRECTED]-(directorPerson)
RETURN actordirectorcount() AS collaborations
actor director collaborations
Node[5]nameTom Hanks
born1956
Node[4]nameRobert Zemeckis
born1951
1
1 row
Frequently you want to sort and paginate after aggregating a count(x)
Ordering and PaginationOrdering works like in other query languages with an ORDER BY expression [ASC|DESC] clause Theexpression can be any expression discussed before as long as it is computable from the returnedinformation
So for instance if you return personname you can still ORDER BY personage as both are accessible fromthe person reference You cannot order by things that you canrsquot infer from the information you returnThis is especially important with aggregation and DISTINCT return values as both remove the visibility ofdata that is aggregated
Pagination is a straightforward use of SKIP offset LIMIT count
A common pattern is to aggregate for a count (score or frequency) order by it and only return the top-nentries
For instance to find the most prolific actors you could do
MATCH (aPerson)-[ACTED_IN]-gt(mMovie)
RETURN acount() AS appearances
ORDER BY appearances DESC LIMIT 10
a appearances
Node[5]nameTom Hanks born1956 2
1 row
Collecting AggregationThe most helpful aggregation function is collect which as the name says collects all aggregatedvalues into a real array or list This comes very handy in many situations as you donrsquot loose the detailinformation while aggregating
Introduction to Cypher
29
Collect is well suited for retrieving the typical parent-child structures where one core entity (parentroot or head) is returned per row with all itrsquos dependent information in associated collections createdwith collect This means therersquos no need to repeat the parent information per each child-row or evenrunning 1+n statements to retrieve the parent and its children individually
To retrieve the cast of each movie in our database you could use this statement
MATCH (mMovie)lt-[ACTED_IN]-(aPerson)
RETURN mtitle AS movie collect(aname) AS cast count() AS actors
movie cast actors
Forrest Gump [Tom Hanks] 1
Cloud Atlas [Tom Hanks] 1
2 rows
The lists created by collect can either be used from the client consuming the Cypher results or directlywithin a statement with any of the collection functions or predicates
Introduction to Cypher
30
35 How to Compose Large StatementsLetrsquos first get some data in to retrieve results from
CREATE (matrixMovie titleThe Matrixreleased1997 )
CREATE (cloudAtlasMovie titleCloud Atlasreleased2012 )
CREATE (forrestGumpMovie titleForrest Gumpreleased1994 )
CREATE (keanuPerson nameKeanu Reeves born1964 )
CREATE (robertPerson nameRobert Zemeckis born1951 )
CREATE (tomPerson nameTom Hanks born1956 )
CREATE (tom)-[ACTED_IN roles [Forrest]]-gt(forrestGump)
CREATE (tom)-[ACTED_IN roles [Zachry]]-gt(cloudAtlas)
CREATE (robert)-[DIRECTED]-gt(forrestGump)
Combine statements with UNIONA Cypher statement is usually quite compact Expressing references between nodes as visual patternsmakes them easy to understand
If you want to combine the results of two statements that have the same result structure you can useUNION [ALL]
For instance if you want to list both actors and directors without using the alternative relationship-typesyntax ()-[ACTED_IN|DIRECTED]-gt() you can do this
MATCH (actorPerson)-[rACTED_IN]-gt(movieMovie)
RETURN actorname AS name type(r) AS acted_in movietitle AS title
UNION
MATCH (directorPerson)-[rDIRECTED]-gt(movieMovie)
RETURN directorname AS name type(r) AS acted_in movietitle AS title
name acted_in title
Tom Hanks ACTED_IN Cloud Atlas
Tom Hanks ACTED_IN Forrest Gump
Robert Zemeckis DIRECTED Forrest Gump
3 rows
Use WITH to Chain StatementsIn Cypher itrsquos possible to chain fragments of statements together much like you would do within adata-flow pipeline Each fragment works on the output from the previous one and its results can feedinto the next one
You use the WITH clause to combine the individual parts and declare which data flows from one to theother WITH is very much like RETURN with the difference that it doesnrsquot finish a query but prepares theinput for the next part You can use the same expressions aggregations ordering and pagination as inthe RETURN clause
The only difference is that you must alias all columns as they would otherwise not be accessible Onlycolumns that you declare in your WITH clause is available in subsequent query parts
See below for an example where we collect the movies someone appeared in and then filter out thosewhich appear in only one movie
MATCH (personPerson)-[ACTED_IN]-gt(mMovie)
WITH person count() AS appearances collect(mtitle) AS movies
WHERE appearances gt 1
RETURN personname appearances movies
personname appearances movies
Tom Hanks 2 [Cloud Atlas Forrest Gump]
1 row
Introduction to Cypher
31
TipIf you want to filter by an aggregated value in SQL or similar languages you would have touse HAVING Thatrsquos a single purpose clause for filtering aggregated information In CypherWHERE can be used in both cases
Introduction to Cypher
32
36 Labels Constraints and IndexesLabels are a convenient way to group nodes together They are used to restrict queries defineconstraints and create indexes
Using ConstraintsYou can also specify unique constraints that guarantee uniqueness of a certain property on nodes witha specific label
These constraints are also used by the MERGE clause to make certain that a node only exists once
The following will give an example of how to use labels and add constraints and indexes to them Letrsquosstart out adding a constraint mdash in this case we decided that all Movie node titles should be unique
CREATE CONSTRAINT ON (movieMovie) ASSERT movietitle IS UNIQUE
Note that adding the unique constraint will add an index on that property so we wonrsquot do thatseparately If we drop a constraint and still want an index on the same property we have to create suchan index
Constraints can be added after a label is already in use but that requires that the existing datacomplies with the constraints
Using indexesFor a graph query to run fast you donrsquot need indexes you only need them to find your starting pointsThe main reason for using indexes in a graph database is to find the starting points in the graph as fastas possible After the initial index seek you rely on in-graph structures and the first class citizenship ofrelationships in the graph database to achieve high performance
In this case we want an index to speed up finding actors by name in the database
CREATE INDEX ON Actor(name)
Indexes can be added at any time Note that it will take some time for an index to come online whentherersquos existing data
Now letrsquos add some data
CREATE (actorActor nameTom Hanks )(movieMovie titleSleepless IN Seattle )
(actor)-[ACTED_IN]-gt(movie)
Normally you donrsquot specify indexes when querying for data They will be used automatically This meanswe can simply look up the Tom Hanks node and the index will kick in behind the scenes to boostperformance
MATCH (actorActor name Tom Hanks )
RETURN actor
LabelsNow letrsquos say we want to add another label for a node Herersquos how to do that
MATCH (actorActor name Tom Hanks )
SET actor American
To remove a label from nodes this is what to do
MATCH (actorActor name Tom Hanks )
REMOVE actorAmerican
Related ContentFor more information on labels and related topics see
Introduction to Cypher
33
bull the section called ldquoLabelsrdquo [7]bull Chapter 14 Schema [243]bull Section 142 ldquoConstraintsrdquo [247]bull Section 141 ldquoIndexesrdquo [244]bull Section 108 ldquoUsingrdquo [152]bull Section 123 ldquoSetrdquo [200]bull Section 125 ldquoRemoverdquo [205]
Introduction to Cypher
34
37 Loading DataAs yoursquove seen you can not only query data expressively but also create data with Cypher statements
Naturally in most cases you wouldnrsquot want to write or generate huge statements to generate your databut instead use an existing data source that you pass into your statement and which is used to drivethe graph generation process
That process not only includes creating completely new data but also integrating with existingstructures and updating your graph
ParametersIn general we recommend passing in varying literal values from the outside as named parameters Thisallows Cypher to reuse existing execution plans for the statements
Of course you can also pass in parameters for data to be imported Those can be scalar values mapslists or even lists of maps
In your Cypher statement you can then iterate over those values (eg with UNWIND) to create your graphstructures
For instance to create a movie graph from JSON data structures pulled from an API you could use
movies [
title Stardust
released 2007
cast [
actor
name Robert de Niro
born 1943
characters [ Captain Shakespeare ]
actor
name Michelle Pfeiffer
born 1958
characters [ Lamia ]
]
]
UNWIND movies as movie
MERGE (mMovie titlemovietitle) ON CREATE SET mreleased = moviereleased
FOREACH (role IN moviecast |
MERGE (aPerson nameroleactorname) ON CREATE SET aborn = roleactorborn
MERGE (a)-[ACTED_IN rolesrolecharacters]-gt(m)
)
Importing CSVCypher provides an elegant built-in way to import tabular CSV data into graph structures
The LOAD CSV clause parses a local or remote file into a stream of rows which represent maps (withheaders) or lists Then you can use whatever Cypher operations you want to apply to either createnodes or relationships or to merge with existing graph structures
As CSV files usually represent either node- or relationship-lists you run multiple passes to create nodesand relationships separately
moviescsv
idtitlecountryyear
Introduction to Cypher
35
1Wall StreetUSA1987
2The American PresidentUSA1995
3The Shawshank RedemptionUSA1994
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintromoviescsv AS line
CREATE (mMovie idlineidtitlelinetitle releasedtoInt(lineyear))
personscsv
idname
1Charlie Sheen
2Oliver Stone
3Michael Douglas
4Martin Sheen
5Morgan Freeman
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintropersonscsv AS line
MERGE (aPerson idlineid )
ON CREATE SET aname=linename
rolescsv
personIdmovieIdrole
11Bud Fox
41Carl Fox
31Gordon Gekko
42AJ MacInerney
32President Andrew Shepherd
53Ellis Boyd Red Redding
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintrorolescsv AS line
MATCH (mMovie idlinemovieId )
MATCH (aPerson idlinepersonId )
CREATE (a)-[ACTED_IN roles [linerole]]-gt(m)
Movie
id = 1t it le = Wall St reet released = 1987
Movie
released = 1995id = 2t it le = The Am erican President
Movie
released = 1994id = 3t it le = The Shawshank Redem pt ion
Person
id = 1nam e = Charlie Sheen
ACTED_INroles = [ Bud Fox]
Person
id = 2nam e = Oliver Stone
Person
id = 3nam e = Michael Douglas
ACTED_INroles = [ Gordon Gekko]
ACTED_INroles = [ President Andrew Shepherd]
Person
id = 4nam e = Mart in Sheen
ACTED_INroles = [ Carl Fox]
ACTED_INroles = [ AJ MacInerney]
Person
id = 5nam e = Morgan Freem an
ACTED_INroles = [ Ellis Boyd Red Redding]
If your file contains denormalized data you can either run the same file with multiple passes andsimple operations as shown above or you might have to use MERGE to create entities uniquely
For our use-case we can import the data using a CSV structure like this
movie_actor_rolescsv
titlereleasedactorborncharacters
Back to the Future1985Michael J Fox1961Marty McFly
Back to the Future1985Christopher Lloyd1938Dr Emmet Brown
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintromovie_actor_rolescsv AS line
FIELDTERMINATOR
MERGE (mMovie titlelinetitle )
ON CREATE SET mreleased = toInt(linereleased)
MERGE (aPerson namelineactor )
ON CREATE SET aborn = toInt(lineborn)
MERGE (a)-[ACTED_IN rolessplit(linecharacters)]-gt(m)
Introduction to Cypher
36
Movie
id = 1t it le = Wall St reet released = 1987
Movie
released = 1995id = 2t it le = The Am erican President
Movie
released = 1994id = 3t it le = The Shawshank Redem pt ion
Person
id = 1nam e = Charlie Sheen
ACTED_INroles = [ Bud Fox]
Person
id = 2nam e = Oliver Stone
Person
id = 3nam e = Michael Douglas
ACTED_INroles = [ Gordon Gekko]
ACTED_INroles = [ President Andrew Shepherd]
Person
id = 4nam e = Mart in Sheen
ACTED_INroles = [ Carl Fox]
ACTED_INroles = [ AJ MacInerney]
Person
id = 5nam e = Morgan Freem an
ACTED_INroles = [ Ellis Boyd Red Redding]
Movie
t it le = Back to the Futurereleased = 1985
Person
nam e = Michael J Foxborn = 1961
ACTED_INroles = [ Marty McFly]
Person
nam e = Christopher Lloydborn = 1938
ACTED_INroles = [ Dr Em m et Brown]
If you import a large amount of data (more than 10000 rows) it is recommended to prefix your LOADCSV clause with a PERIODIC COMMIT hint This allows Neo4j to regularly commit the import transactions toavoid memory churn for large transaction-states
Introduction to Cypher
37
38 Utilizing Data StructuresCypher can create and consume more complex data structures out of the box As already mentionedyou can create literal lists ([123]) and maps (name value) within a statement
There are a number of functions that work with lists They range from simple ones like size(list) thatreturns the size of a list to reduce which runs an expression against the elements and accumulates theresults
Letrsquos first load a bit of data into the graph If you want more details on how the data is loaded see thesection called ldquoImporting CSVrdquo [34]
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintromoviescsv AS line
CREATE (mMovie idlineidtitlelinetitle releasedtoInt(lineyear))
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintropersonscsv AS line
MERGE (aPerson idlineid )
ON CREATE SET aname=linename
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintrorolescsv AS line
MATCH (mMovie idlinemovieId )
MATCH (aPerson idlinepersonId )
CREATE (a)-[ACTED_IN roles [linerole]]-gt(m)
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvintromovie_actor_rolescsv AS line
FIELDTERMINATOR
MERGE (mMovie titlelinetitle )
ON CREATE SET mreleased = toInt(linereleased)
MERGE (aPerson namelineactor )
ON CREATE SET aborn = toInt(lineborn)
MERGE (a)-[ACTED_IN rolessplit(linecharacters)]-gt(m)
Now letrsquos try out data structures
To begin with collect the names of the actors per movie and return two of them
MATCH (movieMovie)lt-[ACTED_IN]-(actorPerson)
RETURN movietitle AS movie collect(actorname)[02] AS two_of_cast
movie two_of_cast
The American President [Michael Douglas Martin Sheen]
Back to the Future [Christopher Lloyd Michael J Fox]
Wall Street [Michael Douglas Martin Sheen]
The Shawshank Redemption [Morgan Freeman]
4 rows
You can also access individual elements or slices of a list quickly with list[1] or list[5-5] Otherfunctions to access parts of a list are head(list) tail(list) and last(list)
List PredicatesWhen using lists and arrays in comparisons you can use predicates like value IN list or any(x IN listWHERE x = value) There are list predicates to satisfy conditions for all any none and single elements
MATCH path =(Person)--gt(Movie)lt--(Person)
WHERE ANY (n IN nodes(path) WHERE nname = Michael Douglas)
RETURN extract(n IN nodes(path)| coalesce(nname ntitle))
extract(n IN nodes(path coalesce(nname ntitle))
[Martin Sheen Wall S et Michael Douglas]
[Charlie Sheen Wall eet Michael Douglas]
6 rows
Introduction to Cypher
38
extract(n IN nodes(path coalesce(nname ntitle))
[Michael Douglas Wal treet Martin Sheen]
[Michael Douglas Wal treet Charlie Sheen]
[Martin Sheen The Am can President Michael Douglas]
[Michael Douglas The erican President Martin Sheen]
6 rows
List ProcessingOftentimes you want to process lists to filter aggregate (reduce) or transform (extract) their valuesThose transformations can be done within Cypher or in the calling code This kind of list-processing canreduce the amount of data handled and returned so it might make sense to do it within the Cypherstatement
A simple non-graph example would be
WITH range(110) AS numbers
WITH extract(n IN numbers | nn) AS squares
WITH filter(n IN squares WHERE n gt 25) AS large_squares
RETURN reduce(a = 0 n IN large_squares | a + n) AS sum_large_squares
sum_large_squares
330
1 row
In a graph-query you can filter or aggregate collected values instead or work on array properties
MATCH (mMovie)lt-[rACTED_IN]-(aPerson)
WITH mtitle AS movie collect( name aname roles rroles ) AS cast
RETURN movie filter(actor IN cast WHERE actorname STARTS WITH M)
movie filter(actor IN cast WHERE actorname STARTSWITH M)
The American President [name -gt Michael Douglas roles -gt [President
Andrew Shepherd] name -gt Martin Sheen roles
-gt [A J MacInerney]]
Back to the Future [name -gt Michael J Fox roles -gt [Marty
McFly]]
Wall Street [name -gt Michael Douglas roles -gt [Gordon
Gekko] name -gt Martin Sheen roles -gt [Carl
Fox]]
The Shawshank Redemption [name -gt Morgan Freeman roles -gt [Ellis Boyd
Red Redding]]
4 rows
Unwind ListsSometimes you have collected information into a list but want to use each element individually as arow For instance you might want to further match patterns in the graph Or you passed in a collectionof values but now want to create or match a node or relationship for each element Then you can usethe UNWIND clause to unroll a list into a sequence of rows again
For instance a query to find the top 3 co-actors and then follow their movies and again list the cast foreach of those movies
MATCH (actorPerson)-[ACTED_IN]-gt(movieMovie)lt-[ACTED_IN]-(colleaguePerson)
WHERE actorname lt colleaguename
Introduction to Cypher
39
WITH actor colleague count() AS frequency collect(movie) AS movies
ORDER BY frequency DESC LIMIT 3 UNWIND movies AS m
MATCH (m)lt-[ACTED_IN]-(a)
RETURN mtitle AS movie collect(aname) AS cast
movie cast
The American President [Michael Douglas Martin Sheen]
Back to the Future [Christopher Lloyd Michael J Fox]
Wall Street [Michael Douglas Martin Sheen Charlie
Sheen Michael Douglas Martin Sheen Charlie
Sheen]
3 rows
Introduction to Cypher
40
39 Cypher vs SQLIf you have used SQL and want to learn Cypher this chapter is for you We wonrsquot dig very deep intoeither of the languages but focus on bridging the gap
Data ModelFor our example we will use data about persons who act in direct produce movies
Herersquos an entity-relationship model for the example
Person
Movie
acted in directed produced
We have Person and Movie entities which are related in three different ways each of which have many-to-many cardinality
In a RDBMS we would use tables for the entities as well as for the associative entities (join tables)needed In this case we decided to go with the following tables movie person acted_in directedproduced Yoursquoll find the SQL for this below
In Neo4j the basic data units are nodes and relationships Both can have properties which correspondto attributes in a RDBMS
Nodes can be grouped by putting labels on them In the example we will use the labels Movie andPerson
When using Neo4j related entities can be represented directly by using relationships Therersquos no needto deal with foreign keys to handle the relationships the database will take care of such mechanicsAlso the relationships always have full referential integrity Therersquos no constraints to enable for this asitrsquos not optional itrsquos really part of the underlying data model Relationships always have a type and wewill differentiate the different kinds of relationships by using the types ACTED_IN DIRECTED PRODUCED
Sample DataFirst off letrsquos see how to set up our example data in a RDBMS Wersquoll start out creating a few tables andthen go on to populate them
CREATE TABLE movie (
id INTEGER
title VARCHAR(100)
released INTEGER
tagline VARCHAR(100)
)
CREATE TABLE person (
id INTEGER
name VARCHAR(100)
born INTEGER
)
CREATE TABLE acted_in (
role varchar(100)
person_id INTEGER
movie_id INTEGER
)
CREATE TABLE directed (
Introduction to Cypher
41
person_id INTEGER
movie_id INTEGER
)
CREATE TABLE produced (
person_id INTEGER
movie_id INTEGER
)
Populating with data
INSERT INTO movie (id title released tagline)
VALUES (
(1 The Matrix 1999 Welcome to the Real World)
(2 The Devils Advocate 1997 Evil has its winning ways)
(3 Monster 2003 The first female serial killer of America)
)
INSERT INTO person (id name born)
VALUES (
(1 Keanu Reeves 1964)
(2 Carrie-Anne Moss 1967)
(3 Laurence Fishburne 1961)
(4 Hugo Weaving 1960)
(5 Andy Wachowski 1967)
(6 Lana Wachowski 1965)
(7 Joel Silver 1952)
(8 Charlize Theron 1975)
(9 Al Pacino 1940)
(10 Taylor Hackford 1944)
)
INSERT INTO acted_in (role person_id movie_id)
VALUES (
(Neo 1 1)
(Trinity 2 1)
(Morpheus 3 1)
(Agent Smith 4 1)
(Kevin Lomax 1 2)
(Mary Ann Lomax 8 2)
(John Milton 9 2)
(Aileen 8 3)
)
INSERT INTO directed (person_id movie_id)
VALUES (
(5 1)
(6 1)
(10 2)
)
INSERT INTO produced (person_id movie_id)
VALUES (
(7 1)
(8 3)
)
Doing this in Neo4j will look quite different To begin with we wonrsquot create any schema up front Wersquollcome back to schema later for now itrsquos enough to know that labels can be used right away withoutdeclaring them
In the CREATE statements below we tell Neo4j what data we want to have in the graph Simply put theparentheses denote nodes while the arrows (--gt or in our case with a relationship type included -[DIRECTED]-gt) denote relationships For the nodes we set identifiers like TheMatrix so we can easily referto them later on in the statement Note that the identifiers are scoped to the statement and not visibleto other Cypher statements We could use identifiers for the relationships as well but therersquos no needfor that in this case
CREATE (TheMatrixMovie titleThe Matrix released1999 taglineWelcome to the Real World )
Introduction to Cypher
42
CREATE (KeanuPerson nameKeanu Reeves born1964 )
CREATE (CarriePerson nameCarrie-Anne Moss born1967 )
CREATE (LaurencePerson nameLaurence Fishburne born1961 )
CREATE (HugoPerson nameHugo Weaving born1960 )
CREATE (AndyWPerson nameAndy Wachowski born1967 )
CREATE (LanaWPerson nameLana Wachowski born1965 )
CREATE (JoelSPerson nameJoel Silver born1952 )
CREATE (Keanu)-[ACTED_IN roles [Neo]]-gt(TheMatrix)
(Carrie)-[ACTED_IN roles [Trinity]]-gt(TheMatrix)
(Laurence)-[ACTED_IN roles [Morpheus]]-gt(TheMatrix)
(Hugo)-[ACTED_IN roles [Agent Smith]]-gt(TheMatrix)(AndyW)-[DIRECTED]-gt(TheMatrix)
(LanaW)-[DIRECTED]-gt(TheMatrix)(JoelS)-[PRODUCED]-gt(TheMatrix)
CREATE (TheDevilsAdvocateMovie titleThe Devils Advocate released1997
tagline Evil has its winning ways )
CREATE (MonsterMovie title Monster released 2003
tagline The first female serial killer of America )
CREATE (CharlizePerson nameCharlize Theron born1975 )
CREATE (AlPerson nameAl Pacino born1940 )
CREATE (TaylorPerson nameTaylor Hackford born1944 )
CREATE (Keanu)-[ACTED_IN roles [Kevin Lomax]]-gt(TheDevilsAdvocate)
(Charlize)-[ACTED_IN roles [Mary Ann Lomax]]-gt(TheDevilsAdvocate)
(Al)-[ACTED_IN roles [John Milton]]-gt(TheDevilsAdvocate)
(Taylor)-[DIRECTED]-gt(TheDevilsAdvocate)(Charlize)-[ACTED_IN roles [Aileen]]-gt(Monster)
(Charlize)-[PRODUCED roles [Aileen]]-gt(Monster)
Simple read of dataLetrsquos find all entries in the movie table and output their title attribute in our RDBMS
SELECT movietitle
FROM movie
TITLE
The Matrix
The Devils Advocate
Monster
3 rows
Using Neo4j find all nodes labeled Movie and output their title property
MATCH (movieMovie)
RETURN movietitle
movietitle
The Matrix
The Devils Advocate
Monster
3 rows
MATCH tells Neo4j to match a pattern in the graph In this case the pattern is very simple any node with aMovie label on it We bind the result of the pattern matching to the identifier movie for use in the RETURNclause And as you can see the RETURN keyword of Cypher is similar to SELECT in SQL
Now letrsquos get movies released after 1998
SELECT movietitle
FROM movie
WHERE moviereleased gt 1998
Introduction to Cypher
43
TITLE
The Matrix
Monster
2 rows
In this case the addition actually looks identical in Cypher
MATCH (movieMovie)
WHERE moviereleased gt 1998
RETURN movietitle
movietitle
The Matrix
Monster
2 rows
Note however that the semantics of WHERE in Cypher is somewhat different see Section 113ldquoWhererdquo [167] for more information
JoinLetrsquos list all persons and the movies they acted in
SELECT personname movietitle
FROM person
JOIN acted_in AS acted_in ON acted_inperson_id = personid
JOIN movie ON acted_inmovie_id = movieid
NAME TITLE
Keanu Reeves The Matrix
Keanu Reeves The Devils Advocate
Carrie-Anne Moss The Matrix
Laurence Fishburne The Matrix
Hugo Weaving The Matrix
Charlize Theron The Devils Advocate
Charlize Theron Monster
Al Pacino The Devils Advocate
8 rows
The same using Cypher
MATCH (personPerson)-[ACTED_IN]-gt(movieMovie)
RETURN personname movietitle
Here we match a Person and a Movie node in case they are connected with an ACTED_IN relationship
personname movietitle
Hugo Weaving The Matrix
Laurence Fishburne The Matrix
Carrie-Anne Moss The Matrix
Keanu Reeves The Matrix
Al Pacino The Devils Advocate
8 rows
Introduction to Cypher
44
personname movietitle
Charlize Theron The Devils Advocate
Keanu Reeves The Devils Advocate
Charlize Theron Monster
8 rows
To make things slightly more complex letrsquos search for the co-actors of Keanu Reeves In SQL we use aself join on the person table and join on the acted_in table once for Keanu and once for the co-actors
SELECT DISTINCT co_actorname
FROM person AS keanu
JOIN acted_in AS acted_in1 ON acted_in1person_id = keanuid
JOIN acted_in AS acted_in2 ON acted_in2movie_id = acted_in1movie_id
JOIN person AS co_actor
ON acted_in2person_id = co_actorid AND co_actorid ltgt keanuid
WHERE keanuname = Keanu Reeves
NAME
Al Pacino
Carrie-Anne Moss
Charlize Theron
Hugo Weaving
Laurence Fishburne
5 rows
In Cypher we use a pattern with two paths that target the same Movie node
MATCH (keanuPerson)-[ACTED_IN]-gt(movieMovie)(coActorPerson)-[ACTED_IN]-gt(movie)
WHERE keanuname = Keanu Reeves
RETURN DISTINCT coActorname
You may have noticed that we used the co_actorid ltgt keanuid predicate in SQL only This is becauseNeo4j will only match on the ACTED_IN relationship once in the same pattern If this is not what we wantwe can split the pattern up by using two MATCH clauses like this
MATCH (keanuPerson)-[ACTED_IN]-gt(movieMovie)
MATCH (coActorPerson)-[ACTED_IN]-gt(movie)
WHERE keanuname = Keanu Reeves
RETURN DISTINCT coActorname
This time Keanu Reeves is included in the result as well
coActorname
Al Pacino
Charlize Theron
Keanu Reeves
Hugo Weaving
Laurence Fishburne
Carrie-Anne Moss
6 rows
Next letrsquos find out who has both acted in and produced movies
SELECT personname
Introduction to Cypher
45
FROM person
WHERE personid IN (SELECT person_id FROM acted_in)
AND personid IN (SELECT person_id FROM produced)
NAME
Charlize Theron
1 rows
In Cypher we use patterns as predicates in this case That is we require the relationships to exist butdonrsquot care about the connected nodes thus the empty parentheses
MATCH (personPerson)
WHERE (person)-[ACTED_IN]-gt() AND (person)-[PRODUCED]-gt()
RETURN personname
AggregationNow letrsquos find out a bit about the directors in movies that Keanu Reeves acted in We want to know howmany of those movies each of them directed
SELECT directorname count()
FROM person keanu
JOIN acted_in ON keanuid = acted_inperson_id
JOIN directed ON acted_inmovie_id = directedmovie_id
JOIN person AS director ON directedperson_id = directorid
WHERE keanuname = Keanu Reeves
GROUP BY directorname
ORDER BY count() DESC
NAME C2
Andy Wachowski 1
Lana Wachowski 1
Taylor Hackford 1
3 rows
Herersquos how wersquoll do the same in Cypher
MATCH (keanuPerson name Keanu Reeves )-[ACTED_IN]-gt(movieMovie)
(directorPerson)-[DIRECTED]-gt(movie)
RETURN directorname count()
ORDER BY count() DESC
As you can see there is no GROUP BY in the Cypher equivalent Instead Neo4j will automatically figure outthe grouping key
46
Chapter 4 Use Cypher in an application
The most direct way to use Cypher programmatically is to execute a HTTP POST operation againstthe transactional Cypher endpoint You can send a large number of statements with parameters tothe server with each request For immediate execution you can use the dbdatatransactioncommitendpoint with a JSON payload like this
curl -i -H acceptapplicationjson -H content-typeapplicationjson -XPOST httplocalhost7474dbdatatransactioncommit
-d statements[statementCREATE (pPerson namenamebornborn) RETURN pparametersnameKeanu
Reevesborn1964]
The above command results in
results[columns[p]data[row[nameKeanu Reevesborn1964]]]errors[]
You can add as many statement objects in the statements list as you want
For larger use-cases that span multiple requests but whose read-write-read-write operations shouldbe executed within the same transactional scope yoursquod use the dbdatatransaction endpoint This willgive you a transaction URL as the Location header which you can continue to write to and read from Atthe end you either commit the whole transaction by POSTing to the (also returned) commit URL or byissuing a DELETE request against the transaction URL
curl -i -H acceptapplicationjson -H content-typeapplicationjson -XPOST httplocalhost7474dbdatatransaction
-d statements[statementCREATE (pPerson namenamebornborn) RETURN pparametersnameClint
Eastwoodborn1930]
The above command results in
HTTP11 201 Created
Location httplocalhost7474dbdatatransaction261
commithttplocalhost7474dbdatatransaction261committransactionexpiresWed 03 Sep 2014 232651
+0000errors[]
results[columns[p]data[row[nameClint Eastwoodborn1930]]]
See Section 211 ldquoTransactional Cypher HTTP endpointrdquo [298] for more information
47
Chapter 5 Basic Data Modeling Examples
The following chapters contain simple examples to get you started thinking about data modeling withgraphs If you are looking for more advanced examples you can head straight to Chapter 6 AdvancedData Modeling Examples [62]
The examples use Cypher queries a lot read Part III ldquoCypher Query Languagerdquo [102] for moreinformation
Basic Data Modeling Examples
48
51 Movie DatabaseOur example graph consists of movies with title and year and actors with a name Actors have ACTS_INrelationships to movies which represents the role they played This relationship also has a roleattribute
Wersquoll go with three movies and three actors
CREATE (matrix1Movie title The Matrix year 1999-03-31 )
CREATE (matrix2Movie title The Matrix Reloaded year 2003-05-07 )
CREATE (matrix3Movie title The Matrix Revolutions year 2003-10-27 )
CREATE (keanuActor nameKeanu Reeves )
CREATE (laurenceActor nameLaurence Fishburne )
CREATE (carrieanneActor nameCarrie-Anne Moss )
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix1)
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix2)
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix3)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix1)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix2)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix3)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix1)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix2)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix3)
This gives us the following graph to play with
Movie
t it le = The Matrix year = 1999-03-31
Movie
year = 2003-05-07t it le = The Matrix Reloaded
Movie
year = 2003-10-27t it le = The Matrix Revolut ions
Actor
nam e = Keanu Reeves
ACTS_INrole = Neo
ACTS_INrole = Neo
ACTS_INrole = Neo
Actor
nam e = Laurence Fishburne
ACTS_INrole = Morpheus
ACTS_INrole = Morpheus
ACTS_INrole = Morpheus
Actor
nam e = Carrie-Anne Moss
ACTS_INrole = Trinity
ACTS_INrole = Trinity
ACTS_INrole = Trinity
Letrsquos check how many nodes we have now
MATCH (n)
RETURN Hello Graph with + count()+ Nodes AS welcome
Return a single node by name
MATCH (movieMovie title The Matrix )
RETURN movie
Return the title and date of the matrix node
MATCH (movieMovie title The Matrix )
RETURN movietitle movieyear
Which results in
movietitle movieyear
The Matrix 1999-03-31
1 row
Show all actors
MATCH (actorActor)
RETURN actor
Return just the name and order them by name
Basic Data Modeling Examples
49
MATCH (actorActor)
RETURN actorname
ORDER BY actorname
Count the actors
MATCH (actorActor)
RETURN count()
Get only the actors whose names end with ldquosrdquo
MATCH (actorActor)
WHERE actorname =~ s$
RETURN actorname
Herersquos some exploratory queries for unknown datasets Donrsquot do this on live production databases
Count nodes
MATCH (n)
RETURN count()
Count relationship types
MATCH (n)-[r]-gt()
RETURN type(r) count()
type(r) count()
ACTS_IN 9
1 row
List all nodes and their relationships
MATCH (n)-[r]-gt(m)
RETURN n AS FROM r AS `-gt` m AS to
from -gt to
Node[3]nameKeanu Reeves ACTS_IN[2]roleNeo Node[2]year2003-10-27
titleThe Matrix Revolutions
Node[3]nameKeanu Reeves ACTS_IN[1]roleNeo Node[1]year2003-05-07
titleThe Matrix Reloaded
Node[3]nameKeanu Reeves ACTS_IN[0]roleNeo Node[0]titleThe Matrix
year1999-03-31
Node[4]nameLaurence
Fishburne
ACTS_IN[5]roleMorpheus Node[2]year2003-10-27
titleThe Matrix Revolutions
Node[4]nameLaurence
Fishburne
ACTS_IN[4]roleMorpheus Node[1]year2003-05-07
titleThe Matrix Reloaded
Node[4]nameLaurence
Fishburne
ACTS_IN[3]roleMorpheus Node[0]titleThe Matrix
year1999-03-31
Node[5]nameCarrie-Anne Moss ACTS_IN[8]roleTrinity Node[2]year2003-10-27
titleThe Matrix Revolutions
Node[5]nameCarrie-Anne Moss ACTS_IN[7]roleTrinity Node[1]year2003-05-07
titleThe Matrix Reloaded
Node[5]nameCarrie-Anne Moss ACTS_IN[6]roleTrinity Node[0]titleThe Matrix
year1999-03-31
9 rows
Basic Data Modeling Examples
50
52 Social Movie DatabaseOur example graph consists of movies with title and year and actors with a name Actors have ACTS_INrelationships to movies which represents the role they played This relationship also has a roleattribute
So far we queried the movie data now letrsquos update the graph too
CREATE (matrix1Movie title The Matrix year 1999-03-31 )
CREATE (matrix2Movie title The Matrix Reloaded year 2003-05-07 )
CREATE (matrix3Movie title The Matrix Revolutions year 2003-10-27 )
CREATE (keanuActor nameKeanu Reeves )
CREATE (laurenceActor nameLaurence Fishburne )
CREATE (carrieanneActor nameCarrie-Anne Moss )
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix1)
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix2)
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix3)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix1)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix2)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix3)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix1)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix2)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix3)
We will add ourselves friends and movie ratings
Herersquos how to add a node for yourself and return it letrsquos say your name is ldquoMerdquo
CREATE (meUser name Me )
RETURN me
me
Node[6]nameMe
1 rowNodes created 1Properties set 1Labels added 1
Letrsquos check if the node is there
MATCH (meUser name Me )
RETURN mename
Add a movie rating
MATCH (meUser name Me )(movieMovie title The Matrix )
CREATE (me)-[RATED stars 5 comment I love that movie ]-gt(movie)
Which movies did I rate
MATCH (meUser name Me )(me)-[ratingRATED]-gt(movie)
RETURN movietitle ratingstars ratingcomment
movietitle ratingstars ratingcomment
The Matrix 5 I love that movie
1 row
We need a friend
CREATE (friendUser name A Friend )
RETURN friend
Basic Data Modeling Examples
51
Add our friendship idempotently so we can re-run the query without adding it several times We returnthe relationship to check that it has not been created several times
MATCH (meUser name Me )(friendUser name A Friend )
CREATE UNIQUE (me)-[friendshipFRIEND]-gt(friend)
RETURN friendship
You can rerun the query see that it doesnrsquot change anything the second time
Letrsquos update our friendship with a since property
MATCH (meUser name Me )-[friendshipFRIEND]-gt(friendUser name A Friend )
SET friendshipsince=forever
RETURN friendship
Letrsquos pretend us being our friend and wanting to see which movies our friends have rated
MATCH (meUser name A Friend )-[FRIEND]-(friend)-[ratingRATED]-gt(movie)
RETURN movietitle avg(ratingstars) AS stars collect(ratingcomment) AS comments count()
movietitle stars comments count()
The Matrix 5 0 [I love that movie] 1
1 row
Thatrsquos too little data letrsquos add some more friends and friendships
MATCH (meUser name Me )
FOREACH (i IN range(110)| CREATE (friendUser name Friend + i )(me)-[FRIEND]-gt(friend))
Show all our friends
MATCH (meUser name Me )-[rFRIEND]-gt(friend)
RETURN type(r) AS friendship friendname
friendship friendname
FRIEND Friend 5
FRIEND Friend 4
FRIEND Friend 3
FRIEND Friend 2
FRIEND Friend 1
FRIEND Friend 10
FRIEND Friend 8
FRIEND Friend 9
FRIEND Friend 6
FRIEND Friend 7
FRIEND A Friend
11 rows
Basic Data Modeling Examples
52
53 Finding PathsOur example graph consists of movies with title and year and actors with a name Actors have ACTS_INrelationships to movies which represents the role they played This relationship also has a roleattribute
We queried and updated the data so far now letrsquos find interesting constellations aka paths
CREATE (matrix1Movie title The Matrix year 1999-03-31 )
CREATE (matrix2Movie title The Matrix Reloaded year 2003-05-07 )
CREATE (matrix3Movie title The Matrix Revolutions year 2003-10-27 )
CREATE (keanuActor nameKeanu Reeves )
CREATE (laurenceActor nameLaurence Fishburne )
CREATE (carrieanneActor nameCarrie-Anne Moss )
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix1)
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix2)
CREATE (keanu)-[ACTS_IN role Neo ]-gt(matrix3)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix1)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix2)
CREATE (laurence)-[ACTS_IN role Morpheus ]-gt(matrix3)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix1)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix2)
CREATE (carrieanne)-[ACTS_IN role Trinity ]-gt(matrix3)
All other movies that actors in ldquoThe Matrixrdquo acted in ordered by occurrence
MATCH (Movie title The Matrix )lt-[ACTS_IN]-(actor)-[ACTS_IN]-gt(movie)
RETURN movietitle count()
ORDER BY count() DESC
movietitle count()
The Matrix Revolutions 3
The Matrix Reloaded 3
2 rows
Letrsquos see who acted in each of these movies
MATCH (Movie title The Matrix )lt-[ACTS_IN]-(actor)-[ACTS_IN]-gt(movie)
RETURN movietitle collect(actorname) count() AS count
ORDER BY count DESC
movietitle collect(actorname) count
The Matrix Revolutions [Carrie-Anne Moss Laurence
Fishburne Keanu Reeves]
3
The Matrix Reloaded [Carrie-Anne Moss Laurence
Fishburne Keanu Reeves]
3
2 rows
What about co-acting that is actors that acted together
MATCH (Movie title The Matrix
)lt-[ACTS_IN]-(actor)-[ACTS_IN]-gt(movie)lt-[ACTS_IN]-(colleague)
RETURN actorname collect(DISTINCT colleaguename)
actorname collect(distinct colleaguename)
Carrie-Anne Moss [Laurence Fishburne Keanu Reeves]
Keanu Reeves [Carrie-Anne Moss Laurence Fishburne]
3 rows
Basic Data Modeling Examples
53
actorname collect(distinct colleaguename)
Laurence Fishburne [Carrie-Anne Moss Keanu Reeves]
3 rows
Who of those other actors acted most often with anyone from the matrix cast
MATCH (Movie title The Matrix
)lt-[ACTS_IN]-(actor)-[ACTS_IN]-gt(movie)lt-[ACTS_IN]-(colleague)
RETURN colleaguename count()
ORDER BY count() DESC LIMIT 10
colleaguename count()
Carrie-Anne Moss 4
Keanu Reeves 4
Laurence Fishburne 4
3 rows
Starting with paths a path is a sequence of nodes and relationships from a start node to an end node
We know that Trinity loves Neo but how many paths exist between the two actors Wersquoll limit the pathlength of the pattern as it exhaustively searches the graph otherwise This is done by using 05 in thepattern relationship
MATCH p =(Actor name Keanu Reeves )-[ACTS_IN05]-(Actor name Carrie-Anne Moss )
RETURN p length(p)
LIMIT 10
p length(p)
[Node[3]nameKeanu Reeves ACTS_IN[0]
roleNeo Node[0]titleThe Matrix
year1999-03-31 ACTS_IN[6]roleTrinity
Node[5]nameCarrie-Anne Moss]
2
[Node[3]nameKeanu Reeves ACTS_IN[1]
roleNeo Node[1]year2003-05-07 titleThe
Matrix Reloaded ACTS_IN[4]roleMorpheus
Node[4]nameLaurence Fishburne ACTS_IN[3]
roleMorpheus Node[0]titleThe Matrix
year1999-03-31 ACTS_IN[6]roleTrinity
Node[5]nameCarrie-Anne Moss]
4
[Node[3]nameKeanu Reeves ACTS_IN[2]
roleNeo Node[2]year2003-10-27
titleThe Matrix Revolutions ACTS_IN[5]
roleMorpheus Node[4]nameLaurence
Fishburne ACTS_IN[3]roleMorpheus
Node[0]titleThe Matrix
year1999-03-31 ACTS_IN[6]roleTrinity
Node[5]nameCarrie-Anne Moss]
4
[Node[3]nameKeanu Reeves ACTS_IN[1]
roleNeo Node[1]year2003-05-07 titleThe
Matrix Reloaded ACTS_IN[7]roleTrinity
Node[5]nameCarrie-Anne Moss]
2
[Node[3]nameKeanu Reeves ACTS_IN[0]
roleNeo Node[0]titleThe Matrix
4
9 rows
Basic Data Modeling Examples
54
p length(p) year1999-03-31 ACTS_IN[3]roleMorpheus
Node[4]nameLaurence Fishburne ACTS_IN[4]
roleMorpheus Node[1]year2003-05-07
titleThe Matrix Reloaded ACTS_IN[7]
roleTrinity Node[5]nameCarrie-Anne
Moss]
[Node[3]nameKeanu Reeves ACTS_IN[2]
roleNeo Node[2]year2003-10-27
titleThe Matrix Revolutions ACTS_IN[5]
roleMorpheus Node[4]nameLaurence
Fishburne ACTS_IN[4]roleMorpheus
Node[1]year2003-05-07 titleThe Matrix
Reloaded ACTS_IN[7]roleTrinity Node[5]
nameCarrie-Anne Moss]
4
[Node[3]nameKeanu Reeves ACTS_IN[2]
roleNeo Node[2]year2003-10-27 titleThe
Matrix Revolutions ACTS_IN[8]roleTrinity
Node[5]nameCarrie-Anne Moss]
2
[Node[3]nameKeanu Reeves ACTS_IN[0]
roleNeo Node[0]titleThe Matrix
year1999-03-31 ACTS_IN[3]roleMorpheus
Node[4]nameLaurence Fishburne ACTS_IN[5]
roleMorpheus Node[2]year2003-10-27
titleThe Matrix Revolutions ACTS_IN[8]
roleTrinity Node[5]nameCarrie-Anne
Moss]
4
[Node[3]nameKeanu Reeves ACTS_IN[1]
roleNeo Node[1]year2003-05-07 titleThe
Matrix Reloaded ACTS_IN[4]roleMorpheus
Node[4]nameLaurence Fishburne ACTS_IN[5]
roleMorpheus Node[2]year2003-10-27
titleThe Matrix Revolutions ACTS_IN[8]
roleTrinity Node[5]nameCarrie-Anne
Moss]
4
9 rows
But thatrsquos a lot of data we just want to look at the names and titles of the nodes of the path
MATCH p =(Actor name Keanu Reeves )-[ACTS_IN05]-(Actor name Carrie-Anne Moss )
RETURN extract(n IN nodes(p)| coalesce(ntitlenname)) AS `names AND titles` length(p)
ORDER BY length(p)
LIMIT 10
names and titles length(p)
[Keanu Reeves The Matrix Carrie-Anne Moss] 2
[Keanu Reeves The Matrix Reloaded Carrie-
Anne Moss]
2
[Keanu Reeves The Matrix Revolutions Carrie-
Anne Moss]
2
[Keanu Reeves The Matrix Reloaded Laurence
Fishburne The Matrix Carrie-Anne Moss]
4
9 rows
Basic Data Modeling Examples
55
names and titles length(p)
[Keanu Reeves The Matrix
Revolutions Laurence Fishburne The
Matrix Carrie-Anne Moss]
4
[Keanu Reeves The Matrix Laurence
Fishburne The Matrix Reloaded Carrie-Anne
Moss]
4
[Keanu Reeves The Matrix
Revolutions Laurence Fishburne The Matrix
Reloaded Carrie-Anne Moss]
4
[Keanu Reeves The Matrix Laurence
Fishburne The Matrix Revolutions Carrie-Anne
Moss]
4
[Keanu Reeves The Matrix Reloaded Laurence
Fishburne The Matrix Revolutions Carrie-Anne
Moss]
4
9 rows
Basic Data Modeling Examples
56
54 Linked ListsA powerful feature of using a graph database is that you can create your own in-graph datastructures mdash for example a linked list
This data structure uses a single node as the list reference The reference has an outgoing relationshipto the head of the list and an incoming relationship from the last element of the list If the list is emptythe reference will point to itself
To make it clear what happens we will show how the graph looks after each query
To initialize an empty linked list we simply create a node and make it link to itself Unlike the actual listelements it doesnrsquot have a value property
CREATE (root name ROOT )-[LINK]-gt(root)
RETURN root
nam e = ROOT LINK
Adding values is done by finding the relationship where the new value should be placed in andreplacing it with a new node and two relationships to it We also have to handle the fact that the beforeand after nodes could be the same as the root node The case where before after and the root nodeare all the same makes it necessary to use CREATE UNIQUE to not create two new value nodes by mistake
MATCH (root)-[LINK0]-gt(before)(after)-[LINK0]-gt(root)(before)-[oldLINK]-gt(after)
WHERE rootname = ROOT AND (beforevalue lt 25 OR before = root) AND (25 lt aftervalue OR after =
root)
CREATE UNIQUE (before)-[LINK]-gt( value25 )-[LINK]-gt(after)
DELETE old
nam e = ROOT
value = 25
LINK LINK
Letrsquos add one more value
MATCH (root)-[LINK0]-gt(before)(after)-[LINK0]-gt(root)(before)-[oldLINK]-gt(after)
WHERE rootname = ROOT AND (beforevalue lt 10 OR before = root) AND (10 lt aftervalue OR after =
root)
CREATE UNIQUE (before)-[LINK]-gt( value10 )-[LINK]-gt(after)
DELETE old
Basic Data Modeling Examples
57
nam e = ROOT
value = 10
LINK
value = 25
LINK
LINK
Deleting a value conversely is done by finding the node with the value and the two relationships goingin and out from it and replacing the relationships with a new one
MATCH (root)-[LINK0]-gt(before)(before)-[delBeforeLINK]-gt(del)-[delAfterLINK]-gt(after)
(after)-[LINK0]-gt(root)
WHERE rootname = ROOT AND delvalue = 10
CREATE UNIQUE (before)-[LINK]-gt(after)
DELETE del delBefore delAfter
nam e = ROOT
value = 25
LINK LINK
Deleting the last value node is what requires us to use CREATE UNIQUE when replacing the relationshipsOtherwise we would end up with two relationships from the root node to itself as both before andafter nodes are equal to the root node meaning the pattern would match twice
MATCH (root)-[LINK0]-gt(before)(before)-[delBeforeLINK]-gt(del)-[delAfterLINK]-gt(after)
(after)-[LINK0]-gt(root)
WHERE rootname = ROOT AND delvalue = 25
CREATE UNIQUE (before)-[LINK]-gt(after)
DELETE del delBefore delAfter
nam e = ROOT LINK
Basic Data Modeling Examples
58
55 TV ShowsThis example show how TV Shows with Seasons Episodes Characters Actors Users and Reviews canbe modeled in a graph database
Data ModelLetrsquos start out with an entity-relationship model of the domain at hand
TV Show
Season
has
Episode
has
Review
has
Character
featured
User
wrote
Actor
played
To implement this in Neo4j wersquoll use the following relationship types
Relationship Type Description
HAS_SEASON Connects a show with its seasons
HAS_EPISODE Connects a season with its episodes
FEATURED_CHARACTER Connects an episode with its characters
PLAYED_CHARACTER Connects actors with characters Note that anactor can play multiple characters in an episodeand that the same character can be played bymultiple actors as well
HAS_REVIEW Connects an episode with its reviews
WROTE_REVIEW Connects users with reviews they contributed
Sample DataLetrsquos create some data and see how the domain plays out in practice
CREATE (himymTVShow name How I Met Your Mother )
CREATE (himym_s1Season name HIMYM Season 1 )
CREATE (himym_s1_e1Episode name Pilot )
CREATE (tedCharacter name Ted Mosby )
CREATE (joshRadnorActor name Josh Radnor )
Basic Data Modeling Examples
59
CREATE UNIQUE (joshRadnor)-[PLAYED_CHARACTER]-gt(ted)
CREATE UNIQUE (himym)-[HAS_SEASON]-gt(himym_s1)
CREATE UNIQUE (himym_s1)-[HAS_EPISODE]-gt(himym_s1_e1)
CREATE UNIQUE (himym_s1_e1)-[FEATURED_CHARACTER]-gt(ted)
CREATE (himym_s1_e1_review1 title Meet Me At The Bar In 15 Minutes amp Suit Up
content It was awesome )
CREATE (wakenPayneUser name WakenPayne )
CREATE (wakenPayne)-[WROTE_REVIEW]-gt(himym_s1_e1_review1)lt-[HAS_REVIEW]-(himym_s1_e1)
This is how the data looks in the database
TVShow
nam e = How I Met Your Mother
Season
nam e = HIMYM Season 1
HAS_SEASON
Episode
nam e = Pilot
HAS_EPISODE
t it le = Meet Me At The Bar In 15 Minutes amp Suit Upcontent = It was awesom e
HAS_REVIEW
Character
nam e = Ted Mosby
FEATURED_CHARACTER
Actor
nam e = Josh Radnor
PLAYED_CHARACTER
User
nam e = WakenPayne
WROTE_REVIEW
Note that even though we could have modeled the reviews as relationships with title and contentproperties on them we made them nodes instead We gain a lot of flexibility in this way for example ifwe want to connect comments to each review
Now letrsquos add more data
MATCH (himymTVShow name How I Met Your Mother )(himym_s1Season)
(himym_s1_e1Episode name Pilot )
(himym)-[HAS_SEASON]-gt(himym_s1)-[HAS_EPISODE]-gt(himym_s1_e1)
CREATE (marshallCharacter name Marshall Eriksen )
CREATE (robinCharacter name Robin Scherbatsky )
CREATE (barneyCharacter name Barney Stinson )
CREATE (lilyCharacter name Lily Aldrin )
CREATE (jasonSegelActor name Jason Segel )
CREATE (cobieSmuldersActor name Cobie Smulders )
CREATE (neilPatrickHarrisActor name Neil Patrick Harris )
CREATE (alysonHanniganActor name Alyson Hannigan )
CREATE UNIQUE (jasonSegel)-[PLAYED_CHARACTER]-gt(marshall)
CREATE UNIQUE (cobieSmulders)-[PLAYED_CHARACTER]-gt(robin)
CREATE UNIQUE (neilPatrickHarris)-[PLAYED_CHARACTER]-gt(barney)
CREATE UNIQUE (alysonHannigan)-[PLAYED_CHARACTER]-gt(lily)
CREATE UNIQUE (himym_s1_e1)-[FEATURED_CHARACTER]-gt(marshall)
CREATE UNIQUE (himym_s1_e1)-[FEATURED_CHARACTER]-gt(robin)
CREATE UNIQUE (himym_s1_e1)-[FEATURED_CHARACTER]-gt(barney)
CREATE UNIQUE (himym_s1_e1)-[FEATURED_CHARACTER]-gt(lily)
CREATE (himym_s1_e1_review2 title What a great pilot for a show )
Basic Data Modeling Examples
60
content The humour is great )
CREATE (atlasreduxUser name atlasredux )
CREATE (atlasredux)-[WROTE_REVIEW]-gt(himym_s1_e1_review2)lt-[HAS_REVIEW]-(himym_s1_e1)
Information for a showFor a particular TV show show all the seasons and all the episodes and all the reviews and all the castmembers from that show that is all of the information connected to that TV show
MATCH (tvShowTVShow)-[HAS_SEASON]-gt(season)-[HAS_EPISODE]-gt(episode)
WHERE tvShowname = How I Met Your Mother
RETURN seasonname episodename
seasonname episodename
HIMYM Season 1 Pilot
1 row
We could also grab the reviews if there are any by slightly tweaking the query
MATCH (tvShowTVShow)-[HAS_SEASON]-gt(season)-[HAS_EPISODE]-gt(episode)
WHERE tvShowname = How I Met Your Mother
WITH season episode
OPTIONAL MATCH (episode)-[HAS_REVIEW]-gt(review)
RETURN seasonname episodename review
seasonname episodename review
HIMYM Season 1 Pilot Node[15]titleWhat a
great pilot for a show )
contentThe humour is great
HIMYM Season 1 Pilot Node[5]titleMeet Me At The
Bar In 15 Minutes amp Suit Up
contentIt was awesome
2 rows
Now letrsquos list the characters featured in a show Note that in this query we only put identifiers on thenodes we actually use later on The other nodes of the path pattern are designated by ()
MATCH (tvShowTVShow)-[HAS_SEASON]-gt()-[HAS_EPISODE]-gt()-[FEATURED_CHARACTER]-gt(character)
WHERE tvShowname = How I Met Your Mother
RETURN DISTINCT charactername
charactername
Lily Aldrin
Barney Stinson
Robin Scherbatsky
Marshall Eriksen
Ted Mosby
5 rows
Now letrsquos look at how to get all cast members of a show
MATCH
(tvShowTVShow)-[HAS_SEASON]-gt()-[HAS_EPISODE]-gt(episode)-[FEATURED_CHARACTER]-gt()lt-[PLAYED_CHARACTER]-(actor)
WHERE tvShowname = How I Met Your Mother
RETURN DISTINCT actorname
Basic Data Modeling Examples
61
actorname
Alyson Hannigan
Neil Patrick Harris
Cobie Smulders
Jason Segel
Josh Radnor
5 rows
Information for an actorFirst letrsquos add another TV show that Josh Radnor appeared in
CREATE (erTVShow name ER )
CREATE (er_s7Season name ER S7 )
CREATE (er_s7_e17Episode name Peters Progress )
CREATE (tedMosbyCharacter name The Advocate )
CREATE UNIQUE (er)-[HAS_SEASON]-gt(er_s7)
CREATE UNIQUE (er_s7)-[HAS_EPISODE]-gt(er_s7_e17)
WITH er_s7_e17
MATCH (actorActor)(episodeEpisode)
WHERE actorname = Josh Radnor AND episodename = Peters Progress
WITH actor episode
CREATE (keithCharacter name Keith )
CREATE UNIQUE (actor)-[PLAYED_CHARACTER]-gt(keith)
CREATE UNIQUE (episode)-[FEATURED_CHARACTER]-gt(keith)
And now wersquoll create a query to find the episodes that he has appeared in
MATCH (actorActor)-[PLAYED_CHARACTER]-gt(character)lt-[FEATURED_CHARACTER]-(episode)
WHERE actorname = Josh Radnor
RETURN episodename AS Episode charactername AS Character
Episode Character
Peters Progress Keith
Pilot Ted Mosby
2 rows
Now letrsquos go for a similar query but add the season and show to it as well
MATCH (actorActor)-[PLAYED_CHARACTER]-gt(character)lt-[FEATURED_CHARACTER]-(episode)
(episode)lt-[HAS_EPISODE]-(season)lt-[HAS_SEASON]-(tvshow)
WHERE actorname = Josh Radnor
RETURN tvshowname AS Show seasonname AS Season episodename AS Episode
charactername AS Character
Show Season Episode Character
ER ER S7 Peters Progress Keith
How I Met Your Mother HIMYM Season 1 Pilot Ted Mosby
2 rows
62
Chapter 6 Advanced Data Modeling Examples
The following chapters contain simplified examples of how different domains can be modeledusing Neo4j The aim is not to give full examples but to suggest possible ways to think using nodesrelationships graph patterns and data locality in traversals
The examples use Cypher queries a lot read Part III ldquoCypher Query Languagerdquo [102] for moreinformation
Advanced Data Modeling Examples
63
61 ACL structures in graphsThis example gives a generic overview of an approach to handling Access Control Lists (ACLs) in graphsand a simplified example with concrete queries
Generic approachIn many scenarios an application needs to handle security on some form of managed objects Thisexample describes one pattern to handle this through the use of a graph structure and traversersthat build a full permissions-structure for any managed object with exclude and include overridingpossibilities This results in a dynamic construction of ACLs based on the position and context of themanaged objectThe result is a complex security scheme that can easily be implemented in a graph structuresupporting permissions overriding principal and content composition without duplicating dataanywhere
TechniqueAs seen in the example graph layout there are some key concepts in this domain model
bull The managed content (folders and files) that are connected by HAS_CHILD_CONTENT relationshipsbull The Principal subtree pointing out principals that can act as ACL members pointed out by the
PRINCIPAL relationshipsbull The aggregation of principals into groups connected by the IS_MEMBER_OF relationship One principal
(user or group) can be part of many groups at the same timebull The SECURITY mdash relationships connecting the content composite structure to the principal composite
structure containing a additionremoval modifier property (+RW)
Constructing the ACLThe calculation of the effective permissions (eg Read Write Execute) for a principal for any given ACL-managed node (content) follows a number of rules that will be encoded into the permissions-traversal
Top-down-TraversalThis approach will let you define a generic permission pattern on the root content and then refine thatfor specific sub-content nodes and specific principals
Advanced Data Modeling Examples
64
1 Start at the content node in question traverse upwards to the content root node to determine thepath to it
2 Start with a effective optimistic permissions list of all permitted (111 in a bit encodedReadWriteExecute case) or 000 if you like pessimistic security handling (everything is forbidden unlessexplicitly allowed)
3 Beginning from the topmost content node look for any SECURITY relationships on it4 If found look if the principal in question is part of the end-principal of the SECURITY relationship5 If yes add the + permission modifiers to the existing permission pattern revoke the - permission
modifiers from the pattern6 If two principal nodes link to the same content node first apply the more generic prinipals modifiers7 Repeat the security modifier search all the way down to the target content node thus overriding
more generic permissions with the set on nodes closer to the target node
The same algorithm is applicable for the bottom-up approach basically just traversing from the targetcontent node upwards and applying the security modifiers dynamically as the traverser goes up
ExampleNow to get the resulting access rights for eg user 1 on the My Filepdf in a Top-Down approach onthe model in the graph above would go like
1 Traveling upward we start with Root folder and set the permissions to 11 initially (only consideringRead Write)
2 There are two SECURITY relationships to that folder User 1 is contained in both of them but root ismore generic so apply it first then All principals +W +R rarr 11
3 Home has no SECURITY instructions continue4 user1 Home has SECURITY First apply Regular Users (-R -W) rarr 00 Then user 1 (+R +W) rarr 115 The target node My Filepdf has no SECURITY modifiers on it so the effective permissions for User
1 on My Filepdf are ReadWrite rarr 11
Read-permission exampleIn this example we are going to examine a tree structure of directories and files Also there are usersthat own files and roles that can be assigned to users Roles can have permissions on directory or filesstructures (here we model only canRead as opposed to full rwx Unix permissions) and be nested A morethorough example of modeling ACL structures can be found at How to Build Role-Based Access Controlin SQL1
1 httpwwwxaprbcomblog20060816how-to-build-role-based-access-control-in-sql
Advanced Data Modeling Examples
65
Node[20]
nam e = Hom eU1
Node[17]
nam e = File1
leaf
Node[23]
nam e = Desktop
Node[16]
nam e = File2
leaf
Node[10]
nam e = Hom e
contains
Node[15]
nam e = Hom eU2
contains
contains
Node[11]
nam e = init d
Node[12]
nam e = etc
contains
Node[18]
nam e = FileRoot
contains contains
Node[7]
nam e = User
Node[14]
nam e = User1
m em ber
Node[13]
nam e = User2
m em ber
owns
owns
Node[8]
nam e = Adm in2
Node[9]
nam e = Adm in1
Node[21]
nam e = Role
subRole
Node[22]
nam e = SUDOers
subRole
canReadm em ber m em ber
Node[19]
nam e = Root
has
has
Find all files in the directory structureIn order to find all files contained in this structure we need a variable length query that follows allcontains relationships and retrieves the nodes at the other end of the leaf relationships
MATCH ( name FileRoot )-[contains0]-gt(parentDir)-[leaf]-gt(file)
RETURN file
resulting in
file
Node[10]nameFile1
Node[9]nameFile2
2 rows
What files are owned by whomIf we introduce the concept of ownership on files we then can ask for the owners of the files wefind mdash connected via owns relationships to file nodes
MATCH ( name FileRoot )-[contains0]-gt()-[leaf]-gt(file)lt-[owns]-(user)
RETURN file user
Returning the owners of all files below the FileRoot node
file user
Node[10]nameFile1 Node[7]nameUser1
Node[9]nameFile2 Node[6]nameUser2
2 rows
Who has access to a FileIf we now want to check what users have read access to all Files and define our ACL as
bull The root directory has no access granted
Advanced Data Modeling Examples
66
bull Any user having a role that has been granted canRead access to one of the parent folders of a File hasread access
In order to find users that can read any part of the parent folder hierarchy above the files Cypherprovides optional variable length path
MATCH (file)lt-[leaf]-()lt-[contains0]-(dir)
OPTIONAL MATCH (dir)lt-[canRead]-(role)-[member]-gt(readUser)
WHERE filename =~ File
RETURN filename dirname rolename readUsername
This will return the file and the directory where the user has the canRead permission along with theuser and their role
filename dirname rolename readUsername
File2 Desktop ltnullgt ltnullgt
File2 HomeU2 ltnullgt ltnullgt
File2 Home ltnullgt ltnullgt
File2 FileRoot SUDOers Admin2
File2 FileRoot SUDOers Admin1
File1 HomeU1 ltnullgt ltnullgt
File1 Home ltnullgt ltnullgt
File1 FileRoot SUDOers Admin2
File1 FileRoot SUDOers Admin1
9 rows
The results listed above contain null for optional path segments which can be mitigated by eitherasking several queries or returning just the really needed values
Advanced Data Modeling Examples
67
62 HyperedgesImagine a user being part of different groups A group can have different roles and a user can be partof different groups He also can have different roles in different groups apart from the membershipThe association of a User a Group and a Role can be referred to as a HyperEdge However it can beeasily modeled in a property graph as a node that captures this n-ary relationship as depicted below inthe U1G2R1 node
Figure 61 Graph
nam e = U1G2R1
nam e = Group2
hasGroup
nam e = Role1
hasRole
canHave
nam e = Role2
canHave
nam e = Group
isA
nam e = Role
isA isA
nam e = Group1
canHave canHaveisA
nam e = User1
hasRoleInGroup
in in nam e = U1G1R2
hasRoleInGroup
hasRole
hasGroup
Find GroupsTo find out in what roles a user is for a particular groups (here Group2) the following query can traversethis HyperEdge node and provide answers
Query
MATCH ( name User1 )-[hasRoleInGroup]-gt(hyperEdge)-[hasGroup]-gt( name Group2 )
(hyperEdge)-[hasRole]-gt(role)
RETURN rolename
The role of User1 is returned
Resultrolename
Role1
1 row
Advanced Data Modeling Examples
68
Find all groups and roles for a userHere find all groups and the roles a user has sorted by the name of the role
Query
MATCH ( name User1 )-[hasRoleInGroup]-gt(hyperEdge)-[hasGroup]-gt(group)
(hyperEdge)-[hasRole]-gt(role)
RETURN rolename groupname
ORDER BY rolename ASC
The groups and roles of User1 are returned
Resultrolename groupname
Role1 Group2
Role2 Group1
2 rows
Find common groups based on shared rolesAssume a more complicated graph
1 Two user nodes User1 User22 User1 is in Group1 Group2 Group33 User1 has Role1 Role2 in Group1 Role2 Role3 in Group2 Role3 Role4 in Group3 (hyper edges)4 User2 is in Group1 Group2 Group35 User2 has Role2 Role5 in Group1 Role3 Role4 in Group2 Role5 Role6 in Group3 (hyper edges)
The graph for this looks like the following (nodes like U1G2R23 representing the HyperEdges)
Figure 62 Graph
nam e = U2G2R34
nam e = Role3
hasRole
nam e = Role4
hasRole
nam e = Group2
hasGroup
nam e = U1G3R34
hasRolehasRole
nam e = Group3
hasGroup
nam e = User2
hasRoleInGroup
nam e = U2G3R56
hasRoleInGroup
nam e = U2G1R25
hasRoleInGroup
hasGroup
nam e = Role6
hasRole
nam e = Role5
hasRole
nam e = Role2
hasRole hasRole
nam e = Group1
hasGroup
nam e = User1
hasRoleInGroup
nam e = U1G2R23
hasRoleInGroup
nam e = U1G1R12
hasRoleInGroup
hasRole hasGroup hasRole hasRole hasGroup
nam e = Role1
hasRole
To return Group1 and Group2 as User1 and User2 share at least one common role in these two groups thequery looks like this
Query
MATCH (u1)-[hasRoleInGroup]-gt(hyperEdge1)-[hasGroup]-gt(group)(hyperEdge1)-[hasRole]-gt(role)
(u2)-[hasRoleInGroup]-gt(hyperEdge2)-[hasGroup]-gt(group)(hyperEdge2)-[hasRole]-gt(role)
WHERE u1name = User1 AND u2name = User2
RETURN groupname count(role)
ORDER BY groupname ASC
The groups where User1 and User2 share at least one common role
Resultgroupname count(role)
Group1 1
Group2 1
2 rows
Advanced Data Modeling Examples
69
63 Basic friend finding based on social neighborhoodImagine an example graph like the following one
Figure 63 Graph
nam e = Bill
nam e = Ian
knows
nam e = Derrick
knows
nam e = Sara
knows
knows nam e = Jill
knows
nam e = Joe
knows
knows
To find out the friends of Joersquos friends that are not already his friends the query looks like this
Query
MATCH (joe name Joe )-[knows22]-(friend_of_friend)
WHERE NOT (joe)-[knows]-(friend_of_friend)
RETURN friend_of_friendname COUNT()
ORDER BY COUNT() DESC friend_of_friendname
This returns a list of friends-of-friends ordered by the number of connections to them and secondly bytheir name
Resultfriend_of_friendname COUNT()
Ian 2
Derrick 1
Jill 1
3 rows
Advanced Data Modeling Examples
70
64 Co-favorited placesFigure 64 Graph
nam e = SaunaX nam e = CoffeeShop1
nam e = Cosy
tagged
nam e = Cool
tagged
nam e = MelsPlace
taggedtagged
nam e = CoffeeShop3
tagged
nam e = CoffeeShop2
tagged
nam e = CoffeShop2
nam e = Jill
favorite favorite favorite
nam e = Joe
favorite favorite favorite
Co-favorited places mdash users who like x also like yFind places that people also like who favorite this place
bull Determine who has favorited place xbull What else have they favorited that is not place x
Query
MATCH (place)lt-[favorite]-(person)-[favorite]-gt(stuff)
WHERE placename = CoffeeShop1
RETURN stuffname count()
ORDER BY count() DESC stuffname
The list of places that are favorited by people that favorited the start place
Resultstuffname count()
MelsPlace 2
CoffeShop2 1
SaunaX 1
3 rows
Co-Tagged places mdash places related through tagsFind places that are tagged with the same tags
bull Determine the tags for place xbull What else is tagged the same as x that is not x
Query
MATCH (place)-[tagged]-gt(tag)lt-[tagged]-(otherPlace)
WHERE placename = CoffeeShop1
RETURN otherPlacename collect(tagname)
ORDER BY length(collect(tagname)) DESC otherPlacename
This query returns other places than CoffeeShop1 which share the same tags they are ranked by thenumber of tags
ResultotherPlacename collect(tagname)
MelsPlace [Cosy Cool]
3 rows
Advanced Data Modeling Examples
71
otherPlacename collect(tagname)
CoffeeShop2 [Cool]
CoffeeShop3 [Cosy]
3 rows
Advanced Data Modeling Examples
72
65 Find people based on similar favoritesFigure 65 Graph
nam e = Sara
nam e = Bikes
favorite
nam e = Cats
favorite
nam e = Derrick
favoritefavorite
nam e = Jill
favorite
nam e = Joe
friend
favoritefavorite
To find out the possible new friends based on them liking similar things as the asking person use aquery like this
Query
MATCH (me name Joe )-[favorite]-gt(stuff)lt-[favorite]-(person)
WHERE NOT (me)-[friend]-(person)
RETURN personname count(stuff)
ORDER BY count(stuff) DESC
The list of possible friends ranked by them liking similar stuff that are not yet friends is returned
Resultpersonname count(stuff)
Derrick 2
Jill 1
2 rows
Advanced Data Modeling Examples
73
66 Find people based on mutual friends and groupsFigure 66 Graph
Node[0]
nam e = Bill
Node[1]
nam e = Group1
m em ber_of_group
Node[2]
nam e = Bob
m em ber_of_group
Node[3]
nam e = Jill
knows
m em ber_of_group
Node[4]
nam e = Joe
knows
m em ber_of_group
In this scenario the problem is to determine mutual friends and groups if any between persons If nomutual groups or friends are found there should be a 0 returned
Query
MATCH (me name Joe )(other)
WHERE othername IN [Jill Bob]
OPTIONAL MATCH pGroups=(me)-[member_of_group]-gt(mg)lt-[member_of_group]-(other)
OPTIONAL MATCH pMutualFriends=(me)-[knows]-gt(mf)lt-[knows]-(other)
RETURN othername AS name count(DISTINCT pGroups) AS mutualGroups
count(DISTINCT pMutualFriends) AS mutualFriends
ORDER BY mutualFriends DESC
The question we are asking is mdash how many unique paths are there between me and Jill the paths beingcommon group memberships and common friends If the paths are mandatory no results will bereturned if me and Bob lack any common friends and we donrsquot want that To make a path optional youhave to make at least one of itrsquos relationships optional That makes the whole path optional
Resultname mutualGroups mutualFriends
Jill 1 1
Bob 1 0
2 rows
Advanced Data Modeling Examples
74
67 Find friends based on similar taggingFigure 67 Graph
nam e = Anim als nam e = Hobby
nam e = Surfing
tagged
nam e = Sara
nam e = Horses
favorite
nam e = Bikes
favorite
tagged tagged
nam e = Cats
tagged
nam e = Derrick
favorite
nam e = Joe
favoritefavorite favoritefavorite
To find people similar to me based on the taggings of their favorited items one approach could be
bull Determine the tags associated with what I favoritebull What else is tagged with those tagsbull Who favorites items tagged with the same tagsbull Sort the result by how many of the same things these people like
Query
MATCH
(me)-[favorite]-gt(myFavorites)-[tagged]-gt(tag)lt-[tagged]-(theirFavorites)lt-[favorite]-(people)
WHERE mename = Joe AND NOT me=people
RETURN peoplename AS name count() AS similar_favs
ORDER BY similar_favs DESC
The query returns the list of possible friends ranked by them liking similar stuff that are not yet friends
Resultname similar_favs
Sara 2
Derrick 1
2 rows
Advanced Data Modeling Examples
75
68 Multirelational (social) graphsFigure 68 Graph
nam e = cats
nam e = nature
nam e = Ben
nam e = Sara
LIKES
FOLLOWS
nam e = cars
LIKES
nam e = bikes
LIKES
nam e = Joe
FOLLOWS
LIKES
FOLLOWS
LIKES
nam e = Maria
FOLLOWSLOVES
LIKES
FOLLOWSLOVES
This example shows a multi-relational network between persons and things they like A multi-relationalgraph is a graph with more than one kind of relationship between nodes
Query
MATCH (me name Joe )-[r1FOLLOWS|LOVES]-gt(other)-[r2]-gt(me)
WHERE type(r1)=type(r2)
RETURN othername type(r1)
The query returns people that FOLLOWS or LOVES Joe back
Resultothername type(r1)
Maria FOLLOWS
Maria LOVES
Sara FOLLOWS
3 rows
Advanced Data Modeling Examples
76
69 Implementing newsfeeds in a graph
nam e = Bob
nam e = Alice
FRIENDstatus = CONFIRMED
date = 1nam e = bob_s1text = bobs status1
STATUS
nam e = Joe
FRIENDstatus = PENDING
date = 2nam e = alice_s1text = Alices status1
STATUS
date = 4nam e = bob_s2text = bobs status2
NEXT
FRIENDstatus = CONFIRMED
date = 3nam e = joe_s1text = Joe status1
STATUS
date = 5nam e = alice_s2text = Alices status2
NEXT
date = 6nam e = joe_s2text = Joe status2
NEXT
Implementation of newsfeed or timeline feature is a frequent requirement for social applications Thefollowing exmaples are inspired by Newsfeed feature powered by Neo4j Graph Database2 The queryasked here is
Starting at me retrieve the time-ordered status feed of the status updates of me and and all friends thatare connected via a CONFIRMED FRIEND relationship to me
Query
MATCH (me name Joe )-[relsFRIEND01]-(myfriend)
WHERE ALL (r IN rels WHERE rstatus = CONFIRMED)
WITH myfriend
MATCH (myfriend)-[STATUS]-(latestupdate)-[NEXT01]-(statusupdates)
RETURN myfriendname AS name statusupdatesdate AS date statusupdatestext AS text
ORDER BY statusupdatesdate DESC LIMIT 3
To understand the strategy letrsquos divide the query into five steps
1 First Get the list of all my friends (along with me) through FRIEND relationship (MATCH (me nameJoe)-[relsFRIEND01]-(myfriend)) Also the WHERE predicate can be added to check whether thefriend request is pending or confirmed
2 httpswebarchiveorgweb20121102191919httptechfinin201210newsfeed-feature-powered-by-neo4j-graph-database
Advanced Data Modeling Examples
77
2 Get the latest status update of my friends through Status relationship (MATCH (myfriend)-[STATUS]-(latestupdate))
3 Get subsequent status updates (along with the latest one) of my friends through NEXT relationships(MATCH (myfriend)-[STATUS]-(latestupdate)-[NEXT01]-(statusupdates)) which will give you thelatest and one additional statusupdate adjust 01 to whatever suits your case
4 Sort the status updates by posted date (ORDER BY statusupdatesdate DESC)5 LIMIT the number of updates you need in every query (LIMIT 3)
Resultname date text
Joe 6 Joe status2
Bob 4 bobs status2
Joe 3 Joe status1
3 rows
Here the example shows how to add a new status update into the existing data for a user
Query
MATCH (me)
WHERE mename=Bob
OPTIONAL MATCH (me)-[rSTATUS]-(secondlatestupdate)
DELETE r
CREATE (me)-[STATUS]-gt(latest_update textStatusdate123 )
WITH latest_update collect(secondlatestupdate) AS seconds
FOREACH (x IN seconds | CREATE (latest_update)-[NEXT]-gt(x))
RETURN latest_updatetext AS new_status
Dividing the query into steps this query resembles adding new item in middle of a doubly linked list
1 Get the latest update (if it exists) of the user through the STATUS relationship (OPTIONAL MATCH (me)-[rSTATUS]-(secondlatestupdate))
2 Delete the STATUS relationship between user and secondlatestupdate (if it exists) as this wouldbecome the second latest update now and only the latest update would be added through a STATUSrelationship all earlier updates would be connected to their subsequent updates through a NEXTrelationship (DELETE r)
3 Now create the new statusupdate node (with text and date as properties) and connectthis with the user through a STATUS relationship (CREATE (me)-[STATUS]-gt(latest_update textStatusdate123 ))
4 Pipe over statusupdate or an empty collection to the next query part (WITH latest_updatecollect(secondlatestupdate) AS seconds)
5 Now create a NEXT relationship between the latest status update and the second latest status update(if it exists) (FOREACH(x in seconds | CREATE (latest_update)-[NEXT]-gt(x)))
Resultnew_status
Status
1 rowNodes created 1Relationships created 2Properties set 2Relationships deleted 1
Advanced Data Modeling Examples
78
Node[0]
nam e = Bob
Node[1]
date = 1nam e = bob_s1text = bobs status1
STATUS
Node[2]
date = 4nam e = bob_s2text = bobs status2
NEXT
Advanced Data Modeling Examples
79
610 Boosting recommendation resultsFigure 69 Graph
nam e = Clark Kent
nam e = Daily Planet
WORKS_ATweight = 2act ivity = 45
nam e = Jim m y Olsen
KNOWSweight = 4
nam e = Lois Lane
KNOWSweight = 4
WORKS_ATweight = 2act ivity = 10
nam e = Perry White
KNOWSweight = 4
WORKS_ATweight = 2act ivity = 56
nam e = Anderson Cooper
KNOWSweight = 4
KNOWSweight = 4
nam e = CNN
WORKS_ATweight = 2act ivity = 2
WORKS_ATweight = 2act ivity = 6
WORKS_ATweight = 2act ivity = 3
This query finds the recommended friends for the origin that are working at the same place as theorigin or know a person that the origin knows also the origin should not already know the target Thisrecommendation is weighted for the weight of the relationship r2 and boosted with a factor of 2 ifthere is an activity-property on that relationship
Query
MATCH (origin)-[r1KNOWS|WORKS_AT]-(c)-[r2KNOWS|WORKS_AT]-(candidate)
WHERE originname = Clark Kent AND type(r1)=type(r2) AND NOT (origin)-[KNOWS]-(candidate)
RETURN originname AS origin candidatename AS candidate SUM(ROUND(r2weight
+(COALESCE(r2activity
0) 2))) AS boost
ORDER BY boost DESC LIMIT 10
This returns the recommended friends for the origin nodes and their recommendation score
Resultorigin candidate boost
Clark Kent Perry White 22 0
Clark Kent Anderson Cooper 4 0
2 rows
Advanced Data Modeling Examples
80
611 Calculating the clustering coefficient of a networkFigure 610 Graph
nam e = startnode
KNOWS
KNOWS
KNOWS KNOWS
KNOWS KNOWS KNOWS
In this example adapted from Niko Gamulins blog post on Neo4j for Social Network Analysis3the graph in question is showing the 2-hop relationships of a sample person as nodes with KNOWSrelationships
The clustering coefficient4 of a selected node is defined as the probability that two randomly selectedneighbors are connected to each other With the number of neighbors as n and the number of mutualconnections between the neighbors r the calculation is
The number of possible connections between two neighbors is n(2(n-2)) = 4(2(4-2)) = 244 =6 where n is the number of neighbors n = 4 and the actual number r of connections is 1 Therefore theclustering coefficient of node 1 is 16
n and r are quite simple to retrieve via the following query
Query
MATCH (a name startnode )--(b)
WITH a count(DISTINCT b) AS n
MATCH (a)--()-[r]-()--(a)
RETURN n count(DISTINCT r) AS r
This returns n and r for the above calculations
Resultn r
4 1
1 row
3 httpmypetprojectsblogspotse201206social-network-analysis-with-neo4jhtml4 httpenwikipediaorgwikiClustering_coefficient
Advanced Data Modeling Examples
81
612 Pretty graphsThis section is showing how to create some of the named pretty graphs on Wikipedia5
Star graphThe graph is created by first creating a center node and then once per element in the range creates aleaf node and connects it to the center
Query
CREATE (center)
FOREACH (x IN range(16)| CREATE (leaf)(center)-[X]-gt(leaf))
RETURN id(center) AS id
The query returns the id of the center node
Resultid
0
1 rowNodes created 7Relationships created 6
Figure 611 Graph
X
X
X X
X
X
Wheel graphThis graph is created in a number of steps
bull Create a center nodebull Once per element in the range create a leaf and connect it to the centerbull Connect neighboring leafsbull Find the minimum and maximum leaf and connect thesebull Return the id of the center node
Query
CREATE (center)
5 httpenwikipediaorgwikiGallery_of_named_graphs
Advanced Data Modeling Examples
82
FOREACH (x IN range(16)| CREATE (leaf countx )(center)-[X]-gt(leaf))
WITH center
MATCH (large_leaf)lt--(center)--gt(small_leaf)
WHERE large_leafcount = small_leafcount + 1
CREATE (small_leaf)-[X]-gt(large_leaf)
WITH center min(small_leafcount) AS min max(large_leafcount) AS max
MATCH (first_leaf)lt--(center)--gt(last_leaf)
WHERE first_leafcount = min AND last_leafcount = max
CREATE (last_leaf)-[X]-gt(first_leaf)
RETURN id(center) AS id
The query returns the id of the center node
Resultid
0
1 rowNodes created 7Relationships created 12Properties set 6
Figure 612 Graph
count = 6
X
count = 5X
count = 4
X
count = 3
X
count = 2 X
count = 1
X
X
X
X
X
X
X
Complete graphTo create this graph we first create 6 nodes and label them with the Leaf label We then match all theunique pairs of nodes and create a relationship between them
Query
FOREACH (x IN range(16)| CREATE (leafLeaf count x ))
WITH
MATCH (leaf1Leaf)(leaf2Leaf)
WHERE id(leaf1)lt id(leaf2)
CREATE (leaf1)-[X]-gt(leaf2)
Nothing is returned by this query
Result
(empty result)
Nodes created 6Relationships created 15Properties set 6Labels added 6
Advanced Data Modeling Examples
83
Figure 613 Graph
Leaf
count = 1
Leaf
count = 6
X
Leaf
count = 5
X
Leaf
count = 4
X
Leaf
count = 3
X
Leaf
count = 2
X
X
XX XX
X
XX
X X
Friendship graphThis query first creates a center node and then once per element in the range creates a cycle graphand connects it to the center
Query
CREATE (center)
FOREACH (x IN range(13)| CREATE (leaf1)(leaf2)(center)-[X]-gt(leaf1)(center)-[X]-gt(leaf2)
(leaf1)-[X]-gt(leaf2))
RETURN ID(center) AS id
The id of the center node is returned by the query
Resultid
0
1 rowNodes created 7Relationships created 9
Advanced Data Modeling Examples
84
Figure 614 Graph
XX
XX
X
X
X
X
X
Advanced Data Modeling Examples
85
613 A multilevel indexing structure (path tree)In this example a multi-level tree structure is used to index event nodes (here Event1 Event2 and Event3in this case with a YEAR-MONTH-DAY granularity making this a timeline indexing structure Howeverthis approach should work for a wide range of multi-level ranges
The structure follows a couple of rules
bull Events can be indexed multiple times by connecting the indexing structure leafs with the events via aVALUE relationship
bull The querying is done in a path-range fashion That is the start- and end path from the indexing rootto the start and end leafs in the tree are calculated
bull Using Cypher the queries following different strategies can be expressed as path sections and puttogether using one single query
The graph below depicts a structure with 3 Events being attached to an index structure at differentleafs
Figure 615 Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Return zero rangeHere only the events indexed under one leaf (2010-12-31) are returned The query only needs one pathsegment rootPath (color Green) through the index
Advanced Data Modeling Examples
86
Figure 616 Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Query
MATCH rootPath=(root)-[`2010`]-gt()-[`12`]-gt()-[`31`]-gt(leaf)(leaf)-[VALUE]-gt(event)
WHERE rootname = Root
RETURN eventname
ORDER BY eventname ASC
Returning all events on the date 2010-12-31 in this case Event1 and Event2
Resulteventname
Event1
Event2
2 rows
Return the full rangeIn this case the range goes from the first to the last leaf of the index tree Here startPath (colorGreenyellow) and endPath (color Green) span up the range valuePath (color Blue) is then connecting theleafs and the values can be read from the middle node hanging off the values (color Red) path
Advanced Data Modeling Examples
87
Figure 617 Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Query
MATCH startPath=(root)-[`2010`]-gt()-[`12`]-gt()-[`31`]-gt(startLeaf)
endPath=(root)-[`2011`]-gt()-[`01`]-gt()-[`03`]-gt(endLeaf)
valuePath=(startLeaf)-[NEXT0]-gt(middle)-[NEXT0]-gt(endLeaf)
vals=(middle)-[VALUE]-gt(event)
WHERE rootname = Root
RETURN eventname
ORDER BY eventname ASC
Returning all events between 2010-12-31 and 2011-01-03 in this case all events
Resulteventname
Event1
Event2
Event2
Event3
4 rows
Return partly shared path rangesHere the query range results in partly shared paths when querying the index making the introductionof and common path segment commonPath (color Black) necessary before spanning up startPath (colorGreenyellow) and endPath (color Darkgreen) After that valuePath (color Blue) connects the leafs and theindexed values are returned off values (color Red) path
Advanced Data Modeling Examples
88
Figure 618 Graph
Root
Year 2010
2010
Year 2011
2011
Month 12
12
Month 01
01
Day 31
31
Day 01
01
Day 02
02
Day 03
03
NEXT
Event1
VALUE
Event2
VALUE
NEXT
VALUE
NEXT
Event3
VALUE
Query
MATCH commonPath=(root)-[`2011`]-gt()-[`01`]-gt(commonRootEnd)
startPath=(commonRootEnd)-[`01`]-gt(startLeaf) endPath=(commonRootEnd)-[`03`]-gt(endLeaf)
valuePath=(startLeaf)-[NEXT0]-gt(middle)-[NEXT0]-gt(endLeaf)
vals=(middle)-[VALUE]-gt(event)
WHERE rootname = Root
RETURN eventname
ORDER BY eventname ASC
Returning all events between 2011-01-01 and 2011-01-03 in this case Event2 and Event3
Resulteventname
Event2
Event3
2 rows
Advanced Data Modeling Examples
89
614 Complex similarity computationsCalculate similarities by complex calculationsHere a similarity between two players in a game is calculated by the number of times they have eatenthe same food
Query
MATCH (me name me )-[r1ATE]-gt(food)lt-[r2ATE]-(you)
WITH mecount(DISTINCT r1) AS H1count(DISTINCT r2) AS H2you
MATCH (me)-[r1ATE]-gt(food)lt-[r2ATE]-(you)
RETURN sum((1-ABS(r1timesH1-r2timesH2))(r1times+r2times)(H1+H2)) AS similarity
The two players and their similarity measure
Resultsimilarity
-30 0
1 row
Figure 619 Graph
nam e = m e
nam e = m eat
ATEt im es = 10
nam e = you
ATEt im es = 5
Advanced Data Modeling Examples
90
615 The Graphity activity stream modelFind Activity Streams in a network without scaling penaltyThis is an approach for scaling the retrieval of activity streams in a friend graph put forward by RenePickard as Graphity6 In short a linked list is created for every persons friends in the order that the lastactivities of these friends have occured When new activities occur for a friend all the ordered friendlists that this friend is part of are reordered transferring computing load to the time of new eventupdates instead of activity stream reads
TipThis approach of course makes excessive use of relationship types This needs to betaken into consideration when designing a production system with this approach SeeSection 175 ldquoCapacityrdquo [284] for the maximum number of relationship types
To find the activity stream for a person just follow the linked list of the friend list and retrieve theneeded amount of activities form the respective activity list of the friends
Query
MATCH p=(me name Jane )-[jane_knows]-gt(friend)(friend)-[has]-gt(status)
RETURN mename friendname statusname length(p)
ORDER BY length(p)
The returns the activity stream for Jane
Resultmename friendname statusname length(p)
Jane Bill Bill_s1 1
Jane Joe Joe_s1 2
Jane Bob Bob_s1 3
3 rows
6 httpwwwrene-pickhardtdegraphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks
Advanced Data Modeling Examples
91
Figure 620 Graph
nam e = Bill
nam e = Joe
jane_knows
nam e = Bill_s1
has
nam e = Joe_s1
has
nam e = Bob
jane_knows
nam e = Bill_s2
next
nam e = Ted_s1
nam e = Ted_s2
next
nam e = Jane
jane_knows
nam e = Joe_s2
next
nam e = Bob_s1
has
nam e = Ted
bob_knows
bob_knows
has
Advanced Data Modeling Examples
92
616 User roles in graphsThis is an example showing a hierarchy of roles Whatrsquos interesting is that a tree is not sufficient forstoring this kind of structure as elaborated below
This is an implementation of an example found in the article A Model to Represent Directed AcyclicGraphs (DAG) on SQL Databases7 by Kemal Erdogan8 The article discusses how to store directedacyclic graphs9 (DAGs) in SQL based DBs DAGs are almost trees but with a twist it may be possible toreach the same node through different paths Trees are restricted from this possibility which makesthem much easier to handle In our case it is ldquoAlirdquo and ldquoEnginrdquo as they are both admins and users andthus reachable through these group nodes Reality often looks this way and canrsquot be captured by treestructures
In the article an SQL Stored Procedure solution is provided The main idea that also have some supportfrom scientists is to pre-calculate all possible (transitive) paths Pros and cons of this approach
bull decent performance on readbull low performance on insertbull wastes lots of spacebull relies on stored procedures
In Neo4j storing the roles is trivial In this case we use PART_OF (green edges) relationships to model thegroup hierarchy and MEMBER_OF (blue edges) to model membership in groups We also connect the toplevel groups to the reference node by ROOT relationships This gives us a useful partitioning of the graphNeo4j has no predefined relationship types you are free to create any relationship types and give themthe semantics you want
Lets now have a look at how to retrieve information from the graph The the queries are done usingCypher the Java code is using the Neo4j Traversal API (see Section 342 ldquoTraversal Framework JavaAPIrdquo [611] which is part of Part VII ldquoAdvanced Usagerdquo [558])
Get the adminsIn Cypher we could get the admins like this
7 httpwwwcodeprojectcomArticles22824A-Model-to-Represent-Directed-Acyclic-Graphs-DAG-o8 httpwwwcodeprojectcomscriptArticlesMemberArticlesaspxamid=2745189 httpenwikipediaorgwikiDirected_acyclic_graph
Advanced Data Modeling Examples
93
MATCH ( name Admins )lt-[PART_OF0]-(group)lt-[MEMBER_OF]-(user)
RETURN username groupname
resulting in
username groupname
Ali Admins
Demet HelpDesk
Engin HelpDesk
3 rows
And herersquos the code when using the Java Traversal API
Node admins = getNodeByName( Admins )
TraversalDescription traversalDescription = dbtraversalDescription()
breadthFirst()
evaluator( EvaluatorsexcludeStartPosition() )
relationships( RoleRelsPART_OF DirectionINCOMING )
relationships( RoleRelsMEMBER_OF DirectionINCOMING )
Traverser traverser = traversalDescriptiontraverse( admins )
resulting in the output
Found Ali at depth 0
Found HelpDesk at depth 0
Found Demet at depth 1
Found Engin at depth 1
The result is collected from the traverser using this code
String output =
for ( Path path traverser )
Node node = pathendNode()
output += Found + nodegetProperty( NAME ) + at depth
+ ( pathlength() - 1 ) + n
Get the group memberships of a userIn Cypher
MATCH ( name Jale )-[MEMBER_OF]-gt()-[PART_OF0]-gt(group)
RETURN groupname
groupname
ABCTechnicians
Technicians
Users
3 rows
Using the Neo4j Java Traversal API this query looks like
Node jale = getNodeByName( Jale )
traversalDescription = dbtraversalDescription()
depthFirst()
evaluator( EvaluatorsexcludeStartPosition() )
relationships( RoleRelsMEMBER_OF DirectionOUTGOING )
relationships( RoleRelsPART_OF DirectionOUTGOING )
Advanced Data Modeling Examples
94
traverser = traversalDescriptiontraverse( jale )
resulting in
Found ABCTechnicians at depth 0
Found Technicians at depth 1
Found Users at depth 2
Get all groupsIn Cypher
MATCH ( name Reference_Node )lt-[ROOT]-gt()lt-[PART_OF0]-(group)
RETURN groupname
groupname
Users
Managers
Technicians
ABCTechnicians
Admins
HelpDesk
6 rows
In Java
Node referenceNode = getNodeByName( Reference_Node)
traversalDescription = dbtraversalDescription()
breadthFirst()
evaluator( EvaluatorsexcludeStartPosition() )
relationships( RoleRelsROOT DirectionINCOMING )
relationships( RoleRelsPART_OF DirectionINCOMING )
traverser = traversalDescriptiontraverse( referenceNode )
resulting in
Found Users at depth 0
Found Admins at depth 0
Found Technicians at depth 1
Found Managers at depth 1
Found HelpDesk at depth 1
Found ABCTechnicians at depth 2
Get all members of all groupsNow letrsquos try to find all users in the system being part of any group
In Cypher this looks like
MATCH ( name Reference_Node )lt-[ROOT]-gt(root) p=(root)lt-[PART_OF0]-()lt-[MEMBER_OF]-(user)
RETURN username min(length(p))
ORDER BY min(length(p)) username
and results in the following output
username min(length(p))
Ali 1
Burcu 1
10 rows
Advanced Data Modeling Examples
95
username min(length(p))
Can 1
Engin 1
Demet 2
Fuat 2
Gul 2
Hakan 2
Irmak 2
Jale 3
10 rows
in Java
traversalDescription = dbtraversalDescription()
breadthFirst()
evaluator(
EvaluatorsincludeWhereLastRelationshipTypeIs( RoleRelsMEMBER_OF ) )
traverser = traversalDescriptiontraverse( referenceNode )
Found Can at depth 1
Found Burcu at depth 1
Found Engin at depth 1
Found Ali at depth 1
Found Irmak at depth 2
Found Hakan at depth 2
Found Fuat at depth 2
Found Gul at depth 2
Found Demet at depth 2
Found Jale at depth 3
As seen above querying even more complex scenarios can be done using comparatively shortconstructs in Cypher or Java
96
Chapter 7 Languages
Please see httpneo4jcomdeveloperlanguage-guides for the current set of drivers
Therersquos an included Java example which shows a ldquolow-levelrdquo approach to using the Neo4j REST API fromJava
Languages
97
71 How to use the REST API from JavaCreating a graph through the REST API from JavaThe REST API uses HTTP and JSON so that it can be used from many languages and platforms Stillwhen geting started itrsquos useful to see some patterns that can be re-used In this brief overview wersquollshow you how to create and manipulate a simple graph through the REST API and also how to query itFor these examples wersquove chosen the Jersey1 client components which are easily downloaded2 viaMaven
Start the serverBefore we can perform any actions on the server we need to start it as per Section 232 ldquoServerInstallationrdquo [438] Next up wersquoll check the connection to the server
WebResource resource = Clientcreate()
resource( SERVER_ROOT_URI )
ClientResponse response = resourceget( ClientResponseclass )
Systemoutprintln( Stringformat( GET on [s] status code [d]
SERVER_ROOT_URI responsegetStatus() ) )
responseclose()
If the status of the response is 200 OK then we know the server is running fine and we can continue Ifthe code fails to connect to the server then please have a look at Part V ldquoOperationsrdquo [434]
NoteIf you get any other response than 200 OK (particularly 4xx or 5xx responses) then pleasecheck your configuration and look in the log files in the datalog directory
Sending CypherUsing the REST API we can send Cypher queries to the server This is the main way to use Neo4j Itallows control of the transactional boundaries as neededLetrsquos try to use this to list all the nodes in the database which have a name property
final String txUri = SERVER_ROOT_URI + transactioncommit
WebResource resource = Clientcreate()resource( txUri )
String payload = statements [ statement +query + ]
ClientResponse response = resource
accept( MediaTypeAPPLICATION_JSON )
type( MediaTypeAPPLICATION_JSON )
entity( payload )
post( ClientResponseclass )
Systemoutprintln( Stringformat(
POST [s] to [s] status code [d] returned data
+ SystemlineSeparator() + s
payload txUri responsegetStatus()
responsegetEntity( Stringclass ) ) )
responseclose()
For more details see Section 211 ldquoTransactional Cypher HTTP endpointrdquo [298]
Fine-grained REST API callsFor exploratory and special purposes there is a fine grained REST API see Chapter 21 REST API [297]The following sections highlight some of the basic operations
1 httpjerseyjavanet2 httpsjerseyjavanetnonavdocumentation19user-guidehtmlchapter_deps
Languages
98
Creating a nodeThe REST API uses POST to create nodes Encapsulating that in Java is straightforward using the Jerseyclient
final String nodeEntryPointUri = SERVER_ROOT_URI + node
httplocalhost7474dbdatanode
WebResource resource = Clientcreate()
resource( nodeEntryPointUri )
POST to the node entry point URI
ClientResponse response = resourceaccept( MediaTypeAPPLICATION_JSON )
type( MediaTypeAPPLICATION_JSON )
entity( )
post( ClientResponseclass )
final URI location = responsegetLocation()
Systemoutprintln( Stringformat(
POST to [s] status code [d] location header [s]
nodeEntryPointUri responsegetStatus() locationtoString() ) )
responseclose()
return location
If the call completes successfully under the covers it will have sent a HTTP request containing a JSONpayload to the server The server will then have created a new node in the database and respondedwith a 201 Created response and a Location header with the URI of the newly created nodeIn our example we call this functionality twice to create two nodes in our database
Adding propertiesOnce we have nodes in our datatabase we can use them to store useful data In this case wersquore goingto store information about music in our database Letrsquos start by looking at the code that we use tocreate nodes and add properties Here wersquove added nodes to represent Joe Strummer and a bandcalled The Clash
URI firstNode = createNode()
addProperty( firstNode name Joe Strummer )
URI secondNode = createNode()
addProperty( secondNode band The Clash )
Inside the addProperty method we determine the resource that represents properties for the node anddecide on a name for that property We then proceed to PUT the value of that property to the server
String propertyUri = nodeUritoString() + properties + propertyName
httplocalhost7474dbdatanodenode_idpropertiesproperty_name
WebResource resource = Clientcreate()
resource( propertyUri )
ClientResponse response = resourceaccept( MediaTypeAPPLICATION_JSON )
type( MediaTypeAPPLICATION_JSON )
entity( + propertyValue + )
put( ClientResponseclass )
Systemoutprintln( Stringformat( PUT to [s] status code [d]
propertyUri responsegetStatus() ) )
responseclose()
If everything goes well wersquoll get a 204 No Content back indicating that the server processed the requestbut didnrsquot echo back the property value
Adding relationshipsNow that we have nodes to represent Joe Strummer and The Clash we can relate them The RESTAPI supports this through a POST of a relationship representation to the start node of the relationship
Languages
99
Correspondingly in Java we POST some JSON to the URI of our node that represents Joe Strummer toestablish a relationship between that node and the node representing The Clash
URI relationshipUri = addRelationship( firstNode secondNode singer
from 1976 until 1986 )
Inside the addRelationship method we determine the URI of the Joe Strummer nodersquos relationshipsand then POST a JSON description of our intended relationship This description contains the destinationnode a label for the relationship type and any attributes for the relation as a JSON collection
private static URI addRelationship( URI startNode URI endNode
String relationshipType String jsonAttributes )
throws URISyntaxException
URI fromUri = new URI( startNodetoString() + relationships )
String relationshipJson = generateJsonRelationship( endNode
relationshipType jsonAttributes )
WebResource resource = Clientcreate()
resource( fromUri )
POST JSON to the relationships URI
ClientResponse response = resourceaccept( MediaTypeAPPLICATION_JSON )
type( MediaTypeAPPLICATION_JSON )
entity( relationshipJson )
post( ClientResponseclass )
final URI location = responsegetLocation()
Systemoutprintln( Stringformat(
POST to [s] status code [d] location header [s]
fromUri responsegetStatus() locationtoString() ) )
responseclose()
return location
If all goes well we receive a 201 Created status code and a Location header which contains a URI of thenewly created relation
Add properties to a relationshipLike nodes relationships can have properties Since wersquore big fans of both Joe Strummer and the Clashwersquoll add a rating to the relationship so that others can see hersquos a 5-star singer with the band
addMetadataToProperty( relationshipUri stars 5 )
Inside the addMetadataToProperty method we determine the URI of the properties of the relationshipand PUT our new values (since itrsquos PUT it will always overwrite existing values so be careful)
private static void addMetadataToProperty( URI relationshipUri
String name String value ) throws URISyntaxException
URI propertyUri = new URI( relationshipUritoString() + properties )
String entity = toJsonNameValuePairCollection( name value )
WebResource resource = Clientcreate()
resource( propertyUri )
ClientResponse response = resourceaccept( MediaTypeAPPLICATION_JSON )
type( MediaTypeAPPLICATION_JSON )
entity( entity )
put( ClientResponseclass )
Systemoutprintln( Stringformat(
PUT [s] to [s] status code [d] entity propertyUri
responsegetStatus() ) )
responseclose()
Languages
100
Assuming all goes well wersquoll get a 204 OK response back from the server (which we can check by callingClientResponsegetStatus()) and wersquove now established a very small graph that we can query
Querying graphsAs with the embedded version of the database the Neo4j server uses graph traversals to look for datain graphs Currently the Neo4j server expects a JSON payload describing the traversal to be POST-ed atthe starting node for the traversal (though this is likely to change in time to a GET-based approach)
To start this process we use a simple class that can turn itself into the equivalent JSON ready for POST-ing to the server and in this case wersquove hardcoded the traverser to look for all nodes with outgoingrelationships with the type singer
TraversalDefinition turns into JSON to send to the Server
TraversalDefinition t = new TraversalDefinition()
tsetOrder( TraversalDefinitionDEPTH_FIRST )
tsetUniqueness( TraversalDefinitionNODE )
tsetMaxDepth( 10 )
tsetReturnFilter( TraversalDefinitionALL )
tsetRelationships( new Relation( singer RelationOUT ) )
Once we have defined the parameters of our traversal we just need to transfer it We do this bydetermining the URI of the traversers for the start node and then POST-ing the JSON representation ofthe traverser to it
URI traverserUri = new URI( startNodetoString() + traversenode )
WebResource resource = Clientcreate()
resource( traverserUri )
String jsonTraverserPayload = ttoJson()
ClientResponse response = resourceaccept( MediaTypeAPPLICATION_JSON )
type( MediaTypeAPPLICATION_JSON )
entity( jsonTraverserPayload )
post( ClientResponseclass )
Systemoutprintln( Stringformat(
POST [s] to [s] status code [d] returned data
+ SystemlineSeparator() + s
jsonTraverserPayload traverserUri responsegetStatus()
responsegetEntity( Stringclass ) ) )
responseclose()
Once that request has completed we get back our dataset of singers and the bands they belong to
[
outgoing_relationships httplocalhost7474dbdatanode82relationshipsout
data
band The Clash
name Joe Strummer
traverse httplocalhost7474dbdatanode82traversereturnType
all_typed_relationships httplocalhost7474dbdatanode82relationshipsall-list|amp|types
property httplocalhost7474dbdatanode82propertieskey
all_relationships httplocalhost7474dbdatanode82relationshipsall
self httplocalhost7474dbdatanode82
properties httplocalhost7474dbdatanode82properties
outgoing_typed_relationships httplocalhost7474dbdatanode82relationshipsout-list|amp|types
incoming_relationships httplocalhost7474dbdatanode82relationshipsin
incoming_typed_relationships httplocalhost7474dbdatanode82relationshipsin-list|amp|types
create_relationship httplocalhost7474dbdatanode82relationships
outgoing_relationships httplocalhost7474dbdatanode83relationshipsout
Languages
101
data
traverse httplocalhost7474dbdatanode83traversereturnType
all_typed_relationships httplocalhost7474dbdatanode83relationshipsall-list|amp|types
property httplocalhost7474dbdatanode83propertieskey
all_relationships httplocalhost7474dbdatanode83relationshipsall
self httplocalhost7474dbdatanode83
properties httplocalhost7474dbdatanode83properties
outgoing_typed_relationships httplocalhost7474dbdatanode83relationshipsout-list|amp|types
incoming_relationships httplocalhost7474dbdatanode83relationshipsin
incoming_typed_relationships httplocalhost7474dbdatanode83relationshipsin-list|amp|types
create_relationship httplocalhost7474dbdatanode83relationships
]
Phew is that itThatrsquos a flavor of what we can do with the REST API Naturally any of the HTTP idioms we provide on theserver can be easily wrapped including removing nodes and relationships through DELETE Still if yoursquovegotten this far then switching post() for delete() in the Jersey client code should be straightforward
Whatrsquos nextThe HTTP API provides a good basis for implementers of client libraries itrsquos also great for HTTP andREST folks In the future though we expect that idiomatic language bindings will appear to takeadvantage of the REST API while providing comfortable language-level constructs for developers to usemuch as there are similar bindings for the embedded database
Appendix the code
bull CreateSimpleGraphjava3
bull Relationjava4
bull TraversalDefinitionjava5
3 httpsgithubcomneo4jneo4jblob230communityserver-examplessrcmainjavaorgneo4jexamplesserverCreateSimpleGraphjava4 httpsgithubcomneo4jneo4jblob230communityserver-examplessrcmainjavaorgneo4jexamplesserverRelationjava5 httpsgithubcomneo4jneo4jblob230communityserver-examplessrcmainjavaorgneo4jexamplesserverTraversalDefinitionjava
Part III Cypher Query LanguageThe Cypher part is the authoritative source for details on the Cypher Query Language For a shortintroduction see Section 81 ldquoWhat is Cypherrdquo [106] To take your first steps with Cypher seeChapter 3 Introduction to Cypher [16] For the terminology used see Terminology [636]
103
8 Introduction 10581 What is Cypher 10682 Updating the graph 10983 Transactions 11084 Uniqueness 11185 Parameters 11386 Compatibility 117
9 Syntax 11891 Values 11992 Expressions 12093 Identifiers 12394 Operators 12495 Comments 12696 Patterns 12797 Collections 13198 Working with NULL 134
10 General Clauses 136101 Return 137102 Order by 140103 Limit 142104 Skip 144105 With 146106 Unwind 148107 Union 150108 Using 152
11 Reading Clauses 155111 Match 156112 Optional Match 165113 Where 167114 Start 175115 Aggregation 177116 Load CSV 183
12 Writing Clauses 186121 Create 187122 Merge 192123 Set 200124 Delete 204125 Remove 205126 Foreach 207127 Create Unique 208128 Importing CSV files with Cypher 211129 Using Periodic Commit 213
13 Functions 214131 Predicates 215132 Scalar functions 218133 Collection functions 224134 Mathematical functions 229135 String functions 238
14 Schema 243141 Indexes 244142 Constraints 247143 Statistics 252
15 Query Tuning 253151 How are queries executed 254152 How do I profile a query 255153 Basic query tuning example 256
Cypher Query Language
104
16 Execution Plans 259161 Starting point operators 260162 Expand operators 263163 Combining operators 265164 Row operators 270165 Update Operators 275
105
Chapter 8 Introduction
To get an overview of Cypher continue reading Section 81 ldquoWhat is Cypherrdquo [106] The rest of thischapter deals with the context of Cypher statements like for example transaction management andhow to use parameters For the Cypher language reference itself see other chapters at Part III ldquoCypherQuery Languagerdquo [102] To take your first steps with Cypher see Chapter 3 Introduction to Cypher [16]For the terminology used see Terminology [636]
Introduction
106
81 What is CypherIntroductionCypher is a declarative graph query language that allows for expressive and efficient querying andupdating of the graph store Cypher is a relatively simple but still very powerful language Verycomplicated database queries can easily be expressed through Cypher This allows you to focus onyour domain instead of getting lost in database access
Cypher is designed to be a humane query language suitable for both developers and (importantly wethink) operations professionals Our guiding goal is to make the simple things easy and the complexthings possible Its constructs are based on English prose and neat iconography which helps to makequeries more self-explanatory We have tried to optimize the language for reading and not for writing
Being a declarative language Cypher focuses on the clarity of expressing what to retrieve from a graphnot on how to retrieve it This is in contrast to imperative languages like Java scripting languages likeGremlin1 and the JRuby Neo4j bindings2 This approach makes query optimization an implementationdetail instead of burdening the user with it and requiring her to update all traversals just because thephysical database structure has changed (new indexes etc)
Cypher is inspired by a number of different approaches and builds upon established practices forexpressive querying Most of the keywords like WHERE and ORDER BY are inspired by SQL3 Patternmatching borrows expression approaches from SPARQL4 Some of the collection semantics have beenborrowed from languages such as Haskell and Python
StructureCypher borrows its structure from SQL mdash queries are built up using various clauses
Clauses are chained together and the they feed intermediate result sets between each other Forexample the matching identifiers from one MATCH clause will be the context that the next clause existsin
The query language is comprised of several distinct clauses You can read more details about them laterin the manual
Here are a few clauses used to read from the graph
bull MATCH The graph pattern to match This is the most common way to get data from the graphbull WHERE Not a clause in itrsquos own right but rather part of MATCH OPTIONAL MATCH and WITH Adds constraints
to a pattern or filters the intermediate result passing through WITHbull RETURN What to return
Letrsquos see MATCH and RETURN in action
Imagine an example graph like the following one
1 httpgremlintinkerpopcom2 httpsgithubcomneo4jrbneo4j3 httpenwikipediaorgwikiSQL4 httpenwikipediaorgwikiSPARQL
Introduction
107
Figure 81 Example Graph
nam e = Sara
nam e = Maria
friend
nam e = Steve
nam e = John
friend
nam e = Joe
friend
friend
For example here is a query which finds a user called John and Johnrsquos friends (though not his directfriends) before returning both John and any friends-of-friends that are found
MATCH (john name John)-[friend]-gt()-[friend]-gt(fof)
RETURN johnname fofname
Resulting in
johnname fofname
John Maria
John Steve
2 rows
Next up we will add filtering to set more parts in motion
We take a list of user names and find all nodes with names from this list match their friends and returnonly those followed users who have a name property starting with S
MATCH (user)-[friend]-gt(follower)
WHERE username IN [Joe John Sara Maria Steve] AND followername =~ S
RETURN username followername
Resulting in
username followername
John Sara
Joe Steve
2 rows
And here are examples of clauses that are used to update the graph
bull CREATE (and DELETE) Create (and delete) nodes and relationshipsbull SET (and REMOVE) Set values to properties and add labels on nodes using SET and use REMOVE to remove
thembull MERGE Match existing or create new nodes and patterns This is especially useful together with
uniqueness constraints
For more Cypher examples see Chapter 5 Basic Data Modeling Examples [47] as well as the rest ofthe Cypher part with details on the language To use Cypher from Java see Section 3314 ldquoExecute
Introduction
108
Cypher Queries from Javardquo [605] To take your first steps with Cypher see Chapter 3 Introduction toCypher [16]
Introduction
109
82 Updating the graphCypher can be used for both querying and updating your graph
The Structure of Updating Queries
bull A Cypher query part canrsquot both match and update the graph at the same timebull Every part can either read and match on the graph or make updates on it
If you read from the graph and then update the graph your query implicitly has two parts mdash thereading is the first part and the writing is the second part
If your query only performs reads Cypher will be lazy and not actually match the pattern until you askfor the results In an updating query the semantics are that all the reading will be done before anywriting actually happens
The only pattern where the query parts are implicit is when you first read and then write mdash any otherorder and you have to be explicit about your query parts The parts are separated using the WITHstatement WITH is like an event horizon mdash itrsquos a barrier between a plan and the finished execution ofthat plan
When you want to filter using aggregated data you have to chain together two reading queryparts mdash the first one does the aggregating and the second filters on the results coming from the firstone
MATCH (n name John)-[FRIEND]-(friend)
WITH n count(friend) as friendsCount
WHERE friendsCount gt 3
RETURN n friendsCount
Using WITH you specify how you want the aggregation to happen and that the aggregation has to befinished before Cypher can start filtering
Herersquos an example of updating the graph writing the aggregated data to the graph
MATCH (n name John)-[FRIEND]-(friend)
WITH n count(friend) as friendsCount
SET nfriendCount = friendsCount
RETURN nfriendsCount
You can chain together as many query parts as the available memory permits
Returning dataAny query can return data If your query only reads it has to return data mdash it serves no purpose if itdoesnrsquot and it is not a valid Cypher query Queries that update the graph donrsquot have to return anythingbut they can
After all the parts of the query comes one final RETURN clause RETURN is not part of any query part mdash itis a period symbol at the end of a query The RETURN clause has three sub-clauses that come with itSKIPLIMIT and ORDER BY
If you return graph elements from a query that has just deleted them mdash beware you are holding apointer that is no longer valid Operations on that node are undefined
Introduction
110
83 TransactionsAny query that updates the graph will run in a transaction An updating query will always either fullysucceed or not succeed at all
Cypher will either create a new transaction or run inside an existing one
bull If no transaction exists in the running context Cypher will create one and commit it once the queryfinishes
bull In case there already exists a transaction in the running context the query will run inside it andnothing will be persisted to disk until that transaction is successfully committed
This can be used to have multiple queries be committed as a single transaction
1 Open a transaction2 run multiple updating Cypher queries3 and commit all of them in one go
Note that a query will hold the changes in memory until the whole query has finished executing A largequery will consequently need a JVM with lots of heap space
For using transactions over the REST API see Section 211 ldquoTransactional Cypher HTTPendpointrdquo [298]
When writing server extensions or using Neo4j embedded remember that all iterators returned froman execution result should be either fully exhausted or closed to ensure that the resources bound tothem will be properly released Resources include transactions started by the query so failing to do somay for example lead to deadlocks or other weird behavior
Introduction
111
84 UniquenessWhile pattern matching Neo4j makes sure to not include matches where the same graph relationshipis found multiple times in a single pattern In most use cases this is a sensible thing to do
Example looking for a userrsquos friends of friends should not return said user
Letrsquos create a few nodes and relationships
CREATE (adamUser name Adam )(pernillaUser name Pernilla )(davidUser name David
)
(adam)-[FRIEND]-gt(pernilla)(pernilla)-[FRIEND]-gt(david)
Which gives us the following graph
User
nam e = Adam
User
nam e = Pernilla
FRIEND
User
nam e = David
FRIEND
Now letrsquos look for friends of friends of Adam
MATCH (userUser name Adam )-[r1FRIEND]-()-[r2FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friendname AS fofName
fofName
David
1 row
In this query Cypher makes sure to not return matches where the pattern relationships r1 and r2 pointto the same graph relationship
This is however not always desired If the query should return the user it is possible to spread thematching over multiple MATCH clauses like so
MATCH (userUser name Adam )-[r1FRIEND]-(friend)
MATCH (friend)-[r2FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friendname AS fofName
fofName
David
Adam
2 rows
Note that while the following query looks similar to the previous one it is actually equivalent to the onebefore
Introduction
112
MATCH (userUser name Adam )-[r1FRIEND]-(friend)(friend)-[r2FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friendname AS fofName
Here the MATCH clause has a single pattern with two paths while the previous query has two distinctpatterns
fofName
David
1 row
Introduction
113
85 ParametersCypher supports querying with parameters This means developers donrsquot have to resort to stringbuilding to create a query In addition to that it also makes caching of execution plans much easier forCypher
Parameters can be used for literals and expressions in the WHERE clause for the index value in theSTART clause index queries and finally for noderelationship ids Parameters can not be used as forproperty names relationship types and labels since these patterns are part of the query structure thatis compiled into a query plan
Accepted names for parameters are letters and numbers and any combination of these
For details on using parameters via the Neo4j REST API see Section 211 ldquoTransactional CypherHTTP endpointrdquo [298] For details on parameters when using the Neo4j embedded Java API seeSection 3315 ldquoQuery Parametersrdquo [607]
Below follows a comprehensive set of examples of parameter usage The parameters are given as JSONhere Exactly how to submit them depends on the driver in use
String literalParameters
name Johan
Query
MATCH (n)
WHERE nname = name
RETURN n
You can use parameters in this syntax as well
Parameters
name Johan
Query
MATCH (n name name )
RETURN n
Regular expressionParameters
regex h
Query
MATCH (n)
WHERE nname =~ regex
RETURN nname
Case-sensitive string pattern matchingParameters
Introduction
114
name Michael
Query
MATCH (n)
WHERE nname STARTS WITH name
RETURN nname
Create node with propertiesParameters
props
position Developer
name Andres
Query
CREATE ( props )
Create multiple nodes with propertiesParameters
props [
position Developer
awesome true
name Andres
position Developer
name Michael
children 3
]
Query
CREATE (nPerson props )
RETURN n
Setting all properties on nodeNote that this will replace all the current properties
Parameters
props
position Developer
name Andres
Query
MATCH (n)
WHERE nname=Michaela
SET n = props
Introduction
115
SKIP and LIMITParameters
s 1
l 1
Query
MATCH (n)
RETURN nname
SKIP s
LIMIT l
Node idParameters
id 0
Query
MATCH n
WHERE id(n)= id
RETURN nname
Multiple node idsParameters
ids [ 0 1 2 ]
Query
MATCH n
WHERE id(n) IN ids
RETURN nname
Index value (legacy indexes)Parameters
value Michaela
Query
START n=nodepeople(name = value )
RETURN n
Index query (legacy indexes)Parameters
query nameAndreas
Introduction
116
Query
START n=nodepeople( query )
RETURN n
Introduction
117
86 CompatibilityCypher is still changing rather rapidly Parts of the changes are internal mdash we add new patternmatchers aggregators and optimizations or write new query planners which hopefully makes yourqueries run faster
Other changes are directly visible to our users mdash the syntax is still changing New concepts are beingadded and old ones changed to fit into new possibilities To guard you from having to keep up with oursyntax changes Neo4j allows you to use an older parser but still gain speed from new optimizations
There are two ways you can select which parser to use You can configure your database with theconfiguration parameter cypher_parser_version and enter which parser yoursquod like to use (see thesection called ldquoSupported Language Versionsrdquo [117])) Any Cypher query that doesnrsquot explicitly sayanything else will get the parser you have configured or the latest parser if none is configured
The other way is on a query by query basis By simply putting CYPHER 22 at the beginning thatparticular query will be parsed with the 22 version of the parser Below is an example using the STARTclause to access a legacy index
CYPHER 22
START n=nodenodes(name = A)
RETURN n
Accessing entities by id via STARTIn versions of Cypher prior to 22 it was also possible to access specific nodes or relationships using theSTART clause In this case you could use a syntax like the following
CYPHER 19
START n=node(42)
RETURN n
NoteThe use of the START clause to find nodes by ID was deprecated from Cypher 20 onwardsand is now entirely disabled in Cypher 22 and up You should instead make use of the MATCHclause for starting points See Section 111 ldquoMatchrdquo [156] for more information on thecorrect syntax for this The START clause should only be used when accessing legacy indexes(see Chapter 35 Legacy Indexing [617])
Supported Language VersionsNeo4j 23 supports the following versions of the Cypher language
bull Neo4j Cypher 23bull Neo4j Cypher 22bull Neo4j Cypher 19
TipEach release of Neo4j supports a limited number of old Cypher Language Versions Whenyou upgrade to a new release of Neo4j please make sure that it supports the Cypherlanguage version you need If not you may need to modify your queries to work with anewer Cypher language version
118
Chapter 9 Syntax
The nitty-gritty details of Cypher syntax
Syntax
119
91 ValuesAll values that are handled by Cypher have a distinct type The supported types of values are
bull Numeric valuesbull String valuesbull Boolean valuesbull Nodesbull Relationshipsbull Pathsbull Maps from Strings to other valuesbull Collections of any other type of value
Most types of values can be constructed in a query using literal expressions (see Section 92ldquoExpressionsrdquo [120]) Special care must be taken when using NULL as NULL is a value of every type (seeSection 98 ldquoWorking with NULLrdquo [134]) Nodes relationships and paths are returned as a result ofpattern matching
Note that labels are not values but are a form of pattern syntax
Syntax
120
92 ExpressionsExpressions in generalAn expression in Cypher can be
bull A decimal (integer or double) literal 13 -40000 314 6022E23bull A hexadecimal integer literal (starting with 0x) 0x13zf 0xFC3A9 -0x66effbull An octal integer literal (starting with 0) 01372 02127 -05671bull A string literal Hello Worldbull A boolean literal true false TRUE FALSEbull An identifier n x rel myFancyIdentifier `A name with weird stuff in it[]`bull A property nprop xprop relthisProperty myFancyIdentifier`(weird property name)`bull A dynamic property n[prop] rel[ncity + nzip] map[coll[0]]bull A parameter param 0bull A collection of expressions [a b] [123] [a 2 nproperty param] [ ]bull A function call length(p) nodes(p)bull An aggregate function avg(xprop) count()bull A path-pattern (a)--gt()lt--(b)bull An operator application 1 + 2 and 3 lt 4bull A predicate expression is an expression that returns true or false aprop = Hello length(p) gt 10
has(aname)bull A regular expression aname =~ Tobbull A case-sensitive string matching expression asurname STARTS WITH Sven asurname ENDS WITH son
or asurname CONTAINS sonbull A CASE expression
Note on string literalsString literals can contain these escape sequences
Escapesequence
Character
t Tab
b Backspace
n Newline
r Carriage return
f Form feed
Single quote
Double quote
Backslash
uxxxx Unicode UTF-16 code point (4hex digits must follow the u)
Uxxxxxxxx Unicode UTF-32 code point (8hex digits must follow the U)
Case ExpressionsCypher supports CASE expressions which is a generic conditional expression similar to ifelsestatements in other languages Two variants of CASE exist mdash the simple form and the generic form
Syntax
121
Simple CASEThe expression is calculated and compared in order with the WHEN clauses until a match is found If nomatch is found the expression in the ELSE clause is used or null if no ELSE case exists
Syntax
CASE test
WHEN value THEN result
[WHEN ]
[ELSE default]
END
Arguments
bull test A valid expressionbull value An expression whose result will be compared to the test expressionbull result This is the result expression used if the value expression matches the test expressionbull default The expression to use if no match is found
Query
MATCH (n)
RETURN
CASE neyes
WHEN blue
THEN 1
WHEN brown
THEN 2
ELSE 3 END AS result
Resultresult
2
1
2
1
3
5 rows
Generic CASEThe predicates are evaluated in order until a true value is found and the result value is used If nomatch is found the expression in the ELSE clause is used or null if no ELSE case exists
Syntax
CASE
WHEN predicate THEN result
[WHEN ]
[ELSE default]
END
Arguments
bull predicate A predicate that is tested to find a valid alternativebull result This is the result expression used if the predicate matchesbull default The expression to use if no match is found
Query
Syntax
122
MATCH (n)
RETURN
CASE
WHEN neyes = blue
THEN 1
WHEN nage lt 40
THEN 2
ELSE 3 END AS result
Resultresult
3
1
2
1
3
5 rows
Syntax
123
93 IdentifiersWhen you reference parts of a pattern or a query you do so by naming them The names you give thedifferent parts are called identifiers
In this example
MATCH (n)--gt(b) RETURN b
The identifiers are n and b
Identifier names are case sensitive and can contain underscores and alphanumeric characters (a-z0-9) but must always start with a letter If other characters are needed you can quote the identifierusing backquote (`) signs
The same rules apply to property names
Identifiers are only visible in the same query partIdentifiers are not carried over to subsequent queries If multiple query parts are chainedtogether using WITH identifiers have to be listed in the WITH clause to be carried over to thenext part For more information see Section 105 ldquoWithrdquo [146]
Syntax
124
94 OperatorsMathematical operatorsThe mathematical operators are + - and ^
Comparison operatorsThe comparison operators are = ltgt lt gt lt= gt= IS NULL and IS NOT NULL See the section called ldquoEqualityand Comparison of Valuesrdquo [124] on how they behave
The operators STARTS WITH ENDS WITH and CONTAINS can be used to search for a string value by itrsquoscontent
Boolean operatorsThe boolean operators are AND OR XOR NOT
String operatorsStrings can be concatenated using the + operator For regular expression matching the =~ operator isused
Collection operatorsCollections can be concatenated using the + operator To check if an element exists in a collection youcan use the IN operator
Property operators
NoteSince version 20 the previously existing property operators and have been removedThis syntax is no longer supported Missing properties are now returned as NULL Please use(NOT(has(ltidentgtprop)) OR ltidentgtprop=ltvaluegt) if you really need the old behavior of the operator mdash Also the use of for optional relationships has been removed in favor of thenewly introduced OPTIONAL MATCH clause
Equality and Comparison of Values
EqualityCypher supports comparing values (see Section 91 ldquoValuesrdquo [119]) by equality using the = and ltgtoperators
Values of the same type are only equal if they are the same identical value (eg 3 = 3 and x ltgt xy)
Maps are only equal if they map exactly the same keys to equal values and collections are only equal ifthey contain the same sequence of equal values (eg [3 4] = [1+2 82])
Values of different types are considered as equal according to the following rules
bull Paths are treated as collections of alternating nodes and relationships and are equal to all collectionsthat contain that very same sequence of nodes and relationships
bull Testing any value against NULL with both the = and the ltgt operators always is NULL This includes NULL =NULL and NULL ltgt NULL The only way to reliably test if a value v is NULL is by using the special v IS NULLor v IS NOT NULL equality operators
All other combinations of types of values cannot be compared with each other Especially nodesrelationships and literal maps are incomparable with each other
It is an error to compare values that cannot be compared
Syntax
125
Ordering and Comparison of ValuesThe comparison operators lt= lt (for ascending) and gt= gt (for descending) are used to compare valuesfor ordering The following points give some details on how the comparison is performed
bull Numerical values are compared for ordering using numerical order (eg 3 lt 4 is true)bull The special value javalangDoubleNaN is regarded as being larger than all other numbersbull String values are compared for ordering using lexicographic order (eg x lt xy)bull Boolean values are compared for ordering such that false lt truebull Comparing for ordering when one argument is NULL is NULL (eg NULL lt 3 is NULL)bull It is an error to compare other types of values with each other for ordering
Chaining Comparison OperationsComparisons can be chained arbitrarily eg x lt y lt= z is equivalent to x lt y AND y lt= z
Formally if a b c y z are expressions and op1 op2 opN are comparison operators then aop1 b op2 c y opN z is equivalent to a op1 b and b op2 c and y opN z
Note that a op1 b op2 c does not imply any kind of comparison between a and c so that eg x lt y gt zis perfectly legal (though perhaps not pretty)
The example
MATCH (n) WHERE 21 lt nage lt= 30 RETURN n
is equivalent to
MATCH (n) WHERE 21 lt nage AND nage lt= 30 RETURN n
Thus it will match all nodes where the age is between 21 and 30
This syntax extends to all equality and inequality comparisons as well as extending to chains longerthan three
For example
a lt b = c lt= d ltgt e
Is equivalent to
a lt b AND b = c AND c lt= d AND d ltgt e
For other comparison operators see the section called ldquoComparison operatorsrdquo [124]
Syntax
126
95 CommentsTo add comments to your queries use double slash Examples
MATCH (n) RETURN n This is an end of line comment
MATCH (n)
This is a whole line comment
RETURN n
MATCH (n) WHERE nproperty = This is NOT a comment RETURN n
Syntax
127
96 PatternsPatterns and pattern-matching are at the very heart of Cypher so being effective with Cypher requiresa good understanding of patterns
Using patterns you describe the shape of the data yoursquore looking for For example in the MATCH clauseyou describe the shape with a pattern and Cypher will figure out how to get that data for you
The pattern describes the data using a form that is very similar to how one typically draws the shape ofproperty graph data on a whiteboard usually as circles (representing nodes) and arrows between themto represent relationships
Patterns appear in multiple places in Cypher in MATCH CREATE and MERGE clauses and in patternexpressions Each of these is described in more details in
bull Section 111 ldquoMatchrdquo [156]bull Section 112 ldquoOptional Matchrdquo [165]bull Section 121 ldquoCreaterdquo [187]bull Section 122 ldquoMergerdquo [192]bull the section called ldquoUsing path patterns in WHERErdquo [171]
Patterns for nodesThe very simplest ldquoshaperdquo that can be described in a pattern is a node A node is described using a pairof parentheses and is typically given a name For example
(a)
This simple pattern describes a single node and names that node using the identifier a
Note that the parentheses may be omitted but only when there are no labels or properties specifiedfor the node pattern
Patterns for related nodesMore interesting is patterns that describe multiple nodes and relationships between them Cypherpatterns describe relationships by employing an arrow between two nodes For example
(a)--gt(b)
This pattern describes a very simple data shape two nodes and a single relationship from one to theother In this example the two nodes are both named as a and b respectively and the relationship isldquodirectedrdquo it goes from a to b
This way of describing nodes and relationships can be extended to cover an arbitrary number of nodesand the relationships between them for example
(a)--gt(b)lt--(c)
Such a series of connected nodes and relationships is called a path
Note that the naming of the nodes in these patterns is only necessary should one need to refer to thesame node again either later in the pattern or elsewhere in the Cypher query If this is not necessarythen the name may be omitted like so
(a)--gt()lt--(c)
LabelsIn addition to simply describing the shape of a node in the pattern one can also describe attributesThe most simple attribute that can be described in the pattern is a label that the node must have Forexample
Syntax
128
(aUser)--gt(b)
One can also describe a node that has multiple labels
(aUserAdmin)--gt(b)
Specifying propertiesNodes and relationships are the fundamental structures in a graph Neo4j uses properties on both ofthese to allow for far richer models
Properties can be expressed in patterns using a map-construct curly brackets surrounding a number ofkey-expression pairs separated by commas Eg a node with two properties on it would look like
(a name Andres sport Brazilian Ju-Jitsu )
A relationship with expectations on it would could look like
(a)-[blocked false]-gt(b)
When properties appear in patterns they add an additional constraint to the shape of the data Inthe case of a CREATE clause the properties will be set in the newly created nodes and relationshipsIn the case of a MERGE clause the properties will be used as additional constraints on the shape anyexisting data must have (the specified properties must exactly match any existing data in the graph)If no matching data is found then MERGE behaves like CREATE and the properties will be set in the newlycreated nodes and relationships
Note that patterns supplied to CREATE may use a single parameter to specify properties eg CREATE(node paramName) This is not possible with patterns used in other clauses as Cypher needs to knowthe property names at the time the query is compiled so that matching can be done effectively
Describing relationshipsThe simplest way to describe a relationship is by using the arrow between two nodes as in theprevious examples Using this technique you can describe that the relationship should exist and thedirectionality of it If you donrsquot care about the direction of the relationship the arrow head can beomitted like so
(a)--(b)
As with nodes relationships may also be given names In this case a pair of square brackets is used tobreak up the arrow and the identifier is placed between For example
(a)-[r]-gt(b)
Much like labels on nodes relationships can have types To describe a relationship with a specific typeyou can specify this like so
(a)-[rREL_TYPE]-gt(b)
Unlike labels relationships can only have one type But if wersquod like to describe some data such that therelationship could have any one of a set of types then they can all be listed in the pattern separatingthem with the pipe symbol | like this
(a)-[rTYPE1|TYPE2]-gt(b)
Note that this form of pattern can only be used to describe existing data (ie when using a patternwith MATCH or as an expression) It will not work with CREATE or MERGE since itrsquos not possible to create arelationship with multiple types
As with nodes the name of the relationship can always be omitted in this case like so
(a)-[REL_TYPE]-gt(b)
Syntax
129
Variable length
CautionVariable length pattern matching in versions 21x and earlier does not enforce relationshipuniqueness for patterns described inside of a single MATCH clause This means that a querysuch as the following MATCH (a)-[r]-gt(b) (a)-[rs]-gt(c) RETURN may include r as part ofthe rs set This behavior has changed in versions 220 and later in such a way that r will beexcluded from the result set as this better adheres to the rules of relationship uniquenessas documented here Section 84 ldquoUniquenessrdquo [111] If you have a query pattern that needsto retrace relationships rather than ignoring them as the relationship uniqueness rulesnormally dictate you can accomplish this using multiple match clauses as follows MATCH(a)-[r]-gt(b) MATCH (a)-[rs]-gt(c) RETURN This will work in all versions of Neo4j thatsupport the MATCH clause namely 200 and later
Rather than describing a long path using a sequence of many node and relationship descriptions in apattern many relationships (and the intermediate nodes) can be described by specifying a length in therelationship description of a pattern For example
(a)-[2]-gt(b)
This describes a graph of three nodes and two relationship all in one path (a path of length 2) This isequivalent to
(a)--gt()--gt(b)
A range of lengths can also be specified such relationship patterns are called ldquovariable lengthrelationshipsrdquo For example
(a)-[35]-gt(b)
This is a minimum length of 3 and a maximum of 5 It describes a graph of either 4 nodes and 3relationships 5 nodes and 4 relationships or 6 nodes and 5 relationships all connected together in asingle path
Either bound can be omitted For example to describe paths of length 3 or more use
(a)-[3]-gt(b)
And to describe paths of length 5 or less use
(a)-[5]-gt(b)
Both bounds can be omitted allowing paths of any length to be described
(a)-[]-gt(b)
As a simple example letrsquos take the query below
Query
MATCH (me)-[KNOWS12]-(remote_friend)
WHERE mename = Filipa
RETURN remote_friendname
Resultremote_friendname
Dilshad
Anders
2 rows
Syntax
130
This query finds data in the graph which a shape that fits the pattern specifically a node (with the nameproperty Filipa) and then the KNOWS related nodes one or two steps out This is a typical example offinding first and second degree friends
Note that variable length relationships can not be used with CREATE and MERGE
Assigning to path identifiersAs described above a series of connected nodes and relationships is called a path Cypher allowspaths to be named using an identifer like so
p = (a)-[35]-gt(b)
You can do this in MATCH CREATE and MERGE but not when using patterns as expressions
Syntax
131
97 CollectionsCypher has good support for collections
Collections in generalA literal collection is created by using brackets and separating the elements in the collection withcommasQuery
RETURN [0123456789] AS collection
Resultcollection
[0 1 2 3 4 5 6 7 8 9]
1 row
In our examples wersquoll use the range function It gives you a collection containing all numbers betweengiven start and end numbers Range is inclusive in both endsTo access individual elements in the collection we use the square brackets again This will extract fromthe start index and up to but not including the end indexQuery
RETURN range(010)[3]
Resultrange(010)[3]
3
1 row
You can also use negative numbers to start from the end of the collection insteadQuery
RETURN range(010)[-3]
Resultrange(010)[-3]
8
1 row
Finally you can use ranges inside the brackets to return ranges of the collectionQuery
RETURN range(010)[03]
Resultrange(010)[03]
[0 1 2]
1 row
Query
RETURN range(010)[0-5]
Resultrange(010)[0-5]
[0 1 2 3 4 5]
1 row
Syntax
132
Query
RETURN range(010)[-5]
Resultrange(010)[-5]
[6 7 8 9 10]
1 row
Query
RETURN range(010)[4]
Resultrange(010)[4]
[0 1 2 3]
1 row
NoteOut-of-bound slices are simply truncated but out-of-bound single elements return NULL
Query
RETURN range(010)[15]
Resultrange(010)[15]
ltnullgt
1 row
Query
RETURN range(010)[515]
Resultrange(010)[515]
[5 6 7 8 9 10]
1 row
You can get the size of a collection like this
Query
RETURN size(range(010)[03])
Resultsize(range(010)[03])
3
1 row
List comprehensionList comprehension is a syntactic construct available in Cypher for creating a collection based onexisting collections It follows the form of the mathematical set-builder notation (set comprehension)instead of the use of map and filter functions
Query
Syntax
133
RETURN [x IN range(010) WHERE x 2 = 0 | x^3] AS result
Resultresult
[0 0 8 0 64 0 216 0 512 0 1000 0]
1 row
Either the WHERE part or the expression can be omitted if you only want to filter or map respectively
Query
RETURN [x IN range(010) WHERE x 2 = 0] AS result
Resultresult
[0 2 4 6 8 10]
1 row
Query
RETURN [x IN range(010)| x^3] AS result
Resultresult
[0 0 1 0 8 0 27 0 64 0 125 0 216 0 343 0 512 0 729 0 1000 0]
1 row
Literal mapsFrom Cypher you can also construct maps Through REST you will get JSON objects in Java they will bejavautilMapltStringObjectgt
Query
RETURN key Value collectionKey [ inner Map1 inner Map2 ] AS result
Resultresult
key -gt Value collectionKey -gt [inner -gt Map1 inner -gt Map2]
1 row
Syntax
134
98 Working with NULLIntroduction to NULL in CypherIn Cypher NULL is used to represent missing or undefined values Conceptually NULL means ldquoa missingunknown valuerdquo and it is treated somewhat differently from other values For example getting aproperty from a node that does not have said property produces NULL Most expressions that take NULLas input will produce NULL This includes boolean expressions that are used as predicates in the WHEREclause In this case anything that is not TRUE is interpreted as being false
NULL is not equal to NULL Not knowing two values does not imply that they are the same value So theexpression NULL = NULL yields NULL and not TRUE
Logical operations with NULLThe logical operators (AND OR XOR IN NOT) treat NULL as the ldquounknownrdquo value of three-valued logic Here isthe truth table for AND OR and XOR
a b a AND b a OR b a XOR b
FALSE FALSE FALSE FALSE FALSE
FALSE NULL FALSE NULL NULL
FALSE TRUE FALSE TRUE TRUE
TRUE FALSE FALSE TRUE TRUE
TRUE NULL NULL TRUE NULL
TRUE TRUE TRUE TRUE FALSE
NULL FALSE FALSE NULL NULL
NULL NULL NULL NULL NULL
NULL TRUE NULL TRUE NULL
The IN operator and NULLThe IN operator follows similar logic If Cypher knows that something exists in a collection the resultwill be TRUE Any collection that contains a NULL and doesnrsquot have a matching element will return NULLOtherwise the result will be false Here is a table with examples
Expression Result
2 IN [1 2 3] TRUE
2 IN [1 NULL 3] NULL
2 IN [1 2 NULL] TRUE
2 IN [1] FALSE
2 IN [] FALSE
NULL IN [123] NULL
NULL IN [1NULL3] NULL
NULL IN [] FALSE
Using ALL ANY NONE and SINGLE follows a similar rule If the result can be calculated definitely TRUE orFALSE is returned Otherwise NULL is produced
Expressions that return NULL
bull Getting a missing element from a collection [][0] head([])
Syntax
135
bull Trying to access a property that does not exist on a node or relationship nmissingPropertybull Comparisons when either side is NULL 1 lt NULLbull Arithmetic expressions containing NULL 1 + NULLbull Function calls where any arguments are NULL sin(NULL)
136
Chapter 10 General Clauses
General Clauses
137
101 ReturnThe RETURN clause defines what to include in the query result set
In the RETURN part of your query you define which parts of the pattern you are interested in It can benodes relationships or properties on these
TipIf what you actually want is the value of a property make sure to not return the full noderelationship This will improve performance
Figure 101 Graph
nam e = Ahappy = Yes age = 55
nam e = B
BLOCKS KNOWS
Return nodesTo return a node list it in the RETURN statement
Query
MATCH (n name B )
RETURN n
The example will return the node
Resultn
Node[1]nameB
1 row
Return relationshipsTo return a relationship just include it in the RETURN list
Query
MATCH (n name A )-[rKNOWS]-gt(c)
RETURN r
The relationship is returned by the example
Resultr
KNOWS[0]
1 row
Return propertyTo return a property use the dot separator like this
General Clauses
138
Query
MATCH (n name A )
RETURN nname
The value of the property name gets returned
Resultnname
A
1 row
Return all elementsWhen you want to return all nodes relationships and paths found in a query you can use the symbolQuery
MATCH p=(a name A )-[r]-gt(b)
RETURN
This returns the two nodes the relationship and the path used in the query
Resulta b p r
Node[0]nameA
happyYes age55
Node[1]nameB [Node[0]nameA
happyYes
age55 BLOCKS[1]
Node[1]nameB]
BLOCKS[1]
Node[0]nameA
happyYes age55
Node[1]nameB [Node[0]nameA
happyYes
age55 KNOWS[0]
Node[1]nameB]
KNOWS[0]
2 rows
Identifier with uncommon charactersTo introduce a placeholder that is made up of characters that are outside of the english alphabet youcan use the ` to enclose the identifier like thisQuery
MATCH (`This isnt a common identifier`)
WHERE `This isnt a common identifier`name=A
RETURN `This isnt a common identifier`happy
The node with name A is returned
Result`This isnt a common identifier`happy
Yes
1 row
Column aliasIf the name of the column should be different from the expression used you can rename it by using ASltnew namegtQuery
MATCH (a name A )
RETURN aage AS SomethingTotallyDifferent
General Clauses
139
Returns the age property of a node but renames the column
ResultSomethingTotallyDifferent
55
1 row
Optional propertiesIf a property might or might not be there you can still select it as usual It will be treated as NULL if it ismissing
Query
MATCH (n)
RETURN nage
This example returns the age when the node has that property or null if the property is not there
Resultnage
55
ltnullgt
2 rows
Other expressionsAny expression can be used as a return item mdash literals predicates properties functions and everythingelse
Query
MATCH (a name A )
RETURN aage gt 30 Im a literal(a)--gt()
Returns a predicate a literal and function call with a pattern expression parameter
Resultaage gt 30 Im a literal (a)--gt()
true Im a literal [[Node[0]nameA happyYes
age55 BLOCKS[1] Node[1]
nameB] [Node[0]nameA
happyYes age55 KNOWS[0]
Node[1]nameB]]
1 row
Unique resultsDISTINCT retrieves only unique rows depending on the columns that have been selected to output
Query
MATCH (a name A )--gt(b)
RETURN DISTINCT b
The node named B is returned by the query but only once
Resultb
Node[1]nameB
1 row
General Clauses
140
102 Order byORDER BY is a sub-clause following RETURN or WITH and it specifies that the output should besorted and how
Note that you can not sort on nodes or relationships just on properties on these ORDER BY relies oncomparisons to sort the output see the section called ldquoOrdering and Comparison of Valuesrdquo [125]
In terms of scope of identifiers ORDER BY follows special rules depending on if the projecting RETURNor WITH clause is either aggregating or DISTINCT If it is an aggregating or DISTINCT projection only theidentifiers available in the projection are available If the projection does not alter the output cardinality(which aggregation and DISTINCT do) identifiers available from before the projecting clause are alsoavailable When the projection clause shadows already existing identifiers only the new identifiers areavailable
Lastly it is not allowed to use aggregating expressions in the ORDER BY sub-clause if they are not alsolisted in the projecting clause This last rule is to make sure that ORDER BY does not change the resultsonly the order of them
Figure 102 Graph
nam e = Aage = 34length = 170
nam e = Bage = 34
KNOWS
nam e = Cage = 32length = 185
KNOWS
Order nodes by propertyORDER BY is used to sort the output
Query
MATCH (n)
RETURN n
ORDER BY nname
The nodes are returned sorted by their name
Resultn
Node[0]nameA age34 length170
Node[1]nameB age34
Node[2]nameC age32 length185
3 rows
General Clauses
141
Order nodes by multiple propertiesYou can order by multiple properties by stating each identifier in the ORDER BY clause Cypher will sortthe result by the first identifier listed and for equals values go to the next property in the ORDER BYclause and so on
Query
MATCH (n)
RETURN n
ORDER BY nage nname
This returns the nodes sorted first by their age and then by their name
Resultn
Node[2]nameC age32 length185
Node[0]nameA age34 length170
Node[1]nameB age34
3 rows
Order nodes in descending orderBy adding DESC[ENDING] after the identifier to sort on the sort will be done in reverse order
Query
MATCH (n)
RETURN n
ORDER BY nname DESC
The example returns the nodes sorted by their name reversely
Resultn
Node[2]nameC age32 length185
Node[1]nameB age34
Node[0]nameA age34 length170
3 rows
Ordering NULLWhen sorting the result set NULL will always come at the end of the result set for ascending sorting andfirst when doing descending sort
Query
MATCH (n)
RETURN nlength n
ORDER BY nlength
The nodes are returned sorted by the length property with a node without that property last
Resultnlength n
170 Node[0]nameA age34 length170
185 Node[2]nameC age32 length185
ltnullgt Node[1]nameB age34
3 rows
General Clauses
142
103 LimitLIMIT constrains the number of rows in the output
LIMIT accepts any expression that evaluates to a positive integer mdash however the expression cannot referto nodes or relationships
Figure 103 Graph
nam e = D nam e = E
nam e = A
KNOWS KNOWS
nam e = C
KNOWS
nam e = B
KNOWS
Return first partTo return a subset of the result starting from the top use this syntax
Query
MATCH (n)
RETURN n
ORDER BY nname
LIMIT 3
The top three items are returned by the example query
Resultn
Node[2]nameA
Node[3]nameB
Node[4]nameC
3 rows
Return first from expressionLimit accepts any expression that evaluates to a positive integer as long as it is not referring to anyexternal identifiers
Parameters
p 12
Query
MATCH (n)
RETURN n
ORDER BY nname
LIMIT toInt(3 rand())+ 1
Returns one to three top items
General Clauses
143
Resultn
Node[2]nameA
Node[3]nameB
2 rows
General Clauses
144
104 SkipSKIP defines from which row to start including the rows in the output
By using SKIP the result set will get trimmed from the top Please note that no guarantees are made onthe order of the result unless the query specifies the ORDER BY clause SKIP accepts any expression thatevaluates to a positive integer mdash however the expression cannot refer to nodes or relationships
Figure 104 Graph
nam e = D nam e = E
nam e = A
KNOWS KNOWS
nam e = C
KNOWS
nam e = B
KNOWS
Skip first threeTo return a subset of the result starting from the fourth result use the following syntax
Query
MATCH (n)
RETURN n
ORDER BY nname
SKIP 3
The first three nodes are skipped and only the last two are returned in the result
Resultn
Node[0]nameD
Node[1]nameE
2 rows
Return middle twoTo return a subset of the result starting from somewhere in the middle use this syntax
Query
MATCH (n)
RETURN n
ORDER BY nname
SKIP 1
LIMIT 2
Two nodes from the middle are returned
Resultn
Node[3]nameB
Node[4]nameC
2 rows
General Clauses
145
Skip first from expressionSkip accepts any expression that evaluates to a positive integer as long as it is not referring to anyexternal identifiers
Query
MATCH (n)
RETURN n
ORDER BY nname
SKIP toInt(3rand())+ 1
The first three nodes are skipped and only the last two are returned in the result
Resultn
Node[3]nameB
Node[4]nameC
Node[0]nameD
Node[1]nameE
4 rows
General Clauses
146
105 WithThe WITH clause allows query parts to be chained together piping the results from one tobe used as starting points or criteria in the next
Using WITH you can manipulate the output before it is passed on to the following query parts Themanipulations can be of the shape andor number of entries in the result set
One common usage of WITH is to limit the number of entries that are then passed on to other MATCHclauses By combining ORDER BY and LIMIT itrsquos possible to get the top X entries by some criteria and thenbring in additional data from the graph
Another use is to filter on aggregated values WITH is used to introduce aggregates which can then byused in predicates in WHERE These aggregate expressions create new bindings in the results WITH canalso like RETURN alias expressions that are introduced into the results using the aliases as binding name
WITH is also used to separate reading from updating of the graph Every part of a query must be eitherread-only or write-only When going from a writing part to a reading part the switch must be done witha WITH clause
Figure 105 Graph
nam e = David
nam e = Anders
KNOWS
nam e = Ceasar
BLOCKS
nam e = Bossm an
KNOWS
nam e = Em il
KNOWS
BLOCKS
KNOWS
Filter on aggregate function resultsAggregated results have to pass through a WITH clause to be able to filter on
Query
MATCH (david name David )--(otherPerson)--gt()
WITH otherPerson count() AS foaf
WHERE foaf gt 1
RETURN otherPerson
The person connected to David with the at least more than one outgoing relationship will be returnedby the query
General Clauses
147
ResultotherPerson
Node[2]nameAnders
1 row
Sort results before using collect on themYou can sort your results before passing them to collect thus sorting the resulting collection
Query
MATCH (n)
WITH n
ORDER BY nname DESC LIMIT 3
RETURN collect(nname)
A list of the names of people in reverse order limited to 3 in a collection
Resultcollect(nname)
[Emil David Ceasar]
1 row
Limit branching of your path searchYou can match paths limit to a certain number and then match again using those paths as a base Aswell as any number of similar limited searches
Query
MATCH (n name Anders )--(m)
WITH m
ORDER BY mname DESC LIMIT 1
MATCH (m)--(o)
RETURN oname
Starting at Anders find all matching nodes order by name descending and get the top result then findall the nodes connected to that top result and return their names
Resultoname
Bossman
Anders
2 rows
General Clauses
148
106 UnwindUNWIND expands a collection into a sequence of rows
With UNWIND you can transform any collection back into individual rows These collections can beparameters that were passed in previously COLLECTed result or other collection expressions
One common usage of unwind is to create distinct collections Another is to create data fromparameter collections that are provided to the query
UNWIND requires you to specify a new name for the inner values
Unwind a collectionWe want to transform the literal collection into rows named x and return them
Query
UNWIND[123] AS x
RETURN x
Each value of the original collection is returned as an individual row
Resultx
1
2
3
3 rows
Create a distinct collectionWe want to transform a collection of duplicates into a set using DISTINCT
Query
WITH [1122] AS coll UNWIND coll AS x
WITH DISTINCT x
RETURN collect(x) AS SET
Each value of the original collection is unwound and passed through DISTINCT to create a unique set
Resultset
[1 2]
1 row
Create nodes from a collection parameterCreate a number of nodes and relationships from a parameter-list without using FOREACH
Parameters
events [
year 2014
id 1
year 2014
id 2
]
General Clauses
149
Query
UNWIND events AS event
MERGE (yYear yeareventyear )
MERGE (y)lt-[IN]-(eEvent ideventid )
RETURN eid AS x
ORDER BY x
Each value of the original collection is unwound and passed through MERGE to find or create the nodesand relationships
Resultx
1
2
2 rowsNodes created 3Relationships created 2Properties set 3Labels added 3
General Clauses
150
107 UnionThe UNION clause is used to combine the result of multiple queries
It combines the results of two or more queries into a single result set that includes all the rows thatbelong to all queries in the union
The number and the names of the columns must be identical in all queries combined by using UNION
To keep all the result rows use UNION ALL Using just UNION will combine and remove duplicates from theresult set
Figure 106 Graph
Actor
nam e = Anthony Hopkins
Movie
t it le = Hitchcock
ACTS_INActor
nam e = Helen Mirren
KNOWS
ACTS_IN
Actor
nam e = Hitchcock
Combine two queriesCombining the results from two queries is done using UNION ALL
Query
MATCH (nActor)
RETURN nname AS name
UNION ALL MATCH (nMovie)
RETURN ntitle AS name
The combined result is returned including duplicates
Resultname
Anthony Hopkins
Helen Mirren
Hitchcock
Hitchcock
4 rows
Combine two queries and remove duplicatesBy not including ALL in the UNION duplicates are removed from the combined result set
Query
MATCH (nActor)
General Clauses
151
RETURN nname AS name
UNION
MATCH (nMovie)
RETURN ntitle AS name
The combined result is returned without duplicates
Resultname
Anthony Hopkins
Helen Mirren
Hitchcock
3 rows
General Clauses
152
108 UsingUSING is used to influence the decisions of the planner when building an execution plan fora query NOTE Forcing planner behaviour is an advanced feature and should be used withcaution by experienced developers andor database administrators only as it may causequeries to perform poorly
When executing a query Neo4j needs to decide where in the query graph to start matching This isdone by looking at the MATCH clause and the WHERE conditions and using that information to find usefulindexes
This index might not be the best choice though mdash sometimes multiple indexes could be used andNeo4j has picked the wrong one (from a performance point of view)
You can force Neo4j to use a specific starting point through the USING clause This is called giving anindex hint
If your query matches large parts of an index it might be faster to scan the label and filter out nodesthat do not match To do this you can use USING SCAN It will force Cypher to not use an index that couldhave been used and instead do a label scan
NoteYou cannot use index hints if your query has a START clause
You can also force Neo4j to produce plans which perform joins between query sub-graphs
Query using an index hintTo query using an index hint use USING INDEX
Query
MATCH (nSwede)
USING INDEX nSwede(surname)
WHERE nsurname = Taylor
RETURN n
Query Plan
+-----------------+----------------+------+---------+-------------+-----------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+-------------+-----------------+
| +ProduceResults | 1 | 1 | 0 | n | n |
| | +----------------+------+---------+-------------+-----------------+
| +NodeIndexSeek | 1 | 1 | 2 | n | Swede(surname) |
+-----------------+----------------+------+---------+-------------+-----------------+
Total database accesses 2
Query using multiple index hintsTo query using multiple index hints use USING INDEX
Query
MATCH (mGerman)--gt(nSwede)
USING INDEX mGerman(surname)
USING INDEX nSwede(surname)
WHERE msurname = Plantikow AND nsurname = Taylor
RETURN m
General Clauses
153
Query Plan
+-------------------+------+---------+----------------+----------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+-------------------+------+---------+----------------+----------------+
| +ColumnFilter | 1 | 0 | m | keep columns m |
| | +------+---------+----------------+----------------+
| +TraversalMatcher | 1 | 11 | anon[17] m n | n anon[17] m |
+-------------------+------+---------+----------------+----------------+
Total database accesses 11
Hinting a label scanIf the best performance is to be had by scanning all nodes in a label and then filtering on that set useUSING SCAN
Query
MATCH (mGerman)
USING SCAN mGerman
WHERE msurname = Plantikow
RETURN m
Query Plan
+------------------+----------------+------+---------+-------------+------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+------------------+----------------+------+---------+-------------+------------------------------+
| +ProduceResults | 1 | 1 | 0 | m | m |
| | +----------------+------+---------+-------------+------------------------------+
| +Filter | 1 | 1 | 1 | m | msurname == AUTOSTRING0 |
| | +----------------+------+---------+-------------+------------------------------+
| +NodeByLabelScan | 1 | 1 | 2 | m | German |
+------------------+----------------+------+---------+-------------+------------------------------+
Total database accesses 3
Hinting a join on a single nodeTo force the query planner to produce plans with joins in them use USING JOIN
Query
MATCH (andres nameAndres )--gt(x)lt--(emil name Emil )
USING JOIN ON x
RETURN x
Query Plan
+-----------------+----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other
|
+-----------------+----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| +ProduceResults | 0 | 1 | 0 | x | x
|
| | +----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| +Filter | 0 | 1 | 6 | anon[31] anon[37] andres emil x | Ands(Fby(NOT(anon[31] ==
anon[37])Last(andresname == AUTOSTRING0))) |
| | +----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| +Expand(All) | 2 | 9 | 12 | anon[31] anon[37] andres emil x | (x)lt--(andres)
|
General Clauses
154
| | +----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| +NodeHashJoin | 1 | 3 | 0 | anon[37] emil x | x
|
| | +----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| | +AllNodesScan | 5 | 5 | 6 | x |
|
| | +----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| +Expand(All) | 1 | 3 | 4 | anon[37] emil x | (emil)--gt(x)
|
| | +----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| +Filter | 1 | 1 | 5 | emil | emilname == AUTOSTRING1
|
| | +----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
| +AllNodesScan | 5 | 5 | 6 | emil |
|
+-----------------+----------------+------+---------+-------------------------------------
+---------------------------------------------------------------------------+
Total database accesses 39
Hinting a join on multiple nodesTo force the query planner to produce plans with joins in them use USING JOIN
Query
MATCH (andy nameAndres )-[r1]-gt(x)lt-[r2]-(y)-[r3]-(andy)
USING JOIN ON x y
RETURN x y
Query Plan
+-----------------+----------------+------+---------+------------------------+----------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+------+---------+------------------------+----------------------------------------------+
| +ProduceResults | 1 | 3 | 0 | x y | x y |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| +Filter | 1 | 3 | 0 | andy r1 r2 r3 x y | Ands(Fby(NOT(r1 == r2)Last(NOT(r2 == r3)))) |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| +NodeHashJoin | 1 | 3 | 0 | andy r1 r2 r3 x y | x y |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| | +Expand(All) | 10 | 10 | 15 | r2 x y | (y)-[r2]-gt(x) |
| | | +----------------+------+---------+------------------------+----------------------------------------------+
| | +AllNodesScan | 5 | 5 | 6 | y | |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| +Filter | 3 | 3 | 0 | andy r1 r3 x y | NOT(r1 == r3) |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| +Expand(All) | 3 | 4 | 5 | andy r1 r3 x y | (andy)-[r3]-(y) |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| +Expand(All) | 1 | 1 | 2 | andy r1 x | (andy)-[r1]-gt(x) |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| +Filter | 1 | 1 | 5 | andy | andyname == AUTOSTRING0 |
| | +----------------+------+---------+------------------------+----------------------------------------------+
| +AllNodesScan | 5 | 5 | 6 | andy | |
+-----------------+----------------+------+---------+------------------------+----------------------------------------------+
Total database accesses 39
155
Chapter 11 Reading Clauses
The flow of data within a Cypher query is an unordered sequence of maps with key-value pairs mdash a setof possible bindings between the identifiers in the query and values derived from the database This setis refined and augmented by subsequent parts of the query
Reading Clauses
156
111 MatchThe MATCH clause is used to search for the pattern described in it
IntroductionThe MATCH clause allows you to specify the patterns Neo4j will search for in the database This isthe primary way of getting data into the current set of bindings It is worth reading up more on thespecification of the patterns themselves in Section 96 ldquoPatternsrdquo [127]
MATCH is often coupled to a WHERE part which adds restrictions or predicates to the MATCH patternsmaking them more specific The predicates are part of the pattern description not a filter applied afterthe matching is done This means that WHERE should always be put together with the MATCH clause it belongsto
MATCH can occur at the beginning of the query or later possibly after a WITH If it is the first clausenothing will have been bound yet and Neo4j will design a search to find the results matching the clauseand any associated predicates specified in any WHERE part This could involve a scan of the databasea search for nodes of a certain label or a search of an index to find starting points for the patternmatching Nodes and relationships found by this search are available as bound pattern elements andcan be used for pattern matching of sub-graphs They can also be used in any further MATCH clauseswhere Neo4j will use the known elements and from there find further unknown elements
Cypher is declarative and so usually the query itself does not specify the algorithm to use to performthe search Neo4j will automatically work out the best approach to finding start nodes and matchingpatterns Predicates in WHERE parts can be evaluated before pattern matching during pattern matchingor after finding matches However there are cases where you can influence the decisions taken bythe query compiler Read more about indexes in Section 141 ldquoIndexesrdquo [244] and more about thespecifying index hints to force Neo4j to use a specific index in Section 108 ldquoUsingrdquo [152]
TipTo understand more about the patterns used in the MATCH clause read Section 96ldquoPatternsrdquo [127]
The following graph is used for the examples below
Figure 111 Graph
Person
nam e = Oliver Stone
Movie
nam e = WallSt reet t it le = Wall St reet
DIRECTED
Person
nam e = Charlie Sheen
ACTED_INPerson
nam e = Mart in Sheen
FATHER
ACTED_IN
Movie
t it le = The Am erican President nam e = TheAm ericanPresident
ACTED_IN
Person
nam e = Rob Reiner
DIRECTED
Person
nam e = Michael Douglas
ACTED_IN ACTED_IN
nam e = Rob Reiner
nam e = Charlie Sheen
TYPE THAT HAS SPACE IN IT
Basic node finding
Get all nodesBy just specifying a pattern with a single node and no labels all nodes in the graph will be returned
Query
MATCH (n)
RETURN n
Reading Clauses
157
Returns all the nodes in the database
Resultn
Node[0]nameOliver Stone
Node[1]nameCharlie Sheen
Node[2]nameMartin Sheen
Node[3]titleThe American President nameTheAmericanPresident
Node[4]nameWallStreet titleWall Street
Node[5]nameRob Reiner
Node[6]nameMichael Douglas
Node[7]nameRob Reiner
Node[8]nameCharlie Sheen
9 rows
Get all nodes with a labelGetting all nodes with a label on them is done with a single node pattern where the node has a label onit
Query
MATCH (movieMovie)
RETURN movie
Returns all the movies in the database
Resultmovie
Node[3]titleThe American President nameTheAmericanPresident
Node[4]nameWallStreet titleWall Street
2 rows
Related nodesThe symbol -- means related to without regard to type or direction of the relationship
Query
MATCH (director nameOliver Stone )--(movie)
RETURN movietitle
Returns all the movies directed by Oliver Stone
Resultmovietitle
Wall Street
1 row
Match with labelsTo constrain your pattern with labels on nodes you add it to your pattern nodes using the label syntax
Query
MATCH (charliePerson nameCharlie Sheen )--(movieMovie)
RETURN movie
Reading Clauses
158
Return any nodes connected with the Person Charlie that are labeled Movie
Resultmovie
Node[4]nameWallStreet titleWall Street
1 row
Relationship basics
Outgoing relationshipsWhen the direction of a relationship is interesting it is shown by using --gt or lt-- like this
Query
MATCH (martin nameMartin Sheen )--gt(movie)
RETURN movietitle
Returns nodes connected to Martin by outgoing relationships
Resultmovietitle
The American President
Wall Street
2 rows
Directed relationships and identifierIf an identifier is needed either for filtering on properties of the relationship or to return therelationship this is how you introduce the identifier
Query
MATCH (martin nameMartin Sheen )-[r]-gt(movie)
RETURN r
Returns all outgoing relationships from Martin
Resultr
ACTED_IN[3]
ACTED_IN[1]
2 rows
Match by relationship typeWhen you know the relationship type you want to match on you can specify it by using a colontogether with the relationship type
Query
MATCH (wallstreet titleWall Street )lt-[ACTED_IN]-(actor)
RETURN actor
Returns nodes that ACTED_IN Wall Street
Resultactor
Node[6]nameMichael Douglas
3 rows
Reading Clauses
159
actor
Node[2]nameMartin Sheen
Node[1]nameCharlie Sheen
3 rows
Match by multiple relationship typesTo match on one of multiple types you can specify this by chaining them together with the pipe symbol|
Query
MATCH (wallstreet titleWall Street )lt-[ACTED_IN|DIRECTED]-(person)
RETURN person
Returns nodes with a ACTED_IN or DIRECTED relationship to Wall Street
Resultperson
Node[0]nameOliver Stone
Node[6]nameMichael Douglas
Node[2]nameMartin Sheen
Node[1]nameCharlie Sheen
4 rows
Match by relationship type and use an identifierIf you both want to introduce an identifier to hold the relationship and specify the relationship typeyou want just add them both like this
Query
MATCH (wallstreet titleWall Street )lt-[rACTED_IN]-(actor)
RETURN r
Returns nodes that ACTED_IN Wall Street
Resultr
ACTED_IN[2]
ACTED_IN[1]
ACTED_IN[0]
3 rows
Relationships in depth
NoteInside a single pattern relationships will only be matched once You can read more aboutthis in Section 84 ldquoUniquenessrdquo [111]
Relationship types with uncommon charactersSometime your database will have types with non-letter characters or with spaces in them Use `(backtick) to quote these
Query
MATCH (n nameRob Reiner )-[r`TYPE THAT HAS SPACE IN IT`]-gt()
Reading Clauses
160
RETURN r
Returns a relationship of a type with spaces in it
Resultr
TYPE THAT HAS SPACE IN IT[8]
1 row
Multiple relationshipsRelationships can be expressed by using multiple statements in the form of ()--() or they can bestrung together like this
Query
MATCH (charlie nameCharlie Sheen )-[ACTED_IN]-gt(movie)lt-[DIRECTED]-(director)
RETURN charliemoviedirector
Returns the three nodes in the path
Resultcharlie movie director
Node[1]nameCharlie Sheen Node[4]nameWallStreet
titleWall Street
Node[0]nameOliver Stone
1 row
Variable length relationshipsNodes that are a variable number of relationshiprarrnode hops away can be found using the followingsyntax -[TYPEminHopsmaxHops]-gt minHops and maxHops are optional and default to 1 and infinityrespectively When no bounds are given the dots may be omitted
Query
MATCH (martin nameMartin Sheen )-[ACTED_IN12]-(x)
RETURN x
Returns nodes that are 1 or 2 relationships away from Martin
Resultx
Node[4]nameWallStreet titleWall Street
Node[1]nameCharlie Sheen
Node[6]nameMichael Douglas
Node[3]titleThe American President nameTheAmericanPresident
Node[6]nameMichael Douglas
5 rows
Relationship identifier in variable length relationshipsWhen the connection between two nodes is of variable length a relationship identifier becomes ancollection of relationships
Query
MATCH (actor nameCharlie Sheen )-[rACTED_IN2]-(co_actor)
RETURN r
The query returns a collection of relationships
Reading Clauses
161
Resultr
[ACTED_IN[0] ACTED_IN[1]]
[ACTED_IN[0] ACTED_IN[2]]
2 rows
Match with properties on a variable length pathA variable length relationship with properties defined on in it means that all relationships in the pathmust have the property set to the given value In this query there are two paths between Charile Sheenand his dad Martin Sheen One of the includes a ldquoblockedrdquo relationship and the other doesnrsquot In thiscase we first alter the original graph by using the following query to add ldquoblockedrdquo and ldquounblockedrdquorelationships
MATCH (charliePerson nameCharlie Sheen )(martinPerson nameMartin Sheen )
CREATE (charlie)-[X blockedfalse ]-gt(Unblocked)lt-[X blockedfalse ]-(martin)
CREATE (charlie)-[X blockedtrue ]-gt(Blocked)lt-[X blockedfalse ]-(martin)
This means that we are starting out with the following graph
Person
nam e = Oliver Stone
Movie
nam e = WallSt reet t it le = Wall St reet
DIRECTED
Person
nam e = Charlie Sheen
ACTED_IN
Blocked
Xblocked = t rue
Unblocked
Xblocked = false
Blocked
Xblocked = t rue
Unblocked
Xblocked = false
Blocked
Xblocked = t rue
Unblocked
Xblocked = false
Person
nam e = Mart in Sheen
FATHER
ACTED_IN Xblocked = false
Xblocked = false
Xblocked = false
Xblocked = false
Xblocked = false
Xblocked = false
Movie
t it le = The Am erican President nam e = TheAm ericanPresident
ACTED_IN
Person
nam e = Rob Reiner
DIRECTED
Person
nam e = Michael Douglas
ACTED_IN ACTED_IN
nam e = Rob Reiner
nam e = Charlie Sheen
TYPE THAT HAS SPACE IN IT
Query
MATCH p =(charliePerson)-[ blockedfalse ]-(martinPerson)
WHERE charliename = Charlie Sheen AND martinname = Martin Sheen
RETURN p
Returns the paths between Charlie and Martin Sheen where all relationships have the blocked propertyset to FALSE
Resultp
[Node[1]nameCharlie Sheen X[9]blockedfalse Node[9] X[10]blockedfalse Node[2]
nameMartin Sheen]
1 row
Zero length pathsUsing variable length paths that have the lower bound zero means that two identifiers can point tothe same node If the distance between two nodes is zero they are by definition the same nodeNote that when matching zero length paths the result may contain a match even when matching on arelationship type not in use
Query
MATCH (wallstreetMovie titleWall Street )-[01]-(x)
RETURN x
Returns all nodes that are zero or one relationships away from Wall Street
Resultx
Node[4]nameWallStreet titleWall Street
5 rows
Reading Clauses
162
x
Node[1]nameCharlie Sheen
Node[2]nameMartin Sheen
Node[6]nameMichael Douglas
Node[0]nameOliver Stone
5 rows
Named pathIf you want to return or filter on a path in your pattern graph you can a introduce a named path
Query
MATCH p =(michael nameMichael Douglas )--gt()
RETURN p
Returns the two paths starting from Michael
Resultp
[Node[6]nameMichael Douglas ACTED_IN[4] Node[3]titleThe American President
nameTheAmericanPresident]
[Node[6]nameMichael Douglas ACTED_IN[2] Node[4]nameWallStreet titleWall Street]
2 rows
Matching on a bound relationshipWhen your pattern contains a bound relationship and that relationship pattern doesnrsquot specifydirection Cypher will try to match the relationship in both directions
Query
MATCH (a)-[r]-(b)
WHERE id(r)= 0
RETURN ab
This returns the two connected nodes once as the start node and once as the end node
Resulta b
Node[1]nameCharlie Sheen Node[4]nameWallStreet titleWall Street
Node[4]nameWallStreet titleWall Street Node[1]nameCharlie Sheen
2 rows
Shortest path
Single shortest pathFinding a single shortest path between two nodes is as easy as using the shortestPath function Itrsquos donelike this
Query
MATCH (martinPerson nameMartin Sheen )(oliverPerson nameOliver Stone )
p = shortestPath((martin)-[15]-(oliver))
RETURN p
This means find a single shortest path between two nodes as long as the path is max 15 relationshipslong Inside of the parentheses you define a single link of a path mdash the starting node the connecting
Reading Clauses
163
relationship and the end node Characteristics describing the relationship like relationship type maxhops and direction are all used when finding the shortest path You can also mark the path as optional
Resultp
[Node[2]nameMartin Sheen ACTED_IN[1] Node[4]nameWallStreet titleWall
Street DIRECTED[5] Node[0]nameOliver Stone]
1 row
All shortest pathsFinds all the shortest paths between two nodes
Query
MATCH (martinPerson nameMartin Sheen )(michaelPerson nameMichael Douglas )
p = allShortestPaths((martin)-[]-(michael))
RETURN p
Finds the two shortest paths between Martin and Michael
Resultp
[Node[2]nameMartin Sheen ACTED_IN[3] Node[3]titleThe American President
nameTheAmericanPresident ACTED_IN[4] Node[6]nameMichael Douglas]
[Node[2]nameMartin Sheen ACTED_IN[1] Node[4]nameWallStreet titleWall
Street ACTED_IN[2] Node[6]nameMichael Douglas]
2 rows
Get node or relationship by id
Node by idSearch for nodes by id can be done with the id function in a predicate
NoteNeo4j reuses its internal ids when nodes and relationships are deleted This means thatapplications using and relying on internal Neo4j ids are brittle or at risk of making mistakesRather use application generated ids
Query
MATCH (n)
WHERE id(n)= 1
RETURN n
The corresponding node is returned
Resultn
Node[1]nameCharlie Sheen
1 row
Relationship by idSearch for nodes by id can be done with the id function in a predicate
This is not recommended practice See the section called ldquoNode by idrdquo [163] for more information onthe use of Neo4j ids
Query
Reading Clauses
164
MATCH ()-[r]-gt()
WHERE id(r)= 0
RETURN r
The relationship with id 0 is returned
Resultr
ACTED_IN[0]
1 row
Multiple nodes by idMultiple nodes are selected by specifying them in an IN clause
Query
MATCH (n)
WHERE id(n) IN [1 2 0]
RETURN n
This returns the nodes listed in the IN expression
Resultn
Node[0]nameOliver Stone
Node[1]nameCharlie Sheen
Node[2]nameMartin Sheen
3 rows
Reading Clauses
165
112 Optional MatchThe OPTIONAL MATCH clause is used to search for the pattern described in it while using NULLsfor missing parts of the pattern
IntroductionOPTIONAL MATCH matches patterns against your graph database just like MATCH does The difference is thatif no matches are found OPTIONAL MATCH will use NULLs for missing parts of the pattern OPTIONAL MATCHcould be considered the Cypher equivalent of the outer join in SQL
Either the whole pattern is matched or nothing is matched Remember that WHERE is part of the patterndescription and the predicates will be considered while looking for matches not after This mattersespecially in the case of multiple (OPTIONAL) MATCH clauses where it is crucial to put WHERE together withthe MATCH it belongs to
TipTo understand the patterns used in the OPTIONAL MATCH clause read Section 96ldquoPatternsrdquo [127]
The following graph is used for the examples below
Figure 112 Graph
Person
nam e = Oliver Stone
Movie
nam e = WallSt reet t it le = Wall St reet
DIRECTED
Person
nam e = Charlie Sheen
ACTED_INPerson
nam e = Mart in Sheen
FATHER
ACTED_IN
Movie
t it le = The Am erican President nam e = TheAm ericanPresident
ACTED_IN
Person
nam e = Rob Reiner
DIRECTED
Person
nam e = Michael Douglas
ACTED_IN ACTED_IN
RelationshipIf a relationship is optional use the OPTIONAL MATCH clause This is similar to how a SQL outer join worksIf the relationship is there it is returned If itrsquos not NULL is returned in itrsquos placeQuery
MATCH (aMovie title Wall Street )
OPTIONAL MATCH (a)--gt(x)
RETURN x
Returns NULL since the node has no outgoing relationships
Resultx
ltnullgt
1 row
Properties on optional elementsReturning a property from an optional element that is NULL will also return NULL
Reading Clauses
166
Query
MATCH (aMovie title Wall Street )
OPTIONAL MATCH (a)--gt(x)
RETURN x xname
Returns the element x (NULL in this query) and NULL as its name
Resultx xname
ltnullgt ltnullgt
1 row
Optional typed and named relationshipJust as with a normal relationship you can decide which identifier it goes into and what relationshiptype you need
Query
MATCH (aMovie title Wall Street )
OPTIONAL MATCH (a)-[rACTS_IN]-gt()
RETURN r
This returns a node and NULL since the node has no outgoing ACTS_IN relationships
Resultr
ltnullgt
1 row
Reading Clauses
167
113 WhereWHERE adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause or filters the resultsof a WITH clause
WHERE is not a clause in itrsquos own right mdash rather itrsquos part of MATCH OPTIONAL MATCH START and WITHIn the case of WITH and START WHERE simply filters the resultsFor MATCH and OPTIONAL MATCH on the other hand WHERE adds constraints to the patterns described Itshould not be seen as a filter after the matching is finished
ImportantIn the case of multiple MATCH OPTIONAL MATCH clauses the predicate in WHERE is alwaysa part of the patterns in the directly preceding MATCH OPTIONAL MATCH Both results andperformance may be impacted if the WHERE is put inside the wrong MATCH clause
Figure 113 Graph
address = SwedenMalm onam e = Tobiasage = 25
em ail = peter_nexam plecom nam e = Peterage = 34
Swedish
nam e = Andresage = 36belt = white
KNOWS KNOWS
Basic usage
Boolean operationsYou can use the expected boolean operators AND and OR and also the boolean function NOT SeeSection 98 ldquoWorking with NULLrdquo [134] for more information on how this works with NULLQuery
MATCH (n)
WHERE nname = Peter XOR (nage lt 30 AND nname = Tobias) OR NOT (nname = Tobias OR
nname=Peter)
RETURN n
Resultn
Node[0]addressSwedenMalmo nameTobias age25
Node[1]emailpeter_nexample com namePeter age34
Node[2]nameAndres age36 beltwhite
3 rows
Filter on node labelTo filter nodes by label write a label predicate after the WHERE keyword using WHERE nfooQuery
MATCH (n)
WHERE nSwedish
RETURN n
Reading Clauses
168
The Andres node will be returned
Resultn
Node[2]nameAndres age36 beltwhite
1 row
Filter on node propertyTo filter on a property write your clause after the WHERE keyword Filtering on relationship propertiesworks just the same way
Query
MATCH (n)
WHERE nage lt 30
RETURN n
Tobias is returned because he is younger than 30
Resultn
Node[0]addressSwedenMalmo nameTobias age25
1 row
Filter on dynamic node propertyTo filter on a property using a dynamically computed name use square bracket syntax
Parameters
prop AGE
Query
MATCH (n)
WHERE n[toLower( prop )]lt 30
RETURN n
Tobias is returned because he is younger than 30
Resultn
Node[0]addressSwedenMalmo nameTobias age25
1 row
Property existsUse the EXISTS() function to only include nodes or relationships in which a property exists
Query
MATCH (n)
WHERE exists(nbelt)
RETURN n
Andres will be returned because he is the only one with a belt property
ImportantThe HAS() function has been superseded by EXISTS() and will be removed in a future release
Reading Clauses
169
Resultn
Node[2]nameAndres age36 beltwhite
1 row
String matchingThe start and end of strings can be matched using STARTS WITH and ENDS WITH To match regardless oflocation in a string use CONTAINS The matching is case-sensitive
Match the start of a stringThe STARTS WITH operator is used to perform case-sensitive matching on the start of stringsQuery
MATCH (n)
WHERE nname STARTS WITH Pet
RETURN n
Peter will be returned because his name starts with Pet
Resultn
Node[1]emailpeter_nexample com namePeter age34
1 row
Match the end of a stringThe ENDS WITH operator is used to perform case-sensitive matching on the end of stringsQuery
MATCH (n)
WHERE nname ENDS WITH ter
RETURN n
Peter will be returned because his name ends with ter
Resultn
Node[1]emailpeter_nexample com namePeter age34
1 row
Match anywhere in a stringThe CONTAINS operator is used to perform case-sensitive matching regardless of location in stringsQuery
MATCH (n)
WHERE nname CONTAINS ete
RETURN n
Peter will be returned because his name contains ete
Resultn
Node[1]emailpeter_nexample com namePeter age34
1 row
String matching negationUse the NOT keyword to exclude all matches on given string from your result
Reading Clauses
170
Query
MATCH (n)
WHERE NOT nname ENDS WITH s
RETURN n
Peter will be returned because his name does not end with s
Resultn
Node[1]emailpeter_nexample com namePeter age34
1 row
Regular expressionsCypher supports filtering using regular expressions The regular expression syntax is inherited fromthe Java regular expressions1 This includes support for flags that change how strings are matchedincluding case-insensitive (i) multiline (m) and dotall (s) Flags are given at the start of the regularexpression for example MATCH (n) WHERE nname =~ (i)Lon RETURN n will return nodes with nameLondon or with name LonDoN
Regular expressionsYou can match on regular expressions by using =~ regexp like this
Query
MATCH (n)
WHERE nname =~ Tob
RETURN n
Tobias is returned because his name starts with Tob
Resultn
Node[0]addressSwedenMalmo nameTobias age25
1 row
Escaping in regular expressionsIf you need a forward slash inside of your regular expression escape it Remember that back slashneeds to be escaped in string literals
Query
MATCH (n)
WHERE naddress =~ SwedenMalmo
RETURN n
Tobias is returned because his address is in SwedenMalmo
Resultn
Node[0]addressSwedenMalmo nameTobias age25
1 row
Case insensitive regular expressionsBy pre-pending a regular expression with (i) the whole expression becomes case insensitive
Query
1 httpsdocsoraclecomjavase7docsapijavautilregexPatternhtml
Reading Clauses
171
MATCH (n)
WHERE nname =~ (i)ANDR
RETURN n
Andres is returned because his name starts with ANDR regardless of case
Resultn
Node[2]nameAndres age36 beltwhite
1 row
Using path patterns in WHERE
Filter on patternsPatterns are expressions in Cypher expressions that return a collection of paths Collection expressionsare also predicates mdash an empty collection represents false and a non-empty represents true
So patterns are not only expressions they are also predicates The only limitation to your pattern isthat you must be able to express it in a single path You can not use commas between multiple pathslike you do in MATCH You can achieve the same effect by combining multiple patterns with AND
Note that you can not introduce new identifiers here Although it might look very similar to the MATCHpatterns the WHERE clause is all about eliminating matched subgraphs MATCH (a)-[]-gt(b) is verydifferent from WHERE (a)-[]-gt(b) the first will produce a subgraph for every path it can find betweena and b and the latter will eliminate any matched subgraphs where a and b do not have a directedrelationship chain between them
Query
MATCH (tobias name Tobias )(others)
WHERE othersname IN [Andres Peter] AND (tobias)lt--(others)
RETURN others
Nodes that have an outgoing relationship to the Tobias node are returned
Resultothers
Node[2]nameAndres age36 beltwhite
1 row
Filter on patterns using NOTThe NOT function can be used to exclude a pattern
Query
MATCH (persons)(peter name Peter )
WHERE NOT (persons)--gt(peter)
RETURN persons
Nodes that do not have an outgoing relationship to the Peter node are returned
Resultpersons
Node[0]addressSwedenMalmo nameTobias age25
Node[1]emailpeter_nexample com namePeter age34
2 rows
Filter on patterns with propertiesYou can also add properties to your patterns
Reading Clauses
172
Query
MATCH (n)
WHERE (n)-[KNOWS]-( nameTobias )
RETURN n
Finds all nodes that have a KNOWS relationship to a node with the name Tobias
Resultn
Node[2]nameAndres age36 beltwhite
1 row
Filtering on relationship typeYou can put the exact relationship type in the MATCH pattern but sometimes you want to be able to domore advanced filtering on the type You can use the special property TYPE to compare the type withsomething else In this example the query does a regular expression comparison with the name of therelationship type
Query
MATCH (n)-[r]-gt()
WHERE nname=Andres AND type(r)=~ K
RETURN r
This returns relationships that has a type whose name starts with K
Resultr
KNOWS[1]
KNOWS[0]
2 rows
Collections
IN operatorTo check if an element exists in a collection you can use the IN operator
Query
MATCH (a)
WHERE aname IN [Peter Tobias]
RETURN a
This query shows how to check if a property exists in a literal collection
Resulta
Node[0]addressSwedenMalmo nameTobias age25
Node[1]emailpeter_nexample com namePeter age34
2 rows
Missing properties and values
Default to false if property is missingAs missing properties evaluate to NULL the comparision in the example will evaluate to FALSE for nodeswithout the belt property
Reading Clauses
173
Query
MATCH (n)
WHERE nbelt = white
RETURN n
Only nodes with white belts are returned
Resultn
Node[2]nameAndres age36 beltwhite
1 row
Default to true if property is missingIf you want to compare a property on a graph element but only if it exists you can compare theproperty against both the value you are looking for and NULL like
Query
MATCH (n)
WHERE nbelt = white OR nbelt IS NULL RETURN n
ORDER BY nname
This returns all nodes even those without the belt property
Resultn
Node[2]nameAndres age36 beltwhite
Node[1]emailpeter_nexample com namePeter age34
Node[0]addressSwedenMalmo nameTobias age25
3 rows
Filter on NULLSometimes you might want to test if a value or an identifier is NULL This is done just like SQL does itwith IS NULL Also like SQL the negative is IS NOT NULL although NOT(IS NULL x) also works
Query
MATCH (person)
WHERE personname = Peter AND personbelt IS NULL RETURN person
Nodes that have name Peter but no belt property are returned
Resultperson
Node[1]emailpeter_nexample com namePeter age34
1 row
Using ranges
Simple rangeTo check for an element being inside a specific range use the inequality operators lt lt= gt= gt
Query
MATCH (a)
WHERE aname gt= Peter
RETURN a
Reading Clauses
174
Nodes having a name property lexicographically greater than or equal to Peter are returned
Resulta
Node[0]addressSwedenMalmo nameTobias age25
Node[1]emailpeter_nexample com namePeter age34
2 rows
Composite rangeSeveral inequalities can be used to construct a range
Query
MATCH (a)
WHERE aname gt Andres AND aname lt Tobias
RETURN a
Nodes having a name property lexicographically between Andres and Tobias are returned
Resulta
Node[1]emailpeter_nexample com namePeter age34
1 row
Reading Clauses
175
114 StartFind starting points through legacy indexes
ImportantThe START clause should only be used when accessing legacy indexes (see Chapter 35 LegacyIndexing [617]) In all other cases use MATCH instead (see Section 111 ldquoMatchrdquo [156])
In Cypher every query describes a pattern and in that pattern one can have multiple starting pointsA starting point is a relationship or a node where a pattern is anchored Using START you can onlyintroduce starting points by legacy index seeks Note that trying to use a legacy index that doesnrsquot existwill generate an error
This is the graph the examples are using
Figure 114 Graph
Node[0]
nam e = A
Node[2]
nam e = C
KNOWS
Node[1]
nam e = B
KNOWS
Get node or relationship from index
Node by index seekWhen the starting point can be found by using index seeks it can be done like this nodeindex-name(key= value) In this example there exists a node index named nodes
Query
START n=nodenodes(name = A)
RETURN n
The query returns the node indexed with the name A
Resultn
Node[0]nameA
1 row
Relationship by index seekWhen the starting point can be found by using index seeks it can be done like this relationshipindex-name(key = value)
Query
START r=relationshiprels(name = Andreacutes)
RETURN r
The relationship indexed with the name property set to Andreacutes is returned by the query
Reading Clauses
176
Resultr
KNOWS[0]nameAndreacutes
1 row
Node by index queryWhen the starting point can be found by more complex Lucene queries this is the syntax to usenodeindex-name(query)This allows you to write more advanced index queries
Query
START n=nodenodes(nameA)
RETURN n
The node indexed with name A is returned by the query
Resultn
Node[0]nameA
1 row
Reading Clauses
177
115 AggregationIntroductionTo calculate aggregated data Cypher offers aggregation much like SQLrsquos GROUP BY
Aggregate functions take multiple input values and calculate an aggregated value from them Examplesare avg that calculates the average of multiple numeric values or min that finds the smallest numericvalue in a set of values
Aggregation can be done over all the matching subgraphs or it can be further divided by introducingkey values These are non-aggregate expressions that are used to group the values going into theaggregate functions
So if the return statement looks something like this
RETURN n count()
We have two return expressions n and count() The first n is no aggregate function and so it will bethe grouping key The latter count() is an aggregate expression So the matching subgraphs will bedivided into different buckets depending on the grouping key The aggregate function will then run onthese buckets calculating the aggregate values
If you want to use aggregations to sort your result set the aggregation must be included in the RETURNto be used in your ORDER BY
The last piece of the puzzle is the DISTINCT keyword It is used to make all values unique before runningthem through an aggregate function
An example might be helpful In this case we are running the query against the following data
Person
nam e = Deyes = brown
Person
nam e = Aproperty = 13
KNOWS
Person
nam e = Cproperty = 44eyes = blue
KNOWS
Person
nam e = Bproperty = 33eyes = blue
KNOWS
Person
nam e = D
KNOWS KNOWS
Query
MATCH (mePerson)--gt(friendPerson)--gt(friend_of_friendPerson)
WHERE mename = A
RETURN count(DISTINCT friend_of_friend) count(friend_of_friend)
In this example we are trying to find all our friends of friends and count them The first aggregatefunction count(DISTINCT friend_of_friend) will only see a friend_of_friend once mdash DISTINCT removesthe duplicates The latter aggregate function count(friend_of_friend) might very well see the samefriend_of_friend multiple times In this case both B and C know D and thus D will get counted twicewhen not using DISTINCT
Reading Clauses
178
Resultcount(distinct friend_of_friend) count(friend_of_friend)
1 2
1 row
The following examples are assuming the example graph structure below
Figure 115 Graph
Person
nam e = Deyes = brown
Person
nam e = Aproperty = 13
KNOWS
Person
nam e = Cproperty = 44eyes = blue
KNOWS
Person
nam e = Bproperty = 33eyes = blue
KNOWS
COUNTCOUNT is used to count the number of rows
COUNT can be used in two forms mdash COUNT() which just counts the number of matching rows andCOUNT(ltidentifiergt) which counts the number of non-NULL values in ltidentifiergt
Count nodesTo count the number of nodes for example the number of nodes connected to one node you can usecount()
Query
MATCH (n name A )--gt(x)
RETURN n count()
This returns the start node and the count of related nodes
Resultn count()
Node[1]nameA property13 3
1 row
Group Count Relationship TypesTo count the groups of relationship types return the types and count them with count()
Query
MATCH (n name A )-[r]-gt()
RETURN type(r) count()
The relationship types and their group count is returned by the query
Resulttype(r) count()
KNOWS 3
1 row
Reading Clauses
179
Count entitiesInstead of counting the number of results with count() it might be more expressive to include thename of the identifier you care about
Query
MATCH (n name A )--gt(x)
RETURN count(x)
The example query returns the number of connected nodes from the start node
Resultcount(x)
3
1 row
Count non-null valuesYou can count the non-NULL values by using count(ltidentifiergt)
Query
MATCH (nPerson)
RETURN count(nproperty)
The count of related nodes with the property property set is returned by the query
Resultcount(nproperty)
3
1 row
Statistics
sumThe sum aggregation function simply sums all the numeric values it encounters NULLs are silentlydropped
Query
MATCH (nPerson)
RETURN sum(nproperty)
This returns the sum of all the values in the property property
Resultsum(nproperty)
90
1 row
avgavg calculates the average of a numeric column
Query
MATCH (nPerson)
RETURN avg(nproperty)
The average of all the values in the property property is returned by the example query
Reading Clauses
180
Resultavg(nproperty)
30 0
1 row
percentileDiscpercentileDisc calculates the percentile of a given value over a group with a percentile from 00 to 10It uses a rounding method returning the nearest value to the percentile For interpolated values seepercentileContQuery
MATCH (nPerson)
RETURN percentileDisc(nproperty 05)
The 50th percentile of the values in the property property is returned by the example query In this case05 is the median or 50th percentile
ResultpercentileDisc(nproperty 05)
33
1 row
percentileContpercentileCont calculates the percentile of a given value over a group with a percentile from 00 to 10It uses a linear interpolation method calculating a weighted average between two values if the desiredpercentile lies between them For nearest values using a rounding method see percentileDiscQuery
MATCH (nPerson)
RETURN percentileCont(nproperty 04)
The 40th percentile of the values in the property property is returned by the example query calculatedwith a weighted average
ResultpercentileCont(nproperty 04)
29 0
1 row
stdevstdev calculates the standard deviation for a given value over a group It uses a standard two-passmethod with N - 1 as the denominator and should be used when taking a sample of the populationfor an unbiased estimate When the standard variation of the entire population is being calculatedstdevp should be usedQuery
MATCH (n)
WHERE nname IN [A B C]
RETURN stdev(nproperty)
The standard deviation of the values in the property property is returned by the example query
Resultstdev(nproperty)
15 716233645501712
1 row
Reading Clauses
181
stdevpstdevp calculates the standard deviation for a given value over a group It uses a standard two-passmethod with N as the denominator and should be used when calculating the standard deviation for anentire population When the standard variation of only a sample of the population is being calculatedstdev should be used
Query
MATCH (n)
WHERE nname IN [A B C]
RETURN stdevp(nproperty)
The population standard deviation of the values in the property property is returned by the examplequery
Resultstdevp(nproperty)
12 832251036613439
1 row
maxmax find the largest value in a numeric column
Query
MATCH (nPerson)
RETURN max(nproperty)
The largest of all the values in the property property is returned
Resultmax(nproperty)
44
1 row
minmin takes a numeric property as input and returns the smallest value in that column
Query
MATCH (nPerson)
RETURN min(nproperty)
This returns the smallest of all the values in the property property
Resultmin(nproperty)
13
1 row
collectcollect collects all the values into a list It will ignore NULLs
Query
MATCH (nPerson)
RETURN collect(nproperty)
Returns a single row with all the values collected
Reading Clauses
182
Resultcollect(nproperty)
[13 33 44]
1 row
DISTINCTAll aggregation functions also take the DISTINCT modifier which removes duplicates from the values Soto count the number of unique eye colors from nodes related to a this query can be used
Query
MATCH (aPerson name A )--gt(b)
RETURN count(DISTINCT beyes)
Returns the number of eye colors
Resultcount(distinct beyes)
2
1 row
Reading Clauses
183
116 Load CSVLOAD CSV is used to import data from CSV files Supports resources compressed with gzipDeflate as well as ZIP archives
The URL of the CSV file is specified by using FROM followed by an arbitrary expression evaluating to theURL in question It is required to specify an identifier for the CSV data using AS
See the examples below for further details
There is also a worked example see Section 128 ldquoImporting CSV files with Cypherrdquo [211]
CSV file formatThe CSV file to use with LOAD CSV must have the following characteristics
bull the character encoding is UTF-8bull the end line termination is system dependent eg it is n on unix or rn on windowsbull the default field terminator is bull the field terminator character can be change by using the option FIELDTERMINATOR available in the LOAD
CSV commandbull quoted strings are allowed in the CSV file and the quotes are dropped when reading the databull the character for string quotation is double quote bull the escape character is
Import data from a CSV fileTo import data from a CSV file into Neo4j you can use LOAD CSV to get the data into your query Thenyou write it to your database using the normal updating clauses of Cypher
artistscsv
1ABBA1992
2Roxette1986
3Europe1979
4The Cardigans1992
Query
LOAD CSV FROM httpneo4jcomdocs230csvartistscsv AS line
CREATE (Artist name line[1] year toInt(line[2]))
A new node with the Artist label is created for each row in the CSV file In addition two columns fromthe CSV file are set as properties on the nodes
Result(empty result)
Nodes created 4Properties set 8Labels added 4
Import data from a CSV file containing headersWhen your CSV file has headers you can view each row in the file as a map instead of as an array ofstrings
artists-with-headerscsv
IdNameYear
1ABBA1992
2Roxette1986
Reading Clauses
184
3Europe1979
4The Cardigans1992
Query
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvartists-with-headerscsv AS line
CREATE (Artist name lineName year toInt(lineYear))
This time the file starts with a single row containing column names Indicate this using WITH HEADERS andyou can access specific fields by their corresponding column name
Result(empty result)
Nodes created 4Properties set 8Labels added 4
Import data from a CSV file with a custom field delimiterSometimes your CSV file has other field delimiters than commas You can specify which delimiter yourfile uses using FIELDTERMINATOR
artists-fieldterminatorcsv
1ABBA1992
2Roxette1986
3Europe1979
4The Cardigans1992
Query
LOAD CSV FROM httpneo4jcomdocs230csvartists-fieldterminatorcsv AS line FIELDTERMINATOR
CREATE (Artist name line[1] year toInt(line[2]))
As values in this file are separated by a semicolon a custom FIELDTERMINATOR is specified in the LOAD CSVclause
Result(empty result)
Nodes created 4Properties set 8Labels added 4
Importing large amounts of dataIf the CSV file contains a significant number of rows (approaching hundreds of thousands or millions)USING PERIODIC COMMIT can be used to instruct Neo4j to perform a commit after a number of rows Thisreduces the memory overhead of the transaction state By default the commit will happen every 1000rows For more information see Section 129 ldquoUsing Periodic Commitrdquo [213]
Query
USING PERIODIC COMMIT
LOAD CSV FROM httpneo4jcomdocs230csvartistscsv AS line
CREATE (Artist name line[1] year toInt(line[2]))
Result(empty result)
Nodes created 4Properties set 8Labels added 4
Reading Clauses
185
Setting the rate of periodic commitsYou can set the number of rows as in the example where it is set to 500 rows
Query
USING PERIODIC COMMIT 500
LOAD CSV FROM httpneo4jcomdocs230csvartistscsv AS line
CREATE (Artist name line[1] year toInt(line[2]))
Result(empty result)
Nodes created 4Properties set 8Labels added 4
Import data containing escaped charactersIn this example we both have additional quotes around the values as well as escaped quotes insideone value
artists-with-escaped-charcsv
1The Symbol1992
Query
LOAD CSV FROM httpneo4jcomdocs230csvartists-with-escaped-charcsv AS line
CREATE (aArtist name line[1] year toInt(line[2]))
RETURN aname AS name ayear AS year length(aname) AS length
Note that strings are wrapped in quotes in the output here You can see that when comparing to thelength of the string in this case
Resultname year length
The Symbol 1992 12
1 rowNodes created 1Properties set 2Labels added 1
186
Chapter 12 Writing Clauses
Write data to the database
Writing Clauses
187
121 CreateThe CREATE clause is used to create graph elements mdash nodes and relationships
TipIn the CREATE clause patterns are used a lot Read Section 96 ldquoPatternsrdquo [127] for anintroduction
Create nodes
Create single nodeCreating a single node is done by issuing the following query
Query
CREATE (n)
Nothing is returned from this query except the count of affected nodes
Result(empty result)
Nodes created 1
Create multiple nodesCreating multiple nodes is done by separating them with a comma
Query
CREATE (n)(m)
Result(empty result)
Nodes created 2
Create a node with a labelTo add a label when creating a node use the syntax below
Query
CREATE (nPerson)
Nothing is returned from this query
Result(empty result)
Nodes created 1Labels added 1
Create a node with multiple labelsTo add labels when creating a node use the syntax below In this case we add two labels
Query
CREATE (nPersonSwedish)
Writing Clauses
188
Nothing is returned from this query
Result(empty result)
Nodes created 1Labels added 2
Create node and add labels and propertiesWhen creating a new node with labels you can add properties at the same time
Query
CREATE (nPerson name Andres title Developer )
Nothing is returned from this query
Result(empty result)
Nodes created 1Properties set 2Labels added 1
Return created nodeCreating a single node is done by issuing the following query
Query
CREATE (a name Andres )
RETURN a
The newly created node is returned
Resulta
Node[0]nameAndres
1 rowNodes created 1Properties set 1
Create relationships
Create a relationship between two nodesTo create a relationship between two nodes we first get the two nodes Once the nodes are loaded wesimply create a relationship between them
Query
MATCH (aPerson)(bPerson)
WHERE aname = Node A AND bname = Node B
CREATE (a)-[rRELTYPE]-gt(b)
RETURN r
The created relationship is returned by the query
Resultr
RELTYPE[0]
1 rowRelationships created 1
Writing Clauses
189
Create a relationship and set propertiesSetting properties on relationships is done in a similar manner to how itrsquos done when creating nodesNote that the values can be any expression
Query
MATCH (aPerson)(bPerson)
WHERE aname = Node A AND bname = Node B
CREATE (a)-[rRELTYPE name aname + lt-gt + bname ]-gt(b)
RETURN r
The newly created relationship is returned by the example query
Resultr
RELTYPE[0]nameNode Alt-gtNode B
1 rowRelationships created 1Properties set 1
Create a full pathWhen you use CREATE and a pattern all parts of the pattern that are not already in scope at this time willbe created
Query
CREATE p =(andres nameAndres )-[WORKS_AT]-gt(neo)lt-[WORKS_AT]-(michael nameMichael )
RETURN p
This query creates three nodes and two relationships in one go assigns it to a path identifier andreturns it
Resultp
[Node[0]nameAndres WORKS_AT[0] Node[1] WORKS_AT[1] Node[2]nameMichael]
1 rowNodes created 3Relationships created 2Properties set 2
Use parameters with CREATE
Create node with a parameter for the propertiesYou can also create a graph entity from a map All the keyvalue pairs in the map will be set asproperties on the created relationship or node In this case we add a Person label to the node as well
Parameters
props
name Andres
position Developer
Query
CREATE (nPerson props )
RETURN n
Writing Clauses
190
Resultn
Node[0]nameAndres positionDeveloper
1 rowNodes created 1Properties set 2Labels added 1
Create multiple nodes with a parameter for their propertiesBy providing Cypher an array of maps it will create a node for each map
Parameters
props [
name Andres
position Developer
name Michael
position Developer
]
Query
UNWIND props AS map
CREATE (n)
SET n = map
Result(empty result)
Nodes created 2Properties set 4
Create multiple nodes with a parameter for their properties using old syntaxBy providing Cypher an array of maps it will create a node for each map
NoteWhen you do this you canrsquot create anything else in the same CREATE clause
NoteThis syntax is deprecated in Neo4j version 23 It may be removed in a future major releaseSee the above example using UNWIND for how to achieve the same functionality
Parameters
props [
name Andres
position Developer
name Michael
position Developer
]
Query
CREATE (n props )
Writing Clauses
191
RETURN n
Resultn
Node[0]nameAndres positionDeveloper
Node[1]nameMichael positionDeveloper
2 rowsNodes created 2Properties set 4
Writing Clauses
192
122 MergeThe MERGE clause ensures that a pattern exists in the graph Either the pattern alreadyexists or it needs to be created
IntroductionMERGE either matches existing nodes and binds them or it creates new data and binds that Itrsquos like acombination of MATCH and CREATE that additionally allows you to specify what happens if the data wasmatched or created
For example you can specify that the graph must contain a node for a user with a certain name Ifthere isnrsquot a node with the correct name a new node will be created and its name property set
When using MERGE on full patterns the behavior is that either the whole pattern matches or the wholepattern is created MERGE will not partially use existing patterns mdash itrsquos all or nothing If partial matches areneeded this can be accomplished by splitting a pattern up into multiple MERGE clauses
As with MATCH MERGE can match multiple occurrences of a pattern If there are multiple matches they willall be passed on to later stages of the query
The last part of MERGE is the ON CREATE and ON MATCH These allow a query to express additional changes tothe properties of a node or relationship depending on if the element was MATCHed in the database or ifit was CREATEd
The rule planner (see Section 151 ldquoHow are queries executedrdquo [254]) expands a MERGE pattern fromthe end point that has the identifier with the lowest lexicographical order This means that it mightchoose a suboptimal expansion path expanding from a node with a higher degree The pattern MERGE(aA)-[R]-gt(bB) will always expand from a to b so if it is known that b nodes are a better choice forstart point renaming identifiers could improve performance
The following graph is used for the examples below
Figure 121 Graph
Person
chauffeurNam e = Bill Whitenam e = Oliver StonebornIn = New York
Movie
nam e = WallSt reet t it le = Wall St reet
DIRECTED
Person
chauffeurNam e = John Brownnam e = Charlie SheenbornIn = New York
ACTED_IN
Person
chauffeurNam e = Bob Brownnam e = Mart in SheenbornIn = Ohio
FATHER
ACTED_IN
Movie
t it le = The Am erican President nam e = TheAm ericanPresident
ACTED_IN
Person
chauffeurNam e = Ted Greennam e = Rob ReinerbornIn = New York
DIRECTED
Person
bornIn = New JerseychauffeurNam e = John Brownnam e = Michael Douglas
ACTED_IN ACTED_IN
Merge nodes
Merge single node with a labelMerging a single node with a given label
Query
MERGE (robertCritic)
RETURN robert labels(robert)
A new node is created because there are no nodes labeled Critic in the database
Writing Clauses
193
Resultrobert labels(robert)
Node[7] [Critic]
1 rowNodes created 1Labels added 1
Merge single node with propertiesMerging a single node with properties where not all properties match any existing node
Query
MERGE (charlie nameCharlie Sheen age10 )
RETURN charlie
A new node with the name Charlie Sheen will be created since not all properties matched the existingCharlie Sheen node
Resultcharlie
Node[7]nameCharlie Sheen age10
1 rowNodes created 1Properties set 2
Merge single node specifying both label and propertyMerging a single node with both label and property matching an existing node
Query
MERGE (michaelPerson nameMichael Douglas )
RETURN michaelname michaelbornIn
Michael Douglas will be matched and the name and bornIn properties returned
Resultmichaelname michaelbornIn
Michael Douglas New Jersey
1 row
Merge single node derived from an existing node propertyFor some property p in each bound node in a set of nodes a single new node is created for eachunique value for p
Query
MATCH (personPerson)
MERGE (cityCity name personbornIn )
RETURN personname personbornIn city
Three nodes labeled City are created each of which contains a name property with the value of NewYork Ohio and New Jersey respectively Note that even though the MATCH clause results in three boundnodes having the value New York for the bornIn property only a single New York node (ie a City nodewith a name of New York) is created As the New York node is not matched for the first bound node itis created However the newly-created New York node is matched and bound for the second and thirdbound nodes
Writing Clauses
194
Resultpersonname personbornIn city
Oliver Stone New York Node[7]nameNew York
Charlie Sheen New York Node[7]nameNew York
Martin Sheen Ohio Node[8]nameOhio
Rob Reiner New York Node[7]nameNew York
Michael Douglas New Jersey Node[9]nameNew Jersey
5 rowsNodes created 3Properties set 3Labels added 3
Use ON CREATE and ON MATCH
Merge with ON CREATEMerge a node and set properties if the node needs to be createdQuery
MERGE (keanuPerson nameKeanu Reeves )
ON CREATE SET keanucreated = timestamp()
RETURN keanuname keanucreated
The query creates the keanu node and sets a timestamp on creation time
Resultkeanuname keanucreated
Keanu Reeves 1445037182554
1 rowNodes created 1Properties set 2Labels added 1
Merge with ON MATCHMerging nodes and setting properties on found nodesQuery
MERGE (personPerson)
ON MATCH SET personfound = TRUE RETURN personname personfound
The query finds all the Person nodes sets a property on them and returns them
Resultpersonname personfound
Oliver Stone true
Charlie Sheen true
Martin Sheen true
Rob Reiner true
Michael Douglas true
5 rowsProperties set 5
Merge with ON CREATE and ON MATCHMerge a node and set properties if the node needs to be createdQuery
Writing Clauses
195
MERGE (keanuPerson nameKeanu Reeves )
ON CREATE SET keanucreated = timestamp()
ON MATCH SET keanulastSeen = timestamp()
RETURN keanuname keanucreated keanulastSeen
The query creates the keanu node and sets a timestamp on creation time If keanu had already existeda different property would have been set
Resultkeanuname keanucreated keanulastSeen
Keanu Reeves 1445037184685 ltnullgt
1 rowNodes created 1Properties set 2Labels added 1
Merge with ON MATCH setting multiple propertiesIf multiple properties should be set simply separate them with commasQuery
MERGE (personPerson)
ON MATCH SET personfound = TRUE personlastAccessed = timestamp()
RETURN personname personfound personlastAccessed
Resultpersonname personfound personlastAccessed
Oliver Stone true 1445037183921
Charlie Sheen true 1445037183921
Martin Sheen true 1445037183921
Rob Reiner true 1445037183921
Michael Douglas true 1445037183921
5 rowsProperties set 10
Merge relationships
Merge on a relationshipMERGE can be used to match or create a relationshipQuery
MATCH (charliePerson nameCharlie Sheen )(wallStreetMovie titleWall Street )
MERGE (charlie)-[rACTED_IN]-gt(wallStreet)
RETURN charliename type(r) wallStreettitle
Charlie Sheen had already been marked as acting in Wall Street so the existing relationship is found andreturned Note that in order to match or create a relationship when using MERGE at least one boundnode must be specified which is done via the MATCH clause in the above example
Resultcharliename type(r) wallStreettitle
Charlie Sheen ACTED_IN Wall Street
1 row
Merge on multiple relationshipsWhen MERGE is used on a whole pattern either everything matches or everything is createdQuery
Writing Clauses
196
MATCH (oliverPerson nameOliver Stone )(reinerPerson nameRob Reiner )
MERGE (oliver)-[DIRECTED]-gt(movieMovie)lt-[ACTED_IN]-(reiner)
RETURN movie
In our example graph Oliver Stone and Rob Reiner have never worked together When we try to MERGE amovie between them Neo4j will not use any of the existing movies already connected to either personInstead a new movie node is created
Resultmovie
Node[7]
1 rowNodes created 1Relationships created 2Labels added 1
Merge on an undirected relationshipMERGE can also be used with an undirected relationship When it needs to create a new one it will pick adirection
Query
MATCH (charliePerson nameCharlie Sheen )(oliverPerson nameOliver Stone )
MERGE (charlie)-[rKNOWS]-(oliver)
RETURN r
As Charlie Sheen and Oliver Stone do not know each other this MERGE query will create a KNOWSrelationship between them The direction of the created relationship is arbitrary
Resultr
KNOWS[8]
1 rowRelationships created 1
Merge on a relationship between two existing nodesMERGE can be used in conjunction with preceding MATCH and MERGE clauses to create a relationshipbetween two bound nodes m and n where m is returned by MATCH and n is created or matched by theearlier MERGE
Query
MATCH (personPerson)
MERGE (cityCity name personbornIn )
MERGE (person)-[rBORN_IN]-gt(city)
RETURN personname personbornIn city
This builds on the example from the section called ldquoMerge single node derived from an existing nodepropertyrdquo [193] The second MERGE creates a BORN_IN relationship between each person and a citycorresponding to the value of the personrsquos bornIn property Charlie Sheen Rob Reiner and Oliver Stone allhave a BORN_IN relationship to the same City node (New York)
Resultpersonname personbornIn city
Oliver Stone New York Node[7]nameNew York
5 rowsNodes created 3Relationships created 5Properties set 3Labels added 3
Writing Clauses
197
personname personbornIn city
Charlie Sheen New York Node[7]nameNew York
Martin Sheen Ohio Node[8]nameOhio
Rob Reiner New York Node[7]nameNew York
Michael Douglas New Jersey Node[9]nameNew Jersey
5 rowsNodes created 3Relationships created 5Properties set 3Labels added 3
Merge on a relationship between an existing node and a merged node derived from anode propertyMERGE can be used to simultaneously create both a new node n and a relationship between a boundnode m and nQuery
MATCH (personPerson)
MERGE (person)-[rHAS_CHAUFFEUR]-gt(chauffeurChauffeur name personchauffeurName )
RETURN personname personchauffeurName chauffeur
As MERGE found no matches mdash in our example graph there are no nodes labeled with Chauffeur and noHAS_CHAUFFEUR relationships mdash MERGE creates five nodes labeled with Chauffeur each of which containsa name property whose value corresponds to each matched Person nodersquos chauffeurName propertyvalue MERGE also creates a HAS_CHAUFFEUR relationship between each Person node and the newly-createdcorresponding Chauffeur node As Charlie Sheen and Michael Douglas both have a chauffeur with thesame name mdash John Brown mdash a new node is created in each case resulting in two Chauffeur nodes havinga name of John Brown correctly denoting the fact that even though the name property may be identicalthese are two separate people This is in contrast to the example shown above in the section calledldquoMerge on a relationship between two existing nodesrdquo [196] where we used the first MERGE to bind theCity nodes to prevent them from being recreated (and thus duplicated) in the second MERGE
Resultpersonname personchauffeurName chauffeur
Oliver Stone Bill White Node[7]nameBill White
Charlie Sheen John Brown Node[8]nameJohn Brown
Martin Sheen Bob Brown Node[9]nameBob Brown
Rob Reiner Ted Green Node[10]nameTed Green
Michael Douglas John Brown Node[11]nameJohn Brown
5 rowsNodes created 5Relationships created 5Properties set 5Labels added 5
Using unique constraints with MERGECypher prevents getting conflicting results from MERGE when using patterns that involve uniquenessconstrains In this case there must be at most one node that matches that patternFor example given two uniqueness constraints on Person(id) and Person(ssn) then a query such asMERGE (nPerson id 12 ssn 437) will fail if there are two different nodes (one with id 12 and onewith ssn 437) or if there is only one node with only one of the properties In other words there must beexactly one node that matches the pattern or no matching nodesNote that the following examples assume the existence of uniqueness constraints that have beencreated using
Writing Clauses
198
CREATE CONSTRAINT ON (nPerson) ASSERT nname IS UNIQUE
CREATE CONSTRAINT ON (nPerson) ASSERT nrole IS UNIQUE
Merge using unique constraints creates a new node if no node is foundMerge using unique constraints creates a new node if no node is found
Query
MERGE (laurencePerson name Laurence Fishburne )
RETURN laurencename
The query creates the laurence node If laurence had already existed MERGE would just match the existingnode
Resultlaurencename
Laurence Fishburne
1 rowNodes created 1Properties set 1Labels added 1
Merge using unique constraints matches an existing nodeMerge using unique constraints matches an existing node
Query
MERGE (oliverPerson nameOliver Stone )
RETURN olivername oliverbornIn
The oliver node already exists so MERGE just matches it
Resultolivername oliverbornIn
Oliver Stone New York
1 row
Merge with unique constraints and partial matchesMerge using unique constraints fails when finding partial matches
Query
MERGE (michaelPerson nameMichael Douglas roleGordon Gekko )
RETURN michael
While there is a matching unique michael node with the name Michael Douglas there is no unique nodewith the role of Gordon Gekko and MERGE fails to match
Error message
Merge did not find a matching node and can not create a new node due to conflicts
with both existing and missing unique nodes The conflicting constraints are on
Personname and Personrole
Merge with unique constraints and conflicting matchesMerge using unique constraints fails when finding conflicting matches
Query
MERGE (oliverPerson nameOliver Stone roleGordon Gekko )
RETURN oliver
Writing Clauses
199
While there is a matching unique oliver node with the name Oliver Stone there is also another uniquenode with the role of Gordon Gekko and MERGE fails to match
Error message
Merge did not find a matching node and can not create a new node due to conflicts
with both existing and missing unique nodes The conflicting constraints are on
Personname and Personrole
Using map parameters with MERGEMERGE does not support map parameters like for example CREATE does To use map parameters withMERGE it is necessary to explicitly use the expected properties like in the following example For moreinformation on parameters see Section 85 ldquoParametersrdquo [113]
Parameters
param
name Keanu Reeves
role Neo
Query
MERGE (personPerson name param name role param role )
RETURN personname personrole
Resultpersonname personrole
Keanu Reeves Neo
1 rowNodes created 1Properties set 2Labels added 1
Writing Clauses
200
123 SetThe SET clause is used to update labels on nodes and properties on nodes andrelationships
SET can also be used with maps from parameters to set properties
NoteSetting labels on a node is an idempotent operations mdash if you try to set a label on a nodethat already has that label on it nothing happens The query statistics will tell you ifsomething needed to be done or not
The examples use this graph as a starting point
nam e = Em il
nam e = Peterage = 34
KNOWS
nam e = Stefan
Swedish
nam e = Andresage = 36hungry = t rue
KNOWS
KNOWS
Set a propertyTo set a property on a node or relationship use SET
Query
MATCH (n name Andres )
SET nsurname = Taylor
RETURN n
The newly changed node is returned by the query
Resultn
Node[3]surnameTaylor nameAndres age36 hungrytrue
1 rowProperties set 1
Remove a propertyNormally you remove a property by using REMOVE but itrsquos sometimes handy to do it using the SETcommand One example is if the property comes from a parameter
Query
Writing Clauses
201
MATCH (n name Andres )
SET nname = NULL RETURN n
The node is returned by the query and the name property is now missing
Resultn
Node[3]hungrytrue age36
1 rowProperties set 1
Copying properties between nodes and relationshipsYou can also use SET to copy all properties from one graph element to another Remember that doingthis will remove all other properties on the receiving graph element
Query
MATCH (at name Andres )(pn name Peter )
SET at = pn
RETURN at pn
The Andres node has had all itrsquos properties replaced by the properties in the Peter node
Resultat pn
Node[3]namePeter age34 Node[2]namePeter age34
1 rowProperties set 3
Adding properties from mapsWhen setting properties from a map (literal paremeter or graph element) you can use the += form ofSET to only add properties and not remove any of the existing properties on the graph element
Query
MATCH (peter name Peter )
SET peter += hungry TRUE position Entrepreneur
Result(empty result)
Properties set 2
Set a property using a parameterUse a parameter to give the value of a property
Parameters
surname Taylor
Query
MATCH (n name Andres )
SET nsurname = surname
RETURN n
The Andres node has got an surname added
Writing Clauses
202
Resultn
Node[3]surnameTaylor nameAndres age36 hungrytrue
1 rowProperties set 1
Set all properties using a parameterThis will replace all existing properties on the node with the new set provided by the parameter
Parameters
props
name Andres
position Developer
Query
MATCH (n name Andres )
SET n = props
RETURN n
The Andres node has had all itrsquos properties replaced by the properties in the props parameter
Resultn
Node[3]nameAndres positionDeveloper
1 rowProperties set 4
Set multiple properties using one SET clauseIf you want to set multiple properties in one go simply separate them with a comma
Query
MATCH (n name Andres )
SET nposition = Developer nsurname = Taylor
Result(empty result)
Properties set 2
Set a label on a nodeTo set a label on a node use SET
Query
MATCH (n name Stefan )
SET n German
RETURN n
The newly labeled node is returned by the query
Resultn
Node[1]nameStefan
1 rowLabels added 1
Writing Clauses
203
Set multiple labels on a nodeTo set multiple labels on a node use SET and separate the different labels using
Query
MATCH (n name Emil )
SET n SwedishBossman
RETURN n
The newly labeled node is returned by the query
Resultn
Node[0]nameEmil
1 rowLabels added 2
Writing Clauses
204
124 DeleteThe DELETE clause is used to delete graph elements mdash nodes relationships or paths
For removing properties and labels see Section 125 ldquoRemoverdquo [205] Remember that you can notdelete a node without also deleting relationships that start or end on said node Either explicitly deletethe relationships or use DETACH DELETEThe examples start out with the following database
nam e = Tobiasage = 25
nam e = Peterage = 34
nam e = Andresage = 36
KNOWS KNOWS
Delete single nodeTo delete a node use the DELETE clauseQuery
MATCH (nUseless)
DELETE n
Result(empty result)
Nodes deleted 1
Delete all nodes and relationshipsThis query isnrsquot for deleting large amounts of data but is nice when playing around with small exampledata setsQuery
MATCH (n)
DETACH DELETE n
Result(empty result)
Nodes deleted 3Relationships deleted 2
Delete a node with all its relationshipsWhen you want to delete a node and any relationship going to or from it use DETACH DELETEQuery
MATCH (n nameAndres )
DETACH DELETE n
Result(empty result)
Nodes deleted 1Relationships deleted 2
Writing Clauses
205
125 RemoveThe REMOVE clause is used to remove properties and labels from graph elements
For deleting nodes and relationships see Section 124 ldquoDeleterdquo [204]
NoteRemoving labels from a node is an idempotent operation If you try to remove a label from anode that does not have that label on it nothing happens The query statistics will tell you ifsomething needed to be done or not
The examples start out with the following database
Swedish
nam e = Tobiasage = 25
Swedish Germ an
nam e = Peterage = 34
Swedish
nam e = Andresage = 36
KNOWS KNOWS
Remove a propertyNeo4j doesnrsquot allow storing null in properties Instead if no value exists the property is just not thereSo to remove a property value on a node or a relationship is also done with REMOVE
Query
MATCH (andres name Andres )
REMOVE andresage
RETURN andres
The node is returned and no property age exists on it
Resultandres
Node[2]nameAndres
1 rowProperties set 1
Remove a label from a nodeTo remove labels you use REMOVE
Query
MATCH (n name Peter )
REMOVE nGerman
RETURN n
Resultn
Node[1]namePeter age34
1 rowLabels removed 1
Writing Clauses
206
Removing multiple labelsTo remove multiple labels you use REMOVE
Query
MATCH (n name Peter )
REMOVE nGermanSwedish
RETURN n
Resultn
Node[1]namePeter age34
1 rowLabels removed 2
Writing Clauses
207
126 ForeachThe FOREACH clause is used to update data within a collection whether components of apath or result of aggregation
Collections and paths are key concepts in Cypher To use them for updating data you can use theFOREACH construct It allows you to do updating commands on elements in a collection mdash a path or acollection created by aggregation
The identifier context inside of the foreach parenthesis is separate from the one outside it This meansthat if you CREATE a node identifier inside of a FOREACH you will not be able to use it outside of theforeach statement unless you match to find it
Inside of the FOREACH parentheses you can do any of the updating commands mdash CREATE CREATE UNIQUEMERGE DELETE and FOREACH
If you want to execute an additional MATCH for each element in a collection then UNWIND (see Section 106ldquoUnwindrdquo [148]) would be a more appropriate command
Figure 122 Data for the examples
nam e = D
nam e = A
nam e = B
KNOWS
nam e = C
KNOWS
KNOWS
Mark all nodes along a pathThis query will set the property marked to true on all nodes along a path
Query
MATCH p =(begin)-[]-gt(END )
WHERE beginname=A AND END name=D
FOREACH (n IN nodes(p)| SET nmarked = TRUE )
Nothing is returned from this query but four properties are set
Result(empty result)
Properties set 4
Writing Clauses
208
127 Create UniqueThe CREATE UNIQUE clause is a mix of MATCH and CREATE mdash it will match what it can and createwhat is missing
Introduction
TipMERGE might be what you want to use instead of CREATE UNIQUE Note however that MERGEdoesnrsquot give as strong guarantees for relationships being unique
CREATE UNIQUE is in the middle of MATCH and CREATE mdash it will match what it can and create what is missingCREATE UNIQUE will always make the least change possible to the graph mdash if it can use parts of theexisting graph it will
Another difference to MATCH is that CREATE UNIQUE assumes the pattern to be unique If multiple matchingsubgraphs are found an error will be generated
TipIn the CREATE UNIQUE clause patterns are used a lot Read Section 96 ldquoPatternsrdquo [127] for anintroduction
The examples start out with the following data set
nam e = A
nam e = C
KNOWS
nam e = root
X
X nam e = B
X
Create unique nodes
Create node if missingIf the pattern described needs a node and it canrsquot be matched a new node will be created
Query
MATCH (root name root )
CREATE UNIQUE (root)-[LOVES]-(someone)
RETURN someone
The root node doesnrsquot have any LOVES relationships and so a node is created and also a relationship tothat node
Writing Clauses
209
Resultsomeone
Node[4]
1 rowNodes created 1Relationships created 1
Create nodes with valuesThe pattern described can also contain values on the node These are given using the following syntaxprop ltexpressiongt
Query
MATCH (root name root )
CREATE UNIQUE (root)-[X]-(leaf nameD )
RETURN leaf
No node connected with the root node has the name D and so a new node is created to match thepattern
Resultleaf
Node[4]nameD
1 rowNodes created 1Relationships created 1Properties set 1
Create labeled node if missingIf the pattern described needs a labeled node and there is none with the given labels Cypher will createa new one
Query
MATCH (a name A )
CREATE UNIQUE (a)-[KNOWS]-(cblue)
RETURN c
The A node is connected in a KNOWS relationship to the c node but since C doesnrsquot have the blue label anew node labeled as blue is created along with a KNOWS relationship from A to it
Resultc
Node[4]
1 rowNodes created 1Relationships created 1Labels added 1
Create unique relationships
Create relationship if it is missingCREATE UNIQUE is used to describe the pattern that should be found or created
Query
MATCH (lft name A )(rgt)
WHERE rgtname IN [B C]
CREATE UNIQUE (lft)-[rKNOWS]-gt(rgt)
Writing Clauses
210
RETURN r
The left node is matched agains the two right nodes One relationship already exists and can bematched and the other relationship is created before it is returned
Resultr
KNOWS[4]
KNOWS[3]
2 rowsRelationships created 1
Create relationship with valuesRelationships to be created can also be matched on values
Query
MATCH (root name root )
CREATE UNIQUE (root)-[rX sinceforever ]-()
RETURN r
In this example we want the relationship to have a value and since no such relationship can be founda new node and relationship are created Note that since we are not interested in the created node wedonrsquot name it
Resultr
X[4]sinceforever
1 rowNodes created 1Relationships created 1Properties set 1
Describe complex patternThe pattern described by CREATE UNIQUE can be separated by commas just like in MATCH and CREATE
Query
MATCH (root name root )
CREATE UNIQUE (root)-[FOO]-gt(x)(root)-[BAR]-gt(x)
RETURN x
This example pattern uses two paths separated by a comma
Resultx
Node[4]
1 rowNodes created 1Relationships created 2
Writing Clauses
211
128 Importing CSV files with CypherThis tutorial will show you how to import data from CSV files using LOAD CSV
In this example wersquore given three CSV files a list of persons a list of movies and a list of which role wasplayed by some of these persons in each movie
CSV files can be stored on the database server and are then accessible using a file URLAlternatively LOAD CSV also supports accessing CSV files via HTTPS HTTP and FTP
Using the following Cypher queries wersquoll create a node for each person a node for each movie and arelationship between the two with a property denoting the role Wersquore also keeping track of the countryin which each movie was made
Letrsquos start with importing the persons
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvimportpersonscsv AS csvLine
CREATE (pPerson id toInt(csvLineid) name csvLinename )
The CSV file wersquore using looks like this
personscsv
idname
1Charlie Sheen
2Oliver Stone
3Michael Douglas
4Martin Sheen
5Morgan Freeman
Now letrsquos import the movies This time wersquore also creating a relationship to the country in which themovie was made If you are storing your data in a SQL database this is the one-to-many relationshiptype
Wersquore using MERGE to create nodes that represent countries Using MERGE avoids creating duplicatecountry nodes in the case where multiple movies have been made in the same country
ImportantWhen using MERGE or MATCH with LOAD CSV we need to make sure we have an index(see Section 141 ldquoIndexesrdquo [244]) or a unique constraint (see Section 142ldquoConstraintsrdquo [247]) on the property wersquore merging This will ensure the query executes ina performant way
Before running our query to connect movies and countries wersquoll create an index for the name propertyon the Country label to ensure the query runs as fast as it can
CREATE INDEX ON Country(name)
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvimportmoviescsv AS csvLine
MERGE (countryCountry name csvLinecountry )
CREATE (movieMovie id toInt(csvLineid) title csvLinetitle yeartoInt(csvLineyear))
CREATE (movie)-[MADE_IN]-gt(country)
moviescsv
idtitlecountryyear
1Wall StreetUSA1987
2The American PresidentUSA1995
3The Shawshank RedemptionUSA1994
Lastly we create the relationships between the persons and the movies Since the relationship is amany to many relationship one actor can participate in many movies and one movie has many actorsin it We have this data in a separate file
Writing Clauses
212
Wersquoll index the id property on Person and Movie nodes The id property is a temporary property usedto look up the appropriate nodes for a relationship when importing the third file By indexing the idproperty node lookup (eg by MATCH) will be much faster Since we expect the ids to be unique in eachset wersquoll create a unique constraint This protects us from invalid data since constraint creation willfail if there are multiple nodes with the same id property Creating a unique constraint also creates aunique index (which is faster than a regular index)
CREATE CONSTRAINT ON (personPerson) ASSERT personid IS UNIQUE
CREATE CONSTRAINT ON (movieMovie) ASSERT movieid IS UNIQUE
Now importing the relationships is a matter of finding the nodes and then creating relationshipsbetween them
For this query wersquoll use USING PERIODIC COMMIT (see Section 129 ldquoUsing Periodic Commitrdquo [213]) whichis helpful for queries that operate on large CSV files This hint tells Neo4j that the query might build upinordinate amounts of transaction state and so needs to be periodically committed In this case wealso set the limit to 500 rows per commit
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM httpneo4jcomdocs230csvimportrolescsv AS csvLine
MATCH (personPerson id toInt(csvLinepersonId))(movieMovie id toInt(csvLinemovieId))
CREATE (person)-[PLAYED role csvLinerole ]-gt(movie)
rolescsv
personIdmovieIdrole
11Bud Fox
41Carl Fox
31Gordon Gekko
42AJ MacInerney
32President Andrew Shepherd
53Ellis Boyd Red Redding
Finally as the id property was only necessary to import the relationships we can drop the constraintsand the id property from all movie and person nodes
DROP CONSTRAINT ON (personPerson) ASSERT personid IS UNIQUE
DROP CONSTRAINT ON (movieMovie) ASSERT movieid IS UNIQUE
MATCH (n)
WHERE nPerson OR nMovie
REMOVE nid
Writing Clauses
213
129 Using Periodic CommitNoteSee Section 128 ldquoImporting CSV files with Cypherrdquo [211] on how to import data from CSVfiles
Importing large amounts of data using LOAD CSV with a single Cypher query may fail due to memoryconstraints This will manifest itself as an OutOfMemoryError
For this situation only Cypher provides the global USING PERIODIC COMMIT query hint for updatingqueries using LOAD CSV You can optionally set the limit for the number of rows per commit like so USINGPERIODIC COMMIT 500
PERIODIC COMMIT will process the rows until the number of rows reaches a limit Then the currenttransaction will be committed and replaced with a newly opened transaction If no limit is set a defaultvalue will be used
See the section called ldquoImporting large amounts of datardquo [184] in Section 116 ldquoLoad CSVrdquo [183] forexamples of USING PERIODIC COMMIT with and without setting the number of rows per commit
ImportantUsing periodic commit will prevent running out of memory when importing large amountsof data However it will also break transactional isolation and thus it should only be usedwhere needed
214
Chapter 13 Functions
This chapter contains information on all functions in Cypher Note that related information exists inSection 94 ldquoOperatorsrdquo [124]
NoteMost functions in Cypher will return NULL if an input parameter is NULL
Functions
215
131 PredicatesPredicates are boolean functions that return true or false for a given set of input They are mostcommonly used to filter out subgraphs in the WHERE part of a query
See also the section called ldquoComparison operatorsrdquo [124]
Figure 131 Graph
nam e = Danielage = 54eyes = brown
Spouse
array = [ one two three]nam e = Eskilage = 41eyes = blue
foo bar
nam e = Aliceage = 38eyes = brown
nam e = Charlieage = 53eyes = green
KNOWS
nam e = Bobage = 25eyes = blue
KNOWS
KNOWS KNOWS MARRIED
ALLTests whether a predicate holds for all element of this collection collection
Syntax ALL(identifier in collection WHERE predicate)
Arguments
bull collection An expression that returns a collectionbull identifier This is the identifier that can be used from the predicatebull predicate A predicate that is tested against all items in the collection
Query
MATCH p=(a)-[13]-gt(b)
WHERE aname=Alice AND bname=Daniel AND ALL (x IN nodes(p) WHERE xage gt 30)
RETURN p
All nodes in the returned paths will have an age property of at least 30
Resultp
[Node[2]nameAlice age38 eyesbrown KNOWS[1] Node[4]nameCharlie age53
eyesgreen KNOWS[3] Node[0]nameDaniel age54 eyesbrown]
1 row
ANYTests whether a predicate holds for at least one element in the collection
Syntax ANY(identifier in collection WHERE predicate)
Functions
216
Arguments
bull collection An expression that returns a collectionbull identifier This is the identifier that can be used from the predicatebull predicate A predicate that is tested against all items in the collection
Query
MATCH (a)
WHERE aname=Eskil AND ANY (x IN aarray WHERE x = one)
RETURN a
All nodes in the returned paths has at least one one value set in the array property named array
Resulta
Node[1]array[one two three] nameEskil age41 eyesblue
1 row
NONEReturns true if the predicate holds for no element in the collection
Syntax NONE(identifier in collection WHERE predicate)Arguments
bull collection An expression that returns a collectionbull identifier This is the identifier that can be used from the predicatebull predicate A predicate that is tested against all items in the collection
Query
MATCH p=(n)-[13]-gt(b)
WHERE nname=Alice AND NONE (x IN nodes(p) WHERE xage = 25)
RETURN p
No nodes in the returned paths has a age property set to 25
Resultp
[Node[2]nameAlice age38 eyesbrown KNOWS[1] Node[4]nameCharlie age53 eyesgreen]
[Node[2]nameAlice age38 eyesbrown KNOWS[1] Node[4]nameCharlie age53
eyesgreen KNOWS[3] Node[0]nameDaniel age54 eyesbrown]
2 rows
SINGLEReturns true if the predicate holds for exactly one of the elements in the collection
Syntax SINGLE(identifier in collection WHERE predicate)Arguments
bull collection An expression that returns a collectionbull identifier This is the identifier that can be used from the predicatebull predicate A predicate that is tested against all items in the collection
Query
MATCH p=(n)--gt(b)
Functions
217
WHERE nname=Alice AND SINGLE (var IN nodes(p) WHERE vareyes = blue)
RETURN p
Exactly one node in every returned path will have the eyes property set to blue
Resultp
[Node[2]nameAlice age38 eyesbrown KNOWS[0] Node[3]nameBob age25 eyesblue]
1 row
EXISTSReturns true if a match for the pattern exists in the graph or the property exists in the noderelationship or map
Syntax EXISTS( pattern-or-property )Arguments
bull pattern-or-property A pattern or a property (in the form identifierprop)
Query
MATCH (n)
WHERE EXISTS(nname)
RETURN nname AS name EXISTS((n)-[MARRIED]-gt()) AS is_married
This query returns all the nodes with a name property along with a boolean truefalse indicating if theyare married
Resultname is_married
Daniel false
Eskil false
Alice false
Bob true
Charlie false
5 rows
Functions
218
132 Scalar functionsScalar functions return a single value
ImportantThe LENGTH and SIZE functions are quite similar and so it is important to take note of thedifference Due to backwards compatibility LENGTH currently works on four types stringspaths collections and pattern expressions However for clarity it is recommended to onlyuse LENGTH on strings and paths and use the new SIZE function on collections and patternexpressions LENGTH on those types may be deprecated in future
Figure 132 Graph
nam e = Danielage = 54eyes = brown
Spouse
array = [ one two three]nam e = Eskilage = 41eyes = blue
foo bar
nam e = Aliceage = 38eyes = brown
nam e = Charlieage = 53eyes = green
KNOWS
nam e = Bobage = 25eyes = blue
KNOWS
KNOWS KNOWS MARRIED
SIZETo return or filter on the size of a collection use the SIZE() function
Syntax SIZE( collection )
Arguments
bull collection An expression that returns a collection
Query
RETURN size([Alice Bob]) AS col
The number of items in the collection is returned by the query
Resultcol
2
1 row
SIZE of pattern expressionThis is the same SIZE() method described before but instead of passing in a collection directly youprovide a pattern expression that can be used in a match query to provide a new set of results The sizeof the result is calculated not the length of the expression itself
Functions
219
Syntax SIZE( pattern expression )
Arguments
bull pattern expression A pattern expression that returns a collection
Query
MATCH (a)
WHERE aname=Alice
RETURN size((a)--gt()--gt()) AS fof
The number of sub-graphs matching the pattern expression is returned by the query
Resultfof
3
1 row
LENGTHTo return or filter on the length of a path use the LENGTH() function
Syntax LENGTH( path )
Arguments
bull path An expression that returns a path
Query
MATCH p=(a)--gt(b)--gt(c)
WHERE aname=Alice
RETURN length(p)
The length of the path p is returned by the query
Resultlength(p)
2
2
2
3 rows
LENGTH of stringTo return or filter on the length of a string use the LENGTH() function
Syntax LENGTH( string )
Arguments
bull string An expression that returns a string
Query
MATCH (a)
WHERE length(aname)gt 6
RETURN length(aname)
The length of the name Charlie is returned by the query
Functions
220
Resultlength(aname)
7
1 row
TYPEReturns a string representation of the relationship type
Syntax TYPE( relationship )
Arguments
bull relationship A relationship
Query
MATCH (n)-[r]-gt()
WHERE nname=Alice
RETURN type(r)
The relationship type of r is returned by the query
Resulttype(r)
KNOWS
KNOWS
2 rows
IDReturns the id of the relationship or node
Syntax ID( property-container )
Arguments
bull property-container A node or a relationship
Query
MATCH (a)
RETURN id(a)
This returns the node id for three nodes
Resultid(a)
0
1
2
3
4
5 rows
COALESCEReturns the first non-NULL value in the list of expressions passed to it In case all arguments are NULLNULL will be returned
Functions
221
Syntax COALESCE( expression [ expression] )
Arguments
bull expression The expression that might return NULL
Query
MATCH (a)
WHERE aname=Alice
RETURN coalesce(ahairColor aeyes)
Resultcoalesce(ahairColor aeyes)
brown
1 row
HEADHEAD returns the first element in a collection
Syntax HEAD( expression )
Arguments
bull expression This expression should return a collection of some kind
Query
MATCH (a)
WHERE aname=Eskil
RETURN aarray head(aarray)
The first node in the path is returned
Resultaarray head(aarray)
[one two three] one
1 row
LASTLAST returns the last element in a collection
Syntax LAST( expression )
Arguments
bull expression This expression should return a collection of some kind
Query
MATCH (a)
WHERE aname=Eskil
RETURN aarray last(aarray)
The last node in the path is returned
Resultaarray last(aarray)
[one two three] three
1 row
Functions
222
TIMESTAMPTIMESTAMP returns the difference measured in milliseconds between the current time and midnightJanuary 1 1970 UTC It will return the same value during the whole one query even if the query is along running one
Syntax TIMESTAMP()ArgumentsQuery
RETURN timestamp()
The time in milliseconds is returned
Resulttimestamp()
1445037145128
1 row
STARTNODESTARTNODE returns the starting node of a relationship
Syntax STARTNODE( relationship )Arguments
bull relationship An expression that returns a relationship
Query
MATCH (xfoo)-[r]-()
RETURN startNode(r)
ResultstartNode(r)
Node[2]nameAlice age38 eyesbrown
Node[2]nameAlice age38 eyesbrown
2 rows
ENDNODEENDNODE returns the end node of a relationship
Syntax ENDNODE( relationship )Arguments
bull relationship An expression that returns a relationship
Query
MATCH (xfoo)-[r]-()
RETURN endNode(r)
ResultendNode(r)
Node[4]nameCharlie age53 eyesgreen
Node[3]nameBob age25 eyesblue
2 rows
Functions
223
TOINTTOINT converts the argument to an integer A string is parsed as if it was an integer number If theparsing fails NULL will be returned A floating point number will be cast into an integer
Syntax TOINT( expression )Arguments
bull expression An expression that returns anything
Query
RETURN toInt(42) toInt(not a number)
ResulttoInt(42) toInt(not a number)
42 ltnullgt
1 row
TOFLOATTOFLOAT converts the argument to a float A string is parsed as if it was an floating point number If theparsing fails NULL will be returned An integer will be cast to a floating point number
Syntax TOFLOAT( expression )
Arguments
bull expression An expression that returns anything
Query
RETURN toFloat(115) toFloat(not a number)
ResulttoFloat(115) toFloat(not a number)
11 5 ltnullgt
1 row
Functions
224
133 Collection functionsCollection functions return collections of things mdash nodes in a path and so on
See also the section called ldquoCollection operatorsrdquo [124]
Figure 133 Graph
nam e = Danielage = 54eyes = brown
Spouse
array = [ one two three]nam e = Eskilage = 41eyes = blue
foo bar
nam e = Aliceage = 38eyes = brown
nam e = Charlieage = 53eyes = green
KNOWS
nam e = Bobage = 25eyes = blue
KNOWS
KNOWS KNOWS MARRIED
NODESReturns all nodes in a path
Syntax NODES( path )
Arguments
bull path A path
Query
MATCH p=(a)--gt(b)--gt(c)
WHERE aname=Alice AND cname=Eskil
RETURN nodes(p)
All the nodes in the path p are returned by the example query
Resultnodes(p)
[Node[2]nameAlice age38 eyesbrown Node[3]nameBob age25 eyesblue Node[1]array
[one two three] nameEskil age41 eyesblue]
1 row
RELATIONSHIPSReturns all relationships in a path
Syntax RELATIONSHIPS( path )
Arguments
bull path A path
Functions
225
Query
MATCH p=(a)--gt(b)--gt(c)
WHERE aname=Alice AND cname=Eskil
RETURN relationships(p)
All the relationships in the path p are returned
Resultrelationships(p)
[KNOWS[0] MARRIED[4]]
1 row
LABELSReturns a collection of string representations for the labels attached to a node
Syntax LABELS( node )Arguments
bull node Any expression that returns a single node
Query
MATCH (a)
WHERE aname=Alice
RETURN labels(a)
The labels of n is returned by the query
Resultlabels(a)
[foo bar]
1 row
KEYSReturns a collection of string representations for the property names of a node relationship or map
Syntax KEYS( property-container )Arguments
bull property-container A node a relationship or a literal map
Query
MATCH (a)
WHERE aname=Alice
RETURN keys(a)
The name of the properties of n is returned by the query
Resultkeys(a)
[name age eyes]
1 row
EXTRACTTo return a single property or the value of a function from a collection of nodes or relationships youcan use EXTRACT It will go through a collection run an expression on every element and return the
Functions
226
results in an collection with these values It works like the map method in functional languages such asLisp and Scala
Syntax EXTRACT( identifier in collection | expression )
Arguments
bull collection An expression that returns a collectionbull identifier The closure will have an identifier introduced in itrsquos context Here you decide which
identifier to usebull expression This expression will run once per value in the collection and produces the result
collection
Query
MATCH p=(a)--gt(b)--gt(c)
WHERE aname=Alice AND bname=Bob AND cname=Daniel
RETURN extract(n IN nodes(p)| nage) AS extracted
The age property of all nodes in the path are returned
Resultextracted
[38 25 54]
1 row
FILTERFILTER returns all the elements in a collection that comply to a predicate
Syntax FILTER(identifier in collection WHERE predicate)
Arguments
bull collection An expression that returns a collectionbull identifier This is the identifier that can be used from the predicatebull predicate A predicate that is tested against all items in the collection
Query
MATCH (a)
WHERE aname=Eskil
RETURN aarray filter(x IN aarray WHERE size(x)= 3)
This returns the property named array and a list of values in it which have size 3
Resultaarray filter(x in aarray WHERE size(x) = 3)
[one two three] [one two]
1 row
TAILTAIL returns all but the first element in a collection
Syntax TAIL( expression )
Arguments
bull expression This expression should return a collection of some kind
Query
Functions
227
MATCH (a)
WHERE aname=Eskil
RETURN aarray tail(aarray)
This returns the property named array and all elements of that property except the first one
Resultaarray tail(aarray)
[one two three] [two three]
1 row
RANGEReturns numerical values in a range with a non-zero step value step Range is inclusive in both ends
Syntax RANGE( start end [ step] )
Arguments
bull start A numerical expressionbull end A numerical expressionbull step A numerical expression
Query
RETURN range(010) range(2183)
Two lists of numbers are returned
Resultrange(010) range(2183)
[0 1 2 3 4 5 6 7 8 9 10] [2 5 8 11 14 17]
1 row
REDUCETo run an expression against individual elements of a collection and store the result of the expressionin an accumulator you can use REDUCE It will go through a collection run an expression on everyelement storing the partial result in the accumulator It works like the fold or reduce method infunctional languages such as Lisp and Scala
Syntax REDUCE( accumulator = initial identifier in collection | expression )
Arguments
bull accumulator An identifier that will hold the result and the partial results as the collection is iteratedbull initial An expression that runs once to give a starting value to the accumulatorbull collection An expression that returns a collectionbull identifier The closure will have an identifier introduced in itrsquos context Here you decide which
identifier to usebull expression This expression will run once per value in the collection and produces the result value
Query
MATCH p=(a)--gt(b)--gt(c)
WHERE aname=Alice AND bname=Bob AND cname=Daniel
RETURN reduce(totalAge = 0 n IN nodes(p)| totalAge + nage) AS reduction
The age property of all nodes in the path are summed and returned as a single value
Functions
228
Resultreduction
117
1 row
Functions
229
134 Mathematical functionsThese functions all operate on numerical expressions only and will return an error if used on any othervalues
See also the section called ldquoMathematical operatorsrdquo [124]
Figure 134 Graph
nam e = Danielage = 54eyes = brown
Spouse
array = [ one two three]nam e = Eskilage = 41eyes = blue
foo bar
nam e = Aliceage = 38eyes = brown
nam e = Charlieage = 53eyes = green
KNOWS
nam e = Bobage = 25eyes = blue
KNOWS
KNOWS KNOWS MARRIED
ABSABS returns the absolute value of a number
Syntax ABS( expression )
Arguments
bull expression A numeric expression
Query
MATCH (a)(e)
WHERE aname = Alice AND ename = Eskil
RETURN aage eage abs(aage - eage)
The absolute value of the age difference is returned
Resultaage eage abs(aage - eage)
38 41 3 0
1 row
ACOSACOS returns the arccosine of the expression in radians
Syntax ACOS( expression )
Arguments
bull expression A numeric expression
Functions
230
Query
RETURN acos(05)
The arccosine of 05
Resultacos(05)
1 0471975511965979
1 row
ASINASIN returns the arcsine of the expression in radians
Syntax ASIN( expression )
Arguments
bull expression A numeric expression
Query
RETURN asin(05)
The arcsine of 05
Resultasin(05)
0 5235987755982989
1 row
ATANATAN returns the arctangent of the expression in radians
Syntax ATAN( expression )
Arguments
bull expression A numeric expression
Query
RETURN atan(05)
The arctangent of 05
Resultatan(05)
0 4636476090008061
1 row
ATAN2ATAN2 returns the arctangent2 of a set of coordinates in radians
Syntax ATAN2( expression expression)
Arguments
bull expression A numeric expression for y
Functions
231
bull expression A numeric expression for x
Query
RETURN atan2(05 06)
The arctangent2 of 05 06
Resultatan2(05 06)
0 6947382761967033
1 row
CEILCEIL returns the smallest integer greater than or equal to the number
Syntax CEIL( expression )
Arguments
bull expression A numeric expression
Query
RETURN ceil(01)
The ceil of 01
Resultceil(01)
1 0
1 row
COSCOS returns the cosine of the expression
Syntax COS( expression )
Arguments
bull expression A numeric expression
Query
RETURN cos(05)
The cosine of 05 is returned
Resultcos(05)
0 8775825618903728
1 row
COTCOT returns the cotangent of the expression
Syntax COT( expression )
Arguments
Functions
232
bull expression A numeric expression
Query
RETURN cot(05)
The cotangent of 05 is returned
Resultcot(05)
1 830487721712452
1 row
DEGREESDEGREES converts radians to degrees
Syntax DEGREES( expression )Arguments
bull expression A numeric expression
Query
RETURN degrees(314159)
The number of degrees in something close to pi
Resultdegrees(314159)
179 99984796050427
1 row
EE returns the constant e
Syntax E()ArgumentsQuery
RETURN e()
The constant e is returned (the base of natural log)
Resulte()
2 718281828459045
1 row
EXPEXP returns the value e raised to the power of the expression
Syntax EXP( expression )Arguments
bull expression A numeric expression
Query
Functions
233
RETURN exp(2)
The exp of 2 is returned e2
Resultexp(2)
7 38905609893065
1 row
FLOORFLOOR returns the greatest integer less than or equal to the expression
Syntax FLOOR( expression )
Arguments
bull expression A numeric expression
Query
RETURN floor(09)
The floor of 09 is returned
Resultfloor(09)
0 0
1 row
HAVERSINHAVERSIN returns half the versine of the expression
Syntax HAVERSIN( expression )
Arguments
bull expression A numeric expression
Query
RETURN haversin(05)
The haversine of 05 is returned
Resulthaversin(05)
0 06120871905481362
1 row
Spherical distance using the haversin functionThe haversin function may be used to compute the distance on the surface of a sphere between twopoints (each given by their latitude and longitude) In this example the spherical distance (in km)between Berlin in Germany (at lat 525 lon 134) and San Mateo in California (at lat 375 lon -1223) iscalculated using an average earth radius of 6371 km
Query
CREATE (berCity lat 525 lon 134 )(smCity lat 375 lon -1223 )
RETURN 2 6371 asin(sqrt(haversin(radians(smlat - berlat))+ cos(radians(smlat))
Functions
234
cos(radians(berlat)) haversin(radians(smlon - berlon)))) AS dist
The distance between Berlin and San Mateo is returned (about 9129 km)
Resultdist
9129 969740051658
1 rowNodes created 2Properties set 4Labels added 2
LOGLOG returns the natural logarithm of the expression
Syntax LOG( expression )
Arguments
bull expression A numeric expression
Query
RETURN log(27)
The log of 27 is returned
Resultlog(27)
3 295836866004329
1 row
LOG10LOG10 returns the base 10 logarithm of the expression
Syntax LOG10( expression )
Arguments
bull expression A numeric expression
Query
RETURN log10(27)
The log10 of 27 is returned
Resultlog10(27)
1 4313637641589874
1 row
PIPI returns the mathematical constant pi
Syntax PI()
Arguments
Query
Functions
235
RETURN pi()
The constant pi is returned
Resultpi()
3 141592653589793
1 row
RADIANSRADIANS converts degrees to radians
Syntax RADIANS( expression )
Arguments
bull expression A numeric expression
Query
RETURN radians(180)
The number of radians in 180 is returned (pi)
Resultradians(180)
3 141592653589793
1 row
RANDRAND returns a random double between 0 and 10
Syntax RAND( expression )
Arguments
bull expression A numeric expression
Query
RETURN rand() AS x1
A random number is returned
Resultx1
0 10866206023708813
1 row
ROUNDROUND returns the numerical expression rounded to the nearest integer
Syntax ROUND( expression )
Arguments
bull expression A numerical expression
Query
Functions
236
RETURN round(3141592)
Resultround(3141592)
3 0
1 row
SIGNSIGN returns the signum of a number mdash zero if the expression is zero -1 for any negative number and 1for any positive number
Syntax SIGN( expression )
Arguments
bull expression A numerical expression
Query
RETURN sign(-17) sign(01)
Resultsign(-17) sign(01)
-1 0 1 0
1 row
SINSIN returns the sine of the expression
Syntax SIN( expression )
Arguments
bull expression A numeric expression
Query
RETURN sin(05)
The sine of 05 is returned
Resultsin(05)
0 479425538604203
1 row
SQRTSQRT returns the square root of a number
Syntax SQRT( expression )
Arguments
bull expression A numerical expression
Query
RETURN sqrt(256)
Functions
237
Resultsqrt(256)
16 0
1 row
TANTAN returns the tangent of the expression
Syntax TAN( expression )
Arguments
bull expression A numeric expression
Query
RETURN tan(05)
The tangent of 05 is returned
Resulttan(05)
0 5463024898437905
1 row
Functions
238
135 String functionsThese functions all operate on string expressions only and will return an error if used on any othervalues The exception to this rule is TOSTRING() which also accepts numbers
See also the section called ldquoString operatorsrdquo [124]
Figure 135 Graph
nam e = Danielage = 54eyes = brown
Spouse
array = [ one two three]nam e = Eskilage = 41eyes = blue
foo bar
nam e = Aliceage = 38eyes = brown
nam e = Charlieage = 53eyes = green
KNOWS
nam e = Bobage = 25eyes = blue
KNOWS
KNOWS KNOWS MARRIED
STRSTR returns a string representation of the expression If the expression returns a string the result willbewrapped in quotation marks
Syntax STR( expression )
Arguments
bull expression An expression that returns anything
Query
RETURN str(1) str(hello)
Resultstr(1) str(hello)
1 hello
1 row
NoteThe STR() function is deprecated from Neo4j version 23 and onwards This means it may beremoved in a future Neo4j major release
REPLACEREPLACE returns a string with the search string replaced by the replace string It replaces all occurrences
Syntax REPLACE( original search replace )
Arguments
Functions
239
bull original An expression that returns a stringbull search An expression that returns a string to search forbull replace An expression that returns the string to replace the search string with
Query
RETURN replace(hello l w)
Resultreplace(hello l w)
hewwo
1 row
SUBSTRINGSUBSTRING returns a substring of the original with a 0-based index start and length If length is omittedit returns a substring from start until the end of the string
Syntax SUBSTRING( original start [ length] )
Arguments
bull original An expression that returns a stringbull start An expression that returns a positive numberbull length An expression that returns a positive number
Query
RETURN substring(hello 1 3) substring(hello 2)
Resultsubstring(hello 1 3) substring(hello 2)
ell llo
1 row
LEFTLEFT returns a string containing the left n characters of the original string
Syntax LEFT( original length )
Arguments
bull original An expression that returns a stringbull n An expression that returns a positive number
Query
RETURN left(hello 3)
Resultleft(hello 3)
hel
1 row
RIGHTRIGHT returns a string containing the right n characters of the original string
Syntax RIGHT( original length )
Functions
240
Arguments
bull original An expression that returns a stringbull n An expression that returns a positive number
Query
RETURN right(hello 3)
Resultright(hello 3)
llo
1 row
LTRIMLTRIM returns the original string with whitespace removed from the left side
Syntax LTRIM( original )
Arguments
bull original An expression that returns a string
Query
RETURN ltrim( hello)
Resultltrim( hello)
hello
1 row
RTRIMRTRIM returns the original string with whitespace removed from the right side
Syntax RTRIM( original )
Arguments
bull original An expression that returns a string
Query
RETURN rtrim(hello )
Resultrtrim(hello )
hello
1 row
TRIMTRIM returns the original string with whitespace removed from both sides
Syntax TRIM( original )
Arguments
bull original An expression that returns a string
Functions
241
Query
RETURN trim( hello )
Resulttrim( hello )
hello
1 row
LOWERLOWER returns the original string in lowercase
Syntax LOWER( original )
Arguments
bull original An expression that returns a string
Query
RETURN lower(HELLO)
Resultlower(HELLO)
hello
1 row
UPPERUPPER returns the original string in uppercase
Syntax UPPER( original )
Arguments
bull original An expression that returns a string
Query
RETURN upper(hello)
Resultupper(hello)
HELLO
1 row
SPLITSPLIT returns the sequence of strings witch are delimited by split patterns
Syntax SPLIT( original splitPattern )
Arguments
bull original An expression that returns a stringbull splitPattern The string to split the original string with
Query
RETURN split(onetwo )
Functions
242
Resultsplit(onetwo )
[one two]
1 row
REVERSEREVERSE returns the original string reversed
Syntax REVERSE( original )
Arguments
bull original An expression that returns a string
Query
RETURN reverse(anagram)
Resultreverse(anagram)
margana
1 row
TOSTRINGTOSTRING converts the argument to a string It converts integral and floating point numbers to stringsand if called with a string will leave it unchanged
Syntax TOSTRING( expression )
Arguments
bull expression An expression that returns a number or a string
Query
RETURN toString(115) toString(already a string)
ResulttoString(115) toString(already a string)
11 5 already a string
1 row
243
Chapter 14 Schema
Neo4j 20 introduced an optional schema for the graph based around the concept of labels Labels areused in the specification of indexes and for defining constraints on the graph Together indexes andconstraints are the schema of the graph Cypher includes data definition language (DDL) statements formanipulating the schema