Optimizing Cypher Queries in Neo4j

Wes Freeman (@wefreema)

Mark Needham (@markhneedham)

Today's schedule

• Brief overview of Cypher syntax

• Graph-global vs. graph-local queries

• Labels and indexes

• Optimization patterns

• Profiling cypher queries

• Applying optimization patterns

Cypher Syntax

• Statement parts
  o Optional: querying part (MATCH|WHERE)
  o Optional: updating part (CREATE|MERGE)
  o Optional: returning part (WITH|RETURN)

• Parts can be chained together

Cypher Syntax - Refresher

MATCH (n:Label)-[r:LINKED]->(m)
WHERE n.prop = "..."
RETURN n, r, m

Starting points

• Graph scan (global; potentially slow)

• Label scan (usually reserved for aggregation queries; not ideal)

• Label property index lookup (local; good!)
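The three starting points differ in how many nodes must be examined before the pattern can begin. The toy Python model below is a sketch, not Neo4j internals. The node list, the `by_label` map (standing in for label membership) and the `name_index` dict (standing in for a schema index) are all assumptions made for illustration. Their sizes echo the 30k-node football dataset.

```python
from collections import defaultdict

# Hypothetical graph: 900 :Player nodes plus other nodes, 30,000 in total.
nodes = [{"labels": {"Player"}, "name": f"player{i}"} for i in range(900)]
nodes += [{"labels": {"Stats"}, "goals": i % 3} for i in range(29_100)]

# Stand-in for label membership: label -> ids of nodes with that label.
by_label = defaultdict(list)
for node_id, node in enumerate(nodes):
    for label in node["labels"]:
        by_label[label].append(node_id)

# Stand-in for an index on :Player(name): name -> ids of matching nodes.
name_index = defaultdict(list)
for node_id in by_label["Player"]:
    name_index[nodes[node_id]["name"]].append(node_id)


def graph_scan(name):
    """Global scan: every node in the graph is a candidate start."""
    hits = [i for i, node in enumerate(nodes) if node.get("name") == name]
    return hits, len(nodes)


def label_scan(label, name):
    """Label scan: only nodes carrying the label are candidates."""
    candidates = by_label[label]
    hits = [i for i in candidates if nodes[i].get("name") == name]
    return hits, len(candidates)


def index_lookup(name):
    """Index lookup: go straight to the matching nodes."""
    hits = list(name_index.get(name, []))
    return hits, len(hits)


print(graph_scan("player42"))            # ([42], 30000)
print(label_scan("Player", "player42"))  # ([42], 900)
print(index_lookup("player42"))          # ([42], 1)
```

The second element of each tuple is the number of candidate nodes the strategy had to look at. The real index lookup has its own internal cost, which this sketch ignores.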

Introducing the football dataset

The 1.9 global scan: O(n), where n = # of nodes

START pl = node(*) MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats

150ms w/ 30k nodes, 120k rels

The 2.0 global scan

MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats

130ms w/ 30k nodes, 120k rels

O(n), where n = # of nodes

Why is it a global scan?

• Cypher is a pattern matching language

• It doesn't discriminate unless you tell it to: as specified, it must try every node as a starting point to find this pattern

Introduce a label

Label your starting points

CREATE (player:Player {name: "Wayne Rooney"} )

Label scan: O(k), where k = # of nodes with that label

MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats

80ms w/ 30k nodes, 120k rels (~900 :Player nodes)

Indexes don't come for free

CREATE INDEX ON :Player(name)

CREATE CONSTRAINT ON (pl:Player) ASSERT pl.name IS UNIQUE

Index lookup: O(log k), where k = # of nodes with that label

MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats

6ms w/ 30k nodes, 120k rels (~900 :Player nodes)
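The O(log k) figure is what an ordered lookup structure gives you. As an illustration only, this sketch keeps ~900 hypothetical player names in a sorted Python list and counts the probes a binary search makes. It is a stand-in for the idea, not a description of how Neo4j's schema indexes are implemented.

```python
import math


def binary_search(sorted_keys, target):
    """Return (position or -1, number of keys probed)."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


# ~900 hypothetical :Player names, kept sorted as an ordered index would keep them.
names = sorted(f"player{i:03d}" for i in range(900))

# Worst case for binary search over k keys is floor(log2 k) + 1 probes.
MAX_PROBES = math.floor(math.log2(len(names))) + 1   # 10 for k = 900

pos, probes = binary_search(names, "player421")
print(names[pos], probes <= MAX_PROBES)   # player421 True
```

Ten probes for 900 keys, against up to 900 comparisons for a label scan, is the gap between O(log k) and O(k).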

Optimization Patterns

• Avoid cartesian products

• Avoid patterns in the WHERE clause

• Start MATCH patterns at the lowest cardinality and expand outward

• Separate MATCH patterns with minimal expansion at each stage

Introducing the movie data set

Anti-pattern: Cartesian Products

MATCH (m:Movie), (p:Person)

Subtle Cartesian Products

MATCH (p:Person)-[:KNOWS]->(c)
WHERE p.name = "Tom Hanks"
WITH c
MATCH (k:Keyword)
RETURN c, k

Counting Cartesian Products

MATCH (pl:Player), (t:Team), (g:Game)
RETURN COUNT(DISTINCT pl), COUNT(DISTINCT t), COUNT(DISTINCT g)

80000 ms w/ ~900 players, ~40 teams, ~1200 games

Better Counting

MATCH (pl:Player)
WITH COUNT(pl) as players
MATCH (t:Team)
WITH COUNT(t) as teams, players
MATCH (g:Game)
RETURN COUNT(g) as games, teams, players

8ms w/ ~900 players, ~40 teams, ~1200 games
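A back-of-the-envelope calculation with the approximate counts from these slides shows why the first counting query is so slow. The sketch counts rows produced and ignores how the engine pipelines them.

```python
players, teams, games = 900, 40, 1200

# MATCH (pl:Player),(t:Team),(g:Game) yields one row per combination.
cartesian_rows = players * teams * games

# Counting each label in its own MATCH ... WITH stage collapses each stage
# to a single row, so the row counts add instead of multiplying.
staged_rows = players + teams + games

print(cartesian_rows)  # 43200000
print(staged_rows)     # 2140
```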

Directions on patterns

MATCH (p:Person)-[:ACTED_IN]-(m)
WHERE p.name = "Tom Hanks"
RETURN m
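Adding an arrow tells Cypher which side of the relationship to follow. The toy adjacency model below is plain Python with an invented relationship list. Neo4j's real storage differs, so the actual saving depends on the version. In this model, an undirected hop has to inspect both the outgoing and the incoming relationships of a node, while a directed hop inspects only one list.

```python
from collections import defaultdict

# Hypothetical relationships as (start, type, end) triples.
rels = [
    ("Tom Hanks", "ACTED_IN", "Cast Away"),
    ("Tom Hanks", "ACTED_IN", "Big"),
    ("Tom Hanks", "DIRECTED", "That Thing You Do!"),
    ("Person A", "KNOWS", "Tom Hanks"),
    ("Person B", "KNOWS", "Tom Hanks"),
]

outgoing = defaultdict(list)
incoming = defaultdict(list)
for start, rel_type, end in rels:
    outgoing[start].append((rel_type, end))
    incoming[end].append((rel_type, start))


def expand(node, rel_type, direction=None):
    """Return (neighbours, relationships inspected) for one hop."""
    if direction == "out":
        candidates = outgoing[node]
    elif direction == "in":
        candidates = incoming[node]
    else:  # no arrow in the pattern: both directions must be checked
        candidates = outgoing[node] + incoming[node]
    found = [other for t, other in candidates if t == rel_type]
    return found, len(candidates)


print(expand("Tom Hanks", "ACTED_IN"))         # (['Cast Away', 'Big'], 5)
print(expand("Tom Hanks", "ACTED_IN", "out"))  # (['Cast Away', 'Big'], 3)
```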

Parameterize your queries

MATCH (p:Person)-[:ACTED_IN]-(m)
WHERE p.name = {name}
RETURN m
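With a parameter, the query text stays identical across calls, so a cache keyed on that text can reuse the compiled plan. The `PlanCache` class below is a hypothetical stand-in for whatever caching the server does. It illustrates why building the query string by concatenation defeats such a cache, and it also shows how a quote inside `name` would break the concatenated query.

```python
class PlanCache:
    """Hypothetical cache: compiled 'plans' keyed by exact query text."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def run(self, query, params=None):
        if query not in self.plans:
            self.compilations += 1          # stands in for parse + plan work
            self.plans[query] = f"plan#{self.compilations}"
        return self.plans[query], params or {}


names = ["Tom Hanks", "Meg Ryan", "Kevin Bacon"]

# String concatenation: every name produces a different query text.
literal_cache = PlanCache()
for name in names:
    literal_cache.run('MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = "'
                      + name + '" RETURN m')

# Parameterized: one query text, values passed separately.
param_cache = PlanCache()
for name in names:
    param_cache.run("MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = {name} RETURN m",
                    {"name": name})

print(literal_cache.compilations)  # 3
print(param_cache.compilations)    # 1
```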

Fast predicates first

Bad:
MATCH (t:Team)-[:played_in]->(g)
WHERE NOT (t)-[:home_team]->(g) AND g.away_goals > g.home_goals
RETURN t, COUNT(g)

Better:
MATCH (t:Team)-[:played_in]->(g)
WHERE g.away_goals > g.home_goals AND NOT (t)-[:home_team]->(g)
RETURN t, COUNT(g)
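Ordering helps only if predicates are evaluated left to right with short-circuiting, as the slide implies. A smarter planner may reorder them anyway. The Python sketch below makes that assumption. It uses invented game data and counts how often the cheap property test and the costly pattern test (represented by a plain comparison) actually run.

```python
def count_away_wins(team, games, cheap_first):
    """Count games where `team` was not at home and the away side won."""
    calls = {"property": 0, "pattern": 0}

    def away_win(game):                 # g.away_goals > g.home_goals
        calls["property"] += 1
        return game["away_goals"] > game["home_goals"]

    def not_home_team(game):            # stand-in for NOT (t)-[:home_team]->(g)
        calls["pattern"] += 1
        return game["home_team"] != team

    matched = 0
    for game in games:
        if cheap_first:
            ok = away_win(game) and not_home_team(game)
        else:
            ok = not_home_team(game) and away_win(game)
        if ok:
            matched += 1
    return matched, calls


# 12 hypothetical games; the away side wins in 4 of them.
games = [
    {"home_team": "A" if i % 2 == 0 else "B", "home_goals": i % 3, "away_goals": 1}
    for i in range(12)
]

print(count_away_wins("A", games, cheap_first=False))  # (2, {'property': 6, 'pattern': 12})
print(count_away_wins("A", games, cheap_first=True))   # (2, {'property': 12, 'pattern': 4})
```

Both orders return the same answer. Putting the cheap test first means the expensive test runs only on the games that survive it.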


Patterns in WHERE clauses

• Keep them in the MATCH

• The only patterns that need to be in a WHERE clause are negated ones (NOT)

MERGE and CONSTRAINTs

• MERGE is MATCH or CREATE

• MERGE can take advantage of unique constraints and indexes

MERGE (without index)

MERGE (g:Game {date: 1290257100,
               time: 1245,
               home_goals: 2,
               away_goals: 3,
               match_id: 292846,
               attendance: 60102})
RETURN g

188 ms w/ ~400 games

Adding an index

CREATE INDEX ON :Game(match_id)

MERGE (with index)

MERGE (g:Game {date: 1290257100,
               time: 1245,
               home_goals: 2,
               away_goals: 3,
               match_id: 292846,
               attendance: 60102})
RETURN g

6 ms w/ ~400 games

Alternative MERGE approach

MERGE (g:Game { match_id: 292846 })
ON CREATE SET g.date = 1290257100,
              g.time = 1245,
              g.home_goals = 2,
              g.away_goals = 3,
              g.attendance = 60102
RETURN g
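MERGE behaves like "MATCH, or CREATE if nothing matched", and the MATCH half is where an index pays off. The toy store below is keyed on `match_id`, like the alternative MERGE approach on this slide. The `GameStore` class, its dict "index" and its comparison counter are hypothetical. The sketch loads 400 games, then merges one that already exists.

```python
class GameStore:
    """Toy :Game store showing MERGE as 'MATCH or CREATE' on match_id."""

    def __init__(self, use_index):
        self.nodes = []
        self.index = {} if use_index else None  # match_id -> node (hypothetical index)
        self.comparisons = 0

    def _match(self, match_id):
        if self.index is not None:
            self.comparisons += 1
            return self.index.get(match_id)
        for node in self.nodes:               # no index: test every :Game node
            self.comparisons += 1
            if node["match_id"] == match_id:
                return node
        return None

    def merge(self, match_id, on_create):
        """MERGE (g:Game {match_id: ...}) ON CREATE SET <on_create>"""
        node = self._match(match_id)
        if node is None:                      # CREATE branch: ON CREATE SET applies
            node = {"match_id": match_id, **on_create}
            self.nodes.append(node)
            if self.index is not None:
                self.index[match_id] = node
        return node


results = {}
for name, store in (("scan", GameStore(use_index=False)),
                    ("index", GameStore(use_index=True))):
    for match_id in range(292447, 292847):    # 400 existing games
        store.merge(match_id, {"attendance": 0})
    store.comparisons = 0
    game = store.merge(292846, {"attendance": 60102})   # exists: MATCH branch
    results[name] = (store.comparisons, game["attendance"], len(store.nodes))

print(results)  # {'scan': (400, 0, 400), 'index': (1, 0, 400)}
```

Because the game already exists, the ON CREATE values are not applied (attendance stays 0) and no duplicate node is created.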

Profiling queries

• Use the PROFILE keyword in front of the query
  o from webadmin or the shell; it won't work in the browser

• Look for db_hits and rows

• Ignore everything else (for now!)
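When a profile gets long, it can help to pull out just the rows and db_hits per operator. The sketch below assumes the 1.9/2.0-era text format shown on the next slides, with one operator per line ending in `_rows=N, _db_hits=M)`. Other versions print plans differently, so the regular expression is an assumption.

```python
import re

# One operator per line, ending in "..., _rows=N, _db_hits=M)".
OPERATOR = re.compile(r"==> (\w+)\(.*_rows=(\d+), _db_hits=(\d+)\)\s*$")


def summarize(profile_text):
    """Return [(operator, rows, db_hits), ...] and the total db_hits."""
    steps = []
    for line in profile_text.splitlines():
        match = OPERATOR.search(line)
        if match:
            op, rows, hits = match.groups()
            steps.append((op, int(rows), int(hits)))
    return steps, sum(hits for _, _, hits in steps)


# Lines from the football query profile, with long arguments shortened.
sample = """\
==> EagerAggregation(keys=[...], _rows=503, _db_hits=10899)
==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=5192, _db_hits=15542)
==> TraversalMatcher(trail="(season)-[...]->(game)", _rows=15542, _db_hits=1620462)
"""

steps, total = summarize(sample)
hottest = max(steps, key=lambda step: step[2])
print(hottest)  # ('TraversalMatcher', 15542, 1620462)
print(total)    # 1646903
```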

Reviewing the football dataset

Football Optimization

MATCH (game)<-[:contains_match]-(season:Season),
      (team)<-[:away_team]-(game),
      (stats)-[:in]->(game),
      (team)<-[:for]-(stats)<-[:played]-(player)
WHERE season.name = "2012-2013"
RETURN player.name,
       COLLECT(DISTINCT team.name),
       SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10

3137 ms w/ ~900 players, ~20 teams, ~400 games

Football Optimization

==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-e746407c504a", " INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0)

==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0)

==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-e746407c504a,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899)

==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED108", "season", " UNNAMED55", "player", "team", " UNNAMED124", " UNNAMED85", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192)

==> PatternMatch(g="(player)-[' UNNAMED124']-(stats)", _rows=5192, _db_hits=0)

==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=5192, _db_hits=15542)

==> TraversalMatcher(trail="(season)-[ UNNAMED12:contains_match WHERE true AND true]->(game)<-[ UNNAMED85:in WHERE true AND true]-(stats)-[ UNNAMED108:for WHERE true AND true]->(team)<-[ UNNAMED55:away_team WHERE true AND true]-(game)", _rows=15542, _db_hits=1620462)

Break out the match statements

MATCH (game)<-[:contains_match]-(season:Season)

MATCH (team)<-[:away_team]-(game)

MATCH (stats)-[:in]->(game)

MATCH (team)<-[:for]-(stats)<-[:played]-(player)

RETURN player.name,
...
ORDER BY goals DESC
LIMIT 10

200 ms w/ ~900 players, ~20 teams, ~400 games

Start small

• Smallest cardinality label first

• Smallest intermediate result set first

Exploring cardinalities

MATCH (game)<-[:contains_match]-(season:Season)

RETURN COUNT(DISTINCT game), COUNT(DISTINCT season)

1140 games, 3 seasons

MATCH (team)<-[:away_team]-(game:Game)

RETURN COUNT(DISTINCT team), COUNT(DISTINCT game)

25 teams, 1140 games

Exploring cardinalities

MATCH (stats)-[:in]->(game:Game)

RETURN COUNT(DISTINCT stats), COUNT(DISTINCT game)

31117 stats, 1140 games

MATCH (stats)<-[:played]-(player:Player)

RETURN COUNT(DISTINCT stats), COUNT(DISTINCT player)

31117 stats, 880 players
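These counts can be turned into a rough estimate of intermediate result sizes. The sketch assumes games are spread evenly across seasons and stats evenly across games; that even-spread assumption is ours, not the slides'. For comparison, the profile on the "Look for teams first" slide reports 380 rows after the season filter and 10394 rows after expanding to stats.

```python
# Counts from the "Exploring cardinalities" queries.
games_total = 1140
seasons = 3
stats_total = 31117

# Assumption: games spread evenly across seasons, stats evenly across games.
games_per_season = games_total / seasons        # 380.0
stats_per_game = stats_total / games_total      # ~27.3

# Filter to one season first, then expand to stats.
rows_season_first = games_per_season * stats_per_game
# Start from stats instead: every stats node is a candidate.
rows_stats_first = stats_total

print(round(games_per_season), round(rows_season_first), rows_stats_first)  # 380 10372 31117
```

The estimate lands within a few dozen rows of the profiled 10394, which is usually good enough to decide which pattern to start from.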

Look for teams first

MATCH (team)<-[:away_team]-(game:Game)
MATCH (game)<-[:contains_match]-(season)
...
RETURN player.name,
...
ORDER BY goals DESC

==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-b0c7ec85c969", " INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0)

==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0)

==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-b0c7ec85c969,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899)

==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED168", "season", " UNNAMED125", "player", "team", " UNNAMED152", " UNNAMED51", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192)

==> PatternMatch(g="(stats)-[' UNNAMED152']-(team),(player)-[' UNNAMED168']-(stats)", _rows=5192, _db_hits=0)

==> PatternMatch(g="(stats)-[' UNNAMED125']-(game)", _rows=10394, _db_hits=0)

==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=380, _db_hits=380)

==> PatternMatch(g="(season)-[' UNNAMED51']-(game)", _rows=380, _db_hits=1140)

==> TraversalMatcher(trail="(game)-[ UNNAMED12:away_team WHERE true AND true]->(team)", _rows=1140, _db_hits=1140)


Filter games a bit earlier

MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
...
RETURN player.name,
...
ORDER BY goals DESC

Filter out stats with no goals

MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
WHERE stats.goals > 0
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10

59 ms w/ ~900 players, ~20 teams, ~400 games
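`WHERE stats.goals > 0` pays off because it sits before the `(stats)<-[:played]-(player)` expansion, so fewer rows have to be traversed. The toy pipeline below uses invented stats rows and generic player names to count traversals with the filter applied early and with no filter at all. Filtering early drops players who scored nothing. That is harmless here because the query only keeps the top 10 scorers.

```python
from collections import Counter

# Hypothetical (stats_id, player, goals) rows reached from this season's games.
stats_rows = [
    (1, "player_a", 1), (2, "player_a", 0), (3, "player_b", 2),
    (4, "player_b", 0), (5, "player_c", 0), (6, "player_a", 0),
    (7, "player_c", 0), (8, "player_b", 1),
]


def expand_to_player(rows):
    """Stand-in for MATCH (stats)<-[:played]-(player): one traversal per row."""
    return [(player, goals) for _, player, goals in rows], len(rows)


def goals_by_player(rows, filter_early):
    if filter_early:
        rows = [row for row in rows if row[2] > 0]   # WHERE stats.goals > 0
    expanded, traversals = expand_to_player(rows)
    totals = Counter()
    for player, goals in expanded:
        totals[player] += goals
    # Players on zero goals only appear in the unfiltered run; drop them so
    # the two answers can be compared directly.
    return {p: g for p, g in totals.items() if g > 0}, traversals


print(goals_by_player(stats_rows, filter_early=False))  # ({'player_a': 1, 'player_b': 3}, 8)
print(goals_by_player(stats_rows, filter_early=True))   # ({'player_a': 1, 'player_b': 3}, 3)
```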

Movie query optimization

MATCH (movie:Movie {title: {title} })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
     count(DISTINCT actormovies) as actormoviesweight, movie,
     collect(DISTINCT genre.name) as genres,
     collect(DISTINCT director.name) as directors, actor,
     collect(DISTINCT writer.name) as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
     collect(DISTINCT {related: {title: related.title}, weight: weight}) as related,
     genres, directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related,
     actors, genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers


Movie query optimization

MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor
ORDER BY actormoviesweight DESC
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre.name) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords,
     movie, genres, directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
     movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers // 1 row

10x faster
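The speed-up comes from row counts. The original query chains MATCH clauses with nothing collapsed in between, so rows multiply; the rewrite collapses to one row with `WITH ... collect()` after each step. The fan-out numbers below are invented, so the ratio they produce says nothing about the measured 10x. Only the shape of the arithmetic is the point.

```python
from math import prod

# Hypothetical fan-out of each pattern for one movie.
fanout = {
    "genres": 3,
    "directors": 2,
    "actors": 8,
    "writers": 2,
    "movies_per_actor": 10,   # (actor)-[:ACTED_IN]->(actormovies)
    "keyword_paths": 40,      # (movie)-[:HAS_KEYWORD]->()<-[:HAS_KEYWORD]-(movies)
}

# Original query: every combination becomes its own row before the first WITH.
chained_rows = prod(fanout.values())

# Rewritten query: each stage starts from one row, so the work adds up
# instead of multiplying (length() still walks each actor's movies).
staged_rows = (
    fanout["actors"] * fanout["movies_per_actor"]
    + fanout["genres"]
    + fanout["directors"]
    + fanout["writers"]
    + fanout["keyword_paths"]
)

print(chained_rows, staged_rows)  # 38400 127
```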


Design for Queryability

Making the implicit explicit

• When you have implicit relationships in the graph you can sometimes get better query performance by modeling the relationship explicitly


Refactor property to node

Bad:
MATCH (g:Game)
WHERE g.date > 1343779200 AND g.date < 1369094400
RETURN g

Good:
MATCH (s:Season)-[:contains_match]->(g)
WHERE s.name = "2012-2013"
RETURN g
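The two queries answer the same question. The timestamps in the Bad query correspond to 1 August 2012 and 21 May 2013 (UTC). The toy model below builds three seasons of 380 hypothetical games each. It assumes every season starts on 1 August and that games are spaced 12 hours apart. It then compares testing every game's date property with following the season node's relationships.

```python
SEASON_STARTS = {              # 00:00 UTC on 1 August of each season's first year
    "2011-2012": 1312156800,
    "2012-2013": 1343779200,
    "2013-2014": 1375315200,
}
HALF_DAY = 12 * 60 * 60

games = []         # every :Game node, each with a date property
season_games = {}  # :Season node -> games reached via [:contains_match]
for season, start in SEASON_STARTS.items():
    season_games[season] = []
    for i in range(380):
        game = {"match_id": len(games), "date": start + 3600 + i * HALF_DAY}
        games.append(game)
        season_games[season].append(game)


def games_by_date(lo, hi):
    """Bad: test the date property of every game."""
    found = [g for g in games if lo < g["date"] < hi]
    return found, len(games)


def games_by_season(name):
    """Good: follow the season's relationships; only its games are touched."""
    found = list(season_games[name])
    return found, len(found)


bad, bad_examined = games_by_date(1343779200, 1369094400)
good, good_examined = games_by_season("2012-2013")
print(len(bad), bad_examined, good_examined, bad == good)  # 380 1140 380 True
```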


Conclusion

• Avoid the global scan

• Add indexes / unique constraints

• Split up MATCH statements

• Measure, measure, measure, tweak, repeat

• Soon Cypher will do a lot of this for you!

Bonus tip

• Use transactions/transactional cypher endpoint

• If you have them send them in
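The transactional endpoint in Neo4j 2.x accepts several statements in one HTTP request, each with its own parameters. The sketch below only builds the JSON body, using the `statements` / `statement` / `parameters` shape documented for that endpoint. No request is sent, the example statements are illustrative, and the exact URL and behaviour should be checked against the docs for your server version.

```python
import json


def build_commit_body(statements):
    """Bundle (cypher, parameters) pairs into one transactional request body."""
    return json.dumps({
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    })


batch = [
    ("MERGE (g:Game {match_id: {id}}) ON CREATE SET g.attendance = {attendance} RETURN g",
     {"id": 292846, "attendance": 60102}),
    ("MATCH (pl:Player) WHERE pl.name = {name} RETURN pl",
     {"name": "Wayne Rooney"}),
]

body = build_commit_body(batch)
# POST `body` (Content-Type: application/json) to a URL such as
# http://localhost:7474/db/data/transaction/commit to run both in one transaction.
print(json.loads(body)["statements"][1]["parameters"])  # {'name': 'Wayne Rooney'}
```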
