neo4j: graph-like power

Upload: roman-rodomanskyy

Post on 26-Jan-2015

Category:

Technology


DESCRIPTION

Graph Databases in NoSQL world. Neo4j and Cypher.

TRANSCRIPT

Page 1: Neo4j: Graph-like power
Page 2: Neo4j: Graph-like power

Graph-like power

Roman R.

MATCH (a:Actor), (m:Movie)
WHERE a.name = 'Keanu Reeves' AND m.title = 'The Matrix'
CREATE (a)-[:ACTS_IN]->(m)

Page 3: Neo4j: Graph-like power

Today

○ Graphs in NoSQL world
  ○ classification
  ○ definition
  ○ components
○ Neo4j
  ○ nodes, rels, props, indexes
  ○ Cypher
○ PHP and Neo4j
○ Demo
○ Alternatives
○ Q/A

Page 4: Neo4j: Graph-like power

NoSQL Databases

Categories shown: Key-Value, Document, Graph, Column (BigTable)

Products shown: MemcacheDB, Redis, Riak, Cassandra, CouchDB, Neo4j, TITAN, HBase/Hadoop, OrientDB, Elasticsearch, RavenDB, Tokyo Cabinet, InfiniteGraph, AllegroGraph, MongoDB

Page 5: Neo4j: Graph-like power

What is a Graph in math

● represents a connected set of objects
● graph:
  ○ vertex (node/point)
  ○ edge (arc/line/relationship/arrow), undirected or directed
  ○ attribute (property), on a node or relationship
● types:
  ○ undirected: G = (V, E)
  ○ digraph: D = (V, A)
  ○ mixed: G = (V, E, A)

Example:

V = {1, 2, 3, 4, 5, 6}

E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
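
The vertex and edge sets from this slide can be turned into an adjacency structure, which is roughly how a program would hold an undirected graph in memory. This is a minimal sketch; the function name `build_adjacency` is an assumption made for illustration and is not part of the talk.

```python
# Vertex and edge sets copied from the slide.
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def build_adjacency(vertices, edges):
    """Return {vertex: set of neighbours} for an undirected graph."""
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        # An undirected edge is stored in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(V, E)
# The degree of a vertex is the number of edges touching it.
degrees = {v: len(neighbours) for v, neighbours in adjacency.items()}
```

Every edge contributes to two degrees, so the degrees sum to twice the number of edges.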

Page 6: Neo4j: Graph-like power

What is a Graph database

4

● stores data in a graph and is built for retrieving vast networks of data
● shines when storing richly-connected data

● consists of nodes, connected by relationships○ A Graph —records data in→ Nodes —which have→ Properties○ Nodes —are organized by→ Rels —which also have→ Properties○ Nodes —are grouped by→ Labels —into→ Sets○ A Traversal —navigates→ a Graph

it —identifies→ Paths —which order→ Nodes○ An Index —maps from→ Properties —to either→ Nodes or Rels○ A Graph Database —manages a→ Graph and

—also manages related→ Indexes

Page 7: Neo4j: Graph-like power

Nodes, Rels, Props, Labels

5

A Graph—records data in→ Nodes—which have→ Properties

Nodes—are organized by→ Relationships—which also have→ Properties

Nodes—are grouped by→ Labels—into→ Sets

Page 8: Neo4j: Graph-like power

Graph Traversal

6

A Traversal—navigates→ a Graph

it—identifies→ Paths—which order→ Nodes

what music do my friends like that I don’t yet own?

if this power supply goes down, what web services are affected?
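
The first question can be read as a two-hop traversal: follow friend relationships, then "likes" relationships, and drop whatever is already owned. The sketch below assumes a toy in-memory graph; the relationship names, people and albums are hypothetical and only serve the illustration.

```python
# Hypothetical toy data: names and albums are made up for illustration.
FRIEND = {"me": {"ann", "bob"}, "ann": {"me"}, "bob": {"me"}}
LIKES = {
    "ann": {"Kind of Blue", "OK Computer"},
    "bob": {"OK Computer", "Nevermind"},
}
OWNS = {"me": {"Nevermind"}}


def recommend_music(person):
    """Follow FRIEND, then LIKES, skipping albums the person already owns."""
    owned = OWNS.get(person, set())
    recommended = set()
    for friend in FRIEND.get(person, set()):
        for album in LIKES.get(friend, set()):
            if album not in owned:
                recommended.add(album)
    return recommended


suggestions = recommend_music("me")
```

Only the friends of the starting node and their liked albums are visited; the rest of the graph is never touched.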

Page 9: Neo4j: Graph-like power

Graph Index

7

An Index—maps from→ Properties—to either→ Nodes or Rels

find the Account for username master-of-graphs
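
Conceptually, such an index is a lookup table from a property value to the nodes that carry it. The sketch below models that idea with a plain dictionary; the node ids, the `label` key and the helper `build_index` are assumptions for illustration and do not reflect how Neo4j stores its indexes.

```python
from collections import defaultdict

# Hypothetical nodes: ids, labels and usernames are made up for illustration.
nodes = {
    1: {"label": "Account", "username": "master-of-graphs"},
    2: {"label": "Account", "username": "cypher-fan"},
    3: {"label": "Person", "name": "Roman"},
}


def build_index(nodes, label, key):
    """Map each value of `key` to the ids of nodes carrying `label`."""
    index = defaultdict(set)
    for node_id, props in nodes.items():
        if props.get("label") == label and key in props:
            index[props[key]].add(node_id)
    return index


account_by_username = build_index(nodes, "Account", "username")
# The lookup goes straight to the matching node ids, without scanning all nodes.
found = account_by_username.get("master-of-graphs", set())
```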

Page 10: Neo4j: Graph-like power

Graph

8

A Graph Database—manages a→ Graph and—also manages related→ Indexes

Page 11: Neo4j: Graph-like power

What a Graph database looks like

9

Page 12: Neo4j: Graph-like power

A Graph Database transforms a RDBMS

10

Page 13: Neo4j: Graph-like power

A Graph Database elaborates a Key-Value Store

11

K* = key, V* = value

Page 14: Neo4j: Graph-like power

A Graph Database relates Column-Family

12

● BigTable databases are an evolution of key-value, using "families" to allow grouping of rows

● stored in a graph, the families could become hierarchical, and the relationships among data become explicit

Page 15: Neo4j: Graph-like power

A Graph Database navigates a Document Store

13

D = Document, S = Subdocument, V = Value, D2/S2 = reference

Page 16: Neo4j: Graph-like power

NoSQL Data Models

14

90% of all use cases

Relational Databases

Page 17: Neo4j: Graph-like power

15

Page 18: Neo4j: Graph-like power

Neo4j features

● intuitive, using a graph model for data representation
● reliable, fully transactional, upholds ACID
● durable and fast, using a custom disk-based, native storage engine
● massively scalable, up to several billion nodes/relationships/properties
● highly available, when distributed across multiple machines
● expressive, with a powerful, human-readable declarative graph query language
● fast, with a powerful traversal framework for high-speed graph queries
● embeddable, with a few small jars
● simple, accessible through a convenient REST API or an object-oriented Java API
● indexes are based on Apache Lucene; supports secondary indexes
● in commercial development for 10 years and in production for over 7 years (since 2003)
● cross-platform, simple set-up, well documented, open source
● GPL for Community, AGPL for Enterprise

Page 19: Neo4j: Graph-like power

Neo4j requirements

● CPU - Intel Core i3/i7
● Memory - 2GB .. 16/32GB
● Disk - 10GB SATA .. SSD w/ SATA
● Filesystem - ext4 .. ext4/ZFS
● Software - Oracle Java 7

Page 20: Neo4j: Graph-like power

Neo4j license

● Neo4j Community
  ○ open-source, high performance
  ○ fully ACID transactional graph database
● Neo4j Enterprise
  ○ High-Performance Cache (up to 10x faster)
  ○ Horizontal scalability with Neo4j Clustering (predictable scalability)
  ○ High-availability and online backups
  ○ Cache based sharding (shard your graph in memory)
  ○ Advanced Monitoring (operational metrics)
  ○ Certified for Windows and Linux
  ○ Email/Phone Support (10x5, 24x7 hours)
  ○ Subscriptions
    ■ Personal (up to 3 devs, $100k annual revenue) = FREE
    ■ Startups (<$10M funding, <$5M annual revenue) = $12k
    ■ Business (medium, to Global 2000) = Contact Sales

Page 21: Neo4j: Graph-like power

Neo4j vs. MySQL

● for the simple friends of friends query, Neo4j is 60% faster than MySQL
● for friends of friends of friends, Neo4j is 180 times faster
● for the depth four query, Neo4j is 1,135 times faster
● MySQL just chokes on the depth five query
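
The queries being compared are "friends up to depth N" lookups. The sketch below shows the operation itself as a depth-limited, breadth-first expansion over an in-memory graph. It does not reproduce the benchmark figures quoted on the slide; the function name `friends_within` and the sample data are hypothetical.

```python
def friends_within(graph, start, max_depth):
    """Return everyone reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    frontier = {start}
    for _ in range(max_depth):
        # Expand one hop: only neighbours of the current frontier are touched.
        frontier = {n for person in frontier for n in graph.get(person, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return seen - {start}


# Hypothetical friendship chain a - b - c - d - e (undirected).
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
depth_two = friends_within(graph, "a", 2)
depth_four = friends_within(graph, "a", 4)
```

Each extra level of depth is one more expansion step over the current frontier, whereas the relational version needs one more self-join per level.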

Page 22: Neo4j: Graph-like power

Neo4j: Nodes
● fundamental units that form a graph
● can have key/value-style properties
● index nodes and relationships by {key, value} pairs
● represent entities

Page 23: Neo4j: Graph-like power

Neo4j: Relationships #1/2

● connect entities and structure the domain
● allow for finding related data
● are always directed (outgoing or incoming)
● are equally well traversed in either direction
● a node can have a relationship to itself
● have a relationship type (label)

Page 24: Neo4j: Graph-like power

Neo4j: Relationships #2/2

22

Page 25: Neo4j: Graph-like power

Neo4j: Properties
● nodes and relationships can have properties
● are key-value pairs
  ○ key is a string
  ○ values can be either a primitive or an array of one primitive type
    ■ boolean, String, int, int[], etc.
    ■ types as defined by the Java Language Specification
● entity attributes, relationship qualities, and metadata

Page 26: Neo4j: Graph-like power

Neo4j: Labels
● used to group nodes into sets
● any number of labels, including none
● can be added and removed during runtime
● can be used to mark temporary states for nodes
● names are case-sensitive
● CamelCase (convention)

Page 27: Neo4j: Graph-like power

Neo4j: Paths
● a path is one or more nodes with connecting relationships
● shortest path:

● a path of length one:

● a path of length one:

25

Page 28: Neo4j: Graph-like power

Neo4j: Traversal
● Traversal Framework available out of the box
● means visiting nodes, following relationships by rules
● in most cases only a subgraph is visited
● callback-based traversal API
  ○ you can specify the traversal rules
● traversing breadth-first or depth-first
● open Java API

Page 29: Neo4j: Graph-like power

Neo4j: graph algorithms
● A*: uses the A* algorithm to find the cheapest path between two nodes
● Dijkstra (dijkstra): uses the Dijkstra algorithm to find the cheapest path between two nodes
● PathWithLength: finds all paths of a certain length (depth) between two nodes
● Shortest paths (shortestPath, the default): finds all the shortest paths between two nodes
● All simple paths (allSimplePaths): finds all simple paths (without loops) between two nodes
● All paths (allPaths): finds all available paths between two nodes
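
To make the "cheapest path" idea concrete, here is a minimal sketch of Dijkstra's algorithm over a small weighted graph. It is a generic textbook version written for illustration, not Neo4j's implementation; the graph data and the function name `dijkstra` are assumptions.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue  # A cheaper route to this node was already processed.
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical weighted, directed graph; weights must be non-negative.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
cost, path = dijkstra(graph, "A", "D")
```

The priority queue always expands the cheapest known partial path first, which is what guarantees the first arrival at the target is the cheapest one when weights are non-negative.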

27

Page 30: Neo4j: Graph-like power

Neo4j: Schema
● is a schema-optional graph database

28

Page 31: Neo4j: Graph-like power

Neo4j: Index

● introduced in Neo4j 2.0
● eventually available (populated in the background, not immediately available for querying)
  ○ comes online once fully populated
  ○ on failed status, drop and recreate the index
● can be created on a label (a group of nodes)
● indexed Nodes & Rels
● node_auto_indexing=false, node_keys_indexable

Page 32: Neo4j: Graph-like power

Neo4j: Constraints

● can help you keep your data clean
● specify the rules for what your data should look like
● unique constraints are the only available constraint type

Page 33: Neo4j: Graph-like power

Neo4j: Data Size

● single server instance
  ○ nodes = 2^35 (~34 billion)
  ○ relationships = 2^35 (~34 billion)
  ○ labels = 2^31 (~2 billion)
  ○ properties = 2^36 to 2^38 depending on property types (maximum ~274 billion, always at least ~68 billion)
  ○ relationship types = 2^15 (~32’000)

Page 34: Neo4j: Graph-like power

Cypher Query Language

“Makes the simple things easy, and the complex things possible”

● powerful graph query language
● relatively simple
● declarative grammar (say what you want, not how)
● humane query language
● self-explanatory (based on English prose and neat iconography)
● written in Scala
● pattern-matching (borrows expression approaches from SPARQL)
● aggregation, ordering, limits
● create, update, delete
● structure and most keywords inspired by SQL
● changing rather rapidly (CYPHER 1.9 START ...)

Page 35: Neo4j: Graph-like power

Cypher patterns #1/2

● (a)
● (b)
● (a)-->(b)
● (a)-->(b)-->(c)
● (b)-->(c)<--(a)
● (b)-->()<--(a)
● (a)--(b)
● (a)-[*5]->(b)
● (a)-[*3..5]->(b)
  ○ (a)-[*3..]->(b)
  ○ (a)-[*..5]->(b)
  ○ (a)-[*]->(b)

Page 36: Neo4j: Graph-like power

Cypher patterns #2/2

● (a:Label)-->(m)
● (a:User:Admin)-->(m)
● (a)--(m)
● (a)-[r]->(m)
● (a)-[:ACTED_IN]->(m)
● (a)-[r:SOME|ELSE|WTH]->(m)

Page 37: Neo4j: Graph-like power

Cypher: START / RETURN

“It all starts with the START”
Michael Hunger, Cypher webinar, Sep 2012

● designates the start points
● START is optional (in Neo4j >= 2.0)

Examples:
● START <lookup> RETURN <expression>
● START n=node(0) RETURN n
● START n=node(*) RETURN n.name

Page 38: Neo4j: Graph-like power

Cypher: MATCH
● primary way of getting data from the database

● START <lookup> MATCH <pattern> RETURN <expr>
● OPTIONAL MATCH <pattern> RETURN <expr>

Examples:
● MATCH (n) RETURN count(n)
● MATCH (actor:Actor) RETURN actor.name;
● START me=node(0) MATCH (me)--(f) RETURN f.name
● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO

Page 39: Neo4j: Graph-like power

Cypher: CREATE / SET

● creates nodes and relationships

● CREATE (<name>[:label] [properties,..])
● CREATE (<node-in>)-[<var>:RELATION [properties,..]]->(<node-out>);
● CREATE UNIQUE ...

Examples:
● CREATE (n:Actor { name:"Keanu Reeves" });
● CREATE (keanu)-[:ACTED_IN]->(matrix)
● MATCH (keanu {name:".."}) SET keanu.age=49 RETURN keanu

Page 40: Neo4j: Graph-like power

Cypher: WHERE
● filters the results

● MATCH <pattern> WHERE <condition> RETURN <expr>

Examples:
● WHERE n.name =~ "(?i)John.*"
● WHERE NOT ..
● WHERE type(rel) =~ "Perso.*"

Page 41: Neo4j: Graph-like power

Cypher: RETURN
● creates the result table
● any query can return data
● can be nodes, relationships, or properties on these

● RETURN DISTINCT <expression> AS x
● RETURN aggregate(expr) AS alias
● RETURN nodes, rels, properties
● RETURN expressions of funcs and operators
● RETURN aggregation funcs on the above

Page 42: Neo4j: Graph-like power

Cypher: etc
● CASE / WHEN / ELSE
● ORDER BY node.key, node2.key, .. ASC|DESC
● LIMIT / SKIP
● WITH (WITH count(*) as c)
● UNION / UNION ALL (combining results from multiple queries)
● USING INDEX/SCAN
● MERGE / SET / DELETE / REMOVE / FOREACH
● Expressions
● Operators
● Comments
● Functions: ALL, ANY, LENGTH, {Math}, {String}, ...

Page 43: Neo4j: Graph-like power

Cypher: Transactions

● any updating query will run in a transaction
● ACID
● “it is very important to finish each transaction”

● write lock on node/rel:
  ○ adding, changing or removing a property on a node/rel
● write lock on node:
  ○ creating or deleting a node
● write lock on the relationship and both its nodes:
  ○ creating or deleting a relationship

Page 44: Neo4j: Graph-like power

Cypher: Aggregation
● count(node/rel/prop)
● count(n), count(n.prop)
● sum(n.prop)
● avg(n.prop)
● percentileDisc(n.prop, {percentile}), e.g. 0.5 for the median
● stdev(n.prop): standard deviation for the group
● max(n.prop)
● collect(n.prop)

● RETURN n, count(*)
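
In `RETURN n, count(*)` the non-aggregated expression (`n`) acts as the grouping key, much like an implicit GROUP BY. The sketch below mimics that behaviour on a list of result rows; the rows and the helper `count_by_first_column` are hypothetical and only illustrate the grouping rule.

```python
from collections import Counter

# Hypothetical result rows before aggregation: one (actor, movie) pair per match.
rows = [
    ("Keanu Reeves", "The Matrix"),
    ("Keanu Reeves", "Speed"),
    ("Carrie-Anne Moss", "The Matrix"),
]


def count_by_first_column(rows):
    """Group rows by their first column and count each group."""
    return Counter(key for key, _ in rows)


movies_per_actor = count_by_first_column(rows)
```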

42

Page 45: Neo4j: Graph-like power

Cypher: back to SQL #1/5

● SELECT *
  FROM Person
  WHERE name = 'Valentin' AND age > 30

● START person=node:Person(name="Valentin")
  WHERE person.age > 30
  RETURN person

Page 46: Neo4j: Graph-like power

Cypher: back to SQL #2/5

● SELECT "Email".*
  FROM Person
  JOIN "Email" ON "Person".id = "Email".person_id
  WHERE "Person".name = 'Benedikt'

● START person=node:Person(name="Benedikt")
  MATCH person-[:email]->email
  RETURN email

Page 47: Neo4j: Graph-like power

Cypher: back to SQL #3/5

● show me all people that are both actors and directors

● SELECT name
  FROM Person
  WHERE person_id IN (SELECT person_id FROM Actor)
    AND person_id IN (SELECT person_id FROM Director)

● START person=node:Person("name:*")
  WHERE (person)-[:ACTS_IN]->() AND (person)-[:DIRECTED]->()
  RETURN person.name

Page 48: Neo4j: Graph-like power

Cypher: back to SQL #4/5

● show me all Tom Hanks’s co-actors

● SELECT DISTINCT co_actor.name
  FROM Person tom
  JOIN Actor a1 ON tom.person_id = a1.person_id
  JOIN Actor a2 ON a1.movie_id = a2.movie_id
  JOIN Person co_actor ON co_actor.person_id = a2.person_id
  WHERE tom.name = 'Tom Hanks'

● START tom=node:Person(name="Tom Hanks")
  MATCH tom-[:ACTS_IN]->movie, co_actor-[:ACTS_IN]->movie
  RETURN DISTINCT co_actor.name

Page 49: Neo4j: Graph-like power

Cypher: back to SQL #5/5

● show me all Lucy’s favorite directors

● SELECT dir.name, count(*)
  FROM Person lucy
  JOIN Actor ON lucy.person_id = Actor.person_id
  JOIN Director ON Actor.movie_id = Director.movie_id
  JOIN Person dir ON Director.person_id = dir.person_id
  WHERE lucy.name = 'Lucy Liu'
  GROUP BY dir.name
  ORDER BY count(*) DESC

● START lucy=node:Person(name="Lucy Liu")
  MATCH lucy-[:ACTS_IN]->movie, director-[:DIRECTED]->movie
  RETURN director.name, count(*)
  ORDER BY count(*) DESC

Page 50: Neo4j: Graph-like power

Cypher: back to SQL #6/5

START lucy = node:Person(name="Lucy Liu"),
      kevin = node:Person(name="Kevin Bacon")
MATCH p = shortestPath( lucy-[:ACTS_IN*]-kevin )
RETURN EXTRACT (n in NODES(p): COALESCE(n.name?, n.title?))
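
The shortestPath query walks ACTS_IN relationships in either direction, alternating between actors and movies, until it reaches the other actor. A breadth-first search does the same on an in-memory graph. This is a generic sketch, not what Neo4j executes internally; the actor and movie names are placeholders.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting only makes the visiting order deterministic.
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical ACTS_IN data, treated as undirected: actors link to movies and back.
graph = {
    "Actor A": {"Movie 1"},
    "Actor B": {"Movie 1", "Movie 2"},
    "Actor C": {"Movie 2"},
    "Movie 1": {"Actor A", "Actor B"},
    "Movie 2": {"Actor B", "Actor C"},
}
path = shortest_path(graph, "Actor A", "Actor C")
```

The returned list alternates actor and movie nodes, which corresponds to the mix of `n.name` and `n.title` values that the COALESCE in the query collects.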

Page 51: Neo4j: Graph-like power

Neo4j Shell

● command-line shell for running Cypher queries
● supports remote shell
● :schema
● bash# neo4j-shell -path data/graph.db -readonly -config conf/neo4j.properties -c "<command>"

Page 52: Neo4j: Graph-like power

Neo4j: Security

● does not deal with data encryption explicitly
● all means built into Java can be used
● an encrypted datastore can be used
● webadmin over HTTPS

Page 53: Neo4j: Graph-like power

SPARQL

● manipulates data stored in RDF format
● focused on matching sets of triples

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
  ?person a foaf:Person.
  ?person foaf:name ?name.
  ?person foaf:mbox ?email.
}

Page 54: Neo4j: Graph-like power

Gremlin

● graph traversal language
● scripting language
● Pipe & Filter (similar to jQuery)
● works across different graph databases
● based on Groovy (limited to Java)
● not as stable in Neo4j
● XPath-like

● ./outE[label="family"]/inV/@name
● g.v(1).out('likes').in('likes').out('likes').groupCount(m)
● g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)

Page 55: Neo4j: Graph-like power

Neo4j and PHP
● everyman/neo4jphp (packagist.org)
  ○ PHP wrapper for Neo4j using the REST interface
  ○ follows the PSR-0 autoloading standard
  ○ basic wrappers for all components
  ○ last update: a month ago
  ○ supports Gremlin
● Neo4j-PHP OGM (largely based on neo4jphp)
  ○ Object Graph Mapper, inspired by Doctrine
  ○ based on Doctrine\Common
  ○ borrows significantly from the Doctrine\ORM design
  ○ uses annotations on classes
  ○ MIT Licence
● Neo4J PHP REST API client
  ○ uses the Neo4j REST API
  ○ Node create/find/delete
  ○ Relationship create/list/filter

Page 56: Neo4j: Graph-like power

High Availability with Neo4j
● in HA mode: a single master and zero or more slaves
● slaves synchronize with the master to preserve consistency
● the master writes to a slave before the transaction completes

Page 57: Neo4j: Graph-like power

Demo

Neo4j.org Example Datasets:
● DrWho (nodes=1'060; rels=2'286)
● Cineasts Movies & Actors (nodes=64'069; rels=121'778)
● Hubway Data Challenge (nodes=554'674; rels=2'011'904)

GraphGist:
● JIRA and neo4j
● PHP and neo4j
● Kant in neo4j

XSS

55

Page 58: Neo4j: Graph-like power

Gephi (win, nix, mac)

56

Page 59: Neo4j: Graph-like power

Linkurious.us

57

Page 60: Neo4j: Graph-like power

Neoclipse (eclipse plugin)

58

Page 61: Neo4j: Graph-like power

KeyLines (JavaScript library)

59

Page 62: Neo4j: Graph-like power

Graffeine (npm package)

60

Page 63: Neo4j: Graph-like power

Neovigator (neography + processing.js)

61

Page 64: Neo4j: Graph-like power

Cloud

● Heroku
  ○ GrapheneDB beta
  ○ bash$ heroku addons:add graphenedb
● Jelastic Cloud PaaS

Page 65: Neo4j: Graph-like power

Analogues

● GrapheneDB - based on Neo4j
● AllegroGraph - closed source, commercial, RDF quad store
  ○ graph database built around the W3C spec for the Resource Description Framework
  ○ supports SPARQL, RDFS++, and Prolog
● Sones - closed source, .NET focused
● Virtuoso - closed source, RDF focused
● GraphDB - graph database built in .NET by the German company sones
● InfiniteGraph - goal is to create a graph database with "virtually unlimited scalability"
● FlockDB

Page 66: Neo4j: Graph-like power

Docs
● http://docs.neo4j.org/chunked/snapshot/
● http://docs.neo4j.org/refcard/2.0/
● http://graphdatabases.com/ - book, O'Reilly
● http://www.cs.usfca.edu/~galles/visualization/Algorithms.html - Graph Algorithms visualization
● http://bit.ly/rr-neo4j
● https://github.com/itspoma/test-neo4j

Page 67: Neo4j: Graph-like power

Conclusion

● best used for graph-style, rich or complex, structured, dense data: deep graphs with unlimited depth and cycles, weighted connections, interconnected data
● quickly add new functionality without impacting existing deployments
● being schema-less forces you to re-think your entire approach to data
● not a silver bullet for all problems
