Hands on Training – Graph Database with Neo4j www.serendio.com

Upload: serendio-inc

Post on 18-Jan-2017


TRANSCRIPT

Page 1: Hands on Training – Graph Database with Neo4j

Hands on Training – Graph Database with Neo4j

www.serendio.com

Page 2: Hands on Training – Graph Database with Neo4j

Content
• Introduction
• The Graph Database: Neo4j
• Neo4j - Cypher Query Language
  – CRUD operations
• Use cases
  – Mumbai Local Train
  – Movie Recommendation
  – Email Analytics
• Neo4j Tools: Import, Visualization
• Conclusion

Page 3: Hands on Training – Graph Database with Neo4j

Introduction to NoSQL

NoSQL = Not Only SQL

Page 4: Hands on Training – Graph Database with Neo4j


What is a Graph?

[Figure: a graph of seven vertices, V1 to V7, connected by links]

A set of objects connected by links.

Page 5: Hands on Training – Graph Database with Neo4j

Graph-Based Technologies in the Big Data/NoSQL Domain

Storage & Traversal/Query
• Neo4j
• TitanDB
• OrientDB

Processing/Computation Engines
• Apache Giraph
• GraphLab
• Apache Spark Graph ML/GraphX

Page 6: Hands on Training – Graph Database with Neo4j

Graph Databases

• A database that follows a graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (one hop to a neighbour) remains the same
• Index for lookups
• Optimized for traversing connected data
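
The claim that a local step stays cheap as the graph grows can be illustrated with a plain adjacency-list structure. The Python sketch below is only an analogy, not Neo4j's storage engine; the class name `AdjacencyGraph` and the edges between V1, V2 and V4 are assumptions made up for the example.

```python
from collections import defaultdict


class AdjacencyGraph:
    """Toy graph in which every node keeps direct references to its neighbours."""

    def __init__(self):
        self._neighbours = defaultdict(set)

    def add_edge(self, a, b):
        # Store the link on both endpoints so each node "knows" its adjacent nodes.
        self._neighbours[a].add(b)
        self._neighbours[b].add(a)

    def neighbours(self, node):
        # One dictionary lookup: the work depends on this node's degree,
        # not on how many nodes the whole graph contains.
        return self._neighbours[node]


graph = AdjacencyGraph()
graph.add_edge("V1", "V2")  # hypothetical edges, not taken from the slide figure
graph.add_edge("V1", "V4")

# Adding a thousand unrelated pairs does not change what a step from V1 touches.
for i in range(1000):
    graph.add_edge(f"X{i}", f"Y{i}")

print(sorted(graph.neighbours("V1")))
```

A step from V1 still inspects only its two neighbours, however many unrelated nodes are added.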

Page 7: Hands on Training – Graph Database with Neo4j

Neo4j

• Graph database from Neo Technology
• A schema-free labeled Property Graph Database + Lucene Index
• Perfect for complex, highly connected data
• Reliable with real ACID Transactions
• Scalable: Billions of Nodes and Relationships, Scale out with highly available Neo4j Cluster
• Server with REST API or Embeddable
• Declarative Query Language (Cypher)

Page 8: Hands on Training – Graph Database with Neo4j

Neo4j: Strengths & Weaknesses

Strengths
• Powerful data model
• Whiteboard friendly
• Fast for connected data
• Easy to query

Weaknesses
• Sharding
• Requires a conceptual shift (graph-like thinking)

Page 9: Hands on Training – Graph Database with Neo4j

Four Building Blocks

• Nodes
• Relationships
• Properties
• Labels

[Figure: (:USER) -[:RELATIVE]- (:PET), annotated with the properties Name: Mike; Animal: Dog; Name: Apple; Age: 25; Relation: Owner]
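
The four building blocks can be sketched as two small Python data classes. This is an illustration of the property-graph idea, not Neo4j's API. The class names `Node` and `Relationship` are hypothetical, and so is the way the figure's properties are assigned: Age to the user, Animal and Name: Apple to the pet, and the relationship pointing from user to pet.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"USER"}
    properties: dict = field(default_factory=dict)   # key/value pairs on the node


@dataclass
class Relationship:
    rel_type: str                                    # e.g. "RELATIVE"
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)   # relationships carry properties too


# Assumed reading of the slide figure (property placement is a guess).
user = Node({"USER"}, {"Name": "Mike", "Age": 25})
pet = Node({"PET"}, {"Name": "Apple", "Animal": "Dog"})
owns = Relationship("RELATIVE", user, pet, {"Relation": "Owner"})

print(owns.start.properties["Name"], owns.rel_type, owns.end.properties["Name"])
```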

Page 10: Hands on Training – Graph Database with Neo4j


SQL to Graph DB: Data Model Transformation

SQL                   Graph DB
Table                 Type of Node
Rows of Table         Nodes
Columns of Table      Node-Properties
Foreign-key, Joins    Relationships

Page 11: Hands on Training – Graph Database with Neo4j

SQL to Graph DB: Data Model Transformation

Table: Actor

Name          Movie Language
Rajnikant     Tamil
Maheshbabu    Telugu
Vijay         Tamil
Prabhas       Telugu

Table: Movie

Name            Lead Actor
Bahubali        Prabhas
Puli            Vijay
Shrimanthudu    Maheshbabu
Robot           Rajnikant

[Figure: the same data as a graph, showing two of the four actor-movie pairs]
(ACTOR: Name Prabhas, Movie Language Telugu) -[LEAD_ACTOR]- (MOVIE: Name Bahubali)
(ACTOR: Name Rajnikant, Movie Language Tamil) -[LEAD_ACTOR]- (MOVIE: Name Robot)
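
The table-to-graph mapping on this slide can be sketched in Python: rows become nodes, columns become node properties, and the foreign-key column becomes a relationship. The function name `tables_to_graph` and the movie-to-actor direction of LEAD_ACTOR are assumptions for illustration; the slide does not show a direction.

```python
actor_rows = [
    {"Name": "Rajnikant", "Movie Language": "Tamil"},
    {"Name": "Maheshbabu", "Movie Language": "Telugu"},
    {"Name": "Vijay", "Movie Language": "Tamil"},
    {"Name": "Prabhas", "Movie Language": "Telugu"},
]
movie_rows = [
    {"Name": "Bahubali", "Lead Actor": "Prabhas"},
    {"Name": "Puli", "Lead Actor": "Vijay"},
    {"Name": "Shrimanthudu", "Lead Actor": "Maheshbabu"},
    {"Name": "Robot", "Lead Actor": "Rajnikant"},
]


def tables_to_graph(actors, movies):
    """Turn two relational tables into (nodes, relationships)."""
    nodes = {}
    relationships = []
    for row in actors:
        # Table -> node type (label); row -> node; columns -> properties.
        nodes[("ACTOR", row["Name"])] = dict(row)
    for row in movies:
        nodes[("MOVIE", row["Name"])] = {"Name": row["Name"]}
        # The foreign-key column is replaced by an explicit relationship.
        relationships.append(
            (("MOVIE", row["Name"]), "LEAD_ACTOR", ("ACTOR", row["Lead Actor"]))
        )
    return nodes, relationships


nodes, relationships = tables_to_graph(actor_rows, movie_rows)
print(len(nodes), "nodes,", len(relationships), "relationships")
```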

Page 12: Hands on Training – Graph Database with Neo4j

Interact with Neo4j
• Web Interface
  – http://IP:7474/browser/
  – http://IP:7474/webadmin/
• Neo4j Console
• REST API
• Java Native Libraries

Page 13: Hands on Training – Graph Database with Neo4j

How to query a Graph Database?
• Graph Query Language
  – Cypher
  – Gremlin

Page 14: Hands on Training – Graph Database with Neo4j

Cypher

A pattern-matching query language for graphs

Page 15: Hands on Training – Graph Database with Neo4j

Cypher Query Language
• Declarative
• SQL-inspired
• Pattern based

[Figure: Apple -LIKES-> Orange]

(Apple:FRUIT) - [connect:RELATIVE] -> (Orange:FRUIT)

Page 16: Hands on Training – Graph Database with Neo4j

Cypher: Getting Started

Structure:
• Similar to SQL
• Most common clauses:
  – MATCH: the graph pattern to match
  – WHERE: adds constraints or filters
  – RETURN: what to return

Page 17: Hands on Training – Graph Database with Neo4j

Cypher: Frequently Used Queries

• Get the whole database:
  MATCH (n) RETURN n

• Delete the whole database:
  MATCH (n)
  OPTIONAL MATCH (n)-[r]-()
  DELETE n,r

Page 18: Hands on Training – Graph Database with Neo4j

CRUD Operations
Copy the code from the link and paste it into the Neo4j web browser.

MATCH:
• MATCH (n) RETURN n
• MATCH (movie:Movie) RETURN movie
• MATCH (movie:Movie { title: 'Bahubali' }) RETURN movie
• MATCH (director { name:'Rajamouli' })--(movie) RETURN movie.title
• MATCH (raj:Person { name:'Rajamouli'})--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-->(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})<--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-[:DIRECTED]->(movie:Movie) RETURN movie

Page 19: Hands on Training – Graph Database with Neo4j

CRUD Operations

WHERE:
• MATCH (n)
  WHERE n:Movie
  RETURN n
• MATCH (n)
  WHERE n.name <> 'Prabhas'
  RETURN n

Page 20: Hands on Training – Graph Database with Neo4j

CRUD Operations

Let's clean the database:

MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

Page 21: Hands on Training – Graph Database with Neo4j

CRUD Operations

CREATE:
Node:
• CREATE (n)
• CREATE (n),(m)
• CREATE (n:Person)
• CREATE (n:Person:Swedish)
• CREATE (n:Person { name : 'Andres', title : 'Developer' })
• CREATE (a:Person { name : 'Roman' }) RETURN a

Page 22: Hands on Training – Graph Database with Neo4j

CRUD Operations

CREATE:
Relationships:
• MATCH (a:Person),(b:Person)
  WHERE a.name = 'Roman' AND b.name = 'Andres'
  CREATE (a)-[r:RELTYPE]->(b)
  RETURN r
• MATCH (a:Person),(b:Person)
  WHERE a.name = 'Roman' AND b.name = 'Andres'
  CREATE (a)-[r:RELTYPE { name : a.name + '<->' + b.name }]->(b)
  RETURN r

Page 23: Hands on Training – Graph Database with Neo4j

CRUD Operations

CREATE:
Relationships:
• CREATE p = (andres { name:'Andres' })-[:WORKS_AT]->(neo)<-[:WORKS_AT]-(michael { name:'Michael' })
  RETURN p

Page 24: Hands on Training – Graph Database with Neo4j

CRUD Operations

UPDATE:
Labels and properties:
• MATCH (n:Person { name : 'Andres' }) SET n :Person:Coder
• MATCH (n:Person { name : 'Andres', title : 'Developer' }) SET n.title = 'Mang'

Page 25: Hands on Training – Graph Database with Neo4j

CRUD Operations

DELETE:
• MATCH (n:Person)
  WHERE n.name = 'Andres'
  DELETE n
• MATCH (n { name: 'Andres' })-[r]-()
  DELETE n, r
• MATCH (n:Person)
  DELETE n
• MATCH (n)
  OPTIONAL MATCH (n)-[r]-()
  DELETE n,r

Page 26: Hands on Training – Graph Database with Neo4j

Functions

Predicates:
• ALL(identifier in collection WHERE predicate)
• ANY(identifier in collection WHERE predicate)
• NONE(identifier in collection WHERE predicate)
• SINGLE(identifier in collection WHERE predicate)
• EXISTS( pattern-or-property )

Scalar Functions:
• LENGTH( collection/pattern expression )
• TYPE( relationship )
• ID( property-container )
• COALESCE( expression [, expression]* )
• HEAD( expression )
• LAST( expression )
• TIMESTAMP()

Page 27: Hands on Training – Graph Database with Neo4j

Functions

Collection Functions:
• NODES( path )
• RELATIONSHIPS( path )
• LABELS( node )
• FILTER(identifier in collection WHERE predicate)
• REDUCE( accumulator = initial, identifier in collection | expression )

Mathematical Functions:
• ABS( expression )
• COS( expression )
• LOG( expression )
• ROUND( expression )
• SQRT( expression )
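
For readers who know Python better than Cypher, the collection predicates and functions listed on these two slides have rough Python counterparts. The mapping below is an informal analogy with made-up sample data (`ratings`), not a statement of Cypher's exact semantics (null handling in particular differs).

```python
from functools import reduce

ratings = [4, 5, 3, 5]  # hypothetical collection

all_positive = all(r > 0 for r in ratings)               # ~ ALL(r IN ratings WHERE r > 0)
any_five = any(r == 5 for r in ratings)                  # ~ ANY(r IN ratings WHERE r = 5)
none_zero = not any(r == 0 for r in ratings)             # ~ NONE(r IN ratings WHERE r = 0)
single_three = sum(1 for r in ratings if r == 3) == 1    # ~ SINGLE(r IN ratings WHERE r = 3)

high = [r for r in ratings if r >= 4]                    # ~ FILTER(r IN ratings WHERE r >= 4)
total = reduce(lambda acc, r: acc + r, ratings, 0)       # ~ REDUCE(acc = 0, r IN ratings | acc + r)

head, last = ratings[0], ratings[-1]                     # ~ HEAD(ratings), LAST(ratings)

# ~ COALESCE(a, b, ...): the first value that is not null (None in Python)
first_non_null = next((v for v in (None, "fallback") if v is not None), None)

print(all_positive, any_five, none_zero, single_three, high, total, head, last, first_non_null)
```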

Page 28: Hands on Training – Graph Database with Neo4j

Neo4j in Action

Use Cases

Page 29: Hands on Training – Graph Database with Neo4j

Use case 1: Mumbai Local Train*
Problem
• Four main railway lines: Western, Central, Harbour and Trans Harbour.
• Each line serves various sections of the city.
• To travel across sections, one must change lines at various interchange stations.
• Find the shortest path from a source station to a destination station.

* https://gist.github.com/luanne/8159102

Page 30: Hands on Training – Graph Database with Neo4j

Use case 1: Mumbai Local Train (contd.)

Page 31: Hands on Training – Graph Database with Neo4j

Use case 1: Mumbai Local Train (contd.)
Solution:
• Create the railway network graph.
• Use a shortest-path algorithm between the source and destination stations.

Page 32: Hands on Training – Graph Database with Neo4j

Use case 1: Mumbai Local Train (contd.)
Graph Database Model:

[Figure: Station -NEXT-> Station]

Page 33: Hands on Training – Graph Database with Neo4j

Use case 1: Mumbai Local Train (contd.)
Create Graph
• Open the file from the link below, then copy, paste and run it in Neo4j.

Page 34: Hands on Training – Graph Database with Neo4j

Use case 1: Mumbai Local Train (contd.)
• Query 1: The Graph
  MATCH (n) RETURN n

• Query 2: Route from Churchgate to Vashi
  MATCH (s1 {name:"Churchgate"}), (s2 {name:"Vashi"}),
        p = shortestPath((s1)-[:NEXT*]->(s2))
  RETURN p

• Query 3: Route from Santa Cruz to Dockyard Road
  MATCH (s1 {name:"Santa Cruz"}), (s2 {name:"Dockyard Road"}),
        p = shortestPath((s1)-[:NEXT*]-(s2))
  RETURN p
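
To see what a shortest-path search over NEXT links does conceptually, here is a minimal breadth-first search in Python. It is a sketch of the general technique, not Neo4j's internal shortestPath implementation. The station names A, B, C, D, X, Y and their connections are invented for the example and are not the Mumbai network. Links are treated as undirected, as in Query 3.

```python
from collections import deque


def build_network(edges):
    """Build an undirected adjacency map from (station, station) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Return the path with the fewest hops from source to target, or None."""
    if source == target:
        return [source]
    previous = {source: None}      # also serves as the visited set
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = current
            if neighbour == target:
                # Walk the predecessor links back to the source.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical toy network: one direct line A-B-C-D and a detour B-X-Y-D.
network = build_network([("A", "B"), ("B", "C"), ("C", "D"),
                         ("B", "X"), ("X", "Y"), ("Y", "D")])
print(shortest_path(network, "A", "D"))
```

Breadth-first search visits stations in order of hop count, so the first time it reaches the destination it has found a path with the fewest hops.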

Page 35: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation*
Problem:
• We are running an IMDb-type website.
• We have a dataset that contains movie ratings given by users.
• Our problem is to generate a list of movies to recommend to each individual user.

* http://www.neo4j.org/graphgist?8173017

Page 36: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)
Solution:
• We will find pairs of people who have given similar ratings to the movies both of them have watched.
• After that, we will recommend movies that one has not seen and the other has rated highly.
• Cosine similarity is used to calculate the similarity between users.
• k-Nearest Neighbors is used to find similar users.

Page 37: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)

• Cosine Similarity:

• K-NN:
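
Both measures named on this slide can be sketched in plain Python. The version below computes cosine similarity over the movies two users have both rated, then ranks the other users by that score to get the k nearest neighbours. The user and movie identifiers (`u1`, `m1`, ...) and the ratings are invented sample data, and the function names are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two users' rating vectors over co-rated movies."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = math.sqrt(sum(ratings_a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(ratings_b[m] ** 2 for m in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(ratings, user, k):
    """Return the k users most similar to `user` as (name, similarity) pairs."""
    scores = [
        (other, cosine_similarity(ratings[user], ratings[other]))
        for other in ratings
        if other != user
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical ratings: user -> {movie: rating}
ratings = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m1": 5, "m2": 3, "m3": 4},
    "u3": {"m1": 1, "m2": 5},
}
print(nearest_neighbours(ratings, "u1", 2))
```

Here u2 rates the shared movies exactly as u1 does, so it comes out as the nearest neighbour with a similarity of about 1.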

Page 38: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)
• Let's create a real dataset with you folks.

• Visit: http://graphlab.byethost7.com/movie_recco/index.php

Page 39: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)
Dataset:
• Nodes:
  – movies.csv
  – users.csv
• Edges:
  – rating.csv

EXTRA FILES WE WILL CREATE
• movies_header.csv
• users_header.csv
• rating_header.csv

Page 40: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)
• Import to Neo4j
$ ./neo4j-import \
    --into /tmp/graph.db \
    --nodes:USER users_header.csv,users.csv \
    --nodes:MOVIES movies_header.csv,movies.csv \
    --relationships:RATING rating_header.csv,rating.csv
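
The header files tell the import tool how to interpret each CSV column. The slides do not show their contents, so the Python sketch below generates one plausible set. It assumes that users and movies are identified by a `name` column and that each rating row is `user, rating, movie`. The ID-space names `User` and `Movie` are also assumptions.

```python
import csv
import io


def header_line(columns):
    """Render one CSV header row as text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(columns)
    return buffer.getvalue()


# Assumed column layouts; adjust to match the real data files.
headers = {
    # :ID marks the column that uniquely identifies a node within an ID space.
    "users_header.csv": header_line(["name:ID(User)"]),
    "movies_header.csv": header_line(["name:ID(Movie)"]),
    # :START_ID / :END_ID refer to node IDs; rating is stored as an integer property.
    "rating_header.csv": header_line([":START_ID(User)", "rating:int", ":END_ID(Movie)"]),
}

for filename, content in headers.items():
    print(filename, "->", content.strip())
```

Writing each string in `headers` to a file of the same name gives the three extra files listed on the dataset slide.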

Page 41: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)
• Query: Add Cosine Similarity

MATCH (p1:USER)-[x:RATING]->(m:MOVIES)<-[y:RATING]-(p2:USER)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)

Page 42: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)
• Query: See who your nearest neighbors are by similarity

MATCH (p1:USER {name:'Nishant'})-[s:SIMILARITY]-(p2:USER)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity

Page 43: Hands on Training – Graph Database with Neo4j

Use Case 2: Movie Recommendation (contd.)
• Query: Recommendation Finally :D

MATCH (b:USER)-[r:RATING]->(m:MOVIES), (b)-[s:SIMILARITY]-(a:USER {name:'Nishant'})
WHERE NOT((a)-[:RATING]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS reco
ORDER BY reco DESC
RETURN movie AS Movie, reco AS Recommendation
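
The logic of the recommendation query can be traced in plain Python. For every movie the target user has not rated, take the ratings from up to three of the most similar users who did rate it, average them, and sort by that average. The function name `recommend`, the users `a`, `b`, `c`, and all ratings and similarity values are invented for illustration.

```python
def recommend(ratings, similarity, user, top_n=3):
    """Score unseen movies by the mean rating of the top_n most similar raters."""
    seen = ratings[user]
    candidates = {}  # movie -> list of (similarity, rating)
    for other, sim in similarity[user].items():
        for movie, rating in ratings[other].items():
            if movie not in seen:  # mirrors WHERE NOT (a)-[:RATING]->(m)
                candidates.setdefault(movie, []).append((sim, rating))

    scores = {}
    for movie, pairs in candidates.items():
        pairs.sort(key=lambda pair: pair[0], reverse=True)  # most similar users first
        top = [rating for _, rating in pairs[:top_n]]       # mirrors COLLECT(rating)[0..3]
        scores[movie] = sum(top) / len(top)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: precomputed similarities and per-user ratings.
ratings = {
    "a": {"m1": 5},
    "b": {"m1": 5, "m2": 4, "m3": 2},
    "c": {"m1": 4, "m2": 2},
}
similarity = {"a": {"b": 0.9, "c": 0.8}}

print(recommend(ratings, similarity, "a"))
```

With this data, m2 is rated 4 and 2 by the two neighbours (average 3.0) and m3 is rated 2 by one neighbour, so m2 ranks first.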

Page 44: Hands on Training – Graph Database with Neo4j

Use Case 3: Email Analytics*
Overview:
• Framework for analyzing large email datasets
• Capable of performing Sentiment Analysis and Topic Extraction on an email dataset
• Accessed through a Command Line Interface
• Incubated at Serendio; now an open source project

* https://github.com/serendio-labs/email-analytics

Page 45: Hands on Training – Graph Database with Neo4j

Use Case 3: Email Analytics (contd.)

System Architecture:

Page 46: Hands on Training – Graph Database with Neo4j

Use Case 3: Email Analytics (contd.)
• DEMO

Page 47: Hands on Training – Graph Database with Neo4j

Use Case 3: Email Analytics (contd.)
Possible use cases:
• Keep track of your employees' activities
• Fraud detection
• Data mining for Business Analytics

Page 48: Hands on Training – Graph Database with Neo4j

Use Case 3: Email Analytics (contd.)

• Come forward and contribute:
• The project needs attention in the areas of
  – Web-UI
  – REST API
  – Unit Tests
  – Custom Email Format Support
  – Other Features

Page 49: Hands on Training – Graph Database with Neo4j

Neo4j with Other Technologies

Neo4j-Integration

Page 50: Hands on Training – Graph Database with Neo4j

Neo4j with Other Technologies
• Data Import
  – LOAD CSV
  – Neo4j-import
• Graph Visualization
  – Alistair Jones (Arrow)
  – Alchemy.js (GraphJSON)
  – Neo4j Browser
  – Linkurious
  – Keylines
  – D3.js

Page 51: Hands on Training – Graph Database with Neo4j

Neo4j Integration
• Apache Spark
• Elasticsearch
• Docker

Page 52: Hands on Training – Graph Database with Neo4j

Conclusion
• Graph database technologies like Neo4j have a lot of potential to solve many complex problems.
• Neo4j is a mature technology that can be used in designing solutions.

Page 53: Hands on Training – Graph Database with Neo4j

[email protected]

Serendio provides Big Data Science Solutions & Services for Data-Driven Enterprises.

Learn more at: serendio.com/index.php/case-studies

Thank You!