Hands on Training – Graph Database with Neo4j
www.serendio.com
Content
• Introduction
• The Graph Database: Neo4j
• Neo4j - Cypher Query Language
  – CRUD operations
• Use cases
  – Mumbai Local Train
  – Movie Recommendation
  – Email Analytics
• Neo4j Tools: Import, Visualization
• Conclusion
Introduction to NoSQL
NoSQL = Not Only SQL
What is a Graph?
[Figure: example graph with vertices V1 to V7]
A graph is a set of objects (vertices) connected by links (edges).
Graph-Based Technologies in the Big Data/NoSQL Domain
Storage & Traversal/Query
• Neo4j
• TitanDB
• OrientDB
Processing/Computation Engines
• Apache Giraph
• GraphLab
• Apache Spark (Graph ML/GraphX)
Graph Databases
• A database that follows a graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (one hop to a neighbor) remains the same
• Indexes for lookups
• Optimized for traversing connected data
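To make "each node knows its adjacent nodes" concrete, here is a minimal Python sketch of the idea, often called index-free adjacency. It assumes a simple in-memory graph; `Node`, `connect` and `build_chain` are hypothetical names for illustration, not Neo4j's storage engine.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    # Direct references to adjacent nodes (not IDs to look up in an index).
    neighbors: list[Node] = field(default_factory=list, repr=False)

    def connect(self, other: Node) -> None:
        # Undirected link: each endpoint stores a pointer to the other.
        self.neighbors.append(other)
        other.neighbors.append(self)


def build_chain(n: int) -> list[Node]:
    """Build a hypothetical path graph V1 - V2 - ... - Vn."""
    nodes = [Node(f"V{i}") for i in range(1, n + 1)]
    for a, b in zip(nodes, nodes[1:]):
        a.connect(b)
    return nodes


small = build_chain(7)
large = build_chain(100_000)

# One hop only reads the node's own neighbor list, so its cost does not
# grow with the total number of nodes in the graph.
print([n.name for n in small[3].neighbors])  # ['V3', 'V5']
print([n.name for n in large[3].neighbors])  # ['V3', 'V5']
```

The hop from V4 does the same work in the 7-node and the 100,000-node graph; an index is still useful, but only to find the starting node.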
Neo4j
• Graph database from Neo Technology
• A schema-free, labeled property graph database with a Lucene index
• Well suited to complex, highly connected data
• Reliable, with real ACID transactions
• Scalable: billions of nodes and relationships; scales out with a highly available Neo4j cluster
• Server with REST API, or embeddable
• Declarative query language (Cypher)
Neo4j: Strengths & Weaknesses
Strengths
• Powerful data model
• Whiteboard friendly
• Fast for connected data
• Easy to query
Weaknesses
• Sharding (a highly connected graph is hard to partition across machines)
• Requires a conceptual shift (graph-like thinking)
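Four Building Blocks
• Nodes
• Relationships
• Properties
• Labels
[Figure: a (:USER) node with Name: Mike, connected by a [:RELATIVE] relationship with Relation: Owner to a (:PET) node with Animal: Dog and Name: Apple; an Age: 25 property is also shown]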
Serendio Proprietary and Confidential
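Here is a minimal Python sketch of the four building blocks using the Mike/Apple example. It assumes the relationship points from the USER node to the PET node, because the slide does not show a direction. The `Node` and `Relationship` classes are hypothetical illustrations, not Neo4j's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]  # e.g. {"USER"}
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str  # e.g. "RELATIVE"; every relationship has exactly one type
    end: Node
    properties: dict[str, object] = field(default_factory=dict)


# Nodes carry labels and properties; relationships carry a type and properties.
mike = Node({"USER"}, {"Name": "Mike"})
apple = Node({"PET"}, {"Animal": "Dog", "Name": "Apple"})
owns = Relationship(mike, "RELATIVE", apple, {"Relation": "Owner"})

print(owns.start.properties["Name"], owns.rel_type, owns.end.properties["Name"])
# Mike RELATIVE Apple
```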
SQL to Graph DB: Data Model Transformation
SQL                   Graph DB
Table                 Type of Node
Rows of Table         Nodes
Columns of Table      Node Properties
Foreign Keys, Joins   Relationships
SQL to Graph DB: Data Model Transformation
Table: Actor
Name          Movie Language
Rajnikant     Tamil
Maheshbabu    Telugu
Vijay         Tamil
Prabhas       Telugu
Table: Movie
Name          Lead Actor
Bahubali      Prabhas
Puli          Vijay
Shrimanthudu  Maheshbabu
Robot         Rajnikant
[Figure: the same data as a graph. ACTOR nodes (e.g. Name: Prabhas, Movie Language: Telugu; Name: Rajnikant, Movie Language: Tamil) are connected by LEAD_ACTOR relationships to MOVIE nodes (e.g. Name: Bahubali; Name: Robot).]
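The transformation can be sketched in plain Python. Each table becomes a node label, each row becomes a node, and each column becomes a property. The foreign-key column `Lead Actor` becomes a `LEAD_ACTOR` relationship. This is a minimal sketch that assumes the relationship points from the actor to the movie, because the slide does not show the direction.

```python
from __future__ import annotations

# Rows of the two tables from the slide.
actor_table = [
    {"Name": "Rajnikant", "Movie Language": "Tamil"},
    {"Name": "Maheshbabu", "Movie Language": "Telugu"},
    {"Name": "Vijay", "Movie Language": "Tamil"},
    {"Name": "Prabhas", "Movie Language": "Telugu"},
]
movie_table = [
    {"Name": "Bahubali", "Lead Actor": "Prabhas"},
    {"Name": "Puli", "Lead Actor": "Vijay"},
    {"Name": "Shrimanthudu", "Lead Actor": "Maheshbabu"},
    {"Name": "Robot", "Lead Actor": "Rajnikant"},
]

# Table -> node label, row -> node, column -> node property.
nodes: dict[tuple[str, str], dict] = {}
for row in actor_table:
    nodes[("ACTOR", row["Name"])] = dict(row)
for row in movie_table:
    # The foreign-key column becomes a relationship, not a property.
    nodes[("MOVIE", row["Name"])] = {k: v for k, v in row.items() if k != "Lead Actor"}

# Foreign key Movie."Lead Actor" -> Actor.Name becomes a LEAD_ACTOR relationship
# (direction actor -> movie is an assumption).
relationships = [
    (("ACTOR", row["Lead Actor"]), "LEAD_ACTOR", ("MOVIE", row["Name"]))
    for row in movie_table
]

for start, rel_type, end in relationships:
    print(f"({start[1]}:{start[0]})-[:{rel_type}]->({end[1]}:{end[0]})")
```

No JOIN is needed at query time. The link between Prabhas and Bahubali is stored as a relationship.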
Interact with Neo4j
• Web Interface
  – http://IP:7474/browser/
  – http://IP:7474/webadmin/
• Neo4j Console
• REST API
• Java Native Libraries
How to Query a Graph Database?
• Graph Query Language
  – Cypher
  – Gremlin
Cypher
A pattern-matching query language for graphs
Cypher Query Language
• Declarative
• SQL-inspired
• Pattern based
[Figure: Apple -LIKES-> Orange]
(Apple:FRUIT)-[connect:LIKES]->(Orange:FRUIT)
Cypher: Getting Started
Structure:
• Similar to SQL
• Most common clauses:
  – MATCH: the graph pattern to match
  – WHERE: adds constraints or filters
  – RETURN: what to return
Cypher: Frequently Used Queries
• Get the whole database: MATCH (n) RETURN n
• Delete the whole database: MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r
  (Neo4j 2.3 and later also support MATCH (n) DETACH DELETE n)
CRUD Operations
Copy the code from the link and paste it into the Neo4j web browser.
MATCH:
• MATCH (n) RETURN n
• MATCH (movie:Movie) RETURN movie
• MATCH (movie:Movie { title: 'Bahubali' }) RETURN movie
• MATCH (director { name:'Rajamouli' })--(movie) RETURN movie.title
• MATCH (raj:Person { name:'Rajamouli'})--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-->(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})<--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-[:DIRECTED]->(movie:Movie) RETURN movie
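The three arrow forms differ only in direction. `--` ignores direction, `-->` follows outgoing relationships, and `<--` follows incoming ones. The tiny Python sketch below uses a hypothetical one-edge graph in which Rajamouli DIRECTED Bahubali. The `match` helper is illustrative only and does not model labels or property filters.

```python
from __future__ import annotations

# Hypothetical graph: one DIRECTED relationship, stored as (start, type, end).
EDGES = [("Rajamouli", "DIRECTED", "Bahubali")]


def match(start: str, arrow: str, rel_type: str | None = None) -> list[str]:
    """Nodes reachable from `start` in one hop for a pattern arrow:
    '--' (either direction), '-->' (outgoing) or '<--' (incoming)."""
    found = []
    for s, t, e in EDGES:
        if rel_type is not None and t != rel_type:
            continue  # like -[:DIRECTED]-> : only this relationship type
        if arrow in ("--", "-->") and s == start:
            found.append(e)
        if arrow in ("--", "<--") and e == start:
            found.append(s)
    return found


print(match("Rajamouli", "--"))               # ['Bahubali']
print(match("Rajamouli", "-->"))              # ['Bahubali']
print(match("Rajamouli", "<--"))              # []
print(match("Rajamouli", "-->", "DIRECTED"))  # ['Bahubali']
```

With this data, the `<--` query returns nothing, because the DIRECTED relationship leaves Rajamouli rather than entering it.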
CRUD Operations
WHERE:
• MATCH (n)
  WHERE n:Movie
  RETURN n
• MATCH (n)
  WHERE n.name <> 'Prabhas'
  RETURN n
CRUD Operations
Let's clean the database:
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r
CRUD Operations
CREATE:
Node:
• CREATE (n)
• CREATE (n),(m)
• CREATE (n:Person)
• CREATE (n:Person:Swedish)
• CREATE (n:Person { name : 'Andres', title : 'Developer' })
• CREATE (a:Person { name : 'Roman' }) RETURN a
CRUD Operations
CREATE:
Relationships:
• MATCH (a:Person),(b:Person)
  WHERE a.name = 'Roman' AND b.name = 'Andres'
  CREATE (a)-[r:RELTYPE]->(b)
  RETURN r
• MATCH (a:Person),(b:Person)
  WHERE a.name = 'Roman' AND b.name = 'Andres'
  CREATE (a)-[r:RELTYPE { name : a.name + '<->' + b.name }]->(b)
  RETURN r
CRUD Operations
CREATE:
Relationships:
• CREATE p =(andres { name:'Andres' })-[:WORKS_AT]->(neo)<-[:WORKS_AT]-(michael { name:'Michael' })
  RETURN p
CRUD Operations
UPDATE:
Labels and Properties:
• MATCH (n:Person { name : 'Andres' }) SET n :Person:Coder
• MATCH (n:Person { name : 'Andres', title : 'Developer' }) SET n.title = 'Mang'
CRUD Operations
DELETE:
• MATCH (n:Person)
  WHERE n.name = 'Andres'
  DELETE n
• MATCH (n { name: 'Andres' })-[r]-()
  DELETE n, r
• MATCH (n:Person)
  DELETE n
• MATCH (n)
  OPTIONAL MATCH (n)-[r]-()
  DELETE n,r
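The second and fourth DELETE examples remove `r` together with `n` because Neo4j refuses to delete a node that still has relationships attached. The hypothetical `TinyGraph` class below mimics that rule in plain Python, using the Andres/Michael WORKS_AT data created earlier. It is an illustration of the rule, not how Neo4j stores data.

```python
from __future__ import annotations


class TinyGraph:
    """Hypothetical in-memory store that mimics Neo4j's delete rule."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.rels: set[tuple[str, str, str]] = set()  # (start, type, end)

    def delete_node(self, node: str) -> None:
        # Like a plain DELETE n: refuse while relationships are still attached.
        if any(node in (s, e) for s, _, e in self.rels):
            raise ValueError(f"{node} still has relationships")
        self.nodes.discard(node)

    def delete_node_and_rels(self, node: str) -> None:
        # Like MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r:
        # drop the attached relationships first, then the node itself.
        self.rels = {r for r in self.rels if node not in (r[0], r[2])}
        self.delete_node(node)


g = TinyGraph()
g.nodes |= {"Andres", "Michael", "neo"}
g.rels |= {("Andres", "WORKS_AT", "neo"), ("Michael", "WORKS_AT", "neo")}

try:
    g.delete_node("Andres")
except ValueError as err:
    print(err)  # Andres still has relationships

g.delete_node_and_rels("Andres")
print(sorted(g.nodes))  # ['Michael', 'neo']
```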
Functions
Predicates:
• ALL(identifier in collection WHERE predicate)
• ANY(identifier in collection WHERE predicate)
• NONE(identifier in collection WHERE predicate)
• SINGLE(identifier in collection WHERE predicate)
• EXISTS( pattern-or-property )
Scalar Functions:
• LENGTH( collection/pattern expression )
• TYPE( relationship )
• ID( property-container )
• COALESCE( expression [, expression]* )
• HEAD( expression )
• LAST( expression )
• TIMESTAMP()
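The predicate and scalar functions above behave roughly like the Python helpers below. This is a minimal sketch. The function names (`cypher_all`, `coalesce` and so on) are hypothetical, and the sketch ignores Cypher's handling of null inside predicates.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Sequence


def cypher_all(xs: Iterable[Any], pred: Callable[[Any], bool]) -> bool:
    return all(pred(x) for x in xs)  # ALL(x IN xs WHERE pred)


def cypher_any(xs: Iterable[Any], pred: Callable[[Any], bool]) -> bool:
    return any(pred(x) for x in xs)  # ANY(x IN xs WHERE pred)


def cypher_none(xs: Iterable[Any], pred: Callable[[Any], bool]) -> bool:
    return not any(pred(x) for x in xs)  # NONE(x IN xs WHERE pred)


def cypher_single(xs: Iterable[Any], pred: Callable[[Any], bool]) -> bool:
    return sum(1 for x in xs if pred(x)) == 1  # SINGLE(x IN xs WHERE pred)


def coalesce(*values: Any) -> Any:
    # First non-null argument, or null (None) if all are null.
    return next((v for v in values if v is not None), None)


def head(xs: Sequence[Any]) -> Any:
    return xs[0] if xs else None  # HEAD(list): null for an empty list


def last(xs: Sequence[Any]) -> Any:
    return xs[-1] if xs else None  # LAST(list): null for an empty list


ratings = [4, 5, 3]
print(cypher_all(ratings, lambda r: r >= 3))     # True
print(cypher_single(ratings, lambda r: r == 5))  # True
print(cypher_none(ratings, lambda r: r > 5))     # True
print(coalesce(None, "Prabhas"))                 # Prabhas
print(head(ratings), last(ratings))              # 4 3
```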
Functions
Collection Functions:
• NODES( path )
• RELATIONSHIPS( path )
• LABELS( node )
• FILTER(identifier in collection WHERE predicate)
• REDUCE( accumulator = initial, identifier in collection | expression )
Mathematical Functions:
• ABS( expression )
• COS( expression )
• LOG( expression )
• ROUND( expression )
• SQRT( expression )
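REDUCE folds a collection into a single value, much like Python's `functools.reduce`. FILTER keeps the elements that satisfy a predicate, like a list comprehension. The sketch below uses hypothetical ratings and mirrors the `SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2))` expression from the cosine-similarity query in Use Case 2.

```python
import math
from functools import reduce

ratings = [4.0, 5.0, 3.0]  # hypothetical ratings

# REDUCE(acc = 0.0, a IN ratings | acc + a^2)
sum_of_squares = reduce(lambda acc, a: acc + a ** 2, ratings, 0.0)

# SQRT(...) turns the sum of squares into the length of the rating vector
length = math.sqrt(sum_of_squares)

# FILTER(r IN ratings WHERE r > 3)
high = [r for r in ratings if r > 3]

print(sum_of_squares, high)  # 50.0 [4.0, 5.0]
```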
Neo4j in Action
Use Cases
Use case 1: Mumbai Local Train*
Problem
• Four main railway lines: Western, Central, Harbour and Trans-Harbour.
• Each line serves various sections of the city.
• To travel across sections, one must change lines at various interchange stations.
• Find the shortest path from a source station to a destination station.
*https://gist.github.com/luanne/8159102
Use case 1: Mumbai Local Train (contd.)
Solution:
• Create the railway network graph.
• Use a shortest-path algorithm between the source and destination stations.
Use case 1: Mumbai Local Train (contd.)
Graph Database Model:
[Figure: two Station nodes connected by a NEXT relationship]
Use case 1: Mumbai Local Train (contd.)
Create Graph
• Open the file from the link, copy and paste its contents, and run it in Neo4j.
Use case 1: Mumbai Local Train (contd.)
• Query 1: The graph
  match (n) return n
• Query 2: Route from Churchgate to Vashi (follows NEXT relationships only in their stored direction)
  match (s1 {name:"Churchgate"}),(s2 {name:"Vashi"}),p=shortestPath((s1)-[:NEXT*]->(s2)) return p
• Query 3: Route from Santa Cruz to Dockyard Road (ignores relationship direction)
  match (s1 {name:"Santa Cruz"}),(s2 {name:"Dockyard Road"}),p=shortestPath((s1)-[:NEXT*]-(s2)) return p
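Neo4j's `shortestPath` returns a path with the fewest relationships between the two nodes. On an unweighted graph, breadth-first search computes exactly that, so the idea can be sketched in Python as below. The station names and NEXT links are hypothetical, not the real Mumbai network. The `directed` flag mirrors the difference between Query 2 (`-[:NEXT*]->`) and Query 3 (`-[:NEXT*]-`).

```python
from __future__ import annotations

from collections import deque

# Hypothetical network: directed NEXT relationships between made-up stations.
NEXT = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "C")]


def shortest_path(start: str, goal: str, directed: bool = True) -> list[str] | None:
    """Breadth-first search: a path with the fewest hops, or None."""
    adj: dict[str, list[str]] = {}
    for a, b in NEXT:
        adj.setdefault(a, []).append(b)
        if not directed:
            # Undirected pattern -[:NEXT*]- : follow the relationship either way.
            adj.setdefault(b, []).append(a)
    previous: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path("A", "D"))                  # ['A', 'B', 'C', 'D']
print(shortest_path("A", "X"))                  # None (no route along NEXT's direction)
print(shortest_path("A", "X", directed=False))  # ['A', 'B', 'C', 'X']
```

On a real rail network, trains run both ways, so whether Query 2's directed pattern finds a route depends on how the NEXT relationships were created.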
Use Case 2: Movie Recommendation*
Problem:
• We are running an IMDb-type website.
• We have a dataset of movie ratings given by users.
• Our problem is to generate a list of movies to recommend to each user.
* http://www.neo4j.org/graphgist?8173017
Use Case 2: Movie Recommendation (contd.)
Solution:
• Find people who have given similar ratings to the movies both of them have watched.
• Then recommend movies that one of them has not seen and the other has rated highly.
• Cosine similarity to calculate the similarity between users.
• k-Nearest Neighbors for finding similar users.
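Use Case 2: Movie Recommendation (contd.)
• Cosine similarity between users x and y, summed over the movies i that both have rated:
  $\mathrm{sim}(x, y) = \dfrac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$
• K-NN: [Figure: k-nearest-neighbors illustration]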
Use Case 2: Movie Recommendation (contd.)
• Let's create a real dataset together.
• Visit: http://graphlab.byethost7.com/movie_recco/index.php
Use Case 2: Movie Recommendation (contd.)
Dataset:
• Nodes:
  – movies.csv
  – users.csv
• Edges:
  – rating.csv
Extra files we will create:
• movies_header.csv
• users_header.csv
• rating_header.csv
Use Case 2: Movie Recommendation (contd.)
• Import to Neo4j:
$ ./neo4j-import \
    --into /tmp/graph.db \
    --nodes:USER users_header.csv,users.csv \
    --nodes:MOVIES movies_header.csv,movies.csv \
    --relationships:RATING rating_header.csv,rating.csv
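Each `*_header.csv` file holds only the column header line. neo4j-import reads it, followed by the matching data file. Below is a minimal Python sketch that writes these files. The column names (`userId`, `movieId`, `name`, `rating`) and the ID-space names `User` and `Movie` are assumptions, so match them to the actual CSV columns. In the header syntax, `:ID`, `:START_ID` and `:END_ID` mark node identifiers and relationship endpoints. Declaring `rating:float` stores ratings as numbers instead of the default strings, which the arithmetic in the similarity query relies on.

```python
import csv
from pathlib import Path

# Hypothetical column layouts; adjust to the real users.csv, movies.csv and rating.csv.
HEADERS = {
    "users_header.csv": ["userId:ID(User)", "name"],
    "movies_header.csv": ["movieId:ID(Movie)", "name"],
    # Relationship endpoints refer to the ID spaces declared above.
    "rating_header.csv": [":START_ID(User)", ":END_ID(Movie)", "rating:float"],
}


def write_headers(directory: Path) -> None:
    """Write one single-line header CSV per entry in HEADERS."""
    for filename, columns in HEADERS.items():
        with open(directory / filename, "w", newline="") as f:
            csv.writer(f).writerow(columns)
```

For example, `write_headers(Path("."))` writes the three files to the current directory, next to the data files.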
Use Case 2: Movie Recommendation (contd.)
• Query: Add cosine similarity
MATCH (p1:USER)-[x:RATING]->(m:MOVIES)<-[y:RATING]-(p2:USER)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
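For every pair of users, the query computes the cosine of the angle between their rating vectors, restricted to the movies both have rated. Here is the same arithmetic in plain Python on hypothetical ratings. Only Nishant's name and the movie titles come from the slides.

```python
from __future__ import annotations

import math

# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "Nishant": {"Bahubali": 5, "Robot": 4, "Puli": 2},
    "Asha": {"Bahubali": 4, "Robot": 5},
    "Ravi": {"Robot": 1, "Puli": 5},
}


def cosine_similarity(u: str, v: str) -> float | None:
    """Cosine similarity over the movies both users rated, as in the Cypher query."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return None  # the MATCH finds no shared movie, so no SIMILARITY edge
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)          # xyDotProduct
    x_len = math.sqrt(sum(ratings[u][m] ** 2 for m in common))        # xLength
    y_len = math.sqrt(sum(ratings[v][m] ** 2 for m in common))        # yLength
    return dot / (x_len * y_len)


print(round(cosine_similarity("Nishant", "Asha"), 4))  # 0.9756
print(round(cosine_similarity("Nishant", "Ravi"), 4))  # 0.6139
print(cosine_similarity("Asha", "Ravi"))               # 1.0
```

Asha and Ravi share only one movie, so their similarity is exactly 1.0 whatever their (positive) ratings are. Restricting the vectors to co-rated movies has this side effect.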
Use Case 2: Movie Recommendation (contd.)
• Query: See who your nearest neighbors are by similarity
MATCH (p1:USER {name:'Nishant'})-[s:SIMILARITY]-(p2:USER)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity
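Picking the k users with the highest SIMILARITY value is the k-nearest-neighbors step. Below is a minimal Python sketch with made-up similarity values. The `nearest_neighbors` helper is hypothetical.

```python
from __future__ import annotations

# Hypothetical precomputed SIMILARITY values; each pair is undirected.
similarity = {
    frozenset({"Nishant", "Asha"}): 0.98,
    frozenset({"Nishant", "Ravi"}): 0.61,
    frozenset({"Nishant", "Meera"}): 0.85,
    frozenset({"Asha", "Ravi"}): 1.0,
}


def nearest_neighbors(user: str, k: int = 5) -> list[tuple[str, float]]:
    """Mirror of ORDER BY sim DESC LIMIT k over the user's SIMILARITY edges."""
    scored = [
        (next(iter(pair - {user})), sim)  # the other end of the edge
        for pair, sim in similarity.items()
        if user in pair
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


print(nearest_neighbors("Nishant", k=2))  # [('Asha', 0.98), ('Meera', 0.85)]
```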
Use Case 2: Movie Recommendation (contd.)
• Query: Finally, the recommendation
MATCH (b:USER)-[r:RATING]->(m:MOVIES), (b)-[s:SIMILARITY]-(a:USER {name:'Nishant'})
WHERE NOT((a)-[:RATING]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS reco
ORDER BY reco DESC
RETURN movie AS Movie, reco AS Recommendation
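The query scores each movie that Nishant has not rated. The score is the average of the ratings from (at most) the three most similar users who rated that movie. Here is a Python sketch of the same scoring on hypothetical ratings and similarity values. The `recommend` helper is illustrative, not part of Neo4j.

```python
from __future__ import annotations

# Hypothetical data.
ratings = {
    "Nishant": {"Bahubali": 5},
    "Asha": {"Bahubali": 4, "Robot": 5, "Puli": 2},
    "Ravi": {"Robot": 3, "Shrimanthudu": 4},
    "Meera": {"Robot": 4, "Puli": 4},
    "Kiran": {"Robot": 1},
}
similarity_to_nishant = {"Asha": 0.98, "Meera": 0.85, "Ravi": 0.61, "Kiran": 0.30}


def recommend(user: str, sims: dict[str, float], top: int = 3) -> list[tuple[str, float]]:
    seen = ratings[user].keys()
    # movie -> list of (neighbor similarity, neighbor's rating)
    candidates: dict[str, list[tuple[float, float]]] = {}
    for neighbor, sim in sims.items():
        for movie, rating in ratings[neighbor].items():
            if movie not in seen:  # WHERE NOT((a)-[:RATING]->(m))
                candidates.setdefault(movie, []).append((sim, rating))
    scores = []
    for movie, pairs in candidates.items():
        pairs.sort(key=lambda p: p[0], reverse=True)   # ORDER BY similarity DESC
        best = [rating for _, rating in pairs[:top]]   # COLLECT(rating)[0..3]
        scores.append((movie, sum(best) / len(best)))  # average of those ratings
    scores.sort(key=lambda s: s[1], reverse=True)      # ORDER BY reco DESC
    return scores


print(recommend("Nishant", similarity_to_nishant))
# [('Robot', 4.0), ('Shrimanthudu', 4.0), ('Puli', 3.0)]
```

Kiran's rating of 1 for Robot is ignored because Kiran is only the fourth most similar user who rated it. Robot and Shrimanthudu tie at 4.0. Python's stable sort keeps their insertion order, whereas Cypher's ORDER BY does not guarantee an order for ties.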
Use Case 3: Email Analytics*
Overview:
• Framework for analyzing large email datasets
• Capable of performing sentiment analysis and topic extraction on email datasets
• Accessed through a command-line interface
• Incubated at Serendio; now an open-source project
*https://github.com/serendio-labs/email-analytics
Use Case 3: Email Analytics (contd.)
System Architecture:
[Figure: system architecture diagram]
Use Case 3: Email Analytics (contd.)
• Demo
Use Case 3: Email Analytics (contd.)
Possible use cases:
• Keeping track of employees' activities
• Fraud detection
• Data mining for business analytics
Use Case 3: Email Analytics (contd.)
• Come forward and contribute:
• The project needs attention in the following areas:
  – Web UI
  – REST API
  – Unit tests
  – Custom email format support
  – Other features
Neo4j with Other Technologies
Neo4j Integration
Neo4j with Other Technologies
• Data Import
  – LOAD CSV
  – neo4j-import
• Graph Visualization
  – Arrows (Alistair Jones)
  – Alchemy.js (GraphJSON)
  – Neo4j Browser
  – Linkurious
  – KeyLines
  – D3.js
Neo4j Integration
• Apache Spark
• Elasticsearch
• Docker
Conclusion
Graph database technologies like Neo4j have a lot of potential to solve many complex problems. Neo4j is a mature technology that can be used in designing solutions.
Serendio provides Big Data Science Solutions & Services for Data-Driven Enterprises.
Learn more at: serendio.com/index.php/case-studies
Thank You!