Graph Database
General Discussion
Richard Kuo
References
Extracted from:
• http://neo4j.org/, Tobias Ivarsson, Emil Eifrem,
• http://markorodriguez.com, Marko A. Rodriguez
• http://www.jayway.com/, Andreas Ronge
• etc.
4/12/2011 Creative Commons Attribution-Share Alike 3.0 2
Outline
• NoSQL
– What, Why, Who
• Graph Database
– Graph Theory
– Benefit
• Neo4J
– Function & Feature
– Code & Demo
Why ? Not only SQL
Size
• Distributed data with accelerating growth of data
• Scalability & elasticity (at low cost!)
Connectedness
• Global linked data
Semi-structure
• Flexible schemas / semi-structured data
• Complex queries
Architecture
• Data mining and association toward more complex data modeling
• Transactions / strong consistency / integrity
• Geographic distribution (multiple datacenters)
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.html
NoSQL Taxonomy
Key-Value stores
• Simple K/V lookups (DHT)
Column stores
• Each key is associated with many attributes (columns)
• NoSQL column stores are actually hybrid row/column stores
• Different from “pure” relational column stores!
Document stores
• Store semi-structured documents (JSON)
• Map/Reduce based materialization, sorting, aggregation, etc.
Graph databases
• Scalable, semi-structured data model
More …
Graph Database Comparison
http://nosql.mypopescu.com/post/619181345/nosql-graph-database-matrix
GRAPH DATABASE
Why Graph Databases?
Data mining
• You can write algorithms that search for patterns and add AI
Highly critical environments
• You can apply Neo4j to high-load databases to optimize queries and reduce hardware costs
Engineering in biochemical components
• You can write algorithms that help the study of protein synthesis, for example
Discrete event simulation
• You can model patterns and behaviors and store everything in a graph database
Social graph
• Everything related to a user's “tastes” can be organized in a graph
Network architecture
When should I use a Graph DB ?
Massive data volumes
• Massively distributed architecture required to store the data
• Google, Amazon, Yahoo, Facebook – 10-100K servers
Extreme query workload
• Impossible to efficiently do joins at that scale with an RDBMS
Have a complex and evolving data model
• Big part of domain is expressed as relationships
• Schema flexibility (migration) is not trivial at large scale
• Schema changes can be gradually introduced with NoSQL
• Few mandatory and many optional attributes
• Have SQL queries that span many table joins
Many YES => maybe a Graph DB is a good choice
When NOT to use a Graph DB
• Don't have a graph-related problem ?
• Requirements don't change much ?
• Easy to organize data into:
− Tables, Documents or Key-Value models ?
• Few & well-defined relationships in the domain ?
• Don't have SQL queries that span many table joins ?
Many YES => maybe a Graph DB is not a good choice
Undirected Graph
• dots (vertices) + lines (edges) = graphs.
• The Undirected Graph
Vertices
• All vertices denote the same type of object.
Edges
• All edges denote the same type of relationship.
• All edges denote a symmetric relationship.
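The undirected graph described on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the class name `UndirectedGraph` and the sample vertices are assumptions. Each edge is stored in both directions, which is what makes the relationship symmetric.

```python
from collections import defaultdict


class UndirectedGraph:
    """Hypothetical sketch: one vertex type, one symmetric edge type."""

    def __init__(self):
        # Maps each vertex to the set of vertices it shares an edge with.
        self.adjacency = defaultdict(set)

    def add_edge(self, a, b):
        # A symmetric relationship is recorded in both directions.
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def neighbors(self, vertex):
        # .get avoids creating an empty entry for unknown vertices.
        return sorted(self.adjacency.get(vertex, set()))


friends = UndirectedGraph()
friends.add_edge("alice", "bob")
friends.add_edge("bob", "carol")
print(friends.neighbors("bob"))
```

Because every edge means the same thing and has no direction, asking for the neighbors of either endpoint returns the other endpoint.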
Directed, Multi-Relational Graph
Vertices
• Vertices can denote different types of objects.
Edges
• Edges can denote different types of relationships.
• All edges denote an asymmetric relationship.
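A directed, multi-relational graph can be sketched by giving every vertex a type and every edge a label and a direction. The snippet below is a minimal illustration under assumptions: `MultiRelationalGraph`, the labels `KNOWS` and `WROTE`, and the sample data are hypothetical and do not come from the slides.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    label: str   # relationship type (hypothetical examples: "KNOWS", "WROTE")
    target: str


class MultiRelationalGraph:
    """Hypothetical sketch: typed vertices, directed and labeled edges."""

    def __init__(self):
        self.vertex_types = {}  # vertex id -> type of object
        self.edges = []         # list of directed, labeled edges

    def add_vertex(self, vertex_id, vertex_type):
        self.vertex_types[vertex_id] = vertex_type

    def add_edge(self, source, label, target):
        # Stored once, source -> target only: the relationship is asymmetric.
        self.edges.append(Edge(source, label, target))

    def outgoing(self, source, label):
        return [e.target for e in self.edges
                if e.source == source and e.label == label]


graph = MultiRelationalGraph()
graph.add_vertex("alice", "Person")
graph.add_vertex("bob", "Person")
graph.add_vertex("paper-1", "Document")
graph.add_edge("alice", "KNOWS", "bob")
graph.add_edge("alice", "WROTE", "paper-1")
print(graph.outgoing("alice", "KNOWS"))
```

Unlike the undirected case, following `KNOWS` from "bob" returns nothing here, because the edge was only recorded from "alice" to "bob".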
Benefits of Graph Database
• Express your domain as a Graph
− Domain Modeling Friendly
− No O/R mismatch
− Efficient storage of Semi-Structured Information
− Schema-less
• Express Queries as Traversals
− Fast deep traversal instead of slow SQL queries that span many table joins
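To illustrate what "queries as traversals" means, here is a small breadth-first traversal in plain Python. It is a sketch under assumptions, not Neo4j's traversal API: the function `traverse`, the `knows` adjacency map, and the names in it are hypothetical. A "friends of friends" question, which in SQL would typically need a self-join per hop, becomes a walk of depth 2.

```python
from collections import deque


def traverse(adjacency, start, max_depth):
    """Breadth-first traversal returning {vertex: depth} up to max_depth."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depths[vertex] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[vertex] + 1
                queue.append(neighbor)
    return depths


# Hypothetical "knows" relationships, stored as outgoing adjacency lists.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
}
reachable = traverse(knows, "alice", max_depth=2)
friends_of_friends = sorted(v for v, d in reachable.items() if d == 2)
print(friends_of_friends)
```

Going one level deeper only means raising `max_depth`; the query itself does not change shape.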
Semi-structured information
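One way to picture semi-structured information is a set of nodes that share a few mandatory attributes and differ in many optional ones. The sketch below is an assumption-laden illustration: `REQUIRED`, `make_node`, and the sample properties are hypothetical and not taken from the slide.

```python
REQUIRED = {"name"}  # hypothetical mandatory attribute


def make_node(**properties):
    """Build a node as a plain property dictionary, checking only REQUIRED."""
    missing = REQUIRED - set(properties)
    if missing:
        raise ValueError(f"missing mandatory attributes: {sorted(missing)}")
    return dict(properties)


nodes = [
    make_node(name="alice", email="alice@example.com"),
    make_node(name="bob", age=42, city="Stockholm"),
    make_node(name="carol"),
]

# Queries tolerate absent attributes instead of relying on a fixed schema.
with_age = [n["name"] for n in nodes if "age" in n]
print(with_age)
```

No column has to be added, and no placeholder value stored, for the nodes that lack an `age`.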
NEO4J
Why Neo4j ?
• Widely deployed graph DB around the world
• ACID, persistent, embedded/server
• Robust: 24/7 production since 2003
• Mature: lots of production deployments
• Scalable: High Availability, Master failover
• Community: ecosystem of tools, bindings, frameworks
• Product: OSGi, Spatial, RDF, languages
• Available under AGPLv3 and as commercial product
• But the first one is free! For ALL use-cases
DEMO
BACKUP SLIDES
Create Node
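The code shown on this slide is not part of the transcript. As a rough stand-in, the idea of creating nodes and attaching properties can be sketched with a tiny in-memory store. `TinyGraphStore` and its methods are hypothetical and are not Neo4j's API.

```python
import itertools


class TinyGraphStore:
    """Hypothetical in-memory stand-in for a graph database (not Neo4j's API)."""

    def __init__(self):
        self._ids = itertools.count()
        self.nodes = {}  # node id -> property dictionary

    def create_node(self, **properties):
        # Every new node gets the next free id and its own property map.
        node_id = next(self._ids)
        self.nodes[node_id] = dict(properties)
        return node_id

    def set_property(self, node_id, key, value):
        self.nodes[node_id][key] = value


store = TinyGraphStore()
first = store.create_node(message="Hello")
second = store.create_node()
store.set_property(second, "message", "World")
print(store.nodes)
```

A node can be created empty and given properties later, which matches the schema-less model described earlier in the deck.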
Create Relationship & Traverse (1/2)
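The slide's own code is missing from the transcript, so the following is only a sketch of the idea: relationships have a type, a direction, and optional properties, and a traversal follows one relationship type from a start node. `TinyGraph`, the relationship types, and the sample names are hypothetical and are not Neo4j's API.

```python
class TinyGraph:
    """Hypothetical sketch of typed, directed relationships with properties."""

    def __init__(self):
        self.relationships = []  # (start, type, end, properties) tuples

    def create_relationship(self, start, rel_type, end, **properties):
        self.relationships.append((start, rel_type, end, properties))

    def traverse(self, start, rel_type):
        """Depth-first walk along outgoing relationships of one type."""
        visited, stack = [], [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.append(node)
            targets = [end for s, t, end, _ in self.relationships
                       if s == node and t == rel_type]
            # Reverse so the first-created relationship is followed first.
            stack.extend(reversed(targets))
        return visited


g = TinyGraph()
g.create_relationship("alice", "KNOWS", "bob", since=1999)
g.create_relationship("alice", "KNOWS", "carol")
g.create_relationship("carol", "KNOWS", "dave")
g.create_relationship("dave", "WORKS_AT", "acme")
order = g.traverse("alice", "KNOWS")
print(order)
```

The `WORKS_AT` relationship is ignored by this traversal because only `KNOWS` relationships are followed.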
Traverse (2/2)
NeoEclipse