Bcn On Rails May 2010: On Graph Databases
DESCRIPTION
Short introduction to graph databases at Bcn On Rails, May 2010.
TRANSCRIPT
BcnOnRails May - 2010 - On Graph Databases 1
On Graph Databases
Pere Urbón
May of 2010
On Graph Databases
● NoSQL movement.
● Graph databases.
● Pros and cons.
● Use cases.
● Technology overview.
● Example.
NoSQL Movement
● Next Generation of Databases.
● Innovative. (?)
● Open Source. (?)
● Non-Relational.
● Schema-less.
● Distributed.
● Scalable.
NoSQL Movement
● Stores.
– Document.
– Key/Value.
– Object oriented.
– Column.
– Graph database.
● More Stores.
– Grid database.
– XML Database.
– RDF.
– .....
NoSQL Movement
● NoSQL is not the holy grail, never forget it.
● Precursors and roots date back to the late 1960s and early 1970s.
– Network databases, Charles Bachman 1969.
案ずるより産むが易し。– Giving birth to a baby is easier than worrying about it.
Graph Databases
● Strongly related data.
– Social networks.
– GIS.
– Transportation.
– Bibliographic.
– File systems.
– ........
[Figure: GitHub Ruby community by country]
Graph Databases
● The Property Graph.
– Labeled.
– Directed.
– Attributed.
– Multigraph.
● We talk about:
– Nodes with types.
– Edges with types.
– Attributes.
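The property graph model above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: `PropertyGraph`, `Node` and `Edge` are hypothetical names invented here and do not belong to any graph database mentioned in this talk.

```ruby
# A tiny in-memory property graph: typed (labeled) nodes, typed directed
# edges, attributes on both, and parallel edges allowed (multigraph).
# PropertyGraph, Node and Edge are hypothetical names for this sketch.
Node = Struct.new(:id, :type, :attributes)
Edge = Struct.new(:id, :type, :source, :target, :attributes)

class PropertyGraph
  attr_reader :nodes, :edges

  def initialize
    @nodes = {}
    @edges = []
  end

  # Creates a node with a type and a hash of attributes.
  def add_node(type, attributes = {})
    node = Node.new(@nodes.size, type, attributes)
    @nodes[node.id] = node
    node
  end

  # Edges are directed (source -> target); nothing prevents several
  # edges between the same pair of nodes.
  def add_edge(type, source, target, attributes = {})
    edge = Edge.new(@edges.size, type, source, target, attributes)
    @edges << edge
    edge
  end

  # Outgoing edges of a node, optionally filtered by edge type.
  def outgoing(node, type = nil)
    @edges.select { |e| e.source == node && (type.nil? || e.type == type) }
  end
end

graph = PropertyGraph.new
alice = graph.add_node(:person, { name: 'Alice' })
bob   = graph.add_node(:person, { name: 'Bob' })
graph.add_edge(:friends, alice, bob, { since: 1982 })
graph.add_edge(:works_with, alice, bob)

friend_names = graph.outgoing(alice, :friends).map { |e| e.target.attributes[:name] }
```

Scanning the whole edge array in `outgoing` keeps the sketch short; a real store would index edges per node.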
Graph Databases
● Graph storage.
– Adjacency Matrix.
– Adjacency List.
– Incidence Matrix.
– Incidence List.
● Graph DBs.
– Bitmaps.
– B+Trees.
– RB Trees.
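To make the classic storage layouts concrete, here is a minimal plain Ruby sketch that builds three of them for the same small directed graph. The example graph and the sign convention of the incidence matrix (-1 for the source, +1 for the target) are assumptions chosen for this illustration.

```ruby
# Directed example graph with 4 vertices: 0->1, 0->2, 1->2, 2->3.
edges = [[0, 1], [0, 2], [1, 2], [2, 3]]
vertex_count = 4

# Adjacency matrix: V x V cells, O(1) edge lookup, O(V^2) space.
matrix = Array.new(vertex_count) { Array.new(vertex_count, 0) }
edges.each { |from, to| matrix[from][to] = 1 }

# Adjacency list: one neighbor array per vertex, O(V + E) space.
list = Array.new(vertex_count) { [] }
edges.each { |from, to| list[from] << to }

# Incidence matrix: V x E cells, one column per edge.
# Assumed convention: -1 marks the source, +1 the target of each edge.
incidence = Array.new(vertex_count) { Array.new(edges.size, 0) }
edges.each_with_index do |(from, to), e|
  incidence[from][e] = -1
  incidence[to][e] = 1
end

has_edge  = matrix[0][2] == 1 # constant-time edge test on the matrix
neighbors = list[0]           # direct neighbor access on the list
```

The matrix pays for fast edge tests with quadratic space, which is why sparse graphs usually favor list-like layouts.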
Graph Databases
                RDBMS      OIM        DEX
data            27.36 GB   54 GB      9.69 GB
overhead ratio  10.9       21.51      3.86
load time       52891 s    17543 s    95579 s

Query           MySQL      OIM        DEX
Q1: count       20.38      17.35      0
Q2: scan        32.76      174.64     3.14
Q3: select      7.34       5.43       0.84
Q4: projection  17.34      43.7       33.19
Q5: combine     0.74       2.61       0.01
Q6: explode     0.07       202.07     0.01
Q7: values      12.28      20.77      0.01
Q8: hub         >3 hours   >3 hours   624.68
Use cases
● Network analysis.
● Link analysis.
● Graph mining.
● Neural networks.
● Bibliographic search.
● Semantic web.
Use cases
● Algorithmic recruitment with GitHub.
– Centrality: the importance of a vertex within a graph.
– Betweenness: vertices that occur on many shortest paths have higher centrality.
– O(V^3) without any optimization.
● Other possible choices:
– Closeness: vertices with a short geodesic distance to the other ones have a high closeness.
– Usually preferred in network analysis.
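As an illustration of betweenness, the following plain Ruby sketch computes it for an unweighted graph stored as an adjacency list. Note that it uses Brandes' algorithm (one BFS per vertex plus a back-propagation step), which is an optimization and not the unoptimized O(V^3) approach mentioned above; the `betweenness` method name and the example graph are assumptions made for this sketch.

```ruby
# Betweenness centrality for an unweighted graph (Brandes' algorithm).
# `graph` maps every vertex to an array of its neighbors; every neighbor
# must also appear as a key.
def betweenness(graph)
  centrality = {}
  graph.each_key { |v| centrality[v] = 0.0 }

  graph.each_key do |source|
    stack = []                                   # vertices in BFS visiting order
    predecessors = Hash.new { |h, k| h[k] = [] } # previous vertices on shortest paths
    sigma = Hash.new(0)                          # number of shortest paths from source
    distance = { source => 0 }
    sigma[source] = 1
    queue = [source]

    # Phase 1: BFS counting the shortest paths from source.
    until queue.empty?
      v = queue.shift
      stack << v
      graph[v].each do |w|
        unless distance.key?(w)
          distance[w] = distance[v] + 1
          queue << w
        end
        if distance[w] == distance[v] + 1
          sigma[w] += sigma[v]
          predecessors[w] << v
        end
      end
    end

    # Phase 2: accumulate dependencies from the farthest vertices back.
    delta = Hash.new(0.0)
    until stack.empty?
      w = stack.pop
      predecessors[w].each do |v|
        delta[v] += (sigma[v].to_f / sigma[w]) * (1 + delta[w])
      end
      centrality[w] += delta[w] unless w == source
    end
  end
  centrality
end

# Undirected path a - b - c, stored with both directions of each edge.
graph = { a: [:b], b: [:a, :c], c: [:b] }
scores = betweenness(graph)
most_central = scores.max_by { |_, score| score }.first
```

Because the example stores each undirected edge in both directions, every pair of endpoints is counted twice; halve the scores if you want the usual undirected values.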
Graph Databases
● Shortest Paths.
– BFS/DFS.
– Dijkstra.
– Floyd-Warshall.
– Ford.
● Connectivity.
– Strongly connected.
– Weakly connected.
● Centrality.
– Betweenness.
– Closeness.
– Diameter.
– Radius.
● Traversals.
– BFS/DFS.
● Communities.
● Staining.
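As one concrete instance from the list above, this is a minimal sketch of a BFS shortest path (fewest edges) over an adjacency list in plain Ruby. The `shortest_path` method name and the sample graph are hypothetical and only serve the illustration.

```ruby
# Shortest path (fewest edges) between two vertices using breadth-first
# search. `graph` maps each vertex to an array of its neighbors.
def shortest_path(graph, start, goal)
  parent = { start => nil } # also serves as the visited set
  queue = [start]

  until queue.empty?
    current = queue.shift
    break if current == goal

    graph.fetch(current, []).each do |neighbor|
      next if parent.key?(neighbor)

      parent[neighbor] = current
      queue << neighbor
    end
  end
  return nil unless parent.key?(goal) # goal is unreachable

  # Walk the parent links back from the goal to rebuild the path.
  path = []
  node = goal
  until node.nil?
    path.unshift(node)
    node = parent[node]
  end
  path
end

graph = {
  'bcn' => %w[mad par],
  'mad' => %w[lis],
  'par' => %w[ber],
  'ber' => %w[lis],
  'lis' => []
}
route = shortest_path(graph, 'bcn', 'lis')
no_route = shortest_path(graph, 'lis', 'bcn')
```

BFS only finds shortest paths when all edges cost the same; weighted graphs need Dijkstra or one of the other algorithms listed.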
Pros and cons
● Data facts.
– Grows exponentially.
– Huge interdependency and complexity.
– Relationships are important.
– Structure changes over time.
● Relational model facts.
– E. F. Codd's model.
– Normalization.
– Object-Relational impedance mismatch.
– Joins don't scale.
– Big tables.
– Denormalization.
Technology overview
● Neo4J: Open source NoSQL graph database.
● Dex: The high performance graph database.
● HyperGraphDB: An AI and semantic web graph database.
● Infogrid: The Internet Graph database.
● Sones: SaaS .NET graph database.
● VertexDB: High performance database server.
Benchmarking
Scale 15
Kernel             DEX      Neo4j    Jena     HypergraphDB
K1 Load (s)        7.44     697      141      +24h
K2 Scan edges (s)  0.0010   2.71     0.689
K3 2-hops (s)      0.0120   0.0260   0.443
K4 BC (s)          14.8     8.24     138
Db size (MB)       30       17       207

Scale 20
Kernel             DEX      Neo4j    Jena     HypergraphDB
K1 Load (s)        317      32094    4560     +24h
K2 Scan edges (s)  0.005    751      18.6
K3 2-hops (s)      0.033    0.0230   0.4580
K4 BC (s)          617      7027     59512
Db size (MB)       893      539      6656

Graph Database Performance on the HPC Scalable Graph Analysis Benchmark
Technology overview
● Neo4j.rb (targets JRuby)
– Active record integration.
– Dynamic and schema free.
– Fast traversal of relationships.
– Transactions with rollback support.
– Indexing and querying of Ruby objects.
– Massive loaders.
– Ruby on Rails integration.
– Accessible through REST.
http://wiki.neo4j.org/content/Ruby
Technology overview
Creating nodes

    require "rubygems"
    require "neo4j"

    Neo4j::Transaction.run do
      node = Neo4j::Node.new
    end

Transactions over blocks

    Neo4j::Transaction.run do
      # neo4j operations go here
    end

Properties

    node = Neo4j::Node.new
    node[:name] = 'foo'
    node[:age] = 123
    node[:hungry] = false
    node[4] = 3.14
    node[:age] # => 123

Creating relationships

    node1 = Neo4j::Node.new
    node2 = Neo4j::Node.new
    Neo4j::Relationship.new(:friends, node1, node2)

    # which is the same as
    node1.rels.outgoing(:friends) << node2
Technology overview
Accessing relationships

    node1.rels.empty? # => false

    # The rels method returns an enumeration of relationship objects.
    # The nodes method on the relationships returns the nodes instead.
    node1.rels.nodes.include?(node2) # => true

    node1.rels.first                          # => the first relationship this node1 has
    node1.rels.nodes.first                    # => node2, the first node of any relationship type
    node2.rels.incoming(:friends).nodes.first # => node1, the first node of relationship type 'friends'
    node2.rels.incoming(:friends).first       # => a relationship object between node1 and node2

Properties on Relationships

    rel = node1.rels.outgoing(:friends).first
    rel[:since] = 1982
    node1.rels.first[:since] # => 1982
Example
For fun, let's play a little with a graph database.
On Graph Databases
Thank you!
Pere Urbón Bayes