Bcn On Rails May 2010: On Graph Databases

DESCRIPTION

Short introduction to graph databases at Bcn On Rails May 2010.

TRANSCRIPT

Page 1: Bcn On Rails May2010 On Graph Databases

BcnOnRails May - 2010 - On Graph Databases 1

On Graph Databases

Pere Urbón [email protected]

May of 2010

Page 2: Bcn On Rails May2010 On Graph Databases

On Graph Databases

● NoSQL movement.

● Graph databases.

● Pros and cons.

● Use cases.

● Technology overview.

● Example.

Page 3: Bcn On Rails May2010 On Graph Databases

NoSQL Movement

● Next Generation of Databases.

● Innovative. (?)

● Open Source. (?)

● Non-Relational.

● Schema-less.

● Distributed.

● Scalable.

Page 4: Bcn On Rails May2010 On Graph Databases

NoSQL Movement

● Stores.

– Document.

– Key/Value.

– Object oriented.

– Column.

– Graph database.

● More Stores.

– Grid database.

– XML Database.

– RDF.

– .....

Page 5: Bcn On Rails May2010 On Graph Databases

NoSQL Movement

● NoSQL is not the holy grail, never forget it.

● Its precursors and roots date back to the early 1970s.

– Network databases, Charles Bachman, 1969.

案ずるより産むが易し。– Giving birth to a baby is easier than worrying about it.

Page 6: Bcn On Rails May2010 On Graph Databases

Graph Databases

● Strongly related data.

– Social networks.

– GIS Systems.

– Transportation.

– Bibliographic.

– File systems.

– ........

[Figure: GitHub Ruby community by country]

Page 7: Bcn On Rails May2010 On Graph Databases

Graph Databases

● The Property Graph.

– Labeled.

– Directed.

– Attributed.

– Multigraph.

● We talk about:

– Nodes with types.

– Edges with types.

– Attributes.
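The property graph described above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: the `PropertyGraph` class, its method names, and the sample data (alice, Barcelona, the year 2005) are all invented for this sketch and are not the API of Neo4j, DEX, or any other product mentioned in the talk.

```ruby
# Minimal in-memory property graph (hypothetical, for illustration only).
Node = Struct.new(:id, :type, :attributes)
Edge = Struct.new(:type, :from, :to, :attributes)

class PropertyGraph
  attr_reader :nodes, :edges

  def initialize
    @nodes = {}   # id => Node
    @edges = []   # a plain list, so parallel edges are allowed
  end

  # Nodes carry a type (label) and free-form attributes.
  def add_node(id, type, attributes = {})
    @nodes[id] = Node.new(id, type, attributes)
  end

  # Edges are directed (from -> to), typed, and attributed.
  def add_edge(type, from, to, attributes = {})
    edge = Edge.new(type, from, to, attributes)
    @edges << edge
    edge
  end

  # Nodes reached by outgoing edges, optionally filtered by edge type.
  def neighbours(id, type = nil)
    @edges.select { |e| e.from == id && (type.nil? || e.type == type) }
          .map { |e| @nodes[e.to] }
  end
end

# Made-up sample data.
graph = PropertyGraph.new
graph.add_node(:alice, :person, name: 'Alice')
graph.add_node(:bcn, :city, name: 'Barcelona')
graph.add_node(:ruby, :language, name: 'Ruby')
graph.add_edge(:lives_in, :alice, :bcn, since: 2005)
graph.add_edge(:visited, :alice, :bcn)   # second edge on the same pair: multigraph
graph.add_edge(:codes_in, :alice, :ruby)

puts graph.neighbours(:alice, :lives_in).map { |n| n.attributes[:name] }.inspect
```

Keeping the edges in a flat list is the simplest way to allow a multigraph; a real store would index them, which is what the storage structures on the next slide are about.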

Page 8: Bcn On Rails May2010 On Graph Databases

Graph Databases

● Graph storage.

– Adjacency Matrix.

– Adjacency List.

– Incidence Matrix.

– Incidence List.

● GraphDBs.

– Bitmaps.

– B+Trees.

– RB Trees.
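To make the four classic layouts concrete, here is a small sketch that builds each of them for the same three-vertex directed graph. The vertex names and edges are made up, and the sign convention for the incidence matrix (-1 for the source, 1 for the target) is one common choice among several. It does not show how any particular graph database lays out its bitmaps or trees.

```ruby
# The same small directed graph in four in-memory layouts.
vertices = [:a, :b, :c]
edges    = [[:a, :b], [:a, :c], [:b, :c]]   # edge numbers 0, 1, 2
index    = vertices.each_with_index.to_h     # vertex => row/column number

# Adjacency matrix: cell [i][j] is 1 when there is an edge i -> j.
adjacency_matrix = Array.new(vertices.size) { Array.new(vertices.size, 0) }
edges.each { |from, to| adjacency_matrix[index[from]][index[to]] = 1 }

# Adjacency list: each vertex keeps the list of its successors.
adjacency_list = vertices.to_h { |v| [v, []] }
edges.each { |from, to| adjacency_list[from] << to }

# Incidence matrix: one row per vertex, one column per edge;
# -1 marks the source of the edge and 1 marks its target.
incidence_matrix = Array.new(vertices.size) { Array.new(edges.size, 0) }
edges.each_with_index do |(from, to), e|
  incidence_matrix[index[from]][e] = -1
  incidence_matrix[index[to]][e]   = 1
end

# Incidence list: each vertex keeps the numbers of the edges touching it.
incidence_list = vertices.to_h { |v| [v, []] }
edges.each_with_index do |(from, to), e|
  incidence_list[from] << e
  incidence_list[to]   << e
end

puts adjacency_list.inspect
```

The matrices need space proportional to the number of vertices squared (or vertices times edges), while the lists grow only with the number of edges, which is why list-like layouts are the usual starting point for sparse graphs.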

Page 9: Bcn On Rails May2010 On Graph Databases

Graph Databases

                 RDBMS      OIM        DEX
data             27.36 GB   54 GB      9.69 GB
ratio overhead   10,9       21,51      3,86
load time        52891 s    17543 s    95579 s

Query            MySQL      OIM        DEX
Q1: count        20,38      17,35      0
Q2: scan         32,76      174,64     3,14
Q3: select       7,34       5,43       0,84
Q4: projection   17,34      43,7       33,19
Q5: combine      0,74       2,61       0,01
Q6: explode      0,07       202,07     0,01
Q7: values       12,28      20,77      0,01
Q8: hub          >3 hours   >3 hours   624,68

Page 10: Bcn On Rails May2010 On Graph Databases

Graph Databases

Page 11: Bcn On Rails May2010 On Graph Databases

Use cases

● Network analysis.

● Link analysis.

● Graph mining.

● Neural networks.

● Bibliographic search.

● Semantic web.

Page 12: Bcn On Rails May2010 On Graph Databases

Use cases

● Algorithmic recruitment with GitHub.

– Centrality: the importance of a vertex within a graph.

● Betweenness: vertices that occur on many shortest paths have higher centrality.

– O(v^3) without any optimization.

● Other possible choices:

– Closeness: vertices with a short geodesic distance to the other vertices have high closeness.

● Usually preferred in network analysis.
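Both measures can be sketched in plain Ruby for a small unweighted graph. Note the assumptions: the betweenness function below uses Brandes' breadth-first accumulation rather than the unoptimized cubic approach mentioned on the slide, the graph is a hash of adjacency lists in which every vertex appears as a key, and the three developers are invented sample data. For an undirected graph stored with symmetric lists, each pair of endpoints is counted in both directions.

```ruby
# Betweenness centrality (Brandes-style) for an unweighted adjacency list.
def betweenness(graph)
  centrality = graph.keys.to_h { |v| [v, 0.0] }

  graph.each_key do |source|
    stack        = []                          # vertices in order of discovery
    predecessors = Hash.new { |h, k| h[k] = [] }
    paths        = Hash.new(0)                 # number of shortest paths from source
    distance     = { source => 0 }
    paths[source] = 1
    queue = [source]

    # Breadth-first search that counts shortest paths to every vertex.
    until queue.empty?
      v = queue.shift
      stack << v
      graph[v].each do |w|
        unless distance.key?(w)
          distance[w] = distance[v] + 1
          queue << w
        end
        if distance[w] == distance[v] + 1
          paths[w] += paths[v]
          predecessors[w] << v
        end
      end
    end

    # Walk back from the farthest vertices, accumulating dependencies.
    dependency = Hash.new(0.0)
    until stack.empty?
      w = stack.pop
      predecessors[w].each do |v|
        dependency[v] += paths[v].to_f / paths[w] * (1 + dependency[w])
      end
      centrality[w] += dependency[w] unless w == source
    end
  end

  centrality
end

# Closeness: reachable vertices divided by the sum of their distances.
def closeness(graph, source)
  distance = { source => 0 }
  queue = [source]
  until queue.empty?
    v = queue.shift
    graph[v].each do |w|
      next if distance.key?(w)
      distance[w] = distance[v] + 1
      queue << w
    end
  end
  total = distance.values.sum
  total.zero? ? 0.0 : (distance.size - 1).to_f / total
end

# Made-up "follows" graph: ann - bob - cat, stored symmetrically.
developers = { ann: [:bob], bob: [:ann, :cat], cat: [:bob] }
puts betweenness(developers).inspect
puts closeness(developers, :bob)
```

In this toy graph bob sits on the only path between ann and cat, so he gets the highest score on both measures; on a real network the two rankings can differ.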

Page 13: Bcn On Rails May2010 On Graph Databases

Graph Databases

● Shortest Paths.

– BFS/DFS.

– Dijkstra.

– Floyd-Warshall.

– Ford.

● Connectivity.

– Strongly connected.

– Weakly connected.

● Centrality.

– Betweenness.

– Closeness.

– Diameter.

– Radius.

● Traversals.

– BFS/DFS.

● Communities.

● Staining.
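As one example from the list above, here is a minimal sketch of Dijkstra's shortest-path algorithm in plain Ruby. It assumes non-negative edge weights and a graph given as a hash of `{ neighbour => weight }` hashes in which every vertex appears as a key; the sample graph is made up. To stay short it scans for the closest unvisited vertex linearly instead of using a priority queue, so it is a teaching sketch rather than a tuned implementation.

```ruby
# Dijkstra's algorithm on a weighted adjacency list.
# graph: { vertex => { neighbour => weight } }, weights must be non-negative.
def dijkstra(graph, source)
  distance = Hash.new(Float::INFINITY)   # unknown vertices are infinitely far
  distance[source] = 0
  unvisited = graph.keys

  until unvisited.empty?
    # Pick the closest vertex that has not been settled yet.
    current = unvisited.min_by { |v| distance[v] }
    break if distance[current] == Float::INFINITY   # the rest is unreachable
    unvisited.delete(current)

    # Relax every outgoing edge of the settled vertex.
    graph[current].each do |neighbour, weight|
      candidate = distance[current] + weight
      distance[neighbour] = candidate if candidate < distance[neighbour]
    end
  end

  distance
end

# Made-up sample graph.
routes = {
  a: { b: 4, c: 1 },
  b: { d: 5 },
  c: { b: 2 },
  d: {}
}

puts dijkstra(routes, :a).inspect
```

The direct edge from a to b costs 4, but the detour through c costs 3, which is the kind of case where a plain BFS by hop count would give the wrong answer on a weighted graph.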

Page 14: Bcn On Rails May2010 On Graph Databases

Pros and cons

● Data facts.

– Grows exponentially.

– Huge interdependency and complexity.

– Relationships are important.

– Structure changes over time.

● Relational model facts.

– E. F. Codd's model.

– Normalization.

– Object-Relational impedance mismatch.

– Joins don't scale.

– Big tables.

– Denormalization.

Page 15: Bcn On Rails May2010 On Graph Databases

Technology overview

● Neo4j: Open source NoSQL graph database.

● Dex: The high performance graph database.

● HyperGraphDB: An AI and semantic web graph database.

● Infogrid: The Internet Graph database.

● Sones: SaaS .NET graph database.

● VertexDB: High performance database server.

Page 16: Bcn On Rails May2010 On Graph Databases

Benchmarking

Kernel              DEX      Neo4j    Jena     HypergraphDB
K1 Load (s)         7,44     697      141      +24h
K2 Scan edges (s)   0,0010   2,71     0,689
K3 2-hops (s)       0,0120   0,0260   0,443
K4 BC (s)           14,8     8,24     138
Db size (MB)        30       17       207

Scale 15

Kernel              DEX      Neo4j    Jena     HypergraphDB
K1 Load (s)         317      32.094   4.560    +24h
K2 Scan edges (s)   0,005    751      18,6
K3 2-hops (s)       0,033    0,0230   0,4580
K4 BC (s)           617      7.027    59.512
Db size (MB)        893      539      6.656

Scale 20

Graph Database Performance on the HPC Scalable Graph Analysis Benchmark

Page 17: Bcn On Rails May2010 On Graph Databases

Technology overview

Page 18: Bcn On Rails May2010 On Graph Databases

Technology overview

● Neo4j.rb (targets JRuby)

– Active record integration.

– Dynamic and schema free.

– Fast traversal of relationships.

– Transactions with rollback support.

– Indexing and querying of ruby objects.

– Massive loaders.

– Ruby on Rails integration.

– Accessible through REST.

http://wiki.neo4j.org/content/Ruby

Page 19: Bcn On Rails May2010 On Graph Databases

Technology overview

require "rubygems"
require 'neo4j'

Creating nodes

Neo4j::Transaction.run do
  node = Neo4j::Node.new
end

Transactions over blocks

Neo4j::Transaction.run do
  # neo4j operations go here
end

Properties

node = Neo4j::Node.new
node[:name] = 'foo'
node[:age] = 123
node[:hungry] = false
node[4] = 3.14
node[:age] # => 123

Creating relationships

node1 = Neo4j::Node.new
node2 = Neo4j::Node.new
Neo4j::Relationship.new(:friends, node1, node2)

# which is the same as
node1.rels.outgoing(:friends) << node2

Page 20: Bcn On Rails May2010 On Graph Databases

Technology overview

Accessing relationships

node1.rels.empty? # => false

# The rels method returns an enumeration of relationship objects.
# The nodes method on the relationships returns the nodes instead.
node1.rels.nodes.include?(node2) # => true

node1.rels.first # => the first relationship this node1 has.
node1.rels.nodes.first # => node2, first node of any relationship type
node2.rels.incoming(:friends).nodes.first # => node1, first node of relationship type 'friends'
node2.rels.incoming(:friends).first # => a relationship object between node1 and node2

Properties on Relationships

rel = node1.rels.outgoing(:friends).first
rel[:since] = 1982
node1.rels.first[:since] # => 1982

Page 21: Bcn On Rails May2010 On Graph Databases

Example

Just for fun, let's play a little with a graph database.

Page 22: Bcn On Rails May2010 On Graph Databases

On Graph Databases

Thank you!

Pere Urbón Bayes

[email protected]