Graph databases, the Web of Data storage engines
Pere Urbón Bayes
Senior Software Engineer
Independent
February of 2010
@purbon
in/purbon
purbon.com
The web of data storage engines - DataDevRoom - Fosdem 2011 2
● We are going to talk about
– Graph databases, facts and definitions.
– Graph database vendors.
– Use cases and applications, graph theory.
“A graph database is a database that uses graph structures with nodes, edges, and properties to
represent and store information.
General graph databases that can store any graph are distinct from specialized graph
databases such as triple stores and network databases.”
Wikipedia
Graph Database: Property graph
● Abstractions
– Nodes
– Relationships
– Properties on both.
Example: John Smith liked http://www.example.com at 01/10/11
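The property-graph model can be sketched in a few lines of Python. This is a toy in-memory structure built around the slide's own example; the names `PropertyGraph`, `add_node`, `add_relationship`, and `outgoing` are hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: int          # id of the start node
    target: int          # id of the end node
    rel_type: str        # label such as "LIKED"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, **properties):
        node = Node(len(self.nodes), properties)
        self.nodes[node.node_id] = node
        return node

    def add_relationship(self, source, target, rel_type, **properties):
        rel = Relationship(source.node_id, target.node_id, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type=None):
        # Linear scan; a real engine would index relationships per node.
        return [
            r for r in self.relationships
            if r.source == node.node_id
            and (rel_type is None or r.rel_type == rel_type)
        ]


# "John Smith liked http://www.example.com at 01/10/11"
graph = PropertyGraph()
john = graph.add_node(name="John Smith")
page = graph.add_node(url="http://www.example.com")
graph.add_relationship(john, page, "LIKED", at="01/10/11")
liked = graph.outgoing(john, "LIKED")
```

Both the nodes and the relationship carry properties here: the name and URL live on the nodes, while the date lives on the `LIKED` relationship itself.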
Graph databases: Facts
[Figure: connectivity over the decades (1990's, 2010's, 2020's). Labels, in order: text files, blogs, social networks, tagging, ontologies, folksonomies, RDF, Linked Data, everything connected.]
Graph databases: Facts
[Figure: "Size of" plotted over the decades (1990's, 2010's, 2020's). Source: http://www.guardian.co.uk/business/2009/may/18/digital-content-expansion]
Graph databases: Facts
[Figure: performance. Labels: unstructured, lists, graph-like structures, semantic web, semantic reasoning, linked data; annotation: performance slowdown.]
Graph databases: Performance comparison

                 RDBMS      OIM        GraphDB
data             27.36 GB   54 GB      9.69 GB
overhead         10.9       21.51      3.86
load             52891 s    17543 s    95579 s

Query            RDBMS      OIM        GraphDB
Q1: count        20.38      17.35      0
Q2: projection   17.34      43.7       33.19
Q3: scan         32.76      174.64     3.14
Q4: values       12.28      20.77      0.01
Q5: select       7.34       5.43       0.84
Q6: hubs         >3hours    >3hours    624.68
Graph databases: Vendors
● Neo4j (neo4j.org)
● Embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.
● Dual-licensed: AGPL and commercial.
● High availability, scalability, concurrency, etc.
Graph databases: Vendors
● InfiniteGraph
● A distributed, scalable, high-performance commercial graph database written in Java, backed by the experience of Objectivity Inc.
● More info: http://www.infinitegraph.com/
Graph databases: Vendors
● OrientDB
● An embedded, pure-Java, fast, transactional, scalable document-graph storage engine.
● Schema free, ACID, support for SQL and JSON.
● Apache License 2.0
● More info: http://www.orientechnologies.com/
Graph databases: More Vendors
● Dex: The high performance graph database.
● HyperGraphDB: An AI and semantic web graph database.
● Infogrid: The Internet graph database.
● Sones: SaaS .NET graph database.
● AllegroGraph: The semantic graph database.
● VertexDB: High performance database server.
Graph Theory: Analytics
● Clustering (Communities)
● Social connections
● Hubs
● Graph Mining
● Centrality measures
● Task planning
● Scheduling
● Process assignment
● Routing
● Logistics
● League planning
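As one concrete instance of the centrality measures and hubs listed above, here is a minimal sketch of degree centrality in Python. It assumes an undirected simple graph given as an edge list (no self-loops or duplicate edges); the function names and the sample data are made up for illustration.

```python
from collections import Counter


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


def hubs(edges, top=1):
    """Return the `top` nodes with the highest degree centrality."""
    scores = degree_centrality(edges)
    # Sort by descending score, breaking ties by node name.
    return sorted(scores, key=lambda node: (-scores[node], node))[:top]


edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
scores = degree_centrality(edges)
top_hub = hubs(edges)
```

Degree is the simplest centrality measure; others, such as betweenness, weigh how often a node lies on shortest paths rather than how many neighbours it has.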
Graph Theory: Applications
● Pattern Recognition
● Dependency analysis
● Impact analysis
● Network flow
– Traffic analysis and optimization
– Delivery optimization
● Optimization of tasks
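Dependency and impact analysis can be illustrated with a small Python sketch: given which components depend on which, find everything transitively affected by a change. The component names and the `impacted_by` helper are hypothetical, chosen only to show the traversal.

```python
from collections import defaultdict, deque


def impacted_by(dependencies, changed):
    """Return the set of components that transitively depend on `changed`.

    `dependencies` maps each component to the components it depends on.
    """
    # Invert the edges: for each component, who depends on it?
    dependents = defaultdict(set)
    for component, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(component)

    # Breadth-first walk over the inverted graph.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


dependencies = {
    "web": ["api"],
    "api": ["db", "cache"],
    "reports": ["db"],
    "cache": [],
    "db": [],
}
impacted = impacted_by(dependencies, "db")
```

Here a change to `db` reaches `api` and `reports` directly and `web` through `api`; the same traversal on the original (non-inverted) edges would answer the dependency question instead.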
Graph-like: Applications
● Recommendations
– Heuristics (PageRank)
– Local
● Shortest Paths
● Hammock Functions
● Walks
● Search algorithms
● Shooting stars
● K-nearest neighbours
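Of the items above, shortest paths are the easiest to show in code. The following Python sketch uses breadth-first search on an unweighted directed graph stored as an adjacency dictionary; the graph and the `shortest_path` name are illustrative assumptions, and a weighted graph would need an algorithm such as Dijkstra's instead.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = current
            if neighbour == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
path = shortest_path(adjacency, "a", "e")
```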
Graph-like: Applications
● Location based services
● Hubs
● Spatial databases
● Logical (multi-)index construction
Web Trending Topics
● Semantic web
– RDF (OWL) Store
– RDF-Sail
– SPARQL
● Linked data (Open Data)
● Link analysis
● Structure mining
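To make the link between RDF and graphs concrete: an RDF statement is a (subject, predicate, object) triple, which can be read as a labelled edge from subject to object. The Python sketch below matches a single triple pattern over a list of triples, which is roughly what one pattern in a SPARQL query does. It is not SPARQL and not a real triple store; the `ex:` identifiers and the `match` helper are invented for illustration.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


triples = [
    ("ex:john", "ex:liked", "http://www.example.com"),
    ("ex:john", "ex:knows", "ex:mary"),
    ("ex:mary", "ex:liked", "http://www.example.com"),
]

# Who liked http://www.example.com?
fans = sorted(
    s for s, _, _ in match(
        triples, predicate="ex:liked", obj="http://www.example.com"
    )
)
```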
Graph databases: Performance

Scale 15
Kernel        DEX       Neo4j     Jena      HyperGraphDB
Load (s)      7,44      697       141       +24h
Scan (s)      0,0010    2,71      0,689
2-Hops (s)    0,0120    0,0260    0,443
BC (s)        14,8      8,24      138
Size (MB)     30        17        207

Scale 20
Kernel        DEX       Neo4j     Jena      HyperGraphDB
Load (s)      317       32.094    4.560     +24h
Scan (s)      0,005     751       18,6
2-Hops (s)    0,033     0,0230    0,4580
BC (s)        617       7027      59512
Size (MB)     893       539       6656

Source: HPC Scalable Graph Analysis Benchmark, IWGD 2010
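For readers unfamiliar with the "2-Hops" row, the Python sketch below shows the general idea of a two-hop traversal: collect every vertex reachable from a start vertex by following at most two edges. This assumes that reading of the row label; the benchmark's exact kernel definition may differ, and the function name and sample graph are illustrative only.

```python
def two_hop_neighbours(adjacency, start):
    """Return vertices reachable from `start` in one or two hops."""
    first = set(adjacency.get(start, ()))
    second = set()
    for node in first:
        second.update(adjacency.get(node, ()))
    # Exclude the start vertex itself if a cycle leads back to it.
    return (first | second) - {start}


adjacency = {1: [2, 3], 2: [4], 3: [1, 5], 4: [6], 5: [], 6: []}
reachable = two_hop_neighbours(adjacency, 1)
```

Vertex 6 is three hops from vertex 1 and is therefore not included.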
Graph databases: XI FOSDEM Dinner
Interested in graph databases and NoSQL, and attending this year's FOSDEM?
Meeting point: 20:00
In front of Le Roy d'Espagne
Grand Place 1
Brussels
Graph databases: Moviepilot is hiring
Interested in movies, data analytics, Ruby, Git, open source? Join us!
Moviepilot is a leading provider of discovery services for movies and TV series, based in Berlin.
Interested? Talk to @jannis or @purbon.
Questions?