Page 1: Graph Databases, The Web of Data Storage Engines

Graph databases, the Web of Data storage engines

Pere Urbón Bayes, Senior Software Engineer

Independent

February 2010

@purbon

in/purbon

purbon.com

The web of data storage engines - DataDevRoom - Fosdem 2011 2

Graph databases, the Web of Data storage engines

● We are going to talk about:

– Graph databases: facts and definitions.

– Graph database vendors.

– Use cases and applications; graph theory.

Graph databases, the Web of Data storage engines

“A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store information. General graph databases that can store any graph are distinct from specialized graph databases such as triple stores and network databases.”

Wikipedia

Graph database: Property graph

● Abstractions:

– Nodes

– Relationships

– Properties on both.

Example: John Smith liked http://www.example.com at 01/10/11.
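The three abstractions above can be sketched as a tiny in-memory structure. This is a minimal Python sketch, not any vendor's API: the classes `Node`, `Relationship` and `PropertyGraph`, the `LIKED` relationship type and the property names are hypothetical choices used to encode the slide's example (the date is kept as the string shown on the slide, since its format is ambiguous).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy property graph: nodes, typed relationships, properties on both."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.relationships: List[Relationship] = []

    def add_node(self, **properties: Any) -> Node:
        node = Node(len(self.nodes), dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, end: Node, rel_type: str, **properties: Any) -> Relationship:
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, rel_type: Optional[str] = None) -> List[Relationship]:
        # Linear scan to keep the sketch short; a real engine keeps per-node adjacency lists.
        return [r for r in self.relationships
                if r.start is node and (rel_type is None or r.rel_type == rel_type)]


# The slide's example (names of types and properties are hypothetical).
g = PropertyGraph()
john = g.add_node(name="John Smith")
page = g.add_node(url="http://www.example.com")
g.relate(john, page, "LIKED", at="01/10/11")

for rel in g.outgoing(john, "LIKED"):
    print(rel.start.properties["name"], "liked", rel.end.properties["url"],
          "at", rel.properties["at"])
# prints: John Smith liked http://www.example.com at 01/10/11
```

A disk-based engine would also index nodes by property and store adjacency per node, so following a relationship does not require scanning every edge; the scan here only keeps the example small.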

Graph databases: Facts

Figure: Connectivity of data over the decades (1990s, 2010s, 2020s), with labels ranging from text files, blogs, social networks, tagging, ontologies and folksonomies to RDF/Linked Data and "everything connected".

Graph databases: Facts

Figure: Size of … over the decades (1990s, 2010s, 2020s). Source: http://www.guardian.co.uk/business/2009/may/18/digital-content-expansion

Graph databases: Facts

Figure: Performance (with a marked "performance slowdown") plotted against data ranging from lists to unstructured, graph-like structures such as the Semantic web, semantic reasoning and Linked data.

Graph databases: Performance comparison

                RDBMS        OIM          GraphDB
data            27.36 GB     54 GB        9.69 GB
overhead        10.9         21.51        3.86
load            52891 s      17543 s      95579 s

Query           RDBMS        OIM          GraphDB
Q1: count       20.38        17.35        0
Q2: projection  17.34        43.7         33.19
Q3: scan        32.76        174.64       3.14
Q4: values      12.28        20.77        0.01
Q5: select      7.34         5.43         0.84
Q6: hubs        >3 hours     >3 hours     624.68

Graph databases: Vendors

● Neo4j (neo4j.org)

● Embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.

● Dual-licensed: AGPL and commercial.

● High availability, scalability, concurrency, etc.

Graph databases: Vendors

● InfiniteGraph

● A distributed, scalable, high-performance commercial graph database written in Java, built on the experience of Objectivity Inc.

● More info: http://www.infinitegraph.com/

Graph databases: Vendors

● OrientDB

● A fast, transactional, scalable document-graph storage engine, written in pure Java and embeddable.

● Schema-free, ACID, support for SQL and JSON.

● Apache License 2.0

● More info: http://www.orientechnologies.com/

Graph databases: More vendors

● DEX: The high-performance graph database.

● HyperGraphDB: An AI and semantic web graph database.

● InfoGrid: The Internet graph database.

● Sones: SaaS .NET graph database.

● AllegroGraph: The semantic graph database.

● VertexDB: High-performance database server.

Graph theory: Analytics

● Clustering (communities)

● Social connections

● Hubs

● Graph mining

● Centrality measures

● Task planning

● Scheduling

● Process assignment

● Routing

● Logistics

● League planning
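Hubs and centrality measures from the list above can be illustrated with the simplest centrality measure, degree centrality: a node's number of neighbours divided by n - 1, where n is the number of nodes. In this minimal Python sketch a hub is assumed to be any node whose degree reaches a chosen threshold; the threshold, the function names and the small example network are hypothetical, and other measures (betweenness, PageRank) can rank nodes differently.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_undirected(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Adjacency sets for an undirected graph; self-loops and duplicate edges are ignored.
    Only nodes with at least one non-loop edge are included."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_centrality(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Degree of each node normalised by the maximum possible degree, n - 1."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def hubs(adj: Dict[str, Set[str]], min_degree: int) -> List[str]:
    """Nodes with at least `min_degree` neighbours (hypothetical hub rule), highest degree first."""
    found = [node for node, nbrs in adj.items() if len(nbrs) >= min_degree]
    return sorted(found, key=lambda node: (-len(adj[node]), node))


# Hypothetical social network.
edges = [("ana", "bob"), ("ana", "carl"), ("ana", "dora"), ("ana", "eve"),
         ("bob", "carl"), ("eve", "fred")]
adj = build_undirected(edges)
print(hubs(adj, min_degree=3))        # ['ana']
print(degree_centrality(adj)["ana"])  # 0.8
```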

Graph theory: Applications

● Pattern recognition

● Dependency analysis

● Impact analysis

● Network flow

– Traffic analysis and optimization

– Delivery optimization

● Optimization of tasks
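Dependency and impact analysis both reduce to walking directed edges. A minimal Python sketch, assuming dependencies are given as hypothetical (item, depends-on) pairs: Kahn's topological sort yields an order in which every item follows its dependencies (and detects cycles), and a reverse traversal answers "what is affected if this item changes?". The item names are illustrative only.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Each pair (a, b) reads "a depends on b" (an assumed input convention).
Dependency = Tuple[str, str]


def topological_order(deps: List[Dependency]) -> List[str]:
    """Kahn's algorithm: an order in which every item comes after its dependencies."""
    dependents: Dict[str, Set[str]] = defaultdict(set)  # b -> items that depend on b
    unmet: Dict[str, int] = {}                           # item -> unmet dependency count
    for a, b in deps:
        unmet.setdefault(a, 0)
        unmet.setdefault(b, 0)
        if a not in dependents[b]:  # ignore duplicate pairs
            dependents[b].add(a)
            unmet[a] += 1
    ready = deque(sorted(item for item, count in unmet.items() if count == 0))
    order: List[str] = []
    while ready:
        item = ready.popleft()
        order.append(item)
        for dependent in sorted(dependents[item]):
            unmet[dependent] -= 1
            if unmet[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(unmet):
        raise ValueError("dependency cycle detected")
    return order


def impacted_by(deps: List[Dependency], changed: str) -> Set[str]:
    """Impact analysis: every item that directly or transitively depends on `changed`."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for a, b in deps:
        dependents[b].add(a)
    impacted: Set[str] = set()
    stack = [changed]
    while stack:
        for dependent in dependents[stack.pop()]:
            if dependent not in impacted:
                impacted.add(dependent)
                stack.append(dependent)
    return impacted


# Hypothetical component dependencies.
deps = [("app", "web"), ("app", "db"), ("web", "http"), ("db", "driver")]
print(topological_order(deps))              # ['driver', 'http', 'db', 'web', 'app']
print(sorted(impacted_by(deps, "driver")))  # ['app', 'db']
```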

Graph-like applications

● Recommendations

– Heuristics (PageRank)

– Local

● Shortest paths

● Hammock functions

● Walks

● Search algorithms

● Shooting stars

● K-nearest neighbours
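PageRank, listed above as a recommendation heuristic, scores each node by the rank flowing into it along links. A minimal power-iteration sketch in Python, assuming a damping factor of 0.85, dangling nodes (no out-links) that spread their rank uniformly, and a small hypothetical link graph; this is the textbook formulation, not a tuned recommender.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph; the returned ranks sum to 1."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical link graph: node -> nodes it points to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b', 'd']
```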

Graph-like applications

● Location-based services

● Hubs

● Spatial databases

● Logical (multi-)index construction

Web trending topics

● Semantic web

– RDF (OWL) store

– RDF-Sail

– SPARQL

● Linked data (Open Data)

● Link analysis

● Structure mining
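An RDF store holds data as (subject, predicate, object) triples, and a SPARQL basic graph pattern is answered by matching triple patterns and joining their variable bindings. This Python sketch shows only that core idea and is not RDF-Sail or any real SPARQL engine: prefixed names such as `foaf:knows` are plain strings with no namespace expansion, variables are assumed to be strings starting with `?`, and `ex:liked` and the data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> matched term


def is_var(term: str) -> bool:
    return term.startswith("?")  # assumed variable convention


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:  # variable already bound to a different term
                return None
            extended[p] = t
        elif p != t:
            return None
    return extended


def query(store: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Answer a basic graph pattern: all triple patterns must match with consistent variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in store:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data.
store = [
    ("ex:john", "foaf:name", "John Smith"),
    ("ex:john", "ex:liked", "http://www.example.com"),
    ("ex:mary", "foaf:name", "Mary"),
    ("ex:mary", "foaf:knows", "ex:john"),
]
# Roughly: SELECT ?name WHERE { ?p foaf:knows ?q . ?q ex:liked <http://www.example.com> . ?p foaf:name ?name }
results = query(store, [
    ("?p", "foaf:knows", "?q"),
    ("?q", "ex:liked", "http://www.example.com"),
    ("?p", "foaf:name", "?name"),
])
print([b["?name"] for b in results])  # ['Mary']
```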

Graph databases: Performance

Kernel Scale 15

                DEX       Neo4j     Jena      HyperGraphDB
Load (s)        7.44      697       141       >24 h
Scan (s)        0.0010    2.71      0.689     -
2-hops (s)      0.0120    0.0260    0.443     -
BC (s)          14.8      8.24      138       -
Size (MB)       30        17        207       -

Kernel Scale 20

                DEX       Neo4j     Jena      HyperGraphDB
Load (s)        317       32094     4560      >24 h
Scan (s)        0.005     751       18.6      -
2-hops (s)      0.033     0.0230    0.4580    -
BC (s)          617       7027      59512     -
Size (MB)       893       539       6656      -

BC: betweenness centrality.

Source: HPC Scalable Graph Analysis Benchmark, IWGD 2010
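The 2-hops kernel in the table above measures a bounded traversal. Assuming it amounts to collecting the nodes within two edges of a starting point (see the cited benchmark for its exact definition), the core operation is a depth-limited breadth-first search, which a graph database performs by following adjacency lists instead of joining tables. A minimal Python sketch; the function name and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, List


def within_hops(adj: Dict[str, List[str]], start: str, max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first search limited to `max_hops`; maps each reached node to its hop distance."""
    dist = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                frontier.append(nbr)
    return dist


# Hypothetical directed graph as adjacency lists.
adj = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
}
print(sorted(within_hops(adj, "a").items()))
# [('a', 0), ('b', 1), ('c', 1), ('d', 2), ('e', 2)]
```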

Graph databases: XI FOSDEM dinner

Interested in graph databases and NoSQL, and attending FOSDEM this year?

Meeting point: 20:00

In front of Le Roy d'Espagne, Grand Place 1

Brussels

Graph databases: Moviepilot is hiring

Interested in movies, data analytics, Ruby, Git, open source? Join us!

Moviepilot is a leading provider of discovery services for movies and TV series, based in Berlin.

Interested? Talk to @jannis or @purbon.

Graph databases, the Web of Data storage engines

Questions?

Pere Urbón Bayes, Senior Software Engineer

Independent

February 2010