1
Introduction to graph databases
Kickoff research project
TU-Ilmenau 11/2011
Henning Rauch
2
Agenda
● Introduction
● Graph databases
● Pros
● Cons
● Use cases
● Sones GraphDB
3
Introduction – /me
● Studied computer science at TU-Ilmenau
● 02/2009 – 11/2010 sones core developer of the sones GraphDB
  ● GraphQL
  ● Type-Management
● 11/2010 – 11/2011 sones Head of R&D
  ● Design of v2
  ● Refactoring of v1 → v2 (de-facto rewrite)
● 11/2011 – now NoSQL freelancer & visiting lecturer
4
Introduction – Current situation
● Data-intensive, complex and distributed applications
  ● Semantic web
  ● Recommendation systems
  ● Social networks
● Similarities
  ● Large amounts of strongly connected data
  ● Complex structures
  ● Continuous growth in data volume
  ● Mix of structured and unstructured (schema-less) data
5
Introduction – Example
http://www.facebook.com/press/info.php?statistics
6
Introduction – Challenges
● Recursively connected information as a new design goal
● Simple management of structured, semi-structured and unstructured data
● Replication
● Versioning
● Efficient partitioning of data
● Graph-oriented operations
7
Graph databases – Data model
● Graph G(V,E)
  ● V – Vertices
  ● E – Edges
[Figure: example graph with the vertices Vertex0 and Vertex1]
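A minimal sketch of the definition G(V,E) using plain Python sets. The slide does not say whether the edge is directed; this sketch assumes a single directed edge from Vertex0 to Vertex1, and the helper `neighbors` is hypothetical.

```python
# V holds the vertex ids, E holds (source, target) pairs.
V = {"Vertex0", "Vertex1"}
E = {("Vertex0", "Vertex1")}  # assumption: one directed edge Vertex0 -> Vertex1


def neighbors(v, edges):
    """Return all vertices reachable from v over a single outgoing edge."""
    return {t for (s, t) in edges if s == v}


# Every edge must connect two vertices that are part of V.
assert all(s in V and t in V for (s, t) in E)

print(neighbors("Vertex0", E))  # {'Vertex1'}
```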
8
Graph databases – Data model
[Figure: weighted graph with the vertices Jena, Berlin and Stuttgart; the edges are labeled with the distances 383 km, 633 km and 260 km]
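The same data model with weighted edges can be sketched as an adjacency map. The slide lists the three distances but the assignment to city pairs shown here (Jena–Berlin 260 km, Jena–Stuttgart 383 km, Berlin–Stuttgart 633 km) is an assumption, as are the helper names `build_adjacency` and `route_length`.

```python
from collections import defaultdict

# Assumption: the pairing of distances to city pairs is inferred, not stated on the slide.
ROADS = [
    ("Jena", "Berlin", 260),
    ("Jena", "Stuttgart", 383),
    ("Berlin", "Stuttgart", 633),
]


def build_adjacency(edges):
    """Store each undirected weighted edge in both directions: city -> {neighbor: km}."""
    adj = defaultdict(dict)
    for a, b, km in edges:
        adj[a][b] = km
        adj[b][a] = km
    return adj


def route_length(adj, route):
    """Sum the edge weights along a route.

    Raises KeyError if two consecutive stops are not directly connected.
    """
    return sum(adj[a][b] for a, b in zip(route, route[1:]))


adj = build_adjacency(ROADS)
print(route_length(adj, ["Berlin", "Jena", "Stuttgart"]))  # 643
print(route_length(adj, ["Berlin", "Stuttgart"]))          # 633
```

Storing both directions keeps lookups symmetric, which matches an undirected road map.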
9
Graph databases – Property graph
● Extension of the graph data model
● Additional properties on vertices and edges
● The properties are key/value pairs (e.g. Age: 23)
● Keys are specified by the schema of the vertex type
[Figure: vertices Alice (ID: 0, Age: 23) and Bob (ID: 1, Age: 42), connected by a CommunicatesWith edge with the properties Encrypted: true and Method: RSA]
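A minimal sketch of the property graph from this slide, assuming a hypothetical vertex type `Person` whose schema declares the keys `ID`, `Name` and `Age`. Vertices reject keys the schema does not declare; edges carry their own key/value properties. The classes are illustrative only and are not the sones API.

```python
from dataclasses import dataclass, field

# Hypothetical schema: allowed property keys per vertex type.
SCHEMA = {"Person": {"ID", "Name", "Age"}}


@dataclass
class Vertex:
    vtype: str
    props: dict

    def __post_init__(self):
        # Reject keys that the schema of this vertex type does not declare.
        unknown = set(self.props) - SCHEMA[self.vtype]
        if unknown:
            raise ValueError(f"unknown keys for {self.vtype}: {sorted(unknown)}")


@dataclass
class Edge:
    label: str
    source: Vertex
    target: Vertex
    props: dict = field(default_factory=dict)  # edge properties, e.g. Encrypted/Method


alice = Vertex("Person", {"ID": 0, "Name": "Alice", "Age": 23})
bob = Vertex("Person", {"ID": 1, "Name": "Bob", "Age": 42})
talk = Edge("CommunicatesWith", alice, bob, {"Encrypted": True, "Method": "RSA"})

print(talk.source.props["Name"], talk.label, talk.target.props["Name"], talk.props["Method"])
# Alice CommunicatesWith Bob RSA
```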
10
Graph databases – Property graph
[Figure: extended property graph. Vertices: Alice (ID: 0, Age: 23), Bob (ID: 1, Age: 42), Carol (ID: 3, Age: 18), TU Ilmenau, Uni Stuttgart. Edges: CommunicatesWith (Encrypted: true, Method: RSA), CommunicatesWith (Encrypted: false), RelativeOf (Degree: Sister), StudiesIn (Since: 2004), StudiesIn (Since: 2007), StudiesIn (Since: 2010)]
11
Graph databases – Definition
A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store information. General graph
databases that can store any graph are distinct from specialized graph databases such as triplestores
and network databases.
http://en.wikipedia.org/wiki/Graph_database
12
Pros – Data model
● Explicit data model
● Direct mapping of real-world network structures
13
Pros – Efficient graph traversal
● The most important operation of graph databases
● Recursive search for vertices/edges with certain properties
● Finding paths in graphs
● sones GraphDB is able to do ~80M vertex traversals per second
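Searching for vertices with certain properties and finding a path to them can be sketched as a breadth-first traversal. The adjacency list and ages below are hypothetical (only the ages come from the earlier property graph), and the traversal is written iteratively with a queue rather than recursively. It returns the shortest path, counted in hops, to the first vertex whose properties satisfy the predicate.

```python
from collections import deque

# Hypothetical graph: vertex -> adjacent vertices, plus per-vertex properties.
ADJ = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carol"],
    "Carol": ["Bob"],
}
PROPS = {"Alice": {"Age": 23}, "Bob": {"Age": 42}, "Carol": {"Age": 18}}


def find_path(start, predicate):
    """Breadth-first traversal from start.

    Returns the shortest path (in hops) to the first vertex whose properties
    satisfy predicate, or None if no reachable vertex matches.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if predicate(PROPS[v]):
            path = []
            while v is not None:  # walk the parent links back to start
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in ADJ[v]:
            if w not in parent:  # visit each vertex at most once
                parent[w] = v
                queue.append(w)
    return None


print(find_path("Alice", lambda p: p["Age"] < 20))  # ['Alice', 'Bob', 'Carol']
```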
14
Pros – Index-free adjacency
● Relations (edges) are modeled directly on the vertex → no need for an additional mapping
● No need for a global index for relations
● Data locality → adjacent vertices can be persisted "close together" (efficient storage)
● → The cost of a vertex traversal is independent of the total size of the graph
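The contrast between index-free adjacency and a global relation index can be sketched as follows. The `Vertex` class follows direct references stored on each vertex, so a step only reads that vertex's local list. The edge table plus dictionary stands in for a relational-style global index over all edges. All names here are hypothetical.

```python
class Vertex:
    """Index-free adjacency: each vertex keeps direct references to its neighbours."""

    __slots__ = ("name", "out")

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent Vertex objects

    def link(self, other):
        self.out.append(other)


def neighbours_index_free(v):
    # One traversal step reads only v's local list: cost depends on v's degree.
    return [w.name for w in v.out]


# For contrast: a relational-style edge table with a global index over all edges.
EDGE_TABLE = [("Alice", "Bob"), ("Bob", "Carol")]
GLOBAL_INDEX = {}
for src, dst in EDGE_TABLE:
    GLOBAL_INDEX.setdefault(src, []).append(dst)


def neighbours_via_index(name):
    # Every step goes through the shared index. In an RDBMS this is typically a
    # B-tree whose lookup cost grows logarithmically with the number of indexed edges.
    return GLOBAL_INDEX.get(name, [])


alice, bob, carol = Vertex("Alice"), Vertex("Bob"), Vertex("Carol")
alice.link(bob)
bob.link(carol)
print(neighbours_index_free(bob))   # ['Carol']
print(neighbours_via_index("Bob"))  # ['Carol']
```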
15
Cons
● In general, imports are slower than in an RDBMS
● Relatively new technology
● Lack of standards
16
Use cases
● Rating of websites in search engines – PageRank
● Who knows whom in social networks – Shortest path
● Recommendation systems – Bipartite matching
● ...
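The PageRank use case can be illustrated with a small power-iteration sketch on a hypothetical three-site link graph. It uses the usual damping factor 0.85 and spreads the rank of pages without outgoing links evenly over all pages. This is a textbook formulation, not a description of how any particular search engine or graph database computes it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to; every target must
    itself be a key. Pages without outgoing links spread their rank evenly.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p] or pages  # dangling page: distribute to all pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


# Hypothetical link graph of three websites.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print({page: round(r, 3) for page, r in ranks.items()})
```

C ends up ranked highest because it receives links from both other sites.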
17
Sones GraphDB – Overview
● http://www.sones.com
● Object-oriented graph database
● Property-hypergraph data model
● Written in C# (97%)
● C# embedded/remote API
● GraphQL
● Non-persistent OSE and proprietary persistent GraphFS
18
Sones GraphDB – Architecture
19
Sones GraphDB – Architecture
20
Sones GraphDB – GraphQL
// define Vertex Type
CREATE VERTEX User
ADD ATTRIBUTES (String Name, SET<User> Friends)
INDICES (Name)
// add vertices Alice and Bob
INSERT INTO User VALUES (Name = "Alice", Age = 23)
INSERT INTO User VALUES (Name = "Bob", Age = 42)
// add edges between Alice and Bob
LINK User(Name = "Alice") VIA Friends TO User(Name = "Bob")
LINK User(Name = "Bob") VIA Friends TO User(Name = "Alice")
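To show what these statements do, here is a hypothetical in-memory analogue in Python. It is not the sones API. A dictionary keyed by `Name` plays the role of `INDICES (Name)`, and names are assumed to be unique. The slide uses two LINK statements, which suggests each LINK adds one directed edge, so the sketch models it that way. For simplicity, `Friends` stores names rather than vertex references.

```python
class UserStore:
    """Hypothetical in-memory analogue of the User vertex type above."""

    def __init__(self):
        self.vertices = []  # all User vertices as attribute dicts
        self.by_name = {}   # stands in for INDICES (Name); assumes unique names

    def insert(self, **attrs):
        vertex = dict(attrs, Friends=set())  # SET<User> Friends starts empty
        self.vertices.append(vertex)
        self.by_name[vertex["Name"]] = vertex
        return vertex

    def link(self, source_name, target_name):
        # Assumption: one LINK adds one directed edge, so a mutual
        # friendship needs two calls.
        self.by_name[source_name]["Friends"].add(target_name)


store = UserStore()
store.insert(Name="Alice", Age=23)
store.insert(Name="Bob", Age=42)
store.link("Alice", "Bob")
store.link("Bob", "Alice")
print(store.by_name["Alice"]["Friends"], store.by_name["Bob"]["Friends"])  # {'Bob'} {'Alice'}
```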
21
Sones GraphDB – How to run it
● Windows: Install Visual Studio (Professional or higher) or MonoDevelop
● Linux: Install mono-complete and MonoDevelop
● Download the source from https://github.com/cosh/sones
● Open "CoreDeveloper.sln"
● Have phun
22
Sones GraphDB – Documentation
● Blog: http://developers.sones.de/
● Wiki: http://developers.sones.de/wiki/doku.php
● Forum: http://forum.sones.de/
● BugTracking: http://jira.sones.de/
● The fastest way to information: /me :)
23
Graph visualization
● http://gephi.org/screenshots/
● http://mbostock.github.com/d3/
● http://www.fluidops.net/information-workbench/