Survey of Graph Database Models Byoung Ju Yang 2011. 04. 01. IDS Lab., Seoul National University

Page 1: Survey of Graph Database Models

Survey of Graph Database Models

Byoung Ju Yang

2011. 04. 01.

IDS Lab., Seoul National University

Page 2: Survey of Graph Database Models

Copyright 2008 by CEBT

Table of contents

Survey of Graph Database Models

Renzo Angles, Claudio Gutierrez

ACM Computing Surveys, Vol. 40, No. 1, Article 1 (2008)

Data structures, Query languages, and Integrity constraints

1. Introduction

2. Graph Data Modeling

3. Graph Database Models (~2002)

The latest Graph Database Models

Neo4j, FlockDB

Blueprints

Sharding

Page 3: Survey of Graph Database Models

1. Introduction

Page 4: Survey of Graph Database Models

2-1. What is a Graph Data Model?

Data Structure (Schema)

Represented by a graph, or by a data structure that generalizes the notion of a graph (e.g., a hypergraph)

- (un)labeled, (un)directed

Separation between schema and data in most cases.

Data Manipulation (Query languages)

Expressed by graph transformations, or by operations whose main primitives are on graph features like paths, neighborhoods, subgraphs, graph patterns, connectivity, and graph statistics.

Integrity constraints

Enforce graph data consistency
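The data-structure and query aspects listed above can be illustrated with a minimal sketch. It assumes a directed graph with labeled edges stored as adjacency lists; the class name `LabeledDigraph` and its methods are hypothetical and do not come from the survey.

```python
from collections import defaultdict


class LabeledDigraph:
    """A directed graph with labeled edges, stored as adjacency lists."""

    def __init__(self):
        # node -> list of (edge_label, target_node)
        self._out = defaultdict(list)
        self.nodes = set()

    def add_edge(self, source, label, target):
        self.nodes.update((source, target))
        self._out[source].append((label, target))

    def neighbors(self, node, label=None):
        """Return targets of edges leaving `node`, optionally filtered by label."""
        return [t for (l, t) in self._out[node] if label is None or l == label]


g = LabeledDigraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("alice", "works_at", "acme")
g.add_edge("bob", "knows", "carol")

print(g.neighbors("alice"))           # every outgoing neighbor of alice
print(g.neighbors("alice", "knows"))  # only neighbors reached by 'knows' edges
```

The neighborhood lookup is the kind of primitive a graph query language builds on; paths and patterns can be composed from it.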

Page 5: Survey of Graph Database Models

2-2. Why a Graph Data Model?

It allows for a more natural modeling of data

Being able to keep all the information about an entity in a single node and to show related information by arcs connected to it.

Queries can refer directly to this graph structure

Such as finding shortest paths, determining certain subgraphs, and so forth.

For implementation, graph databases may provide special graph storage structures and efficient graph algorithms for realizing specific operations.
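As one example of a query that refers directly to the graph structure, a shortest path (fewest edges) can be found with breadth-first search. This is a minimal sketch over an adjacency dictionary; the function name and the sample data are assumptions made for illustration.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


roads = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
}
print(shortest_path(roads, "A", "E"))
```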

Page 6: Survey of Graph Database Models

2-3. Comparison with other DB Models

Physical DB Models

Hierarchical (1976), network (1976) models

Lack a good abstraction level

Relational DB Models

Introduced a separation between the physical and logical levels

Landmark development (mathematical foundation)

Geared toward simple record-type data (schema is known)

Not easy to integrate different schemas

Query language cannot explore the underlying graph of relationships among the data (paths, neighborhoods, patterns)

Page 7: Survey of Graph Database Models

2-3. Comparison with other DB Models

Semantic DB Models

The DB designer can represent objects and their relations in a natural and clear manner by using high-level abstraction concepts (e.g., E-R)

Relevant to graph DB (graph-like structures)

Object-oriented DB Models

For data-intensive domains (knowledge bases, engineering applications)

Permit much richer structures but still require predefined schema

Related to graph DB (use graph structures in definitions)

Semi-structured DB Models

Irregular, implicit, and partial structures

Page 8: Survey of Graph Database Models

2-4. Motivations and Applications

Motivations

Real-life applications where component interconnectivity is a key feature

Applications

Classical applications

Complex networks

- Social networks (people, groups)

- Information networks (citation, word thesaurus)

- Technological networks (spatial and geographical)

- Biological networks (genomics)

Page 9: Survey of Graph Database Models

3-1. Brief historical overview

Page 10: Survey of Graph Database Models

3-2. Data Structures

Hypernode

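A simple flat graph is not good at presenting information to the user

A hypernode provides inherent support for nesting (a node can itself contain a graph)

Hypergraph

Generalization of a graph (an edge may connect any number of nodes)

A 2-uniform hypergraph, in which every edge connects exactly two nodes, is an ordinary graph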
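The relationship between hypergraphs and graphs can be checked mechanically. The sketch below assumes a hypergraph is given as a list of node sets (one set per hyperedge); the helper names are hypothetical.

```python
def is_k_uniform(hyperedges, k):
    """True if every hyperedge contains exactly k distinct nodes."""
    return all(len(set(edge)) == k for edge in hyperedges)


def to_graph_edges(hyperedges):
    """Convert a 2-uniform hypergraph into a list of ordinary (u, v) edges."""
    if not is_k_uniform(hyperedges, 2):
        raise ValueError("only a 2-uniform hypergraph is an ordinary graph")
    return [tuple(sorted(edge)) for edge in hyperedges]


pairs = [{"a", "b"}, {"b", "c"}]   # every hyperedge has two nodes
triple = [{"a", "b", "c"}]         # one hyperedge connecting three nodes

print(is_k_uniform(pairs, 2))
print(to_graph_edges(pairs))
print(is_k_uniform(triple, 2))
```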

[Figure: example instance with Person1, Person2, and Person3 and their name attributes]

Page 11: Survey of Graph Database Models

3-3. Integrity Constraints

Schema-instance consistency

The instance should contain only concrete entities and relations from entity types and relations that were defined in the schema

Schema-instance separation

In most models there is a separation

An exception is the hypernode (dynamic DB)

Integrity constraints concentrate on the creation of consistent instances and on the correct identification and reference of entities.
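Schema-instance consistency and correct reference of entities can be sketched as a simple validation pass. The schema contents, the function name `check_instance`, and the message wording below are all assumptions made for illustration, not part of any model in the survey.

```python
# Schema: which entity types exist, and which relations are allowed between them.
SCHEMA_TYPES = {"Person", "Company"}
SCHEMA_RELATIONS = {
    ("Person", "works_at", "Company"),
    ("Person", "knows", "Person"),
}


def check_instance(node_types, edges):
    """Return a list of violations of schema-instance consistency.

    node_types: dict mapping node id -> entity type
    edges: iterable of (source id, relation, target id)
    """
    problems = []
    for node, node_type in node_types.items():
        if node_type not in SCHEMA_TYPES:
            problems.append(f"node {node!r} has undefined type {node_type!r}")
    for source, relation, target in edges:
        # Referential check: both endpoints must exist in the instance.
        if source not in node_types or target not in node_types:
            problems.append(f"edge {source!r}-{relation}->{target!r} references a missing node")
            continue
        signature = (node_types[source], relation, node_types[target])
        if signature not in SCHEMA_RELATIONS:
            problems.append(f"edge {source!r}-{relation}->{target!r} is not allowed by the schema")
    return problems


instance_nodes = {"p1": "Person", "p2": "Person", "c1": "Company"}
good_edges = [("p1", "knows", "p2"), ("p1", "works_at", "c1")]
bad_edges = [("c1", "knows", "p1"), ("p1", "knows", "p9")]

print(check_instance(instance_nodes, good_edges))
print(check_instance(instance_nodes, bad_edges))
```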

Page 12: Survey of Graph Database Models

3-4. Query and Manipulation Languages

There is substantial work focused on query languages, the problem of querying graphs, the visual presentation of results, and graphical query languages

Some graph-oriented object models regard database transformations as graph transformations based on graph-pattern matching

GOOD, GOAL, etc.
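Graph-pattern matching, the basis of such transformations, can be sketched with a small backtracking matcher. This is not the GOOD or GOAL language; it assumes edges are (source, label, target) triples and that terms starting with `?` are variables, both conventions chosen here for illustration.

```python
def match(pattern, edges, binding=None):
    """Yield every variable binding under which all pattern triples occur in edges.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for edge in edges:
        new_binding = dict(binding)
        for term, value in zip(first, edge):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if new_binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, edges, new_binding)


edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "works_at", "acme"),
]
# Pattern: ?x knows ?y, and ?y knows ?z (a two-step 'knows' chain).
pattern = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
results = list(match(pattern, edges))

# A transformation in this style adds an edge for each match found.
new_edges = [(b["?x"], "friend_of_friend", b["?z"]) for b in results]
print(results)
print(new_edges)
```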

Page 13: Survey of Graph Database Models

3. Summary

Page 14: Survey of Graph Database Models

NoSQL Databases

Schema-less

Shared nothing architecture

Each server uses only its own local storage (faster)

Elasticity

Able to add servers without downtime

Sharding

Asynchronous replication

BASE instead of ACID

Page 15: Survey of Graph Database Models

NoSQL Database Models

Page 16: Survey of Graph Database Models

Graph Database Models

Scalability

ACID vs. BASE

Complexity

Relational - no redundancy or information loss (normalization)

powerful SQL, optimization by the RDBMS

- performance problems in deep queries (many joins)

no schema evolution, etc.

Graph – property graph model
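A minimal sketch of the property graph model follows: nodes and edges both carry key/value properties, and a multi-hop query follows adjacency lists instead of joining a table with itself once per hop. The data, the `KNOWS` label, and the function name are assumptions made for illustration.

```python
# Property graph: nodes and edges both carry key/value properties.
nodes = {
    "alice": {"age": 31},
    "bob": {"age": 27},
    "carol": {"age": 45},
    "dave": {"age": 38},
}
edges = [
    ("alice", "KNOWS", "bob", {"since": 2009}),
    ("bob", "KNOWS", "carol", {"since": 2010}),
    ("carol", "KNOWS", "dave", {"since": 2011}),
]

# Adjacency index: following an edge is a dictionary lookup, not a table join.
adjacency = {}
for source, label, target, _props in edges:
    adjacency.setdefault((source, label), []).append(target)


def within_hops(start, label, max_hops):
    """Return every node reachable from start in 1..max_hops steps along label."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Each hop touches only the adjacency lists of the current frontier.
        frontier = {t for n in frontier for t in adjacency.get((n, label), ())} - seen
        seen |= frontier
    return seen - {start}


print(sorted(within_hops("alice", "KNOWS", 3)))
```

In a relational design the same three-hop question would typically need three self-joins on a friendship table, which is the deep-query cost the slide refers to.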

Page 17: Survey of Graph Database Models

The latest Graph Database Models

AllegroGraph RDFStore

HyperGraphDB

InfoGrid

Neo4j

FlockDB

Sones

Virtuoso

Page 18: Survey of Graph Database Models

The latest Graph Database Models

License

Distribution

The only truly distributed solution is HyperGraphDB

Indexing

In Neo4j, indexing is not the default behavior (indexing is provided by Lucene or Solr)

Storage system

General vs. Special

HyperGraphDB uses Berkeley DB

APIs

Most of them provide Java and web APIs

Page 19: Survey of Graph Database Models

Neo4j

Fully ACID-compliant transactional graph DB written in Java

High performance

Handles several billion nodes, relationships and properties

1 to 2 million traversals per second

- constant time (independent of total size)

Example code

Node creation

Find friend

Page 20: Survey of Graph Database Models

Neo4j

Example code

Traversal

Indexing

Page 21: Survey of Graph Database Models

Neo4j

Page 22: Survey of Graph Database Models

FlockDB

Goals

High rate of add/update/remove operations

Complex set arithmetic queries

Paging through query result sets containing millions of entries

Ability to ‘archive’ and later restore archived edges

Horizontal scaling including replication

Non-goals

Multi-hop queries (or graph-walking queries)

Automatic shard migrations

Characteristics

Optimized for very large adjacency lists (no traversal)
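Two of the goals above, set arithmetic over adjacency lists and paging through large results, can be sketched in a few lines. This is an in-memory illustration, not FlockDB's API; the `page` function, its cursor convention (the last id already returned), and the sample data are assumptions.

```python
from bisect import bisect_right


def page(sorted_ids, cursor=None, count=2):
    """Return (items, next_cursor) for one page of a sorted id list.

    cursor is the last id already returned; None starts from the beginning.
    """
    start = 0 if cursor is None else bisect_right(sorted_ids, cursor)
    items = sorted_ids[start:start + count]
    next_cursor = items[-1] if start + count < len(sorted_ids) else None
    return items, next_cursor


followers = {
    "a": {1, 2, 3, 4, 5},
    "b": {2, 4, 5, 6},
}
# Set arithmetic: users who follow both a and b.
common = sorted(followers["a"] & followers["b"])
# Users who follow a but not b.
only_a = sorted(followers["a"] - followers["b"])

first_page, cursor = page(common)
second_page, end = page(common, cursor)
print(first_page, second_page)
```

Both operations work on single adjacency lists, so no multi-hop traversal is needed, which matches the stated non-goals.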

Page 23: Survey of Graph Database Models

FlockDB - Twitter

Previous models (could not have both)

Relational tables – handling write operations

Key-value storage – paging through giant result sets

Implementation goals

Write the simplest possible thing that could work

Use off-the-shelf MySQL as the storage engine

Allow horizontal partitioning

Allow write operations to arrive out of order or be processed more than once (allow redundant work rather than lost work)

Twitter (April 2010)

More than 13 billion edges, 20k writes/second, 100k reads/second

Page 24: Survey of Graph Database Models

FlockDB - Twitter

Stores graphs as sets of edges

Primary key

(a compound key of the source ID, state, and position)

When an edge is deleted, the row is just marked ‘removed’ without being deleted from MySQL

Keep only a compound primary key and a secondary index for each row, and answer all queries from a single index.
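The idea of flagging rows instead of deleting them, combined with tolerance for out-of-order or repeated writes, can be sketched as follows. This is an in-memory stand-in, not FlockDB code: the class `EdgeStore`, the `updated_at` timestamp, the state name "normal", and the rule that the newest write wins are assumptions chosen to show one way such writes can be made safe to reorder and repeat.

```python
class EdgeStore:
    """Edges keyed by (source, destination); rows are flagged, never deleted."""

    def __init__(self):
        # (source, destination) -> (state, updated_at)
        self.rows = {}

    def apply(self, source, destination, state, updated_at):
        """Apply a write; a row changes only if the write is newer than what is stored."""
        key = (source, destination)
        current = self.rows.get(key)
        if current is None or updated_at > current[1]:
            self.rows[key] = (state, updated_at)

    def destinations(self, source):
        """Return destinations of edges from source that are in the 'normal' state."""
        return sorted(d for (s, d), (state, _) in self.rows.items()
                      if s == source and state == "normal")


store = EdgeStore()
store.apply(1, 2, "removed", updated_at=20)  # the later operation arrives first
store.apply(1, 2, "normal", updated_at=10)   # the earlier add arrives late and is ignored
store.apply(1, 3, "normal", updated_at=15)
store.apply(1, 3, "normal", updated_at=15)   # a duplicate delivery changes nothing
print(store.destinations(1))
```

Because the removed edge keeps its row, it can later be restored by another write with a newer timestamp.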

Page 25: Survey of Graph Database Models

Sharding in Graph DB

Especially hard in graph DB due to traversal

Unless we store the entire graph on a single machine, we are forced to query across machine boundaries (expensive)

Neo4j provides a master/slave structure (which still has limits)

FlockDB (Twitter) does not address this (it is interested only in one-level relations)
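The cost of crossing machine boundaries can be made concrete by counting the edges whose endpoints land on different shards. The sketch below compares two hypothetical assignments on a small ring graph; the function name and both assignments are assumptions made for illustration.

```python
def cut_size(edges, assignment):
    """Count edges whose endpoints are assigned to different shards."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# A ring of 8 nodes: 0-1, 1-2, ..., 7-0.
ring = [(i, (i + 1) % 8) for i in range(8)]

by_hash = {i: i % 2 for i in range(8)}       # modulo hashing ignores graph structure
by_locality = {i: i // 4 for i in range(8)}  # neighbouring nodes are kept together

print(cut_size(ring, by_hash))
print(cut_size(ring, by_locality))
```

Every edge that is cut becomes a remote call during traversal, so an assignment that keeps neighbours together reduces the number of cross-machine hops.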

Page 26: Survey of Graph Database Models

How to shard?

A proposal: gravity

Localizing data leads to greater performance (like cache)

Shard graph data based on gravity

Page 27: Survey of Graph Database Models

Blueprints

A collection of interfaces, etc. for the property graph DB model

Analogous to JDBC, but for graph DBs

Provides a common set of interfaces that lets developers plug and play their graph DB backend (related tools: Pipes, Gremlin, Rexster)
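The plug-and-play idea can be sketched as an abstract interface with interchangeable backends. The sketch is written in Python for illustration and does not reproduce the actual Blueprints (Java) interfaces; `Graph`, `InMemoryGraph`, and their method names are hypothetical.

```python
from abc import ABC, abstractmethod


class Graph(ABC):
    """Hypothetical minimal interface that every backend must implement."""

    @abstractmethod
    def add_vertex(self, vertex_id, **properties): ...

    @abstractmethod
    def add_edge(self, out_id, label, in_id): ...

    @abstractmethod
    def out_neighbors(self, vertex_id, label): ...


class InMemoryGraph(Graph):
    """One possible backend: plain Python containers."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, out_id, label, in_id):
        self.edges.append((out_id, label, in_id))

    def out_neighbors(self, vertex_id, label):
        return [i for (o, l, i) in self.edges if o == vertex_id and l == label]


def friends_of(graph: Graph, person):
    """Application code depends only on the interface, not on the backend."""
    return graph.out_neighbors(person, "knows")


backend = InMemoryGraph()
backend.add_vertex("alice", age=31)
backend.add_vertex("bob", age=27)
backend.add_edge("alice", "knows", "bob")
print(friends_of(backend, "alice"))
```

Swapping in another backend only requires a second class that implements the same three methods; `friends_of` stays unchanged.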