graphite: an extensible graph traversal framework for relational database management systems

30
GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems Marcus Paradies *, Wolfgang Lehner*, and Christof Bornhoevd° | SSDBM‘ 15 *TU Dresden °Risk Management Solutions, Inc.

Upload: marcus-paradies

Post on 21-Aug-2015

36 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems

Marcus Paradies*, Wolfgang Lehner*, and Christof Bornhoevd° | SSDBM‘ 15*TU Dresden °Risk Management Solutions, Inc.

Page 2: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

2

Graph Processing on Enterprise Data

Relational + Application Logic

Relational + Graph + Application Logic

Data already in RDBMSSQL as the only interface/no graph abstractionData transfer to application

Efficient processing in GDBMSProcessing on replicated dataData transfer to applicationNo combination with other data models possible

Page 3: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

3

Integration of Graph Processing into an RDBMS

How could a deep integration of graph functionality into an RDBMS look like?

Graph operators can be seamlessly combined with other plan operators

Page 4: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

4

Columnar Graph Storage

All available data types can be used as vertex/edge attributes

Page 5: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

5

Columnar Graph Storage

All available data types can be used as vertex/edge attributes

Lightweight compression techniques (RLE)

Page 6: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

6

Graph Traversal Workflow

Traversal Configuration

Traversal Kernels

• S – set of start vertices

• φ – edge predicate• c – collection

boundary• r – recursion boundary• d – traversal direction

Pluggable physical traversal kernels as implementations of a logical traversal operator

Page 7: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

7

Graph Traversal Formalism

 FORMAL DESCRIPTION (SET-BASED)

• A traversal operation is a totally ordered set P of path steps

• Each path step pi receives a vertex set Di-1 discovered at level (i-1) and returns a set of adjacent vertices Di (1 i r, r is recursion boundary)

• Initally, D0 = Di = • The final output R is defined asR =

Target vertices

Visitedvertices

Page 8: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

8

Graph Traversals by Example

Traversal Configuration Result

{ { A }, “type=‘a‘“, 1, 1, } { B, C, D }

Root vertex

Discovered vertex

Page 9: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

9

Graph Traversals by Example

Traversal Configuration Result

{ { E }, “type=‘b‘“, 2, 2, } { D }

Root vertex

Discovered vertex

Page 10: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

10

Graph Traversals by Example

Traversal Configuration Result

{ { A }, “type=‘a‘“, 1, ∞, } { B, C, D, F}

Root vertex

Discovered vertex

Page 11: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

11

Graph Traversals by Example

Traversal Configuration Result

{ {A}, “type=‘a‘ OR type=‘b‘“, 2, 2, }

{ E, F }

Root vertex

Discovered vertex

Page 12: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

12

Graph Traversals by Example

Traversal Configuration Result

{ { A }, “type=‘a‘“, 0, 1, } { A, B, C, D }

Root vertex

Discovered vertex

Page 13: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

13

Level-Synchronous (LS) Traversal

SCAN-BASED GRAPH TRAVERSAL

Scan partitions (dictionary-encoded)

Page 14: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

14

Level-Synchronous (LS) Traversal

SCAN-BASED GRAPH TRAVERSAL

Keep (intermediate) results as bitsets

Page 15: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

15

Level-Synchronous (LS) Traversal

SCAN-BASED GRAPH TRAVERSAL

Fetch neighbors by

position

Page 16: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

16

Level-Synchronous (LS) Traversal

SCAN-BASED GRAPH TRAVERSAL

Set operations on vectorized bitsets

Page 17: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

17

Level-Synchronous (LS) Traversal

SCAN-BASED GRAPH TRAVERSAL EDGE CLUSTERING

Clustering by edge type

Clustering by source vertex

Page 18: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

18

Fragmented-Incremental (FI) Traversal

• Partition column into fragments• Track dependencies between fragments in index

structure• Goal: Minimize number of fragment reads

Fragment

For a fragment size equal to |E|, FI-traversal degenerates to LS-traversal

Page 19: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

19

Fragmented-Incremental (FI) Traversal

• Partition column into fragments• Track dependencies between fragments in index

structure• Goal: Minimize number of fragment reads

Transition 8 13 12

For a fragment size equal to |E|, FI-traversal degenerates to LS-traversal

Page 20: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

20

Fragmented-Incremental (FI) Traversal

• Partition column into fragments• Track dependencies between fragments in index

structure• Goal: Minimize number of fragment reads

Transition Graph

Probabilistic Fragment Synopsis

For a fragment size equal to |E|, FI-traversal degenerates to LS-traversal

Page 21: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

21

Fragmented-Incremental (FI) Traversal

• Partition column into fragments• Track dependencies between fragments in index

structure• Goal: Minimize number of fragment reads

Transition Graph

Fragment Queue

Execution Chain

For a fragment size equal to |E|, FI-traversal degenerates to LS-traversal

Page 22: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

22

Experimental Evaluation

 EVALUATED REAL-WORLD DATA

SETS• Six real-world data sets with different topology characteristics

 EVALUATED SYSTEMS

• Implementation in main-memory column store prototype (C++)• Graph database (Neo4j)• RDF DBMS (Virtuoso 7.0 with columnar storage layout)• Commerical columnar RDBMS (via chained self-joins, with and without index support)

Page 23: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

23

Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-

TRAVERSAL

Traversal performance depends on the traversal depth and the topology

FI-Traversal outperforms LS-

Traversal by one order of magnitude

Page 24: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

24

Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-

TRAVERSAL

Traversal performance depends on the traversal depth and the topology

LS-Traversal starts

outperforming FI-Traversal

Page 25: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

25

Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-

TRAVERSAL

Traversal performance depends on the traversal depth and the topology

Different performance characteristics for

different fragment sizes

Page 26: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

26

Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-

TRAVERSAL

Traversal performance depends on the traversal depth and the topology

Page 27: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

27

Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-

TRAVERSAL

Traversal performance depends on the traversal depth and the topology

Scan-based traversal outperforms fragmented traversal depending on traversal depth and graph topology

Page 28: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

28

Experimental Evaluation

 SYSTEM-LEVEL BENCHMARK

Combination of LS and FI-traversal outperforms native graph systems

by up to two orders of magnitude

Page 29: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

29

Summary

 GRAPHITE

• Graph processing tightly integrated into RDBMS• Extensions of core components by graph

extensions (operators, cost model, index structures)

• Topology characteristics-aware traversal operators

 GRAPH-SPECIFIC DATA STATISTICS AND

ALGORITHMS• Diverse graph topologies demand different algorithmic design decisions

• Index scan versus full column scan decision also applies for graph traversals

 FUTURE WORK• Integration with temporal, spatial, and text data• Language extensions for custom code executed during graph

traversal

Integration of GRAPHITE into RDBMS

Page 30: GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Management Systems

Contact

Marcus Paradies, TU [email protected]://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/