extending in-memory relational database engines …...¨ extract graphs from the rdbms ¨ store...
TRANSCRIPT
![Page 1: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/1.jpg)
Extending In-Memory Relational Database Engines with Native Graph Support
Mohamed S. Hassan1 Tatiana Kuznetsova1 Hyun Chai Jeong1 Walid G. Aref1 Mohammad Sadoghi2
1Purdue University – West Lafayette, IN, USA
EDBT’18
2University of California – Davis, CA, USA
![Page 2: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/2.jpg)
Graphs are Ubiquitous 2
Road Network Biological Network
Datacenter Network Social Network
![Page 3: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/3.jpg)
Specialized Graph Database Systems 3
¨ Specialized graph databases can handle graph query-workloads ¤ Vital queries include shortest-path and reachability queries
![Page 4: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/4.jpg)
Why Use Relational Database Systems to Support Graphs?
4
¨ RDBMS technology is very mature and widely-adopted ¨ Relational data can have latent graphs ¨ Can easily represent graphs using relational tables ¨ Many applications involve graph queries
¤ Queries that involve both relational and graph predicates n E.g., for each Patient P in a selected area, find the nearest hospital to P
¨ How can an RDBMS effectively and efficiently handle graph query workloads?
![Page 5: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/5.jpg)
Graph Support in RDBMSs 5
¨ Why is it challenging? ¤ There is an impedance mismatch between the relational model and the
graph model
¨ Two common approaches for supporting graphs: ¤ Native Relational-Core ¤ Native Graph-Core
¨ Native G+R Core (The proposed GRFusion system)
![Page 6: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/6.jpg)
Native Relational-Core 6
¨ Use a vanilla RDBMS ¨ Encode graphs in relational schemas ¨ Support limited graph queries ¨ Translate the supported graph queries
into SQL or procedural SQL ¨ E.g., SQLGraph [SIGMOD’15],
Grail [CIDR’15] ¨ Pros:
¤ Use of very mature RDBMS technology ¨ Cons:
¤ Several graph queries are inefficient to evaluate using pure SQL
¤ Graphs are encoded in complex schemas
Relational Database
Relational Data
Relational Queries (SQL)
Results
Graph Encoded into Relational Tables
Graph Queries
SQL Translation Layer
![Page 7: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/7.jpg)
¨ Build on top of an RDBMS ¨ Extract graphs from the RDBMS ¨ Store graphs and process queries
outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],
GraphGen [VLDB’15, SIGMOD’17] ¨ Pros:
¤ Native processing of graph operations ¨ Cons:
¤ Graph updates require re-extracting the graphs
¤ Queries cannot reference any non-extracted relational data
Native Graph-Core 7
Relational Database
Relational Data
Graph Extraction Queries (SQL)
Graph Extraction and Materialization Engine
Extracted Graphs
Results Graph Queries
![Page 8: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/8.jpg)
The Relational Model vs. the Graph Model 8
¨ Graph-core approach ¤ +ve: Queries involving graph traversals are efficiently handled in
the graph model (e.g., shortest paths) ¤ -ve: Not as pervasive and mature as RDBMSs
¨ Relational-core approach ¤ +ve: Mature and pervasive ¤ -ve: Either many temporary inserts/deletes/updates, or too many
joins to traverse a graph n Intermediate-result size and cardinality estimation
¨ Can the best of the two worlds be combined? ¤ Support native graph processing inside an RDBMS
![Page 9: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/9.jpg)
Proposed Approach: Native G+R Core 9
¨ Assume that graphs have relational schemas ¨ Relational schemas describe the edges/nodes ¨ Enables graphs to be defined
as native database objects ¨ Store graphs in non-relational structures
optimized for graph operations ¨ Extend the SQL language
¤ Queries can compose relational and graph operations
¨ Cross-Data-Model QEPs (Query Evaluation Plans) ¨ Graph updates are supported
Relational Database
Graph Views (Topology + Tuple Pointers)
Relational Data
Graph Construction
⋈
σ GraphOp
π Graph and Relational Operators
in the Same QEP
Graph-Relational Queries (SQL)
Results
![Page 10: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/10.jpg)
¨ We realize the G+R approach inside VoltDB ¤ An open-source in-memory RDBMS ¤ GRFusion: Our realization of the G+R
approach into VoltDB ¤ A demo of GRFusion will appear in
SIGMOD 2018
GRFusion: Realizing the G+R Approach 10
In-Memory Relational Database
Declarative Graph-Relational Queries
Graph Views Relational Data
Graph-Relational Query Engine
Query Parser
Query Optimizer
Plan Executor
![Page 11: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/11.jpg)
Create Graph View 11
¨ Create-Graph-View statement ¤ Create a named graph database object that can be referenced in queries ¤ Define the relational sources of the graph’s vertexes/edges ¤ Materialize the topology of the graph in main-memory as a singleton graph
structure
![Page 12: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/12.jpg)
Graph-View of a Social Network 12
![Page 13: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/13.jpg)
Graph-View Structure [Traversal Index] 13
![Page 14: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/14.jpg)
Graph-View Structure [Traversal Index] 14
![Page 15: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/15.jpg)
The VERTEXES Construct 15
¨ Appears in the FROM clause and references a graph view ¤ Select … From MyGraphView.VERTEXES v
¨ VERTEXES represents the vertexes of a graph view ¨ A vertex is a tuple with the following properties:
¤ Id ¤ FanIn ¤ FanOut ¤ Property for each vertex attribute
![Page 16: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/16.jpg)
The EDGES Construct 16
¨ Appears in the FROM clause and references a graph view ¤ Select … From MyGraphView.EDGES v
¨ EDGES represents the edges of a graph view ¨ An edge is a tuple with the following properties:
¤ Id ¤ StartVertexId ¤ EndVertexId ¤ Property for each edge attribute
![Page 17: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/17.jpg)
The PATHS Construct – Extended SQL 17
¨ Appears in the FROM clause and references a graph view ¤ Select … From MyGraphView.PATHS P
¨ PATHS represents a set of lazily-evaluated paths ¨ A path is a set of consecutive edges ¨ Each edge has two endpoint vertexes
¤ E.g., (V:attributes) –(:E:attributes)à(V:attributes) ….. ¨ A path is a tuple with the following properties:
¤ Length ¤ StartVertex ¤ EndVertex ¤ Vertexes ¤ Edges
![Page 18: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/18.jpg)
Declarative Graph-Relational Queries 18
![Page 19: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/19.jpg)
The PathScan Operator 19
¨ PathScan is a logical operator that acts on a graph-view ¤ Has three corresponding physical operators: BFScan, DFScan, SPScan
¨ The output of PathScan is a tuple ¤ Extends the standard relational tuple ¤ PathScan output can be ingested by other relational operators in the QEP
¨ PathScan accepts the id of the vertex to start the traversal from ¤ Otherwise, all the vertexes will be considered as start vertexes
¨ Filters can be pushed as Hints into the PathScan operator ¤ E.g., P.PathLength = 2
![Page 20: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/20.jpg)
Friends-of-Friends Query Example 20
¨ For all the users working as lawyers, retrieve the last name of their friends of friends, where the friendships happened after 1/1/2000
![Page 21: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/21.jpg)
QEP of the Friends-of-Friends Query 21
![Page 22: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/22.jpg)
Reachability Query Example 22
¨ Check if Protein X interacts directly (i.e., by an edge) or indirectly (i.e., by a path) with Protein Y through either a covalent or a stable interaction type.
![Page 23: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/23.jpg)
Shortest-Path Queries with Relational Predicates 23
![Page 24: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/24.jpg)
Evaluating The Native G+R Approach 24
¨ Realized a certralized version of GRFusion (Native G+R Core approach) inside VoltDB Version 6.7
¨ Single node running Linux kernel Version 3.17.7 n 32 cores of Intel Xeon 2.90 GHz n 384 GB of RAM
¨ Comparing against: ¤ Native Relational-Core:
n SQLGraph [SIGMOD’15], Grail [CIDR’15]
¤ Natice Graph-Core Systems: n Neo4j [neo4j.com] and Titan [thinkaurelius.github.io/titan]
![Page 25: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/25.jpg)
Experimental Setup 25
¨ Native relational-core approach ¤ SQLGraph [SIGMOD’15]
n Represent path traversal using recursive relational joins n Commercial system (code not available) n Implemented the techniques in VoltDB in-memory
¤ Grail [CIDR’15] n Implemented Grail in VoltDB n Also evaluated Grail in Hekaton n Got similar conclusions (Do not report the Hekaton results here)
![Page 26: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/26.jpg)
Experimental Setup (Cont’d) 26
¨ Native Graph Approch ¤ Neo4j [neo4j.com] and Titan [thinkaurelius.github.io/titan]
n Native graph-cores (specialized graph systems) n Disk-based systems n Titan: configured to use the in-memory storage configuration n Neo4j: Run on RamDisk to mitigate the disk IO cost
¤ GRFusion uses simple graph algorithms (single-source-shortest-path - Dijkstra’s algorithm) n Want to investigate performance gains, if any, of the G+R approach in contrast to
the native relational-core
![Page 27: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/27.jpg)
Evaluating GRFusion 27
¨ Graph queries ¤ Reachability queries ¤ Reachability queries with filtering predicates ¤ Shortest path queries ¤ Subgraph queries (e.g., count triangles)
¨ Datasets
![Page 28: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/28.jpg)
Reachability Queries (DBLP Dataset) 28
¨ Performance of GRFusion, Neo4j, Titan more stable in contrast to SQLGraph ¤ Avoid overheads of recursive relational joins
¨ GRFusion performs better than Neo4J &Titan ¤ VoltDB is optimized for main-memory ¤ Disk-based Titan/Neo4j (although runs on
RamDisk) are not optimized for main-memory ¤ Graph views in GRFusion are more compact
n Encode only the topology within the graph n No vertex/edge attributes in the topology n Thus, GRFusion makes better use of caching
¤ GRFusion/VoltDB are C++-based ¤ Neo4j and Titan are Java-based
n Overheads from the automatic memory management of Java
![Page 29: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/29.jpg)
Reachability Queries (String Dataset) 29
¨ String dataset: ~ 0.5B edges >> DBLP ¨ SQLGraph (based on VoltDB):
¤ Materialize the join results at each intermediate stage
¤ Explosion in size of intermediate results (perform more than 11 joins)
¨ SQLGraph and GRFusion follow BFS evaluation
¨ GRFusion follows as iterative model: ¤ Evaluate one path at a time ¤ Also, only the vertex Ids are stored in BFS
queue ¤ More efficient storage-wise than storing
the tuples of the relational joins as intermediate results
![Page 30: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/30.jpg)
Reachability Queries (Twitter Dataset) 30
¨ Twitter dataset: 1.4B edges dataset
¨ Fan-out is also a factor in the performance ¤ But we did not study effect of fan-
out ¤ Would require synthetic datasets ¤ Current study focus on real
datasets
![Page 31: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/31.jpg)
Constrained-Reachability Queries (String Dataset) 31
¨ Select a sub-graph then perform the query
¨ Use synthesized edge attributes to control selectivity
¨ Vary the selectivity from 5% to 50% ¨ Limit path length of the results of the
generated queries to 20 ¤ To emphasize the effect of the selectivity
of the sub-graph to operate on ¨ GRFusion uses the relational engine to
evaluate the filtering predicates on the edges ¤ Utilize pointers from edges to
corresponding tuples
![Page 32: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/32.jpg)
SSSP Queries – Tiger Dataset 32
¨ Grail implemented in-memory using VoltDB
¨ GRFusion: Run Dijkstra’s algorithm
![Page 33: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/33.jpg)
A Note on the Performance Gains of GRFusion 33
¨ Table scan or index scan/seek ¤ Direct pointers are more efficient
¨ Relational joins ¤ Large intermediate results ¤ Inaccurate cardinality estimation
⋈
σ
eTable T1
σ
eTable T2
σ
eTable T3
⋈
![Page 34: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/34.jpg)
Conclusions 34
¨ GRFusion (G+R Core approach) ¤ Execute both relational and graph operations in native mode
¨ Write declarative path-queries with relational predicates ¨ Outperform the state-of-the-art by up to 4 orders-of-magnitude in
query-time speedup ¨ Avoid large intermediate join results ¨ Avoid inaccurate cardinality estimation that may lead to non-optimal
join-algorithm selection
![Page 35: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/35.jpg)
Thank You!
35
![Page 36: Extending In-Memory Relational Database Engines …...¨ Extract graphs from the RDBMS ¨ Store graphs and process queries outside the realm of the RDBMS ¨ E.g., Ringo [SIGMOD’15],](https://reader033.vdocument.in/reader033/viewer/2022042621/5f4f0ddc0a6bf34b5058ad77/html5/thumbnails/36.jpg)
Vertex Query Example 36
¨ Retrige the Birthdate and the number of friends of each user in the social network with last name = ‘Smith’