AgensGraph: a Multi-model Graph Database Bitnine Global Ph. D Kisung Kim ([email protected])


Post on 03-Feb-2018


Page 1: AgensGraph: a Multi-model Graph Database · PDF fileBig vendors Oracle – 12c spatial and network SAP – HANA graph, support Cypher, columnar storage ... •Provide technical services

AgensGraph: a Multi-model Graph Database

Bitnine Global

Ph. D Kisung Kim ([email protected])


What is Graph Database?

• Change in data representation
  • Gartner says “it represents a radical change in how data is organized and processed”
• Relationships are first-class citizens in a graph database
  • In a relational database, they are handled implicitly
  • In a graph database, you can make your data more connected

                Relational Database    Graph Database
Entity          Row                    Node (Vertex)
Relationship    Row                    Relationship (Edge)


Benefits of Graph Database

• Intuitive data modeling
  • ER diagram-like data model
• Concise queries
  • No need to specify joins and their conditions
  • Ex) Cypher (by Neo Technology), SPARQL (by W3C)
• Performance for graph pattern matching
  • Optimized for processing graph traversals
• Graph analysis
  • Provides built-in graph analysis functions
  • Ex) PageRank, ShortestPath, graph clustering


Intuitive Data Modeling


Cypher Query Language

• Cypher is like SQL for graph databases
  • Declarative query language for the property graph model
• Developed by Neo Technology Inc. since 2011
• Inspired by SQL and SPARQL (the standard query language for RDF)
  • Designed to be a human-readable query language
• OpenCypher.org (http://opencypher.org)
  • Participate in developing the query language


Cypher Example

• Uses graph pattern matching and ASCII-art diagrams

Query: Find all ancestor-descendant pairs in the graph.

SQL

  with recursive descendants as (
      select parent, child as descendant, 1 as level from source
      union all
      select d.parent, s.child, d.level + 1
      from descendants as d
      join source s on d.descendant = s.parent
  )
  select * from descendants order by parent, level, descendant;

Cypher

  MATCH p=(descendant)-[:Parent*]->(ancestor)
  RETURN (ancestor), (descendant), length(p)
  ORDER BY (ancestor), (descendant), length(p)

[Figure: descendant and ancestor vertexes]
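As a hedged illustration of the SQL side of this comparison, the sketch below runs a recursive CTE of the same shape through Python's sqlite3 module. SQLite stands in for PostgreSQL here, and the contents of the source table are made-up sample data, not data from the slides.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (parent TEXT, child TEXT);
INSERT INTO source VALUES ('a', 'b'), ('b', 'c');
""")

# Recursive CTE: start from direct parent/child rows, then repeatedly
# extend each known descendant by one more parent -> child step.
ANCESTOR_SQL = """
WITH RECURSIVE descendants(parent, descendant, level) AS (
    SELECT parent, child, 1 FROM source
    UNION ALL
    SELECT d.parent, s.child, d.level + 1
    FROM descendants AS d
    JOIN source AS s ON d.descendant = s.parent
)
SELECT parent, descendant, level FROM descendants
ORDER BY parent, level, descendant
"""

pairs = conn.execute(ANCESTOR_SQL).fetchall()
print(pairs)
```

The Cypher version expresses the same traversal with a single variable-length pattern, which is the conciseness argument the slide is making.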


Graph Databases

• There exist many graph databases

Graph DB vendors
  Neo4j – Single node, OLTP, Cypher
  Datastax Enterprise Graph – Cassandra, Gremlin, OLTP & OLAP
  OrientDB – Cluster, SQL-like language, document storage

Big vendors
  Oracle – 12c spatial and network
  SAP – HANA graph, supports Cypher, columnar storage
  IBM – provides a cloud service based on Titan, System G
  Microsoft – Graph engine
  Teradata Aster Database – provides graph analytics

RDF DB
  Virtuoso, AllegroGraph, GraphDB (Ontotext)

Graph analysis
  Giraph – Apache project
  GraphX – Spark module
  GraphLab – acquired by Apple and changed to turi.com

NoSQL
  MongoDB – provides simple graph lookup since 3.4 (Dec 2016)
  ElasticSearch – provides graph visualization and modeling

Etc.
  Objectivity’s ThingSpan
  ArangoDB
  JanusGraph – forked from Titan, supported by the Linux Foundation
  Grakn.AI – uses Titan and Spark


What We Want to Implement

• Property graph model
• OpenCypher query language
• ACID transaction
• OLTP workload and graph analytics framework
• We chose to implement it based on PostgreSQL because it already has
  • Robust storage engine
  • Transaction layer using MVCC
  • Cost-based query optimizer


AgensGraph

• Newest release: v1.1 (based on PostgreSQL v9.6.2)
  • Homepage: http://www.agensgraph.com
  • Download: http://bitnine.net/downloads/
  • Github: https://github.com/bitnine-oss/agensgraph
• A fork of PostgreSQL (Apache license)
• Features
  • Multi-model: property graph data model, relational data model and JSON documents
  • Cypher query language support
  • Integrated querying using SQL and Cypher
  • Multiple graphs and hierarchical graph label organization
  • Property indexes on both vertexes and edges
  • Constraints: unique, mandatory and check constraints


AgensGraph Data Model

• Extended property graph model with JSON documents
• Supports multiple graphs in a database
• Label hierarchy
  • Vertexes and edges can be grouped into labels (e.g. person, student, teacher, …)
  • Labels are organized as a hierarchy

Property indexes using B-tree, GIN, BRIN, … for both vertexes and edges

[Figure: two vertexes connected by an edge]
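The label hierarchy can be pictured with a small sketch. It assumes that matching on a parent label also returns vertexes carrying a child label, in the way PostgreSQL table inheritance returns child rows when the parent table is queried. The label names come from the slide; the dictionary, the helper functions and the sample vertexes are hypothetical and are not AgensGraph APIs.

```python
# Hypothetical label hierarchy: child label -> parent label (None = root).
PARENT_LABEL = {"person": None, "student": "person", "teacher": "person"}


def is_a(label, ancestor):
    """Return True if `label` equals `ancestor` or inherits from it."""
    while label is not None:
        if label == ancestor:
            return True
        label = PARENT_LABEL.get(label)  # unknown labels have no parent
    return False


def match_label(vertices, label):
    """Names of all vertexes whose label is `label` or one of its descendants."""
    return [name for name, vlabel in vertices if is_a(vlabel, label)]


# Made-up sample vertexes as (name, label) pairs.
vertices = [("alice", "student"), ("bob", "teacher"), ("acme", "company")]

print(match_label(vertices, "person"))
print(match_label(vertices, "student"))
```

Under that assumption, a pattern on person would see both students and teachers, while a pattern on student sees only students.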


Cypher Clauses

• For reading
  • MATCH: find graph patterns
  • OPTIONAL MATCH: allows incomplete matches
• For updating
  • CREATE: create a vertex or an edge
  • MERGE: like UPSERT
  • SET: modify property values
• For filtering
  • WHERE
• For handling results
  • WITH, RETURN
• And ORDER BY, LIMIT, SKIP

https://s3.amazonaws.com/artifacts.opencypher.org/M05/railroad/Cypher.html


Example

• Create graph objects
• If you want a label hierarchy:
  CREATE VLABEL student INHERITS (person);


Example

• Create vertexes


Example

• Create property indexes
• Create relationships


AgensGraph Architecture

• Developed in the core of the PostgreSQL engine
  • Not a layered architecture (e.g. Titan)
  • Forked project of PostgreSQL
  • PostgreSQL is very reliable and robust
• Add graph objects
• Extend the query engine to support Cypher queries and fast graph traversal
• Maintain the transaction and storage layers

[Figure: architecture diagram with the following components]
  JDBC/ODBC/Python/Node.js Driver
  Integrated Query Processing Engine: graph query optimizer, graph query executor
  Transaction Layer: supports MVCC and ACID transactions
  Cache Layer: supports caching graph data in memory
  Graph Storage: supports label hierarchy; optimized for fast traversal and updates
  SQL & Cypher


Graph Storage

• Uses PostgreSQL’s heap tables and B-tree indexes
• Uses composite indexes on edge tables to exploit index-only scans for traversals
• We found that heap tables and B-trees are fast enough to process graph workloads
• But we plan to design a new storage for large-scale graph processing
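The composite-index point can be illustrated with a rough sketch. It uses SQLite through Python's sqlite3 module as a stand-in, since SQLite's "covering index" plays a role similar to PostgreSQL's index-only scan. The edge table layout and the column names (start_id, end_id, properties) are assumptions made for illustration and are not AgensGraph's actual schema.

```python
import sqlite3

# SQLite is only an analogy for PostgreSQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (
    id         INTEGER PRIMARY KEY,
    start_id   INTEGER,
    end_id     INTEGER,
    properties TEXT
);
-- Composite index holding both endpoints of every edge.
CREATE INDEX edge_start_end ON edge (start_id, end_id);
""")


def plan(sql):
    """Return the textual query plan SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the detail text


# A pure traversal only needs columns that are already in the index.
traversal_plan = plan("SELECT end_id FROM edge WHERE start_id = 1")

# Reading an edge property forces a lookup in the table itself.
property_plan = plan("SELECT end_id, properties FROM edge WHERE start_id = 1")

print(traversal_plan)
print(property_plan)
```

In this sketch the first plan is answered from the index alone, while the second has to visit the table because the property column is not part of the index.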


Cypher Query Processor

• A Cypher query is processed by the same process as SQL
• We integrate Cypher query processing with the SQL query engine, from the parser to the executor
• So you can use any PostgreSQL expressions and functions in Cypher
• A Cypher query’s result is a relation
  • We treat a Cypher query as a subquery
  • Existing query optimizations can be applied to Cypher queries too (e.g. subquery rollup, predicate push-down, join ordering, …)
• You can write a query that combines SQL and Cypher as a subquery

[Figure: Cypher Query → Parser → (AST) → Analyze → (Query Tree) → Plan & Optimize → (Plan Tree) → Execute]


Cypher Implementation Issues

• A Cypher query is a chain of Cypher clauses
• Each clause produces its results as a relation
• Chained execution
  • The results from the former clause are provided to the next clause
• Transform a Cypher query to a query tree
  • Each clause is transformed to a query structure
  • A MATCH clause is transformed to a query structure with joins
  • The chained clauses are combined as subqueries


Cypher Query Processor

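[Figure: before rollup, the query for the first MATCH joins the Actor table {name = ‘Tom Cruise’}, the ACT_IN table and the Movie table; the query for the second MATCH joins that result with the ACT_IN table and Actor {name: ‘Nicole Kidman’}. After subquery rollup, a single query joins the Actor table {name = ‘Tom Cruise’}, the ACT_IN table, the Movie table, the ACT_IN table and Actor {name: ‘Nicole Kidman’}.]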
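To make the transformation concrete, here is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for the PostgreSQL engine. The table names follow the diagram; the Cypher query in the comment, the column names (id, start_id, end_id, title) and the sample rows are assumptions for illustration, not taken from the slides. The first SQL statement keeps the first MATCH as a subquery, and the second shows the flattened form a subquery rollup would aim for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE act_in (start_id INTEGER, end_id INTEGER);  -- actor -> movie
INSERT INTO actor  VALUES (1, 'Tom Cruise'), (2, 'Nicole Kidman'), (3, 'Other Actor');
INSERT INTO movie  VALUES (10, 'Movie A'), (11, 'Movie B');
INSERT INTO act_in VALUES (1, 10), (2, 10), (1, 11), (3, 11);
""")

# Assumed Cypher query (hypothetical, for illustration only):
#   MATCH (a:Actor {name: 'Tom Cruise'})-[:ACT_IN]->(m:Movie)
#   MATCH (b:Actor {name: 'Nicole Kidman'})-[:ACT_IN]->(m)
#   RETURN m.title

# Chained form: the first MATCH becomes a subquery feeding the second MATCH.
NESTED = """
SELECT q1.title
FROM (
    SELECT m.id AS movie_id, m.title AS title
    FROM actor AS a
    JOIN act_in AS e1 ON e1.start_id = a.id
    JOIN movie  AS m  ON m.id = e1.end_id
    WHERE a.name = 'Tom Cruise'
) AS q1
JOIN act_in AS e2 ON e2.end_id = q1.movie_id
JOIN actor  AS b  ON b.id = e2.start_id
WHERE b.name = 'Nicole Kidman'
"""

# Rolled-up form: one flat query joining all five table references.
FLAT = """
SELECT m.title
FROM actor AS a
JOIN act_in AS e1 ON e1.start_id = a.id
JOIN movie  AS m  ON m.id = e1.end_id
JOIN act_in AS e2 ON e2.end_id = m.id
JOIN actor  AS b  ON b.id = e2.start_id
WHERE a.name = 'Tom Cruise' AND b.name = 'Nicole Kidman'
"""

nested_rows = conn.execute(NESTED).fetchall()
flat_rows = conn.execute(FLAT).fetchall()
print(nested_rows, flat_rows)
```

Both forms return the same rows; the flat form simply gives the optimizer one join problem to order instead of two nested ones.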


Variable-length Edge (VLE) Query

• Can be implemented using a recursive common table expression (CTE) in SQL
• But we found that a CTE is inefficient for VLE queries
  • Using a CTE is BFS (Breadth First Search)-style processing
  • BFS processing needs to buffer intermediate results
• We implemented a new execution node for VLE queries
  • DFS (Depth First Search)-style processing
• It is much faster than a recursive CTE query

Cypher

  MATCH p=(descendant)-[:Parent*]->(ancestor)
  RETURN (ancestor), (descendant), length(p)
  ORDER BY (ancestor), (descendant), length(p)

[Figure: descendant and ancestor vertexes]
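The difference between the two processing styles can be sketched in a few lines of Python. This is not AgensGraph's execution node, only an illustration of the idea: the BFS version keeps a queue of pending intermediate results, while the DFS version is a generator that streams each result as soon as it is found and only holds the current path on the call stack. The graph data is made up and assumed to be acyclic.

```python
from collections import deque

# Hypothetical [:Parent] edges: vertex -> vertexes it points to.
PARENT = {
    "d1": ["c1"],
    "d2": ["c1"],
    "c1": ["b1"],
    "b1": ["a1"],
    "a1": [],
}


def vle_bfs(edges, start):
    """Breadth-first expansion: pending partial results wait in a queue."""
    results = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in edges.get(node, []):
            results.append((start, nxt, depth + 1))
            queue.append((nxt, depth + 1))  # buffered until its turn comes
    return results


def vle_dfs(edges, start, node=None, depth=0):
    """Depth-first expansion: yields (start, reached, length) immediately."""
    node = start if node is None else node
    for nxt in edges.get(node, []):
        yield (start, nxt, depth + 1)
        yield from vle_dfs(edges, start, nxt, depth + 1)


bfs_rows = sorted(vle_bfs(PARENT, "d1"))
dfs_rows = sorted(vle_dfs(PARENT, "d1"))
print(bfs_rows)
print(dfs_rows)
```

Both functions find the same (descendant, ancestor, length) rows; they differ only in how much intermediate state must be held while the traversal is running.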


Example Cypher Plan

• match (a)-[*1..5]->(b) return a, b;


Considerations for Graph Query Performance

• Graph pattern matching is usually more efficient with random page reads
  • set random_page_cost = 0.005
  • It is more efficient to cache the data in memory or use SSDs for fast graph traversal
• Index-only scans are important for graph traversals
  • They are possible when the query does not access edge properties
• Query optimization is crucial, but it is harder than for SQL queries
  • Graph queries involve many joins
  • Size estimates become less accurate as the number of joins increases
  • PostgreSQL’s optimizer usually works well, but it needs improvement and more research
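A back-of-the-envelope sketch of why estimates degrade with more joins: if each join's cardinality estimate can be off by some factor and these errors multiply, the worst-case error grows exponentially with the number of joins. The per-join factor of 2 is an arbitrary assumption chosen for illustration, not a measured value.

```python
def worst_case_error(per_join_factor, joins):
    """Worst-case misestimate when every join is off by the same factor
    and the errors compound multiplicatively (simplifying assumption)."""
    return per_join_factor ** joins


# Print how the bound grows with the number of joins.
for n in (1, 3, 5, 8):
    print(n, "joins ->", worst_case_error(2.0, n), "x")
```

A graph pattern with several hops easily reaches this many joins, which is one way to read the slide's remark that graph query optimization is harder.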


LDBC Benchmark

• Linked Data Benchmark Council (http://ldbcouncil.org)
• Participants (http://ldbcouncil.org/industry/members)
  • Oracle Labs, IBM, Huawei, SAP, Sparsity, Openlink SW, Ontotext, Neo Technology
• Benchmark tools for graph workloads
  • Social network benchmark (SNB)
    • Simulates social network service workloads
  • Graph analytics
  • Semantic publishing benchmark
    • For RDF and SPARQL
• We ran the SNB interactive workloads


Performance Comparisons

• Caveat
  • We optimized both databases as much as we could
  • The benchmark results can change depending on configuration settings
• Comparisons
  • Neo4j 3.1 community edition
  • AgensGraph 1.0


Future Roadmap

• Distributed and parallel processing
  • Extend AgensGraph using Postgres-XL

• Graph analysis framework like the vertex-centric programming model

• Support more graph analysis algorithms

• Integration with Big data systems for large-scale graph processing


Thank You!

[email protected]

http://agensgraph.com

Github: https://github.com/bitnine-oss/agensgraph


Bitnine Global

• Headquartered in Seoul, Korea; founded in 2014
• R&D center in Santa Clara, USA
• Provides technical services for PostgreSQL and big data
• Partners with IBM and Cloudera