1
Graph Algebra with Pattern Matching and Aggregation Support
2
Graph Data Nowadays
Variety of Sources
◦ Scientific Studies
◦ Business Activities
◦ Social Needs
◦ Internet
Data are often
◦ Large Scale
◦ Highly Linked
◦ Schema-less
3
Managing Graph Data
Primary Role of a Database
◦ Persistent storage
◦ Efficient querying
RDBMS
◦ Storage model: vertices and edges as tuples
◦ Query: links are followed by joins
Graph Database
◦ Storage model: graphs
◦ Query: path traversal
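The contrast between the two query styles can be sketched in a few lines of Python. This is only an illustration under assumed data: the edge list and the names `edge_table`, `adjacency`, `two_hop_join`, and `two_hop_traverse` are hypothetical and do not come from the slides. The same two-hop query is answered once with a self-join over edge tuples and once by traversing an adjacency structure.

```python
# Hypothetical edge list; each tuple is (source, target).
edge_table = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]

# Relational style: edges are tuples, and a two-hop link is a self-join.
two_hop_join = {
    (x, z)
    for x, y1 in edge_table
    for y2, z in edge_table
    if y1 == y2
}

# Graph style: edges are stored as adjacency lists and followed by traversal.
adjacency = {}
for source, target in edge_table:
    adjacency.setdefault(source, []).append(target)

two_hop_traverse = {
    (x, z)
    for x in adjacency
    for y in adjacency[x]
    for z in adjacency.get(y, [])
}
```

Both expressions yield the same pairs; the difference is that the join compares every pair of edge tuples, while the traversal only visits the neighbours of each vertex.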
4
Why not RDBMS?
Schema Issues
◦ Each inserted record may have a different schema (e.g., Web graph)
◦ Hard to represent semi-structured information
Scalability Issues
◦ ACID properties vs. the CAP theorem
Query Performance
◦ Join-intensive queries are difficult to optimize
5
Graph Databases and Query Languages
No Universal Languages !!!
6
No Universal Language Like SQL?
No commonly agreed algebra
Relational Algebra?
◦ Expressive and proven effective over time
◦ NOT suitable for graphs
Graph Algebra?
◦ Still at a preliminary stage
7
Issues with Relational Algebra (RA)
Defined on tuples or sets of tuples
◦ Mismatch with the nature of graphs
◦ Operators lose their semantics: what are Union, Intersection, and Join on a graph?
◦ Input/output type: tables, not graphs
Domain-centric, not data-centric
◦ Does not anticipate out-of-order data
◦ Treats tuples as independent, unaware of the links among them
Queries written in RA are verbose and complex
8
Advantage of Graph Algebra
An algebra is itself a query language
◦ Easy to derive a language with strong theoretical support
Evaluate the expressiveness of given languages
◦ Justify when to use which: Gremlin, Cypher, etc.
Query Optimization
◦ Operator order EQUALS execution plan
◦ Algebraic equivalence IMPLIES query optimization
9
Advantage of Graph Algebra
Separation of query and system
◦ One can write a query on any system as long as a common algebra is supported
◦ Knowing RA, one can write SQL, PL/SQL, or T-SQL on MySQL, Oracle, or SQL Server
Integrate new operators into the database
◦ Current graph database systems do not support newly developed queries: Graph OLAP, Graph Cube, Graph Aggregation, etc.
◦ A proper algebra can incorporate these operators
10
Existing Works on Graph Algebra
Graph QL [1]
◦ A graph-based algebra whose operators work on graphs
◦ Selection
◦ Join: not properly defined
◦ Template
VAQL [2]
◦ Focused on visualization
◦ Selection
◦ Aggregation: restricted
◦ Visualization
Selection is restricted to isomorphism
Aggregation is not defined over edges
No algebraic equivalence
[1] He, Huahai, and Ambuj K. Singh. "Graphs-at-a-time: query language and access methods for graph databases." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
[2] Shaverdian, Anna A., et al. "A graph algebra for scalable visual analytics." Computer Graphics and Applications, IEEE 32.4 (2012): 26-33.
11
What do we want from a Graph Algebra?
Universal
◦ Independent of graph type: directed vs. undirected, simple vs. hyper, homogeneous vs. heterogeneous
Expressive
◦ Able to answer typical graph queries: pattern matching, reachability, path finding, etc.
◦ Covers Relational Algebra (RA), which ensures that a graph database can handle relational data as well
Scalable
◦ Able to manage data at scale
◦ Supports queries that summarize and aggregate data
12
Extended Algebra – Graph Model
The model is an attributed graph consisting of
◦ A vertex set, in which each vertex has a unique ID
◦ An edge set; each edge has an identifier as well, and in a simple graph an edge can be represented by its endpoints
◦ A set of attributes for each vertex
◦ A set of attributes for each edge
◦ Attributes that hold information about the graph itself
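One possible in-memory form of this model is sketched below. The class name `Graph` and its field names are assumptions chosen for illustration; the slides define the model only abstractly (vertex set, edge set, and attributes for vertices, edges, and the graph).

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    """Attributed graph; all field names are hypothetical, not from the slides."""
    vertices: set = field(default_factory=set)        # unique vertex IDs
    edges: dict = field(default_factory=dict)         # edge ID -> (source ID, target ID)
    vertex_attrs: dict = field(default_factory=dict)  # vertex ID -> {attribute: value}
    edge_attrs: dict = field(default_factory=dict)    # edge ID -> {attribute: value}
    graph_attrs: dict = field(default_factory=dict)   # attributes of the graph itself

# A two-vertex example with one identified, attributed edge.
g = Graph(
    vertices={1, 2},
    edges={"e1": (1, 2)},
    vertex_attrs={1: {"name": "Ann"}, 2: {"name": "Bob"}},
    edge_attrs={"e1": {"label": "friend"}},
    graph_attrs={"source": "social network"},
)
```

Keying edges by their own identifier rather than by endpoints leaves room for parallel edges; in a simple graph the endpoint pair alone would suffice as the key.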
13
Extended Algebra – Operators
Projection
Restriction
Unification
Pattern Matching
Aggregation
14
Operators: Projection
Purpose:
◦ Select the data the user is interested in from the base graph
Syntax:
◦ The arguments are the attribute lists for vertices, edges, and the graph
◦ The result is a new graph whose attributes are trimmed to those lists
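A minimal sketch of projection, assuming a plain dictionary representation of the graph. The function name `project`, the dictionary layout, and the sample data are illustrative assumptions, not the notation used on the slide.

```python
def project(graph, vertex_keys, edge_keys, graph_keys):
    """Return a new graph whose attributes are trimmed to the given lists."""
    def trim(attrs, keys):
        return {k: v for k, v in attrs.items() if k in keys}

    return {
        "vertices": {vid: trim(a, vertex_keys) for vid, a in graph["vertices"].items()},
        "edges": {
            eid: {"ends": e["ends"], "attrs": trim(e["attrs"], edge_keys)}
            for eid, e in graph["edges"].items()
        },
        "attrs": trim(graph["attrs"], graph_keys),
    }

# Hypothetical base graph: vertex ID -> attributes, edge ID -> endpoints and attributes.
base = {
    "vertices": {1: {"name": "Ann", "birthday": 1991}, 2: {"name": "Bob", "birthday": 1985}},
    "edges": {"e1": {"ends": (1, 2), "attrs": {"label": "friend", "since": 2010}}},
    "attrs": {"source": "social network", "year": 2014},
}

# Keep only vertex names and edge labels; drop all graph-level attributes.
names_only = project(base, ["name"], ["label"], [])
```

The structure of the graph is untouched; only the attribute maps shrink, and the base graph is left unmodified.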
15
Operators: Restriction
Purpose:
◦ Restrict the base graph by attribute values
Syntax:
◦ Vertex restriction: selects all vertices (and their induced edges) that match the vertex predicate
◦ Edge restriction: selects all edges (and their endpoints) that match the edge predicate
◦ Graph restriction: selects graphs in which every vertex matches the vertex predicate, every edge matches the edge predicate, and the graph itself matches the graph predicate
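The vertex and edge variants can be sketched as follows, assuming predicates are ordinary Python functions over attribute dictionaries. The function names `restrict_vertices` and `restrict_edges` and the sample graph are hypothetical.

```python
def restrict_vertices(graph, pred):
    """Keep vertices whose attributes satisfy pred, plus the edges they induce."""
    kept = {vid: a for vid, a in graph["vertices"].items() if pred(a)}
    edges = {
        eid: e for eid, e in graph["edges"].items()
        if e["ends"][0] in kept and e["ends"][1] in kept
    }
    return {"vertices": kept, "edges": edges}

def restrict_edges(graph, pred):
    """Keep edges whose attributes satisfy pred, plus their endpoints."""
    edges = {eid: e for eid, e in graph["edges"].items() if pred(e["attrs"])}
    endpoints = {vid for e in edges.values() for vid in e["ends"]}
    vertices = {vid: a for vid, a in graph["vertices"].items() if vid in endpoints}
    return {"vertices": vertices, "edges": edges}

# Hypothetical base graph.
base = {
    "vertices": {1: {"birthday": 1991}, 2: {"birthday": 1985}, 3: {"birthday": 1993}},
    "edges": {
        "e1": {"ends": (1, 2), "attrs": {"label": "friend"}},
        "e2": {"ends": (1, 3), "attrs": {"label": "comment"}},
    },
}

young = restrict_vertices(base, lambda a: a["birthday"] > 1989)
friends = restrict_edges(base, lambda a: a["label"] == "friend")
```

Here `young` keeps vertices 1 and 3 with the edge between them, while `friends` keeps edge `e1` together with its endpoints 1 and 2.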
16
Operator: Unification
Purpose:
◦ Concatenate graphs
Syntax:
◦ Vertex unification: unifies vertices with identical IDs
◦ Edge unification: adds edges between two vertices that match the given predicate
◦ Attribute unification: creates a virtual vertex for each distinct value of the given attribute
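A rough sketch of the vertex and edge variants, under several assumptions: graphs are plain dictionaries, attributes of vertices with the same ID are merged with the second graph taking precedence, and new edges receive a generated tuple identifier. None of these choices are stated on the slide, and the names `unify_vertices` and `unify_edges` are hypothetical.

```python
def unify_vertices(g1, g2):
    """Vertex unification: merge two graphs, identifying vertices with equal IDs."""
    vertices = {vid: dict(a) for vid, a in g1["vertices"].items()}
    for vid, a in g2["vertices"].items():
        # Assumption: on an ID clash, attributes are merged and g2 wins.
        vertices.setdefault(vid, {}).update(a)
    return {"vertices": vertices, "edges": {**g1["edges"], **g2["edges"]}}

def unify_edges(g1, g2, pred):
    """Edge unification: also link each pair (u in g1, v in g2) satisfying pred."""
    merged = unify_vertices(g1, g2)
    for u, attrs_u in g1["vertices"].items():
        for v, attrs_v in g2["vertices"].items():
            if pred(attrs_u, attrs_v):
                # Assumption: the new edge gets a generated identifier.
                merged["edges"][("unified", u, v)] = (u, v)
    return merged

# Hypothetical input graphs; vertex 2 appears in both.
g1 = {"vertices": {1: {"city": "Paris"}, 2: {"city": "Rome"}}, "edges": {"e1": (1, 2)}}
g2 = {"vertices": {2: {"name": "Bob"}, 3: {"city": "Paris"}}, "edges": {"e2": (2, 3)}}

merged = unify_vertices(g1, g2)
linked = unify_edges(g1, g2, lambda a, b: a.get("city") == b.get("city"))
```

In `merged`, vertex 2 carries the attributes from both graphs; in `linked`, one extra edge connects vertices 1 and 3 because their `city` values agree.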
17
Operator: Unification
P(v1,v1) and P(v4,v5) are true
18
Operator: Unification
19
Operator: Pattern Matching
Purpose:
◦ Find subgraphs of the base graph that match a given pattern
Syntax:
◦ The pattern is itself a graph; its definition comes from [1]
◦ One form returns all the matching graphs
◦ The other form returns an abstract matching, in which only the vertices that appear in the pattern are returned
[1] Fan, Wenfei, et al. "Adding regular expressions to graph reachability and pattern queries." Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 2011.
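To make the idea concrete, here is a brute-force sketch of matching a small labelled pattern against a graph. It is a simplification: it handles only fixed edge labels and injective vertex bindings, not the regular-expression patterns of [1]. The function name `match_pattern` and the sample data are assumptions for illustration.

```python
from itertools import permutations

def match_pattern(pattern_edges, graph_edges):
    """Return every injective binding of pattern variables to graph vertices
    under which each pattern edge (u, label, v) exists in the graph."""
    pattern_vars = sorted({x for u, _, v in pattern_edges for x in (u, v)})
    graph_nodes = sorted({x for u, _, v in graph_edges for x in (u, v)})
    edge_set = set(graph_edges)
    matches = []
    for combo in permutations(graph_nodes, len(pattern_vars)):
        binding = dict(zip(pattern_vars, combo))
        if all((binding[u], label, binding[v]) in edge_set
               for u, label, v in pattern_edges):
            matches.append(binding)
    return matches

# Hypothetical base graph as (source, label, target) triples.
graph_edges = [
    ("ann", "friend", "bob"),
    ("bob", "friend", "ann"),
    ("ann", "comment", "bob"),
    ("bob", "friend", "cat"),
]

# Pattern: x is a friend of y and x comments on y.
pattern_edges = [("x", "friend", "y"), ("x", "comment", "y")]

matches = match_pattern(pattern_edges, graph_edges)
```

Each binding corresponds to the abstract form of a match, since it lists only the vertices that appear in the pattern. Enumerating all permutations is exponential in the pattern size, so this is suitable only for tiny examples.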
20
Operator: Pattern Matching
21
Operator: Aggregation
Purpose:
◦ To summarize a given graph
Syntax:
◦ Graph aggregation: every vertex is supplied to the vertex aggregation and every edge set is supplied to the edge aggregation
◦ Vertex aggregation: given a set of vertices, groups them by the given grouping criterion
◦ Edge aggregation: given a set of edges, groups them by the given grouping criterion
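A small sketch of graph aggregation, assuming that vertices are grouped by a single attribute and that edges are grouped by the groups of their endpoints, with a count as the summary in both cases. The function name `aggregate`, the grouping rule, and the sample data are illustrative assumptions rather than the definition from the slide.

```python
from collections import Counter

def aggregate(vertices, edges, group_key):
    """Group vertices by one attribute and edges by their endpoints' groups."""
    group_of = {vid: attrs[group_key] for vid, attrs in vertices.items()}
    vertex_counts = Counter(group_of.values())
    edge_counts = Counter((group_of[u], group_of[v]) for u, v in edges)
    return vertex_counts, edge_counts

# Hypothetical graph: people with a city attribute, and directed edges.
vertices = {1: {"city": "Paris"}, 2: {"city": "Paris"}, 3: {"city": "Rome"}}
edges = [(1, 2), (1, 3), (2, 3)]

vertex_counts, edge_counts = aggregate(vertices, edges, "city")
```

The result is a summary graph: one super-vertex per city weighted by its member count, and one super-edge per ordered pair of cities weighted by the number of underlying edges.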
22
Operator: Aggregation
23
Expressiveness
This set of operators is more expressive than Relational Algebra and Graph QL
It can represent many graph queries
◦ Reachability
◦ Graph Cube computation
◦ I-OLAP and T-OLAP
24
Algebra Equivalence
When operators are chained, they form a query execution plan
Query: find the network induced by persons, with birthday greater than 1989, whose friends comment on each other's posts. Output their names as a graph.
⊕_v(π(σ_v(Γ(RM, G), birthday > 1989), v.name))
[Figure: pattern with "friend" and "comment" edges; execution plan with stages Base Graph, Matched, Restriction (birthday > 1989), Projection (v.name), V-Unification, Result]
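The chaining of operators into a plan can be sketched as function composition. This simplified illustration starts from already matched subgraphs, represents each as a map from vertex ID to attributes, and omits edges for brevity. The helper names (`restrict`, `project`, `unify_vertices`, `run_plan`) and the sample data are assumptions.

```python
from functools import reduce

def restrict(pred):
    """Build a plan step that keeps vertices whose attributes satisfy pred."""
    return lambda graphs: [
        {vid: a for vid, a in g.items() if pred(a)} for g in graphs
    ]

def project(keys):
    """Build a plan step that trims vertex attributes to the given keys."""
    return lambda graphs: [
        {vid: {k: a[k] for k in keys if k in a} for vid, a in g.items()}
        for g in graphs
    ]

def unify_vertices(graphs):
    """Merge all graphs into one, identifying vertices with equal IDs."""
    merged = {}
    for g in graphs:
        for vid, a in g.items():
            merged.setdefault(vid, {}).update(a)
    return merged

def run_plan(graphs, steps):
    """Apply the steps in order; the operator order is the execution plan."""
    return reduce(lambda data, step: step(data), steps, graphs)

# Hypothetical output of the pattern-matching stage.
matched = [
    {1: {"name": "Ann", "birthday": 1991}, 2: {"name": "Bob", "birthday": 1985}},
    {1: {"name": "Ann", "birthday": 1991}, 3: {"name": "Cat", "birthday": 1993}},
]

plan = [restrict(lambda a: a["birthday"] > 1989), project(["name"]), unify_vertices]
result = run_plan(matched, plan)
```

Because the plan is just an ordered list of steps, an optimizer could reorder or rewrite the steps whenever an algebraic equivalence guarantees the same result.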
25
Algebra Equivalence
To generate multiple execution plans for the same query, we need theoretical support:
Identity Equivalence:
◦ An operator can be represented by other operators // p is a common attribute predicate
◦ D(P) decomposes a pattern P into edges
◦ //
...
26
Conclusion
Graph Algebra plays an important role in graph database development
We take one step forward by proposing a Graph Algebra which:
◦ extends existing algebraic work with regular pattern matching and aggregation
◦ is expressive and well-defined
◦ contains equivalence rules for further query optimization
27
Thank you!