Regular Path Query Evaluation on Streaming Graphs
Anil Pacaci (1), Angela Bonifati (2), M. Tamer Ozsu (1)
(1) University of Waterloo
(2) Lyon 1 University
Streaming Graphs
[Figure: a stream of labeled edges (labels a and b) over the vertices x, y, z, u, v, arriving along a timeline t0 to t19, with snapshots of the graph at t10, t11, t13, t14, t15, and t16.]
Characteristics of Streaming Graphs
- Streams are unbounded: no global access
- High streaming rates in real-world applications
Applications of Streaming Graphs
- Fraud detection in e-commerce [Qiu et al., 2018]
- Intrusion detection on networks [Kent et al., 2015; Choudhury et al., 2015]
[Figure (a): Credit-card fraud (taken from [Qiu et al., 2018]). Nodes: Criminal, Merchant, Middleman accounts. Edge labels: credit, transfer, payment.]
[Figure (b): Denial-of-service (DoS) attack (taken from [Choudhury et al., 2015]). Nodes: Attacker, Bots, Victim. Edge label: TCP.]
Our setting: Query processing over streaming graphs
- Persistent graph queries on streaming data
- Real-time results that are continuously updated
What is a graph query?
Graph queries feature:
- Subgraph Pattern [figure: vertices u1, u2, and c, with one follows edge and two worksAt edges]
- Path Navigation [figure: a path between P1 and P2 labeled (follows · mentions)+]
Path Navigation Queries
- Property paths in SPARQL v1.1
- Single-label reachability in Cypher
- Path expressions in G-CORE & PGQL
Regular Path Queries
- Generalized reachability & traversal
- Directed paths that match regular expressions
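A regular path query such as (follows · mentions)+ returns the pairs of vertices connected by a directed path whose label sequence matches the expression. The following is a minimal Python sketch on a static graph, assuming the expression has already been compiled into a DFA given as a transition dictionary. The function name `evaluate_rpq`, the state numbering, and the sample vertices are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict, deque


def evaluate_rpq(edges, delta, start, finals):
    """Return all (u, v) pairs linked by a path whose labels the DFA accepts.

    edges  : iterable of (src, label, dst) triples
    delta  : DFA transitions as {(state, label): next_state}
    start  : initial DFA state
    finals : set of accepting DFA states
    """
    adj = defaultdict(list)
    vertices = set()
    for src, label, dst in edges:
        adj[src].append((label, dst))
        vertices.update((src, dst))

    results = set()
    for root in vertices:
        # BFS over (vertex, state) pairs: the product of the graph and the DFA.
        seen = {(root, start)}
        queue = deque(seen)
        while queue:
            v, s = queue.popleft()
            for label, w in adj[v]:
                t = delta.get((s, label))
                if t is None:
                    continue  # the automaton has no transition on this label
                if t in finals:
                    results.add((root, w))
                if (w, t) not in seen:
                    seen.add((w, t))
                    queue.append((w, t))
    return results


# Hypothetical DFA for (follows . mentions)+ : 0 -follows-> 1 -mentions-> 2 -follows-> 1
FOLLOWS_MENTIONS = {(0, "follows"): 1, (1, "mentions"): 2, (2, "follows"): 1}
```

Because each (vertex, state) pair is visited at most once per root, a path may revisit vertices; this corresponds to arbitrary path semantics.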
Regular Path Queries
RPQs are heavily used in practice
- SPARQL v1.1, Cypher, PGQL, G-CORE
- 1 in 4 queries in Wikidata query logs [Bonifati et al., 2019]
RPQ evaluation on static graphs
- Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013]
- Cost-based planning for SPARQL property paths [Yakovets et al., 2016]
Unboundedness & high streaming rates
- Windowed processing
- Incremental & non-blocking operators
Continuous processing of streaming graphs
- Largely focus on pattern matching
The objective of this paper
- Persistent RPQ evaluation over streaming graphs
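Windowed processing bounds the state that an unbounded stream requires: only edges inside the current window are kept, and older ones expire. The following is a minimal Python sketch of a time-based sliding window that evicts lazily, once per slide interval. The class name `SlidingWindow` and the parameters `width` and `slide` are assumptions for illustration, and edges are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindow:
    """Time-based sliding window over a stream of (ts, src, label, dst) edges."""

    def __init__(self, width, slide):
        self.width = width        # window length in time units
        self.slide = slide        # how often expired edges are evicted
        self.edges = deque()      # edges in arrival (timestamp) order
        self.next_expiry = slide  # first timestamp that triggers eviction

    def insert(self, ts, src, label, dst):
        """Add an edge; return the batch of edges that expired, if any."""
        self.edges.append((ts, src, label, dst))
        expired = []
        if ts >= self.next_expiry:
            # Evict, in one batch, every edge that fell out of the window.
            while self.edges and self.edges[0][0] <= ts - self.width:
                expired.append(self.edges.popleft())
            # Schedule the next eviction at the following slide boundary.
            self.next_expiry = (ts // self.slide + 1) * self.slide
        return expired
```

A downstream operator can treat the returned batch as the set of tuples whose partial results must be invalidated.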
Our Contributions
Persistent Regular Path Query Evaluation on Streaming Graphs
1. The design space of persistent RPQ algorithms
   - Path semantics
   - Result semantics & stream types
2. Path semantics used in practice
   - Simple paths (no repeating vertex): navigation on road networks
   - Arbitrary paths: reachability on communication networks
3. Result semantics & stream types
   - Append-only streams with fast insertions
   - Support for explicit deletions
4. Physical operators for path navigation queries
   - Compatible with existing languages: Cypher, G-CORE, PGQL, SPARQL v1.1
   - Efficiency & efficacy in real-world workloads
Our Solution
Design Space for Streaming RPQ
An automata-based algorithm
- Automata transition graph to guide traversals
- Spanning-tree index to encode partial results
                  Result Semantics
Path Semantics    Append-only    Explicit Deletions
Arbitrary         O(n · k²)      O(n² · k)
Simple (1)        O(n · k²)      O(n² · k)
n = # of vertices in the window W; k = # of states in the automaton.
(1) These results hold in the absence of conflicts, a condition on the cyclic structure of the query and the graph.
Incremental Maintenance of Spanning Trees
[Figure: the edge stream on a timeline t0 to t19. Vertex pairs and labels in arrival order: (y, u) b, (x, z) a, (u, v) a, (u, x) b, (x, y) a, (z, u) b, (z, w) b, (v, y) b.]
Q1 = (a · b)+
[Figure: automaton for Q1 with start state s0 and transitions s0 -a-> s1, s1 -b-> s2, s2 -a-> s1.]
[Figure: spanning tree rooted at (x, 0) with nodes (z, 1), (w, 2), (y, 1), (u, 2), (v, 1), (y, 2); one edge is marked as a cross-edge.]
Append-only Algorithm for Arbitrary Path Semantics
- O(n) amortized insertion cost (n = # of vertices in the window W)
- Expired tuples are removed in batches
- Explicit windows are supported via negative tuples
- A variation of Counting and DRed [Gupta et al., 1993]
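The insertion step can be sketched as follows: each tree is rooted at a (vertex, start state) pair, its nodes are (vertex, state) pairs, and an arriving edge extends every tree that already contains a matching parent node. This is a simplified Python sketch under stated assumptions, not the paper's algorithm: it ignores timestamps and window expiry, so a node is never reparented, and the class and method names are hypothetical. Each vertex pair in the figure is assumed to read as (source, target).

```python
from collections import defaultdict


class StreamingRPQ:
    """Append-only RPQ evaluation under arbitrary path semantics (no expiry)."""

    def __init__(self, delta, start, finals):
        self.delta = delta            # DFA: {(state, label): next_state}
        self.start = start
        self.finals = finals
        self.adj = defaultdict(list)  # vertex -> [(label, dst)] seen so far
        self.trees = {}               # root -> {(vertex, state): parent node}
        self.results = set()          # reported (root, vertex) pairs

    def insert(self, src, label, dst):
        """Process one arriving edge and return the newly found results."""
        self.adj[src].append((label, dst))
        before = set(self.results)
        for (s, l), t in self.delta.items():
            if l != label:
                continue
            if s == self.start and src not in self.trees:
                # First edge that can start a match at src: create its tree.
                self.trees[src] = {(src, s): None}
            for root, tree in self.trees.items():
                if (src, s) in tree and (dst, t) not in tree:
                    self._expand(root, tree, (src, s), (dst, t))
        return self.results - before

    def _expand(self, root, tree, parent, node):
        """Attach node under parent, then extend it along stored edges."""
        tree[node] = parent
        stack = [node]
        while stack:
            v, s = stack.pop()
            if s in self.finals:
                self.results.add((root, v))
            for label, w in self.adj[v]:
                t = self.delta.get((s, label))
                # A node already in the tree is skipped (a cross-edge).
                if t is not None and (w, t) not in tree:
                    tree[(w, t)] = (v, s)
                    stack.append((w, t))


# DFA for Q1 = (a . b)+ with states 0, 1, 2 standing for s0, s1, s2.
Q1 = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 1}
# The eight edges from the figure, read as (source, label, target).
STREAM = [("y", "b", "u"), ("x", "a", "z"), ("u", "a", "v"), ("u", "b", "x"),
          ("x", "a", "y"), ("z", "b", "u"), ("z", "b", "w"), ("v", "b", "y")]
```

Feeding `STREAM` to `StreamingRPQ(Q1, 0, {2})` one edge at a time builds a tree rooted at (x, 0) containing the same seven nodes as the figure, although parent links may differ because this sketch does not compare timestamps.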
How to Find Simple Paths?
Finding simple paths for arbitrary graphs and queries is NP-complete [Mendelzon and Wood, 1995]
Conflict-freeness: A sufficient condition for efficient evaluation
- A condition on the cyclic structure of the graph and the automaton
- An arbitrary path p_a : u→v ⟹ a simple path p_s : u→v
- Requires DFS ordering
Our contribution: A streaming algorithm for conflict detection
- Optimized for append-only streams
- An optimistic conflict-detection mechanism
- Aggressive pruning: backtracking only if absolutely necessary
Correctness & complexity proofs are in the paper
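The difficulty with simple paths is that the search must remember which vertices the current path has already used, so a (vertex, state) pair can no longer be marked as visited once and for all. The following Python sketch shows the exhaustive backtracking baseline that conflict detection is meant to avoid; it is not the paper's streaming algorithm, and the function name and arguments are illustrative assumptions.

```python
from collections import defaultdict


def simple_path_rpq(edges, delta, start, finals, source):
    """Return vertices reachable from source by a simple path the DFA accepts.

    Worst-case exponential: every simple path from source may be explored.
    """
    adj = defaultdict(list)
    for src, label, dst in edges:
        adj[src].append((label, dst))

    reached = set()
    on_path = {source}  # vertices used by the path currently being explored

    def dfs(v, s):
        for label, w in adj[v]:
            t = delta.get((s, label))
            if t is None or w in on_path:
                continue  # no transition, or w would repeat on this path
            if t in finals:
                reached.add(w)
            on_path.add(w)
            dfs(w, t)
            on_path.remove(w)  # backtrack

    dfs(source, start)
    return reached
```

For the edges x -a-> y, y -b-> x, y -b-> z and the query (a · b)+, the sketch reports only z from x: the path x, y, x matches the expression but repeats x, so it counts under arbitrary path semantics and not under simple path semantics.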
Performance Results
Most common RPQs in Wikidata query logs [Bonifati et al., 2019]
- Update stream of LDBC SNB [Erling et al., 2015]
- Stackoverflow temporal graph
- Yago2s RDF graph
Sub-millisecond tail latency (99th percentile)
- Performance matching the complexity analysis
Finding simple paths
- Over 70% of the workloads
- 2–5× impact on the tail latency
Scalability analysis using synthetic RPQ workloads (gMark [Bagan et al., 2016])
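Tail latency here is the 99th percentile of the per-edge processing time. As a small illustration of the metric only, the following Python sketch computes a nearest-rank percentile from a list of latency samples; the function name and the nearest-rank convention are assumptions, since the slides do not say how the percentile was computed.

```python
def tail_latency(latencies_us, percentile=99):
    """Nearest-rank percentile of per-edge latencies (integer percentile)."""
    ordered = sorted(latencies_us)
    if not ordered:
        raise ValueError("no latency samples")
    # Ceiling of percentile * n / 100 in integer arithmetic (1-based rank).
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]
```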
Wrapping-up
1. Design space of non-blocking RPQ operators
   - Path semantics
   - Result semantics
2. Persistent RPQs under arbitrary path semantics
   - Support for explicit edge deletions
   - Implicit & explicit window semantics
3. Streaming conflict-detection algorithm
   - Safely prune the search space to avoid exhaustive search
   - Tractable evaluation on conflict-free graphs (matching tractability results [Mendelzon and Wood, 1995])
4. Extensive experiments with real-world workloads [Bonifati et al., 2019]
   - Scalability w.r.t. the window & query size
   - Feasibility of simple path semantics
References I
Bagan, G., Bonifati, A., Ciucanu, R., Fletcher, G. H., Lemay, A., and Advokaat, N. (2016). gMark: Schema-driven generation of graphs and queries. IEEE Trans. Knowl. and Data Eng., 29(4):856–869.
Bagan, G., Bonifati, A., and Groz, B. (2013). A trichotomy for regular simple path queries on graphs. In Proc. 32nd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 261–272.
Barbieri, D. F., Braga, D., Ceri, S., Della Valle, E., and Grossniklaus, M. (2009). C-SPARQL: SPARQL for continuous querying. In Proc. 18th Int. World Wide Web Conf., pages 1061–1062.
Bonifati, A., Martens, W., and Timm, T. (2019). Navigating the maze of Wikidata query logs. In Proc. 28th Int. World Wide Web Conf., pages 127–138. ACM.
Calbimonte, J.-P., Corcho, O., and Gray, A. J. (2010). Enabling ontology-based access to streaming data sources. In Proc. 9th Int. Semantic Web Conf., pages 96–111. Springer.
Choudhury, S., Holder, L. B., Chin Jr., G., Agarwal, K., and Feo, J. (2015). A selectivity based approach to continuous pattern detection in streaming graphs. In Proc. 18th Int. Conf. on Extending Database Technology, pages 157–168.
References II
Dell’Aglio, D., Calbimonte, J.-P., Della Valle, E., and Corcho, O. (2015). Towards a unified language for RDF stream query processing. In Proc. 12th Extended Semantic Web Conf., pages 353–363. Springer.
Erling, O., Averbuch, A., Larriba-Pey, J., Chafi, H., Gubichev, A., Prat, A., Pham, M.-D., and Boncz, P. (2015). The LDBC social network benchmark: Interactive workload. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 619–630. http://doi.acm.org/10.1145/2723372.2742786.
Gupta, A., Mumick, I. S., and Subrahmanian, V. S. (1993). Maintaining views incrementally. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 157–166.
Iyer, A. P., Li, L. E., Das, T., and Stoica, I. (2016). Time-evolving graph processing at scale. In Proc. 4th Int. Workshop on Graph Data Management Experiences and Systems, pages 1–6.
Kent, A. D., Liebrock, L. M., and Neil, J. C. (2015). Authentication graphs: Analyzing user behavior within an enterprise network. Computers & Security, 48:150–166.
Kumar, P. and Huang, H. H. (2020). GraphOne: A data store for real-time analytics on evolving graphs. ACM Trans. Storage, 15(4):1–40.
References III
Le-Phuoc, D., Dao-Tran, M., Parreira, J. X., and Hauswirth, M. (2011). A native and adaptive approach for unified processing of linked streams and linked data. In Proc. 10th Int. Semantic Web Conf., pages 370–388. Springer.
Mariappan, M. and Vora, K. (2019). GraphBolt: Dependency-driven synchronous processing of streaming graphs. In Proc. 14th ACM SIGOPS/EuroSys European Conf. on Comp. Syst., pages 1–16.
Mendelzon, A. O. and Wood, P. T. (1995). Finding regular simple paths in graph databases. SIAM J. on Comput., 24(6):1235–1258.
Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., and Zhou, J. (2018). Real-time constrained cycle detection in large dynamic graphs. Proc. VLDB Endowment, 11(12):1876–1888.
Yakovets, N., Godfrey, P., and Gryz, J. (2016). Query planning for evaluating SPARQL property paths. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1875–1889. ACM.
Additional Slides
This presentation is not
1. A streaming RDF engine
   - C-SPARQL [Barbieri et al., 2009], CQELS [Le-Phuoc et al., 2011], SPARQLstream [Calbimonte et al., 2010], W3C RSP-QL [Dell’Aglio et al., 2015]
   - Based on SPARQL v1.0: no path navigation
   - Our algorithms can be incorporated to provide incremental RPQ evaluation
2. A graph analytics engine for dynamic graphs
   - GraphOne [Kumar and Huang, 2020], GraphBolt [Mariappan and Vora, 2019], GraphTau [Iyer et al., 2016]
   - Efficient maintenance of graph snapshots
   - Iterative graph analytics in the vertex-centric model
   - We focus on persistent evaluation of path queries that are specified declaratively
Workloads
Table: Most common RPQs in real workloads [Bonifati et al., 2019].
Name  Query                    Name  Query
Q1    a*                       Q7    a ∘ b ∘ c*
Q2    a ∘ b*                   Q8    a? ∘ b*
Q3    a ∘ b* ∘ c*              Q9    (a1 + a2 + ··· + ak)+
Q4    (a1 + a2 + ··· + ak)*    Q10   (a1 + a2 + ··· + ak) ∘ b*
Q5    a ∘ b* ∘ c               Q11   a1 ∘ a2 ∘ ··· ∘ ak
Q6    a* ∘ b*
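Each query in the table is a regular expression over edge labels, where ∘ is concatenation, + inside parentheses is alternation, and the trailing * and + are repetition. As a hedged illustration, the sketch below writes the eleven templates as Python regular expressions over single-character labels and checks whether the label sequence of a path matches. Instantiating a1..ak with k = 3 and the letters x, y, z is an arbitrary assumption made only so the patterns are concrete.

```python
import re

# Single-character labels; a1..ak is instantiated as x, y, z (k = 3),
# which is an illustrative choice and not part of the workload definition.
WORKLOAD = {
    "Q1": r"a*",
    "Q2": r"ab*",
    "Q3": r"ab*c*",
    "Q4": r"[xyz]*",
    "Q5": r"ab*c",
    "Q6": r"a*b*",
    "Q7": r"abc*",
    "Q8": r"a?b*",
    "Q9": r"[xyz]+",
    "Q10": r"[xyz]b*",
    "Q11": r"xyz",
}


def matches(query, labels):
    """True if the label sequence of a path is in the language of the query."""
    return re.fullmatch(WORKLOAD[query], "".join(labels)) is not None
```

For example, `matches("Q5", ["a", "b", "b", "c"])` holds, while `matches("Q5", ["a", "b"])` does not because the final c is missing.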
Throughput & Tail Latency
[Figure: three panels, (a) Yago2s, (b) LDBC SF10, (c) Stackoverflow. x-axis: Query (Q1 to Q11; panel (b) shows Q1, Q2, Q3, Q5, Q6, Q7, Q11). Left y-axis: Throughput (edges/s). Right y-axis: Tail Latency (μs).]
Figure: Throughput and tail latency of the Algorithm ??. The y-axis is in log scale.
Index Size
[Figure: x-axis: Query (Q1 to Q11). Left y-axis: # of trees (1000). Right y-axis: # of nodes (1M).]
Figure: Size of the tree index ∆ on the SO graph with |W| = 30 days.
Scalability - Tail Latency (99th Percentile)
[Figure: two panels. x-axes: Window size (5M to 20M) and Slide size (0.5M to 2M). y-axis: Tail latency (μs). One series per query, Q1 to Q11.]
Figure: Tail latency with various |W| and β
Scalability - Window Management Time
[Figure: two panels. x-axes: Window size (5M to 20M) and Slide size (0.5M to 2M). y-axis: Expiry Time (μs). One series per query, Q1 to Q11.]
Figure: The average window maintenance cost with various |W| and β
Explicit Deletions
[Figure: x-axis: % of explicit deletions (0 to 10). y-axis: Tail Latency (μs). One series per query, Q1 to Q11.]
Figure: Impact of the ratio of explicit deletions on tail latency for all queries on the Yago2s RDF graph with |W| ≈ 10M tuples.
Impact of DFA Construction
[Figure: two panels. Left: x-axis Query size (|QR|, 2 to 20), y-axis # of states (k). Right: x-axis # of states (k), y-axis Throughput (10³ edges/s).]
Figure: The impact of the query length |QR| on the automata size, k, and the throughput. RPQs are generated using gMark [Bagan et al., 2016], where the query size ranges from 2 to 20. Each RPQ is formulated by grouping labels into concatenations and alternations of size up to 3, and each group has a 50% probability of having ∗ and +.