regular path query evaluation on streaming graphsapacaci/papers/sigmod2020.pdfregular path queries...

71
Regular Path Query Evaluation on Streaming Graphs Anil Pacaci 1 Angela Bonifati 2 M. Tamer ¨ Ozsu 1 1 University of Waterloo 2 Lyon 1 University 1

Upload: others

Post on 09-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Regular Path Query Evaluation on Streaming Graphs

Anil Pacaci 1 Angela Bonifati 2 M. Tamer Ozsu 1

1University of Waterloo

2Lyon 1 University

1

Page 2: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

b a a

x

y

z

u

v

a

b

a

t10

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 3: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

b a a b

x

y

z

u

v

a

b

a

t10

x

y

z

u

v

a

b

a

b

t11

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 4: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

b a a b

x

y

z

u

v

a

b

a

t10

x

y

z

u

v

a

b

a

b

t11

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 5: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

b a a b a

x

y

z

u

v

a

b

a

t10

x

y

z

u

v

a

b

a

b

t11

x

y

z

u

v

a

a

b

a

b

t13

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 6: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

b a a b a b

x

y

z

u

v

a

b

a

t10

x

y

z

u

v

a

b

a

b

t11

x

y

z

u

v

a

a

b

a

b

t13

x

y

z

u

v

a

a

b

ba

b

t14

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 7: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

b a a b a b

x

y

z

u

v

a

b

a

t10

x

y

z

u

v

a

b

a

b

t11

x

y

z

u

v

a

a

b

a

b

t13

x

y

z

u

v

a

a

b

ba

b

t14

x

y

z

u

v

a

a

ba

b

t15

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 8: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

u

x

b a a b a b bX

x

y

z

u

v

a

b

a

t10

x

y

z

u

v

a

b

a

b

t11

x

y

z

u

v

a

a

b

a

b

t13

x

y

z

u

v

a

a

b

ba

b

t14

x

y

z

u

v

a

a

ba

b

t15

x

y

z

u

v

a

a

ba

t16

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 9: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Streaming Graphs

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

u

x

b a a b a b bX

x

y

z

u

v

a

b

a

t10

x

y

z

u

v

a

b

a

b

t11

x

y

z

u

v

a

a

b

a

b

t13

x

y

z

u

v

a

a

b

ba

b

t14

x

y

z

u

v

a

a

ba

b

t15

x

y

z

u

v

a

a

ba

t16

Characteristics of Streaming Graphs

I Streams are unbounded – no global access

I High streaming rates in real-world applications

2

Page 10: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Applications of Streaming Graphs

Fraud detection in e-commerce [Qiu et al., 2018]

Intrusion detection on networks [Kent et al., 2015; Choudhury et al.,2015]

2

Criminal3

Merchant

11

4

credit

transfer

Middleman accounts

transfer

payment

(a) Credit-card fraud(Taken from [Qiu et al., 2018])

Attacker Victim

TCP

TCP

Bots

TCP

TCP

(b) Denial-of-service (DOS) attack(Taken from [Choudhury et al., 2015])

3

Page 11: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Applications of Streaming Graphs

Fraud detection in e-commerce [Qiu et al., 2018]

Intrusion detection on networks [Kent et al., 2015; Choudhury et al.,2015]

2

Criminal3

Merchant

11

4

credit

transfer

Middleman accounts

transfer

payment

(a) Credit-card fraud(Taken from [Qiu et al., 2018])

Attacker Victim

TCP

TCP

Bots

TCP

TCP

(b) Denial-of-service (DOS) attack(Taken from [Choudhury et al., 2015])

3

Page 12: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Applications of Streaming Graphs

Fraud detection in e-commerce [Qiu et al., 2018]

Intrusion detection on networks [Kent et al., 2015; Choudhury et al.,2015]

2

Criminal3

Merchant

11

4

credit

transfer

Middleman accounts

transfer

payment

(a) Credit-card fraud(Taken from [Qiu et al., 2018])

Attacker Victim

TCP

TCP

Bots

TCP

TCP

(b) Denial-of-service (DOS) attack(Taken from [Choudhury et al., 2015])

Our setting: Query processing over streaming graphs

I Persistent graph queries on streaming data

I Real-time results that are continuously updated

3

Page 13: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

What is a graph query?

Graph queries feature:

Subgraph Patternc

u1 u2follows

worksAtworksA

t

Path Navigation

P1 P2

(follows ·mentions)+

Path Navigation Queries

Property paths in SPARQL v1.1Single-label reachability in CypherPath expressions in G-CORE & PGQL

Regular Path Queries

Generalized reachability & traversalDirected paths that match regular expressions

4

Page 14: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

What is a graph query?

Graph queries feature:

Subgraph Patternc

u1 u2follows

worksAtworksA

t

Path Navigation

P1 P2

(follows ·mentions)+

Path Navigation Queries

Property paths in SPARQL v1.1Single-label reachability in CypherPath expressions in G-CORE & PGQL

Regular Path Queries

Generalized reachability & traversalDirected paths that match regular expressions

4

Page 15: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

What is a graph query?

Graph queries feature:

Subgraph Patternc

u1 u2follows

worksAtworksA

t

Path Navigation

P1 P2

(follows ·mentions)+

Path Navigation Queries

Property paths in SPARQL v1.1Single-label reachability in CypherPath expressions in G-CORE & PGQL

Regular Path Queries

Generalized reachability & traversalDirected paths that match regular expressions

4

Page 16: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

What is a graph query?

Graph queries feature:

Subgraph Patternc

u1 u2follows

worksAtworksA

t

Path Navigation

P1 P2

(follows ·mentions)+

Path Navigation Queries

Property paths in SPARQL v1.1Single-label reachability in CypherPath expressions in G-CORE & PGQL

Regular Path Queries

Generalized reachability & traversalDirected paths that match regular expressions

4

Page 17: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Regular Path Queries

RPQs are heavily used in practice

SPARQL v1.1, Cypher, PGQL, G-CORE1 in 4 queries in Wikidata query logs [Bonifati et al., 2019]

RPQ evaluation on static graphs

Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013]Cost-based planning for SPARQL property paths [Yakovets et al., 2016]

Unboundedness & high streaming rates

Windowed processingIncremental & non-blocking operators

Continuous processing of streaming graphs

Largely focus on pattern matching

The objective of this paper

Persistent RPQ evaluation over streaming graphs

5

Page 18: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Regular Path Queries

RPQs are heavily used in practice

SPARQL v1.1, Cypher, PGQL, G-CORE1 in 4 queries in Wikidata query logs [Bonifati et al., 2019]

RPQ evaluation on static graphs

Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013]Cost-based planning for SPARQL property paths [Yakovets et al., 2016]

Unboundedness & high streaming rates

Windowed processingIncremental & non-blocking operators

Continuous processing of streaming graphs

Largely focus on pattern matching

The objective of this paper

Persistent RPQ evaluation over streaming graphs

5

Page 19: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Regular Path Queries

RPQs are heavily used in practice

SPARQL v1.1, Cypher, PGQL, G-CORE1 in 4 queries in Wikidata query logs [Bonifati et al., 2019]

RPQ evaluation on static graphs

Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013]Cost-based planning for SPARQL property paths [Yakovets et al., 2016]

Unboundedness & high streaming rates

Windowed processingIncremental & non-blocking operators

Continuous processing of streaming graphs

Largely focus on pattern matching

The objective of this paper

Persistent RPQ evaluation over streaming graphs

5

Page 20: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Regular Path Queries

RPQs are heavily used in practice

SPARQL v1.1, Cypher, PGQL, G-CORE1 in 4 queries in Wikidata query logs [Bonifati et al., 2019]

RPQ evaluation on static graphs

Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013]Cost-based planning for SPARQL property paths [Yakovets et al., 2016]

Unboundedness & high streaming rates

Windowed processingIncremental & non-blocking operators

Continuous processing of streaming graphs

Largely focus on pattern matching

The objective of this paper

Persistent RPQ evaluation over streaming graphs

5

Page 21: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Regular Path Queries

RPQs are heavily used in practice

SPARQL v1.1, Cypher, PGQL, G-CORE1 in 4 queries in Wikidata query logs [Bonifati et al., 2019]

RPQ evaluation on static graphs

Tractability results [Mendelzon and Wood, 1995; Bagan et al., 2013]Cost-based planning for SPARQL property paths [Yakovets et al., 2016]

Unboundedness & high streaming rates

Windowed processingIncremental & non-blocking operators

Continuous processing of streaming graphs

Largely focus on pattern matching

The objective of this paper

Persistent RPQ evaluation over streaming graphs

5

Page 22: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Our Contributions

Persistent Regular Path Query Evaluation on Streaming Graphs

1 The design space of persistent RPQ algorithms

Path semanticsResult semantics & stream types

2 Path semantics used in practice

Simple paths (no repeating vertex): navigation on road networksArbitrary paths: reachability on communication networks

3 Result semantics & stream types

Append-only streams with fast insertionsSupport for explicit deletions

4 Physical operators for path navigation queries

Compatible with existing languages: Cypher, G-CORE, PGQL,SPARQL v1.1Efficiency & efficacy in real-world workloads

6

Page 23: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Our Contributions

Persistent Regular Path Query Evaluation on Streaming Graphs

1 The design space of persistent RPQ algorithms

Path semanticsResult semantics & stream types

2 Path semantics used in practice

Simple paths (no repeating vertex): navigation on road networksArbitrary paths: reachability on communication networks

3 Result semantics & stream types

Append-only streams with fast insertionsSupport for explicit deletions

4 Physical operators for path navigation queries

Compatible with existing languages: Cypher, G-CORE, PGQL,SPARQL v1.1Efficiency & efficacy in real-world workloads

6

Page 24: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Our Contributions

Persistent Regular Path Query Evaluation on Streaming Graphs

1 The design space of persistent RPQ algorithms

Path semanticsResult semantics & stream types

2 Path semantics used in practice

Simple paths (no repeating vertex): navigation on road networksArbitrary paths: reachability on communication networks

3 Result semantics & stream types

Append-only streams with fast insertionsSupport for explicit deletions

4 Physical operators for path navigation queries

Compatible with existing languages: Cypher, G-CORE, PGQL,SPARQL v1.1Efficiency & efficacy in real-world workloads

6

Page 25: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Our Contributions

Persistent Regular Path Query Evaluation on Streaming Graphs

1 The design space of persistent RPQ algorithms

Path semanticsResult semantics & stream types

2 Path semantics used in practice

Simple paths (no repeating vertex): navigation on road networksArbitrary paths: reachability on communication networks

3 Result semantics & stream types

Append-only streams with fast insertionsSupport for explicit deletions

4 Physical operators for path navigation queries

Compatible with existing languages: Cypher, G-CORE, PGQL,SPARQL v1.1Efficiency & efficacy in real-world workloads

6

Page 26: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Our Solution

7

Page 27: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Design Space for Streaming RPQ

An automata-based algorithm

Automata transition graph to guide traversals

Spanning-tree index to encode partial results

8

Page 28: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Design Space for Streaming RPQ

An automata-based algorithm

Automata transition graph to guide traversals

Spanning-tree index to encode partial results

8

Page 29: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Design Space for Streaming RPQ

An automata-based algorithm

Automata transition graph to guide traversals

Spanning-tree index to encode partial results

Path SemanticsResult Semantics

Append-only Explicit Deletions

Arbitrary O(n · k2) O(n2 · k)

Simple1 O(n · k2) O(n2 · k).

1These results hold in the absence of conflicts, a condition on cyclic structure of the query and graph.

8

Page 30: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Design Space for Streaming RPQ

An automata-based algorithm

Automata transition graph to guide traversals

Spanning-tree index to encode partial results

Path SemanticsResult Semantics

Append-only Explicit Deletions

Arbitrary O(n · k2) O(n2 · k)

Simple1 O(n · k2) O(n2 · k).

1These results hold in the absence of conflicts, a condition on cyclic structure of the query and graph.

8

Page 31: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

b a a b a

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 32: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

b a a b a

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 33: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

b a a b a b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 34: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

z

w

b a a b a b b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 35: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

z

w

b a a b a b b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 36: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

z

w

b a a b a b b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 37: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

z

w

b a a b a b b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 38: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

z

w

v

y

b a a b a b b b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 39: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

z

w

v

y

b a a b a b b b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 40: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Incremental Maintenance of Spanning Trees

Timet0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19

y

u

x

z

u

v

u

x

x

y

z

u

z

w

v

y

b a a b a b b b

Q1 = (a · b)+

s0start s1 s2a b

a

(x , 0)

(z , 1)

(w , 2)

(y , 1)

(u, 2)

(v , 1)

(y , 2)

Cross-edge

Append-only Algorithm for Arbitrary Path Semantics

I O(n) amortized insertion cost (n = # of vertices in the window W )

I Expired tuples are removed in batches

I Explicit windows are supported via negative tuples

I A variation of Counting and DRed [Gupta et al., 1993]

9

Page 41: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 42: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 43: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 44: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 45: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 46: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 47: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 48: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

How to Find Simple Paths?

Finding simple paths for arbitrary graphs and queries is NP-complete[Mendelzon and Wood, 1995]

Conflict-freeness: A sufficient condition for efficient evaluation

A condition on the cyclic structure of the graph and the automataAn arbitrary path pa : u→v =⇒ a simple path ps : u→v

Requires DFS ordering

Our contribution: A streaming algorithm for conflict detection

Optimized for append-only streams

An optimistic conflict-detection mechanism

Aggressive pruning – backtracking only if absolutely necessary

Correctness & complexity proofs are in the paper

10

Page 49: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Performance Results

Most common RPQs in Wikidata query logs [Bonifati et al., 2019]

Update stream of LDBC SNB [Erling et al., 2015]Stackoverflow temporal graphYago2s RDF graph

Sub-millisecond tail latency (99thpercentile)

Performance matching the complexity analysis

Finding simple paths

Over 70% of the workloads2− 5× impact on the tail latency

Scalability analysis using synthetic RPQ workloads (gMark [Baganet al., 2016])

11

Page 50: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Performance Results

Most common RPQs in Wikidata query logs [Bonifati et al., 2019]

Update stream of LDBC SNB [Erling et al., 2015]Stackoverflow temporal graphYago2s RDF graph

Sub-millisecond tail latency (99thpercentile)

Performance matching the complexity analysis

Finding simple paths

Over 70% of the workloads2− 5× impact on the tail latency

Scalability analysis using synthetic RPQ workloads (gMark [Baganet al., 2016])

11

Page 51: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Performance Results

Most common RPQs in Wikidata query logs [Bonifati et al., 2019]

Update stream of LDBC SNB [Erling et al., 2015]Stackoverflow temporal graphYago2s RDF graph

Sub-millisecond tail latency (99thpercentile)

Performance matching the complexity analysis

Finding simple paths

Over 70% of the workloads2− 5× impact on the tail latency

Scalability analysis using synthetic RPQ workloads (gMark [Baganet al., 2016])

11

Page 52: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Performance Results

Most common RPQs in Wikidata query logs [Bonifati et al., 2019]

Update stream of LDBC SNB [Erling et al., 2015]Stackoverflow temporal graphYago2s RDF graph

Sub-millisecond tail latency (99thpercentile)

Performance matching the complexity analysis

Finding simple paths

Over 70% of the workloads2− 5× impact on the tail latency

Scalability analysis using synthetic RPQ workloads (gMark [Baganet al., 2016])

11

Page 53: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Performance Results

Most common RPQs in Wikidata query logs [Bonifati et al., 2019]

Update stream of LDBC SNB [Erling et al., 2015]Stackoverflow temporal graphYago2s RDF graph

Sub-millisecond tail latency (99thpercentile)

Performance matching the complexity analysis

Finding simple paths

Over 70% of the workloads2− 5× impact on the tail latency

Scalability analysis using synthetic RPQ workloads (gMark [Baganet al., 2016])

11

Page 54: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Performance Results

Most common RPQs in Wikidata query logs [Bonifati et al., 2019]

Update stream of LDBC SNB [Erling et al., 2015]Stackoverflow temporal graphYago2s RDF graph

Sub-millisecond tail latency (99thpercentile)

Performance matching the complexity analysis

Finding simple paths

Over 70% of the workloads2− 5× impact on the tail latency

Scalability analysis using synthetic RPQ workloads (gMark [Baganet al., 2016])

11

Page 55: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Wrapping-up

1 Design space of non-blocking RPQ operators

Path semanticsResult semantics

2 Persistent RPQs under arbitrary path semantics

Support for explicit edge deletionsImplicit & explicit window semantics

3 Streaming conflict-detection algorithm

Safely prune the search space to avoid exhaustive searchTractable evaluation on conflict-free graphs (matching tractabilityresults [Mendelzon and Wood, 1995])

4 Extensive of experiments with real-world workloads [Bonifati et al.,2019]

Scalability w.r.t. the window & query sizeFeasibility of simple path semantics

12

Page 56: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Wrapping-up

1 Design space of non-blocking RPQ operators

Path semanticsResult semantics

2 Persistent RPQs under arbitrary path semantics

Support for explicit edge deletionsImplicit & explicit window semantics

3 Streaming conflict-detection algorithm

Safely prune the search space to avoid exhaustive searchTractable evaluation on conflict-free graphs (matching tractabilityresults [Mendelzon and Wood, 1995])

4 Extensive of experiments with real-world workloads [Bonifati et al.,2019]

Scalability w.r.t. the window & query sizeFeasibility of simple path semantics

12

Page 57: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Wrapping-up

1 Design space of non-blocking RPQ operators

Path semanticsResult semantics

2 Persistent RPQs under arbitrary path semantics

Support for explicit edge deletionsImplicit & explicit window semantics

3 Streaming conflict-detection algorithm

Safely prune the search space to avoid exhaustive searchTractable evaluation on conflict-free graphs (matching tractabilityresults [Mendelzon and Wood, 1995])

4 Extensive of experiments with real-world workloads [Bonifati et al.,2019]

Scalability w.r.t. the window & query sizeFeasibility of simple path semantics

12

Page 58: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Wrapping-up

1 Design space of non-blocking RPQ operators

Path semanticsResult semantics

2 Persistent RPQs under arbitrary path semantics

Support for explicit edge deletionsImplicit & explicit window semantics

3 Streaming conflict-detection algorithm

Safely prune the search space to avoid exhaustive searchTractable evaluation on conflict-free graphs (matching tractabilityresults [Mendelzon and Wood, 1995])

4 Extensive of experiments with real-world workloads [Bonifati et al.,2019]

Scalability w.r.t. the window & query sizeFeasibility of simple path semantics

12

Page 59: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

13

Page 60: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

References I

Bagan, G., Bonifati, A., Ciucanu, R., Fletcher, G. H., Lemay, A., and Advokaat, N.(2016). gmark: schema-driven generation of graphs and queries. IEEE Trans. Knowl.and Data Eng., 29(4):856–869.

Bagan, G., Bonifati, A., and Groz, B. (2013). A trichotomy for regular simple pathqueries on graphs. In Proc. 32nd ACM SIGACT-SIGMOD-SIGART Symp. onPrinciples of Database Systems, pages 261–272.

Barbieri, D. F., Braga, D., Ceri, S., Della Valle, E., and Grossniklaus, M. (2009).C-sparql: Sparql for continuous querying. In Proc. 18th Int. World Wide Web Conf.,pages 1061–1062.

Bonifati, A., Martens, W., and Timm, T. (2019). Navigating the maze of wikidataquery logs. In Proc. 28th Int. World Wide Web Conf., pages 127–138. ACM.

Calbimonte, J.-P., Corcho, O., and Gray, A. J. (2010). Enabling ontology-based accessto streaming data sources. In Proc. 9th Int. Semantic Web Conf., pages 96–111.Springer.

Choudhury, S., Holder, L. B., Jr, G. C., Agarwal, K., and Feo, J. (2015). A Selectivitybased approach to Continuous Pattern Detection in Streaming Graphs. In Proc. 18thInt. Conf. on Extending Database Technology, pages 157–168.

14

Page 61: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

References II

Dell’Aglio, D., Calbimonte, J.-P., Della Valle, E., and Corcho, O. (2015). Towards aunified language for rdf stream query processing. In Proc. 12th Extended SemanticWeb Conf., pages 353–363. Springer.

Erling, O., Averbuch, A., Larriba-Pey, J., Chafi, H., Gubichev, A., Prat, A., Pham,M.-D., and Boncz, P. (2015). The ldbc social network benchmark: Interactiveworkload. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages619–630. http://doi.acm.org/10.1145/2723372.2742786.

Gupta, A., Mumick, I. S., and Subrahmanian, V. S. (1993). Maintaining viewsincrementally. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages157–166.

Iyer, A. P., Li, L. E., Das, T., and Stoica, I. (2016). Time-evolving graph processing atscale. In Proc. 4th Int. Workshop on Graph Data Management Experiences andSystems, pages 1–6.

Kent, A. D., Liebrock, L. M., and Neil, J. C. (2015). Authentication graphs: Analyzinguser behavior within an enterprise network. Computers & Security, 48:150–166.

Kumar, P. and Huang, H. H. (2020). Graphone: A data store for real-time analytics onevolving graphs. ACM Trans. Storage, 15(4):1–40.

15

Page 62: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

References III

Le-Phuoc, D., Dao-Tran, M., Parreira, J. X., and Hauswirth, M. (2011). A native andadaptive approach for unified processing of linked streams and linked data. In Proc.10th Int. Semantic Web Conf., pages 370–388. Springer.

Mariappan, M. and Vora, K. (2019). Graphbolt: Dependency-driven synchronousprocessing of streaming graphs. In Proc. 14th ACM SIGOPS/EuroSys EuropeanConf. on Comp. Syst., pages 1–16.

Mendelzon, A. O. and Wood, P. T. (1995). Finding regular simple paths in graphdatabases. SIAM J. on Comput., 24(6):1235–1258.

Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., and Zhou, J. (2018).Real-time constrained cycle detection in large dynamic graphs. Proc. VLDBEndowment, 11(12):1876–1888.

Yakovets, N., Godfrey, P., and Gryz, J. (2016). Query planning for evaluating sparqlproperty paths. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages1875–1889. ACM.

16

Page 63: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Additional Slides

17

Page 64: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

This presentation is not

1 A streaming RDF engine

C-SPARQL [Barbieri et al., 2009], CQELS [Le-Phuoc et al., 2011],SPARQLstream [Calbimonte et al., 2010], W3C RSP-QL [Dell’Aglioet al., 2015]Based on SPARQL v1.0 - no path navigationOur algorithms can be incorporated to provide incremental RPQevaluation

2 A graph analytics engine for dynamic graphs

GraphOne [Kumar and Huang, 2020], GraphBolt [Mariappan andVora, 2019], GraphTau [Iyer et al., 2016]Efficient maintenance of graph snapshotsIterative graph analytics in the vertex-centric modelWe focus on persistent evaluation of path queries that are specifieddeclaratively

18

Page 65: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Workloads

Table: Most common RPQs in real workloads [Bonifati et al., 2019].

Name Query Name Query

Q1 a∗ Q7 a ◦ b ◦ c∗Q2 a ◦ b∗ Q8 a? ◦ b∗

Q3 a ◦ b∗ ◦c∗ Q9 (a1 + a2 + · · · + ak)+

Q4 (a1 + a2 + · · · + ak)∗ Q10 (a1 + a2 + · · · + ak) ◦ b∗

Q5 a ◦ b∗ ◦ c Q11 a1 ◦ a2 ◦ · · · ◦ akQ6 a∗ ◦ b∗

19

Page 66: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Throughput & Tail Latency

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11Query

102

103

104

105

Thro

ughp

ut (e

dges

/s)

Throughput (edges/s) Tail Latency (μs)

8.2 × 101

8.4 × 101

8.6 × 101

8.8 × 101

9 × 101

9.2 × 101

Tail

Late

ncy

(μs)

(a) Yago2s

Q1 Q2 Q3 Q5 Q6 Q7 Q11Query

102

103

104

105

Thro

ughp

ut (e

dges

/s)

Throughput (edges/s) Tail Latency (μs)

102

103

104

Tail

Late

ncy

(μs)

(b) LDBC SF10

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11Query

102

103

104

105

Thro

ughp

ut (e

dges

/s)

Throughput (edges/s) Tail Latency (μs)

104

Tail

Late

ncy

(μs)

(c) Stackoverflow

Figure: Throughput and tail latency of the Algorithm ??. Y axis is given inlog-scale.

20

Page 67: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Index Size

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11Query

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0#

of tr

ees(

1000

)# of Trees # of Nodes

0

5

10

15

20

25

30

35

40

# of

nod

es (1

M)

Figure: Size of the tree index ∆ on the SO graph with |W | = 30days.

21

Page 68: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Scalability - Tail Latency (99thPercentile)

5M 10M 15M 20MWindow size

2 × 102

3 × 102Ta

il la

tenc

y (μ

s)

0.5M 1M 1.5M 2MSlide size

Q1Q2

Q3Q4

Q5Q6

Q7Q8

Q9Q10

Q11

Figure: Tail latency with various |W | and β

22

Page 69: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Scalability - Window Management Time

5M 10M 15M 20MWindow size

105

2 × 105

3 × 105

Expi

ry T

ime

(μs)

0.5M 1M 1.5M 2MSlide size

Q1Q2

Q3Q4

Q5Q6

Q7Q8

Q9Q10

Q11

Figure: The average window maintenance cost with various |W | and β

23

Page 70: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Explicit Deletions

0 2 4 6 8 10% of explicit deletions

80

90

100

110

120

130

140Ta

il La

tenc

y (μ

s)

Q1Q2

Q3Q4

Q5Q6

Q7Q8

Q9Q10

Q11

Figure: Impact of the ratio of explicit deletions on tail latency for all queries onYago2s RDF graph with |W | =≈ 10M tuples.

24

Page 71: Regular Path Query Evaluation on Streaming Graphsapacaci/papers/sigmod2020.pdfRegular Path Queries RPQs are heavily used in practice SPARQL v1.1, Cypher, PGQL, G-CORE 1 in 4 queries

Impact of DFA Construction

2 4 6 8 10 12 14 16 18 20Query size (|QR|)

2

4

6

8

10

12

# of

stat

es (k

)

2 4 6 8 10 12# of states (k)

20

40

60

80

100

120

Thro

ughp

ut (1

03 edg

es/s

)

Figure: The impact of the query length |QR | on the automata size, k, and thethroughput. RPQs are generated using gMark [Bagan et al., 2016] where thequery size ranges from 2 to 20. Each RPQ is formulated by grouping labels intoconcatenations and alternations of size up to 3, and each group has a 50%probability of having ∗ and +.

25