Post on 04-Jun-2020

Academic research on graph processing: connecting recent findings to industrial technologies

Gábor Szárnyas

openCypher Meetup @ NYC

LINKED DATA BENCHMARK COUNCIL

LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software.

LDBC’s Social Network Benchmark is an industrial and academic initiative, formed by principal actors in the field of graph-like data management.

OVERVIEW OF GRAPH PROCESSING

OLTP: local queries

OLAP: global queries

analytics: global computations

Example: “Friends’ recent likes”

MATCH (u:User {id: $userId})-[:FRIEND]-(f:User)-[l:LIKES]->(p:Post)
RETURN f, p
ORDER BY l.timestamp DESC
LIMIT 10

Orri Erling et al., The LDBC Social Network Benchmark: Interactive Workload, SIGMOD 2015

14 queries and 8 updates

OLTP: frequent updates, limited data

Example: “One-sided friendships”

MATCH (u1:User)-[:FRIEND]-(u2:User)-[l:LIKES]->(p:Post), (u1)-[:AUTHOR_OF]->(p)
WITH u1, u2, count(l) AS likes
WHERE likes > 10
  AND NOT (u1)-[:LIKES]->(:Post)<-[:AUTHOR_OF]-(u2)
RETURN u1, u2

Arnau Prat, Gábor Szárnyas, Alex Averbuch et al., The LDBC Social Network Benchmark: BI Workload, Technical report available, peer-reviewed paper in 2018

25 queries with infrequent executions

OLAP: lots of data, infrequent updates

• PageRank

• Shortest paths

• Clustering coefficient

Example: “Find the most central individuals.”

Neo4j: Graph Algorithms library
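
As a rough illustration of what a global computation such as PageRank does, here is a minimal power-iteration sketch in plain Python. It is not the Neo4j Graph Algorithms library API; the function name `pagerank`, the edge list `FOLLOWS`, the damping factor and the iteration count are all assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Approximate PageRank by power iteration over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    # Outgoing neighbours of every node.
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the "teleport" share first.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # A node without outgoing edges spreads its rank evenly over all nodes.
            targets = out[n] or nodes
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a and b point to c, c points back to a.
FOLLOWS = [("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(FOLLOWS)
most_central = max(ranks, key=ranks.get)
```

The computation touches every vertex and edge in each iteration, which is why this class of workload is described as operating on all data with no updates.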

Alexandru Iosup et al., LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms, VLDB 2016

One-time execution

analytics: all data, no updates

validation: global queries

Example: “Emergency contact for juvenile users”

MATCH (u1:User)
WHERE u1.age < 18
  AND NOT (u1)-[:EMERGENCY_CONTACT]->(:User)
RETURN u1

Gábor Szárnyas et al., The Train Benchmark: cross-technology performance evaluation of continuous model queries, Software and Systems Modeling, 2017

validation: lots of data, frequent updates

FAULT-TOLERANT SYSTEMS RESEARCH GROUP

Critical systems

Avionics

Railway

Automotive

MODEL-DRIVEN ENGINEERING

Models are first class citizens during development

o SysML / requirements, statecharts, etc.

Validation and code generation techniques for correctness

Technology:

Eclipse Modeling Framework (EMF)

Originally started at IBM as an implementation of the Object Management Group’s (OMG) Meta Object Facility (MOF).

i.e., an object-oriented model

i.e., a property graph-like structure with a metamodel

MODEL VALIDATION

Implemented with model queries

Models are typed, attributed graphs

Typical queries

o Get two components connected by a particular edge

MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s)

o Check if two objects are reachable

MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s)

o Property checks

MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y)

Complex graph queries

[Figure: example railway model, built up step by step: segments and a switch, monitored by sensor A, sensor B and sensor C; route 1 and route 2; and two switchPosition nodes, [diverging] and [straight].]

[Figure: graph pattern with the vertices route: Route, swP: SwitchPosition, sw: Switch and sensor: Sensor, plus a negative (NEG) condition.]

MATCH (route:Route)-->(swP:SwitchPosition)-->(sw:Switch)<--(sensor:Sensor)
WHERE NOT (route)-->(sensor)
RETURN route, sensor, swP, sw

LOCAL SEARCH-BASED QUERY EVALUATION

Matching: 𝑃 → 𝐺 (graph morphism)

Constraint satisfaction on a finite domain (CSP/FD):

o Variables: vertices of 𝑃

o Constraints: edges of 𝑃

o Domain values: 𝐺

Complexity: |𝐺|^|𝑃|

G. Varró, F. Deckwerth, M. Wieber, A. Schürr, An algorithm for generating model-sensitive search plans for pattern matching on EMF models, Software and Systems Modeling, 2013
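
The CSP view above can be sketched as a small backtracking search: pattern variables are bound one at a time to model vertices, and a partial binding is dropped as soon as a pattern edge is violated. This is a minimal illustration, not the algorithm from the cited paper. The toy model (`LABELS`, `EDGES`), the vertex ids and the function name `find_matches` are assumptions, and the route-to-sensor edges in particular are invented for the example.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy model: vertex ids, labels and edges are assumptions.
LABELS: Dict[str, str] = {
    "route1": "Route", "route2": "Route",
    "swP_diverging": "SwitchPosition", "swP_straight": "SwitchPosition",
    "switch": "Switch",
    "sensorA": "Sensor", "sensorB": "Sensor", "sensorC": "Sensor",
}
EDGES: Set[Tuple[str, str]] = {
    ("route1", "swP_diverging"), ("route2", "swP_straight"),
    ("swP_diverging", "switch"), ("swP_straight", "switch"),
    ("sensorA", "switch"), ("sensorC", "switch"),
    ("route1", "sensorA"), ("route2", "sensorA"), ("route2", "sensorC"),
}

# The pattern P: typed variables, required edges, and one negative edge.
PATTERN_TYPES: Dict[str, str] = {
    "route": "Route", "swP": "SwitchPosition", "sw": "Switch", "sensor": "Sensor",
}
PATTERN_EDGES = [("route", "swP"), ("swP", "sw"), ("sensor", "sw")]
NEG_EDGES = [("route", "sensor")]


def find_matches(order: List[str]) -> Iterator[Dict[str, str]]:
    """Bind the pattern variables in the given order, backtracking on failure."""

    def edges_ok(binding: Dict[str, str]) -> bool:
        # Check only pattern edges whose endpoints are both bound already.
        return all((binding[s], binding[t]) in EDGES
                   for s, t in PATTERN_EDGES
                   if s in binding and t in binding)

    def extend(i: int, binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(order):
            # Full binding: the negative edge must be absent from the model.
            if all((binding[s], binding[t]) not in EDGES for s, t in NEG_EDGES):
                yield dict(binding)
            return
        var = order[i]
        for vertex, label in LABELS.items():
            if label != PATTERN_TYPES[var]:
                continue
            binding[var] = vertex
            if edges_ok(binding):
                yield from extend(i + 1, binding)
            del binding[var]  # backtrack

    yield from extend(0, {})


matches = list(find_matches(["route", "swP", "sw", "sensor"]))
same_matches = list(find_matches(["sensor", "sw", "swP", "route"]))
```

Both variable orders return the same matches; what changes is how many candidate bindings are explored before the search finishes.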

[Animation: step-by-step local search over the example railway model, starting from the sensors (sensor A, then sensor B, then sensor C) and navigating to the switch, the switchPosition nodes and the routes; the step counter runs from 00 to 17.]

[Animation: the same search started from the routes instead (route 1, then route 2), navigating to the switchPosition nodes, the switch and the sensors; the step counter runs from 00 to 14, versus 17 for the sensor-first order.]

SEARCH SPACE OF THE GRAPH SEARCH PROBLEM

Information on cardinalities

o Metamodel-level

o Model-level

Index structures

Homogeneity

Frequency of changes and queries

K. Zeng et al. (Microsoft Research), A Distributed Graph Engine for Web Scale RDF Data, VLDB 2013
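
To make the role of cardinality information concrete, the following sketch orders the variables of a pattern greedily: it starts from the variable whose type has the fewest instances and then keeps picking the rarest variable adjacent to those already chosen. This is only one simple heuristic under stated assumptions, not the search-plan algorithm of any cited paper. The function name `greedy_search_plan` and the instance counts in `LABEL_COUNTS` are hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_search_plan(pattern_types: Dict[str, str],
                       pattern_edges: List[Tuple[str, str]],
                       label_counts: Dict[str, int]) -> List[str]:
    """Order pattern variables by model-level cardinality, staying connected."""
    remaining = dict(pattern_types)
    plan: List[str] = []
    while remaining:
        bound = set(plan)
        # Variables reachable over one pattern edge from the bound ones.
        frontier = [v for v in remaining
                    if any((a in bound and b == v) or (b in bound and a == v)
                           for a, b in pattern_edges)]
        # First step (or a disconnected pattern): consider every variable.
        candidates = frontier or list(remaining)
        # Prefer the rarest type; break ties by variable name.
        best = min(candidates, key=lambda v: (label_counts[remaining[v]], v))
        plan.append(best)
        del remaining[best]
    return plan


# Hypothetical instance counts for a small railway model.
LABEL_COUNTS = {"Route": 2, "SwitchPosition": 2, "Switch": 1, "Sensor": 3}
PATTERN_TYPES = {"route": "Route", "swP": "SwitchPosition",
                 "sw": "Switch", "sensor": "Sensor"}
PATTERN_EDGES = [("route", "swP"), ("swP", "sw"), ("sensor", "sw")]

plan = greedy_search_plan(PATTERN_TYPES, PATTERN_EDGES, LABEL_COUNTS)
```

A real planner would also weigh the other factors listed above, such as available index structures and how often the model changes relative to how often it is queried.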

TOP VENUES ON MODEL-DRIVEN ENGINEERING

JOURNALS

Journal on Software and Systems Modeling (Springer)

Journal of Systems and Software (Elsevier)

CONFERENCES

ACM/IEEE MODELS

FASE: Fundamental Approaches to Software Engineering

STAF: Software Technology Applications and Foundations

o ICMT/ICGT: International Conference on Model/Graph Transformation

o TTC: Transformation Tool Contest

odd years even years

MATCH (a1:Actor)-[:PLAYS_IN]->(m:Movie)<-[:PLAYS_IN]-(a2:Actor)
WITH a1, a2, count(m) AS movieCount
WHERE movieCount >= 3
RETURN a1, a2, movieCount

TRAIN BENCHMARK FRAMEWORK

Scalable graph generator

EMF

Property graph

RDF

SQL

Validation queries and model transformations

Implemented for 12+ tools

G. Szárnyas, B. Izsó, I. Ráth, D. Varró, The Train Benchmark: cross-technology performance evaluation of continuous model queries, Software and Systems Modeling, 2017

ftsrg/trainbenchmark

MODEL-DRIVEN ENGINEERING TOOLS

VIATRA framework: reactive model transformations

OTHER COMPUTER SCIENCE FIELDS

Semantic web

o Semantic graphs built from triples

o Ontologies for metamodeling

o SPARQL graph queries

Object-oriented databases

o Big hype in the ’90s

o Lots of similarity to EMF

…and potentially others.

SUMMARY

MDE has a lot of graph query problems

Lots of research has been conducted

Chance to avoid reinventing the wheel

o Pattern matching algorithms

o Transformation semantics

o Performance benchmarks

RELATED RESOURCES

Train Benchmark github.com/ftsrg/trainbenchmark

Incremental Graph Engine github.com/ftsrg/ingraph

LDBC Benchmarks github.com/ldbc

List of papers: github.com/szarnyasg/mde-graph-processing

Siddhartha Sahu et al. (University of Waterloo), The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: A User Survey, arXiv preprint, 2017
