Academic research on graph processing: connecting recent findings to industrial technologies
Gábor Szárnyas
openCypher Meetup @ NYC
LINKED DATA BENCHMARK COUNCIL
LDBC is a non-profit organization dedicated to establishing
benchmarks, benchmark practices and benchmark results for
graph data management software.
LDBC’s Social Network Benchmark is an industrial and academic
initiative, formed by principal actors in the field of graph-like
data management.
OVERVIEW OF GRAPH PROCESSING
Workloads by query scope:
• OLTP: local queries
• OLAP: global queries
• Analytics: global computations
Example (OLTP): “Friends’ recent likes”
MATCH (u:User {id: $userId})-[:FRIEND]-(f:User)-[l:LIKES]->(p:Post)
RETURN f, p
ORDER BY l.timestamp DESC
LIMIT 10
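This query starts from a single user, so it only touches that user's neighbourhood. Below is a minimal Python sketch of the same traversal. It assumes a hypothetical in-memory graph: the `friends` and `likes` dictionaries are illustrative and are not part of any benchmark or database API.

```python
from heapq import nlargest

# Hypothetical in-memory graph: undirected FRIEND edges as adjacency sets,
# LIKES edges as (post, timestamp) lists per user.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
}
likes = {
    "bob": [("post1", 100), ("post2", 300)],
    "carol": [("post3", 200)],
}


def friends_recent_likes(user_id, limit=10):
    """Return up to `limit` (friend, post) pairs, most recent like first."""
    candidates = (
        (timestamp, friend, post)
        for friend in friends.get(user_id, ())
        for post, timestamp in likes.get(friend, ())
    )
    # Only the user's 1-hop neighbourhood and its LIKES edges are visited.
    top = nlargest(limit, candidates)
    return [(friend, post) for _, friend, post in top]


print(friends_recent_likes("alice"))
```

The work grows with the number of friends and their likes, not with the size of the whole graph. That is what makes this an OLTP-style local query.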
Orri Erling et al.,
The LDBC Social Network Benchmark: Interactive Workload,
SIGMOD 2015
14 queries and 8 updates
OLTP: frequent updates, limited data
Example (OLAP): “One-sided friendships”
MATCH (u1:User)-[:FRIEND]-(u2:User)-[l:LIKES]->(p:Post),
      (u1)-[:AUTHOR_OF]->(p)
WITH u1, u2, count(l) AS likes
WHERE likes > 10
  AND NOT (u1)-[:LIKES]->(:Post)<-[:AUTHOR_OF]-(u2)
RETURN u1, u2
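Unlike the previous example, this query has no fixed starting vertex: it must consider every pair of friends. The Python sketch below mirrors the aggregation and the negative condition over in-memory edge lists. All names and data are hypothetical and only serve as an illustration.

```python
from collections import Counter

# Hypothetical data: undirected FRIEND pairs, post authorship, LIKES edges.
FRIEND_PAIRS = {("ann", "ben"), ("ann", "cat")}
AUTHOR_OF = {f"a{i}": "ann" for i in range(12)}  # ann wrote posts a0..a11
AUTHOR_OF["c0"] = "cat"
LIKE_EDGES = [("ben", f"a{i}") for i in range(12)]   # ben likes all of ann's posts
LIKE_EDGES += [("cat", f"a{i}") for i in range(12)]  # so does cat,
LIKE_EDGES.append(("ann", "c0"))                     # but ann likes a post by cat


def one_sided_friendships(friend_pairs, author_of, like_edges, threshold=10):
    """Return (u1, u2) friend pairs where u2 liked u1's posts more than
    `threshold` times, but u1 never liked a post written by u2."""
    # FRIEND is undirected, so consider both orientations of each pair.
    oriented = set(friend_pairs) | {(b, a) for a, b in friend_pairs}
    # likes_between[(liker, author)] = number of LIKES from liker to author's posts
    likes_between = Counter(
        (liker, author_of[post]) for liker, post in like_edges if post in author_of
    )
    return sorted(
        (u1, u2)
        for u1, u2 in oriented
        if likes_between[(u2, u1)] > threshold and likes_between[(u1, u2)] == 0
    )


print(one_sided_friendships(FRIEND_PAIRS, AUTHOR_OF, LIKE_EDGES))
```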
Arnau Prat, Gábor Szárnyas, Alex Averbuch et al.,
The LDBC Social Network Benchmark: BI Workload,
Technical report available, peer-reviewed paper in 2018
25 queries with infrequent executions
OLAP: lots of data, infrequent updates
Analytics (global computations), e.g.:
• PageRank
• Shortest paths
• Clustering coefficient
Example: “Find the most central individuals.”
Neo4j: Graph Algorithms library
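PageRank is one way to score how central a vertex is. It counts as a global computation because every iteration touches every vertex and edge. Below is a library-independent sketch using power iteration. It assumes the graph fits in memory as a dict of out-neighbour lists. This is not the Neo4j Graph Algorithms API.

```python
def pagerank(out_edges, damping=0.85, iterations=100):
    """PageRank by power iteration.

    out_edges: vertex -> list of out-neighbours (every neighbour must be a key).
    Rank of dangling vertices (no out-edges) is spread uniformly over all vertices.
    """
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        # Teleport term plus the redistributed rank of dangling vertices.
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for w in out_edges[v]:
                new_rank[w] += damping * rank[v] / len(out_edges[v])
        rank = new_rank
    return rank


# Hypothetical graph in which every other vertex points to "hub".
GRAPH = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ranks = pagerank(GRAPH)
print(max(ranks, key=ranks.get))
```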
Alexandru Iosup et al.,
LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on
Parallel and Distributed Platforms,
VLDB 2016
One-time execution
Analytics: all data, no updates
Validation: global queries
Example (validation): “Emergency contact for juvenile users”
MATCH (u1:User)
WHERE u1.age < 18
  AND NOT (u1)-[:EMERGENCY_CONTACT]->(:User)
RETURN u1
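Validation queries are global, but they are re-evaluated after frequent updates, which makes incremental evaluation attractive. Below is a hypothetical Python sketch that maintains the result of this query under updates. It re-checks only the user affected by each change. The class and method names are illustrative, and the target of an emergency contact edge is not checked to be a User.

```python
class JuvenileWithoutContactView:
    """Incrementally maintained result of the 'emergency contact' validation query.

    Hypothetical sketch: each update re-checks only the affected user instead of
    re-running the global query over all users.
    """

    def __init__(self):
        self.age = {}       # user -> age (users without an age never violate)
        self.contacts = {}  # user -> number of outgoing EMERGENCY_CONTACT edges
        self.violations = set()

    def _recheck(self, user):
        age = self.age.get(user)
        juvenile = age is not None and age < 18
        if juvenile and self.contacts.get(user, 0) == 0:
            self.violations.add(user)
        else:
            self.violations.discard(user)

    def set_age(self, user, age):
        self.age[user] = age
        self._recheck(user)

    def add_contact(self, user):
        self.contacts[user] = self.contacts.get(user, 0) + 1
        self._recheck(user)

    def remove_contact(self, user):
        self.contacts[user] = max(0, self.contacts.get(user, 0) - 1)
        self._recheck(user)


view = JuvenileWithoutContactView()
view.set_age("u1", 15)
view.set_age("u2", 30)
print(view.violations)  # only u1 violates the constraint
view.add_contact("u1")
print(view.violations)  # the violation disappears after the update
```

Users without an age are treated as non-violating. This matches Cypher, where comparing a missing property yields null and the row is filtered out.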
Gábor Szárnyas et al.
The Train Benchmark: cross-technology performance
evaluation of continuous model queries,
Software and Systems Modeling, 2017
Validation: lots of data, frequent updates
FAULT-TOLERANT SYSTEMS RESEARCH GROUP
Critical systems
Avionics
Railway
Automotive
MODEL-DRIVEN ENGINEERING
Models are first-class citizens during development
o SysML / requirements, statecharts, etc.
Validation and code generation techniques for correctness
Technology: Eclipse Modeling Framework (EMF)
o Originally started at IBM as an implementation of the Object Management Group’s (OMG) Meta Object Facility (MOF)
o An EMF model is an object-oriented model, i.e., a property graph-like structure with a metamodel
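A property graph-like structure with a metamodel can be illustrated as a typed graph checked against a metamodel. Here the metamodel lists, for each reference type, the allowed source and target classes. The reference names and classes below are hypothetical. Real EMF metamodels also cover inheritance, multiplicities and attributes, which this minimal Python sketch ignores.

```python
from dataclasses import dataclass, field

# Hypothetical metamodel: reference type -> (allowed source class, allowed target class).
METAMODEL = {
    "follows": ("Route", "SwitchPosition"),
    "target": ("SwitchPosition", "Switch"),
    "requires": ("Route", "Sensor"),
}


@dataclass
class Model:
    node_type: dict = field(default_factory=dict)  # object id -> class name
    edges: list = field(default_factory=list)      # (source id, reference, target id)

    def conformance_errors(self, metamodel):
        """List the edges whose reference type or endpoint classes violate the metamodel."""
        errors = []
        for src, ref, dst in self.edges:
            if ref not in metamodel:
                errors.append(f"{src} -{ref}-> {dst}: unknown reference type")
                continue
            expected = metamodel[ref]
            actual = (self.node_type.get(src), self.node_type.get(dst))
            if actual != expected:
                errors.append(f"{src} -{ref}-> {dst}: expected {expected}, got {actual}")
        return errors


MODEL = Model(
    node_type={"r1": "Route", "sp1": "SwitchPosition", "sw1": "Switch"},
    edges=[("r1", "follows", "sp1"), ("sp1", "target", "sw1"), ("r1", "target", "sw1")],
)
print(MODEL.conformance_errors(METAMODEL))
```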
MODEL VALIDATION
Implemented with model queries
Models are typed, attributed graphs
Typical queries
o Find two components that are not connected by a particular edge
MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s)
o Check whether two objects are reachable (here: find pairs that are not)
MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s)
o Property checks
MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR s:Y
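The reachability query above relies on a variable-length path restricted to edge types E1 and E2. Below is a minimal Python sketch of that check as a breadth-first search. It assumes a hypothetical list of (source, type, target) edges. As with `*` in Cypher, a path must contain at least one edge.

```python
from collections import deque

# Hypothetical model edges as (source, edge type, target) triples.
EDGES = [("r", "E1", "x"), ("x", "E2", "s"), ("r", "E3", "t")]


def reachable(edges, source, target, allowed_types):
    """True if `target` is reachable from `source` by a path of one or more
    edges whose types are all in `allowed_types` (like -[:E1|E2*]-> in Cypher)."""
    adjacency = {}
    for src, edge_type, dst in edges:
        if edge_type in allowed_types:
            adjacency.setdefault(src, []).append(dst)
    queue = deque(adjacency.get(source, []))
    seen = set()
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            queue.extend(adjacency.get(node, []))
    return False


# (r, t) would be reported by the validation query: t is not reachable via E1/E2.
print(reachable(EDGES, "r", "s", {"E1", "E2"}), reachable(EDGES, "r", "t", {"E1", "E2"}))
```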
Complex graph queries
[Figure: example railway model with route 1 and route 2, a switch with two switch positions ([straight] and [diverging]), track segments, and sensors A, B and C]
[Figure: the query as a graph pattern over route: Route, swP: SwitchPosition, sw: Switch and sensor: Sensor; NEG marks the negative condition between route and sensor]
MATCH (route:Route)-->(swP:SwitchPosition)-->(sw:Switch)<--(sensor:Sensor)
WHERE NOT (route)-->(sensor)
RETURN route, sensor, swP, sw
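A naive way to evaluate this query is to enumerate the positive part of the pattern with nested loops, then filter by the negative condition. Below is a Python sketch on a hypothetical instance of the railway model. The edge names `FOLLOWS`, `TARGET`, `MONITORS` and `REQUIRES`, and the choice of which edges exist, are assumptions for illustration; the query itself uses untyped `-->` edges.

```python
# Hypothetical railway instance; edge names and the set of edges are assumed.
FOLLOWS = {("route1", "swP_diverging"), ("route2", "swP_straight")}  # Route -> SwitchPosition
TARGET = {("swP_diverging", "switch"), ("swP_straight", "switch")}   # SwitchPosition -> Switch
MONITORS = {("sensorA", "switch"), ("sensorC", "switch")}            # Sensor -> Switch
REQUIRES = {("route1", "sensorA"), ("route2", "sensorC")}            # Route -> Sensor


def route_sensor_violations():
    """Nested-loop evaluation of the positive pattern, then the negative condition."""
    results = []
    for route, sw_p in sorted(FOLLOWS):
        for sw_p2, sw in sorted(TARGET):
            if sw_p2 != sw_p:
                continue
            for sensor, sw2 in sorted(MONITORS):
                if sw2 != sw:
                    continue
                # WHERE NOT (route)-->(sensor)
                if (route, sensor) not in REQUIRES:
                    results.append((route, sensor, sw_p, sw))
    return results


print(route_sensor_violations())
```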
LOCAL SEARCH-BASED QUERY EVALUATION
Matching: $P \to G$ (graph morphism)
Constraint satisfaction problem over a finite domain (CSP/FD):
o Variables: vertices of $P$
o Constraints: edges of $P$
o Domain values: vertices of $G$
Complexity: $O(|G|^{|P|})$ in the worst case
G. Varró, F. Deckwerth, M. Wieber, A. Schürr,
An algorithm for generating model-sensitive search plans for pattern
matching on EMF models,
Software and Systems Modeling, 2013
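The CSP formulation can be turned into a backtracking matcher. It binds pattern vertices in the order given by a search plan and, where it can, navigates along edges from vertices that are already bound. The minimal Python sketch below works under these assumptions: untyped directed edges, a hand-written plan, and no negative conditions. It is not the plan-generation algorithm of Varró et al.

```python
def local_search(pattern_edges, node_labels, graph_labels, graph_edges, plan):
    """Backtracking matcher binding pattern vertices in `plan` order.

    pattern_edges: set of directed (u, v) edges of the pattern P
    node_labels:   pattern vertex -> required label
    graph_labels:  graph vertex -> label (the domain: vertices of G)
    graph_edges:   set of directed (x, y) edges of G
    Returns (matches, steps); steps counts the label-compatible candidates tried.
    """
    matches, steps = [], 0

    def candidates(binding, var):
        # Navigate an edge from an already bound vertex if possible,
        # otherwise scan all vertices of G.
        for u, v in pattern_edges:
            if v == var and u in binding:
                return sorted(y for x, y in graph_edges if x == binding[u])
            if u == var and v in binding:
                return sorted(x for x, y in graph_edges if y == binding[v])
        return sorted(graph_labels)

    def consistent(binding, var, value):
        # Every pattern edge whose endpoints are both bound must exist in G.
        for u, v in pattern_edges:
            if var in (u, v):
                x = value if u == var else binding.get(u)
                y = value if v == var else binding.get(v)
                if x is not None and y is not None and (x, y) not in graph_edges:
                    return False
        return True

    def extend(binding, depth):
        nonlocal steps
        if depth == len(plan):
            matches.append(dict(binding))
            return
        var = plan[depth]
        for value in candidates(binding, var):
            if graph_labels[value] != node_labels[var]:
                continue
            steps += 1
            if consistent(binding, var, value):
                binding[var] = value
                extend(binding, depth + 1)
                del binding[var]

    extend({}, 0)
    return matches, steps


# Hypothetical railway instance (positive part of the pattern only).
GRAPH_LABELS = {
    "route1": "Route", "route2": "Route",
    "swP_div": "SwitchPosition", "swP_str": "SwitchPosition",
    "switch": "Switch", "sensorA": "Sensor", "sensorB": "Sensor", "sensorC": "Sensor",
}
GRAPH_EDGES = {
    ("route1", "swP_div"), ("route2", "swP_str"),
    ("swP_div", "switch"), ("swP_str", "switch"),
    ("sensorA", "switch"), ("sensorC", "switch"),
}
PATTERN_EDGES = {("route", "swP"), ("swP", "sw"), ("sensor", "sw")}
NODE_LABELS = {"route": "Route", "swP": "SwitchPosition", "sw": "Switch", "sensor": "Sensor"}

for plan in (["sensor", "sw", "swP", "route"], ["route", "swP", "sw", "sensor"]):
    found, steps = local_search(PATTERN_EDGES, NODE_LABELS, GRAPH_LABELS, GRAPH_EDGES, plan)
    print(plan, len(found), "matches,", steps, "steps")
```

On this hypothetical instance both plans find the same 4 matches. Starting from the sensors tries 13 candidate bindings, while starting from the routes tries 10, so the search plan affects the cost even on a tiny model.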
[Figure: step-by-step local search on the railway model with a search plan starting from the sensors (sensor → switch → switchPosition → route), steps numbered 00 to 17]
[Figure: the same search with a plan starting from the routes (route → switchPosition → switch → sensor), steps numbered 00 to 14]
SEARCH SPACE OF THE GRAPH SEARCH PROBLEM
Information on cardinalities
o Metamodel-level
o Model-level
Index structures
Homogeneity
Frequency of changes and queries
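One way the cardinality information above could feed into a search plan is a greedy heuristic. It starts from the pattern vertex with the rarest label, then repeatedly navigates the edge with the lowest estimated fan-out. The Python sketch below is illustrative only: the heuristic, the statistics format and the numbers are assumptions, not taken from the cited work.

```python
PATTERN_EDGES = {("route", "swP"), ("swP", "sw"), ("sensor", "sw")}
NODE_LABELS = {"route": "Route", "swP": "SwitchPosition", "sw": "Switch", "sensor": "Sensor"}
# Hypothetical model-level statistics.
LABEL_COUNTS = {"Route": 2, "SwitchPosition": 2, "Switch": 1, "Sensor": 3}
# (label of bound vertex, label of new vertex, direction navigated) -> average fan-out
AVG_DEGREE = {
    ("Switch", "SwitchPosition", "in"): 2.0,
    ("Switch", "Sensor", "in"): 1.5,
    ("SwitchPosition", "Route", "in"): 1.0,
    ("Route", "SwitchPosition", "out"): 1.0,
    ("SwitchPosition", "Switch", "out"): 1.0,
    ("Sensor", "Switch", "out"): 1.0,
}


def greedy_search_plan(pattern_edges, node_labels, label_counts, avg_degree):
    """Greedy search plan: rarest label first, then the cheapest edge to navigate."""
    start = min(node_labels, key=lambda v: (label_counts[node_labels[v]], v))
    plan = [start]
    while len(plan) < len(node_labels):
        options = []
        for u, v in pattern_edges:
            if u in plan and v not in plan:    # navigate u -> v forwards
                options.append((avg_degree[(node_labels[u], node_labels[v], "out")], v))
            elif v in plan and u not in plan:  # navigate u <- v backwards
                options.append((avg_degree[(node_labels[v], node_labels[u], "in")], u))
        if not options:  # disconnected pattern: start a new component with a scan
            options = [(label_counts[node_labels[v]], v) for v in node_labels if v not in plan]
        plan.append(min(options)[1])
    return plan


print(greedy_search_plan(PATTERN_EDGES, NODE_LABELS, LABEL_COUNTS, AVG_DEGREE))
```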
K. Zeng et al. (Microsoft Research),
A Distributed Graph Engine for Web Scale RDF Data,
VLDB 2013
TOP VENUES ON MODEL-DRIVEN ENGINEERING
JOURNALS
Journal on Software and Systems Modeling (Springer)
Journal of Systems and Software (Elsevier)
CONFERENCES
ACM/IEEE MODELS
FASE: Fundamental Approaches to Software Engineering
STAF: Software Technologies: Applications and Foundations
o ICMT/ICGT: International Conference on Model/Graph Transformation
o TTC: Transformation Tool Contest
MATCH (a1:Actor)-[:PLAYS_IN]->(m:Movie)<-[:PLAYS_IN]-(a2:Actor)
WITH a1, a2, count(m) AS movieCount
WHERE movieCount >= 3
RETURN a1, a2, movieCount
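The query groups pairs of actors by the number of movies they share. Below is a Python sketch of the same aggregation over a hypothetical cast list. It counts each unordered pair once. The Cypher pattern, by contrast, is symmetric and returns every pair in both orders, (a1, a2) and (a2, a1).

```python
from collections import Counter
from itertools import combinations

# Hypothetical cast list: movie -> actors playing in it.
CAST = {
    "m1": ["ann", "bob", "cid"],
    "m2": ["ann", "bob"],
    "m3": ["bob", "ann", "dan"],
    "m4": ["cid", "dan"],
}


def frequent_costars(cast, min_movies=3):
    """Unordered actor pairs that played together in at least `min_movies` movies."""
    pair_counts = Counter()
    for actors in cast.values():
        # Each unordered pair of distinct actors in the same movie shares one movie.
        for a1, a2 in combinations(sorted(set(actors)), 2):
            pair_counts[(a1, a2)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_movies}


print(frequent_costars(CAST))
```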
TRAIN BENCHMARK FRAMEWORK
Scalable graph generator producing models in several formats:
• EMF
• Property graph
• RDF
• SQL
Validation queries and model transformations
Implemented for 12+ tools
G. Szárnyas, B. Izsó, I. Ráth, D. Varró,
The Train Benchmark: cross-technology performance
evaluation of continuous model queries,
Software and Systems Modeling, 2017
ftsrg/trainbenchmark
MODEL-DRIVEN ENGINEERING TOOLS
VIATRA framework: reactive model transformations
OTHER COMPUTER SCIENCE FIELDS
Semantic web
o Semantic graphs built from triples
o Ontologies for metamodeling
o SPARQL graph queries
Object-oriented databases
o Big hype in the ’90s
o Lots of similarity to EMF
…and potentially others.
SUMMARY
MDE poses many graph query problems
Lots of research has been conducted
Chance to avoid reinventing the wheel
o Pattern matching algorithms
o Transformation semantics
o Performance benchmarks
RELATED RESOURCES
Train Benchmark github.com/ftsrg/trainbenchmark
Incremental Graph Engine github.com/ftsrg/ingraph
LDBC Benchmarks github.com/ldbc
List of papers: github.com/szarnyasg/mde-graph-processing
Siddhartha Sahu et al. (University of Waterloo),
The Ubiquity of Large Graphs and Surprising Challenges of
Graph Processing: A User Survey,
arXiv preprint, 2017