Graphs as Streams: Rethinking Graph Processing in the Streaming Era
![Page 1: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/1.jpg)
GRAPHS AS STREAMS: RETHINKING GRAPH PROCESSING IN THE STREAMING ERA
Vasia Kalavri
@vkalavri
![Page 2: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/2.jpg)
![Page 3: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/3.jpg)
MODERN STREAMING TECHNOLOGY
➤ sub-second latencies
➤ high throughput
➤ dynamic topologies
➤ powerful semantics
➤ ecosystem integration
![Page 4: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/4.jpg)
MORE THAN COUNTING WORDS
➤ Complex Event Processing
➤ Online Machine Learning
➤ Streaming SQL
![Page 5: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/5.jpg)
WHAT ABOUT GRAPH PROCESSING?
![Page 6: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/6.jpg)
HOW WE’VE DONE GRAPH PROCESSING SO FAR
1. Load: read the graph from disk and partition it in memory
![Page 7: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/7.jpg)
2. Compute: read and mutate the graph state
![Page 8: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/8.jpg)
3. Store: write the final graph state back to disk
![Page 9: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/9.jpg)
“If what you need is to analyze a static graph over and over again, then this model is great!”
![Page 10: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/10.jpg)
WHAT’S WRONG WITH THIS MODEL?
➤ It is slow
  ➤ wait until the computation is over before you see any result
  ➤ pre-processing and partitioning
➤ It is expensive
  ➤ lots of memory and CPU required in order to scale
➤ It requires re-computation for graph changes
  ➤ no efficient way to deal with updates
![Page 11: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/11.jpg)
GRAPH STREAMING CHALLENGES
➤ Maintain the dynamic graph structure
➤ Provide up-to-date results with low latency
➤ Compute on fresh state only
![Page 12: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/12.jpg)
![Page 13: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/13.jpg)
ACADEMIA TO THE RESCUE
➤ Graph streaming in the 90s-00s
  ➤ input fits in secondary storage
  ➤ limited memory
  ➤ few passes over the input data
  ➤ compact graph representations and summaries
![Page 14: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/14.jpg)
GRAPH SUMMARIES
➤ spanners
  ➤ connectivity, distance
➤ sparsifiers
  ➤ cut estimation
➤ neighborhood sketches
[Figure: the algorithm run on the full graph gives result R1; run on the graph summary it gives R2, with R1 ~ R2]
![Page 15: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/15.jpg)
BATCH CONNECTED COMPONENTS
[Figure: example graph with vertices 1 to 8, iteration i=0]
![Page 16: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/16.jpg)
BATCH CONNECTED COMPONENTS
[Figure: the same graph at iteration i=0, annotated with the IDs each vertex receives from its neighbors]
![Page 17: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/17.jpg)
BATCH CONNECTED COMPONENTS
[Figure: the graph at iteration i=1; vertices 1 to 5 carry label 1 or 2, vertices 6 to 8 carry label 6]
![Page 18: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/18.jpg)
BATCH CONNECTED COMPONENTS
[Figure: the graph at iteration i=1, annotated with the labels each vertex receives from its neighbors]
![Page 19: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/19.jpg)
BATCH CONNECTED COMPONENTS
[Figure: the graph at iteration i=2; vertices 1 to 5 carry label 1, vertices 6 to 8 carry label 6]
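The iterations in these slides follow the usual batch recipe: every vertex starts with its own ID and repeatedly adopts the smallest ID it hears from a neighbor, until nothing changes. What follows is a minimal plain-Java sketch of that idea, not the code behind the slides. The class name `BatchConnectedComponents`, the method `run`, and the edge list are assumptions made for illustration; the edge list was picked so that the run ends in the two components shown at i=2.

```java
import java.util.Arrays;

class BatchConnectedComponents {
    /**
     * Synchronous label propagation over an undirected edge list.
     * Vertices are numbered 1..n; index 0 of the result is unused.
     * Returns labels[v] = smallest vertex ID in v's component.
     */
    static int[] run(int n, int[][] edges) {
        int[] labels = new int[n + 1];
        for (int v = 1; v <= n; v++) {
            labels[v] = v;                    // i=0: every vertex is its own component
        }
        boolean changed = true;
        while (changed) {                     // one pass over all edges per iteration
            changed = false;
            int[] next = labels.clone();      // read old labels, write new ones
            for (int[] e : edges) {
                int u = e[0], v = e[1];
                // Both endpoints send their current label to each other.
                if (labels[u] < next[v]) { next[v] = labels[u]; changed = true; }
                if (labels[v] < next[u]) { next[u] = labels[v]; changed = true; }
            }
            labels = next;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical edge list (an assumption, see the text above).
        int[][] edges = {
            {1, 3}, {1, 4}, {3, 4}, {2, 4}, {2, 5}, {4, 5}, {6, 7}, {6, 8}, {7, 8}
        };
        System.out.println(Arrays.toString(run(8, edges)));
    }
}
```

With this edge list the program prints `[0, 1, 1, 1, 1, 1, 6, 6, 6]`: vertices 1 to 5 end up in component 1 and vertices 6 to 8 in component 6. Note that every iteration re-reads the whole edge list, which is the multi-pass cost the streaming approach tries to avoid.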
![Page 20: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/20.jpg)
STREAM CONNECTED COMPONENTS
Graph Summary: Disjoint Set (Union-Find)
➤ Only store component IDs and vertex IDs
[Figure: incoming edge stream: 5-4, 7-6, 8-6, 3-1, 5-2]
![Page 21: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/21.jpg)
[Figure: edge stream and disjoint set so far: Cid = 1 holds vertices 1, 3]
![Page 22: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/22.jpg)
[Figure: edge stream and disjoint set so far: Cid = 1 holds vertices 1, 3; Cid = 2 holds vertices 2, 5]
![Page 23: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/23.jpg)
[Figure: edge stream and disjoint set so far: Cid = 1 holds vertices 1, 3; Cid = 2 holds vertices 2, 5, 4]
![Page 24: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/24.jpg)
[Figure: edge stream and disjoint set so far: Cid = 1 holds vertices 1, 3; Cid = 2 holds vertices 2, 5, 4; Cid = 6 holds vertices 6, 7]
![Page 25: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/25.jpg)
[Figure: edge stream and disjoint set so far: Cid = 1 holds vertices 1, 3; Cid = 2 holds vertices 2, 5, 4; Cid = 6 holds vertices 6, 7, 8]
![Page 26: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/26.jpg)
[Figure: edge stream and disjoint set so far: Cid = 1 holds vertices 1, 3; Cid = 2 holds vertices 2, 5, 4; Cid = 6 holds vertices 6, 7, 8]
![Page 27: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/27.jpg)
[Figure: edge stream and disjoint set so far: Cid = 1 holds vertices 1, 3; Cid = 2 holds vertices 2, 5, 4; Cid = 6 holds vertices 6, 7, 8]
![Page 28: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/28.jpg)
[Figure: edge stream and disjoint set after merging: Cid = 1 holds vertices 1, 3, 2, 5, 4; Cid = 6 holds vertices 6, 7, 8]
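The walkthrough in these slides can be reproduced with a small union-find structure that sees each edge exactly once and keeps only vertex IDs and component IDs. The sketch below is plain Java and makes several assumptions: the class name `StreamingDisjointSet`, the rule that the smaller root ID becomes the component ID, and the arrival order of the edges are all chosen for illustration so that the intermediate states match the figures.

```java
import java.util.HashMap;
import java.util.Map;

class StreamingDisjointSet {
    // vertex ID -> parent vertex ID; a root points to itself and doubles as the component ID
    private final Map<Integer, Integer> parent = new HashMap<>();

    /** Returns the component ID of v, registering v as its own component on first sight. */
    int find(int v) {
        parent.putIfAbsent(v, v);
        int root = v;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        // Path compression: point every vertex on the walked path directly at the root.
        while (parent.get(v) != root) {
            int next = parent.get(v);
            parent.put(v, root);
            v = next;
        }
        return root;
    }

    /** Consumes one edge of the stream; the edge itself is not stored. */
    void addEdge(int u, int v) {
        int ru = find(u);
        int rv = find(v);
        if (ru != rv) {
            // Assumed convention: the smaller root ID survives as the component ID.
            parent.put(Math.max(ru, rv), Math.min(ru, rv));
        }
    }

    public static void main(String[] args) {
        // Hypothetical arrival order (an assumption, see the text above).
        int[][] stream = {
            {3, 1}, {5, 2}, {5, 4}, {7, 6}, {8, 6}, {4, 2}, {4, 3}, {8, 7}, {4, 1}
        };
        StreamingDisjointSet cc = new StreamingDisjointSet();
        for (int[] e : stream) {
            cc.addEdge(e[0], e[1]);
            System.out.println("edge " + e[0] + "-" + e[1] + " -> Cid = " + cc.find(e[0]));
        }
    }
}
```

State grows with the number of vertices seen, not with the number of edges, and edges such as 4-2 or 8-7 that connect vertices already in the same component leave the summary unchanged.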
![Page 29: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/29.jpg)
DISTRIBUTED STREAM CONNECTED COMPONENTS
![Page 30: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/30.jpg)
THE BAD NEWS
➤ A slightly different motivation
  ➤ finite graph stored on disk vs. unbounded graph arriving in real time
  ➤ some algorithms assume we know |V|, |E|
  ➤ most algorithms designed for single-node execution
![Page 31: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/31.jpg)
THE GOOD NEWS
➤ A quite different reality
  ➤ memory is getting bigger
  ➤ … and cheaper
  ➤ we know how to design distributed algorithms
![Page 32: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/32.jpg)
GELLY-STREAM: SINGLE-PASS STREAM GRAPH PROCESSING WITH APACHE FLINK
![Page 33: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/33.jpg)
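GELLY ON STREAMS
[Figure: Flink stack with Deployment at the bottom, Distributed Dataflow above it, then the DataSet and DataStream APIs, with Gelly on DataSet and Gelly-Stream on DataStream]
Gelly (on DataSet)
➤ Static Graphs
➤ Multi-Pass Algorithms
➤ Full Computations
Gelly-Stream (on DataStream)
➤ Dynamic Graphs
➤ Single-Pass Algorithms
➤ Approximate Computations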
![Page 34: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/34.jpg)
DISTRIBUTED STREAM CONNECTED COMPONENTS
![Page 35: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/35.jpg)
STREAM CONNECTED COMPONENTS WITH FLINK
DataStream<DisjointSet> cc = edgeStream
    .keyBy(0)
    .timeWindow(Time.of(100, TimeUnit.MILLISECONDS))
    .fold(new DisjointSet(), new UpdateCC())
    .flatMap(new Merger())
    .setParallelism(1);
![Page 36: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/36.jpg)
➤ .keyBy(0): partition the edge stream
![Page 37: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/37.jpg)
➤ .timeWindow(Time.of(100, TimeUnit.MILLISECONDS)): define the merging frequency
![Page 38: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/38.jpg)
➤ .fold(new DisjointSet(), new UpdateCC()): merge locally
![Page 39: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/39.jpg)
➤ .flatMap(new Merger()).setParallelism(1): merge globally
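The four steps of this pipeline (partition, window, local merge, global merge) can be imitated without Flink to see why the result is still correct. The sketch below is a plain-Java simulation under stated assumptions, not the Gelly-Stream implementation: `WindowedCCSketch`, `processWindow`, and the nested `DisjointSet` are hypothetical names, a window is just a list of edges, and local summaries are built per partition rather than per key, which simplifies what a keyed window does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WindowedCCSketch {
    /** Minimal union-find keyed by vertex ID; the smaller root ID wins on union (assumed convention). */
    static class DisjointSet {
        final Map<Integer, Integer> parent = new HashMap<>();

        int find(int v) {
            parent.putIfAbsent(v, v);
            while (parent.get(v) != v) {
                v = parent.get(v);
            }
            return v;
        }

        void union(int u, int v) {
            int ru = find(u);
            int rv = find(v);
            if (ru != rv) {
                parent.put(Math.max(ru, rv), Math.min(ru, rv));
            }
        }

        /** Folds another summary into this one: each (vertex, parent) pair is treated as an edge. */
        void merge(DisjointSet other) {
            for (Map.Entry<Integer, Integer> e : other.parent.entrySet()) {
                union(e.getKey(), e.getValue());
            }
        }
    }

    /** Processes the edges of one window and updates the global summary. */
    static DisjointSet processWindow(List<int[]> windowEdges, int parallelism, DisjointSet global) {
        List<DisjointSet> locals = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            locals.add(new DisjointSet());
        }
        for (int[] e : windowEdges) {
            // Stand-in for keyBy(0): route the edge by its source vertex.
            int partition = Math.floorMod(Integer.hashCode(e[0]), parallelism);
            // Stand-in for the fold step: update that partition's local summary.
            locals.get(partition).union(e[0], e[1]);
        }
        // Stand-in for the merger running with parallelism 1: combine all local summaries.
        for (DisjointSet local : locals) {
            global.merge(local);
        }
        return global;
    }
}
```

Only the compact summaries travel to the single merging task, not the edges themselves, so the non-parallel step handles one small object per partition per window. Shorter windows give fresher results at the price of more frequent merges.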
![Page 40: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/40.jpg)
GELLY-STREAM STATUS
➤ Properties and Metrics
➤ Transformations
➤ Aggregations
➤ Discretization
➤ Neighborhood Aggregations
➤ Graph Streaming Algorithms
  ➤ Connected Components
  ➤ Bipartiteness Check
  ➤ Window Triangle Count
  ➤ Triangle Count Estimation
  ➤ Continuous Degree Aggregate
![Page 41: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/41.jpg)
FEELING GELLY?
➤ Gelly-Stream Repository
  github.com/vasia/gelly-streaming
➤ A list of graph streaming papers
  citeulike.org/user/vasiakalavri/tag/graph-streaming
➤ A related talk at FOSDEM’16
  slideshare.net/vkalavri/gellystream-singlepass-graph-streaming-analytics-with-apache-flink
![Page 42: Graphs as Streams: Rethinking Graph Processing in the Streaming Era](https://reader031.vdocument.in/reader031/viewer/2022022414/587138e91a28abf0568b64b7/html5/thumbnails/42.jpg)