Asymmetry in Large-Scale Graph Analysis, Explained

2nd International Workshop on Graph Data Management Experiences and Systems (GRADES 2014)

Vasiliki Kalavri (1), Stephan Ewen (2), Kostas Tzoumas (3), Vladimir Vlassov (4), Volker Markl (5), Seif Haridi (6)

(1, 4, 6) KTH Royal Institute of Technology, {kalavri, vladv, haridi}@kth.se
(2, 3, 5) Technical University of Berlin, {firstname.lastname}@tu-berlin.de

Upload: vasia-kalavri

Post on 08-Jan-2017



Motivation

● Many large-scale data processing applications include fixed point iterations
  ○ social network analysis
  ○ web graph analysis
  ○ machine learning


Asymmetrical Convergence

● Often, in fixed point iterations, some elements converge faster than others

● Not all elements require an update in every iteration


Can we detect the elements that require recomputation and avoid redundant computations?


Contributions

● A categorization of optimizations for fixed point iterative graph processing

● Necessary conditions under which it is safe to apply each optimization

● A mapping of existing techniques to graph processing abstractions

● An implementation of template execution plans

Optimized algorithms yield order-of-magnitude performance gains!


Asymmetry in Connected Components


Iterative Plans - Bulk

● In each iteration, all elements are computed

● Always applicable
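As a rough illustration of the bulk plan, here is a minimal Python sketch. It assumes connected components computed by min-label propagation over an adjacency dictionary; the function name `cc_bulk`, the toy graph, and the per-iteration change counter are hypothetical and not taken from the slides.

```python
def cc_bulk(adj):
    """Bulk plan (sketch): recompute every vertex label in every iteration."""
    labels = {v: v for v in adj}          # assumption: each vertex starts with its own id
    changes_per_iteration = []
    while True:
        # Every vertex is recomputed from the labels of ALL its neighbors.
        new_labels = {
            v: min([labels[v]] + [labels[u] for u in adj[v]])
            for v in adj
        }
        changed = sum(1 for v in adj if new_labels[v] != labels[v])
        labels = new_labels
        if changed == 0:                  # fixed point reached
            break
        changes_per_iteration.append(changed)
    return labels, changes_per_iteration


# Hypothetical undirected toy graph: a path 1-2-3-4 and an edge 5-6.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [6], 6: [5]}
labels, changes = cc_bulk(graph)
print(labels)
print(changes)
```

The shrinking change counts show the asymmetry: all six vertices are recomputed in every pass, although only a few of them actually change after the first iteration.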


Iterative Plans - Dependency

● In each iteration, only elements with at least one changed neighbor are recomputed

● The state is computed using the values of all neighbors

● Always applicable
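The dependency plan can be sketched in the same style. This is an illustrative Python version, again assuming min-label connected components; `cc_dependency` and the toy graph are hypothetical names, not the authors' implementation.

```python
def cc_dependency(adj):
    """Dependency plan (sketch): recompute only vertices with a changed neighbor,
    but still read the labels of ALL neighbors of each recomputed vertex."""
    labels = {v: v for v in adj}
    candidates = set(adj)                 # first iteration: every vertex is a candidate
    computed_per_iteration = []
    while candidates:
        computed_per_iteration.append(len(candidates))
        new_labels = dict(labels)
        changed = set()
        for v in candidates:
            best = min([labels[v]] + [labels[u] for u in adj[v]])
            if best != labels[v]:
                new_labels[v] = best
                changed.add(v)
        labels = new_labels
        # Next candidates: the neighbors of every vertex that just changed.
        candidates = {u for v in changed for u in adj[v]}
    return labels, computed_per_iteration


# Hypothetical undirected toy graph: a path 1-2-3-4 and an edge 5-6.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [6], 6: [5]}
labels, computed = cc_dependency(graph)
print(labels)
print(computed)
```

The result is the same as recomputing everything, but the number of vertices evaluated per iteration drops as parts of the graph converge.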

Iterative Plans - Incremental

● In each iteration, only elements with at least one changed neighbor are recomputed

● The state is computed using only the values of updated neighbors

● Applicable when the update function is idempotent and weakly monotonic (e.g. min)
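A minimal sketch of the incremental plan, assuming min-label connected components (min is idempotent and never increases a label, which is what makes this safe here). The message-passing structure, the name `cc_incremental`, and the counter of values read are illustrative assumptions.

```python
def cc_incremental(adj):
    """Incremental plan (sketch): a vertex combines its current label only with
    the labels sent by neighbors that were updated in the previous iteration."""
    labels = {v: v for v in adj}
    updated = set(adj)                    # initially every vertex counts as updated
    values_read_per_iteration = []
    while updated:
        # Only updated vertices send their (new) label to their neighbors.
        inbox = {}
        for v in updated:
            for u in adj[v]:
                inbox.setdefault(u, []).append(labels[v])
        values_read_per_iteration.append(sum(len(m) for m in inbox.values()))
        updated = set()
        for u, received in inbox.items():
            best = min(received)
            if best < labels[u]:          # min with the old state; unchanged neighbors are not re-read
                labels[u] = best
                updated.add(u)
    return labels, values_read_per_iteration


# Hypothetical undirected toy graph: a path 1-2-3-4 and an edge 5-6.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [6], 6: [5]}
labels, values_read = cc_incremental(graph)
print(labels)
print(values_read)
```

Because the old label already summarizes every value seen so far, skipping unchanged neighbors does not change the result. With an update function that is not idempotent or monotonic, this shortcut would not be safe in general.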

Iterative Plans - Delta

● In each iteration, only elements with at least one changed neighbor are recomputed

● The state is computed using only the deltas of updated neighbors

● Applicable when the update function is linear over the composition operator (e.g. sum)
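A sketch of the delta plan, using PageRank as an example of a sum-based (linear) update. The formulation below is an assumption made for illustration: every vertex is assumed to have at least one outgoing edge, and `pagerank_delta`, `damping`, `threshold`, and the toy graph are hypothetical names and values.

```python
def pagerank_delta(out_edges, damping=0.85, threshold=1e-9):
    """Delta plan (sketch): propagate only the change (delta) of each rank and
    add incoming deltas to the current state instead of recomputing the sum.
    Assumption: every vertex has at least one outgoing edge."""
    n = len(out_edges)
    base = (1.0 - damping) / n
    rank = {v: base for v in out_edges}
    delta = {v: base for v in out_edges}  # every vertex starts as "changed"
    while delta:
        incoming = {}
        for u, d_u in delta.items():
            share = damping * d_u / len(out_edges[u])
            for v in out_edges[u]:
                incoming[v] = incoming.get(v, 0.0) + share
        delta = {}
        for v, d_v in incoming.items():
            rank[v] += d_v                # linear update: state plus sum of deltas
            if abs(d_v) > threshold:      # tiny deltas are no longer propagated
                delta[v] = d_v
    return rank


# Hypothetical directed toy graph (vertex -> list of out-neighbors).
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank_delta(web)
print(ranks)
```

Since the update is a sum, adding the deltas of updated neighbors gives the same state as summing all neighbor contributions again, up to the error introduced by the cut-off threshold.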

Iteration Techniques Support in Graph Processing Systems

X : provided by default
X : can be easily implemented
X : possible, but non-intuitive

System        Bulk  Dependency  Incremental  Delta
Pregel        X     X           X            X
GraphLab      X     X           X            X
GraphX        X     X           X            X
PowerGraph    X     X           X            X
Stratosphere  X     X           X            X


Performance - Connected Components


Performance - PageRank


Conclusions & Future Work

● Exploiting asymmetrical convergence can lead to order-of-magnitude performance gains

● In the future, we plan to
  ○ Use cost-based optimization to automatically select the most efficient iterative plan at runtime.
  ○ Implement a set of representative applications and compare performance with iterative and graph-processing systems.