distributed graph algorithms - disi, university of...

62
Distributed Graph Algorithms Alessio Guerrieri University of Trento, Italy 2016/04/26 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Upload: others

Post on 03-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Distributed Graph Algorithms

Alessio Guerrieri

University of Trento, Italy

2016/04/26

This work is licensed under a Creative CommonsAttribution-ShareAlike 4.0 International License.

references

Page 2: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Contents

1 IntroductionP2PDistance computation

2 Thinking distributivelyBig DataPregel and Vertex-Centric frameworksGraph partitioning

3 Graph algorithmsTriangle countingConnectednessStrongly Connected ComponentsGraph clustering

Page 3: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction

Who am I?

My name is Alessio Guerrieri

Postdoctoral researcher with prof. Montresor

Specialization on distributed large-scale graph processing

Graph G = (V,E)

V set of nodes

E set of edges (links, connections)between pair of nodes

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 1 / 42

Page 4: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction

Today’s topic

Vertex-centric model for graph computation

Possible applications of vertex-centric model

Vertex-centric graph algorithms

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 2 / 42

Page 5: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction P2P

Peer-to-Peer network

Collection of peer(nodes)

Have physicalconnections

Compute an overlaygraph

Nodes know existenceof neighbors

Can discover othernodes via peer-samplingtechniques

Network

TCP/IP

Overlay

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 3 / 42

Page 6: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction P2P

Computing in Peer-to-Peer systems

Peers might want to run protocols on the network:

Finding centrality of peer

Test properties of network

Compute distances

...

Can we simply reuse distributed algorithms for Computer Networks?

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 4 / 42

Page 7: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Example: decentralized distance computation

Problem

For each peer x in overlay G, compute its distance from target peerTarget.

Vertex-centric model

Initially each peer only knows its direct neighbors in the overlay

Each peer can send messages to each peer of which it knows theexistence

Amount of memory used by each peer should be sub-linear

No control on rate of activity of peers and speed of messages(asynchronous execution)

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 5 / 42

Page 8: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 6 / 42

Page 9: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 6 / 42

Page 10: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 6 / 42

Page 11: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 6 / 42

Page 12: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 6 / 42

Page 13: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 6 / 42

Page 14: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 6 / 42

Page 15: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Distributed Breadth-First-Search

D-BFS protocol executed by p:

upon initialization doif p = Target then

Dp ← 0send 〈Dp + 1〉 to neighbors

elseDp ←∞

upon receive 〈d〉 doif d < Dp then

Dp ← dsend 〈Dp + 1〉 to neighbors

repeat every ∆ time unitssend 〈Dp + 1〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 7 / 42

Page 16: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Introduction Distance computation

Notes

Works if:

Nodes execute in any order

Messages arrive in any order

Nodes or edges are added

Messages are lost

Fails if:

Nodes fail

Edges are removed

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 8 / 42

Page 17: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Big Data

New Graph Applications

Many settings in which there is need to analyze graph:

Web data

Computer networks

Biological networks

Social networks

...

Size can be extremely large

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 9 / 42

Page 18: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Big Data

Approaches

Centralized/Sequential

Collect data in one place

Load it in memory

Analyze it with traditionaltechniques

Decentralized

One machine for each node

Construct overlay network

Run vertex-centric protocol

Both unfeasible! These are better:

Centralized/Parallel

Parallel processes and threads inshared memory

Distributed system

Multiple (not N) machines, eachhosting part of the graph

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 10 / 42

Page 19: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Big Data

Why distributed?

Disadvantages

Slower than parallelsystems

Younger field ofresearch

Needs distribution ofdata

Advantages

Fault tolerant

Cheaper

Extendible

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 11 / 42

Page 20: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Big Data

What do we want?

Nice interface for distributed graph algorithms and fast framework torun those algorithms

Graph analyst are not distributed systems experts!

Issues with:

Data replication

Fault tolerance

Message deliverance

...

should be dealt by the system!

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 12 / 42

Page 21: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Big Data

Vertex centric

Why not reuse vertex-centric interface?

User

Defines data associated tovertex

Defines type of messages

Defines code executed bysingle vertex when it receivesmessages

System

Divides input graph betweenthe available machines

Each machine simulates eachvertex independently

Takes care of fault tolerance

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 13 / 42

Page 22: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Big Data

Vertex centric distributed system

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 14 / 42

Page 23: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Pregel and Vertex-Centric frameworks

Pregel

Developed by Google

First large-scale graphprocessing system withvertex-centric interface

Created for PageRank

Code automatically deployedon thousands of machines

Programming API

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 15 / 42

Page 24: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Pregel and Vertex-Centric frameworks

Pregel-like frameworks

Many:

Giraph: on top of Hadoop

GraphX: on top of Spark

Gelly: on top of Flink

Graphlab: standalone

All frameworks allow user to run vertex-centric programs ondistributed systems

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 16 / 42

Page 25: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Pregel and Vertex-Centric frameworks

Example (Giraph)

public void compute ( I t e r ab l e<IntWritable> messages ) {int minDist = In t eg e r .MAXVALUE;for ( IntWritab le message : messages ) {

minDist = Math . min (minDist , message . get ( ) ) ;}i f ( minDist < getValue ( ) . get ( ) ) {

setValue (new IntWritab le ( minDist ) ) ;IntWritab le d i s t ance = new IntWritab le ( minDist + 1)for (Edge< . . .> edge : getEdges ( ) ) {

sendMessage ( edge . getTargetVertexId ( ) , d i s t anc e ) ;}

}voteToHalt ( ) ;

}

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 17 / 42

Page 26: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Pregel and Vertex-Centric frameworks

Common characteristics of Vertex-centric models

Independence from problem size

Code stays the same regardless of

size of graph

number of machines used

Embedded fault-tolerance

Periodic benchmarking

Machines can ”steal nodes”from slower machines

Distributed File System

Usually HDFS

Allow all machines to readand write in common space

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 18 / 42

Page 27: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Pregel and Vertex-Centric frameworks

Synchronicity of Vertex-centric models

Synchronous

Execution in rounds

Each vertex receives in roundi all messages sent to it inround i− 1

Better guarantees → easiestmodel

Asynchronous

No rounds

Messages processed withoutguarantees

More difficult, more efficient

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 19 / 42

Page 28: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Graph partitioning

Graph partitioning

First step of analysisWhich machine hosts which subset of nodes/edges?

Balance

Size of partitions should bebalanced

Quality

Messages between nodeshosted in same machine arecheap

Messages between machinesare costly

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 20 / 42

Page 29: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Thinking distributively Graph partitioning

Heuristic solutions

Graph partitioning is NP-Hard

Hash-based heuristics

Node n ends in machine H(n)%K

Iterative algorithms

Start from random andincrementally improve

Heuristics can be devised for specific data-sets

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 21 / 42

Page 30: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms

Graph Algorithms

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 22 / 42

Page 31: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms

What algorithms are available

This area is not completely new: there are distributed algorithms forcomputer networksBut:

Computer networks are small

Many interesting graph problems are not interesting on computernetworks

Some ideas can be taken from research in PRAM ( Parallel RandomAccess Machine)

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 23 / 42

Page 32: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms

Problems

We’ll look at:

Triangle counting

Connected components

Strongly connected components

Clustering

We won’t look at:

Pagerank

Single-Value decomposition

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 24 / 42

Page 33: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Triangle counting

Triangle Counting

Definition

In an undirected graph Compute, for each node n, how many pairs ofnodes (v, w) exists such that (n, v), (v, w), (w, n) are edges.

Application on social networks

Seems easier than it is...

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 25 / 42

Page 34: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Triangle counting

Triangle counting protocol

Send neighbor-list to neighbors and see which neighbors we have incommon

Executed by node p

upon initialization doTp ← 0send neighbor-list to neighbors

upon receive 〈M〉 doCommon← neighbor-list

⋂M

Tp ← Tp + |Common|

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 26 / 42

Page 35: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Triangle counting

Issues

Multi-counting

Each triangle is counted 6 times!

Should make sure that every triangle is counted only once!

Hubs

Hubs are nodes with disproportionally large degree

Exists in almost all real graphs

Will send large neighbor-lists to large number of neighbors

Will receive large quantity of messages

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 27 / 42

Page 36: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Triangle counting

Triangle counting protocol (2)

Send list of neighbors with smaller id

Executed by node p

upon initialization doTp ← 0M ← {n : n ∈ neighbor-list, idn < idp}send M to neighbors

upon receive 〈M〉 doCommon← neighbor-list

⋂M

Tp ← Tp + |Common|

Cut messages by half

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 28 / 42

Page 37: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Triangle counting

Triangle counting protocol (3)

Executed by node p

upon initialization doTp ← 0foreach n ∈ neighbor-list do

if idn < idp thenM ← {q : q ∈ neighbor-list, idq < idn}send LIST (M) to n

upon receive 〈LIST (M)〉 from n doCommon← neighbor-list

⋂M

Tp ← Tp + |Common| send TR(|Common|) to nsend TR(1) to v ∈ Common

upon receive 〈TR(t)〉 doTp ← Tp + t

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 29 / 42

Page 38: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Connectedness

Connected components (problem)

Problem definition

Two nodes v, w of an undirected graph are in the same weaklyconnected component (WCC) if exists a path from v to w.For each node n compute value Cn such that:

∀v, w ∈ V,Cv = Cw ⇐⇒ WCCv = WCCw

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 30 / 42

Page 39: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Connectedness

Connected components (solution)

Executed by node p:

upon initialization doCp ← p.idsend 〈Cp〉 to neighbors

upon receive 〈c〉 doif c < Cp then

Cp ← csend 〈c〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 31 / 42

Page 40: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Connectedness

Connected components (solution)

Executed by node p:

upon initialization doCp ← p.idsend 〈Cp〉 to neighbors

upon receive 〈c〉 doif c < Cp then

Cp ← csend 〈c〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 31 / 42

Page 41: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Connectedness

Connected components (solution)

Executed by node p:

upon initialization doCp ← p.idsend 〈Cp〉 to neighbors

upon receive 〈c〉 doif c < Cp then

Cp ← csend 〈c〉 to neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 31 / 42

Page 42: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Strongly Connected Components

Strongly connected components

Problem definition

Two nodes v, w of an directed graph are in the same strongly connectedcomponent (SCC) if exist paths from v to w and from w to v (bothdirections).

Centralized solution

Single depth-first search visit using Tarjan’s algorithm

Two depth-first search visits using Kosaraju’s algorithm

Distributed solution much more difficult!

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 32 / 42

Page 43: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Strongly Connected Components

Strongly Connected components (solution)

SCC: Executed by node p:

upon initialization doFp ← p.id // Lowest id of nodes that reaches pBp ← p.id // Lowest id of nodes reachable from psend 〈FOR(Fp)〉 to out-neighborssend 〈BACK(Bp)〉 to in-neighbors

upon receive 〈FOR(c)〉 doif c < Fp then

Fp ← c send 〈FOR(c)〉 to out-neighbors

upon receive 〈BACK(c)〉 doif c < Bp then

Bp ← c send 〈BACK(c)〉 to in-neighbors

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 33 / 42

Page 44: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Strongly Connected Components

Properties

Bp = minimum id of nodesreachable from pFp = minimum id of nodes thatcan reach p

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 34 / 42

Page 45: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Strongly Connected Components

Properties

Bp = minimum id of nodesreachable from pFp = minimum id of nodes thatcan reach p

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 34 / 42

Page 46: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Strongly Connected Components

Properties

Lemma1

(Fp = Bp = i)→ p and i are in the same SCC

Lemma2

(Fp = Fq and BP = B1 → p and q are in the same SCC

Are the two lemmas correct?

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 35 / 42

Page 47: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Strongly Connected Components

Properties

Bp = minimum id of nodesreachable from pFp = minimum id of nodes thatcan reach p

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 36 / 42

Page 48: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Strongly Connected Components

Complete algorithm

Complete algorithm

While graph not empty:

Run SCC algorithm

remove each node p such that Fp = Bp

Can improve number of SCC found through heuristics

Quicker convergence for largest SCC

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 37 / 42

Page 49: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Graph clustering

Given a graph, divide its vertices inclusters such that:

Most edges connect nodes in samecluster

Few edges go across cluster

Clusters are significant

Precise definition depends fromapplication scenarioAll versions are NP-Complete

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 38 / 42

Page 50: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Label-propagation algorithm

Executed by node p:

upon initialization doCp ← p.id // starting label equal to idNp ← {} // dictionary containing labels of neighborssend Cp to neighbors

upon receive c from q doNp[q] = c // update dictionary with new label of neighbor

repeat every ∆ time unitsc′ ← most common label in neighborsif Cp 6= c′ then

send c′ to neighborsCp ← c′

Note:ties resolved randomlyAlessio Guerrieri (UniTN) DS - Graphs 2016/04/26 39 / 42

Page 51: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Initialization:

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 52: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Active node chooses label 1 (randomly)

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 53: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Active node chooses label 3(randomly)

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 54: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Active node chooses label 6(randomly)

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 55: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Active node chooses label 6

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 56: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Active node chooses label 1

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 57: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Active node chooses label 1

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 58: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Active node chooses label 6

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 59: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Sample run

Converged state: no node change its mind

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 40 / 42

Page 60: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Disadvantages

Non-deterministic

Non-null probability of collapsing to single cluster

Low quality

In practice, it’s not that good

Better options

Slow, iterative algorithms

Fast, heuristics

My algorithm!

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 41 / 42

Page 61: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Graph algorithms Graph clustering

Conclusions

Vertex-centric algorithms can be used in:I P2P systemsI Computer networksI Large-scale graph analysis

Field still in fluxI New frameworksI New programming modelsI New algorithms

Many applicationsI Everything is a graphI Could be interesting for a project

Alessio Guerrieri (UniTN) DS - Graphs 2016/04/26 42 / 42

Page 62: Distributed Graph Algorithms - DISI, University of Trentodisi.unitn.it/~montreso/ds/handouts/12-graphs.pdf · 2018-12-16 · Distributed Graph Algorithms Alessio Guerrieri University

Reading Material

Pregel: A System for Large-Scale Graph Processing by Malewicz et al.

Pregel Algorithms for Graph Connectivity Problems with PerformanceGuarantees by Yan et al.

Giraph http://giraph.apache.org/

Graphx http://spark.apache.org/graphx/