Pregel: A System for Large-Scale Graph Processing. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (pp. 135-146). ACM

DESCRIPTION

Google's graph processing framework.

TRANSCRIPT

Page 1: Pregel

Pregel: A System for Large-Scale Graph Processing

Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski

In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (pp. 135-146). ACM

Page 2: Pregel

2

Source: SIGMETRICS ’09 Tutorial – MapReduce: The Programming Model and Practice, by Jerry Zhao

Page 3: Pregel

3

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 4: Pregel

4

The Problem

• Many practical computing problems concern large graphs.
• Efficient processing of large graphs is challenging:
  Poor locality of memory access
  Very little work per vertex
  Changing degree of parallelism
  Running over many machines makes the problem worse

Large graph data: web graph, transportation routes, citation relationships, social networks.

Graph algorithms: PageRank, shortest path, connected components, clustering techniques.

Page 5: Pregel

5

Want to Process a Large-Scale Graph? The Options:

1. Crafting a custom distributed infrastructure.
   Substantial engineering effort.
2. Relying on an existing distributed platform, e.g. MapReduce.
   Inefficient: the graph state must be stored and passed between stages, causing too much communication.
3. Using a single-computer graph algorithm library.
   Not scalable.
4. Using an existing parallel graph system.
   Not fault tolerant.

Page 6: Pregel

6

Pregel

• To overcome these challenges, Google came up with Pregel. It provides:
  Scalability
  Fault tolerance
  Flexibility to express arbitrary graph algorithms

• The high-level organization of Pregel programs is inspired by Valiant’s Bulk Synchronous Parallel model [45].

[45] Leslie G. Valiant, A Bridging Model for Parallel Computation. Comm. ACM 33(8), 1990

Page 7: Pregel

7

Bulk Synchronous Parallel

• A series of iterations (supersteps).
• In each superstep, every vertex V invokes a function in parallel.
• It can read messages sent to it in the previous superstep (S-1).
• It can send messages, to be read in the next superstep (S+1).
• It can modify the state of its outgoing edges.

[Figure: Input → supersteps repeat until all vertices vote to halt → Output]

Page 8: Pregel

8

Advantages of the Vertex-Centric Approach

• Users focus on a local action.
• Each item is processed independently.
• This ensures that Pregel programs are inherently free of the deadlocks and data races common in asynchronous systems.

Page 9: Pregel

9

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 10: Pregel

10

Model of Computation

• A directed graph is given to Pregel.
• It runs the computation at each vertex.
• This repeats until all vertices vote to halt.
• Pregel gives you a directed graph back.

[Figure: supersteps repeat until all vertices vote to halt, then the output is produced.]

Page 11: Pregel

11

• Algorithm termination is based on every vertex voting to halt.

• In superstep 0, every vertex is in the active state.
• A vertex deactivates itself by voting to halt.
• It can be reactivated by receiving an (external) message.

Vertex State Machine

Page 12: Pregel

12

Example: Finding the largest value in a graph

[Figure: four vertices with initial values 3, 6, 2, and 1. In each superstep every vertex sends its value to its neighbors (blue arrows are messages) and adopts the largest value it receives; vertices whose value did not change vote to halt (shown in blue). After a few supersteps every vertex holds the maximum value, 6.]
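
A minimal sketch of how this example could be written against the Vertex API introduced on the next slides (the class name MaxValueVertex and the exact control flow are ours, not taken from the slides):

class MaxValueVertex : public Vertex<int, void, int> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    int current = GetValue();
    // In superstep 0 every vertex announces its own value.
    bool changed = (superstep() == 0);
    for (; !msgs->Done(); msgs->Next()) {
      if (msgs->Value() > current) {
        current = msgs->Value();
        changed = true;
      }
    }
    *MutableValue() = current;
    if (changed)
      SendMessageToAllNeighbors(current);  // keep propagating the new maximum
    else
      VoteToHalt();  // re-activated automatically if a larger value arrives later
  }
};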

Page 13: Pregel

13

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 14: Pregel

14

The C++ API

• Subclass the predefined Vertex class and write a Compute() method.
  The Compute() method is executed at each active vertex in every superstep.
• Can get/set the vertex value:
  GetValue() / MutableValue()
• Can get/set the values of outgoing edges:
  GetOutEdgeIterator()
• Can send/receive messages:
  SendMessageTo() / the MessageIterator passed to Compute()

Page 15: Pregel

15

The C++ API – Vertex Class

[Figure: the Vertex class template, parameterized by three value types (vertex value, edge value, message value). Compute() is the method to override; it reads the incoming messages from the previous superstep and may send outgoing messages for the next one.]
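
In code, the class in the figure is roughly the following template (a sketch based on the paper; treat exact signatures as approximate):

template <typename VertexValue, typename EdgeValue, typename MessageValue>
class Vertex {
 public:
  virtual void Compute(MessageIterator* msgs) = 0;  // override this

  const string& vertex_id() const;
  int64 superstep() const;

  const VertexValue& GetValue();
  VertexValue* MutableValue();
  OutEdgeIterator GetOutEdgeIterator();

  void SendMessageTo(const string& dest_vertex, const MessageValue& message);
  void VoteToHalt();
};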

Page 16: Pregel

16

The C++ API

Message passing:
• No guaranteed message delivery order.
• Messages are delivered exactly once.
• Can send messages to any vertex.
• If dest_vertex doesn’t exist, a user-defined function is called.

void SendMessageTo(const string& dest_vertex,
                   const MessageValue& message);

Page 17: Pregel

17

The C++ API

Combiners (not active by default):
• Sending a message to a vertex that lives on a different machine has some overhead.
• The user specifies a way to reduce many messages into one value (à la Reduce in MapReduce):
  by overriding the Combine() method.
  Must be commutative and associative.
• Exceedingly useful in certain contexts (e.g., a 4x speedup on the shortest-path computation); see the sketch below.
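
For the shortest-path computation mentioned above, a combiner can collapse all pending messages to a vertex into a single minimum, since the receiver only cares about the smallest distance. A sketch (the class name MinIntCombiner and the Output() call follow the pattern in the Pregel paper; treat the exact signatures as illustrative, and INF is the same "larger than any feasible distance" constant used later):

class MinIntCombiner : public Combiner<int> {
  virtual void Combine(MessageIterator* msgs) {
    int mindist = INF;
    // Many queued messages to the same destination become one message.
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    Output("combined_source", mindist);
  }
};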

Page 18: Pregel

18

The C++ API

Aggregators:
• A mechanism for global communication, monitoring, and data.
  Each vertex can produce a value in a superstep S for the aggregator to use.
  The aggregated value is available to all the vertices in superstep S+1.
• Aggregators can be used for statistics and for global communication.
  E.g., Sum applied to the out-edge count of each vertex generates the total number of edges in the graph and communicates it to all the vertices (see the sketch below).
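
A sketch of that edge-counting use case. The base class and method names here are illustrative placeholders, not the exact Pregel API; the point is only that every vertex contributes one value, the contributions are reduced with a commutative, associative operator, and the result is visible to all vertices one superstep later.

// Hypothetical "sum" aggregator: each vertex contributes its out-degree in
// superstep S; every vertex can read the total in superstep S+1.
class EdgeCountAggregator : public Aggregator<int64> {
 public:
  void Init() { total_ = 0; }
  void Aggregate(int64 out_degree) { total_ += out_degree; }  // one call per vertex
  int64 GetValue() const { return total_; }                   // read in superstep S+1
 private:
  int64 total_;
};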

Page 19: Pregel

19

The C++ API

Topology mutations:
• Some graph algorithms need to change the graph's topology.
  E.g., a clustering algorithm may need to replace a cluster with a single vertex.
• Vertices can create / destroy vertices at will.
• Resolving conflicting requests:
  Partial ordering: E Remove, V Remove, V Add, E Add.
  User-defined handlers: you fix the conflicts on your own.

Page 20: Pregel

20

The C++ API

Input and output:
• It has Readers/Writers for common file formats:
  Text files
  Vertices in a relational database
  Rows in BigTable
• Users can customize Readers/Writers for new inputs/outputs by subclassing the Reader/Writer classes.

Page 21: Pregel

21

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 22: Pregel

22

Implementation

• Pregel was designed for the Google cluster architecture.

• Persistent data is stored as files on a distributed storage system like GFS or BigTable.

• Temporary data is stored on local disk.
• Vertices are assigned to machines based on their vertex ID ( hash(ID) ), so the machine responsible for any vertex can be determined from the ID alone; see the sketch below.
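
A minimal sketch of that assignment rule. The slides only say hash(ID); the "mod number-of-partitions" step and the use of std::hash are our assumptions for illustration:

#include <functional>
#include <string>

// Default partitioning: the partition of a vertex depends only on its ID,
// so any machine can compute where a given vertex lives.
int PartitionFor(const std::string& vertex_id, int num_partitions) {
  return static_cast<int>(std::hash<std::string>{}(vertex_id) % num_partitions);
}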

Page 23: Pregel

23

System Architecture

• The executable is copied to many machines.
• One machine becomes the Master:
  Maintains the workers.
  Recovers from worker faults.
  Provides a Web-UI monitoring tool for job progress.
• The other machines become Workers:
  Each processes its own task.
  Communicates with the other workers.

Page 24: Pregel

24

Pregel Execution

1. User programs are copied to the machines.
2. One machine becomes the master.
   The other machines find the master using a name service and register themselves with it.
   The master determines how many partitions the graph will have.
3. The master assigns one or more partitions, and a portion of the user input, to each worker.
4. The workers run the compute function for the active vertices and send the messages asynchronously (a sketch of this loop follows below).
   There is one thread for each partition in each worker.
   When the superstep is finished, the workers tell the master how many vertices will be active in the next superstep.
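
A rough sketch of the per-superstep worker loop in step 4. All class and method names here are illustrative, not the actual Pregel implementation:

// One superstep on a worker: run Compute() on every active vertex of every
// partition this worker owns, then report how many vertices remain active.
void RunSuperstep(Worker& worker, Master& master) {
  for (Partition& part : worker.partitions()) {   // one thread per partition in practice
    for (Vertex* v : part.vertices()) {
      if (v->IsActive() || v->HasIncomingMessages()) {
        v->Compute(v->IncomingMessages());        // may send messages asynchronously
      }
    }
  }
  master.ReportActiveCount(worker.CountActiveVertices());
}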

Page 25: Pregel

25

Source: http://www.cnblogs.com/huangfox/archive/2013/01/03/2843103.html

Page 26: Pregel

26

Fault Tolerance

• Checkpointing
  The master periodically instructs the workers to save the state of their partitions to persistent storage, e.g., vertex values, edge values, and incoming messages.
• Failure detection
  Using regular “ping” messages.
• Recovery
  The master reassigns graph partitions to the currently available workers.
  The workers all reload their partition state from the most recent available checkpoint.

Page 27: Pregel

27

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 28: Pregel

28

Application – Page Rank

• A = a given page
• T1 .... Tn = pages that point to page A (citations)
• d = damping factor between 0 and 1 (usually kept as 0.85)
• C(T) = number of links going out of T
• PR(A) = the PageRank of page A

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ........ + PR(Tn)/C(Tn) )
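
A quick worked example with made-up numbers: suppose d = 0.85 and page A is linked from T1 with PR(T1) = 0.5 and C(T1) = 2, and from T2 with PR(T2) = 0.3 and C(T2) = 3. Then PR(A) = 0.15 + 0.85 * (0.5/2 + 0.3/3) = 0.15 + 0.85 * 0.35 = 0.4475. (Note that the code on page 30 uses the variant normalized by the number of vertices, 0.15 / NumVertices(), rather than the plain 0.15 of this formula.)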

Page 29: Pregel

29

Application – Page Rank

Source: Wikipedia

Page 30: Pregel

30

Application – Page Rank

class PageRankVertex : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();
    }
  }
};

The vertex value (type double) stores the current PageRank; messages (also double) carry the PageRank contributions.

For convergence, either there is a limit on the number of supersteps or aggregators are used to detect convergence.

Page 31: Pregel

31

Application – Shortest Path

class ShortestPathVertex : public Vertex<int, int, int> {
  void Compute(MessageIterator* msgs) {
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();
  }
};

INF is a constant larger than any feasible distance.

In the first superstep, only the source vertex will update its value (from INF to zero).

Pages 32–40: Pregel

Example: SSSP in Pregel

[Sequence of graph diagrams showing the single-source shortest-paths computation superstep by superstep on a small weighted directed graph. The source vertex starts at distance 0 and every other vertex at INF. In each superstep, vertices that receive a smaller tentative distance update their value and send the new distance plus the edge weight along their out-edges; vertices whose value does not change vote to halt. After a few supersteps the distances stop changing (0, 5, 7, 8, and 9 in this example) and the computation terminates.]

Page 41: Pregel

41

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 42: Pregel

42

Experiments

• 300 multicore commodity PCs were used.
• Only running time is counted; checkpointing was disabled.
• Measures scalability with respect to the number of worker tasks.
• Measures scalability with respect to the number of vertices,
  on binary trees and log-normal random graphs.
• Naïve single-source shortest paths (SSSP) implementation; the weight of all edges is 1.

Page 43: Pregel

43

SSSP - 1 billion vertex binary tree: # of Pregel workers varies from 50 to 800

[Chart: runtime falls from 174 s with 50 workers to 17.3 s with 800 workers — 16 times the workers gives a speedup of about 10.]

Page 44: Pregel

44

SSSP – binary trees: varying graph sizes on 800 worker tasks

[Chart: runtimes range from 17.3 s up to 702 s as the graph grows.]

For a graph with a low average outdegree, the runtime increases linearly in the graph size.

Page 45: Pregel

45

SSSP – log-normal random graphs (mean outdegree = 127.1): varying graph sizes on 800 worker tasks

The runtime increases linearly in the graph size here, too.

Page 46: Pregel

46

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 47: Pregel

47

Related Work

• MapReduce
  Pregel is similar in concept to MapReduce, but with a natural graph API and much more efficient support for iterative computations over the graph.
• Bulk Synchronous Parallel model
  The Oxford BSP Library [38], the Green BSP library [21], BSPlib [26], and the Paderborn University BSP library.
  Their scalability and fault-tolerance implementations have not been evaluated beyond several dozen machines, and none of them provides a graph-specific API.

Page 48: Pregel

48

Related Work

• The closest matches to Pregel are:
  The Parallel Boost Graph Library [22],[23]
    (Pregel additionally provides fault tolerance.)
  CGMgraph [8]
    (Object-oriented programming style, at some performance cost.)
• There have been few systems reporting experimental results for graphs at the scale of billions of vertices.

Page 49: Pregel

49

Outline

• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work

Page 50: Pregel

50

Conclusion & Future Work

• Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms.
• Future work:
  Relaxing the synchronicity of the model, so as not to wait for slower workers at inter-superstep barriers.
  Assigning vertices to machines to minimize inter-machine communication.
  Handling dense graphs in which most vertices send messages to most other vertices.

Page 51: Pregel

51

Comment

• No comparison with other systems.
• The user has to modify Pregel substantially in order to adapt it to his/her needs.
• No failure detection is mentioned for the master, making it a single point of failure.

Page 52: Pregel

52

THANK YOU

Any questions?