
Page 1: Replication-based Fault-tolerance for Large-scale Graph Processing

Replication-based Fault-tolerance for Large-scale Graph Processing

Peng Wang, Kaiyuan Zhang, Rong Chen, Haibo Chen, Haibing Guan

Shanghai Jiao Tong University

Page 2: Replication-based Fault-tolerance for Large-scale Graph Processing

Graph

• Useful information in graphs

• Many applications
  – SSSP
  – Community Detection
  – …

Page 3: Replication-based Fault-tolerance for Large-scale Graph Processing

Graph computing

• Graphs are large
  – Require a lot of machines

• Fault tolerance is important

Page 4: Replication-based Fault-tolerance for Large-scale Graph Processing

How graph computing works

[Figure: two workers W1 and W2 run each iteration in lock-step: LoadGraph, then Compute → SendMsg → EnterBarrier → Commit → LeaveBarrier; the example graph's vertices 1–3 are partitioned across the workers, each with a master copy on one worker and a replica on the other.]

PageRank(i):
  // compute its own rank
  total = 0
  foreach (j in in_neighbors(i)):
    total = total + R[j] * Wji
  R[i] = 0.15 + total
  // trigger neighbors to run again
  if R[i] not converged then
    activate all neighbors

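To make the flow above concrete, here is a minimal single-machine sketch in Python of one vertex-centric superstep (my illustration, not Imitator or Hama code); graph.rank, graph.out_neighbors, and graph.weight are hypothetical helpers, and the update rule mirrors the slide's PageRank.

# Sketch of one synchronous superstep: Compute and SendMsg for every active
# vertex, then a barrier before the produced messages are consumed in the
# next superstep. All names on `graph` are assumed helpers for illustration.
def run_superstep(graph, inbox, active):
    outbox, next_active = {}, set()
    for v in active:
        # Compute: same rule as the PageRank pseudocode above.
        total = sum(inbox.get(v, []))
        new_rank = 0.15 + total
        converged = abs(new_rank - graph.rank[v]) < 1e-4
        graph.rank[v] = new_rank
        # SendMsg: push the fresh rank, scaled by edge weight, to out-neighbors.
        for nbr in graph.out_neighbors(v):
            outbox.setdefault(nbr, []).append(new_rank * graph.weight(v, nbr))
        # Trigger neighbors to run again if not converged.
        if not converged:
            next_active.update(graph.out_neighbors(v))
    # EnterBarrier / Commit / LeaveBarrier: in the distributed setting all
    # workers synchronize here before the next superstep starts.
    return outbox, next_active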

Page 5: Replication-based Fault-tolerance for Large-scale Graph Processing

Related work about fault tolerance

• Simple re-execution (MapReduce)
  – Does not fit: graph computation has complex data dependencies

• Coarse-grained FT (Spark)
  – Does not fit: graph computation makes fine-grained updates to each vertex

• State-of-the-art fault tolerance for graph computing: checkpointing
  – Used by Trinity, PowerGraph, Pregel, etc.

Page 6: Replication-based Fault-tolerance for Large-scale Graph Processing

How checkpoint works

[Figure: workers W1 and W2 load the graph, then write a checkpoint of the partition, topology, and vertex states to DFS at the global barrier of iteration X; when a machine crashes during iteration X+1, recovery reloads the lost state from the DFS checkpoint and execution resumes from iteration X.]
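As a rough illustration of this scheme (my sketch, with a hypothetical dfs handle rather than any real HDFS API), each worker writes its vertex states at the global barrier every few iterations and reloads them after a failure:

import pickle

CKPT_PERIOD = 4  # e.g. a checkpoint every 4 iterations, as in the case study later

def maybe_checkpoint(dfs, worker_id, iteration, vertex_states):
    # Executed at the global barrier: dump this worker's states to the DFS.
    if iteration % CKPT_PERIOD == 0:
        with dfs.open("/ckpt/%d/%d" % (iteration, worker_id), "wb") as f:
            pickle.dump(vertex_states, f)

def recover(dfs, worker_id, last_ckpt_iteration):
    # After a crash: reload the newest checkpoint and re-run from that iteration.
    with dfs.open("/ckpt/%d/%d" % (last_ckpt_iteration, worker_id), "rb") as f:
        return pickle.load(f)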

Page 7: Replication-based Fault-tolerance for Large-scale Graph Processing

Problems of checkpoint

• Large execution overhead
  – Large amount of state to write
  – Synchronization overhead

[Chart: execution time (sec) of PageRank on LiveJournal, broken down into compute, communication, synchronization, and checkpoint time, for checkpoint periods of none, 1, 2, and 4 iterations.]

Page 8: Replication-based Fault-tolerance for Large-scale Graph Processing

Problems of checkpoint

• Large overhead

• Slow recovery
  – A lot of I/O operations
  – Requires a standby node

[Chart: recovery time (seconds, 0–70) for checkpoint periods of 1, 2, and 3 iterations, compared with the average iteration time without checkpointing.]

Page 9: Replication-based Fault-tolerance for Large-scale Graph Processing

Observation and motivation

• Reuse existing replicas to provide fault tolerance

• Reusing existing replicas → small overhead

• Replicas distributed across different machines → fast recovery

[Chart: percentage of vertices without any replica for GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA (y-axis 0%–16%; labeled values include 0.84%, 0.96%, 0.26%, and 0.13%). Almost all the vertices have replicas.]

Page 10: Replication-based Fault-tolerance for Large-scale Graph Processing

Contribution

• Imitator: a replication-based fault-tolerance system for graph processing

• Small overhead
  – Less than 5% in all cases

• Fast recovery
  – Up to 17.5× faster than checkpoint-based recovery

Page 11: Replication-based Fault-tolerance for Large-scale Graph Processing

Outline

• Execution flow

• Replication management

• Recovery

• Evaluation

• Conclusion

Page 12: Replication-based Fault-tolerance for Large-scale Graph Processing

Normal execution flow

[Figure: the normal flow on each worker: LoadGraph, then per iteration Compute → SendMsg → EnterBarrier → Commit → LeaveBarrier.]

1. Adding FT support
2. Extending the normal synchronization messages (see the sketch below)
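A minimal sketch (my assumption of the idea, not Imitator's actual interface) of point 2: the commit-time messages that already synchronize replicas with their masters are extended to carry whatever extra state fault tolerance needs, so no separate checkpoint is written.

def commit_phase(worker, updates):
    # `worker.masters`, `worker.replica_locations`, `worker.activated` and
    # `worker.send` are assumed helpers for illustration.
    for vertex, new_value in updates.items():
        worker.masters[vertex] = new_value
        for machine in worker.replica_locations[vertex]:
            # Normal replica synchronization, extended with the activation
            # bit a replica would need to stand in for a lost master.
            worker.send(machine, ("sync", vertex, new_value,
                                  worker.activated.get(vertex, False)))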

Page 13: Replication-based Fault-tolerance for Large-scale Graph Processing

Failure before barrier

[Figure: a worker crashes before entering the global barrier. The surviving worker rolls back the current iteration, a newbie machine joins, and recovery rebuilds the crashed worker's state before execution continues.]

Page 14: Replication-based Fault-tolerance for Large-scale Graph Processing

Failure during barrier

[Figure: a worker crashes during the global barrier (at leaveBarrier). A newbie machine boots and recovery reconstructs the crashed worker's state before execution continues; unlike the previous case, no rollback is shown.]

Page 15: Replication-based Fault-tolerance for Large-scale Graph Processing

Management of replication

• Fault-tolerance replicas
  – Every vertex has at least f replicas to tolerate f failures

• Full-state replica (mirror)
  – An existing replica lacks meta information
  – Such as the locations of the other replicas

[Figure: a graph partitioned across Node1, Node2, and Node3; each vertex has a master on one node and replicas on the others. Example metadata for vertex 5: master on n2, replicas on n1 and n3.]
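A minimal sketch (my assumption) of the per-vertex metadata a full-state replica (mirror) would carry; the field names are illustrative and match the vertex-5 example above.

from dataclasses import dataclass, field

@dataclass
class VertexReplicationInfo:
    vertex_id: int
    master_node: str                                    # e.g. "n2"
    replica_nodes: list = field(default_factory=list)   # e.g. ["n1", "n3"]
    mirror_node: str = ""                                # the replica holding full state

# The example from the figure: vertex 5 with its master on n2 and replicas
# on n1 and n3 (the mirror location is chosen arbitrarily here).
v5 = VertexReplicationInfo(5, "n2", ["n1", "n3"], mirror_node="n1")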

Page 16: Replication-based Fault-tolerance for Large-scale Graph Processing

Optimization: selfish vertices

• The states of selfish vertices have no consumers
• Their states are decided only by their neighbors
• Optimization: recover their states by re-computation (see the sketch below)
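A minimal sketch (my assumption) of the optimization: since no other vertex reads a selfish vertex's state, a lost selfish vertex can be rebuilt by re-running its vertex program over its neighbors' recovered states instead of keeping an extra replica.

def recompute_selfish_vertex(graph, v):
    # Re-run the (PageRank-style) update once from the in-neighbors' states;
    # graph.rank, graph.weight, graph.in_neighbors are assumed helpers.
    total = sum(graph.rank[j] * graph.weight(j, v) for j in graph.in_neighbors(v))
    graph.rank[v] = 0.15 + total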

Page 17: Replication-based Fault-tolerance for Large-scale Graph Processing

How to recover

• Challenges
  – Parallel recovery
  – Consistent state after recovery

Page 18: Replication-based Fault-tolerance for Large-scale Graph Processing

Problems of recovery

[Figure: the three-node partitioning from the replication-management slide; Node3 crashes, losing the masters, mirrors, and replicas it hosted.]

• Which vertices have crashed?
• How to recover without a central coordinator?

Rules:
1. The master recovers its replicas
2. If the master crashed, the mirror recovers the master and the replicas

Replication location example: vertex 3 has its master on n3 and its mirror on n2 (see the sketch below)
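A minimal sketch (my assumption, reusing the illustrative VertexReplicationInfo fields from the replication-management sketch) of how every surviving node can apply the two rules locally, with no central coordinator:

def recovery_tasks(local_vertices, here, crashed_node):
    # Yield (vertex, role) pairs this node is responsible for regenerating.
    for v in local_vertices:
        if v.master_node == crashed_node and v.mirror_node == here:
            # Rule 2: the master crashed, so the mirror recovers the master
            # (and, from it, any replicas lost on the crashed node).
            yield v, "master"
        elif v.master_node == here and crashed_node in v.replica_nodes:
            # Rule 1: the master is alive and recovers its crashed replica.
            yield v, "replica"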

Page 19: Replication-based Fault-tolerance for Large-scale Graph Processing

Rebirth

[Figure: Rebirth recovery. Node3 crashes and a newbie machine takes its place. Following the rules above (the master recovers its replicas; if the master crashed, its mirror recovers the master and the replicas), the surviving nodes reconstruct the crashed node's masters and replicas on the newbie.]

Page 20: Replication-based Fault-tolerance for Large-scale Graph Processing

Problems of Rebirth

• Requires a standby machine
• Recovery targets a single newbie machine

Alternative: migrate the tasks to the surviving machines

Page 21: Replication-based Fault-tolerance for Large-scale Graph Processing

Migration

[Figure: Migration recovery. Node3 crashes; instead of using a newbie machine, its masters and replicas are redistributed to the surviving Node1 and Node2.]

Procedure:
1. Mirrors upgrade to masters and broadcast the change
2. Reload the missing graph structure and reconstruct
(see the sketch below)
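A minimal sketch (my assumption, with a hypothetical cluster API) of the two-step procedure:

def migrate_after_crash(local_vertices, here, crashed_node, cluster):
    # Step 1: every mirror of a crashed master upgrades itself to master
    # and broadcasts the new location to all surviving nodes.
    for v in local_vertices:
        if v.master_node == crashed_node and v.mirror_node == here:
            v.master_node = here
            cluster.broadcast(("new_master", v.vertex_id, here))
    # Step 2: reload the graph structure that only the crashed node held
    # (e.g. from the original input file) and reconstruct the lost replicas.
    cluster.reload_missing_edges(crashed_node)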

Page 22: Replication-based Fault-tolerance for Large-scale Graph Processing

Inconsistency after recovery

[Figure: example of inconsistency on workers W1 and W2. Master 2 on W2 has its Activated flag flipped from false to true by an incoming activation, while replica 2 on W1 still holds Activated = false and an older Rank (0.1 vs. 0.2); recovering vertex 2 from the replica would therefore lose the activation.]

Page 23: Replication-based Fault-tolerance for Large-scale Graph Processing

Replay Activation

[Figure: Replay Activation. Each master additionally records an ActNgbs flag noting that it activated its neighbors: master 1 on W1 (Rank 0.2, ActNgbs = true) replays this activation after recovery, restoring Activated = true on the recovered master 2 on W2.]
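A minimal sketch (my assumption) of Replay Activation: every master keeps an ActNgbs flag recording that it activated its neighbors, and after recovery the surviving masters re-send those activations so recovered vertices regain the Activated flags lost in the crash.

def replay_activations(local_masters, graph, send):
    # `local_masters` carry an illustrative act_ngbs flag; `send` delivers a
    # message to the worker that owns the target vertex.
    for v in local_masters:
        if v.act_ngbs:
            for nbr in graph.out_neighbors(v.vertex_id):
                send(nbr, "activate")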

Page 24: Replication-based Fault-tolerance for Large-scale Graph Processing

Evaluation

• 50 VMs (10 GB memory, 4 cores each)

• HDFS (3 replicas)

• Applications and datasets:

  Application   Graph      Vertices   Edges
  PageRank      GWeb       0.87M      5.11M
  PageRank      LJournal   4.85M      70.0M
  PageRank      Wiki       5.72M      130.1M
  ALS           SYN-GL     0.11M      2.7M
  CD            DBLP       0.32M      1.05M
  SSSP          RoadCA     1.97M      5.53M

Page 25: Replication-based Fault-tolerance for Large-scale Graph Processing

Speedup over Hama

• Imitator is based on Hama, an open-source clone of Pregel
  – Replication for dynamic computing [Distributed GraphLab, VLDB'12]

• Evaluated systems
  – Baseline: Imitator without fault tolerance
  – REP: Baseline + replication-based FT
  – CKPT: Baseline + checkpoint-based FT

[Chart: speedup over Hama (0–4×) for GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA.]

Page 26: Replication-based Fault-tolerance for Large-scale Graph Processing

Normal execution overhead

Replication has negligible execution overhead

Page 27: Replication-based Fault-tolerance for Large-scale Graph Processing

Communication Overhead

Page 28: Replication-based Fault-tolerance for Large-scale Graph Processing

Performance of recovery

[Chart: recovery execution time (seconds, axis 0–20) of CKPT, Rebirth, and Migration on GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA; two off-scale bars are labeled 41 and 56 seconds.]

Page 29: Replication-based Fault-tolerance for Large-scale Graph Processing

Recovery Scalability

[Chart: recovery time (seconds, 0–60) of Rebirth and Migration as the number of machines increases from 10 to 50.]

The more machines, the faster the recovery

Page 30: Replication-based Fault-tolerance for Large-scale Graph Processing

Simultaneous failure

[Chart: recovery time (seconds, 0–35) under one, two, and three simultaneous failures.]

[Chart: normal execution overhead (95%–111% of baseline) on GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA when configured to tolerate one, two, or three simultaneous failures.]

Add more replicas to tolerate the simultaneous failure of more than one machine.

Page 31: Replication-based Fault-tolerance for Large-scale Graph Processing

Case study

– Application: PageRank on the LiveJournal dataset
– A checkpoint every 4 iterations
– A failure is injected between the 6th and the 7th iteration

[Chart: finished iterations (0–20) versus execution time (0–160 seconds) for BASE, CKPT/4, REP, CKPT/4 + 1 failure, Rebirth + 1 failure, and Migration + 1 failure; annotations mark failure detection and replay, with recovery pauses labeled roughly 45, 8.8, and 2.6 seconds.]

Page 32: Replication-based Fault-tolerance for Large-scale Graph Processing

Conclusion

• Imitator: a graph engine that supports fault tolerance

• Imitator’s execution overhead is negligible because it leverages existing replicas

• Imitator’s recovery is fast because of its parallel recovery approach

Page 33: Replication-based Fault-tolerance for Large-scale Graph Processing

Backup

Page 34: Replication-based Fault-tolerance for Large-scale Graph Processing

Memory Consumption

Page 35: Replication-based Fault-tolerance for Large-scale Graph Processing

Partition Impact