
Distributed Graph-Parallel Computation on Natural Graphs. The Team: Joseph Gonzalez, Yucheng Low, Aapo Kyrola, Danny Bickson, Joe Hellerstein, Alex Smola, Haijie Gu, Carlos Guestrin

Upload: magdalen-hamilton, posted on 24-Dec-2015


TRANSCRIPT

Page 1:

Distributed Graph-Parallel Computation on Natural Graphs

The Team: Joseph Gonzalez, Yucheng Low, Aapo Kyrola, Danny Bickson, Joe Hellerstein, Alex Smola, Haijie Gu, Carlos Guestrin

Page 2:

Big-Learning

How will we design and implement parallel learning systems?

Page 3:

The popular answer: Map-Reduce / Hadoop

Build learning algorithms on top of high-level parallel abstractions.

Page 4:

Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce):
– Feature Extraction
– Cross Validation
– Computing Sufficient Statistics

Graph-Parallel:
– Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
– Semi-Supervised Learning: Label Propagation, CoEM
– Graph Analysis: PageRank, Triangle Counting
– Collaborative Filtering: Tensor Factorization

Page 5:

Label Propagation

• Social Arithmetic. My predicted interests are a weighted blend of my profile and my friends' interests:

  I Like: 60% Cameras, 40% Biking
    = 50% × what I list on my profile (50% Cameras, 50% Biking)
    + 40% × what Sue Ann likes (80% Cameras, 20% Biking)
    + 10% × what Carlos likes (30% Cameras, 70% Biking)

• Recurrence Algorithm: Likes[i] = Σj wij × Likes[j]
  – iterate until convergence
• Parallelism: compute all Likes[i] in parallel

http://www.cs.cmu.edu/~zhuxj/pub/CMU-CALD-02-107.pdf
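The blend above can be checked with a small sketch (the `blend` helper is hypothetical, not from the talk):

```python
# Hypothetical sketch of the "social arithmetic" blend from the slide:
# Likes[i] = sum_j w_ij * Likes[j], where the self-weight covers the profile.

def blend(weights, likes):
    """Combine neighbor interest vectors with the given weights."""
    out = {}
    for w, dist in zip(weights, likes):
        for topic, p in dist.items():
            out[topic] = out.get(topic, 0.0) + w * p
    return out

profile = {"Cameras": 0.5, "Biking": 0.5}   # 50%: what I list on my profile
sue_ann = {"Cameras": 0.8, "Biking": 0.2}   # 40% weight
carlos  = {"Cameras": 0.3, "Biking": 0.7}   # 10% weight

me = blend([0.5, 0.4, 0.1], [profile, sue_ann, carlos])
# me is approximately 60% Cameras, 40% Biking, matching the slide
```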

Page 6:

Properties of Graph-Parallel Algorithms

• Dependency Graph: my interests depend on my friends' interests
• Iterative Computation with Local Updates
• Parallelism: run local updates simultaneously

Page 7:

Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce):
– Feature Extraction
– Cross Validation
– Computing Sufficient Statistics

Graph-Parallel (Map Reduce? No: a Graph-Parallel Abstraction):
– Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
– Semi-Supervised Learning: Label Propagation, CoEM
– Data-Mining: PageRank, Triangle Counting
– Collaborative Filtering: Tensor Factorization

Page 8:

Graph-Parallel Abstractions

• Vertex-Program associated with each vertex
• Graph constrains the interaction along edges
  – Pregel: programs interact through messages
  – GraphLab: programs can read each other's state

Page 9:

The Pregel Abstraction (a barrier separates the Compute and Communicate phases)

Pregel_LabelProp(i)
  // Read incoming messages
  msg_sum = sum(msg : in_messages)
  // Compute the new interests
  Likes[i] = f(msg_sum)
  // Send messages to neighbors
  for j in neighbors:
    send message(g(wij, Likes[i])) to j
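A minimal sketch of the superstep model this pseudocode assumes (names and API are illustrative, not Pregel's actual interface):

```python
# Minimal sketch (not Pregel's actual API) of one synchronous superstep:
# each vertex sums its incoming messages, applies f, and sends g(w_ij, value)
# along each out-edge; a barrier then separates this superstep from the next.

def superstep(values, out_edges, inbox, f, g):
    new_values, new_inbox = {}, {v: [] for v in values}
    for v in values:
        msg_sum = sum(inbox.get(v, []))        # read incoming messages
        new_values[v] = f(msg_sum)             # compute the new value
        for j, w in out_edges.get(v, []):      # send messages to neighbors
            new_inbox[j].append(g(w, new_values[v]))
    return new_values, new_inbox               # barrier: next superstep
```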

Page 10:

The GraphLab Abstraction

Vertex-Programs are executed asynchronously and directly read the neighboring vertex-program state.

GraphLab_LblProp(i, neighbors, Likes)
  // Compute sum over neighbors
  sum = 0
  for j in neighbors of i:
    sum += g(wij, Likes[j])
  // Update my interests
  Likes[i] = f(sum)
  // Activate neighbors if needed
  if Likes[i] changes then activate_neighbors()

Activated vertex-programs are executed eventually and can read the new state of their neighbors.

Page 11:

Never Ending Learner Project (CoEM)

[Figure: speedup vs. number of CPUs (1 to 16); GraphLab CoEM tracks the optimal linear speedup.]

Hadoop: 95 cores, 7.5 hrs
GraphLab: 16 cores, 30 min (15x faster, 6x fewer CPUs!)
Distributed GraphLab: 32 EC2 machines, 80 secs (0.3% of Hadoop time)

Page 12:

The Cost of the Wrong Abstraction

[Figure: runtime comparison on a log scale.]

Page 13:

Startups Using GraphLab

Companies experimenting (or downloading) with GraphLab

Academic projects exploring (or downloading) GraphLab

Page 14:

Why do we need GraphLab2?

Page 15:

Natural Graphs

[Image from WikiCommons]

Page 16:

Assumptions of Graph-Parallel Abstractions

Ideal Structure:
• Small neighborhoods – low degree vertices
• Vertices have similar degree
• Easy to partition

Natural Graph:
• Large neighborhoods – high degree vertices
• Power-law degree distribution
• Difficult to partition

Page 17:

Power-Law Structure

[Figure: log-log degree distribution with slope -α, α ≈ 2; the high-degree vertices form the tail.]

Top 1% of vertices are adjacent to 50% of the edges!
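As an illustrative check (synthetic sample, not the talk's graphs), drawing degrees from a heavy tail with exponent α ≈ 2 shows how concentrated the edge endpoints become:

```python
import random

# Illustrative only: sample 100k degrees with tail P(d) ~ d^-2
# (paretovariate(1.0) has density x^-2) and measure what share of
# edge endpoints the top 1% of vertices account for.
random.seed(0)
degrees = sorted((1 + int(random.paretovariate(1.0)) for _ in range(100_000)),
                 reverse=True)
share = sum(degrees[:1000]) / sum(degrees)   # top 1% of 100k vertices
# With a tail this heavy, the top 1% holds a large share of all endpoints.
```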

Page 18:

Challenges of High-Degree Vertices

• Sequential vertex-programs touch a large fraction of the graph (GraphLab)
• Produce many messages (Pregel)
• Edge information too large for a single machine
• Asynchronous consistency requires heavy locking (GraphLab)
• Synchronous consistency is prone to stragglers (Pregel)

Page 19:

Graph Partitioning

• Graph-parallel abstractions rely on partitioning:
  – Minimize communication
  – Balance computation and storage

[Figure: a graph cut across Machine 1 and Machine 2.]

Page 20:

Natural Graphs are Difficult to Partition

• Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]

• Popular graph-partitioning tools (Metis, Chaco,…) perform poorly [Abou-Rjeili et al. 06]– Extremely slow and require substantial memory

Page 21:

Random Partitioning

• Both GraphLab and Pregel proposed random (hashed) partitioning for natural graphs.

10 machines: 90% of edges cut
100 machines: 99% of edges cut!
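The 90% and 99% figures follow from a one-line calculation: with vertices hashed uniformly onto p machines, an edge survives only if both endpoints hash to the same machine, which happens with probability 1/p.

```python
def expected_edge_cut(p):
    """Expected fraction of edges cut under random (hashed) vertex placement
    over p machines: an edge is cut unless both endpoints collide (prob 1/p)."""
    return 1 - 1 / p

# expected_edge_cut(10) is about 0.9; expected_edge_cut(100) is about 0.99
```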

Page 22:

In Summary

GraphLab and Pregel are not well suited for natural graphs:

• Poor performance on high-degree vertices
• Low-quality partitioning

Page 23:

GraphLab2:

• Distribute a single vertex-program
  – Move computation to data
  – Parallelize high-degree vertices
• Vertex partitioning
  – Simple online heuristic to effectively partition large power-law graphs

Page 24:

Decompose Vertex-Programs

• Gather (Reduce), user defined: Gather(Y) produces partial accumulators over the vertex scope, combined with a parallel sum, Σ = Σ1 + Σ2 + …
• Apply, user defined: Apply(Y, Σ) → Y' applies the accumulated value to the center vertex.
• Scatter, user defined: Scatter(Y') updates adjacent edges and vertices.

Page 25:

Writing a GraphLab2 Vertex-Program

LabelProp_GraphLab2(i)
  Gather(Likes[i], wij, Likes[j]) :
    return g(wij, Likes[j])
  sum(a, b) : return a + b
  Apply(Likes[i], Σ) : Likes[i] = f(Σ)
  Scatter(Likes[i], wij, Likes[j]) :
    if (change in Likes[i] > ε) then activate(j)
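A minimal runnable rendering of this Gather/sum/Apply/Scatter program (Python sketch with an assumed driver loop; the real GraphLab2 engine schedules vertices very differently):

```python
# Assumed driver loop, not GraphLab2's API: gather contributions from
# in-neighbors, sum them, apply f, and activate neighbors on large changes.

def run_gas(likes, edges, f, g, eps=1e-3, max_iters=100):
    """edges: dict mapping vertex i -> list of (neighbor j, weight w_ij)."""
    active = set(likes)
    for _ in range(max_iters):
        if not active:
            break
        next_active = set()
        for i in sorted(active):
            # Gather + sum: accumulate over neighbors
            acc = sum(g(w, likes[j]) for j, w in edges.get(i, []))
            # Apply: compute the new value from the accumulator
            new = f(acc)
            # Scatter: activate neighbors if the change is large enough
            if abs(new - likes[i]) > eps:
                next_active.update(j for j, _ in edges.get(i, []))
            likes[i] = new
        active = next_active
    return likes
```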

Page 26:

Distributed Execution of a Factorized Vertex-Program

[Figure: partial accumulators Σ1 and Σ2 are computed on Machine 1 and Machine 2, then combined at the vertex.]

O(1) data transmitted over network

Page 27:

Cached Aggregation

• Repeated calls to gather waste computation: every neighbor is re-read even when only one value changed.
• Solution: cache the previous gather result Σ and update it incrementally; combining the cached Σ with a delta Δ from the changed neighbor (new value minus old value) yields the new accumulator Σ' without re-reading the unchanged neighbors.

Page 28:

Writing a GraphLab2 Vertex-Program

LabelProp_GraphLab2(i)
  Gather(Likes[i], wij, Likes[j]) :
    return g(wij, Likes[j])
  sum(a, b) : return a + b
  Apply(Likes[i], Σ) : Likes[i] = f(Σ)
  Scatter(Likes[i], wij, Likes[j]) :
    if (change in Likes[i] > ε) then activate(j)
    Post Δj = g(wij, Likes[i]_new) - g(wij, Likes[i]_old)

Reduces runtime of PageRank by 50%!
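The delta-posting idea can be sketched as a cached accumulator (structure assumed for illustration, not GraphLab2's implementation):

```python
# Assumed structure: each vertex keeps its last gathered sum; a scatter
# posts a delta instead of forcing a full regather of all neighbors.

class CachedAccumulator:
    def __init__(self):
        self.cache = {}          # vertex -> last gathered sum

    def full_gather(self, i, contributions):
        """Initial gather: sum every neighbor contribution."""
        self.cache[i] = sum(contributions)
        return self.cache[i]

    def post_delta(self, i, old_contrib, new_contrib):
        """Incremental update: apply delta = g(w, new) - g(w, old)."""
        self.cache[i] += new_contrib - old_contrib
        return self.cache[i]
```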

Page 29:

Execution Models

Synchronous and Asynchronous

Page 30:

Synchronous Execution

• Similar to Pregel
• For all active vertices: Gather, Apply, Scatter
• Activated vertices are run on the next iteration
• Fully deterministic
• Potentially slower convergence for some machine learning algorithms

Page 31:

Asynchronous Execution

• Similar to GraphLab
• Active vertices are processed asynchronously as resources become available
• Non-deterministic
• Optionally enable serial consistency

Page 32:

Preventing Overlapping Computation

• New distributed mutual exclusion protocol

[Figure: adjacent vertex-programs share a conflict edge and must not run concurrently.]

Page 33:

Multi-core Performance

[Figure: L1 error (log scale, 1e-2 to 1e8) vs. runtime in seconds for multicore PageRank (25M vertices, 355M edges), comparing GraphLab, Pregel (simulated), GraphLab2 Factorized, and GraphLab2 Factorized + Caching.]

Page 34:

Vertex-Cuts for Partitioning

Percolation theory suggests that power-law graphs can be split by removing only a small set of vertices. [Albert et al. 2000]

What about graph partitioning?

Page 35:

The GraphLab2 Abstraction Permits a New Approach to Partitioning

• Rather than cut edges (CPU 1 | CPU 2): must synchronize many edges
• we cut vertices (CPU 1 | CPU 2): must synchronize a single vertex

Theorem: For any edge-cut we can directly construct a vertex-cut which requires strictly less communication and storage.
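A toy count conveys the theorem's intuition (illustration only, not the proof): for a star of degree d with edges split evenly across two machines, the edge-cut must synchronize every straddling edge, while the vertex-cut synchronizes only replicas of the center vertex.

```python
# Toy illustration: a star graph of degree d, edges split evenly over
# `machines` machines, with the center vertex as the only shared vertex.

def edge_cut_sync(d, machines=2):
    """Edges whose endpoints straddle machines under an edge-cut."""
    return d - d // machines

def vertex_cut_sync(machines=2):
    """Replicas of the single spanning (center) vertex under a vertex-cut."""
    return machines - 1

# For d = 1000 over 2 machines: 500 straddling edges vs. 1 vertex replica.
```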

Page 36:

Constructing Vertex-Cuts

• Goal: parallel graph partitioning on ingress.
• Propose three simple approaches:
  – Random edge placement: edges are placed randomly by each machine
  – Greedy edge placement with coordination: edges are placed using a shared objective
  – Oblivious-greedy edge placement: edges are placed using a local objective

Page 37:

Random Vertex-Cuts

• Assign edges randomly to machines and allow vertices to span machines.

[Figure: a vertex Y spanning Machine 1 and Machine 2.]

Page 38:

Random Vertex-Cuts

• Assign edges randomly to machines and allow vertices to span machines.
• Expected number of machines spanned by a vertex v of degree d(v), over p machines:

  E[machines spanned by v] = p(1 - (1 - 1/p)^d(v))
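The expected-span formula can be evaluated directly (a sketch; p is the machine count and d the vertex degree):

```python
# Expected number of machines spanned by a degree-d vertex when its d edges
# are placed uniformly at random over p machines: each machine is missed by
# one edge with probability (1 - 1/p), so E[span] = p * (1 - (1 - 1/p)**d).

def expected_span(p, d):
    return p * (1 - (1 - 1 / p) ** d)

# expected_span(100, 1) is about 1; for very high degree it approaches p.
```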

Page 39:

Random Vertex-Cuts

• Assign edges randomly to machines and allow vertices to span machines.
• Expected number of machines spanned by a vertex:

[Figure: improvement over random edge-cuts (log scale, 1x to 1000x) vs. number of machines, for power-law exponents α = 1.65, 1.7, 1.8, and 2.]

Page 40:

Greedy Vertex-Cuts by Derandomization

• Place the next edge on the machine that minimizes the future expected cost, given the placement information for previous vertices.
• Greedy: edges are greedily placed using a shared placement history.
• Oblivious: edges are greedily placed using a local placement history.
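A simplified sketch of the greedy placement rule (the tie-breaking and load term are assumptions for illustration; the real heuristic is more careful):

```python
# Simplified greedy edge placement (assumed tie-breaking, not the exact
# system heuristic): spans[v] is the set of machines vertex v already spans.
# Prefer a machine both endpoints span, then one either spans, then the
# least-loaded machine; this bounds how many new replicas an edge creates.

def place_edge(u, v, spans, loads):
    both = spans[u] & spans[v]
    either = spans[u] | spans[v]
    candidates = both or either or set(range(len(loads)))
    m = min(candidates, key=lambda i: (loads[i], i))   # least-loaded, stable
    spans[u].add(m)
    spans[v].add(m)
    loads[m] += 1
    return m
```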

Page 41:

Greedy Placement

• Shared objective (requires communication between Machine 1 and Machine 2)

Page 42:

Oblivious Placement

• Local objectives: CPU 1 and CPU 2 each use their own local objective, with no coordination

Page 43:

Partitioning Performance

Twitter Graph: 41M vertices, 1.4B edges

[Figure: machines spanned per vertex and load-time (seconds) for random, oblivious, and greedy placement.]

Oblivious/Greedy balance partition quality and partitioning time.

Page 44:

32-Way Partitioning Quality

Graph        Vertices  Edges
Twitter      41M       1.4B
UK           133M      5.5B
Amazon       0.7M      5.2M
LiveJournal  5.4M      79M
Hollywood    2.2M      229M

[Figure: machines spanned per vertex on each graph.]

Oblivious: 2x improvement over random, + 20% load-time
Greedy: 3x improvement over random, + 100% load-time

Page 45:

System Evaluation

Page 46:

Implementation

• Implemented as a C++ API
• Asynchronous IO over TCP/IP
• Fault-tolerance is achieved by check-pointing
• Substantially simpler than the original GraphLab
  – Synchronous engine < 600 lines of code
• Evaluated on 64 EC2 HPC cc1.4xLarge instances

Page 47:

Comparison with GraphLab & Pregel

• PageRank on synthetic power-law graphs
  – Random edge and vertex cuts

[Figure: runtime and communication as graphs get denser; GraphLab2 is lowest in both panels.]

Page 48:

Benefits of a good Partitioning

Better partitioning has a significant impact on performance.

Page 49:

Performance: PageRank

Twitter Graph: 41M vertices, 1.4B edges

[Figure: runtime and communication under random, oblivious, and greedy partitioning.]

Page 50:

Matrix Factorization

• Matrix factorization of the Wikipedia dataset (11M vertices, 315M edges): a bipartite Docs × Words graph.

Consistency = Lower Throughput

Page 51:

Matrix Factorization

Consistency = Faster Convergence

[Figure: fully asynchronous vs. serially consistent execution.]

Page 52:

PageRank on AltaVista Webgraph

1.4B vertices, 6.7B edges

Pegasus: 1320 s on 800 cores
GraphLab2: 76 s on 512 cores

Page 53:

Conclusion

• Graph-parallel abstractions are an emerging tool for large-scale machine learning
• The challenges of natural graphs:
  – Power-law degree distribution
  – Difficult to partition
• GraphLab2:
  – Distributes single vertex-programs
  – New vertex partitioning heuristic to rapidly place large power-law graphs
  – Experimentally outperforms existing graph-parallel abstractions

Page 54:

Carnegie Mellon University

Official release in July. http://graphlab.org

[email protected]

Page 55:

Pregel Message Combiners

User-defined commutative, associative (+) message operation:

[Figure: messages bound for the same vertex are combined with a sum on the sending machine before crossing from Machine 1 to Machine 2.]
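A combiner of this shape can be sketched as follows (assumed structure, not Pregel's API):

```python
# Sketch of a Pregel-style sum combiner (assumed shape): messages bound for
# the same destination vertex are reduced on the sending machine, so only
# one partial sum per destination crosses the network.

def combine_outbox(outbox):
    """outbox: list of (dest_vertex, value) pairs produced on one machine."""
    combined = {}
    for dest, value in outbox:
        combined[dest] = combined.get(dest, 0) + value   # commutative (+)
    return combined
```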

Page 56:

Costly on High Fan-Out

Many identical messages are sent across the network to the same machine.

[Figure: Machine 1 sending duplicate messages to Machine 2.]

Page 57:

GraphLab Ghosts

Neighbors' values are cached locally and maintained by the system.

[Figure: Machine 1 holds ghost copies of Machine 2's vertices.]

Page 58:

Reduces Cost of High Fan-Out

A change to a high-degree vertex is communicated with a single message.

[Figure: Machine 1 and Machine 2.]

Page 59:

Increases Cost of High Fan-In

Changes to neighbors are synchronized individually and collected sequentially.

[Figure: Machine 1 and Machine 2.]

Page 60:

Comparison with GraphLab & Pregel

• PageRank on synthetic power-law graphs

[Figure: two panels, power-law fan-in and power-law fan-out, as graphs get denser; GraphLab2 shown in both.]

Page 61:

Straggler Effect

• PageRank on synthetic power-law graphs

[Figure: two panels, power-law fan-in and power-law fan-out, comparing GraphLab, Pregel (Piccolo), and GraphLab2 as graphs get denser.]

Page 62:

Cached Gather for PageRank

[Figure: initial accumulator computation time vs. subsequent incremental updates.]

Reduces runtime by ~50%.