
Page 1: Big Learning with Graph Computation

Joseph Gonzalez (jegonzal@cs.cmu.edu)

Download the talk: http://tinyurl.com/7tojdmw
http://www.cs.cmu.edu/~jegonzal/talks/biglearning_with_graphs.pptx

Page 2:

Big Data Already Happened

• 48 Hours of Video Uploaded Every Minute
• 750 Million Facebook Users
• 6 Billion Flickr Photos
• 1 Billion Tweets Per Week

Page 3:

How do we understand and use Big Data?

Big Learning

Page 4:

Big Learning Today: Regression

• Pros:
– Easy to understand/predictable
– Easy to train in parallel
– Supports Feature Engineering
– Versatile: classification, ranking, density estimation

Philosophy of Big Data and Simple Models

Page 5:

"Invariably, simple models and a lot of data trump more elaborate models based on less data."

Alon Halevy, Peter Norvig, and Fernando Pereira, Google
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/35179.pdf


Page 7:

Why not build elaborate models with lots of data?

Difficult, Computationally Intensive

Page 8:

Big Learning Today: Simple Models

• Pros:
– Easy to understand/predictable
– Easy to train in parallel
– Supports Feature Engineering
– Versatile: classification, ranking, density estimation

• Cons:
– Favors bias in the presence of Big Data
– Strong independence assumptions

Page 9:

Social Network

[Figure: a social network connecting Shopper 1, interested in Cameras, with Shopper 2, interested in Cooking]

Page 10:

Big Data exposes the opportunity for structured machine learning

Page 11:

Examples

Page 12:

Label Propagation

• Social Arithmetic:
I Like: 50% what I list on my profile + 40% what Sue Ann likes + 10% what Carlos likes
– My profile: 50% Cameras, 50% Biking
– Sue Ann: 80% Cameras, 20% Biking
– Carlos: 30% Cameras, 70% Biking
– Result: 60% Cameras, 40% Biking

• Recurrence Algorithm: iterate until convergence
• Parallelism: compute all Likes[i] in parallel

http://www.cs.cmu.edu/~zhuxj/pub/CMU-CALD-02-107.pdf
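The recurrence above can be sketched in a few lines of Python. This is our own illustrative sketch, not code from the talk: the function name and the dictionary-based graph encoding are assumptions, while the profiles and weights are the toy values from the slide.

```python
# Illustrative sketch of the label-propagation recurrence (names assumed):
# Likes[i] = self_weight * profile[i] + sum over neighbors j of w_ij * Likes[j]

def propagate(profiles, weights, iters=1):
    """profiles: {person: {label: prob}}; weights: {person: [(neighbor, w)]}.
    The self-weight is implied as 1 minus the sum of neighbor weights."""
    likes = {v: dict(p) for v, p in profiles.items()}
    for _ in range(iters):
        new = {}
        for v, nbrs in weights.items():
            self_w = 1.0 - sum(w for _, w in nbrs)
            mix = {l: self_w * p for l, p in profiles[v].items()}
            for u, w in nbrs:
                for l, p in likes[u].items():
                    mix[l] = mix.get(l, 0.0) + w * p
            new[v] = mix
        likes = new
    return likes

# The slide's example: 50% my profile + 40% Sue Ann + 10% Carlos.
profiles = {
    "me":      {"cameras": 0.5, "biking": 0.5},
    "sue_ann": {"cameras": 0.8, "biking": 0.2},
    "carlos":  {"cameras": 0.3, "biking": 0.7},
}
weights = {"me": [("sue_ann", 0.4), ("carlos", 0.1)], "sue_ann": [], "carlos": []}
result = propagate(profiles, weights)  # result["me"]: 60% cameras, 40% biking
```

Each Likes[i] in the inner loop reads only the previous iteration's values, which is exactly the parallelism the slide points out.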

Page 13:

PageRank (Centrality Measures)

• Iterate: R[i] = α + (1 - α) · Σ_{j links to i} R[j] / L[j]
• Where:
– α is the random reset probability
– L[j] is the number of links on page j

[Figure: toy six-page link graph]

http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
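The iteration on this slide can be sketched as a plain power iteration in Python. This is a hedged illustration with an assumed toy graph, not the talk's code; the unnormalized reset term matches the 0.15 + 0.85 · sum form that appears in the Giraph and GraphLab examples later in the talk.

```python
# Power-iteration sketch of R[i] = alpha + (1 - alpha) * sum_{j -> i} R[j] / L[j],
# where alpha is the reset probability and L[j] is the out-degree of page j.

def pagerank(links, alpha=0.15, iters=100):
    """links: {page: [pages it links to]}; returns {page: R[page]}."""
    rank = {p: 1.0 for p in links}
    for _ in range(iters):
        new = {p: alpha for p in links}
        for j, outs in links.items():
            share = (1 - alpha) * rank[j] / len(outs)  # split rank over L[j] links
            for i in outs:
                new[i] += share
        rank = new
    return rank

# Assumed toy six-page graph (the slide's picture shows pages 1-6).
links = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [6], 6: [4]}
ranks = pagerank(links)
```

With this reset convention the ranks sum to the number of pages rather than to 1; dividing by len(links) recovers a probability distribution.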

Page 14:

Matrix Factorization: Alternating Least Squares (ALS)

[Figure: the Netflix ratings matrix (Users × Movies) approximated as the product of User Factors (U) and Movie Factors (M); equivalently a bipartite graph with user vertices u1, u2, movie vertices m1, m2, m3, and observed-rating edges r11, r12, r23, r24]

Update Function computes: the least-squares solution for a user factor ui given the factors mj of the movies that user rated, and symmetrically for movie factors.

http://dl.acm.org/citation.cfm?id=1424269
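The alternating update can be illustrated with a rank-1 toy version in pure Python. This is a sketch under our own assumptions: real ALS uses k-dimensional factors and solves a small regularized linear system per user/movie vertex, whereas here the factors are scalars, so each solve collapses to one division. The data and names are invented for illustration.

```python
# Rank-1 ALS sketch (illustrative): holding movie factors fixed, each user
# factor has the closed form u_i = sum_j r_ij*m_j / (lambda + sum_j m_j^2),
# and symmetrically for movie factors.

def als_rank1(ratings, n_users, n_movies, lam=0.1, iters=20):
    """ratings: {(user, movie): r}. Returns scalar factor lists (u, m)."""
    u = [1.0] * n_users
    m = [1.0] * n_movies
    for _ in range(iters):
        for i in range(n_users):   # update users given movies
            obs = [(j, r) for (ui, j), r in ratings.items() if ui == i]
            u[i] = sum(r * m[j] for j, r in obs) / (lam + sum(m[j] ** 2 for j, _ in obs))
        for j in range(n_movies):  # update movies given users
            obs = [(i, r) for (i, mj), r in ratings.items() if mj == j]
            m[j] = sum(r * u[i] for i, r in obs) / (lam + sum(u[i] ** 2 for i, _ in obs))
    return u, m

# Tiny assumed rating graph: 2 users, 3 movies, 4 observed rating edges.
ratings = {(0, 0): 4.0, (0, 1): 2.0, (1, 1): 3.0, (1, 2): 5.0}
u, m = als_rank1(ratings, n_users=2, n_movies=3)
pred_r11 = u[0] * m[0]  # reconstructed rating for the (user 0, movie 0) edge
```

Each user update reads only the factors of the movies that user rated, which is why ALS maps naturally onto the bipartite rating graph.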

Page 15:

Other Examples

• Statistical Inference in Relational Models
– Belief Propagation
– Gibbs Sampling
• Network Analysis
– Centrality Measures
– Triangle Counting
• Natural Language Processing
– CoEM
– Topic Modeling

Page 16:

Graph Parallel Algorithms

• Dependency Graph (e.g., my interests depend on my friends' interests)
• Iterative Computation via Local Updates

Page 17:

What is the right tool for Graph-Parallel ML?

• Data-Parallel: Feature Extraction, Cross Validation, Computing Sufficient Statistics (Map Reduce)
• Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso (Map Reduce?)

Page 18:

Why not use Map-Reduce for Graph-Parallel algorithms?

Page 19:

Data Dependencies are Difficult

• Difficult to express dependent data in MR
– Substantial data transformations
– User-managed graph structure
– Costly data replication

[Figure: MapReduce assumes independent data records; graph data does not fit that mold]

Page 20:

Iterative Computation is Difficult

• System is not optimized for iteration:

[Figure: each MapReduce iteration shuffles the data across CPUs again, paying a startup penalty and a disk penalty on every iteration]

Page 21:

Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

• Data-Parallel: Feature Extraction, Cross Validation, Computing Sufficient Statistics (Map Reduce)
• Graph-Parallel: Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso (Map Reduce? MPI/Pthreads?)

Page 22:

We could use…

Threads, Locks, & Messages: "low-level parallel primitives"

Page 23:

Threads, Locks, and Messages

• ML experts repeatedly solve the same parallel design challenges:
– Implement and debug a complex parallel system
– Tune for a specific parallel platform
– Six months later the conference paper contains: "We implemented ______ in parallel."

• The resulting code:
– is difficult to maintain
– is difficult to extend
– couples the learning model to the parallel implementation

[Cartoon: graduate students]

Page 24:

Addressing Graph-Parallel ML

• We need alternatives to Map-Reduce

• Data-Parallel: Feature Extraction, Cross Validation, Computing Sufficient Statistics (Map Reduce)
• Graph-Parallel: Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso (MPI/Pthreads, Pregel (BSP))

Page 25:

Pregel: Bulk Synchronous Parallel

[Figure: compute and communicate phases separated by a barrier]

http://dl.acm.org/citation.cfm?id=1807184

Page 26:

Open Source Implementations

• Giraph: http://incubator.apache.org/giraph/
• Golden Orb: http://goldenorbos.org/

An asynchronous variant:
• GraphLab: http://graphlab.org/

Page 27:

PageRank in Giraph (Pregel)

http://incubator.apache.org/giraph/

public void compute(Iterator<DoubleWritable> msgIterator) {
  // Sum PageRank over incoming messages
  double sum = 0;
  while (msgIterator.hasNext())
    sum += msgIterator.next().get();
  DoubleWritable vertexValue = new DoubleWritable(0.15 + 0.85 * sum);
  setVertexValue(vertexValue);
  if (getSuperstep() < getConf().getInt(MAX_STEPS, -1)) {
    long edges = getOutEdgeMap().size();
    sentMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));
  } else {
    voteToHalt();
  }
}

Page 28:

Tradeoffs of the BSP Model

• Pros:
– Graph Parallel
– Relatively easy to implement and reason about
– Deterministic execution

Page 29:

Embarrassingly Parallel Phases

[Figure: compute and communicate phases separated by a barrier]

http://dl.acm.org/citation.cfm?id=1807184

Page 30:

Tradeoffs of the BSP Model

• Pros:
– Graph Parallel
– Relatively easy to build
– Deterministic execution

• Cons:
– Doesn't exploit the graph structure
– Can lead to inefficient systems

Page 31:

Curse of the Slow Job

[Figure: in BSP, every iteration waits at a barrier for the slowest CPU before the next iteration can begin]

http://www.www2011india.com/proceeding/proceedings/p607.pdf

Page 32:

Tradeoffs of the BSP Model

• Pros:
– Graph Parallel
– Relatively easy to build
– Deterministic execution

• Cons:
– Doesn't exploit the graph structure
– Can lead to inefficient systems
– Can lead to inefficient computation

Page 33:

Example: Loopy Belief Propagation (Loopy BP)

• Iteratively estimate the "beliefs" about vertices
– Read in messages
– Update the marginal estimate (belief)
– Send updated out-messages
• Repeat for all variables until convergence

http://www.merl.com/papers/docs/TR2001-22.pdf

Page 34:

Bulk Synchronous Loopy BP

• Often considered embarrassingly parallel
– Associate a processor with each vertex
– Receive all messages
– Update all beliefs
– Send all messages

• Proposed by:
– Brunton et al. CRV'06
– Mendiburu et al. GECC'07
– Kang et al. LDMTA'10
– …

Page 35:

Sequential Computational Structure

Page 36:

Hidden Sequential Structure

Page 37:

Hidden Sequential Structure

• Running Time: (time for a single parallel iteration) × (number of iterations)

[Figure: chain graph with evidence at both ends]

Page 38:

Optimal Sequential Algorithm

• Bulk Synchronous: running time 2n²/p for p ≤ 2n processors
• Forward-Backward (optimal sequential): running time 2n with p = 1
• Optimal Parallel: running time n with p = 2

[Figure: running-time chart showing the gap between Bulk Synchronous and the optimal algorithms]

Page 39:

The Splash Operation

• Generalize the optimal chain algorithm to arbitrary cyclic graphs:
1) Grow a BFS spanning tree with fixed size
2) Forward pass computing all messages at each vertex
3) Backward pass computing all messages at each vertex

http://www.select.cs.cmu.edu/publications/paperdir/aistats2009-gonzalez-low-guestrin.pdf
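Step 1 of the Splash can be sketched as a bounded BFS in Python. This is our own rough sketch with assumed names and an assumed toy graph; the message computations of the forward and backward passes are elided.

```python
from collections import deque

# Grow a fixed-size BFS spanning tree around a root (Splash step 1, sketched).
def grow_splash_tree(graph, root, max_size):
    """graph: {v: [neighbors]}. Returns tree vertices in BFS order."""
    order, seen = [root], {root}
    frontier = deque([root])
    while frontier and len(order) < max_size:
        v = frontier.popleft()
        for u in graph[v]:
            if u not in seen and len(order) < max_size:
                seen.add(u)
                order.append(u)
                frontier.append(u)
    return order

# Assumed toy cyclic graph (a 2x3 grid).
graph = {1: [2, 4], 2: [1, 3, 5], 3: [2, 6], 4: [1, 5], 5: [2, 4, 6], 6: [3, 5]}
bfs = grow_splash_tree(graph, root=1, max_size=4)
forward = list(reversed(bfs))  # step 2: leaves-to-root message pass order
backward = bfs                 # step 3: root-to-leaves message pass order
```

Bounding the tree size is what keeps a single Splash cheap enough to schedule adaptively across the graph.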

Page 40:

BSP is Provably Inefficient

• Limitations of the bulk synchronous model can lead to provably inefficient parallel algorithms.

[Figure: runtime in seconds vs. number of CPUs (1-8); Bulk Synchronous (Pregel) remains far slower than Asynchronous Splash BP, and the gap persists as CPUs are added]

Page 41:

Tradeoffs of the BSP Model

• Pros:
– Graph Parallel
– Relatively easy to build
– Deterministic execution

• Cons:
– Doesn't exploit the graph structure
– Can lead to inefficient systems
– Can lead to inefficient computation
– Can lead to invalid computation

Page 42:

The Problem with Bulk Synchronous Gibbs Sampling

• Adjacent variables cannot be sampled simultaneously.

[Figure: two variables with strong positive correlation; sequential execution preserves the correlation, while synchronous parallel execution (t = 0, 1, 2, 3) flips it into a strong negative correlation]

Page 43:

The Need for a New Abstraction

• If not Pregel, then what?

• Data-Parallel: Feature Extraction, Cross Validation, Computing Sufficient Statistics (Map Reduce)
• Graph-Parallel: Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso (Pregel (Giraph)?)

Page 44:

GraphLab Addresses the Limitations of the BSP Model

• Use graph structure
– Automatically manage the movement of data
• Focus on Asynchrony
– Computation runs as resources become available
– Use the most recent information
• Support Adaptive/Intelligent Scheduling
– Focus computation where it is needed
• Preserve Serializability
– Provide the illusion of a sequential execution
– Eliminate "race conditions"

Page 45:

What is GraphLab?

http://graphlab.org/ (check out Version 2)

Page 46:

The GraphLab Framework

• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Page 47:

Data Graph

A graph with arbitrary data (C++ objects) associated with each vertex and edge.

• Graph: Social Network
• Vertex Data: user profile text; current interest estimates
• Edge Data: similarity weights

Page 48:

Implementing the Data Graph

• All data and structure is stored in memory
– Supports the fast random lookup needed for dynamic computation
• Multicore Setting:
– Challenge: fast lookup, low overhead
– Solution: dense data structures
• Distributed Setting:
– Challenge: graph partitioning
– Solutions: ParMETIS and random placement

Page 49:

New Perspective on Partitioning

• Natural graphs have poor edge separators
– Classic graph partitioning tools (e.g., ParMETIS, Zoltan, …) fail
– [Figure: an edge cut between CPU 1 and CPU 2 must synchronize many edges]
• Natural graphs have good vertex separators
– [Figure: a vertex cut must synchronize only a single vertex]

Page 50:

Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

pagerank(i, scope) {
  // Get neighborhood data (R[i], W[j,i], R[j]) from scope

  // Update the vertex data:
  // R[i] = alpha + (1 - alpha) * sum over j in N(i) of W[j,i] * R[j]

  // Reschedule neighbors if needed
  if R[i] changes then reschedule_neighbors_of(i);
}

Page 51:

PageRank in GraphLab (V2)

struct pagerank : public iupdate_functor<graph, pagerank> {
  void operator()(icontext_type& context) {
    // Sum weighted ranks over incoming edges
    double sum = 0;
    foreach(edge_type edge, context.in_edges())
      sum += 1.0 / context.num_out_edges(edge.source()) *
             context.vertex_data(edge.source());
    double& rank = context.vertex_data();
    double old_rank = rank;
    rank = RESET_PROB + (1 - RESET_PROB) * sum;
    // Reschedule neighbors only if the rank changed appreciably
    double residual = abs(rank - old_rank);
    if (residual > EPSILON)
      context.reschedule_out_neighbors(pagerank());
  }
};

Page 52:

Dynamic Computation

[Figure: converged regions vs. slowly converging regions; the scheduler focuses effort on the slowly converging parts]

Page 53:

The Scheduler

The scheduler determines the order in which vertices are updated.

[Figure: CPU 1 and CPU 2 pull vertices (a, b, c, e, f, h, i, j, …) from a shared scheduler queue over the graph; updates may push neighbors back into the queue]

The process repeats until the scheduler is empty.

Page 54:

Choosing a Schedule

GraphLab provides several different schedulers:
• Round Robin: vertices are updated in a fixed order
• FIFO: vertices are updated in the order they are added
• Priority: vertices are updated in priority order

The choice of schedule affects the correctness and parallel performance of the algorithm.

Obtain different algorithms by simply changing a flag!
--scheduler=sweep --scheduler=fifo --scheduler=priority

Page 55:

The GraphLab Framework

• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Page 56:

Ensuring Race-Free Execution

How much can computation overlap?

Page 57:

GraphLab Ensures Sequential Consistency

For each parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: a parallel execution on CPU 1 and CPU 2 matched by an equivalent single-CPU sequential execution over time]

Page 58:

Consistency Rules

Guaranteed sequential consistency for all update functions

[Figure: overlapping update scopes on the data graph]

Page 59:

Full Consistency

Page 60:

Obtaining More Parallelism

Page 61:

Edge Consistency

[Figure: CPU 1 and CPU 2 update vertices that do not share an edge; concurrent reads of shared neighbors remain safe]

Page 62:

Consistency Through Scheduling

• Edge Consistency Model: two vertices can be updated simultaneously if they do not share an edge.
• Graph Coloring: two vertices can be assigned the same color if they do not share an edge.

[Figure: the color classes become synchronous phases (Phase 1, Phase 2, Phase 3) separated by barriers; vertices within a phase execute in parallel]
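The coloring idea above can be sketched with a simple greedy colorer in Python. This is illustrative only (the greedy order and the toy graph are our assumptions; GraphLab's actual coloring engine may differ), but it shows how color classes turn into barrier-separated parallel phases.

```python
# Greedy graph coloring: vertices with the same color share no edge, so each
# color class can be updated in parallel between barriers (one phase per color).

def greedy_color(graph):
    """graph: {v: [neighbors]}. Returns {v: color index}."""
    color = {}
    for v in graph:  # any vertex order yields a proper (if not minimal) coloring
        taken = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color

# Assumed toy 4-cycle: a proper 2-coloring exists, giving two parallel phases.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
colors = greedy_color(graph)
phases = [[v for v in graph if colors[v] == c] for c in sorted(set(colors.values()))]
```

Executing the phases in sequence, with a barrier between colors, gives exactly the edge-consistency guarantee without per-vertex locking.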

Page 63:

Consistency Through R/W Locks

• Read/write locks:
– Full Consistency: acquire write locks on the center vertex and all adjacent vertices
– Edge Consistency: acquire a write lock on the center vertex and read locks on adjacent vertices
– Acquire locks in a canonical ordering to avoid deadlock

Page 64:

The GraphLab Framework

• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Page 65:

Algorithms implemented on GraphLab:

Bayesian Tensor Factorization, Dynamic Block Gibbs Sampling, Matrix Factorization, Alternating Least Squares, Splash Sampler, Lasso, SVM, Belief Propagation, PageRank, CoEM, K-Means, SVD, LDA, Linear Solvers, …many others…

Page 66:

Startups Using GraphLab

Companies experimenting (or downloading) with GraphLab

Academic projects exploring (or downloading) GraphLab

Page 67:

GraphLab vs. Pregel (BSP)

PageRank (25M Vertices, 355M Edges)

[Figure: log-scale histogram of the number of vertices vs. number of updates; 51% of vertices were updated only once]

Page 68:

CoEM (Rosie Jones, 2005)

Named Entity Recognition Task: Is "Dog" an animal? Is "Catalina" a place?

[Figure: bipartite graph linking noun phrases (the dog, Australia, Catalina Island) to contexts (<X> ran quickly, travelled to <X>, <X> is pleasant)]

Vertices: 2 Million; Edges: 200 Million
Hadoop: 95 Cores, 7.5 hrs

Page 69:

CoEM (Rosie Jones, 2005)

[Figure: speedup vs. number of CPUs (up to 16); GraphLab CoEM tracks close to optimal]

Hadoop: 95 Cores, 7.5 hrs
GraphLab: 16 Cores, 30 min
15x Faster! 6x fewer CPUs!

Page 70:

CoEM (Rosie Jones, 2005)

[Figure: speedup vs. number of CPUs for the small and large problems; GraphLab approaches optimal]

Hadoop: 95 Cores, 7.5 hrs
GraphLab: 16 Cores, 30 min
GraphLab in the Cloud: 32 EC2 machines, 80 secs (0.3% of the Hadoop time)

Page 71:

The Cost of the Wrong Abstraction

[Figure: runtime comparison, log scale!]

Page 72:

Tradeoffs of GraphLab

• Pros:
– Separates algorithm from movement of data
– Permits dynamic asynchronous scheduling
– More expressive consistency model
– Faster and more efficient runtime performance

• Cons:
– Non-deterministic execution
– Defining "residual" can be tricky
– Substantially more complicated to implement

Page 73:

Scalability and Fault-Tolerance in Graph Computation

Page 74:

Scalability

• MapReduce: Data Parallel
– Map-heavy jobs scale VERY WELL
– Reduce places some pressure on the network
– Typically disk/computation bound
– Favors: horizontal scaling (i.e., big clusters)

• Pregel/GraphLab: Graph Parallel
– Iterative communication can be network intensive
– Network latency/throughput become the bottleneck
– Favors: vertical scaling (i.e., faster networks and stronger machines)

Page 75:

Cost-Time Tradeoff

[Figure: video co-segmentation results, cost vs. time; a few machines help a lot, then higher cost brings diminishing returns]

Page 76:

Video Cosegmentation

• Segments that mean the same
• Model: 10.5 million nodes, 31 million edges
• Gaussian EM clustering + BP on a 3D grid
• Video version of [Batra]

Page 77:

Video Coseg. Strong Scaling

[Figure: GraphLab speedup vs. ideal]

Page 78:

Video Coseg. Weak Scaling

[Figure: GraphLab scaling vs. ideal]

Page 79:

Fault Tolerance

Page 80:

Rely on Checkpoints

• Pregel (BSP): synchronous checkpoint construction, inserted at the barrier between the compute and communicate phases
• GraphLab: asynchronous checkpoint construction

Page 81:

Checkpoint Interval

• Tradeoff:
– Short Ti: checkpoints become too costly
– Long Ti: failures become too costly

[Figure: timeline with checkpoints of length Ts taken every interval Ti; after a machine failure, all work since the last checkpoint must be re-computed]

Page 82:

Optimal Checkpoint Intervals

• Construct a first-order approximation:

Ti ≈ sqrt(2 · Ts · Tmtbf)

where Ti is the checkpoint interval, Ts the length of a checkpoint, and Tmtbf the mean time between failures.

• Example: 64 machines with a per-machine MTBF of 1 year
– Tmtbf = 1 year / 64 ≈ 130 hours
– Ts = 4 minutes
– Ti ≈ 4 hours

From: http://dl.acm.org/citation.cfm?id=361115
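The first-order approximation can be checked numerically in a few lines of Python (our own sketch: the formula Ti ≈ sqrt(2 · Ts · Tmtbf) is the cited first-order result, and the numbers are the slide's example):

```python
import math

# First-order optimal checkpoint interval (the result cited on the slide):
# Ti ≈ sqrt(2 * Ts * Tmtbf), with Ts the checkpoint length and Tmtbf the
# mean time between failures, all in the same units.
def optimal_checkpoint_interval(t_checkpoint_h, t_mtbf_h):
    return math.sqrt(2.0 * t_checkpoint_h * t_mtbf_h)

t_mtbf = 365.0 * 24.0 / 64.0  # 64 machines, 1-year per-machine MTBF: ~137 hours
t_s = 4.0 / 60.0              # 4-minute checkpoint, in hours
t_i = optimal_checkpoint_interval(t_s, t_mtbf)  # roughly 4 hours, as on the slide
```

The square root captures the tradeoff on the previous slide: halving the checkpoint cost only shrinks the optimal interval by a factor of sqrt(2).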

Page 83:

Open Challenges

Page 84:

Dynamically Changing Graphs

• Example: Social Networks
– New users → new vertices
– New friends → new edges
• How do you adaptively maintain computation?
– Trigger computation with changes in the graph
– Update "interest estimates" only where needed
– Exploit asynchrony
– Preserve consistency

Page 85:

Graph Partitioning

• How can you quickly place a large data-graph in a distributed environment?
• Edge separators fail on large power-law graphs
– Social networks, recommender systems, NLP
• Constructing vertex separators at scale:
– No large-scale tools!
– How can you adapt the placement in changing graphs?

Page 86:

Graph Simplification for Computation

• Can you construct a "sub-graph" that can be used as a proxy for graph computation?
• See the paper: "Filtering: a method for solving graph problems in MapReduce."
http://research.google.com/pubs/pub37240.html

Page 87:

Concluding BIG Ideas

• Modeling Trend: Independent Data → Dependent Data
– Extract more signal from noisy structured data
• Graphs model data dependencies
– Capture locality and communication patterns
• Data-Parallel tools are not well suited to Graph-Parallel problems
• Compared several Graph-Parallel tools:
– Pregel / BSP Models:
• Easy to build, deterministic
• Suffer from several key inefficiencies
– GraphLab:
• Fast, efficient, and expressive
• Introduces non-determinism
• Scaling and Fault Tolerance:
– Network bottlenecks and optimal checkpoint intervals
• Open Challenges: enormous industrial interest

Page 88: