Data Parallel and Graph Parallel Systems for Large-scale Data Processing
Presenter: Kun Li
Threads, Locks, and Messages
• ML experts repeatedly solve the same parallel design challenges:
– Implement and debug a complex parallel system
– Tune for a specific parallel platform
– Two months later the conference paper contains: "We implemented ______ in parallel."
• The resulting code:
– is difficult to maintain
– is difficult to extend
– couples the learning model to the parallel implementation
... a better answer: Map-Reduce / Hadoop
• Build learning algorithms on top of high-level parallel abstractions
Motivation
• Large-scale data processing:
– Want to use 1000s of CPUs
– But don't want the hassle of managing things
• MapReduce provides:
– Automatic parallelization & distribution
– Fault tolerance
– I/O scheduling
– Monitoring & status updates
Map/Reduce
• map(key, val) is run on each item in the set
– emits new-key / new-val pairs
• reduce(key, vals) is run for each unique key emitted by map()
– emits final output

Example: count word occurrences in docs
map(key=url, val=contents):
  For each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
  Sum all "1"s in values list
  Emit result "(word, sum)"

Input: "see bob throw", "see spot run"
After map: see 1, bob 1, throw 1, see 1, spot 1, run 1
After reduce: bob 1, run 1, see 2, spot 1, throw 1
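The word-count flow above can be sketched as a toy, single-machine stand-in for MapReduce (the names `run_mapreduce`, `map_fn`, and `reduce_fn` are illustrative, not a Hadoop API):

```python
from collections import defaultdict

def map_fn(key, contents):
    # map(key=url, val=contents): emit (w, 1) for each word w
    for w in contents.split():
        yield (w, 1)

def reduce_fn(key, values):
    # reduce(key=word, values=uniq_counts): sum all the 1s
    return (key, sum(values))

def run_mapreduce(docs, map_fn, reduce_fn):
    # Shuffle: group all map output by key before reducing
    groups = defaultdict(list)
    for key, val in docs.items():
        for k, v in map_fn(key, val):
            groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = {"doc1": "see bob throw", "doc2": "see spot run"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'see': 2, 'bob': 1, 'throw': 1, 'spot': 1, 'run': 1}
```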
Grep
– Input consists of (url+offset, single line)
– map(key=url+offset, val=line):
  • If contents match the regexp, emit (line, "1")
– reduce(key=line, values=uniq_counts):
  • Don't do anything; just emit the line
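A single-machine sketch of the grep job under the same toy setup (the pattern and input lines are made up for illustration):

```python
import re
from collections import defaultdict

def map_fn(key, line, pattern=r"spot"):
    # map(key=url+offset, val=line): emit the line if it matches
    if re.search(pattern, line):
        yield (line, 1)

def reduce_fn(key, values):
    # Don't do anything; just emit the line
    return key

lines = {("url", 0): "see bob throw", ("url", 14): "see spot run"}
groups = defaultdict(list)
for key, line in lines.items():
    for k, v in map_fn(key, line):
        groups[k].append(v)
matches = [reduce_fn(k, vs) for k, vs in groups.items()]
print(matches)  # ['see spot run']
```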
Reverse Web-Link Graph
• Map
– For each URL linking to target, output <target, source> pairs
• Reduce
– Concatenate the list of all source URLs
– Output: <target, list(source)> pairs
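A sketch of the reverse web-link graph job in the same toy style (page names are illustrative):

```python
from collections import defaultdict

def map_fn(source, outlinks):
    # For each URL linking to a target, output a <target, source> pair
    for target in outlinks:
        yield (target, source)

def reduce_fn(target, sources):
    # Concatenate the list of all source URLs for this target
    return (target, sorted(sources))

pages = {"a.com": ["c.com"], "b.com": ["c.com", "a.com"]}
groups = defaultdict(list)
for src, links in pages.items():
    for tgt, s in map_fn(src, links):
        groups[tgt].append(s)
reverse = dict(reduce_fn(t, ss) for t, ss in groups.items())
print(reverse)  # {'c.com': ['a.com', 'b.com'], 'a.com': ['b.com']}
```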
Job Processing
[Figure: a JobTracker coordinating TaskTracker 0–5 on a "grep" job]
1. Client submits "grep" job, indicating code and input files.
2. JobTracker breaks the input file into k chunks (in this case 6) and assigns work to TaskTrackers.
3. After map(), TaskTrackers exchange map output to build the reduce() keyspace.
4. JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work.
5. reduce() output may go to NDFS.
Execution
Parallel Execution
Refinement: Locality Optimization
• Master scheduling policy:
– Asks GFS for the locations of replicas of input file blocks
– Map tasks are scheduled so that a GFS input block replica is on the same machine or same rack
• Effect:
– Thousands of machines read input at local disk speed
– Without this, rack switches limit the read rate
• Combiner:
– Useful for saving network bandwidth
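The combiner's bandwidth saving shows up even in a toy word-count mapper: pre-aggregating locally shrinks what crosses the network (function names are illustrative):

```python
from collections import Counter

def map_without_combiner(url, contents):
    # One (word, 1) pair per occurrence is shuffled over the network
    for w in contents.split():
        yield (w, 1)

def map_with_combiner(url, contents):
    # Combiner: pre-aggregate locally, one (word, n) pair per distinct word
    for w, n in Counter(contents.split()).items():
        yield (w, n)

doc = "see see see bob"
plain = list(map_without_combiner("u", doc))    # 4 pairs shuffled
combined = list(map_with_combiner("u", doc))    # 2 pairs shuffled
```

Both mappers produce the same reduce() result; only the shuffle volume differs.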
Map-Reduce for Data-Parallel ML
• Excellent for large data-parallel tasks!
• Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
• Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
• Is there more to Machine Learning?
Properties of Graph Parallel Algorithms
• Dependency Graph: "What I Like" depends on "What My Friends Like"
• Factored Computation
• Iterative Computation
Why not use Map-Reduce for Graph Parallel Algorithms?
Data Dependencies
• Map-Reduce does not efficiently express dependent data:
– User must code substantial data transformations
– Costly data replication
[Figure: independent data rows; a slow processor]

Iterative Algorithms
• Map-Reduce does not efficiently express iterative algorithms:
[Figure: three MapReduce iterations; in each, CPU 1–3 process all Data blocks, with a barrier after every iteration]
MapAbuse: Iterative MapReduce
• Only a subset of data needs computation:
[Figure: the same three iterations; all Data blocks are reprocessed on CPU 1–3 each round, with barriers, even though only a subset changed]
MapAbuse: Iterative MapReduce
• System is not optimized for iteration:
[Figure: the same three iterations; each round incurs a startup penalty and a disk penalty in addition to the barrier]
Map-Reduce for Data-Parallel ML
• Excellent for large data-parallel tasks: Feature Extraction, Cross Validation, Computing Sufficient Statistics (Map Reduce)
• Graph-Parallel: Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso. Map Reduce? GraphLab!
The GraphLab Framework
• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
• Graph: social network
• Vertex data: user profile text, current interest estimates
• Edge data: similarity weights
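A minimal Python sketch of such a data graph (GraphLab itself stores C++ objects; the class and method names here are illustrative):

```python
class DataGraph:
    """Arbitrary data attached to each vertex and edge of a graph."""

    def __init__(self):
        self.vdata = {}   # vid -> vertex data (e.g. profile, interests)
        self.edata = {}   # (src, dst) -> edge data (e.g. similarity)
        self.adj = {}     # vid -> list of neighbor vids

    def add_vertex(self, vid, data):
        self.vdata[vid] = data
        self.adj.setdefault(vid, [])

    def add_edge(self, u, v, data):
        self.edata[(u, v)] = data
        self.adj[u].append(v)
        self.adj[v].append(u)

    def neighbors(self, vid):
        return self.adj[vid]

g = DataGraph()
g.add_vertex(1, {"profile": "likes cats", "interests": [0.5, 0.5]})
g.add_vertex(2, {"profile": "likes dogs", "interests": [0.5, 0.5]})
g.add_edge(1, 2, {"similarity": 0.8})
```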
Implementing the Data Graph (Multicore Setting)
• In memory; relatively straightforward:
– vertex_data(vid) → data
– edge_data(vid, vid) → data
– neighbors(vid) → vid_list
• Challenge: fast lookup, low overhead
• Solution: dense data structures, fixed Vdata & Edata types, immutable graph structure
Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of that vertex.

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], W[i,j], Likes[j]) ← scope;
  // Update the vertex data
  Likes[i] ← Σ_j W[i,j] × Likes[j];
  // Reschedule neighbors if needed
  if Likes[i] changes then
    reschedule_neighbors_of(i);
}
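A hedged Python sketch of a label-propagation update function (GraphLab's real API is C++; `graph`, `likes`, `weights`, and `scheduler` are illustrative stand-ins for the vertex scope):

```python
def label_prop(i, graph, likes, weights, scheduler, tol=1e-3):
    # Get neighborhood data from the scope of vertex i
    old = likes[i]
    # Update the vertex data: weighted average of neighbors' estimates
    total = sum(weights[frozenset((i, j))] for j in graph[i])
    likes[i] = sum(weights[frozenset((i, j))] * likes[j]
                   for j in graph[i]) / total
    # Reschedule neighbors if Likes[i] changed
    if abs(likes[i] - old) > tol:
        scheduler.extend(graph[i])

graph = {1: [2], 2: [1]}
weights = {frozenset((1, 2)): 1.0}
likes = {1: 0.0, 2: 1.0}
scheduler = []
label_prop(1, graph, likes, weights, scheduler)
print(likes[1], scheduler)  # 1.0 [2]
```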
The Scheduler
The scheduler determines the order in which vertices are updated.
[Figure: CPU 1 and CPU 2 pull vertices (a, b, c, …) from a shared scheduler queue; updates may push their neighbors back into the queue]
The process repeats until the scheduler is empty.
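The scheduler loop can be sketched as a sequential work queue (illustrative names; a toy max-propagation update stands in for a real ML update function):

```python
from collections import deque

def run(graph, data, update_fn, initial):
    """Apply update_fn to scheduled vertices until the scheduler is empty."""
    scheduler = deque(initial)
    scheduled = set(initial)          # avoid duplicate queue entries
    while scheduler:
        v = scheduler.popleft()
        scheduled.discard(v)
        for u in update_fn(v, graph, data):   # vertices to reschedule
            if u not in scheduled:
                scheduler.append(u)
                scheduled.add(u)

def propagate_max(v, graph, data):
    # Toy update: push v's value to smaller neighbors; reschedule them
    changed = []
    for u in graph[v]:
        if data[v] > data[u]:
            data[u] = data[v]
            changed.append(u)
    return changed

graph = {1: [2], 2: [1, 3], 3: [2]}
data = {1: 5, 2: 0, 3: 0}
run(graph, data, propagate_max, [1])
print(data)  # {1: 5, 2: 5, 3: 5}
```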
Ensuring Race-Free Code
• How much can computation overlap?

GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: a parallel schedule on CPU 1 and CPU 2 is equivalent to some single-CPU sequential schedule over time]
Consistency Rules
Guaranteed sequential consistency for all update functions.
[Figure: overlapping scopes of update functions on the data graph]

Full Consistency
[Figure: scopes of concurrently executing update functions may not overlap]

Obtaining More Parallelism

Edge Consistency
[Figure: CPU 1 and CPU 2 safely run in parallel; each update writes only its own vertex and adjacent edges, and reads adjacent vertices]
Consistency Through R/W Locks
• Read/Write locks:
– Full Consistency: write-lock the center vertex and all adjacent vertices (Write, Write, Write)
– Edge Consistency: write-lock the center vertex, read-lock adjacent vertices (Read, Write, Read)
• Canonical lock ordering
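A hedged sketch of lock-based edge consistency with canonical lock ordering. Python's stdlib has no reader/writer lock, so a plain `Lock` stands in for both read and write modes, and all names are illustrative:

```python
import threading

def edge_consistent_update(i, graph, locks, update_fn):
    # Lock the scope of vertex i in canonical (sorted id) order, so two
    # updates with overlapping scopes can never deadlock each other.
    scope = sorted([i] + list(graph[i]))
    for v in scope:
        locks[v].acquire()   # write-lock i, read-lock its neighbors
    try:
        update_fn(i)
    finally:
        for v in reversed(scope):
            locks[v].release()

graph = {1: [2], 2: [1]}
locks = {v: threading.Lock() for v in graph}
data = {1: 0, 2: 0}
edge_consistent_update(1, graph, locks, lambda i: data.update({i: 7}))
print(data)  # {1: 7, 2: 0}
```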
Consistency Through Scheduling
• Edge Consistency Model: two vertices can be updated simultaneously if they do not share an edge.
• Graph Coloring: two vertices can be assigned the same color if they do not share an edge.
[Figure: execute all vertices of one color in each phase (Phase 1, Phase 2, Phase 3), with a barrier between phases]
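A hedged sketch of this chromatic scheduling: greedily color the graph, then update one color class per phase (the greedy coloring and all names are illustrative; real systems would also run each phase in parallel):

```python
def greedy_color(graph):
    # Two vertices get the same color only if they share no edge
    color = {}
    for v in sorted(graph):
        used = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def chromatic_schedule(graph, update_fn):
    color = greedy_color(graph)
    phases = {}
    for v, c in color.items():
        phases.setdefault(c, []).append(v)
    for c in sorted(phases):
        # Vertices within one phase share no edge, so they could all be
        # updated simultaneously under the edge consistency model.
        for v in phases[c]:
            update_fn(v)
        # (barrier between phases goes here)

graph = {1: [2, 3], 2: [1], 3: [1]}
order = []
chromatic_schedule(graph, order.append)
print(order)  # [1, 2, 3]
```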