parallelizing - epfllampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · friday, june...

65
Philipp HALLER | Heather MILLER Parallelizing Machine Learning- A FRAMEWORK and ABSTRACTIONS for Parallel Graph Processing Functionally Friday, June 3, 2011

Upload: others

Post on 14-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Philipp HALLER | Heather MILLER

ParallelizingMachine Learning-

A FRAMEWORK and ABSTRACTIONS for Parallel Graph Processing

Functionally

Friday, June 3, 2011

Page 2: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Data is growing.At the same time,

do with that data.there is a growing desire

toMORE

Friday, June 3, 2011

Page 3: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

As an example,MACHINE LEARNING (ML)

has provided elegant and sophisticated solutions to many complex problems on a small scale,

✗✗

Friday, June 3, 2011

Page 4: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

As an example,MACHINE LEARNING (ML)

has provided elegant and sophisticated solutions to many complex problems on a small scale,

✗✗

could open up NEW APPLICATIONS + NEW

AVENUES OF RESEARCH if ported to a larger scale

Friday, June 3, 2011

Page 5: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

As an example,MACHINE LEARNING (ML)

has provided elegant and sophisticated solutions to many complex problems on a small scale,

✗✗

but efforts are routinely limited by complexity and running time of algorithms.

✗✗SEQUENTIAL

Friday, June 3, 2011

Page 6: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

As an example,MACHINE LEARNING (ML)

has provided elegant and sophisticated solutions to many complex problems on a small scale,

✗✗

but efforts are routinely limited by complexity and running time of algorithms.

✗✗SEQUENTIAL

described as,a community full of “ENTRENCHED PROCEDURAL PROGRAMMERS”

typically focus on optimizing sequential algorithms when faced with scaling problems.

Friday, June 3, 2011

Page 7: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

As an example,MACHINE LEARNING (ML)

has provided elegant and sophisticated solutions to many complex problems on a small scale,

✗✗

but efforts are routinely limited by complexity and running time of algorithms.

✗✗SEQUENTIAL

described as,a community full of “ENTRENCHED PROCEDURAL PROGRAMMERS”

typically focus on optimizing sequential algorithms when faced with scaling problems.

need to make it easier to experiment with parallelism

Friday, June 3, 2011

Page 8: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

What about MapReduce?

Friday, June 3, 2011

Page 9: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

What about MapReduce?

MapReduce instances must be chained together in order to achieve iteration.

Not always straightforward.

Overhead is significant.

✗✗

✗✗

Poor support for iteration.

Even building non-cyclic pipelines is hard (e.g., FlumeJava, PLDI’10).

Communication, serialization (e.g., Phoenix, IISWC’09).

Friday, June 3, 2011

Page 10: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Menthor...

Friday, June 3, 2011

Page 11: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Menthor...

is a framework for parallel graph processing.✗✗(But it is not limited to graphs.)

Friday, June 3, 2011

Page 12: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Menthor...

is a framework for parallel graph processing.

is inspired by BSP.

✗✗

✗✗

(But it is not limited to graphs.)

With functional reduction/aggregation mechanisms.

Friday, June 3, 2011

Page 13: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Menthor...

is a framework for parallel graph processing.

is inspired by BSP.

✗✗

✗✗

(But it is not limited to graphs.)

With functional reduction/aggregation mechanisms.

avoids an inversion of control✗✗of other BSP-inspired graph-processing frameworks.

Friday, June 3, 2011

Page 14: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Menthor...

is a framework for parallel graph processing.

is inspired by BSP.

✗✗

✗✗

(But it is not limited to graphs.)

With functional reduction/aggregation mechanisms.

avoids an inversion of control✗✗of other BSP-inspired graph-processing frameworks.

is implemented in Scala,✗✗and there is a preliminary experimental evaluation.

Friday, June 3, 2011

Page 15: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Model of Computation.Menthor’s

Friday, June 3, 2011

Page 16: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Data.

Friday, June 3, 2011

Page 17: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Data.Split into data items managed by vertices.and sizes range from primitives to large matrices

Friday, June 3, 2011

Page 18: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Data.Split into data items managed by vertices.

Relationships expressed using edges between vertices.

Friday, June 3, 2011

Page 19: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Algorithms.

Friday, June 3, 2011

Page 20: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Algorithms.Data items stored inside of vertices iteratively updated.✗✗

Friday, June 3, 2011

Page 21: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Algorithms.Data items stored inside of vertices iteratively updated.Iterations happen as SYNCHRONIZED SUPERSTEPS.

✗✗

✗✗(inspired by the BSP model)

Friday, June 3, 2011

Page 22: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Algorithms.Data items stored inside of vertices iteratively updated.Iterations happen as SYNCHRONIZED SUPERSTEPS.

✗✗

✗✗

timeFriday, June 3, 2011

Page 23: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Algorithms.Data items stored inside of vertices iteratively updated.Iterations happen as SYNCHRONIZED SUPERSTEPS.

✗✗

✗✗

1.

def update

update each vertex in parallel.

def update

def update

def update

def update

def update

def update

def update

def update

timesuperstep #1

Friday, June 3, 2011

Page 24: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Algorithms.Data items stored inside of vertices iteratively updated.Iterations happen as SYNCHRONIZED SUPERSTEPS.

✗✗

✗✗

1.2.

update each vertex in parallel.

update produces outgoing messages to other vertices

timesuperstep #1

Friday, June 3, 2011

Page 25: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Algorithms.Data items stored inside of vertices iteratively updated.Iterations happen as SYNCHRONIZED SUPERSTEPS.

✗✗

✗✗

1.2.3.

update each vertex in parallel.

update produces outgoing messages to other verticesincoming messages available at the beginning of the next SUPERSTEP.

timesuperstep #2

Friday, June 3, 2011

Page 26: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Substeps. (and Messages)SUBSTEPS are computations that,

Friday, June 3, 2011

Page 27: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Substeps. (and Messages)SUBSTEPS are computations that,

1. update the value of this Vertex

Friday, June 3, 2011

Page 28: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Substeps. (and Messages)SUBSTEPS are computations that,

1. update the value of this Vertex

2. return a list of messages:case class Message[Data](source: Vertex[Data], dest: Vertex[Data], value: Data)

Friday, June 3, 2011

Page 29: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Substeps. (and Messages)SUBSTEPS are computations that,

1. update the value of this Vertex

2. return a list of messages:case class Message[Data](source: Vertex[Data], dest: Vertex[Data], value: Data)

EXAMPLES...{ value = ... List()}

Friday, June 3, 2011

Page 30: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Substeps. (and Messages)SUBSTEPS are computations that,

1. update the value of this Vertex

2. return a list of messages:case class Message[Data](source: Vertex[Data], dest: Vertex[Data], value: Data)

EXAMPLES...{ value = ... List()}

{  ...  for (nb <- neighbors)    yield Message(this, nb, value)}

Friday, June 3, 2011

Page 31: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Substeps. (and Messages)SUBSTEPS are computations that,

1. update the value of this Vertex

2. return a list of messages:case class Message[Data](source: Vertex[Data], dest: Vertex[Data], value: Data)

EXAMPLES...{ value = ... List()}

{  ...  for (nb <- neighbors)    yield Message(this, nb, value)}

Each is implicitly converted to a Substep[Data]

Friday, June 3, 2011

Page 32: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Some Examples...

Friday, June 3, 2011

Page 33: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

PageRank.

class PageRankVertex extends Vertex[Double](0.0d) {  def update() = {    var sum = incoming.foldLeft(0)(_ + _.value)    value = (0.15 / numVertices) + 0.85 * sum

    if (superstep < 30) {      for (nb <- neighbors) yield        Message(this, nb, value / neighbors.size)    } else      List()  }}

Friday, June 3, 2011

Page 34: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Another Example.

class PhasedVertex extends Vertex[MyData] {  var phase = 1

  def update() = {    if (phase == 1) {      ...      if (condition)        phase = 2    } else if (phase == 2) {      ...    }  }}

Friday, June 3, 2011

Page 35: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Another Example.

class PhasedVertex extends Vertex[MyData] {  var phase = 1

  def update() = {    if (phase == 1) {      ...      if (condition)        phase = 2    } else if (phase == 2) {      ...    }  }}

INVERSION OF CONTROL!!Thus, manual stack management...

Friday, June 3, 2011

Page 36: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Inverting the Inversion.

class PhasedVertex extends Vertex[MyData] {

def update() = { thenUntil(condition) { ... } then { ... } }}

Use high-level combinators to build expressions of type Substep[Data]

✗✗

Friday, June 3, 2011

Page 37: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Inverting the Inversion.

class PhasedVertex extends Vertex[MyData] {

def update() = { thenUntil(condition) { ... } then { ... } }}

Use high-level combinators to build expressions of type Substep[Data]

✗✗

Friday, June 3, 2011

Page 38: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Inverting the Inversion.

class PhasedVertex extends Vertex[MyData] {

def update() = { thenUntil(condition) { ... } then { ... } }}

Use high-level combinators to build expressions of type Substep[Data]

Thus avoiding manual stack management.

✗✗

✗✗

Friday, June 3, 2011

Page 39: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Reduction Combinators:crunch steps.

Friday, June 3, 2011

Page 40: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Reduction Combinators:crunch steps.

Reduction operations important.✗✗Replacement for shared data.Global decisions.

Friday, June 3, 2011

Page 41: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Reduction Combinators:crunch steps.

Reduction operations important.✗✗Replacement for shared data.Global decisions.

Provided as just another kind of Substep[Data]✗✗

Friday, June 3, 2011

Page 42: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Reduction Combinators:crunch steps.

Reduction operations important.✗✗Replacement for shared data.Global decisions.

Provided as just another kind of Substep[Data]✗✗

def update() = { then { value = ... } crunch ((v1: Double, v2: Double) => v1 + v2) then { incoming match { case List(reduced) => ... } } ...}

Friday, June 3, 2011

Page 43: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

ImplementationMenthor’s

Friday, June 3, 2011

Page 44: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Actors.Implementation based upon Actors.

} } } }

GRAPH

WORKERS

FOREMEN Central GRAPH instance is an actor, which manages a set of WORKER actors

Friday, June 3, 2011

Page 45: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Actors.Implementation based upon Actors.

} } } }

GRAPH

WORKERS

FOREMEN Central GRAPH instance is an actor, which manages a set of WORKER actors

Friday, June 3, 2011

Page 46: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Actors.Implementation based upon Actors.

} } } }

GRAPH

WORKERS

FOREMEN Central GRAPH instance is an actor, which manages a set of WORKER actors

GRAPH synchronizes workers using supersteps.

Friday, June 3, 2011

Page 47: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Actors.Implementation based upon Actors.

} } } }

GRAPH

WORKERS

FOREMEN

Each WORKER manages a partition of the graph’s vertices,

Deliver incoming messages that were sent in the previous superstep;

Select and execute update step on each vertex in its partition;

Forward outgoing messages generated by its vertices in the current superstep.

Friday, June 3, 2011

Page 48: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementing Reduction.} } } }

GRAPH

WORKERS

FOREMEN

Friday, June 3, 2011

Page 49: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementing Reduction.

1. WORKER reduces the values of all vertices in its partition.

} } } }

GRAPH

WORKERS

FOREMEN

reduced✔ reduced✔ reduced✔ reduced✔

Friday, June 3, 2011

Page 50: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementing Reduction.

1.2.

WORKER reduces the values of all vertices in its partition.

The result and the closure that was used to compute it is sent to the GRAPH actor, which computes the final reduced value.

} } } }

GRAPH

WORKERS

FOREMEN

Friday, June 3, 2011

Page 51: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementing Reduction.

1.2.

3.

WORKER reduces the values of all vertices in its partition.

The result and the closure that was used to compute it is sent to the GRAPH actor, which computes the final reduced value.

The final result is passed to all WORKERS which make it available to their vertices as incoming messages (at the beginning of the next superstep)

} } } }

GRAPH

WORKERS

FOREMEN

Friday, June 3, 2011

Page 52: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementation Principles.

Friday, June 3, 2011

Page 53: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementation Principles.A pure Scala library✗✗

No staging and code generation.No dependency on language virtualization.

Friday, June 3, 2011

Page 54: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementation Principles.A pure Scala library✗✗

No staging and code generation.No dependency on language virtualization.

Benefits✗✗Compatible with mainline Scala compiler.Fast compilation.Simple debugging and troubleshooting.Framework developer-friendly.

Friday, June 3, 2011

Page 55: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Implementation Principles.A pure Scala library✗✗

No staging and code generation.No dependency on language virtualization.

Benefits✗✗Compatible with mainline Scala compiler.Fast compilation.Simple debugging and troubleshooting.Framework developer-friendly.

Drawbacks✗✗No aggressive optimizations.No support for heterogeneous hardware platforms.

Friday, June 3, 2011

Page 56: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

GOOGLE’S PREGEL

CONTROLInverted

REQUIRES STAGING

OPTIMIZATIONSAggressive

Related Work.

GRAPHLABSPARK

No graph supportNon-determinism

SIGNAL/COLLECT OPTIMLMAIN INSPIRATION

Graphs/BSP

ASYNC EXECUTIONNon-determinism

DEBUGGINGNot optimal, yet

Designed for IterationCluster support

Friday, June 3, 2011

Page 57: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

GOOGLE’S PREGEL

CONTROLInverted

REQUIRES STAGING

OPTIMIZATIONSAggressive

Related Work.

GRAPHLABSPARK

No graph supportNon-determinism

Be sure to see their talk!

SIGNAL/COLLECT OPTIMLMAIN INSPIRATION

Graphs/BSP

ASYNC EXECUTIONNon-determinism

DEBUGGINGNot optimal, yet

Designed for IterationCluster support

Friday, June 3, 2011

Page 58: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

GOOGLE’S PREGEL

CONTROLInverted

REQUIRES STAGING

OPTIMIZATIONSAggressive

Related Work.

GRAPHLABSPARK

No graph supportNon-determinism

Be sure to see their talk!

(Many more discussed in the paper.)

SIGNAL/COLLECT OPTIMLMAIN INSPIRATION

Graphs/BSP

ASYNC EXECUTIONNon-determinism

DEBUGGINGNot optimal, yet

Designed for IterationCluster support

Friday, June 3, 2011

Page 59: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Conclusions

Friday, June 3, 2011

Page 60: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Can avoid inversion of control in vertex-based BSP using closures.

✗✗

Conclusions

Friday, June 3, 2011

Page 61: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Can avoid inversion of control in vertex-based BSP using closures.

✗✗

Conclusions

Higher-order functions useful for reductions, in an imperative model.

✗✗

Friday, June 3, 2011

Page 62: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Can avoid inversion of control in vertex-based BSP using closures.

✗✗

Conclusions

Higher-order functions useful for reductions, in an imperative model.

Explicit parallelism feasible if computational model simple (cf. MapReduce)

✗✗

✗✗

Friday, June 3, 2011

Page 63: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Can avoid inversion of control in vertex-based BSP using closures.

✗✗

Conclusions

Higher-order functions useful for reductions, in an imperative model.

Explicit parallelism feasible if computational model simple (cf. MapReduce)

The puzzle pieces are there to make analyzing bigger data easier.

✗✗

✗✗

✗✗

Friday, June 3, 2011

Page 64: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Questions?

Can avoid inversion of control in vertex-based BSP using closures.

✗✗

Conclusions

Higher-order functions useful for reductions, in an imperative model.

Explicit parallelism feasible if computational model simple (cf. MapReduce)

The puzzle pieces are there to make analyzing bigger data easier.

✗✗

✗✗

✗✗

http://lamp.epfl.ch/~phaller/menthor/

Friday, June 3, 2011

Page 65: Parallelizing - EPFLlampphaller/doc/scaladays2011-presentation.pdf · 2013. 5. 22. · Friday, June 3, 2011. Data is growing. At the same time, do with that data. there is a growing

Experimental Results.

Applications✗✗PageRank on (subset of) WikipediaHierarchical clustering

Very preliminary results✗✗

Implementation details changingParallel collections (extensions)

Evaluating BSP-based model

Loopy belief propagation

Friday, June 3, 2011