Scala Parallel Collections

Aleksandar Prokopec, Tiark Rompf

Scala Team, EPFL

Introduction

• multi-core programming – not straightforward

• need better higher-order abstractions

• libraries and tools have only begun using these new capabilities

• collections are used everywhere

Scala Collection Framework

• most operations implemented in terms of an abstract method

def foreach[U](f: T => U): Unit

• new collections are created using builders

trait Builder[Elem, To]
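A minimal sketch of these two points, using a hypothetical trait `Trav` rather than the real library hierarchy: `foreach` is the only abstract method, and `count`, `forall` and `toList` are written once on top of it. `toList` uses the standard `List.newBuilder` as its builder. The concrete class `IntRange` is also hypothetical.

```scala
import scala.collection.mutable

// Hypothetical minimal collection trait: foreach is the only abstract method.
trait Trav[T] {
  def foreach[U](f: T => U): Unit

  // Derived operations, implemented once in terms of foreach.
  def count(p: T => Boolean): Int = {
    var n = 0
    foreach { x => if (p(x)) n += 1 }
    n
  }

  def forall(p: T => Boolean): Boolean = {
    var ok = true
    foreach { x => if (!p(x)) ok = false } // a real implementation would stop early
    ok
  }

  // New collections are produced through a builder.
  def toList: List[T] = {
    val b: mutable.Builder[T, List[T]] = List.newBuilder[T]
    foreach { x => b += x }
    b.result()
  }
}

// Hypothetical concrete collection: it only has to provide foreach.
final class IntRange(from: Int, until: Int) extends Trav[Int] {
  def foreach[U](f: Int => U): Unit = {
    var i = from
    while (i < until) { f(i); i += 1 }
  }
}
```

For example, `new IntRange(1, 8).count(_ % 2 == 0)` traverses 1 to 7 through `foreach` and returns 3.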

Example

• the filter method:

def filter(p: A => Boolean): Repr = {
  val b = newBuilder                 // builder for the result collection
  for (x <- this) if (p(x)) b += x   // traversal goes through foreach
  b.result()
}

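List(1, 2, 3, 4, 5, 6, 7).filter(_ % 2 == 0)   // List(2, 4, 6)

[Figure: the list 1 2 3 4 5 6 7 Nil is traversed in order and the even elements are added to a Builder, which produces 2 4 6]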

Parallel operations

• parallel traversal should be easy for some data structures

• could filter be parallelized by having a concurrent builder?

• 3 problems:
  – order may not be preserved anymore – what about sequences?
  – performance concerns
  – there are more complicated methods such as span

Method span

• assume an array (keep it simple): array.span(_ >= 0)

[Figure: the array 7 3 11 99 99 21 42 33 -1 19 22 63 -5 11 -2 -7 1 divided into prefixElems, the elements before the first negative number, and suffixElems, all remaining elements]

um... not a good idea
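For reference, `span` returns the longest prefix whose elements satisfy the predicate, together with the remaining elements. A minimal hand-written sequential version (the names `SpanSeq` and `spanSeq` are hypothetical) makes the dependency visible: where the suffix starts depends on the first counterexample, which can be anywhere in the array.

```scala
object SpanSeq {
  // Hypothetical sequential span: scan until the first element that fails p.
  def spanSeq(xs: Array[Int])(p: Int => Boolean): (Array[Int], Array[Int]) = {
    var i = 0
    while (i < xs.length && p(xs(i))) i += 1
    (xs.take(i), xs.drop(i)) // prefix before the first counterexample, and the rest
  }

  def main(args: Array[String]): Unit = {
    val array = Array(7, 3, 11, 99, 99, 21, 42, 33, -1, 19, 22, 63, -5, 11, -2, -7, 1)
    val (prefix, suffix) = spanSeq(array)(_ >= 0)
    println(prefix.mkString(" ")) // 7 3 11 99 99 21 42 33
    println(suffix.mkString(" ")) // -1 19 22 63 -5 11 -2 -7 1
  }
}
```

Note that the suffix also contains later non-negative elements such as 19 and 22: once the first counterexample is found, everything after it belongs to the suffix.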

Method reduce

• span seems inherently sequential

• we’ll get back to this; let’s try something simpler – reduce

def reduce[U >: T](op: (U, U) => U): U

• takes an associative operator and applies it between all the elements (examples: adding, concatenation)

[Figure: the words of "Tell your friends and family to use Scala." joined pairwise by concatenation]

Method reduce

• assume the associative operator is concatenation

val s = "Tell your friends and family to use Scala."

s.split(" ").reduce(_ + " " + _)   // split already returns an Array; put back the spaces it removed

Tell your friends and family to use Scala.

Method reduce

• we might have more processors

• this is a well known pattern from parallel programming

• but we need the right abstraction

[Figure: parallel reduction tree for addition: 1 2 3 4 5 6 7 8 are summed pairwise into 3 7 11 15, and so on up the tree]
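The pattern on this slide (1 2 3 4 5 6 7 8 summed pairwise into 3 7 11 15) is a tree reduction: neighbours are combined pairwise, halving the number of values per level, and associativity guarantees that this grouping gives the same result as a left-to-right reduce. A small sequential sketch that records every level (the object `TreeReduce` is hypothetical); in a parallel setting, all combinations within one level are independent of each other.

```scala
object TreeReduce {
  // Hypothetical helper: combine neighbours pairwise until one value remains,
  // returning every intermediate level (first level = the input).
  def levels[T](xs: Vector[T])(op: (T, T) => T): List[Vector[T]] = {
    require(xs.nonEmpty, "reduce of empty collection")
    if (xs.size == 1) List(xs)
    else {
      // groups have 2 elements, except possibly the last one, which is carried up
      val next = xs.grouped(2).map(_.reduce(op)).toVector
      xs :: levels(next)(op)
    }
  }
}
```

For 1 to 8 with `_ + _` the levels are 1 2 3 4 5 6 7 8, then 3 7 11 15, then 10 26, then 36.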

Method split

• we can implement methods such as reduce, foreach, count, find and forall assuming we can divide the collection

• new abstract operation

def split: Seq[Repr]

• returns a non-trivial partition of the collection

Method split

def split: Seq[Repr]

• how to implement?

  – copy elements
  – produce a wrapper
  – use data structure properties (e.g. tree)
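A minimal sketch of the "produce a wrapper" option together with a `reduce` written only in terms of `split`. Assumptions: the collection is array-backed, the wrapper `Slice` and the object `SplitReduce` are hypothetical, tasks run on the JDK's `ForkJoinPool` through `RecursiveTask`, and the sequential threshold of 1024 is arbitrary. Because `Slice` only stores index bounds, `split` is O(1) and copies nothing.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

// Hypothetical wrapper: a view of arr(from until until), no copying.
final case class Slice[T](arr: Array[T], from: Int, until: Int) {
  def size: Int = until - from

  // Non-trivial partition into two halves; O(1) because only indices change.
  def split: Seq[Slice[T]] = {
    val mid = from + size / 2
    Seq(Slice(arr, from, mid), Slice(arr, mid, until))
  }

  def seqReduce(op: (T, T) => T): T = {
    var acc = arr(from)
    var i = from + 1
    while (i < until) { acc = op(acc, arr(i)); i += 1 }
    acc
  }
}

object SplitReduce {
  val Threshold = 1024 // below this size, reduce sequentially (arbitrary choice)

  private final class ReduceTask[T](s: Slice[T], op: (T, T) => T) extends RecursiveTask[T] {
    def compute(): T =
      if (s.size <= Threshold) s.seqReduce(op)
      else {
        val tasks = s.split.map(part => new ReduceTask(part, op))
        tasks.tail.foreach(_.fork())       // run the other parts asynchronously
        val first = tasks.head.compute()   // work on the first part in this thread
        // join in left-to-right order: op must be associative, not commutative
        tasks.tail.map(_.join()).foldLeft(first)(op)
      }
  }

  def parReduce[T](arr: Array[T])(op: (T, T) => T): T = {
    require(arr.nonEmpty, "reduce of empty array")
    ForkJoinPool.commonPool().invoke(new ReduceTask(Slice(arr, 0, arr.length), op))
  }
}
```

Since partial results are combined in order, string concatenation works as well as addition.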

Method filter

• the split method is enough to implement accessor methods

• for transformer methods such as filter it is not sufficient – the partial results must also be merged

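Example, filter(_ % 2 == 0) over four partitions:

partitions:       1, 2, 3, 4   5, 6, 7, 8   3, 1, 8, 0   2, 2, 1, 9
filtered:         2, 4         6, 8         8, 0         2, 2
merged pairwise:  2, 4, 6, 8                8, 0, 2, 2
merged result:    2, 4, 6, 8, 8, 0, 2, 2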
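A sketch of this scheme, under the assumptions that each partition is filtered by its own task into its own builder (the standard `Vector.newBuilder`) and that the partial results are merged in partition order with plain `Vector` concatenation. `scala.concurrent.Future` stands in for the processors; the object `ParFilter` is hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParFilter {
  // Hypothetical parallel filter: one task and one builder per partition.
  def parFilter(xs: Vector[Int], parts: Int)(p: Int => Boolean): Vector[Int] = {
    val chunkSize = math.max(1, (xs.size + parts - 1) / parts)
    val partial: List[Future[Vector[Int]]] =
      xs.grouped(chunkSize).toList.map { chunk =>
        Future {
          val b = Vector.newBuilder[Int] // private builder, no shared state
          for (x <- chunk) if (p(x)) b += x
          b.result()
        }
      }
    // merge step: concatenate partial results in partition order
    val merged = Future.sequence(partial).map(_.foldLeft(Vector.empty[Int])(_ ++ _))
    Await.result(merged, Duration.Inf)
  }
}
```

With the sixteen numbers of the example split into four partitions, the result is 2, 4, 6, 8, 8, 0, 2, 2, in the original order.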

Method combine

• we need another abstraction

def combine[Other >: Repr](that: Other): Other

• creates a collection that contains all the elements of this collection and that collection

Method combine

def combine[Other >: Repr](that: Other): Other

• how to implement?
  – copy elements
  – use lazy evaluation to copy twice
  – use specialized data structures
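The first option, copying, in a minimal sketch with a hypothetical `CopyingCombiner` backed by the standard `ArrayBuffer`. It simplifies the signature above to a single element type: `combine` appends every element of `that`, so each merge costs time proportional to the size of its right operand, and an element can be copied again at several levels of the merge tree.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical combiner: a builder that can also be merged with another one.
final class CopyingCombiner[T] {
  private val buf = ArrayBuffer.empty[T]

  def +=(x: T): this.type = { buf += x; this }

  // Merge by copying: all of that's elements are appended to this buffer.
  def combine(that: CopyingCombiner[T]): CopyingCombiner[T] = {
    require(that ne this, "cannot combine a combiner with itself")
    buf ++= that.buf
    this
  }

  def result(): Vector[T] = buf.toVector
}
```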

Lazy collection evaluation

• merge occurs more than once

• each processor adds results to its own builder

• evaluation occurs in the root

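[Figure: the partitions 1, 2, 3, 4 | 5, 6, 7, 8 | 3, 1, 8, 0 | 2, 2, 1, 9 are filtered into their own builders (2, 4 | 6, 8 | 8, 0 | 2, 2); the builders are merged pairwise up the tree, and at the root the result is allocated and the elements are copied into it: 2 4 6 8 8 0 2 2]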
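A minimal sketch of the lazy variant, using a hypothetical `LazyCombiner`: each processor appends into its own chunk (the first copy), `combine` only concatenates the sequences of chunks without touching the elements, and `result` allocates the final array once at the root and copies every element into it (the second copy). A `ClassTag` is required to allocate a generic array.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Hypothetical lazy combiner: holds chunks, evaluates only at the root.
// Use += only before combining: combined instances share chunks with their operands.
final class LazyCombiner[T] private (private val chunks: Vector[ArrayBuffer[T]]) {
  // first copy: elements go into this processor's own chunk
  def +=(x: T): this.type = { chunks.last += x; this }

  // merge: concatenate the chunk sequences; no elements are copied
  def combine(that: LazyCombiner[T]): LazyCombiner[T] =
    new LazyCombiner(chunks ++ that.chunks)

  // evaluation at the root: allocate once, then copy every element (second copy)
  def result()(implicit ct: ClassTag[T]): Array[T] = {
    val out = new Array[T](chunks.map(_.size).sum)
    var pos = 0
    for (c <- chunks) {
      c.copyToArray(out, pos)
      pos += c.size
    }
    out
  }
}

object LazyCombiner {
  def apply[T](): LazyCombiner[T] = new LazyCombiner(Vector(ArrayBuffer.empty[T]))
}
```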

Lazy collection evaluation

• advantages:
  – easier to apply to existing collections
  – for certain data structures copying is cheap (arrays)
  – merging is very cheap

• disadvantages:
  – copying occurs twice, which hurts cheap operations
  – garbage collection occurs more often

Specialized data structures

• some data structures can be merged efficiently (trees, heaps, skip lists…)

• immutable vectors – immutable sequences with efficient splitting and concatenation
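To see why trees merge efficiently, here is a minimal sketch of a hypothetical immutable binary tree (a rope-like structure; this is not how Scala's `Vector` is implemented): `combine` allocates one inner node in O(1) without copying any elements, and `split` returns the two existing subtrees, also in O(1). The sketch does not rebalance the tree, which a practical version would need to do.

```scala
// Hypothetical immutable tree with O(1) concatenation and split.
sealed trait Conc[+T] {
  def size: Int
  def toList: List[T]

  def combine[U >: T](that: Conc[U]): Conc[U] =
    if (this.size == 0) that
    else if (that.size == 0) this
    else Node(this, that) // O(1): one new node, no elements copied
}

case object Empty extends Conc[Nothing] {
  def size: Int = 0
  def toList: List[Nothing] = Nil
}

final case class Single[T](x: T) extends Conc[T] {
  def size: Int = 1
  def toList: List[T] = List(x)
}

final case class Node[T](left: Conc[T], right: Conc[T]) extends Conc[T] {
  val size: Int = left.size + right.size
  def toList: List[T] = left.toList ++ right.toList
  // split into the two halves the tree already has
  def split: (Conc[T], Conc[T]) = (left, right)
}
```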

Method span

• each processor keeps 2 builders

• merge has 2 cases:
  – counterexample in the left partition
  – no counterexample in the left partition

[Figure: the example array 3 9 -1 2 4 -5 7 3 2 4 -7 2 divided into partitions]
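A sketch of the two merge cases for `span(p)`. Assumptions: each partition is scanned into a hypothetical `Part` holding the contents of its two builders (elements before its first counterexample, and the rest) plus a flag telling whether it saw a counterexample. Partitions are scanned one after another here for clarity, although each `scan` is independent and could run on its own processor. The merge is associative, so partial results could also be combined in a tree.

```scala
object ParSpan {
  // Hypothetical per-partition result: the two builders' contents plus a flag.
  final case class Part(prefix: Vector[Int], suffix: Vector[Int], counterexample: Boolean)

  def scan(chunk: Seq[Int], p: Int => Boolean): Part = {
    val pre = Vector.newBuilder[Int]
    val suf = Vector.newBuilder[Int]
    var found = false
    for (x <- chunk) {
      if (!found && !p(x)) found = true
      if (found) suf += x else pre += x
    }
    Part(pre.result(), suf.result(), found)
  }

  def merge(l: Part, r: Part): Part =
    if (l.counterexample) // case 1: counterexample in the left partition
      Part(l.prefix, l.suffix ++ r.prefix ++ r.suffix, counterexample = true)
    else                  // case 2: no counterexample, left is all prefix
      Part(l.prefix ++ r.prefix, r.suffix, r.counterexample)

  def span(xs: Seq[Int], parts: Int)(p: Int => Boolean): (Vector[Int], Vector[Int]) = {
    val chunkSize = math.max(1, (xs.size + parts - 1) / parts)
    val merged = xs.grouped(chunkSize).map(scan(_, p))
      .foldLeft(Part(Vector.empty, Vector.empty, counterexample = false))(merge)
    (merged.prefix, merged.suffix)
  }
}
```

On the example array with `_ >= 0`, the result is the prefix 3 9 and the suffix starting at -1, however many partitions are used, matching the sequential `span`.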


Load balancing

• processor availability and data processing cost may not be uniform

• fine-grained division – more tasks than processors

Work-stealing

• need to schedule tasks to processors – work stealing

• each processor has a task queue

• when it runs out of tasks, it steals from other queues

[Figure: proc 1 and proc 2, each with a task queue; an idle processor steals a task from the other's queue]
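A toy, single-threaded simulation that makes the queue discipline concrete. Its rules are assumptions of the sketch, not taken from the slides: every task costs one time step, owners take tasks from the front of their own queue, and an idle processor steals one task from the back of the fullest other queue. Real schedulers such as the JDK's `ForkJoinPool` do this concurrently and with far more care. `WorkStealingSim` is a hypothetical name.

```scala
object WorkStealingSim {
  final case class Stats(steps: Int, steals: Int, done: Vector[List[Int]])

  // initial(i) holds the task ids initially queued on processor i
  def run(initial: Vector[Vector[Int]]): Stats = {
    val queues = initial.toArray
    val done = Array.fill(queues.length)(List.empty[Int]) // executed tasks, most recent first
    var steps = 0
    var steals = 0
    while (queues.exists(_.nonEmpty)) {
      steps += 1
      for (i <- queues.indices) {
        if (queues(i).nonEmpty) {
          // owner: take from the front of its own queue
          done(i) = queues(i).head :: done(i)
          queues(i) = queues(i).tail
        } else {
          // thief: steal from the back of the fullest other queue, if any
          val victim = queues.indices.maxBy(j => queues(j).size)
          if (queues(victim).nonEmpty) {
            done(i) = queues(victim).last :: done(i)
            queues(victim) = queues(victim).init
            steals += 1
          }
        }
      }
    }
    Stats(steps, steals, done.map(_.reverse).toVector)
  }
}
```

With all eight tasks initially on the first processor and the second one idle, the simulation finishes in 4 steps with 4 steals, instead of 8 steps without stealing.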

Adaptive work-stealing

• still, a large number of tasks can lead to overhead

• remedy: adaptive partitioning

Adaptive work-stealing

• ensures better load balancing

[Figure: proc 1 and proc 2 with adaptively partitioned work; an idle processor steals from the other]
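The slides do not spell out the partitioning scheme, so the following is only one plausible form of adaptive partitioning, assumed for illustration: rather than creating many equal tasks up front, a worker takes chunks of geometrically growing size (1, 2, 4, 8, ...), so the number of chunks grows only logarithmically with the input while the first chunks stay small. `AdaptiveChunks.chunkSizes` is hypothetical.

```scala
object AdaptiveChunks {
  // Hypothetical helper: sizes of consecutive chunks covering n elements,
  // growing as 1, 2, 4, ... with the last chunk truncated to what remains.
  def chunkSizes(n: Int): List[Int] = {
    require(n >= 0)
    val sizes = List.newBuilder[Int]
    var remaining = n
    var next = 1
    while (remaining > 0) {
      val s = math.min(next, remaining)
      sizes += s
      remaining -= s
      next *= 2
    }
    sizes.result()
  }
}
```

For 10 elements the chunks are 1, 2, 4 and 3; a million elements need only 20 chunks.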

Package hierarchy

• parallel is a subpackage of the collection package

collection
  mutable
  immutable
  parallel
    mutable
    immutable

Class hierarchy

• consistent with existing collections

• clients can refer to parallel collections transparently

[Figure: class hierarchy with Iterable at the top; Map, Seq, Set and ParallelIterable beneath it; ParallelMap, ParallelSeq and ParallelSet at the bottom]

How to use

• be aware of side-effects

var k = 0
array.foreach(k += _)   // on a parallel array these unsynchronized updates race, so k is unpredictable

• parallel collections are not concurrent collections

• be careful with small collections – the setup cost may outweigh the gain from parallelism
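To make the side-effect point concrete: with a parallel collection, the `foreach` above runs on several threads, so the unsynchronized `k += _` updates race. A minimal sketch of two safe alternatives using only `scala.concurrent` and the JDK; splitting the array into chunks run as futures stands in for a parallel `foreach` and is an assumption of the sketch, as is the object name `SideEffects`. Either each task keeps a local accumulator and the partial results are combined, or, if shared state is unavoidable, every update is atomic.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SideEffects {
  // roughly 8 chunks, each run as a future (stand-in for a parallel foreach)
  private def chunks(array: Array[Int]): List[Array[Int]] =
    array.grouped(math.max(1, array.length / 8)).toList

  // Preferred: no shared mutable state; each task sums locally, then combine.
  def sumLocal(array: Array[Int]): Long = {
    val partial = chunks(array).map(c => Future(c.foldLeft(0L)(_ + _)))
    Await.result(Future.sequence(partial), Duration.Inf).sum
  }

  // Shared state made safe: every update is an atomic read-modify-write.
  def sumAtomic(array: Array[Int]): Long = {
    val k = new AtomicLong(0L)
    val tasks = chunks(array).map(c => Future(c.foreach(x => k.addAndGet(x))))
    Await.result(Future.sequence(tasks), Duration.Inf)
    k.get()
  }
}
```

The first version also scales better, since the threads never contend on a shared counter.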

How to use

• parallel ranges – a way to parallelize for-loops

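// Scala 2.13+: .par comes from the separate scala-parallel-collections module, via
// import scala.collection.parallel.CollectionConverters._
for (i <- (0 until 1000).par) yield {
  var num = i
  var lst: List[Int] = Nil
  while (num > 0) {   // collect the binary digits of i
    lst ::= num % 2
    num = num / 2
  }
  lst                 // binary digits of i, most significant first
}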
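Ranges suit this well because splitting one is cheap: `take` and `drop` on a `Range` return new `Range` objects computed from the bounds, without visiting the elements. A minimal sketch of the loop above run on the JDK's fork/join pool instead of the parallel collections library; the object `BinaryDigits` and the threshold of 64 are hypothetical choices.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object BinaryDigits {
  // Body of the for-loop: binary digits of i, most significant first.
  def digits(i: Int): List[Int] = {
    var num = i
    var lst: List[Int] = Nil
    while (num > 0) {
      lst ::= num % 2
      num = num / 2
    }
    lst
  }

  private final class Task(r: Range) extends RecursiveTask[Vector[List[Int]]] {
    def compute(): Vector[List[Int]] =
      if (r.size <= 64) r.map(digits).toVector        // small enough: run sequentially
      else {
        val half = r.size / 2
        val right = new Task(r.drop(half)).fork()      // O(1) split of the range
        val left = new Task(r.take(half)).compute()
        left ++ right.join()                           // keep results in index order
      }
  }

  def parDigits(r: Range): Vector[List[Int]] =
    ForkJoinPool.commonPool().invoke(new Task(r))
}
```

Results come back in index order; for example, element 6 of `parDigits(0 until 1000)` is `List(1, 1, 0)`.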

Benchmarks

• microbenchmarks with low cost per-element operations

foreach         1      2      4      6      8
Sequential      1227   1227   1227   1227   1227
ParallelArray   1180   797    529    449    421
Extra166        1195   757    544    442    403

reduce          1      2      4      6      8
Sequential      949    949    949    949    949
ParallelArray   832    551    375    328    297
Extra166        890    566    363    300    282

Benchmarks

• microbenchmarks with low cost per-element operations

filter          1      2      4      6      8
Sequential      611    611    611    611    611
ParallelArray   476    333    235    216    208
Extra166        581    372    296    280    264

find            1      2      4      6      8
Sequential      1181   1181   1181   1181   1181
ParallelArray   961    608    410    331    300
Extra166        841    602    393    309    294

Current state

• an array – ParallelArray

• ranges – ParallelRange

• views – ParallelView

• working on – ParallelVector and ParallelHashMap

Conclusion

• good performance results

• nice integration with existing collections

• more parallel collections are being worked on

• will be integrated into Scala 2.8.1
