
Scala Parallel Collections

Aleksandar Prokopec, EPFL

Scala collections

for {
  s <- surnames
  n <- names
  if s endsWith n
} yield (n, s)

McDonald

Scala collections

for {
  s <- surnames
  n <- names
  if s endsWith n
} yield (n, s)

1040 ms

Scala parallel collections

for {
  s <- surnames.par
  n <- names.par
  if s endsWith n
} yield (n, s)

2 cores: 575 ms
4 cores: 305 ms
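
As a rough illustration of how such a comparison can be run (this is not the talk's benchmark): a minimal timing sketch, where the sample data and the time helper are assumptions, and the import is what Scala 2.13+ needs for the separate scala-parallel-collections module:

import scala.collection.parallel.CollectionConverters._

def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val res = body
  println(s"$label: ${(System.nanoTime() - start) / 1000000} ms")
  res
}

val surnames = Vector("McDonald", "McKinley", "Donaldson")  // sample data
val names    = Vector("Don", "Donald", "Kinley")

time("sequential") {
  for (s <- surnames; n <- names; if s endsWith n) yield (n, s)
}
time("parallel") {
  for (s <- surnames.par; n <- names.par; if s endsWith n) yield (n, s)
}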

for comprehensions: nested parallelized bulk operations

surnames.par.flatMap { s =>
  names.par
    .filter(n => s endsWith n)
    .map(n => (n, s))
}

Nested parallelism: parallel within parallel

composition:

surnames.par.flatMap { s =>
  surnameToCollection(s) // may invoke parallel ops
}

Nested parallelism: going recursive

recursive algorithms

def vowel(c: Char): Boolean = ...

def gen(n: Int, acc: Seq[String]): Seq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s

gen(5, Array(""))

1545 ms

Nested parallelism: going recursive

def vowel(c: Char): Boolean = ...

def gen(n: Int, acc: ParSeq[String]): ParSeq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s

gen(5, ParArray(""))

1 core:  1575 ms
2 cores:  809 ms
4 cores:  530 ms
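
For reference, the parallel version assembled into one runnable snippet; the body of vowel is an assumption (the talk only shows its signature), and the imports reflect the scala-parallel-collections package layout:

import scala.collection.parallel.ParSeq
import scala.collection.parallel.mutable.ParArray

def vowel(c: Char): Boolean = "aeiou".contains(c)  // assumed definition

def gen(n: Int, acc: ParSeq[String]): ParSeq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s

gen(5, ParArray(""))  // each recursive level runs its bulk operation in parallel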

So, I just use par and I’m home free?

How to think parallel

Character count: use case for foldLeft

val txt: String = ...

txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c)   => a + 1
}

[diagram: the accumulator counts up 0, 1, 2, … 6 as the characters are folded in one by one]

Character count: use case for foldLeft

txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c)   => a + 1
}

going left to right – not parallelizable!

[diagram: the elements A B C D E F are consumed strictly one after another with _ + 1]

Character count: use case for foldLeft

txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c)   => a + 1
}

going left to right – not really necessary

[diagram: A B C and D E F are counted independently with _ + 1, and the two partial counts are combined with _ + _ to give 6]

Character count: in parallel

txt.fold(0) {
  case (a, ' ') => a
  case (a, c)   => a + 1
}

[diagram: within each partition, elements are folded into the running count with an operation of type (Int, Char) => Int]

Character count: fold not applicable

txt.fold(0) {
  case (a, ' ') => a
  case (a, c)   => a + 1
}

[diagram: merging two partial counts needs an operation of type (Int, Int) => Int, but fold only takes a single operator, so it cannot both consume characters and merge counts]
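
To spell out the mismatch, here are the two signatures side by side, simplified from the parallel collections API (a sketch, not the exact declarations):

// def fold[U >: Char](z: U)(op: (U, U) => U): U
// def aggregate[S](z: => S)(seqop: (S, Char) => S, combop: (S, S) => S): S
//
// fold uses one operator both within and across partitions, and U must be a
// supertype of the element type, so a counting function of type (Int, Char) => Int
// cannot fit. aggregate separates the element step (seqop) from the merge step
// (combop), which is exactly what the character count needs.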

Character count: use case for aggregate

txt.aggregate(0)({
  case (a, ' ') => a
  case (a, c)   => a + 1
}, _ + _)

[diagram: the first argument folds an element into a partition's running count (aggregation with an element, of type (Int, Char) => Int); the second argument _ + _ merges two partial counts (aggregation with an aggregation, of type (Int, Int) => Int)]
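
Put together, a minimal runnable version of the character count; the sample text, the toVector step and the import are assumptions for Scala 2.13+, where parallel collections come from the separate scala-parallel-collections module:

import scala.collection.parallel.CollectionConverters._

val txt = "Folding me softly."  // sample text
val nonSpaces = txt.toVector.par.aggregate(0)(
  { case (a, ' ') => a; case (a, c) => a + 1 },  // fold one character into a partition's count
  _ + _                                          // merge two partial counts
)
// nonSpaces == 16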

Word count: another use case for foldLeft

txt.foldLeft((0, true)) {
  case ((wc, _), ' ')   => (wc, true)
  case ((wc, true), x)  => (wc + 1, false)
  case ((wc, false), x) => (wc, false)
}

Walking through "Folding me softly.":

initial accumulator (0, true): 0 words so far, and the last character counts as a space
a space: remember that the last seen character is a space
a non-space when the last seen character was a space: a new word
a non-space when the last seen character wasn't a space: no new word
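
As a quick sanity check, the sequential fold run on the sentence from the slides (runnable as-is in any recent Scala):

val txt = "Folding me softly."
val (words, _) = txt.foldLeft((0, true)) {
  case ((wc, _), ' ')   => (wc, true)      // a space: remember it
  case ((wc, true), x)  => (wc + 1, false) // previous character was a space: a new word starts
  case ((wc, false), x) => (wc, false)     // inside a word: nothing changes
}
// words == 3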

Word count: in parallel

[diagram: the text is split into "Folding me " (processor P1) and "softly." (processor P2); P1 computes wc = 2 with a trailing space (rs = 1), P2 computes wc = 1 with no leading space (ls = 0), and merging gives wc = 3]

Word count: must assume arbitrary partitions

[diagram: with the split "Foldin" (P1) and "g me softly." (P2), P1 computes wc = 1, rs = 0 and P2 computes wc = 3, ls = 0; the word "Folding" straddles the boundary, so the merged count is wc = 1 + 3 - 1 = 3]

Word count: initial aggregation

txt.par.aggregate((0, 0, 0))

the triple tracks: # spaces on the left, # words, # spaces on the right

the zero element corresponds to the empty string ""

Word count: aggregation with aggregation

...}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})

an empty partition ("") leaves the other result unchanged
no space at the boundary (e.g. "Folding m" + "e softly."): the two halves of one word were each counted, so subtract one
otherwise (e.g. "Folding me" + " softly."): just add the word counts

Word count: aggregation with an element

txt.par.aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c)     => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c)    => (ls, wc, 0)
  case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
}, ...)

"_": 0 words and a space – add one more space on each side
" m": 0 words and a non-space – one word, no spaces on the right side
" me_": nonzero words and a space – one more space on the right side
" me sof": nonzero words, last non-space and current non-space – no change
" me s": nonzero words, last space and current non-space – one more word

Word count: in parallel

txt.par.aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c)     => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c)    => (ls, wc, 0)
  case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})
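
To see how the result is consumed, here is the same call assembled into a runnable snippet; the sample text, the toVector step and the import are assumptions for Scala 2.13+, where parallel collections live in the separate scala-parallel-collections module. The word count is the middle component of the triple:

import scala.collection.parallel.CollectionConverters._

val txt = "Folding me softly."  // sample text
val (_, words, _) = txt.toVector.par.aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c)     => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c)    => (ls, wc, 0)
  case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})
// words == 3, whichever way the runtime partitions the characters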

Word count: using parallel strings?

txt.par.aggregate((0, 0, 0))({ ... }, { ... })   // the same seqop and combop as above

Word count: string not really parallelizable

scala> (txt: String).par
collection.parallel.ParSeq[Char] = ParArray(…)

different internal representation: ParArray!
par has to copy the string contents into an array

Conversions: going parallel

// `par` is efficient for...
mutable.{Array, ArrayBuffer, ArraySeq}
mutable.{HashMap, HashSet}
immutable.{Vector, Range}
immutable.{HashMap, HashSet}

most other collections construct a new parallel collection!

Conversions: going parallel

sequential                      parallel
Array, ArrayBuffer, ArraySeq    mutable.ParArray
mutable.HashMap                 mutable.ParHashMap
mutable.HashSet                 mutable.ParHashSet
immutable.Vector                immutable.ParVector
immutable.Range                 immutable.ParRange
immutable.HashMap               immutable.ParHashMap
immutable.HashSet               immutable.ParHashSet

Conversions: going parallel

// `seq` is always efficient
ParArray(1, 2, 3).seq
List(1, 2, 3, 4).seq
ParHashMap(1 -> 2, 3 -> 4).seq
"abcd".seq

// `par` may not be...
"abcd".par
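
A small sketch of what these conversions cost; the collections are sample data and the import is the Scala 2.13+ way to get par on the standard collections:

import scala.collection.parallel.CollectionConverters._

val v  = Vector(1, 2, 3)
val pv = v.par   // cheap: Vector has a parallel counterpart (ParVector)
val v2 = pv.seq  // cheap: seq is always efficient

val l  = List(1, 2, 3)
val pl = l.par   // not cheap: List has no parallel counterpart,
                 // so a new parallel collection is built from its elements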

Custom collections

Custom collection

class ParString(val str: String)
extends parallel.immutable.ParSeq[Char] {
  def apply(i: Int) = str.charAt(i)
  def length = str.length
  def seq = new WrappedString(str)
  def splitter = new ParStringSplitter(0, str.length)

Custom collection: splitter definition

splitters are iterators:

class ParStringSplitter(var i: Int, len: Int)
extends Splitter[Char] {
  def hasNext = i < len
  def next = {
    val r = str.charAt(i)
    i += 1
    r
  }

splitters must be duplicated:

  def dup = new ParStringSplitter(i, len)

splitters know how many elements remain:

  def remaining = len - i

splitters can be split:

  def psplit(sizes: Int*): Seq[ParStringSplitter] = {
    val splitted = new ArrayBuffer[ParStringSplitter]
    for (sz <- sizes) {
      val next = (i + sz) min len
      splitted += new ParStringSplitter(i, next)
      i = next
    }
    splitted
  }
}
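
Assembled into one piece, with the splitter nested inside ParString so that it can see str, the custom collection looks roughly like this. This is a sketch following the talk's API; exact supertypes and required members differ slightly across Scala versions, so treat it as a template rather than a definitive implementation:

import scala.collection.immutable.WrappedString
import scala.collection.mutable.ArrayBuffer
import scala.collection.parallel.SeqSplitter
import scala.collection.parallel.immutable.ParSeq

class ParString(val str: String) extends ParSeq[Char] {
  def apply(i: Int) = str.charAt(i)
  def length = str.length
  def seq = new WrappedString(str)
  def splitter = new ParStringSplitter(0, str.length)

  class ParStringSplitter(var i: Int, len: Int) extends SeqSplitter[Char] {
    def hasNext = i < len
    def next() = { val r = str.charAt(i); i += 1; r }
    def dup = new ParStringSplitter(i, len)
    def remaining = len - i
    // split in half by default; psplit handles arbitrary partition sizes
    def split =
      if (remaining >= 2) psplit(remaining / 2, remaining - remaining / 2)
      else Seq(this)
    def psplit(sizes: Int*): Seq[ParStringSplitter] = {
      val splitted = new ArrayBuffer[ParStringSplitter]
      for (sz <- sizes) {
        val next = (i + sz) min len
        splitted += new ParStringSplitter(i, next)
        i = next
      }
      splitted.toSeq
    }
  }
}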

Word count: now with parallel strings

new ParString(txt).aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c)     => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c)    => (ls, wc, 0)
  case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})

Word count: performance

txt.foldLeft((0, true)) {
  case ((wc, _), ' ')   => (wc, true)
  case ((wc, true), x)  => (wc + 1, false)
  case ((wc, false), x) => (wc, false)
}

100 ms

new ParString(txt).aggregate((0, 0, 0))({ ... }, { ... })  // as above

cores: 1       2      4
time:  137 ms  70 ms  35 ms

Hierarchy

[diagram: GenTraversable <- GenIterable <- GenSeq; the sequential Traversable, Iterable and Seq and the parallel ParIterable and ParSeq all extend the corresponding Gen* traits]

Hierarchy

def nonEmpty(sq: Seq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}

Hierarchy

def nonEmpty(sq: ParSeq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}

side-effects! ArrayBuffer is not synchronized!

[diagram: the method should accept both Seq and ParSeq]

Hierarchy

def nonEmpty(sq: GenSeq[String]) = { val res = new mutable.ArrayBuffer[String]()for (s <- sq) {

if (s.nonEmpty) res.synchronized { res += s } } res}
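
With the GenSeq signature the same method accepts both flavours. A usage sketch (note this relies on the shared Gen* hierarchy of Scala 2.12 and earlier; in 2.13 the parallel collections moved to a separate module):

val strings = List("a", "", "b", "", "c")  // sample data

nonEmpty(strings)      // a sequential Seq[String]: single-threaded, no races
nonEmpty(strings.par)  // a ParSeq[String]: several threads append to res,
                       // which is why res.synchronized is needed above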

Accessors vs. transformers: some methods need more than just splitters

accessors: foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, …

transformers: map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, …

These return collections!
Sequential collections – builders
Parallel collections – combiners
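
A small illustration of the split; the import is what Scala 2.13+ needs, and the range is sample data:

import scala.collection.parallel.CollectionConverters._

val xs = (1 to 1000).par

val total = xs.sum                 // accessor: only needs splitters, returns a single value
val evens = xs.filter(_ % 2 == 0)  // transformer: needs a combiner to build the result collection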

Builders: building a sequential collection

[diagram: from the list 1 2 3 4 5 6 7, the elements 2, 4, 6 are appended to a ListBuilder with +=, and result produces the new list]
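
In code, the sequential builder protocol sketched above (List.newBuilder and the even-number predicate are illustrative choices, not part of the talk):

val b = List.newBuilder[Int]
for (x <- List(1, 2, 3, 4, 5, 6, 7))
  if (x % 2 == 0) b += x   // += appends one element to the builder
val result = b.result()    // result() produces List(2, 4, 6)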

How to build parallel?

Combiners: building parallel collections

trait Combiner[-Elem, +To]
extends Builder[Elem, To] {
  def combine[N <: Elem, NewTo >: To]
    (other: Combiner[N, NewTo]): Combiner[N, NewTo]
}

[diagram: two Combiners are combined into a single Combiner]

combine should be efficient – O(log n) worst case

How to implement this combine?

Parallel arrays

[diagram: the chunks 1, 2, 3, 4 | 5, 6, 7, 8 | 3, 1, 8, 0 | 2, 2, 1, 9 produce the partial results 2, 4 | 6, 8 | 8, 0 | 2, 2 (e.g. keeping the even elements); the partial results are merged pairwise, the final array is allocated, and the elements 2 4 6 8 8 0 2 2 are copied into it]

Parallel hash tables

[diagram sequence: a ParHashMap with keys 0 1 2 4 5 7 8 9 is transformed, e.g. by calling filter, which keeps 0 1 4 5 7 9; each worker adds its surviving elements to its own ParHashCombiner, one holding 0, 1, 4 and the other 5, 7, 9]

How to merge two ParHashCombiners?

buckets! The combiners store elements in buckets keyed by a prefix of the hashcode (e.g. 0 = 0000₂, 1 = 0001₂, 4 = 0100₂), so combine can simply concatenate the matching buckets of the two ParHashCombiners (no copying!), and result then builds the final ParHashMap from the buckets.

Custom combiners: for methods returning custom collections

new ParString(txt).filter(_ != ' ')

What is the return type here? It creates a ParVector!

class ParString(val str: String)
extends parallel.immutable.ParSeq[Char] {
  def apply(i: Int) = str.charAt(i)
  ...

Custom combiners: for methods returning custom collections

class ParString(val str: String)
extends immutable.ParSeq[Char]
   with ParSeqLike[Char, ParString, WrappedString] {
  def apply(i: Int) = str.charAt(i)
  ...
  protected[this] override def newCombiner = new ParStringCombiner
}

Custom combiners: for methods returning custom collections

class ParStringCombiner
extends Combiner[Char, ParString] {
  var size = 0
  val chunks = ArrayBuffer(new StringBuilder)
  var lastc = chunks.last

  def +=(elem: Char) = {
    lastc += elem
    size += 1
    this
  }

[diagram: the combiner tracks a size counter, a sequence of chunks (StringBuilders) and a reference lastc to the last chunk; += appends the element to lastc and increments size]

Custom combiners: for methods returning custom collections

  ...
  def combine[U <: Char, NewTo >: ParString]
      (other: Combiner[U, NewTo]) = other match {
    case psc: ParStringCombiner =>
      size += psc.size
      chunks ++= psc.chunks
      lastc = chunks.last
      this
  }

Custom combiners: for methods returning custom collections

[diagram: combine concatenates the chunks of the two combiners and points lastc at the last chunk of the merged sequence]

Custom combiners: for methods returning custom collections

  ...
  def result = {
    val rsb = new StringBuilder
    for (sb <- chunks) rsb.append(sb)
    new ParString(rsb.toString)
  }
  ...

Custom combiners: for methods returning custom collections

[diagram: result appends all chunks into a single StringBuilder and wraps it in a new ParString]

Custom combiners: for methods expecting implicit builder factories

// only for big boys
... with GenericParTemplate[T, ParColl] ...

object ParColl extends ParFactory[ParColl] {
  implicit def canCombineFrom[T] =
    new GenericCanCombineFrom[T]
  ...

Custom combiners: performance measurement

txt.filter(_ != ' ')

106 ms

new ParString(txt).filter(_ != ' ')

1 core:  125 ms
2 cores:  81 ms
4 cores:  56 ms

[plot: time (t/ms) against the number of processors – 125 ms, 81 ms and 56 ms at 1, 2 and 4 cores; the speedup is sub-linear because def result is not parallelized]

Custom combiners: tricky!

• two-step evaluation – parallelize the result method in combiners
• efficient merge operation – binomial heaps, ropes, etc.
• concurrent data structures – non-blocking scalable insertion operation – we're working on this

Future work: coming up

• concurrent data structures
• more efficient vectors
• custom task pools
• user defined scheduling
• parallel bulk in-place modifications

Thank you!

Examples at: git://github.com/axel22/sd.git
