
Page 1

Parallel Functional Programming: Repa

Mary Sheeran

http://www.cse.chalmers.se/edu/course/pfp

Page 2

Flat

Nested

Amorphous

Repa    Accelerate

Data Parallel Haskell

Embedded (2nd class)

Full (1st class)

Slide borrowed from G. Keller's lecture

Page 3

DPH

Parallel arrays [:e:] (which can contain arrays)

Page 4

DPH

Parallel arrays [:e:] (which can contain arrays)

Expressing parallelism = applying collective operations to parallel arrays

Note: demand for any element in a parallel array results in evaluation of all elements

Page 5

DPH array operations

(!:)        :: [:a:] -> Int -> a
sliceP      :: [:a:] -> (Int, Int) -> [:a:]
replicateP  :: Int -> a -> [:a:]
mapP        :: (a -> b) -> [:a:] -> [:b:]
zipP        :: [:a:] -> [:b:] -> [:(a,b):]
zipWithP    :: (a -> b -> c) -> [:a:] -> [:b:] -> [:c:]
filterP     :: (a -> Bool) -> [:a:] -> [:a:]
concatP     :: [:[:a:]:] -> [:a:]
concatMapP  :: (a -> [:b:]) -> [:a:] -> [:b:]
unconcatP   :: [:[:a:]:] -> [:b:] -> [:[:b:]:]
transposeP  :: [:[:a:]:] -> [:[:a:]:]
expandP     :: [:[:a:]:] -> [:b:] -> [:b:]
combineP    :: [:Bool:] -> [:a:] -> [:a:] -> [:a:]
splitP      :: [:Bool:] -> [:a:] -> ([:a:], [:a:])

Page 6

Examples

svMul :: [:(Int,Float):] -> [:Float:] -> Float
svMul sv v = sumP [: f * (v !: i) | (i,f) <- sv :]

smMul :: [:[:(Int,Float):]:] -> [:Float:] -> [:Float:]
smMul sm v = [: svMul row v | row <- sm :]

Nested data parallelism: a parallel op (svMul) on each row

Page 7

Data parallelism: perform the same computation on a collection of differing data values

Examples: HPF (High Performance Fortran), CUDA

Both support only flat data parallelism

Flat: each of the individual computations on (array) elements is sequential

those computations don't need to communicate; parallel computations don't spark further parallel computations

Page 8

API for purely functional, collective operations over dense, rectangular, multi-dimensional arrays, supporting shape polymorphism

ICFP 2010

Page 9

Ideas

Purely functional array interface using collective (whole array) operations like map, fold and permutations can
– combine efficiency and clarity
– focus attention on the structure of the algorithm, away from low level details

Influenced by work on algorithmic skeletons based on the Bird-Meertens formalism (look for PRG-56)

Provides shape polymorphism, not in a standalone specialist compiler like SAC, but using the Haskell type system

Page 10

Terminology

Regular arrays: dense, rectangular, most elements non-zero

Shape polymorphic: functions work over arrays of arbitrary dimension

Page 11

Terminology

Regular arrays: dense, rectangular, most elements non-zero

Shape polymorphic: functions work over arrays of arbitrary dimension

Note: the arrays are purely functional and immutable

All elements of an array are demanded at once -> parallelism

P processing elements, n array elements => n/P consecutive elements on each processing element

Page 12

data Array sh e = Manifest sh (Vector e)
                | Delayed  sh (sh -> e)

Page 13

data Array sh e = Manifest sh (Vector e)
                | Delayed  sh (sh -> e)

class Shape sh where
  toIndex   :: sh -> sh -> Int
  fromIndex :: sh -> Int -> sh
  size      :: sh -> Int
  ...more operations...

Page 14

data DIM1 = DIM1 !Int
data DIM2 = DIM2 !Int !Int
...more dimensions...
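
A minimal sketch (not the actual Repa code) of how a Shape instance for DIM2 could implement the class from the previous slide using row-major indexing; the further operations elided there are ignored:

instance Shape DIM2 where
  toIndex   (DIM2 _h w) (DIM2 i j) = i * w + j
  fromIndex (DIM2 _h w) n          = DIM2 (n `div` w) (n `mod` w)
  size      (DIM2 h w)             = h * w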

Page 15

index :: Shape sh => Array sh e -> sh -> e
index (Delayed  sh f)   ix = f ix
index (Manifest sh vec) ix = indexV vec (toIndex sh ix)

Page 16

delay :: Shape sh => Array sh e -> (sh, sh -> e)
delay (Delayed  sh f)   = (sh, f)
delay (Manifest sh vec) = (sh, \ix -> indexV vec (toIndex sh ix))

Page 17

map :: Shape sh => (a -> b) -> Array sh a -> Array sh b
map f arr = let (sh, g) = delay arr
            in  Delayed sh (f . g)

Page 18

zipWith :: Shape sh => (a -> b -> c)
        -> Array sh a -> Array sh b -> Array sh c
zipWith f arr1 arr2 = let (sh1,  f1) = delay arr1
                          (_sh2, f2) = delay arr2
                          get ix     = f (f1 ix) (f2 ix)
                      in  Delayed sh1 get

Page 19

force :: Shape sh => Array sh e -> Array sh e
force arr
 = unsafePerformIO
 $ case arr of
     Manifest sh vec -> return $ Manifest sh vec
     Delayed  sh f   -> do mvec <- unsafeNew (size sh)
                           fill (size sh) mvec (f . fromIndex sh)
                           vec  <- unsafeFreeze mvec
                           return $ Manifest sh vec
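
A small usage sketch with this API: successive maps stay Delayed (and thus fused), and a single force at the end materialises the result.

pipeline :: Shape sh => Array sh Int -> Array sh Int
pipeline = force . map (+1) . map (*2)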

Page 20

Delayed (or pull) arrays: great idea!

Represent an array as a function from index to value

Not a new idea; it originated in Pan in the functional world, I think

See also Compiling Embedded Languages

Page 21

But this is 100x slower than expected

doubleZip :: Array DIM2 Int -> Array DIM2 Int -> Array DIM2 Int
doubleZip arr1 arr2 = map (* 2) $ zipWith (+) arr1 arr2

Page 22

Fast but cluttered:

doubleZip arr1@(Manifest !_ !_) arr2@(Manifest !_ !_)
  = force $ map (* 2) $ zipWith (+) arr1 arr2

Page 23

Things moved on! Repa from ICFP 2010 had ONE type of array (which could be either delayed or manifest, as in many EDSLs)

A paper from Haskell '11 showed efficient parallel stencil convolution

http://www.cse.unsw.edu.au/~keller/Papers/stencil.pdf

Page 24

Fancier array type (Repa 2)

Page 25

Fancier array type

But you need to be a guru to get good performance!

Page 26

Put the array representation into the type!

Page 27

Repa 3 (Haskell '12)

http://www.youtube.com/watch?v=YmZtP11mBho

The quote on the previous slide was from this paper

Page 28

Version: I use the most recent Repa (with a recent Haskell Platform)
cabal update
cabal install repa

There is also repa-examples, which pulls in all the Repa libraries

http://repa.ouroborus.net/

(I installed llvm and this gives some speedup, though not, in my case, the 40% mentioned in PCPH.)

Page 29

Repa Arrays

Repa arrays are wrappers around a linear structure that holds the element data.

The representation tag determines what structure holds the data.

Delayed representations (functions that compute elements):
  D -- Functions from indices to elements.
  C -- Cursor functions.

Manifest representations (real data):
  U -- Adaptive unboxed vectors.
  V -- Boxed vectors.
  B -- Strict ByteStrings.
  F -- Foreign memory buffers.

Meta representations:
  P -- Arrays that are partitioned into several representations.
  S -- Hints that computing this array is a small amount of work, so computation should be sequential rather than parallel, to avoid scheduling overheads.
  I -- Hints that computing this array will be an unbalanced workload, so computation of successive elements should be interleaved between the processors.
  X -- Arrays whose elements are all undefined.

Page 30

10 array representations!

Page 31

10 array representations!

http://www.youtube.com/watch?v=YmZtP11mBho

But the 18-minute presentation at Haskell '12 makes it all make sense!! Watch it!

Page 32

Type Indexing

data family Array rep sh e

type index giving the representation

Page 33

Type Indexing

data family Array rep sh e

shape

Page 34

Type Indexing

data family Array rep sh e

element type

Page 35

map

map :: (Shape sh, Source r a)
    => (a -> b) -> Array r sh a -> Array D sh b

Page 36

map

map :: (Shape sh, Source r a)
    => (a -> b) -> Array r sh a -> Array D sh b
map f arr = case delay arr of
              ADelayed sh g -> ADelayed sh (f . g)

Page 37

Fusion

Delayed (and cursored) arrays enable fusion that avoids intermediate arrays

User-defined worker functions can be fused

This is what gives tight loops in the final code

Page 38

Parallel computation of array elements

computeP :: (Load r1 sh e, Target r2 e, Source r2 e, Monad m)
         => Array r1 sh e -> m (Array r2 sh e)
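
A small usage sketch (hypothetical function name, assuming import Data.Array.Repa as R as on the following slides): R.map produces a Delayed (D) array, and computeP materialises it in parallel as an unboxed (U) array.

doubleAllP :: Monad m => Array U DIM1 Int -> m (Array U DIM1 Int)
doubleAllP xs = computeP (R.map (*2) xs)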

Page 39

example

transpose2P :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)

import Data.Array.Repa as R

Page 40

example

transpose2P :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)

import Data.Array.Repa as R

index type: the SHAPE (also called the EXTENT) of the array

Page 41

example

transpose2P :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)

import Data.Array.Repa as R

DIM0 = Z (scalar)
DIM1 = DIM0 :. Int
DIM2 = DIM1 :. Int

Page 42

Snoc lists: Haskell lists are cons lists; 1:2:3:[] is the same as [1,2,3]

Repa uses snoc lists at the type level for shape types and at the value level for shapes

DIM2 = Z :. Int :. Int is a shape type

Z :. i :. j, read as (i,j), is an index into a two-dimensional array
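
A small sketch of building and indexing with these shapes (fromListUnboxed and (!) are standard Repa functions; the numbers are made up):

example :: Int
example = let arr = fromListUnboxed (Z :. 2 :. 3) [10,11,12,20,21,22 :: Int]
          in  arr ! (Z :. 1 :. 2)      -- row 1, column 2, i.e. 22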

Page 43

Transpose a 2D array in parallel

transpose2P :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)
transpose2P arr
 = arr `deepSeqArray`
   do computeUnboxedP $ unsafeBackpermute new_extent swap arr
 where
   swap (Z :. i :. j) = Z :. j :. i
   new_extent         = swap (extent arr)

Page 44

A more general transpose (on the inner two dimensions)

transpose :: (Shape sh, Source r e)
          => Array r ((sh :. Int) :. Int) e
          -> Array D ((sh :. Int) :. Int) e

Page 45

A more general transpose (on the inner two dimensions) is provided

This type says: an array with at least 2 dimensions. The function is shape polymorphic

Page 46

A more general transpose (on the inner two dimensions) is provided

Functions with at-least constraints become a parallel map over the unspecified dimensions (called rank generalisation)

An important way to express parallel patterns
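
A small usage sketch of rank generalisation: applied to a 3-dimensional array, the provided transpose swaps the inner two dimensions, and forcing the delayed result evaluates in parallel over the remaining dimension.

transposeInnerP :: Monad m => Array U DIM3 Double -> m (Array U DIM3 Double)
transposeInnerP arr = computeUnboxedP (transpose arr)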

Page 47

Remember: Arrays of type (Array D sh a) or (Array C sh a) are not real arrays. They are represented as functions that compute each element on demand. You need to use computeS, computeP, computeUnboxedP and so on to actually evaluate the elements.

(quote from http://hackage.haskell.org/package/repa-3.4.0.1/docs/Data-Array-Repa.html which has lots more good advice, including about compiler flags)

Page 48

Example: sorting

Batcher's bitonic sort (see the lecture from last week)

"hardware-like", data-independent

http://www.cs.kent.edu/~batcher/sort.pdf

Page 49

bitonic sequence

inc (not decreasing), then

dec (not increasing)

or a cyclic shift of such a sequence

Page 50

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

Page 51

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

1 9

Page 52

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

1 2 9 10

Page 53

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

1 2 3 4 9 10 8 6

Page 54

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

1 2 3 4 4 9 10 8 6 5

Swap!

Page 55

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

1 2 3 4 4 2 9 10 8 6 5 6

Page 56

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

1 2 3 4 4 2 1 0 9 10 8 6 5 6 7 8

Page 57

1 2 3 4 5 6 7 8 9 10 8 6 4 2 1 0

1 2 3 4 4 2 1 0 9 10 8 6 5 6 7 8

bitonic        bitonic        (every element of the first half ≤ every element of the second half)

Page 58

Butterfly

bitonic

Page 59

Butterfly

bitonic    bitonic

bitonic >=

Page 60

bitonic merger

Page 61

Question

What are the work and depth (or span) of bitonic merger?

Page 62

Making a recursive sorter (D&C)

Make a bitonic sequence using two half-size sorters

Page 63

Batcher’s sorter (bitonic)

S

S reverse

M

Page 64

Let’s try to write this sorter down in Repa

Page 65

bitonic merger

Page 66

bitonic merger

whole array operation

Page 67

dee for diamond

dee :: (Shape sh, Monad m)
    => (Int -> Int -> Int) -> (Int -> Int -> Int) -> Int
    -> Array U (sh :. Int) Int
    -> m (Array U (sh :. Int) Int)
dee f g s arr = let sh = extent arr in
                computeUnboxedP $ fromFunction sh ixf
  where
    ixf (sh :. i) = if (testBit i s) then (g a b) else (f a b)
      where
        a  = arr ! (sh :. i)
        b  = arr ! (sh :. (i `xor` s2))
        s2 = (1 :: Int) `shiftL` s

Assume the input array has a length that is a power of 2, and s > 0, in this and later functions

Page 68

dee for diamond

dee :: (Shape sh, Monad m)
    => (Int -> Int -> Int) -> (Int -> Int -> Int) -> Int
    -> Array U (sh :. Int) Int
    -> m (Array U (sh :. Int) Int)
dee f g s arr = let sh = extent arr in
                computeUnboxedP $ fromFunction sh ixf
  where
    ixf (sh :. i) = if (testBit i s) then (g a b) else (f a b)
      where
        a  = arr ! (sh :. i)
        b  = arr ! (sh :. (i `xor` s2))
        s2 = (1 :: Int) `shiftL` s

dee f g 3 matches index i with index (i `xor` 8)

Page 69

bitonicMerge n = compose [dee min max (n-i) | i <- [1..n]]
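
compose is not defined on these slides; a minimal sketch, assuming it simply chains the monadic array stages from left to right:

import Control.Monad ((>=>))

compose :: Monad m => [a -> m a] -> a -> m a
compose = foldr (>=>) return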

Page 70

tmerge

Page 71

vee

vee :: (Shape sh, Monad m)
    => (Int -> Int -> Int) -> (Int -> Int -> Int) -> Int
    -> Array U (sh :. Int) Int
    -> m (Array U (sh :. Int) Int)
vee f g s arr = let sh = extent arr in
                computeUnboxedP $ fromFunction sh ixf
  where
    ixf (sh :. ix) = if (testBit ix s) then (g a b) else (f a b)
      where
        a     = arr ! (sh :. ix)
        b     = arr ! (sh :. newix)
        newix = flipLSBsTo s ix

Page 72

vee

vee :: (Shape sh, Monad m)
    => (Int -> Int -> Int) -> (Int -> Int -> Int) -> Int
    -> Array U (sh :. Int) Int
    -> m (Array U (sh :. Int) Int)
vee f g s arr = let sh = extent arr in
                computeUnboxedP $ fromFunction sh ixf
  where
    ixf (sh :. ix) = if (testBit ix s) then (g a b) else (f a b)
      where
        a     = arr ! (sh :. ix)
        b     = arr ! (sh :. newix)
        newix = flipLSBsTo s ix

vee f g 3:
  out(0) -> f a(0) a(7)      out(7) -> g a(7) a(0)
  out(1) -> f a(1) a(6)      out(6) -> g a(6) a(1)

Page 73

tmerge

tmerge n = compose $ vee min max (n-1) : [dee min max (n-i) | i <- [2..n]]

Page 74

Obsidian

Page 75

tsort n = compose [tmerge i | i <- [1..n]]
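
For concreteness, a hypothetical way to run the sorter on a small unboxed array (8 elements, so n = 3), assuming the helpers from the previous slides are in scope:

testSort :: IO (Array U DIM1 Int)
testSort = tsort 3 (fromListUnboxed (Z :. 8) [5,7,1,0,6,2,4,3])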

Page 76

Question

What are the work and depth of this sorter?

Page 77

Performance is decent!

Initial benchmarking for 2^20 Ints

Around 800 ms on 4 cores on this laptop

Compares to around 1.6 seconds for Data.List.sort (which is sequential)

Still slower than Persson's non-entry from the sorting competition in the 2012 course (which was at 400 ms) -- a factor of a bit under 2, which is about what you would expect when comparing Batcher's bitonic sort to quicksort

Page 78

Comments

Should be very scalable

Can probably be sped up! Need to add sequentialness :)

A similar approach might greatly speed up the FFT in repa-examples (and I found a guy running an FFT-in-Haskell competition)

Note that this approach turned a nested algorithm into a flat one

Idiomatic Repa (written by experts) is about 3 times slower. Genericity costs here!

Message: map, fold and scan are not enough. We need to think more about higher order functions on arrays (e.g. with binary operators)

Page 79

Repa's real strength

Stencil computations!

0 1 0

1 0 1

0 1 0

Figure 2: A simple example stencil.

makeStencil2 (Z :. 3 :. 3)
  (\ix -> case ix of
            Z :. -1 :.  0 -> Just 1
            Z :.  0 :. -1 -> Just 1
            Z :.  0 :.  1 -> Just 1
            Z :.  1 :.  0 -> Just 1
            _             -> Nothing)

Figure 3: A lambda modelling the simple stencil in figure 2.

We see that Repa provides the function makeStencil2 for specifying a 2-dimensional stencil of a given size, here 3 by 3. Since this tutorial focuses on image manipulation, we will only concern ourselves with the special case of 2-dimensional stencils.

This way of making stencils from lambdas is, however, somewhat cumbersome. It is not immediately apparent from looking at the lambda in figure 3 that it actually represents the stencil in figure 2. To remedy this, Repa provides us with the convenience method stencil2, which allows us to specify stencils as in figure 4.

[stencil2| 0 1 0
           1 0 1
           0 1 0 |]

Figure 4: A stencil defined using the QuasiQuotes language extension.

If this syntax looks foreign, do not fret. Just know that it exploits a language extension called QuasiQuotes, and that by putting the pragma:

{-# LANGUAGE QuasiQuotes #-}

at the top of your source file, the code in figure 4 will be preprocessed and converted into the code in figure 3 at compile time.

3.2.2 Applying stencils

So now that we know how to specify stencils, it only remains to apply them to something. Let us keep working with our simple stencil from the previous section, and see what happens when we apply it to an actual image.

Repa allows us to read bitmaps into arrays using the readImageFromBMP function, which reads a bitmap from a relative file path and produces a two-dimensional


array of Word8 triples. Each of these triples represents a pixel in the source image, and the individual words in these triples represent the red, green and blue components of the pixel.

Figure 5: Example image.

To read in the image in figure 5, the following code is sufficient:

do arrImage <- liftM (either (error . show) id) $
                 readImageFromBMP "image.bmp"

This gives us an array of Word8 triples. We use the run function for the Either monad to handle the case when reading the image results in an error.

makeStencil2 :: Num a
             => Int -> Int
             -> (DIM2 -> Maybe a)
             -> Stencil DIM2 a

mapStencil2 :: Source r a
            => Boundary a
            -> Stencil DIM2 a
            -> Array r DIM2 a
            -> Array PC5 DIM2 a

If we look at the types of makeStencil2 and the corresponding mapStencil2, we see that the array we map our stencil over must contain elements of the same type as we specified in our stencil. We also see that mapping a stencil requires specifying what happens at the boundaries of the array. In our case, it is sufficient to use (BoundConst 0), which means that we pretend that there are zeroes everywhere outside the boundaries of our array.

To make our integer stencil work for arrays of Word8s, we will unzip the array of triples into three separate arrays of numbers. After acquiring three separate arrays containing the red, blue and green components, we will map our stencil over these arrays separately, as follows:

do (r, g, b)    <- liftM (either (error . show) R.unzip3) $ readImageFromBMP "in.bmp"
   [r', g', b'] <- mapM (applyStencil simpleStencil) [r, g, b]
   writeImageToBMP "out.bmp" (U.zip3 r' g' b')

After acquiring our transformed component arrays, we promptly zip them back


Page 80

Repa's real strength

into an image and write it to a bitmap using the Repa-provided writeImageToBMP, resulting in the image in figure 6.

Figure 6: The stencil in figure 2 applied to the image in figure 5.

Our filter from the stencil in figure 2 has successfully spiced up the colors of our example image.

Let us now look at a couple of example stencils that actually result in something useful.

3.3 Some examples

Now that we know how to handle stencil convolution in Repa, let's try it out on a couple of real examples.

We are going to experiment with two fundamental image filters and their corresponding stencils: Gaussian blur and simple edge detection.

In order to easily experiment with different filters and stencils, let us use the primitives provided by Repa to define a couple of useful combinators.

First off, we represent our images as 2-dimensional arrays of numerals. However, the Word8 type in which images are represented when read from files does not lend itself very well to non-integer arithmetic, so let us define images as arrays of floating point numbers, and a way to easily switch between representations.

Converting between different numeric representations in Haskell can be a real pain, but fortunately the authors of Repa have provided us with a way of doing it in their example files:

type Image = Array U DIM2 Double

promote :: Array U DIM2 Word8 -> IO Image
promote = computeP . R.map ffs
  where
    ffs :: Word8 -> Double
    ffs x = fromIntegral (fromIntegral x :: Int)

demote :: Image -> IO (Array U DIM2 Word8)
demote = computeP . R.map ffs
  where
    ffs :: Double -> Word8


http://www.cse.chalmers.se/edu/year/2015/course/DAT280_Parallel_Functional_Programming/Papers/RepaTutorial13.pdf
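
applyStencil is used in the code above but not defined in this excerpt; a plausible sketch (assumed name and types, specialised to the Double images just introduced, and assuming the stencil modules Data.Array.Repa.Stencil and Data.Array.Repa.Stencil.Dim2 are imported) applies the stencil with a constant-zero boundary and forces the result in parallel:

applyStencil :: Stencil DIM2 Double -> Image -> IO Image
applyStencil st arr = computeP (mapStencil2 (BoundConst 0) st arr)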

Page 81

Nice success story at NYT

Haskell in the Newsroom

Haskell in Industry

Page 82

stackoverflow is your friend

See for example

http://stackoverflow.com/questions/14082158/idiomatic-option-pricing-and-risk-using-repa-parallel-arrays?rq=1

Page 83

Conclusions

Based on DPH technology

Good speedups!

Neat programs

Good control of parallelism

BUT CACHE AWARENESS needs to be tackled

Page 84

Conclusions

Development seems to be happening in Accelerate, which now works for both multicore and GPU (work ongoing)

Array representations for parallel functional programming are an important, fun and frustrating research topic :)

Page 85

Questions to think about

What is the right set of whole array operations?

(remember Backus from the first lecture)