loop fusion in haskell
DESCRIPTION
http://unlines.wordpress.com/2009/10/22/talk-on-loop-fusion-in-haskell/by Roman LeshchinskiyRoman gave a talk about loop fusion in Haskell at FP-Syd, the Sydney Functional Programming group. It covered stream fusion and fusion for distributed types which are two of the optimisations that make Data Parallel Haskell fast.Original PDF: http://www.cse.unsw.edu.au/~rl/talks/fp-syd-fusion.pdfTRANSCRIPT
![Page 1: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/1.jpg)
Loop fusion in Haskell
Roman Leshchinskiy
Programming Languages and SystemsUniversity of New South Wales
![Page 2: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/2.jpg)
What is this about?
What I do
Data Parallel Haskell
compiles nested data-parallel programs to flat data-parallel ones
lots of arrays and collective operations involved
What other people do
array programs with lots of collective operations
![Page 3: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/3.jpg)
What is this about?
What I do
Data Parallel Haskell
compiles nested data-parallel programs to flat data-parallel ones
lots of arrays and collective operations involved
What other people do
array programs with lots of collective operations
zipWith (-)(zipWith (*)
(zipWith (-) (replicate_s segd as1) xs)(zipWith (-) (replicate_s segd bs1) ys))
(zipWith (*)(zipWith (-) (replicate_s segd bs2) ys)(zipWith (-) (replicate_s segd as2) xs))
![Page 4: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/4.jpg)
What is this about?
What I do
Data Parallel Haskell
compiles nested data-parallel programs to flat data-parallel ones
lots of arrays and collective operations involved
What other people do
array programs with lots of collective operations
return . foldl’ hash 5381. map toLower. filter isAlpha =<< readFile f
![Page 5: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/5.jpg)
What is this about?
What I do
Data Parallel Haskell
compiles nested data-parallel programs to flat data-parallel ones
lots of arrays and collective operations involved
What other people do
array programs with lots of collective operations
What everybody wants
no temporary arrays
fused loops
C-like speed
![Page 6: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/6.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)
![Page 7: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/7.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)
RULES
"map/map" map f (map g xs) = map (f . g) xs
![Page 8: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/8.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)bar ys = filter even (filter (<42) ys)
RULES
"map/map" map f (map g xs) = map (f . g) xs
![Page 9: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/9.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)bar ys = filter even (filter (<42) ys)
RULES
"map/map" map f (map g xs) = map (f . g) xs"filter/filter" filter f (filter g xs)
= filter (λ x → f x && g x) xs
![Page 10: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/10.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)bar ys = filter even (filter (<42) ys)baz zs = map (+1) (filter even zs)
RULES
"map/map" map f (map g xs) = map (f . g) xs"filter/filter" filter f (filter g xs)
= filter (λ x → f x && g x) xs
![Page 11: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/11.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)bar ys = filter even (filter (<42) ys)baz zs = map (+1) (filter even zs)
RULES
"map/map" map f (map g xs) = map (f . g) xs"filter/filter" filter f (filter g xs)
= filter (λ x → f x && g x) xs"map/filter" map f (filter g xs) = mapFilter f g xs
![Page 12: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/12.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)bar ys = filter even (filter (<42) ys)baz zs = map (+1) (filter even zs)
RULES
"map/map" map f (map g xs) = map (f . g) xs"filter/filter" filter f (filter g xs)
= filter (λ x → f x && g x) xs"map/filter" map f (filter g xs) = mapFilter f g xs"map/mapFilter" map f (mapFilter g h xs)
= mapFilter (f . g) h xs"mapFilter/filter" mapFilter f g (filter h xs)
= mapFilter (f λ x → g x && h x) xs...
![Page 13: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/13.jpg)
Loop fusion is easy!
foo xs = map (*5) (map (+3) xs)bar ys = filter even (filter (<42) ys)baz zs = map (+1) (filter even zs)
RULES
"map/map" map f (map g xs) = map (f . g) xs"filter/filter" filter f (filter g xs)
= filter (λ x → f x && g x) xs"map/filter" map f (filter g xs) = mapFilter f g xs"map/mapFilter" map f (mapFilter g h xs)
= mapFilter (f . g) h xs"mapFilter/filter" mapFilter f g (filter h xs)
= mapFilter (f λ x → g x && h x) xs... BAD
IDEA
![Page 14: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/14.jpg)
The challenge
use a constant number of rewrite rules
don’t require new rules for new combinators
make adding new combinators easy
fuse everything!
don’t require specialised compiler support
handle both sequential and parallel loops
![Page 15: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/15.jpg)
Sequential loops
![Page 16: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/16.jpg)
Streams
data Step s a = Yield a s| Done
data Stream a = ∃s. Stream (s → Step s a) s
stepper produces next element and state from current state
similar to an iterator
actually encodes an anamorphism (unfold)
![Page 17: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/17.jpg)
Streams
data Step s a = Yield a s| Done
data Stream a = ∃s. Stream (s → Step s a) s
stepper produces next element and state from current state
similar to an iterator
actually encodes an anamorphism (unfold)
stepper
state
![Page 18: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/18.jpg)
Streams
data Step s a = Yield a s| Done
data Stream a = ∃s. Stream (s → Step s a) s
stepper produces next element and state from current state
similar to an iterator
actually encodes an anamorphism (unfold)
sumS :: Num a ⇒ Stream a → asumS (Stream step s) = go 0 swhere go z s = case step s of
Yield x s’ → go (z+x) s’Done → z
![Page 19: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/19.jpg)
Streams
data Step s a = Yield a s| Done
data Stream a = ∃s. Stream (s → Step s a) s
stepper produces next element and state from current state
similar to an iterator
actually encodes an anamorphism (unfold)
stream :: Array a → Stream astream arr = Stream step 0where step i | i < length arr = Yield (arr ! i) (i+1)
| otherwise = Done
![Page 20: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/20.jpg)
Streams
data Step s a = Yield a s| Done
data Stream a = ∃s. Stream (s → Step s a) s
stepper produces next element and state from current state
similar to an iterator
actually encodes an anamorphism (unfold)
mapS :: (a → b) → Stream a → Stream bmapS f (Stream step s) = Stream step’ swhere step’ s = case step s of
Yield x s’ → Yield (f x) s’Done → Done
![Page 21: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/21.jpg)
Streams
data Step s a = Yield a s| Done
data Stream a = ∃s. Stream (s → Step s a) s
stepper produces next element and state from current state
similar to an iterator
actually encodes an anamorphism (unfold)
unstream :: Stream a → Array aunstream (Stream step s) = <allocate, fill and freeze>
![Page 22: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/22.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
![Page 23: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/23.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
![Page 24: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/24.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
sumsq :: Num a ⇒ Array a → asumsq = sum . map (λx -> x*x)
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
![Page 25: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/25.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
sumsq :: Num a ⇒ Array a → asumsq = sum . map (λx -> x*x)
= sumS . stream . unstream . mapS f . stream
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
![Page 26: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/26.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
sumsq :: Num a ⇒ Array a → asumsq = sum . map (λx -> x*x)
= sumS . stream . unstream . mapS f . stream
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
![Page 27: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/27.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
sumsq :: Num a ⇒ Array a → asumsq = sum . map (λx -> x*x)
= sumS . stream . unstream . mapS f . stream
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
![Page 28: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/28.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
sumsq :: Num a ⇒ Array a → asumsq = sum . map (λx -> x*x)
= sumS . mapS f . stream
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
![Page 29: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/29.jpg)
Stream fusion in three easy steps
Step 1: implement array operations in terms of streams
sum :: Num a ⇒ Array a → asum = sumS . stream
map :: (a → b) → Array a → Array bmap f = unstream . mapS f . stream
Step 2: inline them
sumsq :: Num a ⇒ Array a → asumsq = sum . map (λx -> x*x)
= sumS . mapS f . stream
Step 3: eliminate stream/unstream pairs
"stream/unstream" stream (unstream s) = s
LetGHC
doth
ere
st
![Page 30: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/30.jpg)
Optimising stream operations
sumsq xs = sumS (mapS square ( stream xs))
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 31: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/31.jpg)
Optimising stream operations
sumsq xs = sumS (mapS square ( stream xs))
stream :: Array a → Stream astream arr = Stream step 0where step i | i < length arr = Yield (arr ! i) (i+1)
| otherwise = Done
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
inline
![Page 32: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/32.jpg)
Optimising stream operations
sumsq xs = sumS ( mapS square (Stream step1 0))wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 33: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/33.jpg)
Optimising stream operations
sumsq xs = sumS ( mapS square (Stream step1 0))wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
mapS :: (a → b) → Stream a → bmapS f (Stream step s) = Stream step’ swhere step’ s = case step s of
Yield x s’ → Yield (f x) s’Done → Done
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
inline
![Page 34: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/34.jpg)
Optimising stream operations
sumsq xs = sumS (Stream step2 0)wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 35: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/35.jpg)
Optimising stream operations
sumsq xs = sumS (Stream step2 0)wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
sumS :: Num a ⇒ Stream a → asumS (Stream step s) = go 0 swhere go z s = case step s of
Yield x s’ → go (z+x) s’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
inline
![Page 36: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/36.jpg)
Optimising stream operations
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
go z i = case step2 i of
Yield x i’ → go (z+x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 37: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/37.jpg)
Optimising stream operations
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
go z i = case step2 i of
Yield x i’ → go (z+x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
inline
![Page 38: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/38.jpg)
Optimising stream operations
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
go z i = case (case step1 i ofYield x i’ → Yield (square x) i’Done → Done) of
Yield x i’ → go (z+x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 39: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/39.jpg)
Optimising stream operations
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
go z i = case (case step1 i of
Yield x i’ → Yield (square x) i’
Done → Done) ofYield x i’ → go (z+x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
case of case
![Page 40: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/40.jpg)
Optimising stream operations
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
go z i = case step1 i of
Yield x i’ → go (z + square x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 41: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/41.jpg)
Optimising stream operations
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
go z i = case step1 i of
Yield x i’ → go (z + square x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
inline
![Page 42: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/42.jpg)
Optimising stream operations
sumsq xs = go 0 0wherego z i = case (case i < length xs of
True → Yield (xs ! i) (i+1)False → Done) of
Yield x i’ → go (z + square x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 43: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/43.jpg)
Optimising stream operations
sumsq xs = go 0 0wherego z i = case (case i < length xs of
True → Yield (xs ! i) (i+1)False → Done) of
Yield x i’ → go (z + square x) i’Done → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
case of case
![Page 44: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/44.jpg)
Optimising stream operations
sumsq xs = go 0 0wherego z i = case i < length xs of
True → go (z + square (xs ! i)) (i+1)False → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 45: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/45.jpg)
Optimising stream operations
sumsq xs = go 0 0wherego z i = case i < length xs of
True → go (z + square (xs ! i)) (i+1)False → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
![Page 46: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/46.jpg)
Why does it work?
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
go z i = case step2 i of
Yield x i’ → go (z+x) i’Done → z
![Page 47: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/47.jpg)
Why does it work?
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
go z i = case step2 i of
Yield x i’ → go (z+x) i’Done → z
non-recursive
![Page 48: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/48.jpg)
Why does it work?
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
go z i = case step2 i of
Yield x i’ → go (z+x) i’Done → z
non-recursive
non-recursive
![Page 49: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/49.jpg)
Why does it work?
sumsq xs = go 0 0wherestep1 i = case i < length xs of
True → Yield (xs ! i) (i+1)False → Done
step2 i = case step1 i ofYield x i’ → Yield (square x) i’Done → Done
go z i = case step2 i of
Yield x i’ → go (z+x) i’Done → z
non-recursive
non-recursive
recursive
![Page 50: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/50.jpg)
A slight problem
filterS :: (a → Bool) → Stream a → Stream afilterS f (Stream step s) = Stream step’ swherestep’ s = case step s of
Yield x s’| f x → Yield x s’| otherwise → step s’
Done → Done
![Page 51: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/51.jpg)
A slight problem
filterS :: (a → Bool) → Stream a → Stream afilterS f (Stream step s) = Stream step’ swherestep’ s = case step s of
Yield x s’| f x → Yield x s’| otherwise → step s’
Done → Done
recursive
![Page 52: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/52.jpg)
Extending streams
Idea: allow a loop iteration not to produce an element
data Step s a = Yield a s| Skip s| Done
![Page 53: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/53.jpg)
Extending streams
Idea: allow a loop iteration not to produce an element
data Step s a = Yield a s| Skip s| Done
![Page 54: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/54.jpg)
Extending streams
Idea: allow a loop iteration not to produce an element
data Step s a = Yield a s| Skip s| Done
filterS :: (a → Bool) → Stream a → Stream afilterS f (Stream step s) = Stream step’ swherestep’ s = case step s of
Yield x s’| f x → Yield x s’| otherwise → Skip s’
Skip s’ → Skip s’Done → Done
![Page 55: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/55.jpg)
Extending streams
Idea: allow a loop iteration not to produce an element
data Step s a = Yield a s| Skip s| Done
filterS :: (a → Bool) → Stream a → Stream afilterS f (Stream step s) = Stream step’ swherestep’ s = case step s of
Yield x s’| f x → Yield x s’| otherwise → Skip s’
Skip s’ → Skip s’Done → Done
non-recursive
![Page 56: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/56.jpg)
Stream fusion - summary
encode loops by streams
implement array operations in terms of streams
eliminate stream/unstream pairs (temporaries)
stream producers are non-recursive
standard optimisations remove overhead (loop fusion)
Standard optimisations: inlining, case-of-case, worker/wrappertransformation, SpecConstr, LiberateCase, specialisation ...
![Page 57: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/57.jpg)
Stream fusion - summary
encode loops by streams
implement array operations in terms of streams
eliminate stream/unstream pairs (temporaries)
stream producers are non-recursive
standard optimisations remove overhead (loop fusion)
Standard optimisations: inlining, case-of-case, worker/wrappertransformation, SpecConstr, LiberateCase, specialisation ...
![Page 58: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/58.jpg)
Parallel loops
![Page 59: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/59.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
![Page 60: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/60.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
mapP :: (a → b) → Array a → Array bmapP f xs = <split xs across workers>
<map f over each chunk><collect local results>
![Page 61: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/61.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
mapP :: (a → b) → Array a → Array bmapP f xs = <split xs across workers>
<map f over each chunk><collect local results>
f is sequential
![Page 62: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/62.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
mapP :: (a → b) → Array a → Array bmapP f xs = <split xs across workers>
<map f over each chunk><collect local results>
sumP :: Num a ⇒ Array a → asumP xs = <split xs across workers>
<sum each chunk><reduce local sums>
![Page 63: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/63.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
sumsqP = sumP . mapP square
![Page 64: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/64.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
sumsqP xs = <split xs across workers><map square over each chunk><collect local results><split results across workers><sum each chunk><reduce local sums>
![Page 65: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/65.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
sumsqP xs = <split xs across workers><map square over each chunk><collect local results><split results across workers><sum each chunk><reduce local sums>
![Page 66: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/66.jpg)
DPH on multicores
Evaluation strategy after vectorisation
operations are data parallel and flat
executed by a gang of worker threads
essentially fork-join parallelism
sumsqP xs = <split xs across workers><map square over each chunk><collect local results><split results across workers><sum each chunk><reduce local sums>
![Page 67: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/67.jpg)
Distributed types
Idea: let’s make the evaluation strategy explicit! (Keller 1999)
data Dist a a is distributed across threadsDist (Array a) each thread has a local array (chunk)Dist Double each thread has a local Double
splitD distribute an array across threadsjoinD collect thread-local chunks
mapD execute a sequential operation in each threadsumD compute sum of local values
![Page 68: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/68.jpg)
Distributed types
Idea: let’s make the evaluation strategy explicit! (Keller 1999)
data Dist a a is distributed across threadsDist (Array a) each thread has a local array (chunk)Dist Double each thread has a local Double
splitD distribute an array across threadsjoinD collect thread-local chunks
mapD execute a sequential operation in each threadsumD compute sum of local values
![Page 69: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/69.jpg)
Distributed types
Idea: let’s make the evaluation strategy explicit! (Keller 1999)
data Dist a a is distributed across threadsDist (Array a) each thread has a local array (chunk)Dist Double each thread has a local Double
splitD distribute an array across threadsjoinD collect thread-local chunks
mapD execute a sequential operation in each threadsumD compute sum of local values
splitD :: Array a → Dist (Array a)joinD :: Dist (Array a) → Array a
![Page 70: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/70.jpg)
Distributed types
Idea: let’s make the evaluation strategy explicit! (Keller 1999)
data Dist a a is distributed across threadsDist (Array a) each thread has a local array (chunk)Dist Double each thread has a local Double
splitD distribute an array across threadsjoinD collect thread-local chunks
mapD execute a sequential operation in each threadsumD compute sum of local values
splitD :: Array a → Dist (Array a)joinD :: Dist (Array a) → Array amapD :: (a → b) → Dist a → Dist bsumD :: Num a ⇒ Dist a → a
![Page 71: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/71.jpg)
Programming with distributed types
mapP f xs = <split xs across workers><map f over each chunk><collect local results>
![Page 72: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/72.jpg)
Programming with distributed types
mapP f = joinD -- collect. mapD (map f) -- map f over chunks. splitD -- split
![Page 73: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/73.jpg)
Programming with distributed types
mapP f = joinD -- collect. mapD (map f) -- map f over chunks. splitD -- split
sumP xs = <split xs across workers><sum each chunk><reduce local sums>
![Page 74: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/74.jpg)
Programming with distributed types
mapP f = joinD -- collect. mapD (map f) -- map f over chunks. splitD -- split
sumP = sumD -- reduce. mapD sum -- sum each chunk. splitD -- split
![Page 75: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/75.jpg)
Fusing distributed types
sumsqP = sumP . mapP square
![Page 76: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/76.jpg)
Fusing distributed types
sumsqP = sumD -- reduce. mapD sum -- sum each chunk. splitD -- split. joinD -- collect. mapD (map square) -- map square over chunks. splitD -- split
![Page 77: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/77.jpg)
Fusing distributed types
sumsqP = sumD -- reduce. mapD sum -- sum each chunk. splitD -- split. joinD -- collect. mapD (map square) -- map square over chunks. splitD -- split
RULES
splitD (joinD xs) = xs
![Page 78: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/78.jpg)
Fusing distributed types
sumsqP = sumD -- reduce. mapD sum -- sum each chunk. mapD (map square) -- map square over chunks. splitD -- split
RULES
splitD (joinD xs) = xs
![Page 79: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/79.jpg)
Fusing distributed types
sumsqP = sumD -- reduce. mapD sum -- sum each chunk. mapD (map square) -- map square over chunks. splitD -- split
RULES
splitD (joinD xs) = xsmapD f (mapD g xs) = mapD (f . g) xs
![Page 80: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/80.jpg)
Fusing distributed types
sumsqP = sumD -- reduce
. mapD (sum . map square) -- work
. splitD -- split
RULES
splitD (joinD xs) = xsmapD f (mapD g xs) = mapD (f . g) xs
![Page 81: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/81.jpg)
Fusing distributed types
sumsqP = sumD -- reduce
. mapD (sum . map square) -- work
. splitD -- split
RULES
splitD (joinD xs) = xsmapD f (mapD g xs) = mapD (f . g) xs
stream fusion
![Page 82: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/82.jpg)
Distributed types on multicores
data Dist a a is distributed across threads
splitD distribute xs across threadsjoinD collect thread-local chunksmapD execute a sequential operation in each thread
splitD/joinD eliminate communicationmapD/mapD eliminate synchronisation
![Page 83: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/83.jpg)
Distributed types on clusters
data Dist a a is distributed across nodes
splitD scatterjoinD gathermapD execute operation on each node
splitD/joinD eliminate communicationmapD/mapD eliminate synchronisation
![Page 84: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/84.jpg)
Distributed types on GPUs
data Dist a a is in GPU memory
splitD CPU −→ GPU transferjoinD GPU −→ CPU transfermapD execute kernel on the GPU
splitD/joinD eliminate memory transfers (communication)mapD/mapD fuse kernels (synchronisation)
![Page 85: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/85.jpg)
Distribured types – summary
encode parallel loops as split/work/join
eliminate unnecessary split/join pairs
fuse sequential work (stream fusion)
very general mechanism for fusing parallel computations
applicable to a wide range of architectures
again, no specialised compiler support
![Page 86: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/86.jpg)
Obligatory benchmark1
10
100
1000
1 2 4 8
sumsq, Haskell sumsq, C dotp, Haskelldotp, C smvm, Haskell smvm, C
1
10
100
1000
10000
1 2 4 8 16 32 64
Runtime @ greyarea
sumsq, Haskell sumsq, C dotp, Haskelldotp, C smvm, Haskell smvm, C
![Page 87: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/87.jpg)
Parting thoughts
it’s nice, it’s easy to use, it works
high-level functional programs compiled to highly efficient code
even parallel ones!
rewrite rules + great optimiser = win
DPH doesn’t require any special-purpose optimisations
try this in an imperative language...
Stream fusion: dph, bytestring, vector, uvector
Distributed types: dph
![Page 88: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/88.jpg)
Parting thoughts
it’s nice, it’s easy to use, it works
high-level functional programs compiled to highly efficient code
even parallel ones!
rewrite rules + great optimiser = win
DPH doesn’t require any special-purpose optimisations
try this in an imperative language...
Stream fusion: dph, bytestring, vector, uvector
Distributed types: dph
don’t
![Page 89: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/89.jpg)
Parting thoughts
it’s nice, it’s easy to use, it works
high-level functional programs compiled to highly efficient code
even parallel ones!
rewrite rules + great optimiser = win
DPH doesn’t require any special-purpose optimisations
try this in an imperative language...
Stream fusion: dph, bytestring, vector, uvector
Distributed types: dph
don’t
![Page 90: Loop Fusion in Haskell](https://reader034.vdocument.in/reader034/viewer/2022050801/5469969baf79593b558b4d1c/html5/thumbnails/90.jpg)