![Page 1: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/1.jpg)
Parallel Functional Programming
Lecture 1
John Hughes
![Page 2: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/2.jpg)
Moore’s Law (1965)
“The number of transistors per chip increases by a factor of two every year”
…two years (1975)
![Page 3: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/3.jpg)
Number of transistors
![Page 4: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/4.jpg)
What shall we do with them all?
John Backus, Turing Award address, 1978
A computer consists of three parts: a central processing unit (or CPU), a store, and a connecting tube that can transmit a single word between the CPU and the store (and send an address to the store). I propose to call this tube the von Neumann bottleneck.
![Page 5: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/5.jpg)
When one considers that this task must be accomplished entirely by pumping single words back and forth through the von Neumann bottleneck, the reason for its name is clear.
Since the state cannot change during the computation… there are no side effects. Thus independent applications can be evaluated in parallel.
![Page 6: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/6.jpg)
Parallel programming is HARD!!
![Page 7: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/7.jpg)
Clock speed
Smaller transistors switch faster
Pipelined architectures permit faster clocks
![Page 8: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/8.jpg)
Performance per clock
Cache memory
Superscalar processors
Out-of-order execution
Speculative execution (branch prediction)
Value speculation
![Page 9: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/9.jpg)
Power consumption
Higher clock frequency ⇒ higher power consumption
![Page 10: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/10.jpg)
“By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket nozzle than touching a chip. And soon after 2010, PC chips could feel like the bubbly hot surface of the sun itself.”
—Patrick Gelsinger, Intel’s CTO, 2004
![Page 11: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/11.jpg)
Stable clock frequency
Stable perf. per clock
More cores
![Page 12: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/12.jpg)
The Future is Parallel
• Intel Xeon: 12 cores, 24 threads
• AMD Opteron: 16 cores
• Tilera Gx-3000: 100 cores
• Azul Systems Vega 3: 54 cores per chip, 864 cores per system
![Page 13: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/13.jpg)
Why is parallel programming hard?
x = x + 1;  ||  x = x + 1;
[Diagram: x starts at 0; both threads read 0 and each writes back 1.]
Race conditions lead to incorrect, non-deterministic behaviour: a nightmare to debug!
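The lost update can be reproduced without any real threads. The following is a minimal sketch, assuming each `x = x + 1` runs as a separate read step and write step; it enumerates every interleaving of the two threads and collects the possible final values of x. The names `Step`, `interleavings`, `run` and `possibleFinals` are illustrative and not part of the lecture.

```haskell
import Data.List (nub, sort)

-- Each thread runs "x = x + 1" as two steps: read x into a private
-- register, then write register + 1 back. The Int names the thread (0 or 1).
data Step = Read Int | Write Int

-- All ways to interleave two step sequences, keeping each thread's own order.
interleavings :: [a] -> [a] -> [[a]]
interleavings [] ys = [ys]
interleavings xs [] = [xs]
interleavings (x:xs) (y:ys) =
  map (x:) (interleavings xs (y:ys)) ++ map (y:) (interleavings (x:xs) ys)

-- State: (shared x, register of thread 0, register of thread 1).
-- Returns the final value of x after executing the given schedule.
run :: [Step] -> Int
run = go (0, 0, 0)
  where
    go (x, _, _)   []             = x
    go (x, _, r1)  (Read 0  : ss) = go (x, x, r1) ss
    go (x, r0, _)  (Read _  : ss) = go (x, r0, x) ss
    go (_, r0, r1) (Write 0 : ss) = go (r0 + 1, r0, r1) ss
    go (_, r0, r1) (Write _ : ss) = go (r1 + 1, r0, r1) ss

thread :: Int -> [Step]
thread t = [Read t, Write t]

-- Every final value of x that some schedule can produce.
possibleFinals :: [Int]
possibleFinals = sort (nub (map run (interleavings (thread 0) (thread 1))))

main :: IO ()
main = print possibleFinals   -- prints [1,2]
```

Only the schedules in which one thread finishes before the other starts give 2; every schedule in which both reads happen before either write gives 1.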
![Page 14: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/14.jpg)
x = x + 1;
• Locking is error prone—forgetting to lock leads to errors
• Locking leads to deadlock and other concurrency errors
• Locking is costly—provokes a cache miss (~100 cycles)
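For comparison, here is a sketch of a locked counter in Haskell itself, using an `MVar` from `Control.Concurrent` as the lock. This is explicit concurrency, not the parallelism style the rest of the lecture develops, and the function name `lockedCount` and the thread and iteration counts are assumptions chosen for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM_)

-- Run `threads` threads that each increment a shared counter `perThread`
-- times. modifyMVar_ takes the MVar (the lock), applies the update, and
-- puts the new value back, so no increment can be lost.
lockedCount :: Int -> Int -> IO Int
lockedCount threads perThread = do
  counter <- newMVar (0 :: Int)
  done    <- newEmptyMVar
  forM_ [1 .. threads] $ \_ -> forkIO $ do
    replicateM_ perThread (modifyMVar_ counter (\x -> return $! x + 1))
    putMVar done ()                    -- signal that this worker has finished
  replicateM_ threads (takeMVar done)  -- wait for every worker
  readMVar counter

main :: IO ()
main = lockedCount 4 10000 >>= print   -- prints 40000
```

The result is always exact, but every increment now pays for taking and releasing the lock, which is the cost the last bullet refers to.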
![Page 15: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/15.jpg)
It gets worse…
• “Relaxed” memory consistency
Thread 1: x := 0; x := 1; read y;   (sees 0)
Thread 2: y := 0; y := 1; read x;   (sees 0)
(the two threads run in parallel)
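To see why "both see 0" is surprising, one can enumerate every interleaving of the two threads under the assumption that memory operations take effect one at a time in program order (sequential consistency). The sketch below does that; the initial values of 0 for x and y, and all the names in it, are assumptions for illustration. Under this model the outcome (0, 0) never appears, so observing it on real hardware means the model does not hold there.

```haskell
import Data.List (nub, sort)

-- One memory operation of either thread.
data Step = WriteX Int | WriteY Int | ReadY | ReadX

threadA, threadB :: [Step]
threadA = [WriteX 0, WriteX 1, ReadY]   -- x := 0; x := 1; read y
threadB = [WriteY 0, WriteY 1, ReadX]   -- y := 0; y := 1; read x

-- All ways to interleave two step sequences, keeping each thread's own order.
interleavings :: [a] -> [a] -> [[a]]
interleavings [] ys = [ys]
interleavings xs [] = [xs]
interleavings (x:xs) (y:ys) =
  map (x:) (interleavings xs (y:ys)) ++ map (y:) (interleavings (x:xs) ys)

-- State: (x, y, value thread A read from y, value thread B read from x).
-- Returns the pair of values the two reads observed.
run :: [Step] -> (Int, Int)
run = go (0, 0, 0, 0)
  where
    go (_, _, ra, rb) []              = (ra, rb)
    go (_, y, ra, rb) (WriteX v : ss) = go (v, y, ra, rb) ss
    go (x, _, ra, rb) (WriteY v : ss) = go (x, v, ra, rb) ss
    go (x, y, _, rb)  (ReadY : ss)    = go (x, y, y, rb) ss
    go (x, y, ra, _)  (ReadX : ss)    = go (x, y, ra, x) ss

-- Every (A's read, B's read) pair that some sequentially consistent
-- schedule can produce.
outcomes :: [(Int, Int)]
outcomes = sort (nub (map run (interleavings threadA threadB)))

main :: IO ()
main = print outcomes   -- prints [(0,1),(1,0),(1,1)]
```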
![Page 16: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/16.jpg)
Shared Mutable Data
![Page 17: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/17.jpg)
Why Functional Programming?
• Data is immutable ⇒ can be shared without problems!
• No side-effects ⇒ parallel computations cannot interfere
• Just evaluate everything in parallel!
![Page 18: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/18.jpg)
A Simple Example
• A trivial function that returns the number of calls made, and makes a very large number!
nfib :: Integer -> Integer
nfib n | n<2 = 1
nfib n = nfib (n-1) + nfib (n-2) + 1
n    nfib n
10   177
20   21891
25   242785
30   2692537
![Page 19: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/19.jpg)
Compiling Parallel Haskell
• Add a main program
main = print (nfib 40)
• Compile
ghc -O2 -threaded -rtsopts -eventlog NF.hs
  -threaded   Enable parallel execution
  -rtsopts    Enable run-time system flags
  -eventlog   Enable parallel profiling
![Page 20: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/20.jpg)
Run the code!
NF.exe
331160281
NF.exe +RTS -N1
331160281
NF.exe +RTS -N2
331160281
NF.exe +RTS -N4
331160281
NF.exe +RTS -N4 -ls
331160281
  -N1   Tell the run-time system to use one core (one OS thread)
  -ls   Tell the run-time system to collect an event log
![Page 21: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/21.jpg)
Look at the event log!
Note: if you have trouble with the latest Haskell Platform and ThreadScope, try Haskell Platform 2012.4
![Page 22: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/22.jpg)
Look at the event log!
![Page 23: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/23.jpg)
Annotations on the event-log view:
• What each core was doing
• Cores working: a maximum of one!
• Actual useful work
• Collecting garbage, in parallel!
![Page 24: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/24.jpg)
Explicit Parallelism
par x y
• “Spark” x in parallel with computing y
  – (and return y)
• The run-time system may convert a spark into a parallel task, or it may not
• Starting a task is cheap, but not free
![Page 25: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/25.jpg)
Using par
• Evaluate nf in parallel with the body
• Note lazy evaluation: where nf = … binds nf to an unevaluated expression
import Control.Parallel
nfib :: Integer -> Integer
nfib n | n < 2 = 1
nfib n = par nf (nf + nfib (n-2) + 1)
  where nf = nfib (n-1)
![Page 26: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/26.jpg)
Threadscope again…
![Page 27: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/27.jpg)
Benchmarks: nfib 30
• Performance is worse for the parallel version
• Performance worsens as we use more HECs (Haskell execution contexts)!
[Chart: time in ms (axis 0 to 600) for sfib and nfib]
![Page 28: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/28.jpg)
What’s happening?
• There are only four hyperthreads!
• HECs are being scheduled out, waiting for each other…
[Profile shown: 5 HECs]
![Page 29: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/29.jpg)
With 4 HECs
• Looks better (after some GC at startup)
• But let’s zoom in…
![Page 30: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/30.jpg)
Detailed profile
• Lots of idle time!
• Very short tasks
![Page 31: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/31.jpg)
Another clue
• Many short-lived tasks
![Page 32: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/32.jpg)
What’s wrong?
• Both tasks start by evaluating nf!
• One task will block almost immediately, and wait for the other
• (In the worst case) both may compute nf!
nfib n | n < 2 = 1
nfib n = par nf (nf + nfib (n-2) + 1)
  where nf = nfib (n-1)
![Page 33: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/33.jpg)
Lazy evaluation in parallel Haskell
[Diagram labels: n = 29; thunk nfib (n-1); value 832040; a waiting thread (Zzzz…)]
![Page 34: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/34.jpg)
Lazy evaluation in parallel Haskell
[Diagram labels: n = 29; thunk nfib (n-1); value 832040]
![Page 35: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/35.jpg)
Fixing the bug
• Make sure we don’t wait for nf until after doing the recursive call
rfib n | n < 2 = 1
rfib n = par nf (rfib (n-2) + nf + 1)
  where nf = rfib (n-1)
![Page 36: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/36.jpg)
Much better!
• 2 HECs beat sequential performance
• (But hyperthreading is not really paying off)
[Chart: sfib, nfib and rfib; axis 0 to 600]
![Page 37: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/37.jpg)
A bit fragile
• How do we know + evaluates its arguments left-to-right?
• Lazy evaluation makes evaluation order hard to predict… but we must compute rfib (n-2) first
rfib n | n < 2 = 1
rfib n = par nf (rfib (n-2) + nf + 1)
  where nf = rfib (n-1)
![Page 38: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/38.jpg)
Explicit sequencing
pseq x y
• Evaluate x before y (and return y)
• Used to ensure we get the right evaluation order
![Page 39: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/39.jpg)
rfib with pseq
• Same behaviour as previous rfib… but no longer dependent on evaluation order of +
rfib n | n < 2 = 1
rfib n = par nf1 (pseq nf2 (nf1 + nf2 + 1))
  where nf1 = rfib (n-1)
        nf2 = rfib (n-2)
![Page 40: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/40.jpg)
Spark Sizes
• Most of the sparks are short
• Spark overheads may dominate!
[Chart: spark size on a log scale]
![Page 41: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/41.jpg)
Controlling Granularity
• Let’s go parallel only up to a certain depth
pfib :: Integer -> Integer -> Integer
pfib 0 n = sfib n
pfib _ n | n < 2 = 1
pfib d n = par nf1 (pseq nf2 (nf1 + nf2) + 1)
  where nf1 = pfib (d-1) (n-1)
        nf2 = pfib (d-1) (n-2)
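Putting the pieces together, a complete program might look like the sketch below. It assumes that sfib is simply the sequential nfib from earlier (the slides use the name without showing its definition), and the depth 5 and argument 30 in main are arbitrary choices. `Control.Parallel` comes from the `parallel` package.

```haskell
import Control.Parallel (par, pseq)

-- Assumed definition: sfib is taken to be the plain sequential nfib.
sfib :: Integer -> Integer
sfib n | n < 2 = 1
sfib n = sfib (n-1) + sfib (n-2) + 1

-- Spark the (n-1) branch only while the depth budget d is positive;
-- below that depth, fall back to the sequential version.
pfib :: Integer -> Integer -> Integer
pfib 0 n = sfib n
pfib _ n | n < 2 = 1
pfib d n = par nf1 (pseq nf2 (nf1 + nf2 + 1))
  where nf1 = pfib (d-1) (n-1)
        nf2 = pfib (d-1) (n-2)

main :: IO ()
main = print (pfib 5 30)   -- prints 2692537, the same value as nfib 30
```

Compiled with the flags shown earlier and run with +RTS -N4, the depth argument is the knob to vary; the result is the same for every depth, only the number of sparks changes.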
![Page 42: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/42.jpg)
Depth 1
• Two sparks, but uneven lengths lead to waste
![Page 43: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/43.jpg)
Depth 2
• Four sparks, but uneven sizes still leave HECs idle
![Page 44: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/44.jpg)
Depth 5
• 32 sparks
• Much more even distribution of work
![Page 45: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/45.jpg)
Benchmarks (last year)
[Chart: time (axis 0 to 200) against depth (0 to 10) for 1, 2, 3 and 4 HECs]
Best speedup: 1.9x
![Page 46: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/46.jpg)
On a 4-core i7 this morning
[Chart: series “Speed-up” and “Max”; one axis from 1 to 8, the other from 0 to 9]
![Page 47: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/47.jpg)
Another Example: Sorting
• Classic QuickSort
• Divide-and-conquer algorithm
  – Parallelize by performing recursive calls in parallel
  – Exponential parallelism
qsort [] = []
qsort (x:xs) = qsort [y | y <- xs, y<x]
               ++ [x]
               ++ qsort [y | y <- xs, y>=x]
![Page 48: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/48.jpg)
Parallel Sorting
• Same idea: name a recursive call and spark it with par
• I know ++ evaluates its arguments left-to-right
psort [] = []
psort (x:xs) = par rest $
    psort [y | y <- xs, y<x]
    ++ [x]
    ++ rest
  where rest = psort [y | y <- xs, y>=x]
![Page 49: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/49.jpg)
Benchmarking
• Need to run each benchmark many times
  – Run times vary, depending on other activity
• Need to measure carefully and compute statistics
• A benchmarking library is very useful
![Page 50: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/50.jpg)
Criterion
• cabal install criterion
import Criterion.Main   -- import the library
import System.Random    -- randoms and mkStdGen, needed for the test data

-- Run a list of benchmarks
main = defaultMain
  [ bench "qsort" (nf qsort randomInts)         -- bench names a benchmark;
  , bench "head" (nf (head.qsort) randomInts)   -- nf calls fun on arg and evaluates result
  , bench "psort" (nf psort randomInts)
  ]

-- Generate a fixed list of random integers as test data
randomInts = take 200000 (randoms (mkStdGen 211570155)) :: [Integer]
![Page 51: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/51.jpg)
Results
• Only a 12% speedup, but easy to get!
• Note how fast head.qsort is!
[Chart: qsort, psort and head; axis 0 to 600]
![Page 52: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/52.jpg)
Results on i7 4-core/8-thread
[Chart: qsort, psort and head for 1 to 8 HECs; axis 0 to 800]
Best performance with 4 HECs
![Page 53: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/53.jpg)
Speedup on i7 4-core
• Best speedup: 1.39x on four cores
[Chart: qsort, psort and limit for 1 to 4 HECs; axis 0 to 6]
![Page 54: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/54.jpg)
Too lazy evaluation?
• What would happen if we replaced par rest by par (rnf rest)?
psort [] = []
psort (x:xs) = par rest $
    psort [y | y <- xs, y<x]
    ++ [x]
    ++ rest
  where rest = psort [y | y <- xs, y>=x]
par rest: this only evaluates the first constructor of the list!
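One way to explore the question is to write the variant down and benchmark it next to psort. The sketch below assumes `rnf` is the one from `Control.DeepSeq` (package `deepseq`); the name `psortDeep` and the sample input are illustrative. Because `rnf rest` evaluates the whole list, the spark now sorts the entire right-hand partition instead of stopping at its first constructor.

```haskell
import Control.DeepSeq (rnf)
import Control.Parallel (par)

-- Variant of psort whose spark fully evaluates `rest`.
-- `rnf rest` has type (), and forcing it forces every element of the list.
psortDeep :: [Integer] -> [Integer]
psortDeep [] = []
psortDeep (x:xs) = par (rnf rest) $
    psortDeep [y | y <- xs, y < x] ++ [x] ++ rest
  where rest = psortDeep [y | y <- xs, y >= x]

main :: IO ()
main = print (psortDeep [3, 1, 4, 1, 5, 9, 2, 6])   -- prints [1,1,2,3,4,5,6,9]
```

The result is the same list as before; whether the deeper sparks pay for themselves is something to measure with the benchmarking setup above rather than assume.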
![Page 55: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/55.jpg)
Notice what’s missing
• Thread synchronization
• Thread communication
• Detecting termination
• Distinction between shared and private data
• Division of work onto threads
• …
![Page 56: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/56.jpg)
Par par everywhere, and not a task to schedule?
• How much speed-up can we get by evaluating everything in parallel?
• A “limit study” simulates a perfect situation:
  – ignores overheads
  – assumes perfect knowledge of which values will be needed
  – infinitely many cores
  – gives an upper bound on speed-ups.
• Refinement: only tasks > a threshold time are run in parallel.
![Page 57: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/57.jpg)
Limit study results
• Some programs have next-to-no parallelism
• Some only parallelize with tiny tasks
• A few have oodles of parallelism
![Page 58: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/58.jpg)
Amdahl’s Law
• The speed-up of a program on a parallel computer is limited by the time spent in the sequential part
• If 5% of the time is sequential, the maximum speed-up is 20x
• THERE IS NO FREE LUNCH!
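The 20x figure follows from the usual statement of Amdahl's Law: if a fraction s of the run time is sequential, the speed-up on n cores is 1 / (s + (1 - s) / n), which approaches 1 / s as n grows. A small sketch, with the illustrative names `amdahl` and `amdahlLimit` and arbitrarily chosen core counts:

```haskell
-- Speed-up on n cores when a fraction s of the run time is sequential.
amdahl :: Double -> Double -> Double
amdahl s n = 1 / (s + (1 - s) / n)

-- Upper bound on the speed-up as the number of cores grows without limit.
amdahlLimit :: Double -> Double
amdahlLimit s = 1 / s

main :: IO ()
main = do
  -- With 5% sequential time, the speed-up creeps towards 20 but never reaches it.
  mapM_ (\n -> putStrLn (show n ++ " cores: " ++ show (amdahl 0.05 n)))
        [1, 4, 16, 64, 1024]
  putStrLn ("limit: " ++ show (amdahlLimit 0.05))
```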
![Page 59: Parallel Functional Programming Lecture 1 · “By mid-decade, that Pentium PC may need the power of a nuclear reactor. By the end of the decade, you might as well be feeling a rocket](https://reader033.vdocument.in/reader033/viewer/2022041603/5e324aab680f2827303c9866/html5/thumbnails/59.jpg)
References
• Haskell on a shared-memory multiprocessor, Tim Harris, Simon Marlow, Simon Peyton Jones, Haskell Workshop, Tallinn, Sept 2005. The first paper on multicore Haskell.
• Feedback directed implicit parallelism, Tim Harris and Satnam Singh. The limit study discussed, and a feedback-directed mechanism to increase its granularity.
• Runtime Support for Multicore Haskell, Simon Marlow, Simon Peyton Jones, and Satnam Singh. ICFP'09. An overview of GHC's parallel runtime, lots of optimisations, and lots of measurements.
• Real World Haskell, by Bryan O'Sullivan, Don Stewart, and John Goerzen. The parallel sorting example in more detail.