![Page 1: MIT OpenCourseWare 6.189 Multicore Programming Primer, January (IAP… · 2020-01-04 · 6.189 IAP 2007 Lecture 14 Synthesizing Parallel Programs Prof. Arvind, MIT. 6.189 IAP 2007](https://reader033.vdocument.in/reader033/viewer/2022053009/5f0cb3047e708231d436b283/html5/thumbnails/1.jpg)
MIT OpenCourseWare http://ocw.mit.edu
6.189 Multicore Programming Primer, January (IAP) 2007
Please use the following citation format:
Arvind, 6.189 Multicore Programming Primer, January (IAP) 2007. (Massachusetts Institute of Technology: MIT OpenCourseWare). http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative Commons Attribution-Noncommercial-Share Alike.
Note: Please use the actual date you accessed this material in your citation.
For more information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms
6.189 IAP 2007
Lecture 14
Synthesizing Parallel Programs
Prof. Arvind, MIT
Synthesizing parallel programs (or borrowing some ideas from hardware design)
Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
6.189
January 24, 2007
SoC Trajectory: multicores, heterogeneous, regular, ...
[Figure labels: on-chip memory banks; structured on-chip networks; general-purpose processors; application-specific processing units]
Can we rapidly produce high-quality chips and surrounding systems and software?
Image removed due to copyright restrictions: IBM Cell processor.
Plan for this talk
My old way of thinking (up to 1998)
- “Where are my threads?”
- Not necessarily wrong
My new way of thinking (since July)
- “Parallel program module as a resource”
- Not necessarily right
Connections with transactional programming, though obvious, not fully explored yet
Acknowledgement: Nirav Dave
Only reason for parallel programming used to be performance
This made programming very difficult
- Had to know a lot about the machine
- Codes were not portable: endless performance tuning on each machine
- Parallel libraries were not composable
- Difficult to deal with heap structures and memory hierarchy
- Synchronization costs were too high to exploit fine-grain parallelism
How to exploit 100s of threads from software?
Implicit Parallelism
Extract parallelism from programs written in sequential languages
- Lot of research over four decades, limited success
Program in functional languages which may not obscure parallelism in an algorithm
If the algorithm has no parallelism then forget it
Image removed due to copyright restrictions.
If parallelism can’t be detected automatically ...
Design/use new explicitly parallel programming models ...
High-level
- Data parallel: Fortran 90, HPF, ...
- Multithreaded: Id, pH, Cilk, ..., Java
Low-level
- Message passing: PVM, MPI, ...
- Threads & synchronization: Forks & Joins, Locks, Futures, ...
Works well but not general enough
Fully Parallel, Multithreaded Model
[Figure: a global heap of shared objects and a tree of activation frames (f:, g:, h:, loop) with active threads; asynchronous at all levels]
Synchronization?
Efficient mappings on architectures proved difficult
My unrealized dream
A time when Freshmen will be taught sequential programming as a special case of parallel programming
Has the situation changed?
Yes
- Multicores have arrived
- Even Microsoft wants to exploit parallelism
- Explosion of cell phones
- Explosion of game boxes
Freshmen are going to be hacking game boxes and cell phones
It is all about parallelism now!
Image removed due to copyright restrictions: cellular phone and game box with controller.
now ...
Cell phone
Mine sometimes misses a call when I am surfing the web
- To what extent should the phone call software be aware of web surfing software, or vice versa?
- Is it merely a scheduling issue?
- Is it a performance issue?
Sequential “modules” are often used in concurrent environments with unforeseen consequences
Image removed due to copyright restrictions: cellular phone.
New Goals: Synthesis as opposed to Decomposition
A method of designing and connecting modules such that the functionality and performance are predictable
- Must facilitate natural descriptions of concurrent systems
A method of refining individual modules into hardware or software for SoCs
A method of mapping such designs onto “multicores”
- Time multiplexing of resources complicates the problem
[Side annotation: “Know how to do this”]
A hardware inspired methodology for “synthesizing” parallel programs
Rule-based specification of behavior (Guarded Atomic Actions)
- Lets you think one rule at a time
Composition of modules with guarded interfaces
Some examples:
- GCD
- Airline reservation
- Video codec: H.264
- Inserting in an ordered list
[Side annotations: Bluespec; Unity, late 80s, Chandy & Misra]
Bluespec: State and Rules organized into modules
All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.
Behavior is expressed in terms of atomic actions on the state:
  Rule: condition → action
Rules can manipulate state in other modules only via their interfaces.
[Figure labels: module; interface]
Execution model
Repeatedly:
- Select a rule to execute
- Compute the state updates
- Make the state updates
Highly non-deterministic
Primitives are provided to control the selection
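The select / compute / commit cycle can be sketched in Python. This is only an illustration of the execution model, not how a Bluespec compiler actually schedules rules. The names `Rule` and `run` and the token-moving example are assumptions made for this sketch, and a random choice stands in for the non-deterministic selection.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, int]

@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]     # the rule's condition
    action: Callable[[State], State]   # updates computed from the old state

def run(state: State, rules: List[Rule], max_steps: int = 1000,
        rng: Optional[random.Random] = None) -> State:
    """Repeatedly select an enabled rule, compute its updates, commit them."""
    rng = rng or random.Random()
    for _ in range(max_steps):
        enabled = [r for r in rules if r.guard(state)]
        if not enabled:
            break                          # no rule can fire
        rule = rng.choice(enabled)         # select a rule (non-deterministic)
        updates = rule.action(state)       # compute the state updates
        state = {**state, **updates}       # make the state updates atomically
    return state

# Hypothetical example: two rules compete to move tokens out of "src".
rules = [
    Rule("to_a", lambda s: s["src"] > 0,
         lambda s: {"src": s["src"] - 1, "a": s["a"] + 1}),
    Rule("to_b", lambda s: s["src"] > 0,
         lambda s: {"src": s["src"] - 1, "b": s["b"] + 1}),
]
final = run({"src": 10, "a": 0, "b": 0}, rules)
print(final)
```

The split between `a` and `b` differs from run to run, but every run moves all ten tokens, because each rule firing is atomic.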
Example: Euclid’s GCD
A GCD program
  GCD(x, y) = if y = 0 then x
              elseif x>y then GCD(y, x)
              else GCD(x, y-x)
Execution
  GCD(6, 15) ⇒ GCD(6, 9) ⇒ GCD(6, 3) ⇒ GCD(3, 6) ⇒ GCD(3, 3) ⇒ GCD(3, 0) ⇒ 3
What does this program mean in a concurrent setting?
  GCD(623971, 150652) + GCD(1543276, 9760552)
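For reference, the recursive definition transcribes directly into Python. This is a minimal sequential sketch; the optional `trace` parameter is an addition for illustration, and the function assumes x > 0 (with x = 0 and y > 0 the definition never terminates).

```python
from typing import List, Optional, Tuple

def gcd(x: int, y: int, trace: Optional[List[Tuple[int, int]]] = None) -> int:
    """Subtractive GCD as defined on the slide; assumes x > 0 and y >= 0."""
    if trace is not None:
        trace.append((x, y))           # record each call for inspection
    if y == 0:
        return x
    if x > y:
        return gcd(y, x, trace)        # swap the arguments
    return gcd(x, y - x, trace)        # subtract the smaller from the larger

calls: List[Tuple[int, int]] = []
answer = gcd(6, 15, calls)
print(" => ".join(f"GCD({a}, {b})" for a, b in calls), "=>", answer)
# prints: GCD(6, 15) => GCD(6, 9) => GCD(6, 3) => GCD(3, 6) => GCD(3, 3) => GCD(3, 0) => 3
```

Written this way, the two calls in the sum simply run one after the other. The sequential program leaves open whether they may overlap or share a resource.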
Suppose we want to build a GCD machine (i.e., IP module)
[Figure: a box labeled GCD]
Parallel invocations?
- Recursive calls vs Independent calls
Does the answer come out immediately? In predictable time?
Can the machine be shared?
Can it be pipelined, i.e., accept another input before the first one has produced an answer?
These questions arise naturally in hardware design
But these questions are equally valid in a parallel software setting
GCD as a resource
GCD in Bluespec
module mkGCD
  State:
    x <- mkReg(0); y <- mkReg(0);
  Internal behavior:
    rule swap when ((x > y) & (y != 0)) ==> x := y | y := x
    rule subtract when ((x <= y) & (y != 0)) ==> y := y - x
  External interface:
    method start(a,b) when (y==0) ==> x := a | y := b
    method result() when (y==0) ==> return (x)
end
[Figure: synthesized hardware with registers x and y and the swap and sub operations]
What happened to the recursive calls?
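A software stand-in for this module can make the rule/method structure concrete. The sketch below is an assumption-laden illustration, not Bluespec semantics in full: the class name `GCDModule`, the `step` driver, and raising an exception when a guard is false are choices made here (in hardware the caller would wait on the guard). It also assumes `a > 0`. One way to read the slide's closing question is that each rule firing plays the role of one recursive call, with the arguments held in the registers x and y.

```python
class GCDModule:
    """Two registers, two guarded rules, two guarded interface methods."""

    def __init__(self) -> None:
        self.x = 0                       # x <- mkReg(0)
        self.y = 0                       # y <- mkReg(0)

    def step(self) -> bool:
        """Fire one enabled rule; return False when no rule is enabled."""
        if self.x > self.y and self.y != 0:       # rule swap
            self.x, self.y = self.y, self.x       # x := y | y := x (old values)
            return True
        if self.x <= self.y and self.y != 0:      # rule subtract
            self.y = self.y - self.x
            return True
        return False

    def ready(self) -> bool:
        return self.y == 0               # guard shared by start and result

    def start(self, a: int, b: int) -> None:
        if not self.ready():
            raise RuntimeError("start is not ready: computation in progress")
        self.x, self.y = a, b

    def result(self) -> int:
        if not self.ready():
            raise RuntimeError("result is not ready yet")
        return self.x

module = GCDModule()
module.start(6, 15)
firings = 0
while module.step():
    firings += 1
print(module.result(), "after", firings, "rule firings")
# prints: 3 after 5 rule firings
```

The two rule guards are mutually exclusive, so the order in which `step` tests them does not affect the outcome.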
GCD Hardware Module
interface I_GCD;
  method Action start (int a, int b);
  method int result();
endinterface
module mkGCD (I_GCD)
[Figure: the GCD module with method start (two int inputs, enab and rdy signals) and method result (int output, rdy signal); the implicit conditions are y == 0 for both methods]
The module can easily be made polymorphic: the figure adds the parameter #(type t) and replaces each int with t
In a GCD call t could be Int#(32), UInt#(16), Int#(13), ...
Many different implementations, including pure software ones, can provide the same interface
The Bluespec Language
Bluespec: A Language of Atomic Actions
A program is a collection of instantiated modules m1 ; m2 ; ...
Module ::= Module name
             [State variable r]
             [Rule R a]
             [Action method g (x) = a]
             [Read method f (x) = e]
e ::= r | c | t | Op(e , e) | e ? e : e | (t = e in e) | m.f(e) | e when e
a ::= r := e | if e then a | a | a | a ; a | (t = e in a) | m.g(e) | a when e
Annotations on a: "if e then a" is a conditional action; "a | a" is parallel composition; "a ; a" is sequential composition; "m.g(e)" is a method call; "a when e" is a guarded action
Guards vs If’s
Guards affect the surroundings
  (a1 when p1) | a2 ==> (a1 | a2) when p1
Effect of an “if” is local
  (if p1 then a1) | a2 ==> if p1 then (a1 | a2) else a2
  p1 has no effect on a2
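The difference between the two equations can be checked with a small Python model. This is a sketch under stated assumptions: an action is modeled as a function from the current state to a dictionary of register updates, `None` models a false guard, and the helper names `assign`, `when`, `if_then`, `par`, and `fire` are invented for this illustration. Rejecting two parallel writes to the same register is also an assumption of the sketch.

```python
from typing import Callable, Dict, Optional

State = Dict[str, int]
Updates = Optional[Dict[str, int]]     # None means "guard false: do nothing at all"
Action = Callable[[State], Updates]
Pred = Callable[[State], bool]

def assign(reg: str, value: int) -> Action:
    return lambda s: {reg: value}

def when(a: Action, p: Pred) -> Action:
    """a when p: a false guard disables whatever the action is composed with."""
    return lambda s: a(s) if p(s) else None

def if_then(p: Pred, a: Action) -> Action:
    """if p then a: a false condition only means a contributes no updates."""
    return lambda s: a(s) if p(s) else {}

def par(a1: Action, a2: Action) -> Action:
    """a1 | a2: both read the same old state; a failed guard propagates out."""
    def run(s: State) -> Updates:
        u1, u2 = a1(s), a2(s)
        if u1 is None or u2 is None:
            return None
        if u1.keys() & u2.keys():
            raise ValueError("parallel actions write the same register")
        return {**u1, **u2}
    return run

def fire(a: Action, s: State) -> State:
    """Apply the action atomically, or leave the state unchanged."""
    u = a(s)
    return dict(s) if u is None else {**s, **u}

p1: Pred = lambda s: s["flag"] == 1
a1, a2 = assign("r1", 10), assign("r2", 20)
s0 = {"flag": 0, "r1": 0, "r2": 0}

print(fire(par(when(a1, p1), a2), s0))      # {'flag': 0, 'r1': 0, 'r2': 0}
print(fire(par(if_then(p1, a1), a2), s0))   # {'flag': 0, 'r1': 0, 'r2': 20}
```

With the guard false, `a2` is blocked along with `a1`; with the `if` false, `a2` still takes effect.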
Airline Reservation
Example: Airline reservation
a problem posed by Jayadev Misra
Ask for quotes from two airlines
- If any one quotes below $300, buy immediately
- Buy the lower quote if over $300
- After one minute buy from whosoever has quoted, otherwise flag error
Solution is easy to express in Misra’s ORC
Express it using threads? Complicated
Solution in Bluespec
module mkGetQuotes();
  define state elements Aquote, Bquote, done, timer
  rule pickCheapest when !done & (Aquote != INF) & (Bquote != INF) ==>
      (if (Aquote < Bquote) then ticket <- A.purchase(Aquote)
       else ticket <- B.purchase(Bquote))
    | (done := True)
  rule getA when !done ==> ... // executes when A responds
  rule getB ...
  rule timeout ...
  rule timer
  method bookTicket(r) when done ==>
      A.request(r) | B.request(r) | done := False
    | Aquote := INF | Bquote := INF | timer := 0
  method getTicket() when done ==> return (ticket)
end
“done” also means “not busy”
Straightforward
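The policy from the problem statement can be exercised with a small Python simulation in the same rule-based style. Everything beyond the slide is an assumption made for this sketch: the class and method names, the `buyNow` rule for the below-$300 case, the body given to the timeout rule, and the `quote_from_a` / `quote_from_b` / `tick` calls that stand in for the elided getA, getB, and timer rules. Airlines and purchases are reduced to recording which quote was bought.

```python
from typing import Optional

INF = float("inf")
CHEAP = 300        # dollars: buy immediately below this price
TIMEOUT = 60       # seconds: "after one minute"

class GetQuotes:
    def __init__(self) -> None:
        self.a_quote = INF
        self.b_quote = INF
        self.timer = 0
        self.done = True            # "done" also means "not busy"
        self.ticket = None

    def book_ticket(self) -> bool:  # method bookTicket, guarded by done
        if not self.done:
            return False
        self.a_quote = self.b_quote = INF
        self.timer = 0
        self.ticket = None
        self.done = False
        return True

    def get_ticket(self):           # method getTicket, guarded by done
        return self.ticket if self.done else None

    # Stand-ins for the elided getA / getB / timer rules.
    def quote_from_a(self, price: float) -> None:
        if not self.done:
            self.a_quote = price

    def quote_from_b(self, price: float) -> None:
        if not self.done:
            self.b_quote = price

    def tick(self, seconds: int = 1) -> None:
        if not self.done:
            self.timer += seconds

    def _buy_cheaper(self) -> None:
        if self.a_quote < self.b_quote:         # same comparison as the slide
            self.ticket = ("A", self.a_quote)
        else:
            self.ticket = ("B", self.b_quote)
        self.done = True

    def step(self) -> Optional[str]:
        """Fire at most one enabled rule and return its name."""
        if self.done:
            return None
        best = min(self.a_quote, self.b_quote)
        if best < CHEAP:                                    # hypothetical rule
            self._buy_cheaper()
            return "buyNow"
        if self.a_quote != INF and self.b_quote != INF:     # rule pickCheapest
            self._buy_cheaper()
            return "pickCheapest"
        if self.timer >= TIMEOUT:                           # rule timeout
            if best != INF:
                self._buy_cheaper()
            else:
                self.ticket = "ERROR"
                self.done = True
            return "timeout"
        return None

m = GetQuotes()
m.book_ticket()
m.quote_from_a(420)
m.step()                            # nothing fires: waiting for B or the timeout
m.quote_from_b(380)
print(m.step(), m.get_ticket())     # prints: pickCheapest ('B', 380)
```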
Video Codec: H.264
Example: H.264 Decoder
[Figure: block diagram of the decoder. Blocks: NAL unwrap; Parse + CAVLC; Inverse Quant Transformation; Intra Prediction; Inter Prediction; Deblock Filter; Ref Frames; Scale / YUV2RGB]
A dataflow-like network
May be implemented in hardware or software depending upon ...
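A dataflow-like network of this kind can be sketched in Python as stages connected by bounded FIFOs, where each stage is a rule guarded by "input not empty and output not full". The sketch is a deliberate simplification and rests on assumptions: the pipeline is linear (the real decoder has prediction blocks and a reference-frame feedback path), the stage functions only tag the data rather than decode anything, and the names `Fifo` and `make_stage` are invented here.

```python
from collections import deque
from typing import Callable, List, Tuple

class Fifo:
    """Bounded queue whose enq/deq are guarded by can_enq/can_deq."""
    def __init__(self, capacity: int) -> None:
        self.items: deque = deque()
        self.capacity = capacity
    def can_enq(self) -> bool:
        return len(self.items) < self.capacity
    def can_deq(self) -> bool:
        return len(self.items) > 0
    def enq(self, item) -> None:
        assert self.can_enq()
        self.items.append(item)
    def deq(self):
        assert self.can_deq()
        return self.items.popleft()

Stage = Tuple[str, Callable[[], bool], Callable[[], None]]

def make_stage(name: str, fn: Callable, src: Fifo, dst: Fifo) -> Stage:
    """A stage is one rule: guard = data available and room downstream."""
    def guard() -> bool:
        return src.can_deq() and dst.can_enq()
    def action() -> None:
        dst.enq(fn(src.deq()))
    return name, guard, action

# Placeholder stage names taken from the figure; the functions do no decoding.
names = ["nal_unwrap", "parse", "inverse_transform", "deblock"]
fifos = [Fifo(2) for _ in range(len(names) + 1)]
stages: List[Stage] = [
    make_stage(n, lambda item, n=n: item + [n], fifos[i], fifos[i + 1])
    for i, n in enumerate(names)
]

inputs = [[f"frame{i}"] for i in range(5)]
outputs = []
while inputs or any(g() for _, g, _ in stages) or fifos[-1].can_deq():
    if inputs and fifos[0].can_enq():
        fifos[0].enq(inputs.pop(0))          # feed the network
    for _, guard, action in stages:
        if guard():
            action()                         # fire every enabled stage once
    if fifos[-1].can_deq():
        outputs.append(fifos[-1].deq())      # drain the network

print(len(outputs), outputs[0])
```

Because stages interact only through the guarded FIFO interface, any one of them could be replaced by a different implementation without changing the others.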
Available codes (not multithreaded)
Reference code
- 80K lines, awful coding style, slow
ffmpeg code for Linux
- 200K lines, mixed with other codecs
Codes don’t reflect the dataflow structure
- Pointers to data structures are passed around and modified. Difficult to figure out which block is modifying which parts
- No model of concurrency. Even the streaming aspect gets obscured by the code
The code can be written in a style which will serve both hardware and software communities.
H.264 Decoder in Bluespec
Work in Progress - Chun-Chieh Lin et al
Baseline profile
[Figure: the decoder block diagram (NAL unwrap; Parse + CAVLC; Inverse Quant Transformation; Deblock Filter; Intra Prediction; Inter Prediction; Scale / YUV2RGB; Ref Frames) with blocks annotated by their lines of Bluespec]
Lines of Bluespec, in the order extracted from the figure: 171, 2871, 838, 817, 2789, 996, 136; Misc 691; Total 9309
Synthesis results (12/15/06)
- Decodes 720p@18fps
- Critical path 50 MHz
- Area 5.5 mm sq
Any module can be implemented in software
Each module can be refined separately
Behaviors of modules are composable
- Good source code for multicores
Takeaway
- Parallel programming should be based on well-defined modules and parallel composition of such modules
- Modules must embody a notion of resources, and consequently, sharing and time-multiplexed reuse
- Guarded Atomic Actions and Modules with guarded interfaces provide a solid foundation for doing so
Thanks