![Page 1: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/1.jpg)
6.172 Performance Engineering of Software Systems
© 2008–2018 by the MIT 6.172 Lecturers
LECTURE 6 Multicore Programming Julian Shun
![Page 2: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/2.jpg)
Multicore Processors
Q: Why do semiconductor vendors provide chips with multiple processor cores?
A: Because of Moore’s Law and the end of the scaling of clock frequency.
Intel Haswell-E
© Intel. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/
![Page 3: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/3.jpg)
Technology Scaling
[Figure: log-scale plot over 1970–2010 of transistor count (× 1000) and clock frequency (MHz).]
Transistor count is still rising, but clock speed is bounded at ~4 GHz.
![Page 4: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/4.jpg)
Power Density
Source: Patrick Gelsinger, Intel Developer’s Forum, Intel Corporation, 2004.
Projected power density, if clock frequency had continued its trend of scaling 25%-30% per year.
© Paul Gelsinger at Intel Corporation. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/
![Page 5: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/5.jpg)
Technology Scaling
[Figure: log-scale plot over 1970–2010 of transistor count (× 1000), clock frequency (MHz), power (W), and number of cores.]
Each generation of Moore’s Law potentially doubles the number of cores.
![Page 6: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/6.jpg)
Abstract Multicore Architecture
[Figure: chip multiprocessor (CMP). Processors (P), each with its own cache ($), are connected by a network to memory and I/O.]
![Page 7: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/7.jpg)
OUTLINE
• Shared-Memory Hardware
• Concurrency Platforms
  • Pthreads (and WinAPI Threads)
  • Threading Building Blocks
  • OpenMP
  • Cilk Plus
![Page 8: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/8.jpg)
Cache Coherence
[Figure sequence, pages 8–13: three processors (P), each with a private cache, sit above a shared memory that holds x=3. In turn, the processors issue "Load x" and end up with x=3 in their caches. One processor then issues "Store x=5", which updates only its own cache. When another processor next issues "Load x", it hits its own stale copy and reads x=3.]
Oops!
![Page 14: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/14.jpg)
MSI Protocol
Each cache line is labeled with a state:
• M: cache block has been modified. No other caches contain this block in M or S states.
• S: other caches may be sharing this block.
• I: cache block is invalid (same as not there).
[Figure: several caches, each holding lines tagged with a state (M, S, or I) and a value.]
Before a cache modifies a location, the hardware first invalidates all other copies.
![Page 15: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/15.jpg)
MSI Protocol (example, pages 15–19)
[Figure sequence: a processor issues a store to a location that other caches hold in the S state. The hardware first invalidates those copies (S becomes I). The storing cache then holds the line in the M state with the new value.]
![Page 20: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/20.jpg)
Outline
• Shared-Memory Hardware
• Concurrency Platforms
  • Pthreads (and WinAPI Threads)
  • Threading Building Blocks
  • OpenMP
  • Cilk
![Page 21: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/21.jpg)
Concurrency Platforms
• Programming directly on processor cores is painful and error-prone.
• A concurrency platform abstracts processor cores, handles synchronization and communication protocols, and performs load balancing.
• Examples:
  • Pthreads and WinAPI threads
  • Threading Building Blocks (TBB)
  • OpenMP
  • Cilk
![Page 22: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/22.jpg)
Fibonacci Numbers
The Fibonacci numbers are the sequence $\langle 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \ldots \rangle$, where each number is the sum of the previous two.
Recurrence: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n > 1$.
The sequence is named after Leonardo di Pisa (1170–1250 A.D.), also known as Fibonacci, a contraction of filius Bonaccii, “son of Bonaccio.” Fibonacci’s 1202 book Liber Abaci introduced the sequence to Western mathematics, although it had previously been discovered by Indian mathematicians.
Image is in the public domain.
![Page 23: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/23.jpg)
Fibonacci Program

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>

    int64_t fib(int64_t n) {
      if (n < 2) {
        return n;
      } else {
        int64_t x = fib(n-1);
        int64_t y = fib(n-2);
        return (x + y);
      }
    }

    int main(int argc, char *argv[]) {
      int64_t n = atoi(argv[1]);
      int64_t result = fib(n);
      printf("Fibonacci of %" PRId64 " is %" PRId64 ".\n",
             n, result);
      return 0;
    }

Disclaimer to Algorithms Police: This recursive program is a poor way to compute the nth Fibonacci number, but it provides a good didactic example.
![Page 24: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/24.jpg)
Fibonacci Execution
[Figure: recursion tree for fib(4). fib(4) calls fib(3) and fib(2); fib(3) calls fib(2) and fib(1); each fib(2) calls fib(1) and fib(0).]
Key idea for parallelization: The calculations of fib(n-1) and fib(n-2) can be executed simultaneously without mutual interference.

    int64_t fib(int64_t n) {
      if (n < 2) {
        return n;
      } else {
        int64_t x = fib(n-1);
        int64_t y = fib(n-2);
        return (x + y);
      }
    }

![Page 25: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/25.jpg)
OUTLINE
• Shared-Memory Hardware
• Concurrency Platforms
  • Pthreads (and WinAPI Threads)
  • Threading Building Blocks
  • OpenMP
  • Cilk
![Page 26: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/26.jpg)
Pthreads*
∙ Standard API for threading specified by ANSI/IEEE POSIX 1003.1-2008.
∙ Do-it-yourself concurrency platform.
∙ Built as a library of functions with “special” non-C semantics.
∙ Each thread implements an abstraction of a processor, and threads are multiplexed onto machine resources.
∙ Threads communicate through shared memory.
∙ Library functions mask the protocols involved in interthread coordination.

*WinAPI threads provide similar functionality.
![Page 27: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/27.jpg)
Key Pthread Functions

    int pthread_create(
      pthread_t *thread,
        // returned identifier for the new thread
      const pthread_attr_t *attr,
        // object to set thread attributes (NULL for default)
      void *(*func)(void *),
        // routine executed after creation
      void *arg
        // a single argument passed to func
    ) // returns error status

    int pthread_join(
      pthread_t thread,
        // identifier of thread to wait for
      void **status
        // terminating thread's status (NULL to ignore)
    ) // returns error status

![Page 28: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/28.jpg)
Pthread Implementation

    #include <inttypes.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    int64_t fib(int64_t n) {
      if (n < 2) {
        return n;
      } else {
        int64_t x = fib(n-1);
        int64_t y = fib(n-2);
        return (x + y);
      }
    }

    typedef struct {
      int64_t input;
      int64_t output;
    } thread_args;

    void *thread_func(void *ptr) {
      int64_t i = ((thread_args *) ptr)->input;
      ((thread_args *) ptr)->output = fib(i);
      return NULL;
    }

    int main(int argc, char *argv[]) {
      pthread_t thread;
      thread_args args;
      int status;
      int64_t result;

      if (argc < 2) { return 1; }
      int64_t n = strtoul(argv[1], NULL, 0);
      if (n < 30) {
        result = fib(n);
      } else {
        args.input = n-1;
        status = pthread_create(&thread,
                                NULL,
                                thread_func,
                                (void*) &args);
        // main can continue executing
        // status is an int error code, so compare it with 0
        if (status != 0) { return 1; }
        result = fib(n-2);
        // wait for the thread to terminate
        status = pthread_join(thread, NULL);
        if (status != 0) { return 1; }
        result += args.output;
      }
      printf("Fibonacci of %" PRId64 " is %" PRId64 ".\n",
             n, result);
      return 0;
    }

![Page 29: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/29.jpg)
Pthread Implementation (pages 29–37 repeat the program from page 28, each highlighting one part)
• Original code: fib() is unchanged.
• Structure for thread arguments: thread_args.
• Function called when thread is created: thread_func().
• No point in creating thread if there isn’t enough to do: the if (n < 30) check in main().
• Marshal input argument to thread: args.input = n-1.
• Create thread to execute fib(n-1): the pthread_create() call.
• Main program executes fib(n-2) in parallel.
• Block until the auxiliary thread finishes: the pthread_join() call.
• Add the results together to produce the final output: result += args.output.
![Page 38: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/38.jpg)
Issues with Pthreads
Overhead: The cost of creating a thread is more than 10^4 cycles, which forces coarse-grained concurrency. (Thread pools can help.)
Scalability: The Fibonacci code gets at most about 1.5× speedup for 2 cores. It needs a rewrite for more cores.
Modularity: The Fibonacci logic is no longer neatly encapsulated in the fib() function.
Code Simplicity: Programmers must marshal arguments (shades of 1958!) and engage in error-prone protocols in order to load-balance.
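A rough way to see why two cores cannot help much: the sketch below assumes running time is proportional to the number of fib calls and ignores thread-creation overhead. The symbols $W(n)$ (work of fib(n)) and $\varphi$ (the golden ratio) are introduced here for illustration and do not appear on the slides.

$$
W(n) = W(n-1) + W(n-2) + \Theta(1) \;\Rightarrow\; W(n) = \Theta(\varphi^n), \qquad \varphi = \frac{1+\sqrt{5}}{2} \approx 1.618
$$

$$
\text{speedup on 2 cores} \approx \frac{W(n)}{\max\{W(n-1),\, W(n-2)\}} = \frac{W(n)}{W(n-1)} \to \varphi \approx 1.6
$$

The two threads receive unequal shares of the work, so the thread running fib(n-1) finishes last. Under these assumptions the bound is about 1.6, which is roughly in line with the figure of about 1.5 once overheads are included.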
![Page 39: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because](https://reader033.vdocument.in/reader033/viewer/2022060521/60506df787c785253e41b248/html5/thumbnails/39.jpg)
Outline
• Shared-Memory Hardware
• Concurrency Platforms
  • Pthreads (and WinAPI Threads)
  • Threading Building Blocks
  • OpenMP
  • Cilk

Threading Building Blocks
∙ Developed by Intel.
∙ Implemented as a C++ library that runs on top of native threads.
∙ Programmer specifies tasks rather than threads.
∙ Tasks are automatically load-balanced across the threads using a work-stealing algorithm inspired by research at MIT.
∙ Focus on performance.

Fibonacci in TBB

    #include <cstdint>
    #include <cstdlib>   // strtoul
    #include <iostream>
    #include "tbb/task.h"

    using namespace tbb;

    class FibTask: public task {
    public:
      const int64_t n;
      int64_t* const sum;
      FibTask(int64_t n_, int64_t* sum_) :
        n(n_), sum(sum_) {}

      task* execute() {
        if( n < 2 ) {
          *sum = n;
        } else {
          int64_t x, y;
          FibTask& a = *new( allocate_child() )
                         FibTask(n-1, &x);
          FibTask& b = *new( allocate_child() )
                         FibTask(n-2, &y);
          set_ref_count(3);
          spawn(b);
          spawn_and_wait_for_all(a);
          *sum = x + y;
        }
        return NULL;
      }
    };

    int main(int argc, char *argv[]) {
      int64_t res;
      if (argc < 2) { return 1; }
      int64_t n = strtoul(argv[1], NULL, 0);
      FibTask& a = *new(task::allocate_root())
                     FibTask(n, &res);
      task::spawn_root_and_wait(a);
      std::cout << "Fibonacci of " << n
                << " is " << res << std::endl;
      return 0;
    }

Fibonacci in TBB
A computation organized as explicit tasks.

    class FibTask: public task {

Fibonacci in TBB
FibTask has an input parameter n and an output parameter sum.

    const int64_t n;
    int64_t* const sum;

Fibonacci in TBB
The execute() function performs the computation when the task is started.

    task* execute() {

Fibonacci in TBB
Recursively create two child tasks, a and b.

    FibTask& a = *new( allocate_child() )
                   FibTask(n-1, &x);
    FibTask& b = *new( allocate_child() )
                   FibTask(n-2, &y);

Fibonacci in TBB
Set the number of tasks to wait for (2 children + 1 implicit for bookkeeping).

    set_ref_count(3);

Fibonacci in TBB
Start task b.

    spawn(b);

Fibonacci in TBB
Start task a and wait for both a and b to finish.

    spawn_and_wait_for_all(a);

Fibonacci in TBB
Add the results together to produce the final output.

    *sum = x + y;

Fibonacci in TBB
Create root task; spawn and wait.

    FibTask& a = *new(task::allocate_root())
                   FibTask(n, &res);
    task::spawn_root_and_wait(a);

Other TBB Features
∙ TBB provides many C++ templates to express common patterns simply, such as
  • parallel_for for loop parallelism,
  • parallel_reduce for data aggregation,
  • pipeline and filter for software pipelining.
∙ TBB provides concurrent container classes, which allow multiple threads to safely access and update items in the container concurrently.
∙ TBB also provides a variety of mutual-exclusion library functions, including locks and atomic updates.

Outline
• Shared-Memory Hardware
• Concurrency Platforms
  • Pthreads (and WinAPI Threads)
  • Threading Building Blocks
  • OpenMP
  • Cilk

OpenMP
∙ Specification by an industry consortium.
∙ Several compilers available, both open-source and proprietary, including GCC, ICC, Clang, and Visual Studio.
∙ Linguistic extensions to C/C++ and Fortran in the form of compiler pragmas.
∙ Runs on top of native threads.
∙ Supports loop parallelism, task parallelism, and pipeline parallelism.

Fibonacci in OpenMP

    int64_t fib(int64_t n) {
      if (n < 2) {
        return n;
      } else {
        int64_t x, y;
    #pragma omp task shared(x,n)
        x = fib(n-1);
    #pragma omp task shared(y,n)
        y = fib(n-2);
    #pragma omp taskwait
        return (x + y);
      }
    }

Fibonacci in OpenMP
Compiler directive.

    #pragma omp task shared(x,n)

Fibonacci in OpenMP
The following statement is an independent task.

    #pragma omp task shared(x,n)
        x = fib(n-1);

Fibonacci in OpenMP
Sharing of memory is managed explicitly.

    shared(x,n)

Fibonacci in OpenMP
Wait for the two tasks to complete before continuing.

    #pragma omp taskwait

Other OpenMP Features
∙ OpenMP provides many pragma directives to express common patterns, such as
  • parallel for for loop parallelism,
  • reduction for data aggregation,
  • directives for scheduling and data sharing.
∙ OpenMP supplies a variety of synchronization constructs, such as barriers, atomic updates, and mutual-exclusion (mutex) locks.

Outline
• Shared-Memory Hardware
• Concurrency Platforms
  • Pthreads (and WinAPI Threads)
  • Threading Building Blocks
  • OpenMP
  • Cilk

Intel Cilk Plus
∙ The “Cilk” part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The “Plus” part supports vector parallelism.)
∙ Developed originally by Cilk Arts, an MIT spin-off, which was acquired by Intel in July 2009.
∙ Based on the award-winning Cilk multithreaded language developed at MIT.
∙ Features a provably efficient work-stealing scheduler.
∙ Provides a hyperobject library for parallelizing code with global variables.
∙ Ecosystem includes the Cilkscreen race detector and Cilkview scalability analyzer.

Tapir/LLVM and Cilk
6.172 will be using the Tapir/LLVM compiler, which supports the Cilk subset of Cilk Plus.
● Tapir/LLVM was developed at MIT by Tao B. Schardl, William Moses, and Charles Leiserson.
● Tapir/LLVM generally produces better code relative to its base compiler than other implementations of Cilk.
● Tapir/LLVM uses Intel’s Cilk Plus runtime system.
● Tapir/LLVM also supports more general features, such as the spawning of code blocks.

Nested Parallelism in Cilk

    int64_t fib(int64_t n) {
      if (n < 2) {
        return n;
      } else {
        int64_t x, y;
        x = cilk_spawn fib(n-1);
        y = fib(n-2);
        cilk_sync;
        return (x + y);
      }
    }

cilk_spawn: The named child function may execute in parallel with the parent caller.
cilk_sync: Control cannot pass this point until all spawned children have returned.
Cilk keywords grant permission for parallel execution. They do not command parallel execution.

Loop Parallelism in Cilk
Example: In-place matrix transpose

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\quad\longrightarrow\quad
A^{T} = \begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{pmatrix}
$$

The iterations of a cilk_for loop execute in parallel.

    // indices run from 0, not 1
    cilk_for (int i=1; i<n; ++i) {
      for (int j=0; j<i; ++j) {
        double temp = A[i][j];
        A[i][j] = A[j][i];
        A[j][i] = temp;
      }
    }

Reducers in Cilk
Example: Parallel summation

Serial loop:

    unsigned long sum = 0;
    for (int i=0; i<n; ++i) {
      sum += i;
    }
    printf("%lu\n", sum);

Naive cilk_for version, in which parallel iterations race on sum:

    unsigned long sum = 0;
    cilk_for (int i=0; i<n; ++i) {
      sum += i;
    }
    printf("%lu\n", sum);

With a reducer:

    CILK_C_REDUCER_OPADD(sum, unsigned long, 0);
    CILK_C_REGISTER_REDUCER(sum);
    cilk_for (int i=0; i<n; ++i) {
      REDUCER_VIEW(sum) += i;
    }
    printf("The sum is %lu\n", REDUCER_VIEW(sum));
    CILK_C_UNREGISTER_REDUCER(sum);

Reducers in Cilk
Reducers can be created for monoids (algebraic structures with an associative binary operation and an identity element).
Cilk has several predefined reducers (add, multiply, min, max, and, or, xor, etc.).

Serial Semantics

Cilk source:

    int64_t fib(int64_t n) {
      if (n < 2) {
        return n;
      } else {
        int64_t x, y;
        x = cilk_spawn fib(n-1);
        y = fib(n-2);
        cilk_sync;
        return (x + y);
      }
    }

Serial elision:

    int64_t fib(int64_t n) {
      if (n < 2) {
        return n;
      } else {
        int64_t x, y;
        x = fib(n-1);
        y = fib(n-2);
        return (x + y);
      }
    }

The serial elision of a Cilk program is always a legal interpretation of the program’s semantics.
Remember, Cilk keywords grant permission for parallel execution. They do not command parallel execution.
To obtain the serial elision:

    #define cilk_for for
    #define cilk_spawn
    #define cilk_sync

Scheduling
∙ The Cilk concurrency platform allows the programmer to express logical parallelism in an application.
∙ The Cilk scheduler maps the executing program onto the processor cores dynamically at runtime.
∙ Cilk’s work-stealing scheduling algorithm is provably efficient.
[Figure: the Cilk fib() code mapped onto a shared-memory multicore: processors (P), each with a cache ($), connected by a network to memory and I/O.]

Cilk Platform
[Figure: Cilk source → Cilk compiler → binary; the binary runs on the program input across processors P … P, yielding parallel performance.]

Serial Testing
[Figure: Cilk source → serial elision → C/C++ compiler → binary; the binary runs the serial regression tests on one processor P, yielding reliable single-threaded code.]

Alternative Serial Testing
The parallel program executing on one core should behave exactly the same as the execution of the serial elision.
[Figure: Cilk source → Cilk compiler → binary; the binary runs the serial regression tests on one processor P, yielding reliable single-threaded code.]

Parallel Testing
Cilksan finds and localizes determinacy races.
[Figure: Cilk source → Cilk compiler with Cilksan → binary; the binary runs the parallel regression tests, yielding reliable multi-threaded code.]

Scalability Analysis
Cilkscale analyzes how well your program will scale to larger machines.
[Figure: Cilk source → Cilk compiler with Cilkscale → binary; the binary runs on the program input, yielding a scalability report.]

Summary
• Processors today have multiple cores, and obtaining high performance requires parallel programming.
• Programming directly on processor cores is painful and error-prone.
• Cilk abstracts processor cores, handles synchronization and communication protocols, and performs provably efficient load balancing.
• Project 2: Parallel screen saver using Cilk
MIT OpenCourseWare https://ocw.mit.edu
6.172 Performance Engineering of Software Systems Fall 2018
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.