Parallelism: A Serious Goal or a Silly Mantra (some half-thought-out ideas)
Random thoughts on Parallelism
• Why the sudden preoccupation with parallelism?
• The Silliness (or what I call Meganonsense)
  – Break the problem, use half the energy
  – 1000 mickey mouse cores
  – Hardware is sequential
  – Server throughput (how many pins?)
  – What about GPUs and databases?
• Current bugs to exploiting parallelism (or are they?)
  – Dark silicon
  – Amdahl’s Law
  – The Cloud
• The answer
  – The fundamental concept vis-à-vis parallelism
  – What it means re: the transformation hierarchy
It starts with the raw material (Moore’s Law)
• The first microprocessor (Intel 4004), 1971
  – 2300 transistors
  – 106 kHz
• The Pentium chip, 1992
  – 3.1 million transistors
  – 66 MHz
• Today
  – More than one billion transistors
  – Frequencies in excess of 5 GHz
• Tomorrow?
And what we have done with this raw material
[Figure: Number of Transistors vs. Time, with regions labeled Cache and Microprocessor]
Too many people do not realize: parallelism did not start with multi-core
• Pipelining
• Out-of-order Execution
• Multiple operations in a single microinstruction
• VLIW (horizontal microcode exposed to the software)
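Pipelining alone already overlaps instructions. As a minimal sketch, assuming an ideal k-stage pipeline with no stalls, hazards, or branch flushes (the function names and the workload size are hypothetical), the cycle counts for n instructions compare as follows:

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Each instruction passes through all stages before the next one starts.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Ideal pipeline: n_stages cycles to fill, then one instruction completes per cycle.
    # Assumes no stalls, hazards, or branch flushes.
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload: 100 instructions, 5-stage pipeline
    print(unpipelined_cycles(n, k))                                      # 500
    print(pipelined_cycles(n, k))                                        # 104
    print(round(unpipelined_cycles(n, k) / pipelined_cycles(n, k), 2))  # 4.81
```

Up to k instructions are in flight at once, so the speedup approaches k as n grows: instruction-level parallelism the programmer never had to express.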
One thousand mickey mouse cores
• Why not a million? Why not ten million?
• Let’s start with 16
  – What if we could replace 4 with one more powerful core?
• …and we learned:
  – One more powerful core is not enough
– Sometimes we need several
– Morphcore was born
– BUT not all morphcore (fixed function vs flexibility)
The Asymmetric Chip Multiprocessor (ACMP)
[Figure: three chip organizations]
• “Niagara” approach: 16 Niagara-like cores
• “Tile-Large” approach: 4 large cores
• ACMP approach: 1 large core + 12 Niagara-like cores
Large Core vs. Small Core
Large Core
• Out-of-order
• Wide fetch, e.g. 4-wide
• Deeper pipeline
• Aggressive branch predictor (e.g. hybrid)
• Many functional units
• Trace cache
• Memory dependence speculation
Small Core
• In-order
• Narrow fetch, e.g. 2-wide
• Shallow pipeline
• Simple branch predictor (e.g. Gshare)
• Few functional units
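The small core’s Gshare predictor is simple enough to sketch. This is a minimal illustration, assuming an 8-bit global history register and a table of 2-bit saturating counters indexed by the branch PC XOR the history; the class name, table size, and branch address are hypothetical, and real designs differ in sizing and indexing details:

```python
class GsharePredictor:
    """Gshare: 2-bit saturating counters indexed by (branch PC XOR global history)."""

    def __init__(self, history_bits: int = 8):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                               # global history: 1 = taken, 0 = not taken
        self.counters = [1] * (1 << history_bits)  # 0-1 predict not-taken, 2-3 predict taken

    def _index(self, pc: int) -> int:
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    bp = GsharePredictor(history_bits=8)
    pc = 0x40                                    # hypothetical branch address
    outcomes = [i % 2 == 0 for i in range(200)]  # taken, not taken, taken, ...
    hits = 0
    for taken in outcomes:
        hits += bp.predict(pc) == taken
        bp.update(pc, taken)
    print(hits, "of", len(outcomes), "predicted correctly")  # 195 of 200
```

Because the history bits are folded into the index, the alternating pattern lands on two different counters, so after a short warm-up every prediction is correct; a single 2-bit counter per branch cannot capture that pattern.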
Throughput vs. Serial Performance
[Figure: Speedup vs. 1 Large Core (0 to 9) plotted against Degree of Parallelism (0 to 1) for the Niagara, Tile-Large, and ACMP configurations]
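A simple Amdahl-style model reproduces the general shape of such a plot. This sketch assumes that a Niagara-like core delivers half the performance of a large core, that the serial portion runs on the fastest core available, and that the parallel portion uses the combined throughput of every core, including the large one in ACMP. These parameters and names are illustrative assumptions, not the values behind the original figure:

```python
def speedup(f: float, serial_perf: float, parallel_perf: float) -> float:
    """Amdahl-style speedup relative to one large core (performance 1.0).

    f:             fraction of the work that is parallelizable (0..1)
    serial_perf:   performance of the core that runs the serial part
    parallel_perf: combined performance of the cores running the parallel part
    """
    return 1.0 / ((1.0 - f) / serial_perf + f / parallel_perf)


SMALL = 0.5  # ASSUMPTION: a Niagara-like core has half a large core's performance

CONFIGS = {
    # name: (performance of the serial core, aggregate parallel performance)
    "Niagara": (SMALL, 16 * SMALL),   # 16 small cores; serial code runs on a small core
    "Tile-Large": (1.0, 4 * 1.0),     # 4 large cores
    "ACMP": (1.0, 1.0 + 12 * SMALL),  # 1 large + 12 small; ASSUMPTION: large core joins parallel phase
}

if __name__ == "__main__":
    for f in (0.0, 0.5, 0.9, 0.99, 1.0):
        row = ", ".join(f"{name}={speedup(f, s, p):.2f}" for name, (s, p) in CONFIGS.items())
        print(f"f={f:.2f}: {row}")
```

Under these assumptions, at f = 0.9 ACMP reaches 4.375 versus 3.2 for Niagara and about 3.08 for Tile-Large; Niagara only pulls ahead once f exceeds roughly 0.98.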
Server throughput
• The Good News: Not a software problem
  – Each core runs its own problem
• The Bad News: How many pins?
  – Memory bandwidth
• More Bad News: How much energy?
  – Each core runs its own problem
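The pin problem can be stated as a back-of-the-envelope bound. A minimal sketch, assuming each independent job needs a fixed amount of off-chip bandwidth per core; the function name and all numbers are hypothetical:

```python
def server_throughput(cores: int, jobs_per_core: float,
                      bw_per_core_gbs: float, pin_bw_gbs: float) -> float:
    """Aggregate jobs/s when every core runs its own independent job.

    Throughput grows linearly with cores until the cores' combined memory
    traffic reaches the off-chip (pin) bandwidth; beyond that, extra cores stall.
    """
    compute_bound = cores * jobs_per_core
    # Number of cores the pins can keep fed, times the rate of each fed core.
    bandwidth_bound = (pin_bw_gbs / bw_per_core_gbs) * jobs_per_core
    return min(compute_bound, bandwidth_bound)


if __name__ == "__main__":
    # Hypothetical numbers: 2 GB/s of memory traffic per core, 50 GB/s of pin bandwidth.
    for n in (8, 16, 32, 64):
        print(n, server_throughput(n, 1.0, 2.0, 50.0))
```

With these numbers throughput stops growing at 25 cores, so a 64-core chip delivers no more than a 32-core one.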
What about GPUs and databases?
• In theory, absolutely!
• GPUs (SMT + SIMD + Predication)
  – Provided there are no conditional branches (divergence)
  – Provided memory accesses line up nicely (coalescing)
• Databases
  – Provided there are no critical sections
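Both GPU provisos can be quantified with a toy model of one warp. This sketch assumes a 32-lane warp, that a divergent if/else issues each side that any lane takes once for the whole warp with the other lanes masked off, and that a load costs one transaction per distinct aligned 128-byte segment it touches. Actual GPUs differ in these details, and the function names are hypothetical:

```python
WARP_SIZE = 32        # ASSUMPTION: lanes per warp
SEGMENT_BYTES = 128   # ASSUMPTION: size of one aligned memory transaction


def simd_efficiency(predicate: list[bool], then_len: int, else_len: int) -> float:
    """Fraction of issued lane-slots doing useful work for one if/else.

    Each side that at least one lane takes is issued for the whole warp,
    with the lanes on the other side masked off (predication).
    """
    taken = sum(predicate)
    not_taken = len(predicate) - taken
    useful = taken * then_len + not_taken * else_len
    issued = len(predicate) * ((then_len if taken else 0) + (else_len if not_taken else 0))
    return useful / issued


def memory_transactions(byte_addresses: list[int]) -> int:
    """Number of distinct aligned segments one warp-wide load touches."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


if __name__ == "__main__":
    uniform = [True] * WARP_SIZE
    split = [lane % 2 == 0 for lane in range(WARP_SIZE)]
    print(simd_efficiency(uniform, 10, 10))  # 1.0: no divergence
    print(simd_efficiency(split, 10, 10))    # 0.5: both sides issued, half the lanes idle each time

    coalesced = [4 * lane for lane in range(WARP_SIZE)]  # consecutive 4-byte words
    strided = [128 * lane for lane in range(WARP_SIZE)]  # one word per segment
    print(memory_transactions(coalesced))  # 1
    print(memory_transactions(strided))    # 32
```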
Dark Silicon
• Too many transistors: we cannot power them all
  – All those cores powered down
  – All that parallelism wasted
• Not really: The Refrigerator! (aka: Accelerators)
  – Fork (in parallel)
  – Although not all at the same time!
Amdahl’s Law
• The serial bottleneck always limits performance
• Heterogeneous cores AND control over them can minimize the effect
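For reference, the usual statement of Amdahl’s Law, with f the parallelizable fraction of the work and N identical cores, together with one common way to write the heterogeneous case (serial part on a core of performance p_s, parallel part on cores of combined performance P, both relative to one baseline core):

```latex
S(f, N) = \frac{1}{(1 - f) + \dfrac{f}{N}},
\qquad
\lim_{N \to \infty} S(f, N) = \frac{1}{1 - f},
\qquad
S_{\text{het}} = \frac{1}{\dfrac{1 - f}{p_s} + \dfrac{f}{P}}
```

With f = 0.9, no number of identical cores can exceed a speedup of 10; raising p_s, which is what a large core plus control over where the serial code runs provides, attacks the (1 - f) term directly.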
The Cloud
• It is behind the curtain: how do we manage it?
• Answer: the on-chip run-time system
• Answer: Pragmas beyond the Cloud
The fundamental concept: Synchronization
The transformation hierarchy (top to bottom):
Problem
Algorithm
Program
ISA (Instruction Set Arch)
Microarchitecture
Circuits
Electrons
At every layer we synchronize
• Algorithm: task dependencies
• ISA: sequential control flow (implicit)
• Microarchitecture: ready bits
• Circuit: clock cycle (implicit)
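At the microarchitecture layer, the synchronization is the ready bit: an instruction fires when all of its source operands are marked ready. A minimal sketch of that wakeup rule, assuming registers are already renamed to unique tags, unlimited functional units, and no fetch or retirement limits; the class and function names are hypothetical, and this is not a model of any specific machine:

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str              # renamed destination tag
    srcs: tuple[str, ...]  # source tags this instruction waits on
    latency: int = 1       # cycles until the result's ready bit is set


def dataflow_schedule(program: list[Instr], live_in: set[str]) -> list[int]:
    """Return the cycle in which each instruction fires (issues)."""
    ready = {tag: True for tag in live_in}  # ready bits, keyed by tag
    completes_at = {}                       # tag -> cycle its ready bit gets set
    fired: list = [None] * len(program)
    limit = sum(ins.latency for ins in program)  # no valid schedule runs longer
    cycle = 0
    while None in fired:
        if cycle > limit:
            raise ValueError("some source tag is never produced")
        # Results finishing this cycle set their ready bits (wakeup).
        for tag in [t for t, c in completes_at.items() if c == cycle]:
            ready[tag] = True
            del completes_at[tag]
        # Every waiting instruction whose sources are all ready fires now.
        for i, ins in enumerate(program):
            if fired[i] is None and all(ready.get(s, False) for s in ins.srcs):
                fired[i] = cycle
                completes_at[ins.dest] = cycle + ins.latency
        cycle += 1
    return fired


if __name__ == "__main__":
    program = [
        Instr("a", ("p",), latency=3),  # a = load [p]
        Instr("b", ("a",)),             # b = a + 1
        Instr("c", ("q",)),             # c = q * 2 (independent of the load)
        Instr("d", ("b", "c")),         # d = b + c
    ]
    print(dataflow_schedule(program, live_in={"p", "q"}))  # [0, 3, 0, 4]
```

The program is written sequentially, yet the independent multiply fires in cycle 0 alongside the load, while `b` waits for the load’s ready bit.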
Who understands this?
• Should this be part of students’ parallelism education?
• Where should it come in the curriculum?
• Can students even understand these different layers?
Parallel to Sequential to Parallel
• Guri says: think sequential, execute parallel
  – i.e. don’t throw away 60 years of computing experience
  – The original HPS model of out-of-order execution
  – Synchronization is obvious: restricted data flow
• At the higher level, parallel at larger granularity
  – Pragmas in Java? Who would have thought!
  – Dave Kuck’s CEDAR project, vintage 1985
  – Synchronization is necessary: coarse-grain data flow
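At the coarse grain, the same rule applies to whole tasks: a task runs once all of its inputs exist. A minimal sketch using Python’s `concurrent.futures` thread pool; it illustrates coarse-grain data flow in general, not CEDAR or any Java pragma facility, and `run_dataflow` and the task names are hypothetical:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dataflow(tasks: dict, deps: dict, max_workers: int = 4) -> dict:
    """Run each task as soon as all the tasks it depends on have finished.

    tasks: name -> callable; it receives its dependencies' results, in `deps` order
    deps:  name -> list of task names it depends on (omit for no dependencies)
    """
    results = {}
    running = {}          # future -> task name
    waiting = set(tasks)  # tasks not yet submitted
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while waiting or running:
            # Fire every waiting task whose inputs are all available.
            for name in sorted(waiting):
                inputs = deps.get(name, [])
                if all(d in results for d in inputs):
                    args = [results[d] for d in inputs]
                    running[pool.submit(tasks[name], *args)] = name
                    waiting.discard(name)
            if not running:
                raise ValueError("dependency cycle or unknown dependency")
            # Synchronize: block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


if __name__ == "__main__":
    tasks = {
        "load": lambda: list(range(10)),
        "evens": lambda xs: [x for x in xs if x % 2 == 0],
        "odds": lambda xs: [x for x in xs if x % 2 == 1],
        "total": lambda e, o: sum(e) + sum(o),
    }
    deps = {"evens": ["load"], "odds": ["load"], "total": ["evens", "odds"]}
    print(run_dataflow(tasks, deps)["total"])  # 45
```

Here `evens` and `odds` depend only on `load`, so they may run concurrently, and `total` waits for both. With CPython threads, pure-Python tasks remain limited by the global interpreter lock, so this shows the synchronization structure rather than a speedup.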
Can we do more?
• The run-time system: part of the chip design
  – The chip knows the chip resources
  – On-chip monitoring can supply information
  – The run-time system can direct the use of those resources
• The Cloud: the other extreme, and today’s be-all
  – How do we harness its capability?
  – What is needed from the hierarchy to make it work?
My message
• Parallelism is a serious goal IF we want to solve the most challenging problems (cure cancer, predict tsunamis)
• Telling people to think parallel is nice, but often silly
• Examining the transformation hierarchy and seeing where we can leverage it seems to me a sounder approach