Processor Level Parallelism
Improving the Pipeline
• Pipelined processor
  – Ideal speedup = number of stages
  – Branches/conflicts mean limited returns after a certain point
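A back-of-the-envelope cycle count shows why the ideal speedup approaches the number of stages, and why branches cap the returns. This is a simplified sketch rather than anything from the slides: it assumes one cycle per stage, compares cycle counts against an unpipelined design that spends `n_stages` cycles per instruction, and assumes each flush (e.g., a mispredicted branch) throws away `n_stages - 1` cycles of work. The function names are hypothetical.

```python
def pipelined_cycles(n_instr: int, n_stages: int, n_flushes: int = 0) -> int:
    """Cycles to finish n_instr instructions on an idealized in-order pipeline."""
    # Filling the pipe costs n_stages cycles, then one instruction completes per cycle.
    fill_and_stream = n_stages + (n_instr - 1)
    # Assumption: every flush wastes the n_stages - 1 partially finished instructions.
    return fill_and_stream + n_flushes * (n_stages - 1)


def speedup(n_instr: int, n_stages: int, n_flushes: int = 0) -> float:
    """Speedup over an unpipelined design that spends n_stages cycles per instruction."""
    return (n_instr * n_stages) / pipelined_cycles(n_instr, n_stages, n_flushes)


if __name__ == "__main__":
    for stages in (5, 10, 20, 40):
        ideal = speedup(1000, stages)
        with_flushes = speedup(1000, stages, n_flushes=50)
        print(f"{stages:2d} stages: ideal {ideal:5.2f}x, with 50 flushes {with_flushes:5.2f}x")
```

With no flushes the speedup tracks the stage count; with 50 flushes per 1000 instructions, deeper pipes also make every flush more expensive, so doubling from 20 to 40 stages raises the speedup only from about 10.2x to about 13.4x under these assumptions.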
ILP
• Instruction Level Parallelism
  – Ability to run multiple instructions at the same time
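One rough way to put a number on a code fragment's ILP is to divide its instruction count by the length of its longest chain of true (read-after-write) dependencies. The sketch below assumes unlimited execution units, single-cycle operations and perfect register renaming, so only true dependencies matter; `PROGRAM` is an invented toy sequence and the helper names are hypothetical.

```python
# Toy instruction stream, invented for illustration: (destination, [sources]).
PROGRAM = [
    ("r1", ["r0"]),        # r1 = load [r0]
    ("r2", ["r0"]),        # r2 = load [r0 + 4]
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2   (needs both loads)
    ("r4", ["r5"]),        # r4 = r5 * 2    (independent of everything above)
    ("r6", ["r3", "r4"]),  # r6 = r3 - r4
]


def dataflow_depth(instrs) -> int:
    """Length, in cycles, of the longest read-after-write dependency chain."""
    ready_at = {}  # register -> cycle its newest value becomes available
    depth = 0
    for dest, srcs in instrs:
        start = max((ready_at.get(s, 0) for s in srcs), default=0)
        finish = start + 1       # assumption: every operation takes exactly one cycle
        ready_at[dest] = finish  # a later write just replaces the entry (ideal renaming)
        depth = max(depth, finish)
    return depth


def ilp(instrs) -> float:
    """Average instructions per cycle if execution units were unlimited."""
    return len(instrs) / dataflow_depth(instrs)


if __name__ == "__main__":
    print(dataflow_depth(PROGRAM), ilp(PROGRAM))
```

Here the two loads and the multiply can all start at once, so five instructions need only three cycles, an ILP of about 1.67.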
Superscalar
• Superscalar : capable of running multiple instructions at a time
  – Multiple execution units
    • Widen slowest part of pipeline
Superscalar
• Multi-issue : Start multiple instructions per clock
  – Parallel pipes
Superscalar
• Multi-issue pipeline feeding multiple execution units
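A multi-issue pipeline can only start several instructions in the same clock when they do not depend on each other. The sketch below models a simple in-order issue stage of a given `width`, assuming single-cycle latency, enough execution units for any mix of instructions, and strict in-order issue (a stalled instruction also holds back everything after it). `PROGRAM` is an invented toy sequence and `in_order_issue` is a hypothetical name.

```python
# Toy instruction stream, invented for illustration: (destination, [sources]).
PROGRAM = [
    ("r1", ["r0"]),        # r1 = load [r0]
    ("r2", ["r0"]),        # r2 = load [r0 + 4]
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("r4", ["r5"]),        # r4 = r5 * 2
    ("r6", ["r3", "r4"]),  # r6 = r3 - r4
]


def in_order_issue(instrs, width: int, latency: int = 1):
    """Return, per cycle, the indices of the instructions issued in that cycle."""
    ready_at = {}  # register -> first cycle its value can be read
    schedule = []
    cycle = 0
    i = 0
    while i < len(instrs):
        group = []
        while i < len(instrs) and len(group) < width:
            dest, srcs = instrs[i]
            if any(ready_at.get(s, 0) > cycle for s in srcs):
                break  # operand not ready: this and every later instruction wait (in order)
            group.append(i)
            ready_at[dest] = cycle + latency
            i += 1
        schedule.append(group)  # an empty group would be a bubble
        cycle += 1
    return schedule


if __name__ == "__main__":
    for w in (1, 2, 4):
        print(w, in_order_issue(PROGRAM, w))
```

Going from one pipe to two cuts this sequence from 5 cycles to 3, but four pipes gain nothing further: the dependency chain, not the number of pipes, is now the limit.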
Superscalar
• Issue : Dependency issues just got MUCH harder…
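Part of why it gets harder: every instruction started in the same cycle has to be checked against every other one in its group, W(W-1)/2 pairs for a W-wide issue stage, on top of the checks against instructions already in flight. The sketch below classifies the three classic register hazards between two instructions in program order. It is deliberately conservative (real superscalar cores rename registers so that only RAW truly blocks issue), and all names are hypothetical.

```python
def hazards(first, second):
    """Dependencies of `second` on `first`, where `first` comes earlier in program order.

    Each instruction is (destination register, [source registers]).
    """
    d1, s1 = first
    d2, s2 = second
    found = []
    if d1 in s2:
        found.append("RAW")  # true dependency: second reads what first writes
    if d2 in s1:
        found.append("WAR")  # anti-dependency: second overwrites what first still reads
    if d1 == d2:
        found.append("WAW")  # output dependency: both write the same register
    return found


def checks_per_cycle(width: int) -> int:
    """Instruction pairs to compare when starting `width` instructions in one cycle."""
    return width * (width - 1) // 2


def independent(group) -> bool:
    """Conservative test: no hazard of any kind between any pair in the group."""
    return all(
        not hazards(group[i], group[j])
        for i in range(len(group))
        for j in range(i + 1, len(group))
    )


if __name__ == "__main__":
    for w in (2, 4, 8):
        print(w, "wide:", checks_per_cycle(w), "pairs per cycle")
```

A 2-wide stage compares one pair per cycle, a 4-wide stage six, an 8-wide stage 28, and that is per hazard type.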
Superscalar Pro/Con
• Good
  – The hardware solves everything:
    • Hardware solves scheduling/registers/etc…
    • Compiler can still help matters
  – Binary compatibility
    • New hardware issues old instructions in a more efficient way
• Bad
  – Complex hardware
  – Limited ability to scale
VLIW
• VLIW : Very Long Instruction Word
  – One instruction contains multiple ops
VLIW
• Instructions VERY large
  – 240 bits?
  – Wasted space addressed by bundles
    • No dependencies within bundle
Who does the work?
• Compiler assembles long instructions
  – Reorders at compile time
• Compiler has more time and information
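A compiler can do this reordering with list scheduling: repeatedly fill the next bundle with operations whose inputs were produced in earlier bundles, and pad any unused slots with NOPs. The sketch below is a greedy version of that idea, assuming one-bundle latency for every op, that any op fits any slot, and that each destination is written only once (so only read-after-write dependencies constrain the order). `schedule_bundles`, `NOP` and `PROGRAM` are hypothetical.

```python
NOP = ("nop", [])


def schedule_bundles(instrs, width: int = 3):
    """Greedy list scheduling of (dest, [srcs]) ops into fixed-width VLIW bundles."""
    producer = {dest: i for i, (dest, _) in enumerate(instrs)}
    deps = [
        {producer[s] for s in srcs if s in producer and producer[s] < i}
        for i, (_, srcs) in enumerate(instrs)
    ]
    done = set()
    bundles = []
    while len(done) < len(instrs):
        # Ops whose producers are all in earlier bundles, so none depends on another here.
        ready = [i for i in range(len(instrs)) if i not in done and deps[i] <= done]
        chosen = ready[:width]  # naive priority: original program order
        bundles.append([instrs[i] for i in chosen] + [NOP] * (width - len(chosen)))
        done.update(chosen)
    return bundles


# Toy program, invented for illustration.
PROGRAM = [
    ("a", ["x"]),
    ("b", ["a"]),
    ("c", ["y"]),
    ("d", ["b", "c"]),
    ("e", ["y"]),
]

if __name__ == "__main__":
    for bundle in schedule_bundles(PROGRAM):
        print(bundle)
```

The scheduler hoists `c` and `e` into the first bundle next to `a`, since neither needs `a`; `b` and `d` then sit alone, so 4 of the 9 slots are NOPs, which is the wasted space bundles are meant to reduce.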
VLIW Uses
• Itanium :
  – EPIC : Explicitly Parallel Instruction Computing
  – 3-instruction bundles
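For scale, an Itanium (IA-64) bundle is 128 bits: a 5-bit template plus three 41-bit instruction slots (5 + 3 × 41 = 128). The template tells the hardware which kind of execution unit each slot targets and where the stops between parallel groups fall, so the processor does not have to rediscover that at run time. This sketch only shows the bit layout; the slot values are opaque integers rather than real IA-64 encodings, and the helper names are hypothetical.

```python
TEMPLATE_BITS = 5   # low bits of the bundle: slot unit types and stop positions
SLOT_BITS = 41      # width of each of the three instruction slots
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots: list[int]) -> int:
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("need exactly three 41-bit slot values")
    word = template                   # bits 0-4
    for i, slot in enumerate(slots):  # slot i starts at bit 5 + 41 * i
        word |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return word


def unpack_bundle(word: int) -> tuple[int, list[int]]:
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = word & ((1 << TEMPLATE_BITS) - 1)
    slots = [(word >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return template, slots


if __name__ == "__main__":
    bundle = pack_bundle(0x10, [1, 2, 3])
    print(hex(bundle), unpack_bundle(bundle))
```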
VLIW Pro/Con
• Good
  – Simple hardware
    • Add new functional units with no new scheduling hardware
  – Better optimization in compiler
• Bad
  – Binary compatibility : compiler builds for one specific hardware implementation
  – Good compilers are HARD to write
ARM 15
• Modern CPU:
Processor Parallelism
• Process Parallelism : Run multiple instruction streams simultaneously
Process vs Thread
• Process : Program
  – Own memory space
  – Has at least one thread
Process vs Thread
• Thread : Instruction sequence
  – Own registers/stack
  – Shares memory with other threads in the process
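A minimal Python sketch of the process/thread split: all the threads below live inside one process, so they see the same `shared` list and `total`, while each has its own `count` on its own stack. It illustrates memory sharing only; in CPython the global interpreter lock keeps these threads from executing Python bytecode truly in parallel. `run` and `worker` are hypothetical names.

```python
import threading


def run(n_threads: int = 4, n_iters: int = 1000):
    """Start threads in one process; return what they wrote to shared memory."""
    shared = [0] * n_threads  # one list in the process's memory, visible to every thread
    total = 0
    lock = threading.Lock()

    def worker(idx: int) -> None:
        nonlocal total
        count = 0             # local variable: lives on this thread's own stack
        for _ in range(n_iters):
            count += 1
        shared[idx] = count   # each thread writes its own slot of the shared list
        with lock:            # read-modify-write on shared data needs a lock
            total += count

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, total


if __name__ == "__main__":
    print(run())
```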
Threaded Code
• Demo…
Context Switching
• Four threads running in a 4-wide pipeline
  – Can't always fill all 4 issue slots
  – Have bubbles from memory accesses, page faults, etc…
Context Switching
• Threads often have bubbles…
Multithreading
• Multithreading : Alternate threads to maximize hardware use
  – Coarse : run until stall, then switch
  – Fine : switch every cycle
  – Either one needs extra hardware
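The two policies can be compared on a toy single-issue pipeline. In this sketch each thread is a list of instruction latencies (1 means the thread's next instruction can issue the following cycle; 4 means it waits, say on a cache miss). Fine-grained issue rotates round-robin every cycle, skipping stalled threads; coarse-grained issue stays on one thread until it stalls, then switches at an assumed cost of `switch_penalty` cycles. The per-thread `pos`/`ready_at` arrays stand in for the extra per-thread PCs and registers the hardware needs. All names and numbers are hypothetical.

```python
def _ready(threads, pos, ready_at, t, cycle):
    """True if thread t still has work and its next instruction can issue this cycle."""
    return pos[t] < len(threads[t]) and ready_at[t] <= cycle


def run_fine(threads) -> int:
    """Fine-grained: one instruction per cycle, round-robin, skipping stalled threads."""
    n = len(threads)
    pos, ready_at = [0] * n, [0] * n
    remaining = sum(len(t) for t in threads)
    cycle, last = 0, n - 1
    while remaining:
        for k in range(1, n + 1):
            t = (last + k) % n
            if _ready(threads, pos, ready_at, t, cycle):
                ready_at[t] = cycle + threads[t][pos[t]]  # latency of the issued instruction
                pos[t] += 1
                remaining -= 1
                last = t
                break
        cycle += 1  # if no thread was ready, this cycle was a bubble
    return cycle


def run_coarse(threads, switch_penalty: int = 2) -> int:
    """Coarse-grained: stay on one thread until it stalls, then pay to switch."""
    n = len(threads)
    pos, ready_at = [0] * n, [0] * n
    remaining = sum(len(t) for t in threads)
    cycle, cur = 0, 0
    while remaining:
        if _ready(threads, pos, ready_at, cur, cycle):
            ready_at[cur] = cycle + threads[cur][pos[cur]]
            pos[cur] += 1
            remaining -= 1
            cycle += 1
            continue
        # Current thread stalled or finished: look for another thread that can run now.
        others = ((cur + k) % n for k in range(1, n))
        nxt = next((t for t in others if _ready(threads, pos, ready_at, t, cycle)), None)
        if nxt is None:
            cycle += 1               # nobody can run: bubble
        else:
            cur = nxt
            cycle += switch_penalty  # assumed cost of draining and refilling the pipeline
    return cycle


THREADS = [[1, 4, 1], [1, 4, 1]]  # two identical toy threads

if __name__ == "__main__":
    print("one thread alone:", run_fine(THREADS[:1]))
    print("fine:", run_fine(THREADS), "coarse:", run_coarse(THREADS))
```

A single thread takes 6 cycles here, so interleaving both finely finishes in 8 instead of 12. Coarse-grained switching also needs 12 in this example because each of its three switches costs the assumed 2 cycles; it pays off when stalls are long compared with the switch cost.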
Multithreading Superscalar
• A 2-instruction wide pipeline with multithreading:
  – Still only one thread issues per cycle
(Figure panels: fine-grained, coarse-grained)
SMT
• SMT : Simultaneous Multithreading
  – AKA Hyper-Threading (Intel's name for it)
• Issue ops from multiple threads in one cycle
• Maximize use of functional units
  – But need to track which thread's registers each instruction goes with…
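The difference from fine- or coarse-grained multithreading is in how issue slots get filled. This sketch takes, for each cycle, how many independent ops each thread could issue, and counts the slots used on a 4-wide machine when only one thread may issue per cycle (strict round-robin, a simplifying assumption) versus SMT, where any thread's ops may share a cycle. Ops that miss their slot are simply not counted, another simplification. `READY` holds made-up numbers and the function names are hypothetical.

```python
def slots_one_thread_per_cycle(ready, width: int) -> int:
    """Issue slots filled when only one thread may issue per cycle (strict round-robin)."""
    n_threads = len(ready[0])
    used = 0
    for cycle, per_thread in enumerate(ready):
        t = cycle % n_threads  # this cycle belongs to thread t alone
        used += min(width, per_thread[t])
    return used


def slots_smt(ready, width: int) -> int:
    """Issue slots filled when ops from every thread may share the same cycle."""
    used = 0
    for per_thread in ready:
        free = width
        for available in per_thread:  # real hardware tags each op with its thread ID
            take = min(free, available)
            used += take
            free -= take
    return used


# READY[c][t]: independent ops thread t could issue in cycle c (made-up numbers).
READY = [
    [2, 1, 0, 3],
    [1, 0, 2, 1],
    [0, 3, 1, 0],
    [0, 1, 0, 1],
]

if __name__ == "__main__":
    total = 4 * len(READY)
    print("one thread/cycle:", slots_one_thread_per_cycle(READY, 4), "of", total)
    print("SMT:", slots_smt(READY, 4), "of", total)
```

Out of 16 issue slots over four cycles, one-thread-per-cycle issue fills 4 and SMT fills 14 with these numbers.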
SMT Challenges
• Resources must be duplicated or split
  – Split them too thin and performance suffers…
  – Duplicate everything and you aren't maximizing use of the hardware…
Intel vs AMD
• Variations on SMT
Getting Faster
• Pipelining helps to a point
• Superscalar/VLIW helps to a point
• SMT helps a bit
• Chips getting faster
Getting Faster
• Pipelining helps to a point
• Superscalar/VLIW helps to a point
• SMT helps a bit
• Chips getting faster
• Only so much speedup possible
  – Power = heat
  – Power: $P \propto C V^2 f$
    • C = capacitance, how well it “stores” a charge
    • V = voltage
    • f = frequency, i.e., how fast the clock is (e.g., 3 GHz)
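Because voltage is squared, pushing the clock is expensive: higher frequencies generally need a higher supply voltage as well. The sketch below assumes, purely for illustration, that V must rise in proportion to f, which makes dynamic power grow roughly with f³. It only models the switching power in the formula above; the function names are hypothetical.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Relative dynamic power, P ∝ C·V²·f (proportionality constant dropped)."""
    return c * v ** 2 * f


def power_ratio_for_clock_boost(boost: float) -> float:
    """Power multiplier for raising f by `boost`, assuming V rises by the same factor."""
    base = dynamic_power(1.0, 1.0, 1.0)
    return dynamic_power(1.0, boost, boost) / base


if __name__ == "__main__":
    for boost in (1.25, 1.5, 2.0):
        print(f"{boost}x clock -> {power_ratio_for_clock_boost(boost):.2f}x power")
```

Under that assumption a 1.5x faster clock costs about 3.4x the power, and a 2x clock costs 8x.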
Power Density Prediction circa 2000
(Figure: power density in W/cm², log scale from 1 to 10000, vs. year, 1970 to 2010, for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, P6 and Core 2, compared against the levels of a Hot Plate, Nuclear Reactor, Rocket Nozzle and the Sun’s Surface)
Source: S. Borkar (Intel)
Adapted from UC Berkeley "The Beauty and Joy of Computing"
Moore's Law Related Curves
Adapted from UC Berkeley "The Beauty and Joy of Computing"
Moore's Law Related Curves
Adapted from UC Berkeley "The Beauty and Joy of Computing"
Going Multi-core Helps Energy Efficiency
• Power of a typical integrated circuit: $P \propto C V^2 f$
  – C = capacitance, how well it “stores” a charge
  – V = voltage
  – f = frequency, i.e., how fast the clock is (e.g., 3 GHz)
William Holt, HOT Chips 2005
Adapted from UC Berkeley "The Beauty and Joy of Computing"
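The energy argument can be made concrete with the same formula. This sketch compares one core at full clock with two cores at half clock: capacitance doubles (twice the circuitry switching), but the lower frequency lets the supply voltage drop. The 0.6× voltage at half clock is an illustrative assumption, not a figure from the slides, and throughput is assumed to scale perfectly with cores × frequency, which holds only for work that splits evenly across cores.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Relative dynamic power, P ∝ C·V²·f (proportionality constant dropped)."""
    return c * v ** 2 * f


# Baseline: one core at nominal voltage and frequency.
one_fast_core = dynamic_power(c=1.0, v=1.0, f=1.0)

# Two cores (twice the switched capacitance), each at half the clock.
# Assumption (illustrative only): half the clock allows 0.6x the supply voltage.
two_slow_cores = dynamic_power(c=2.0, v=0.6, f=0.5)

# Throughput ∝ cores × f, assuming the work divides perfectly between the cores.
same_throughput = (1 * 1.0) == (2 * 0.5)

if __name__ == "__main__":
    print(f"same throughput: {same_throughput}")
    print(f"two slower cores use {two_slow_cores / one_fast_core:.0%} of the power")
```

Under these assumptions the two slower cores deliver the same throughput for about 36% of the power.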