CS184a: Computer Architecture (Structure and Organization)
![Page 1: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/1.jpg)
Caltech CS184 Winter2003 -- DeHon1
CS184a: Computer Architecture
(Structure and Organization)
Day 8: January 27, 2003
Empirical Cost Comparisons
![Page 2: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/2.jpg)
Previously
• Instruction Design Space
  – Define
  – Area and Yield Model
![Page 3: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/3.jpg)
Today
• Empirical Data
  – Processors
  – FPGAs
  – Custom
    • Gate Array
    • Std. Cell
    • Full
![Page 4: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/4.jpg)
Empirical Comparisons
![Page 5: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/5.jpg)
Empirical
• Ground modeling in some concretes
• Start sorting out
  – custom vs. configurable
  – spatial configurable vs. temporal
![Page 6: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/6.jpg)
MPGA vs. Custom?
• AMI CICC’83
  – MPGA 1.0
  – Std-Cell 0.7
  – Custom 0.5
• Toshiba DSP
  – Custom 0.3
• Mosaid RAM
  – Custom 0.2
• GE CICC’86
  – MPGA 1.0
  – Std-Cell 0.4--0.7
    • FF/counter 0.7
    • FullAdder 0.4
    • RAM 0.2
MPGA = Metal Programmable Gate Array (traditional Gate Array)
![Page 7: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/7.jpg)
Metal Programmable Gate Arrays
![Page 8: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/8.jpg)
MPGAs
• Modern -- “Sea of Gates”
• yield 35--70%
• maybe 5Kλ²/gate?
  – (quite a bit of variance)
![Page 9: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/9.jpg)
FPGA Table
![Page 10: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/10.jpg)
Modern FPGAs
• APEX 20K1500E
  – 52K LEs
  – 0.18µm
  – 24mm × 22mm
  – 1.25Mλ²/LE
• XC2V1000
  – 10.44mm × 9.90mm [source: Chipworks]
  – 0.15µm
  – 11,520 4-LUTs
  – 1.5Mλ²/4-LUT
[Both also have RAM in cited area]
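The per-cell areas on this slide can be reproduced, roughly, from the die dimensions and feature size. The sketch below assumes λ is half the drawn feature size (the usual convention, but an assumption here), and the helper name `lambda_sq_per_cell` is hypothetical. Under that assumption the APEX figure matches the slide, while the XC2V1000 figure comes out somewhat above the quoted 1.5Mλ², so the slide presumably used slightly different inputs or rounding.

```python
def lambda_sq_per_cell(width_mm, height_mm, feature_um, cells):
    """Die area expressed in lambda^2, divided by the number of logic cells."""
    lam_um = feature_um / 2.0                        # assumption: lambda = half the feature size
    die_um2 = (width_mm * 1e3) * (height_mm * 1e3)   # convert each side from mm to um
    return die_um2 / (lam_um ** 2) / cells


# Inputs taken from the slide; RAM area is included, as the slide notes.
apex = lambda_sq_per_cell(24.0, 22.0, 0.18, 52_000)
xc2v = lambda_sq_per_cell(10.44, 9.90, 0.15, 11_520)

print(f"APEX 20K1500E: {apex / 1e6:.2f} M lambda^2 per LE")
print(f"XC2V1000:      {xc2v / 1e6:.2f} M lambda^2 per 4-LUT")
```

Normalizing to λ² is what lets devices built in different processes (0.18µm vs. 0.15µm here) be compared on the same scale.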
![Page 11: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/11.jpg)
Conventional FPGA Tile
K-LUT (typical k=4) w/ optional output Flip-Flop
![Page 12: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/12.jpg)
Toronto FPGA Model
![Page 13: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/13.jpg)
How many gates?
![Page 14: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/14.jpg)
“gates” in 2-LUT
![Page 15: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/15.jpg)
Now how many?
![Page 16: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/16.jpg)
Which gives: More usable gates?
More gates/unit area?
![Page 17: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/17.jpg)
Gates Required?
Depth=3, Depth=2048?
![Page 18: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/18.jpg)
Gate metric for FPGAs?
• Day5: several components for computations
  – compute element
  – interconnect:
    • space
    • time
  – instructions
• Not all applications need these in the same balance
• Assigning a single “capacity” number to a device is an oversimplification
![Page 19: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/19.jpg)
MPGA vs. FPGA
• MPGA (SOG GA)
  – 5Kλ²/gate
  – 35-70% usable (50%)
  – 7-17Kλ²/gate net
• Xilinx XC4K
  – 1.25Mλ²/CLB
  – 17--48 gates (26?)
  – 26-73Kλ²/gate net
• Ratio: 2--10 (5)
Adding ~2x for Custom/MPGA gives Custom/FPGA ~10x
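The "net" figures on this slide follow from dividing raw area by the usable fraction (MPGA) or by the gates delivered per CLB (FPGA). The sketch below redoes that arithmetic; the function name `net_area_per_gate` is hypothetical, and treating net area as a simple division is an assumption about how the slide's numbers were derived. With these inputs the MPGA upper bound lands nearer 14Kλ² than the slide's 17Kλ², so the slide may have allowed for a slightly lower usable fraction.

```python
def net_area_per_gate(raw_area, usable_fraction=1.0, gates_per_cell=1.0):
    """Area (in lambda^2) actually spent per usable gate."""
    return raw_area / (usable_fraction * gates_per_cell)


# MPGA: 5K lambda^2 per raw gate, 35-70% of gates usable (50% typical).
mpga_best = net_area_per_gate(5e3, usable_fraction=0.70)      # about 7.1K
mpga_worst = net_area_per_gate(5e3, usable_fraction=0.35)     # about 14.3K
mpga_typical = net_area_per_gate(5e3, usable_fraction=0.50)   # 10K

# XC4K: 1.25M lambda^2 per CLB, 17-48 gates per CLB (26 typical).
fpga_best = net_area_per_gate(1.25e6, gates_per_cell=48)      # about 26K
fpga_worst = net_area_per_gate(1.25e6, gates_per_cell=17)     # about 73.5K
fpga_typical = net_area_per_gate(1.25e6, gates_per_cell=26)   # about 48K

typical_ratio = fpga_typical / mpga_typical
print(f"MPGA net: {mpga_best:.0f} to {mpga_worst:.0f} lambda^2/gate")
print(f"FPGA net: {fpga_best:.0f} to {fpga_worst:.0f} lambda^2/gate")
print(f"typical FPGA/MPGA area ratio: {typical_ratio:.1f}")
```

The typical-case ratio comes out close to the slide's parenthesized 5.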
![Page 20: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/20.jpg)
MPGA vs. FPGA
• MPGA (SOG GA)
  – gd ~1ns
• Xilinx XC4K
  – gates in 7ns
  – 2-3 gates typical
• Ratio: 1--7 (2.5)
![Page 21: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/21.jpg)
Processors vs. FPGAs
![Page 22: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/22.jpg)
Processors and FPGAs
![Page 23: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/23.jpg)
Component Example
• XC4085XL-09
  – 3,136 CLBs
  – 4.6ns
  – 682 Bit Ops/ns
• Alpha 1996
  – 2 64b ALUs
  – 2.3ns
  – 55.7 Bit Ops/ns
• Single die in 0.35µm
[1 “bit op” = 2 gate evaluations]
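The Bit Ops/ns figures are peak bit operations per cycle divided by cycle time. A minimal sketch of that arithmetic follows, assuming one bit op per CLB per cycle for the FPGA and one bit op per ALU bit per cycle for the processor (both are assumptions that happen to reproduce the slide's numbers); `bit_ops_per_ns` is a hypothetical helper.

```python
def bit_ops_per_ns(bit_ops_per_cycle, cycle_ns):
    """Peak bit operations delivered per nanosecond."""
    return bit_ops_per_cycle / cycle_ns


fpga = bit_ops_per_ns(3136, 4.6)      # XC4085XL-09: assumed one bit op per CLB per cycle
alpha = bit_ops_per_ns(2 * 64, 2.3)   # Alpha: two 64-bit ALUs, one bit op per ALU bit

print(f"FPGA:  {fpga:.1f} bit ops/ns")
print(f"Alpha: {alpha:.1f} bit ops/ns")
print(f"ratio: {fpga / alpha:.1f}x")
```

Since both parts are single dies in the same process generation, the ratio of a bit more than 10x is a rough peak-density comparison rather than a measured application speedup.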
![Page 24: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/24.jpg)
Processors and FPGAs
![Page 25: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/25.jpg)
Raw Density Summary
• Area
  – MPGA 2-3x Custom
  – FPGA 5x MPGA
• Area-Time
  – Gate Array 6-10x Custom
  – FPGA 15-20x Gate Array
  – Processor 10x FPGA
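The area-time ratios are each stated relative to the next denser technology, so multiplying them gives a cumulative cost relative to custom. The sketch below simply chains the ranges from this slide; treating the ratios as independent multiplicative factors is an assumption, and the variable names are illustrative.

```python
# (technology, low ratio, high ratio) relative to the previous entry, from the slide.
steps = [
    ("Gate Array", 6, 10),   # vs. Custom
    ("FPGA", 15, 20),        # vs. Gate Array
    ("Processor", 10, 10),   # vs. FPGA
]

low, high = 1, 1
cumulative = {"Custom": (1, 1)}
for name, step_low, step_high in steps:
    # Chain each step onto the running product to express it relative to Custom.
    low, high = low * step_low, high * step_high
    cumulative[name] = (low, high)

for name, (lo_x, hi_x) in cumulative.items():
    print(f"{name:>10}: {lo_x}x to {hi_x}x Custom area-time")
```

The chained ranges are order-of-magnitude estimates only, which is how the lecture uses them.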
![Page 26: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/26.jpg)
Raw Density Caveats
• Processor/FPGA may solve more specialized problem
• Problems have different resource balance requirements
  – …can lead to low yield of raw density
![Page 27: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/27.jpg)
Task Comparisons
![Page 28: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/28.jpg)
Broadening Picture
• Compare larger computations
• For comparison
  – throughput density metric: results/area-time
    • normalize out area-time point selection
    • high throughput density
      – most in fixed area
      – least area to satisfy fixed throughput target
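To make the metric concrete, here is a small sketch of throughput density and of the "least area for a fixed throughput target" reading. All numbers and names (`throughput_density`, `area_for_throughput`, the three example designs) are hypothetical and chosen only for illustration.

```python
def throughput_density(results, area, time):
    """Results produced per unit area per unit time."""
    return results / (area * time)


def area_for_throughput(target_results_per_time, density):
    """Area needed to sustain a target throughput at a given density."""
    return target_results_per_time / density


# Two design points of the same architecture: 4x the area, 4x the results.
small = throughput_density(results=10, area=100.0, time=1.0)
large = throughput_density(results=40, area=400.0, time=1.0)

# A denser architecture: the larger design's throughput in the smaller design's area.
dense = throughput_density(results=40, area=100.0, time=1.0)

print(small == large)                  # the design-point choice normalizes out
print(area_for_throughput(20, small))  # area the first architecture needs
print(area_for_throughput(20, dense))  # the denser one needs a quarter of that
```

The first two designs score the same, which is the sense in which the metric normalizes out the choice of area-time point.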
![Page 29: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/29.jpg)
Multiply
![Page 30: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/30.jpg)
Example: FIR Filtering
Application metric: TAPs = filter taps multiply accumulate
$Y_i = w_1 x_i + w_2 x_{i+1} + \cdots$
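As a reference point for what one "TAP" costs, the filter equation can be written directly as a sum of multiply-accumulates. This is a plain software sketch using the slide's indexing (each output looks forward in the input, with weights numbered from zero here); the function name `fir` and the sample values are illustrative.

```python
def fir(x, w):
    """Apply an FIR filter: y[i] = w[0]*x[i] + w[1]*x[i+1] + ..."""
    taps = len(w)
    return [
        sum(w[k] * x[i + k] for k in range(taps))   # one multiply-accumulate per tap
        for i in range(len(x) - taps + 1)           # only positions with a full window
    ]


x = [1, 2, 3, 4]
w = [1, 2]
y = fir(x, w)
print(y)                    # [5, 8, 11]
print(len(y) * len(w))      # multiply-accumulates performed: 6
```

Each output sample costs one multiply-accumulate per tap, which is why taps processed per unit area-time works as an application-level throughput metric.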
![Page 31: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/31.jpg)
IIR/Biquad
Simplest IIR: $Y_i = A X_i + B Y_{i-1}$
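The recurrence can be sketched in a few lines. Unlike the FIR case, each output depends on the previous output, so the loop carries state from one sample to the next. The function name `iir_first_order` and the zero initial condition are assumptions for illustration.

```python
def iir_first_order(x, a, b, y_prev=0.0):
    """First-order IIR: y[i] = a*x[i] + b*y[i-1], with y[-1] = y_prev."""
    y = []
    for xi in x:
        y_prev = a * xi + b * y_prev   # feedback term uses the previous output
        y.append(y_prev)
    return y


# Impulse response with a=1, b=0.5: each output is half the previous one.
impulse = iir_first_order([1.0, 0.0, 0.0, 0.0], a=1.0, b=0.5)
print(impulse)   # [1.0, 0.5, 0.25, 0.125]
```

The feedback dependence is what makes IIR filters harder to pipeline than FIR filters: output i cannot start until output i-1 is available.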
![Page 32: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/32.jpg)
DES Keysearch
<http://www.cs.berkeley.edu/~iang/isaac/hardware/>
![Page 33: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/33.jpg)
DNA Sequence Match
• Problem: “cost” of transform S1 → S2
• Given: cost of insertion, deletion, substitution
• Relevance: similarity of DNA sequences
  – evolutionary similarity
  – structure predict function
• Typically: new sequence compared to large database
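The transform cost described here is commonly computed with a dynamic-programming edit distance. The sketch below is that standard formulation, not necessarily the implementation the lecture compares; the function name `transform_cost` and the unit default costs are assumptions.

```python
def transform_cost(s1, s2, ins=1, dele=1, sub=1):
    """Minimum total cost of insertions, deletions, and substitutions turning s1 into s2."""
    # prev[j] = cost of turning the first i-1 characters of s1 into the first j of s2.
    prev = [j * ins for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        curr = [i * dele]                                  # delete all i characters so far
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + dele,                            # delete c1
                curr[j - 1] + ins,                         # insert c2
                prev[j - 1] + (0 if c1 == c2 else sub),    # match or substitute
            ))
        prev = curr
    return prev[-1]


print(transform_cost("GATTACA", "GATTACA"))   # 0
print(transform_cost("ACGT", "AGT"))          # 1 (delete the C)
```

Each table cell depends only on its three neighbors, so the cells along an anti-diagonal are independent of each other. That regular, fine-grained parallelism is what makes the problem a natural fit for spatial hardware.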
![Page 34: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/34.jpg)
DNA Sequence Match
![Page 35: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/35.jpg)
Floating-Point Add (single prec.)
![Page 36: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/36.jpg)
Floating-Point Mpy (single prec.)
![Page 37: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/37.jpg)
Degrade from Peak
![Page 38: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/38.jpg)
Degrade from Peak: FPGAs
• Long path length: not run at cycle
• Limited throughput requirement
  – bottlenecks elsewhere limit throughput req.
• Insufficient interconnect
• Insufficient retiming resources (bandwidth)
![Page 39: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/39.jpg)
Degrade from Peak: Processors
• Ops w/ no gate evaluations (interconnect)
• Ops use limited word width
• Stalls waiting for retimed data
![Page 40: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/40.jpg)
Degrade from Peak: Custom/MPGA
• Solve more general problem than required
  – (more gates than really need)
• Long path length
• Limited throughput requirement
• Not needed or applicable to a problem
![Page 41: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/41.jpg)
Degrade Notes
• We’ll cover these issues in more detail as we get into them later in the course
![Page 42: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/42.jpg)
Big Ideas [MSB Ideas]
• Raw densities: custom:ga:fpga:processor
  – 1:5:100:1000
  – close gap with specialization
![Page 43: CS184a: Computer Architecture (Structure and Organization)](https://reader036.vdocument.in/reader036/viewer/2022081603/568149ae550346895db6ebac/html5/thumbnails/43.jpg)
Big Ideas [MSB-1 Ideas]
• Special-purpose functional units in processors/DSPs give much lower net benefit, since they still need control and interconnect