CS184a: Computer Architecture (Structure and Organization)

Caltech CS184 Winter2003 -- DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 8: January 27, 2003 Empirical Cost Comparisons


CS184a: Computer Architecture (Structure and Organization)

Day 8: January 27, 2003

Empirical Cost Comparisons


Previously

• Instruction Design Space
  – Define
  – Area and Yield Model


Today

• Empirical Data
  – Processors
  – FPGAs
  – Custom
    • Gate Array
    • Std. Cell
    • Full


Empirical Comparisons


Empirical

• Ground modeling in some concrete data

• Start sorting out
  – custom vs. configurable
  – spatial configurable vs. temporal


MPGA vs. Custom?

• AMI CICC’83
  – MPGA 1.0
  – Std-Cell 0.7
  – Custom 0.5

• Toshiba DSP
  – Custom 0.3

• Mosaid RAM
  – Custom 0.2

• GE CICC’86
  – MPGA 1.0
  – Std-Cell 0.4--0.7
    • FF/counter 0.7
    • FullAdder 0.4
    • RAM 0.2

MPGA = Metal Programmable Gate Array (traditional Gate Array)


Metal Programmable Gate Arrays


MPGAs

• Modern -- “Sea of Gates”

• yield 35--70%

• maybe 5Kλ²/gate?
  – (quite a bit of variance)


FPGA Table


Modern FPGAs

• APEX 20K1500E
  – 52K LEs
  – 0.18 µm
  – 24 mm × 22 mm
  – 1.25Mλ²/LE

• XC2V1000
  – 10.44 mm × 9.90 mm [source: Chipworks]
  – 0.15 µm
  – 11,520 4-LUTs
  – 1.5Mλ²/4-LUT

[Both also have RAM in cited area]
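The per-cell area figures on this slide can be reproduced from the die dimensions, feature size, and cell counts it quotes. The sketch below assumes the usual convention that λ is half the drawn feature size; the function name `lambda_sq_per_cell` is made up for illustration. Under that assumption the APEX number comes out near the quoted 1.25Mλ²/LE, while the XC2V1000 lands slightly above the rounded figure on the slide.

```python
def lambda_sq_per_cell(width_mm, height_mm, feature_um, cells):
    """Die area per logic cell, in units of lambda^2.

    Assumption (not stated on the slide): lambda = feature size / 2.
    The die area includes RAM, as the slide notes.
    """
    lam_um = feature_um / 2.0
    die_um2 = (width_mm * 1000.0) * (height_mm * 1000.0)  # mm -> um on each side
    return die_um2 / (lam_um ** 2) / cells


# Numbers taken from the slide.
apex = lambda_sq_per_cell(24, 22, 0.18, 52_000)
virtex2 = lambda_sq_per_cell(10.44, 9.90, 0.15, 11_520)

print(f"APEX 20K1500E: {apex / 1e6:.2f} M lambda^2 per LE")
print(f"XC2V1000:      {virtex2 / 1e6:.2f} M lambda^2 per 4-LUT")
```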


Conventional FPGA Tile

K-LUT (typical k=4) w/ optional output Flip-Flop


Toronto FPGA Model


How many gates?


“gates” in 2-LUT


Now how many?


Which gives: More usable gates?

More gates/unit area?


Gates Required?

Depth=3, Depth=2048?


Gate metric for FPGAs?

• Day5: several components for computations
  – compute element
  – interconnect:
    • space
    • time
  – instructions

• Not all applications need in same balance

• Assigning a single “capacity” number to device is an oversimplification


MPGA vs. FPGA

• MPGA (SOG GA)
  – 5Kλ²/gate
  – 35-70% usable (50%)
  – 7-17Kλ²/gate net

• Xilinx XC4K
  – 1.25Mλ²/CLB
  – 17--48 gates (26?)
  – 26-73Kλ²/gate net

• Ratio: 2--10 (5)

Adding ~2x Custom/MPGA, Custom/FPGA ~10x
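The net area-per-gate figures above follow from dividing raw area by the number of gates actually usable. A minimal sketch of that arithmetic, using only the numbers on the slide (the constant and function names are illustrative, not from the lecture):

```python
# Values quoted on the slide, in lambda^2.
MPGA_AREA_PER_GATE = 5e3       # raw sea-of-gates area per gate
MPGA_USABLE_TYPICAL = 0.50     # typical usable fraction (slide range: 35-70%)
FPGA_AREA_PER_CLB = 1.25e6     # XC4K area per CLB
FPGA_GATES_PER_CLB = {"low": 17, "typical": 26, "high": 48}


def net_area_per_gate(raw_area, usable_gates):
    """Area charged to each gate that is actually used."""
    return raw_area / usable_gates


mpga_typical = net_area_per_gate(MPGA_AREA_PER_GATE, MPGA_USABLE_TYPICAL)
fpga_best = net_area_per_gate(FPGA_AREA_PER_CLB, FPGA_GATES_PER_CLB["high"])
fpga_worst = net_area_per_gate(FPGA_AREA_PER_CLB, FPGA_GATES_PER_CLB["low"])
fpga_typical = net_area_per_gate(FPGA_AREA_PER_CLB, FPGA_GATES_PER_CLB["typical"])

typical_ratio = fpga_typical / mpga_typical

print(f"MPGA typical net: {mpga_typical / 1e3:.0f}K lambda^2/gate")
print(f"FPGA net range:   {fpga_best / 1e3:.0f}K to {fpga_worst / 1e3:.1f}K lambda^2/gate")
print(f"Typical FPGA/MPGA area ratio: {typical_ratio:.1f}")
```

With the typical values (26 gates per CLB, 50% usable) the ratio comes out close to the 5x quoted in parentheses on the slide.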


MPGA vs. FPGA

• MPGA (SOG GA)
  – gate delay (gd) ~1 ns

• Xilinx XC4K
  – 1-7 gates in 7 ns
  – 2-3 gates typical

• Ratio: 1--7 (2.5)


Processors vs. FPGAs


Processors and FPGAs


Component Example

XC4085XL-09: 3,136 CLBs, 4.6 ns cycle
  – 682 Bit Ops/ns

Alpha 1996: 2 64b ALUs, 2.3 ns cycle
  – 55.7 Bit Ops/ns

• Single die in 0.35 µm

[1 “bit op” = 2 gate evaluations]
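Both peak figures on this slide are simply bit operations per cycle divided by cycle time. The sketch below assumes one bit op per CLB per cycle for the FPGA and reads the Alpha entry as two 64-bit ALUs; those are the interpretations that reproduce the 682 and 55.7 numbers, not statements taken from the slide itself.

```python
def bit_ops_per_ns(bit_ops_per_cycle, cycle_ns):
    """Peak computational capacity: bit operations delivered per nanosecond."""
    return bit_ops_per_cycle / cycle_ns


# Assumption: each of the 3,136 CLBs contributes one bit op per 4.6 ns cycle.
fpga_peak = bit_ops_per_ns(3136, 4.6)

# Assumption: two 64-bit ALUs, each producing 64 bit ops per 2.3 ns cycle.
alpha_peak = bit_ops_per_ns(2 * 64, 2.3)

print(f"XC4085XL-09: {fpga_peak:.0f} bit ops/ns")
print(f"Alpha 1996:  {alpha_peak:.1f} bit ops/ns")
print(f"FPGA/processor peak ratio: {fpga_peak / alpha_peak:.1f}")
```

Since both parts are quoted as a single die in the same process, the ratio of the two peaks is a rough raw-density comparison; it says nothing yet about how much of that peak a real task achieves.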


Processors and FPGAs


Raw Density Summary

• Area
  – MPGA 2-3x Custom
  – FPGA 5x MPGA

• Area-Time
  – Gate Array 6-10x Custom
  – FPGA 15-20x Gate Array
  – Processor 10x FPGA


Raw Density Caveats

• Processor/FPGA may solve more specialized problem

• Problems have different resource balance requirements
  – …can lead to low yield of raw density


Task Comparisons


Broadening Picture

• Compare larger computations

• For comparison
  – throughput density metric: results/area-time
    • normalize out area-time point selection
    • high throughput density
      – most in fixed area
      – least area to satisfy fixed throughput target
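The throughput density metric can be sketched in a few lines. The `Implementation` class, its fields, and the two example designs are hypothetical, and the area calculation assumes throughput scales linearly when a design is replicated.

```python
from dataclasses import dataclass


@dataclass
class Implementation:
    name: str
    area: float             # any consistent unit, e.g. lambda^2
    time_per_result: float  # ns between successive results

    def throughput_density(self):
        """Results per unit area per unit time (results / area-time)."""
        return 1.0 / (self.area * self.time_per_result)


def area_for_target(impl, results_per_ns):
    """Area needed to meet a throughput target, assuming linear replication."""
    return results_per_ns / impl.throughput_density()


# Hypothetical designs: same task, different area-time points.
small_slow = Implementation("small_slow", area=100.0, time_per_result=8.0)
big_fast = Implementation("big_fast", area=500.0, time_per_result=1.0)

for impl in (small_slow, big_fast):
    print(impl.name, impl.throughput_density(), area_for_target(impl, 1.0))
```

Here the smaller, slower design has the higher throughput density, so it delivers more results in a fixed area and needs less area to hit the same target, which is the sense in which the metric normalizes out the choice of area-time point.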


Multiply


Example: FIR Filtering

Application metric: TAPs = filter taps multiply accumulate

$Y_i = w_1 x_i + w_2 x_{i+1} + \dots$
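As a reference point for what one "TAP" of work is, here is a minimal software sketch of the FIR sum above, with one multiply-accumulate per tap per output. The function name and the choice to produce outputs only where the full window fits are illustrative assumptions.

```python
def fir_filter(w, x):
    """Compute Y_i = w[0]*x[i] + w[1]*x[i+1] + ... for every full window of x.

    Each output costs len(w) multiply-accumulates (one per filter tap).
    """
    taps = len(w)
    return [
        sum(w[k] * x[i + k] for k in range(taps))
        for i in range(len(x) - taps + 1)
    ]


weights = [1, 2]
samples = [1, 2, 3, 4]
outputs = fir_filter(weights, samples)
tap_count = len(weights) * len(outputs)  # total multiply-accumulates performed

print(outputs, tap_count)
```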


IIR/Biquad

Simplest IIR: $Y_i = A X_i + B Y_{i-1}$
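The recurrence above feeds each output back into the next one, which is what distinguishes it from the FIR case. A minimal sketch, assuming an initial state of zero (the slide does not specify one) and using an illustrative function name:

```python
def iir_first_order(a, b, x, y_init=0.0):
    """Apply Y_i = a*X_i + b*Y_{i-1} over the input sequence x.

    y_init stands in for Y_{-1}; zero is an assumption, not from the slide.
    """
    y_prev = y_init
    out = []
    for xi in x:
        y_prev = a * xi + b * y_prev  # each output depends on the previous one
        out.append(y_prev)
    return out


impulse_response = iir_first_order(1.0, 0.5, [1.0, 0.0, 0.0, 0.0])
print(impulse_response)
```

The feedback term means successive outputs cannot be computed independently, so the loop-carried dependence limits how far the computation can be pipelined without restructuring.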


DES Keysearch

<http://www.cs.berkeley.edu/~iang/isaac/hardware/>


DNA Sequence Match

• Problem: “cost” of transform S1 → S2

• Given: cost of insertion, deletion, substitution

• Relevance: similarity of DNA sequences
  – evolutionary similarity
  – structure predict function

• Typically: new sequence compared to large database
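The transform cost described here is commonly computed with a dynamic-programming edit distance. The sketch below is that textbook formulation with configurable insertion, deletion, and substitution costs; it is not necessarily the formulation used in the implementations the lecture compares, and the function name is illustrative.

```python
def transform_cost(s1, s2, ins=1, dele=1, sub=1):
    """Minimum total cost of edits that turn s1 into s2.

    Row-by-row dynamic programming: prev[j] holds the cost of
    transforming the first i-1 characters of s1 into the first j of s2.
    """
    prev = [j * ins for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        curr = [i * dele]  # turning s1[:i] into the empty string
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + dele,                          # delete c1
                curr[j - 1] + ins,                       # insert c2
                prev[j - 1] + (0 if c1 == c2 else sub),  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(transform_cost("ACGT", "AGT"))
print(transform_cost("kitten", "sitting"))
```

Each table cell depends only on its three neighbors, so cells along an anti-diagonal are independent of each other. That regular, local structure is what makes the problem a natural candidate for spatial hardware.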


DNA Sequence Match


Floating-Point Add (single prec.)


Floating-Point Mpy (single prec.)


Degrade from Peak


Degrade from Peak: FPGAs

• Long path length, so not run at peak cycle rate

• Limited throughput requirement
  – bottlenecks elsewhere limit throughput req.

• Insufficient interconnect

• Insufficient retiming resources (bandwidth)


Degrade from Peak: Processors

• Ops w/ no gate evaluations (interconnect)

• Ops use limited word width

• Stalls waiting for retimed data


Degrade from Peak: Custom/MPGA

• Solve more general problem than required
  – (more gates than really needed)

• Long path length

• Limited throughput requirement

• Not needed or applicable to a problem


Degrade Notes

• We’ll cover these issues in more detail as we get into them later in the course


Big Ideas [MSB Ideas]

• Raw densities: custom:ga:fpga:processor
  – 1:5:100:1000
  – close gap with specialization
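The cumulative 1:5:100:1000 ratio can be unpacked into the step between each adjacent pair of technologies. This sketch treats the numbers as relative area-time cost with custom as the baseline, which is an interpretation of the slide rather than something it states; the dictionary keys are just labels.

```python
# Cumulative ratio from the slide, custom = 1 (read here as relative area-time cost).
relative_cost = {"custom": 1, "ga": 5, "fpga": 100, "processor": 1000}

names = list(relative_cost)
# Factor between each technology and the one before it in the list.
step_factor = {
    f"{later}/{earlier}": relative_cost[later] / relative_cost[earlier]
    for earlier, later in zip(names, names[1:])
}

for pair, factor in step_factor.items():
    print(f"{pair}: {factor:.0f}x")
```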


Big Ideas [MSB-1 Ideas]

• Special-purpose functional units in processors/DSPs: much lower net benefit, since they still need to be controlled and interconnected