18 643 lecture 2: basic fpga fabricusers.ece.cmu.edu/~jhoe/course/ece643/f17handouts/l02.pdf ·...

31
18643F17L02S1, James C. Hoe, CMU/ECE/CALCM, ©2017 18643 Lecture 2: Basic FPGA Fabric James C. Hoe Department of ECE Carnegie Mellon University

Upload: vokien

Post on 28-Aug-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S1, James C. Hoe, CMU/ECE/CALCM, ©2017

18‐643 Lecture 2:Basic FPGA Fabric

James C. HoeDepartment of ECE

Carnegie Mellon University

Page 2: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S2, James C. Hoe, CMU/ECE/CALCM, ©2017

Housekeeping• Your goal today: know enough to build a basic FPGA (even if not a very good one)

• Notices– Complete survey on Blackboard, due noon, 9/5– Handout #2: lab 0, due noon, 9/8– Make friends, make teams, due noon, 9/8

• Readings– Ch 1, Reconfigurable Computing– skim Ch 13&14 if interested– skim databooks referenced for more details

Page 3: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S3, James C. Hoe, CMU/ECE/CALCM, ©2017

Before FPGA there was GA

VCC

GND

A

(AB)’

B X Y

(X+Y)’

Page 4: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S4, James C. Hoe, CMU/ECE/CALCM, ©2017

Idea behind Gate Arrays• Mass produce identical gate array wafers• Finish into any design by custom metal layers (2)

– so called Mask‐Programmable GA (MPGA)– reduced design effort (more automation, no layout)– reduced mask and fab cost– faster fab turn‐around

• Proliferation of ASIC design starts– don’t need volume for economy of scale– small design team could keep up with Moore’s law

Of course, not as efficient as full‐custom or standard‐cell designs

Page 5: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S5, James C. Hoe, CMU/ECE/CALCM, ©2017

How about no mask, no fab?i.e., “field programmable”

• Again, mass produce identical devices but this time fully‐finalized

• Then what can be changed?– SRAM EPROM (anti)fuse

– pass gate mux  diode

{1,0}{1,0} {1,0}

{1,0}

A B

{1,0}

CAB A B

bits

conn

ectio

ns

programmable vs reprogrammable

Page 6: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S6, James C. Hoe, CMU/ECE/CALCM, ©2017

Reconfigurable Logic• Arbitrary logic (combinational and sequential) can be formed by wiring up enough NANDs or muxes

• Lookup table as universal logic primitive– arbitrary n‐input functionfrom 2n‐entry table

– this is 8‐by‐1 bit “memory”

f(…,1,…)

f(…,0,…)f(…,X,…)

X

01Shannon expansion

f(A,B,C)

f(0,0,0)f(0,0,1)∙···

f(1,1,0)f(1,1,1)

A B C

Page 7: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S7, James C. Hoe, CMU/ECE/CALCM, ©2017

Size of Lookup Tables (aka LUTs)• n‐input function from 2n‐entry LUT• Count only the 6T SRAM cells, an n‐LUT has 6∙2n T• Some points of reference

– 2‐input NAND = 4T– 3‐input NAND = 6T– 3‐input full‐adder (a, b, cin)

• s = a  b  cin = 8T• cout = bcin+acin+ab =18T

– 10‐input 5‐bit adder = 130T– M.S. flip‐flop=16T (compare to 2 LUTs per latch)

n‐LUT T‐count2 243 484 965 1926 3847 7688 15369 307210 6144

Page 8: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S8, James C. Hoe, CMU/ECE/CALCM, ©2017

Choosing LUT Granularity• Small LUTs

+ fast propagation delay– a given fxn consumes many LUTs (comes with wiring cost and delay)

– high “interpretation overhead” if too small• Big LUTs

– slower propagation delay+  a given fxn consumes fewer LUTs– high “interpretation overhead” if too large (and fxn has exploitable structure)

– wastage if not all input are used in a LUTWhere is the sweetspot?

Page 9: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S9, James C. Hoe, CMU/ECE/CALCM, ©2017

1980’s Xilinx LUT‐based Configurable Logic Block (in a sketch)

• 2 fxns (f & g) of 3 inputs OR 1 fxn (h) of 4 inputs• hardwired FFs (too expensive to fake with LUTs)• Just 10s of these in the earliest FPGAs

3‐LUT

3‐LUT

ABC

g(A,B,C)

h(A,B,C,D)

f(A,B,C)

D

FF

X

Y

{2,1,0}

{2,1,0}

{1,0} (also latch mode)

Page 10: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S10, James C. Hoe, CMU/ECE/CALCM, ©2017

Contemporary Xilinx CLB Architecture

[Figure 2‐3: 7 Series FPGAs CLB User Guide]

• each 6LUT is two 5LUTs

• LUTs can also be used as small SRAMs or shift register

• special pathsfor addition and multiplexer

2 slices per CLB

Largest devices(many $K each)

have several100K slices

Page 11: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S11, James C. Hoe, CMU/ECE/CALCM, ©2017

Even Coarser Logic Blocks?• So called Coarse‐Grain Reconfigurable Arrays (CGRAs) based on complete adders or ALUs– native arithmetic units have low interpretation overhead if you are doing arithmetic

– poor fit if you are working with narrow data or bit‐level manipulations

• Even coarser is to use many tiny processors– still a spatial computing paradigm– not programmed with RTLs– converging with software multicores

More on this later in term

Page 12: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S12, James C. Hoe, CMU/ECE/CALCM, ©2017

Mapping Logic To LUTs

[Figure 13.1: “Reconfigurable Computing: The Theory and Practice of FPGA‐Based Computation”]

• Start from primary output and input to registers, cover logic graph with cuts of less than K input edges

• K‐cuts corresponding to K‐LUT realizable functions

Page 13: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S13, James C. Hoe, CMU/ECE/CALCM, ©2017

Placement

[Vivado Implementation Screenshot]

Page 14: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S14, James C. Hoe, CMU/ECE/CALCM, ©2017

… and Route

[Vivado Implementation Screenshot]

Page 15: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S15, James C. Hoe, CMU/ECE/CALCM, ©2017

PLA‐style Configurable Routing 

I0 I1 In‐1 O0 O1 Om‐1

AND OR

? ? ? ?

?

?

Page 16: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S16, James C. Hoe, CMU/ECE/CALCM, ©2017

Island Style Routing Architecture• CLB islands in sea of interconnects• Flexible routing to support ASIC style netlists• Note regularity in structure

C

C

C

C

C

C

C

C

C

C

C

C

C

C

C

C

C

C

C

C

C C C C C

Page 17: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S17, James C. Hoe, CMU/ECE/CALCM, ©2017

Configurable Routing (1980s Xilinx simplified)

CLB

Switch Block

Connection Block

B

Page 18: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S18, James C. Hoe, CMU/ECE/CALCM, ©2017

Reconfigurable Routing is Expensive!• Routing resource area is on par with logic • Each configurable connection is

– area of configuration bit– area of configurable connection

And don’t forget propagation delay• Too much: cost for everyone who doesn’t need it• Too little: congestion leaves unreachable CLBs unused – worse for larger arrays/designs (why?)– buy a $10K FPGA and only get to use 70%?

Page 19: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S19, James C. Hoe, CMU/ECE/CALCM, ©2017

Rent’s Rule• Tgp

– T = number of inputs and outputs– g = number of internal components– p typically between 0.5 (regular) and 0.8 (random)

• In a square, perimeter=4area0.5

– unless regular, I/O signals grow faster than available routes exiting a design area

• Need hierarchy of progressively longer additional routing resources

long routes also reduce delay when going far

Page 20: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S20, James C. Hoe, CMU/ECE/CALCM, ©2017

Virtex‐II Routing Architecture• 8 internal “fast connects” from LUT to LUT within a CLB

• Switch Matrix connects to– 1 CLB– 16 “Direct Connections” to 8 nearest switches– “Double Lines” to 1st or 2nd switch (40 track)– “Hex lines” to 3rd or 6th switch (120 per track)– “Long lines” spanning device (24 per track)

• Later architectures extended in reach and in diagonals

Separate, dedicated clock trees

Page 21: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S21, James C. Hoe, CMU/ECE/CALCM, ©2017

Virtex‐II Routing Architecture

[Figure 48: Virtex‐II Platform FPGAs: Complete Data Sheet]

Page 22: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S22, James C. Hoe, CMU/ECE/CALCM, ©2017

Virtex‐II Routing Architecture

[Figure 49: Virtex‐II Platform FPGAs: Complete Data Sheet]

Page 23: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S23, James C. Hoe, CMU/ECE/CALCM, ©2017

Latest in Routing HierarchyVirtex7 Stacked Silicon Interconnect (SSI), 2011

• Longest routes go across dies carried on interposer• No change to design tool and abstraction

[Figure 1, Stacked & Loaded: Xilinx SSI, 28‐Gbps I/O Yield Amazing FPGAs, Xcell, Q1 2011]

Page 24: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S24, James C. Hoe, CMU/ECE/CALCM, ©2017

Altera Stratix‐X HyperFlex

[Figure 2: Understanding How the New HyperFlex Architecture Enables Next‐Generation High‐Performance Systems]

• Long routes need buffered repeaters; very long routes need pipelining

• Add (bypassable) pipeline registers throughout• RTL designs have to be

pipelined explicitly tobenefit; high‐level synthesized designs leverage directly

• a high‐freq strategy e.g., 0.5xlogic at 2xfreq for perf. parity

Page 25: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S25, James C. Hoe, CMU/ECE/CALCM, ©2017

Don’t Forget Configurable I/O

FF

{1,0}

{1,0}{1,0}In/Out

Dout

Din

{1,0}

{1,0}

PAD

I/O Block

{fast,slow}

‐ real devices more complicated‐modern devices support special signaling and protocols

Page 26: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S26, James C. Hoe, CMU/ECE/CALCM, ©2017

Putting it all together:An Universal ASIC

I

I/O pins

programmable lookup tables (LUT) and flip‐flops (FF)

aka “soft logic” or “fabric”

Intercon

nect

LUT FF

programmable routing

Page 27: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S27, James C. Hoe, CMU/ECE/CALCM, ©2017

Bitstream defines the chip• After power up, SRAM FPGA loads bitstream from somewhere before becoming the “chip”

a bonus “feature” for sensitive devices that need to forget what it does

• Many built‐in loading options• Non‐trivial amount of time; must control reset timing and sequence with the rest of the system

• Reverse‐engineering concerns ameliorated by– encryption– proprietary knowledge

Page 28: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S28, James C. Hoe, CMU/ECE/CALCM, ©2017

Setting Configuration Bits• Behind‐the‐scene infrastructure

– doesn’t need to be fast (usually offline)– simpler/cheaper the better (at least used to be)

• Could organize bits into addressable SRAM or EPROM array– very basic technology– serial external interfaceto save on I/O pins ro

w

column

Page 29: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S29, James C. Hoe, CMU/ECE/CALCM, ©2017

Serial Scan• SRAM‐based config. bits can be setup as one or many scan‐chains on very slow config. clock– no addressing overhead– all minimum sized devices

• At power‐up config manager handshake externally (various options, serial, parallel ROM, PCI‐E, . . .)  

φ1

φ1’

φ1’

φ1

φ1

φ1’

φ1’

φ1

Full‐fledged config. “architecture” in modern devices to support scale and features

Page 30: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S30, James C. Hoe, CMU/ECE/CALCM, ©2017

Modern Configuration Architecturee.g., Stratix® 10 Secure Device Manager

[Figure 2: “Intel® Stratix® 10 Secure Device Manager Provides Best‐in‐Class FPGA and SoC Security”]

triple‐redundantsecure processor

each sectormanaged byits ownprocessor

Page 31: 18 643 Lecture 2: Basic FPGA Fabricusers.ece.cmu.edu/~jhoe/course/ece643/F17handouts/L02.pdf · –Ch 1, Reconfigurable Computing ... The Theory and Practice of FPGA‐Based Computation”]

18‐643‐F17‐L02‐S31, James C. Hoe, CMU/ECE/CALCM, ©2017

Parting Thoughts• Today, you can use an FPGA without knowing any of this stuff

• Basing this lecture on Xilinx vs Altera wouldn’t change the big picture

• You can find a lot of specific details on‐line(databooks and research papers)

• So far still just the basic fabric . . . .  . . . more next time

• Also didn’t say anything about design tools