l2: fpga hardware process/cmu slides-f16/l02...spartan-2 and more recent have different length...

18-545: ADVANCED DIGITAL DESIGN PROJECT FALL 2016 BRANDON LUCIA L2: FPGA HARDWARE


Post on 25-Mar-2020


Page 1:

18-545: ADVANCED DIGITAL DESIGN PROJECT

FALL 2016

BRANDON LUCIA

L2: FPGA HARDWARE

Page 2:

18-545: FALL 2016

Admin stuff

Project Proposals happen on Monday

Be prepared to give an in-class presentation

Lab 1 is due Wednesday, Sept. 14th

Reading Assignment #1 due today

Submit a PDF/text file, don't fill in the web form

Team assignments are done


Page 3:

Admin Stuff

Status reports due today

No Word docs, please!

Be specific about what happened/is going to happen

Talk about what YOU did/will do, not just what your group did

Grades on the way, as general feedback


Page 4:

Game Plan

Overview

Why use FPGAs?

FPGA Internals


Caveat: I will use Xilinx-specific terminology, since that’s the FPGA company you will be using. Beware that other companies use different terms.

Page 5:

FPGA Overview

Field Programmable Gate Array

Array of generic logic gates

Gates where logic function can be programmed

Programmable interconnection between gates

Fielded systems can be programmed, i.e. post-fabrication

Page 6:

Xilinx Virtex-5 FPGA


Page 7:

Design Platform

Virtex-5 Development System

Xilinx XC5VLX110T FPGA

17280 slices of CLB goodness

256MB DDR2 (SODIMM)

DVI Video port

VGA port is for input

10/100/1000 Ethernet port

Audio Codec (AC97)

USB2 port

16x2 LCD, RS-232

Compact Flash card slot

Expansion connectors


Page 8:

Game Plan

Overview

Why use FPGAs?

FPGA Internals


Page 9:

Why use FPGAs?

System designers have a Goldilocks problem

Off-the-shelf parts are not efficient enough

Custom ASICs cost too much

Need a “just right” solution

Page 10:

ASIC Design

Difficult to design

Large and complex

Issues in advanced processes

Interconnect delay

Device leakage

Power density constraints

Expensive to design / fabricate

Mask set costs

Non-recurring engineering costs

Need a high-volume, high-profit market to justify costs!

Page 11:

Efficiency View

An efficiency gap exists between ASICs and CPUs

N. Zhang et al., “The Cost of Flexibility in Systems on a Chip Design for Signal Processing Applications”

[Figure: log-scale chart, 0.01 to 10000, of Energy Efficiency (MOPS/mW) and Area Efficiency (MOPS/mm2) for designs 1 to 20, grouped as Microprocessors, DSPs, and ASICs]

Page 12:

Economic View

FPGAs: High package costs ($300+), low NRE costs

ASICs: Low package costs (pennies), high NRE costs ($600K+)

Additional ASIC costs:

•Increasing NRE charge

•58% are late to market, which impacts total volumes shipped

•ASIC cycle longer than some market windows

•Over 50% need to be respun

Decreasing FPGA unit cost pushing crossover point to the right

[Figure: Development Cost + Device Cost versus Total Units, with an FPGA Trend line and an ASIC Trend line; the FPGA solution has a lower total cost to the left of the crossover point, the ASIC solution has a lower total cost to the right]

(Courtesy Xilinx, Inc.)
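The crossover point on this slide can be estimated with a simple linear cost model: total cost = NRE + unit cost × units. The sketch below is only an illustration. The $300 FPGA price and $600K ASIC NRE come from the slide; the zero FPGA NRE and the $0.50 ASIC unit cost are assumptions standing in for "low" and "pennies", and the function names are made up for this example.

```python
def total_cost(nre, unit_cost, units):
    """Development (NRE) cost plus per-device cost for a production run."""
    return nre + unit_cost * units


def crossover_units(fpga_nre, fpga_unit, asic_nre, asic_unit):
    """Volume at which the FPGA and ASIC total-cost lines intersect."""
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Slide figures: $300 per FPGA, $600K ASIC NRE.
# Assumed for illustration: zero FPGA NRE and a $0.50 ASIC unit cost.
FPGA_NRE, FPGA_UNIT = 0.0, 300.0
ASIC_NRE, ASIC_UNIT = 600_000.0, 0.50

breakeven = crossover_units(FPGA_NRE, FPGA_UNIT, ASIC_NRE, ASIC_UNIT)
print(f"Crossover at about {breakeven:.0f} units")
for units in (500, 5_000):
    fpga = total_cost(FPGA_NRE, FPGA_UNIT, units)
    asic = total_cost(ASIC_NRE, ASIC_UNIT, units)
    print(units, "units:", "FPGA cheaper" if fpga < asic else "ASIC cheaper")
```

Under these assumed numbers the lines cross at roughly two thousand units. Lowering the FPGA unit cost shrinks the denominator's advantage for the ASIC and moves the crossover to the right, which is the trend the slide describes.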

Page 13:

FPGA Advantages

Higher performance than CPU solution

Lower power than CPU solution (usually)

Low NRE costs

Off-the-shelf part designed by FPGA vendor

You are sharing NRE costs with all other customers

Fast design time

Low time-to-market

Fast re-design / re-fabrication time

Easy to correct an error, to add functionality, in response to spec change

Can even change product after deployment


Page 14:

FPGA Disadvantages

High per-part costs

Good for low to middle volume applications

High volume applications should consider ASICs

Perhaps use FPGA for prototyping

Lower performance than ASIC

Higher power than ASIC

More specialized design skills than programming a CPU

Page 15:

Example uses of FPGAs

Rapid Prototyping

Emulation of ASIC design

Design exploration

Shipping product

Networking

Military

Microsoft Bing Datacenters

Reconfigurable Computing

Research!

(http://parallel.princeton.edu/openpiton/)

Page 16:

Game Plan

Overview

Why use FPGAs?

FPGA Internals


Page 17:

FPGA Breakdown

3 Basic components

Configurable Logic Blocks

General purpose interconnect

I/O Blocks

Advanced components

Hard macros

CPUs

Block RAM

Multipliers

Specialized components

DSP blocks

VIRTEX-II PRO

Page 18:

XILINX XC3020

CLB (64 TOTAL)

I/O BLOCK (64 TOTAL)

GENERAL PURPOSE INTERCONNECT

SWITCH MATRIX

IOBS HAVE DIRECT ACCESS TO ADJACENT CLBS

(COURTESY XILINX, INC.)

Page 19:

ROUTING

ZOOMED IN VIEW OF THE CLB MATRIX OF THE FPGA

EVEN MORE ZOOMED IN VIEW

SPECIFIC INGRESS AND EGRESS CONNECTION OPTIONS (BLACK DOTS) ARE AVAILABLE

(COURTESY XILINX, INC.)

Page 20:

ROUTING: THE SWITCH MATRIX

EACH MATRIX HAS 5 CONNECTIONS PER SIDE

(COURTESY XILINX, INC.)

Page 21:

ROUTING: THE SWITCH MATRIX

EACH MATRIX HAS 5 CONNECTIONS PER SIDE

ONLY CERTAIN CONNECTION PATTERNS ARE POSSIBLE

(COURTESY XILINX, INC.)

Page 22:

Hierarchical Routing

Spartan-2 and more recent families have connections of different lengths between switch matrices

Local roads, limited access roads, interstate highways

Routes across entire chip don’t burn lots of short connections
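A rough way to see why mixed wire lengths help: count how many wire segments a route consumes when it has to span a given number of switch matrices. The sketch below is a toy greedy count, not a real router, and the wire lengths (1, 6, and 24 switch matrices) are hypothetical values picked for illustration rather than figures from the slide.

```python
def segments_needed(distance, wire_lengths):
    """Greedy count of wire segments used to span `distance` switch matrices."""
    assert 1 in wire_lengths, "need unit-length wires to finish any route"
    count = 0
    for length in sorted(wire_lengths, reverse=True):
        used, distance = divmod(distance, length)  # take as many long wires as fit
        count += used
    return count


# Only short "local road" wires: one segment per switch matrix crossed.
print(segments_needed(40, (1,)))
# Hypothetical hierarchy of short, medium, and long wires.
print(segments_needed(40, (1, 6, 24)))
```

With only unit-length wires a 40-hop route uses 40 segments (and 40 switch-matrix hops); with the assumed hierarchy it uses 7, leaving the short connections free for local nets.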

Page 23:

Detailed Routing (Spartan 2)

Page 24:

Configurable Logic Blocks

CLBs get more and more stuff crammed in them over time

XC3K family had LUT (5 variable input, 2 FF values, 2 outputs), 2 FFs, clock enable, FF reset (direct / global) and 9 muxes

~51 bits of configuration SRAM per CLB

(COURTESY XILINX, INC.)

Page 25:

What’s a Look-up-table (LUT)?

A direct implementation of a truth table, using memory

LUT inputs are memory address values

LUT outputs are the memory data value


A B C D F
0 0 0 0 1
0 0 0 1 1
0 0 1 0 1
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0

A B C D F
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 1
0 1 1 0 0
0 1 1 1 1
1 0 0 0 0
1 0 0 1 1
1 0 1 0 0
1 0 1 1 0
1 1 0 0 1
1 1 0 1 1
1 1 1 0 0
1 1 1 1 0
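As a software analogy for "LUT inputs are memory address values", the sketch below stores the F column of a truth table in a 16-entry list and uses the four inputs as the address. The function name `make_lut` and the bit ordering (A as the most significant address bit) are assumptions made for this illustration; real devices define their own ordering.

```python
def make_lut(truth_column):
    """Return a 4-input LUT whose 16 stored bits are the F column, row 0000 first."""
    assert len(truth_column) == 16
    memory = list(truth_column)

    def lut(a, b, c, d):
        address = (a << 3) | (b << 2) | (c << 1) | d  # A is the most significant bit
        return memory[address]

    return lut


# F column of a table that is 1 for rows 0000-1011 and 0 for rows 1100-1111.
F_COLUMN = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
lut = make_lut(F_COLUMN)

# These stored bits happen to implement NAND(A, B); C and D are ignored.
for address in range(16):
    a, b, c, d = (address >> 3) & 1, (address >> 2) & 1, (address >> 1) & 1, address & 1
    assert lut(a, b, c, d) == 1 - (a & b)
```

Changing the function means changing only the 16 stored bits; the lookup hardware stays the same.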

Page 26:

Another View of LUTs

Can view LUT as 16:1 mux

Inputs are mux select

Config sets mux data inputs

Logically same as 16x1 memory

Can compact logic if you can route inputs to mux data inputs
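The mux view can be sketched the same way: a 16:1 mux built from four levels of 2:1 muxes, with the configuration bits on the data inputs and the LUT inputs on the selects. The names `mux2` and `mux16` and the choice that D selects at the first level are assumptions of this sketch.

```python
def mux2(select, in0, in1):
    """2:1 multiplexer: pass in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def mux16(config_bits, a, b, c, d):
    """16:1 mux built as four levels of 2:1 muxes; D selects first, A last."""
    level = list(config_bits)
    for select in (d, c, b, a):
        level = [mux2(select, level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Example configuration bits, listed for rows 0000 through 1111.
CONFIG = [0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]

# The mux tree returns the same bit as indexing a 16x1 memory.
for address in range(16):
    a, b, c, d = (address >> 3) & 1, (address >> 2) & 1, (address >> 1) & 1, address & 1
    assert mux16(CONFIG, a, b, c, d) == CONFIG[address]
```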

Page 27:

Look Up Table Additional Functionality

Can be configured as:

Shift register (16 regs)

Small memory (16 bits)

“Distributed RAM”

Some other FPGAs use muxes instead of memories to implement the core combinational logic
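The shift-register and small-memory modes reuse the same 16 storage bits with a different write path. The class below is a hypothetical behavioral model written for this note, not a description of any vendor primitive: `write` models the distributed-RAM case and `shift_in` models the shift-register case, while `read` is the ordinary address lookup.

```python
class Lut16:
    """The 16 storage bits of one 4-input LUT, with three hypothetical access modes."""

    def __init__(self, bits=None):
        self.bits = list(bits) if bits is not None else [0] * 16

    def read(self, address):
        """Logic or RAM read: the 4-bit address selects one stored bit."""
        return self.bits[address & 0xF]

    def write(self, address, value):
        """Distributed-RAM write: overwrite one stored bit."""
        self.bits[address & 0xF] = value & 1

    def shift_in(self, value):
        """Shift-register mode: new bit enters at position 0, oldest bit falls off."""
        self.bits = [value & 1] + self.bits[:15]


srl = Lut16()
for bit in (1, 0, 1, 1):
    srl.shift_in(bit)
# Address n reads the bit that was shifted in n + 1 clocks ago.
print([srl.read(n) for n in range(4)])
```

In shift-register mode the address input acts as a selectable tap, so one LUT can stand in for a delay line of up to 16 clocks.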

Page 28:

Spartan-2 CLB

Spartan-2 has 2 LUTs (4 input each) feeding a 3rd LUT, 2 FFs (with Preset/Reset, Enable, posedge or negedge clocks) and 16 muxes

12 inputs (plus clock), 4 outputs

(COURTESY XILINX, INC.)

Page 29:

Spartan-3

CLBs are composed of 4 slices

Organized as 2 pairs, one of which is optimized for memory access

Each slice has 2 FFs and 2 LUTs

(COURTESY XILINX, INC.)

Page 30:

FPGA Families extend Architecture

❏Devices are built with more capability, but around the same basic architecture

❏Some additional capabilities

◆Low voltage versions

◆Faster clock rates

◆Different packaging options

(Courtesy Xilinx, Inc.)

Page 31:

The need for more stuff

❏CompEs cannot design on logic, routing, I/O alone

❏Extreme case from early 90s

◆16 port ATM switch, designed on a single board

◆Design is limited by I/O to memory chips, so bring them on-chip

[Figure labels: FIFO memory chips; FPGAs (XC3Ks)]

Page 32:

Other “Stuff”

❏Clock managers

◆Global clock buffering, distribution

◆Digital Clock Manager (DCM): eliminate skew, phase shifts, multiply or divide clock

❏Memory

◆Block RAM

◆Distributed RAM (repurposed LUTs)

❏Shift Registers

❏Dedicated Multiplexers

❏Carry Look-Ahead Generators

❏I/O Blocks

◆SelectIO supports 18 standards (single-ended, differential, various voltage levels, ...)

❏Embedded Multipliers

Page 33:

Hard Macros

Hard macros

Block RAMs

Multipliers

CPUs

DSPs

Soft macros

HDL IP Blocks

Page 34:

Block RAMs

Distributed RAM

Use LUTs as memories

Low density

Poor performance

Block RAM

Large-ish dedicated memory blocks

Xilinx BRAMs = 18Kb

Some configurability

Dual-port

Data width / depth

FIFO, CAM, etc.

Page 35:

Multipliers

18x18 signed 2’s-complement multiplier

Two 18b inputs

One 36b output

18b enough for many DSP applications

Can gang multiple units together for wider data

Faster and lower power than multiplier from CLBs
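One common way to gang 18x18 signed multipliers for wider data is to split each operand into a signed high part and an unsigned 17-bit low part, form four partial products, and add them with shifts. The sketch below models that for 35-bit signed operands; the function names and the 35-bit width are choices made for this example, and `mul18` simply stands in for one hard multiplier.

```python
def mul18(a, b):
    """Model of one hard multiplier: 18-bit signed inputs, 36-bit signed output."""
    assert -(1 << 17) <= a < (1 << 17) and -(1 << 17) <= b < (1 << 17)
    return a * b


def split35(x):
    """Split a 35-bit signed value into a signed high part and a 17-bit unsigned low part."""
    assert -(1 << 34) <= x < (1 << 34)
    return x >> 17, x & ((1 << 17) - 1)


def mul35(a, b):
    """35x35 signed multiply from four 18x18 products plus shifts and adds."""
    a_hi, a_lo = split35(a)
    b_hi, b_lo = split35(b)
    return ((mul18(a_hi, b_hi) << 34)
            + ((mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 17)
            + mul18(a_lo, b_lo))


a, b = -12_345_678_901, 9_876_543_210
assert mul35(a, b) == a * b
```

The low parts use only 17 bits so that, zero-extended, they still fit the signed 18-bit inputs; that is why four 18x18 units give 35x35 rather than 36x36.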

Page 36:

CPUs – PowerPC 405

XC2VP30 has 2 Embedded PowerPC 405 cores

Embedded L1 I and D caches

No FPU

Page 37:

CPU Connectivity: PLB and OPB

IBM CoreConnect

Processor Local Bus (PLB) - fast on-chip communication

On-Chip Peripheral Bus (OPB) - optimized for peripherals (UART, etc.)

Device Control Register bus (DCR) - used to send and set configuration

Page 38:

CPU Connectivity: PLB and OPB (cont.)

Page 39:

CPU Connectivity: OCM

On-Chip Memory controller

CPU block RAM

2 OCMs – I and D

Direct, fast interface

Can use dual-port BRAMs for producer-consumer link to FPGA fabric

Page 40:

CPU Links

A lot more details on the embedded CPU

http://www.xilinx.com/bvdocs/userguides/ppc_ref_guide.pdf

http://direct.xilinx.com/bvdocs/userguides/ug018.pdf

http://www-3.ibm.com/chips/techlib/techlib.nsf/productfamilies/CoreConnect_Bus_Architecture

Page 41:

Zynq 7000

Advanced Microcontroller Bus Architecture (AMBA) + Advanced eXtensible Interface (AXI)

To memory, FPGA fabric, I/O & Peripherals

AMBA = ARM’s attempt at The One True Interface

Page 42:

Configuration Storage

Lots of configuration bits

LUTs, routing, I/O configuration

Xilinx XC2VP30 has >11Mb

Configuration storage technologies

Volatile

SRAM cells

Non-volatile

FLASH, EEPROM

Anti-fuse

Actel anti-fuse

[Figure: 6T SRAM cell, with word line WL and bit lines bit and bit_b]

Page 43:

Configuration

How to load (scan) configuration bits (bitstream)

Connect all configuration registers into single long shift register

Serially clock in configuration bits

Most designs use standard scan interface (JTAG) developed for test

Bitstream source

Non-volatile memory

On-board FLASH, EEPROM, serial memory

External media (CF card)

Attached workstation

Can encrypt bitstream to conceal configuration
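The "single long shift register" idea can be modeled in a few lines: every configuration cell passes its value to the next cell on each configuration clock, so a bitstream of the same length as the chain fills it completely. This is a hypothetical toy model for intuition only; real devices add framing, addressing, and CRC checks that are not shown, and `ConfigChain` is a name made up here.

```python
class ConfigChain:
    """Configuration cells wired as one long shift register (hypothetical model)."""

    def __init__(self, length):
        self.cells = [0] * length

    def clock_in(self, bit):
        """One configuration clock: every cell takes its neighbour's value."""
        self.cells = [bit & 1] + self.cells[:-1]

    def load(self, bitstream):
        """Serially clock in a whole bitstream, one bit per clock."""
        for bit in bitstream:
            self.clock_in(bit)


# A 16-cell chain standing in for one LUT's configuration bits.
chain = ConfigChain(16)
bitstream = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
chain.load(bitstream)
# The first bit sent ends up furthest from the input.
assert chain.cells == bitstream[::-1]
```

Load time in this model is one clock per configuration bit, which is why a device with millions of configuration bits takes a noticeable time to configure over a serial interface.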


Page 44:

Major FPGA Vendors

SRAM-based FPGAs

Xilinx

Altera

Atmel

Lattice Semiconductor

Flash & antifuse FPGAs

Actel Corp.

QuickLogic Corp.

Lattice Semiconductor

Xilinx (system-in-a-package solution)

Share over 60% of the market
