
PACAP de programmer un FPGA ? (Bet you can't program an FPGA?)

Steven Derrien, Simon Rokicki

21 November 2016

INSA-EII-5A

Schedule

9h15 – 9h50 : FPGA technology basics

9h50 – 10h15 : Designing FPGAs with HDL

10h15 – 10h45 : Designing FPGAs with HLS

break

10h45 – 12h00 : Lab session 1

break

13h30 – 14h30 : Optimizations for HLS based designs

14h30 – 16h00 : Lab session 2

break

14h30 – 16h00 : Lab session 3

PACAP - FPGA 2

Principles of FPGA technology

Programmable Logic blocks & Interconnect
Designing for an FPGA
Evolutions and improvement in FPGA architecture
Heterogeneous system (FPGA + CPU)
FPGA market and application domains
Different types of FPGA accelerators


A basic FPGA architecture

[Figure: grid of logic blocks (L) linked through connection blocks (C) and switch blocks (S); labels: horizontal routing channel, vertical routing channel, wiring segment]

L = Logic Block, C = Connection Block, S = Switch Block

A matrix of logic blocks + programmable interconnect
A Logic Block is programmed to emulate small logic functions
Logic Blocks are wired together to implement the full circuit


Example of logic block structure

[Figure: successive zooms from the FPGA grid into a logic block (CLB) containing slices, then into a slice built from LUT6s and flip-flops]

Example based on the Xilinx Virtex 7 architecture

Logic block (CLB)
• Four 6-input LUTs
• Two flip-flops/LUT


LUT (Look-Up Table) Functionality

[Figure: a 4-input circuit and a 2-input gate, each replaced by a LUT with inputs x1 x2 x3 x4 and output y, programmed with the truth tables below]

x1 x2 x3 x4 | y (first LUT) | y (second LUT)
 0  0  0  0 |       0       |       1
 0  0  0  1 |       1       |       1
 0  0  1  0 |       0       |       1
 0  0  1  1 |       0       |       1
 0  1  0  0 |       0       |       1
 0  1  0  1 |       1       |       1
 0  1  1  0 |       0       |       1
 0  1  1  1 |       1       |       1
 1  0  0  0 |       0       |       1
 1  0  0  1 |       1       |       1
 1  0  1  0 |       0       |       1
 1  0  1  1 |       0       |       1
 1  1  0  0 |       1       |       0
 1  1  0  1 |       1       |       0
 1  1  1  0 |       0       |       0
 1  1  1  1 |       0       |       0

• Look-Up tables used for logic implementation

• A LUT4 can implement any function of 4 inputs
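
As a rough illustration of the two bullets above, the following Python sketch models a LUT as nothing more than a stored truth table indexed by its inputs. It is a software model only: the helper names `make_lut` and `config_from_function` are made up for this example, and x1 is assumed to be the most significant index bit, which matches the row order of the slide's truth tables.

```python
from itertools import product


def make_lut(config_bits):
    """Model a k-input LUT: config_bits is its truth table, row 00..0 first."""
    size = len(config_bits)
    k = size.bit_length() - 1
    if size != 1 << k or set(config_bits) - {"0", "1"}:
        raise ValueError("config must be a 0/1 string whose length is a power of two")
    table = [int(bit) for bit in config_bits]

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs")
        index = 0
        for value in inputs:  # x1 is the most significant index bit (assumption)
            index = (index << 1) | (value & 1)
        return table[index]

    return lut


def config_from_function(func, k=4):
    """Enumerate all 2**k input rows to get the configuration of any k-input function."""
    return "".join(str(func(*row) & 1) for row in product((0, 1), repeat=k))


# The two 16-bit configurations shown on the slide.
lut_a = make_lut("0100010101001100")
lut_b = make_lut("1111111111110000")   # behaves as NOT (x1 AND x2), ignoring x3 and x4

# Any 4-input function fits in a LUT4: here a 4-input XOR (parity).
parity_config = config_from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
parity = make_lut(parity_config)
```

Changing the function implemented by the block only means changing the 16 stored bits; the hardware around them stays the same.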


Logic block for real (Virtex 7)

[Figure: detailed slice schematic, annotated as follows]
Specific features for building wide multiplexers
Fast carry propagation for adders, etc.
LUT6 can be used as a 64x1 RAM
LUT6 can be decomposed as 2 x LUT5
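
The LUT6 = 2 x LUT5 remark can be checked with a small software model: a 64-entry truth table splits into two 32-entry halves, and one input simply selects between them. The sketch below is illustrative only. The function names are invented, and using the first input as the select bit is an assumption, not the actual Virtex 7 pin assignment. Read with a 6-bit address, the same 64-entry table is also a 64x1 memory.

```python
from itertools import product


def lut_read(table, inputs):
    """Read a truth table (list of 0/1); the first input is the most significant index bit."""
    index = 0
    for value in inputs:
        index = (index << 1) | (value & 1)
    return table[index]


def lut6_as_two_lut5(table64, inputs):
    """Evaluate a 6-input function with two 5-input LUTs and a 2:1 multiplexer."""
    select, rest = inputs[0], inputs[1:]
    low_half = table64[:32]    # rows where the select input is 0
    high_half = table64[32:]   # rows where the select input is 1
    out0 = lut_read(low_half, rest)
    out1 = lut_read(high_half, rest)
    return out1 if select else out0


# Arbitrary 6-input example function: 1 when at least four inputs are 1.
table64 = [int(sum(row) >= 4) for row in product((0, 1), repeat=6)]

# Collect every input combination where the decomposed form disagrees with the flat LUT6.
mismatches = [row for row in product((0, 1), repeat=6)
              if lut_read(table64, row) != lut6_as_two_lut5(table64, row)]
```

`mismatches` ends up empty, so the two forms agree on all 64 input combinations.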


Programmable routing

Based on switch boxes and connection blocks
Configurable (depopulated) crossbars

In modern devices, interconnect is more sophisticated
Wires spanning several logic blocks, special routing for clocks, etc.


External interface


I/O pins and pin mapping are also configurable
Pins can be configured as input, output, or bidirectional

FPGA configuration is propagated serially through shift registers
Some FPGA pins are dedicated to the configuration process
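
A toy Python model can make the serial configuration idea more tangible. It assumes one long shift register running through every configuration cell, and a hypothetical device holding just two 4-input LUTs. The class name, the chain length, and the bit ordering are choices made for this sketch, not a description of any real device.

```python
class ConfigChain:
    """Toy model of a serial configuration chain: one shift register spanning all config cells."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift_in(self, bit):
        """One configuration clock: each cell takes its neighbour's value, bit enters at cell 0."""
        self.cells = [bit & 1] + self.cells[:-1]

    def load(self, bitstream):
        """Shift a whole bitstream in; the first bit sent ends up in the last cell."""
        for bit in bitstream:
            self.shift_in(int(bit))


# Hypothetical device with two 4-input LUTs (16 configuration bits each).
chain = ConfigChain(32)
lut0_config = "1111111111110000"
lut1_config = "0100010101001100"

# Send in reverse order so that cell i finally holds bit i of the concatenated configuration.
chain.load(reversed(lut0_config + lut1_config))

lut0_cells = chain.cells[:16]
lut1_cells = chain.cells[16:]
```

After 32 configuration clocks each LUT holds its own truth table, with a single data pin used to deliver all of them.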

Designing for an FPGA

Logic Synthesis

architecture MLU_DATAFLOW of MLU is
  signal A1, B1, Y1   : STD_LOGIC;
  signal MUX_0, MUX_1 : STD_LOGIC;
  signal MUX_2, MUX_3 : STD_LOGIC;
begin
  -- Optional inversion of both operands and of the result
  A1 <= A  when (NEG_A = '0') else not A;
  B1 <= B  when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- The four candidate logic operations
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- L1 and L0 select which operation drives Y1
  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;

[Figure: logic synthesis turns the VHDL description into a circuit netlist]
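
Before synthesis, it often helps to have a software reference of what the description is supposed to compute. The following Python function is a sketch of such a model for the MLU architecture, written by reading the VHDL statement by statement. It treats every signal as a plain 0/1 integer and ignores the other STD_LOGIC values ('U', 'X', 'Z', ...).

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level reference model of the MLU dataflow architecture (all arguments are 0 or 1)."""
    a1 = a ^ neg_a            # A1 <= A when NEG_A='0' else not A
    b1 = b ^ neg_b            # B1 <= B when NEG_B='0' else not B
    mux = (
        a1 & b1,              # "00": AND
        a1 | b1,              # "01": OR
        a1 ^ b1,              # "10": XOR
        1 - (a1 ^ b1),        # others: XNOR
    )
    y1 = mux[(l1 << 1) | l0]  # with (L1 & L0) select
    return y1 ^ neg_y         # Y <= Y1 when NEG_Y='0' else not Y1


# Plain AND, then NAND by negating the output.
and_11 = mlu(1, 1, 0, 0, 0, 0, 0)
nand_11 = mlu(1, 1, 0, 0, 1, 0, 0)
# NOR obtained from AND with both inputs negated (De Morgan).
nor_00 = mlu(0, 0, 1, 1, 0, 0, 0)
```

The three negation controls are what let a block with only four base operations cover NAND, NOR and the other derived functions.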


Technological mapping

[Figure: the netlist re-expressed as a network of FPGA primitives, LUT0 to LUT5 and flip-flops FF1 and FF2, shown next to the FPGA logic-block grid]
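
To give a feel for what technology mapping produces, the sketch below covers the MLU example with two LUTs and checks the result exhaustively. The MLU has seven inputs, one more than a LUT6 accepts, so this sketch uses one LUT6 for Y1 and a 2-input LUT for the final optional inversion. That split is one possible covering chosen by hand for illustration; a real synthesis tool may well pick a different one.

```python
from itertools import product


def y1_of(a, b, neg_a, neg_b, l1, l0):
    """Y1 of the MLU: selected logic operation on the optionally inverted operands."""
    a1, b1 = a ^ neg_a, b ^ neg_b
    return (a1 & b1, a1 | b1, a1 ^ b1, 1 - (a1 ^ b1))[(l1 << 1) | l0]


def truth_table(func, k):
    """Configuration bits of a k-input LUT implementing func (first input = MSB of the index)."""
    return [func(*row) & 1 for row in product((0, 1), repeat=k)]


def lut_read(table, *inputs):
    """Look up the stored bit addressed by the inputs."""
    index = 0
    for value in inputs:
        index = (index << 1) | (value & 1)
    return table[index]


# Covering chosen for this sketch: one LUT6 for Y1, one 2-input LUT for the output inversion.
LUT_Y1 = truth_table(y1_of, 6)                           # 64 configuration bits
LUT_OUT = truth_table(lambda y1, neg_y: y1 ^ neg_y, 2)   # 4 configuration bits


def mapped_mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Evaluate the MLU using only LUT reads, as the mapped netlist would."""
    y1 = lut_read(LUT_Y1, a, b, neg_a, neg_b, l1, l0)
    return lut_read(LUT_OUT, y1, neg_y)


# Exhaustive equivalence check against the behavioural description (2**7 = 128 cases).
equivalent = all(
    mapped_mlu(a, b, na, nb, ny, l1, l0) == (y1_of(a, b, na, nb, l1, l0) ^ ny)
    for a, b, na, nb, ny, l1, l0 in product((0, 1), repeat=7)
)
```

Once the logic is expressed as LUT contents plus the wires between LUTs, the remaining work is deciding where each LUT sits on the device and how the wires are routed.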


Placement and routing

Derive an actual FPGA configuration meeting constraints
Constraints in the form of achievable clock speed

During the lab you will realize that P&R can be time consuming.

For very large designs, P&R can take days …
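
The clock-speed constraint can be illustrated with a very small timing calculation: once every element is placed and every wire routed, the slowest register-to-register path fixes the shortest usable clock period. The Python sketch below walks a hypothetical five-node netlist. All delay values are invented for the example, and real tools use much more detailed timing models (setup time, clock skew, and so on).

```python
from functools import lru_cache

# Hypothetical placed-and-routed netlist: node -> list of (successor, routing delay in ns).
# Every number below is made up for illustration; real values come from the tool's timing report.
LOGIC_DELAY_NS = {"FF1": 0.4, "LUT0": 0.3, "LUT1": 0.3, "LUT2": 0.3, "FF2": 0.1}
EDGES = {
    "FF1": [("LUT0", 0.6), ("LUT1", 0.9)],
    "LUT0": [("LUT2", 0.5)],
    "LUT1": [("LUT2", 1.2)],
    "LUT2": [("FF2", 0.7)],
    "FF2": [],
}


@lru_cache(maxsize=None)
def longest_delay_from(node):
    """Longest logic + routing delay from `node` to any endpoint (the graph must be acyclic)."""
    tails = [wire + longest_delay_from(nxt) for nxt, wire in EDGES[node]]
    return LOGIC_DELAY_NS[node] + max(tails, default=0.0)


critical_path_ns = longest_delay_from("FF1")   # slowest FF1 -> FF2 path
fmax_mhz = 1000.0 / critical_path_ns           # a period of 1 ns corresponds to 1000 MHz


def meets_constraint(target_mhz):
    """A clock constraint is met when the critical path fits in one clock period."""
    return fmax_mhz >= target_mhz
```

With these made-up delays the slow path runs through LUT1, whose long wire to LUT2 dominates. A placer that moves LUT1 closer to LUT2 would shorten that wire and raise the achievable frequency, which is part of why P&R iterates for so long.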


Bitstream & device configuration

Configuration data is used by the FPGA at power-up

From the Place & Route results, we derive the configuration bitstream

The bitstream is then downloaded into the FPGA from flash or by a CPU.

Programmable Logic Design Flow

[Figure: design flow with design-iteration loops between the steps]
Design capture (HDL) → Simulation → Logic synthesis → Floorplanning → Placement → Routing → Post-layout simulation → Configuration → In situ testing (on field)
Abstraction levels: behavioral, structural, physical

Evolutions and improvement in FPGA architecture

Limits of LUT based FPGAs

Lack of sufficient on-chip storage
Signal processing/wireless applications need to buffer data and/or results
Network applications need to store many medium-sized tables

Poor/insufficient arithmetic performance
Integer multiplication/accumulation is a key metric for DSP

Integer multipliers built out of LUTs are too slow and costly to enable real-time signal processing applications

On-chip memory built out of LUTs and slice flip-flops is not sufficient to meet performance requirements


DSP blocks

Extend FPGA architecture with arithmetic-oriented blocks
Medium-sized hard-wired integer multipliers
Fast accumulation, rounding and shifters, etc.

Example of the Virtex-5 DSP block

Somewhat similar structures are used in Altera devices

[Figure: DSP block diagram, annotated as follows]
25-bit pre-adder
25x18 pipelined integer multiplier
17-bit shifter for scaling
48-bit wide ALU
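
The arithmetic such a block performs can be mimicked in a few lines of Python. The sketch below is a deliberately simplified model: it keeps only the 25x18 signed multiplier and the 48-bit accumulator quoted above, and leaves out the pre-adder, shifter, pipelining and every other mode of the real block. The function names are invented for this example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_mac(acc, a, b):
    """One multiply-accumulate step: 25x18 signed multiply, 48-bit wrapping accumulation."""
    product = to_signed(a, 25) * to_signed(b, 18)   # always fits in 43 bits
    return to_signed(acc + product, 48)


def dot_product(xs, ws):
    """Dot product computed as a chain of MAC steps, one per pair of operands."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = dsp_mac(acc, x, w)
    return acc
```

For instance, `dot_product([1, 2, 3], [4, 5, 6])` returns 32. The wide accumulator leaves headroom above the 43-bit products, so many terms can be summed before the result wraps.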


Embedded memory blocks

Hard-wired memory banks distributed in the FPGA
First blocks were 9 Kbit blocks, current blocks are 36 Kbit

[Figure: 36 Kb memory array with two ports; port A signals DIA (36 bits), ADDRA, DOA (36 bits), WEA (4 bits), CLKA; port B signals DIB (36 bits), ADDRB, DOB (36 bits), WEB (4 bits), CLKB]

Configurable width/depth (32k x 1 to 512 x 72)

Two read/write ports with distinct address ports.

Built-in logic to operate as a FIFO buffer
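
The width/depth trade-off is simple arithmetic, sketched here in Python. Only the two extreme settings come from the slide. The intermediate entries assume the usual progression in which each step halves the depth and doubles the width, and the helper names are made up for this example.

```python
# (depth, width) settings between the two extremes quoted above; the intermediate
# entries assume the usual halve-the-depth / double-the-width progression.
ASPECT_RATIOS = [
    (32 * 1024, 1), (16 * 1024, 2), (8 * 1024, 4),
    (4 * 1024, 9), (2 * 1024, 18), (1024, 36), (512, 72),
]


def describe(depth, width):
    """Return (address bits, total bits stored) for one block configuration."""
    address_bits = (depth - 1).bit_length()
    return address_bits, depth * width


summary = {(d, w): describe(d, w) for d, w in ASPECT_RATIOS}


def blocks_needed(depth, width):
    """Fewest blocks for a depth x width buffer, trying every aspect ratio."""
    def ceil_div(a, b):
        return -(-a // b)
    return min(ceil_div(depth, d) * ceil_div(width, w) for d, w in ASPECT_RATIOS)
```

The 32k x 1 setting stores 32,768 bits while 512 x 72 stores 36,864, so the full 36 Kbit is only reached with the wider words. Under these assumptions a 1024 x 32 buffer fits in one block, whereas a 4096 x 16 buffer needs two.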


State of the art FPGAs at a glance

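[Table: device families compared on logic cells, block RAM, DSP slices, peak DSP performance, transceivers, transceiver performance, memory performance, I/O pins and I/O voltages. Surviving headings: "Maximum Capability", "Lowest Power and Cost", "Industry’s Best Price/Performance", "Industry’s Highest System Performance". The cell values were not preserved.]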

Different capacity, performance and features

Device cost ranges from 5$ to 20k$ …


FPGA trends

FPGA capacities evolve faster than Moore’s Law dictates
Very regular design eases optimized implementation tricks
Multiple FPGA dies on a silicon interposer

[Figure: chart annotated 65%, 130%, 163%]

Heterogeneous system (FPGA + CPU)

System Level Integration

Older systems combined FPGA + CPU at PCB level
Flexibility in CPU/DSP and FPGA choices
CPU used mostly for UI or system-level management

Soft-core processors appeared in the early 2000s

Processors built out of FPGA logic (LUT + DSP + EMB)
Limited clock speed and low-performance µ-arch
Ex: NIOS2 (revamped MIPS R3000) reached 300 MIPS

Today, FPGAs integrate high perf. embedded CPUs

ARM processors (A9 – A53) and/or PowerPC cores
Intel Xeon-FPGA as a dual chip in the same package


The Zynq platform

[Figure: Zynq block diagram: two Cortex A9 cores, each with its MMU (virtual address space) and L1 cache, a shared 256kb L2 cache, and a memory controller to external memory (DDRAM); two links are labelled 1.2 GB/s]

600 MHz dual-core Cortex A9 with Neon SIMD ISA
Cache-coherent access to L2 through the ACP port
Four non-coherent accesses to SDRAM


The Zybo board


Low end Zynq based system for academic use (150$).

• 28,000 logic cells
• 240 KB Block RAM
• 80 DSP slices
• 650 MHz dual-core Cortex A9
• DDR3 memory 512 MB x32 w/ 1050 Mbps bandwidth
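
As a quick order-of-magnitude check on the memory figure, the Python snippet below converts it into a peak transfer rate. It assumes that 1050 Mbps is the data rate of each of the 32 data pins, which the slide does not state explicitly. The result is a theoretical ceiling, not a rate an accelerator will actually sustain.

```python
def peak_bandwidth_bytes_per_s(rate_mbps_per_pin, bus_width_bits):
    """Peak transfer rate of a parallel memory bus, in bytes per second."""
    return rate_mbps_per_pin * 1_000_000 * bus_width_bits // 8


# Assumption: 1050 Mbps per data pin on a 32-bit wide DDR3 interface.
zybo_ddr3 = peak_bandwidth_bytes_per_s(1050, 32)   # 4_200_000_000 B/s, i.e. 4.2 GB/s
```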

FPGA market and application domains

FPGA markets

Storage and networking are the main market drivers

Taken from http://www.radiantinsights.com/img/research/north-america-fpga-market.png


FPGAs vs. ASICs

ASIC NRE costs have risen dramatically over the years
FPGAs keep on improving in size, performance, cost

[Figure: total cost versus volume for standard-cell ASICs and FPGAs, current and future, showing the break-even point]

In 2009, 97% of new design starts target FPGAs

[source chipdesign, 2009]
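
The break-even point follows from a linear cost model: total cost = NRE + unit cost x volume for each technology. The Python sketch below computes where the two lines cross. All the amounts are hypothetical placeholders chosen to give round numbers, not market data.

```python
def total_cost(nre, unit_cost, volume):
    """Linear cost model: one-off engineering cost plus a per-device cost."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume at which the ASIC and FPGA total-cost lines cross; None if they never do."""
    if unit_fpga <= unit_asic:
        return None
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Hypothetical figures: the ASIC has a high NRE and cheap units, the FPGA the opposite.
current = break_even_volume(nre_asic=2_050_000, unit_asic=10, nre_fpga=50_000, unit_fpga=50)

# If ASIC NRE doubles (roughly) while the rest is unchanged, the crossing moves to higher volumes.
future = break_even_volume(nre_asic=4_050_000, unit_asic=10, nre_fpga=50_000, unit_fpga=50)
```

With these placeholder values the break-even volume goes from 50,000 to 100,000 units. That is the trend the slide describes: rising ASIC NRE enlarges the range of volumes where an FPGA is the cheaper option.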

Different types of FPGA accelerators

FPGA as throughput accelerators

FPGA accelerator = massively parallel processing
10 Tflops announced for the Stratix 10 FPGA
Even better for unconventional arithmetic (cryptography)

FPGAs do not necessarily perform better than GPUs
The benefit of FPGAs is mostly their 10x-50x better energy efficiency

FPGAs as latency accelerators

[Figure: CPU, GPU and FPGA organisations side by side (control logic, ALUs, cache, DRAM), with a key/value lookup]

Example: key-value store (memcached)
Large-scale distributed key-value systems
