ece645 fpga devices
George Mason University
ECE 645 – Computer Arithmetic
Introduction to FPGA Devices
World of Integrated Circuits
Integrated Circuits
• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable
  • PLD: PAL, PLA, PML
  • FPGA: LUT (Look-Up Table), MUX, Gates
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
• designed all the way from behavioral description to physical layout
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
• no physical layout design; design ends with a bitstream used to configure a device
• bought off the shelf and reconfigured by designers themselves
What is an FPGA?
[Figure: FPGA array of Configurable Logic Blocks surrounded by I/O Blocks, with columns of Block RAMs]
Which Way to Go?
ASICs:
• High performance
• Low power
• Low cost in high volumes
FPGAs:
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
Other FPGA Advantages
• The ASIC manufacturing cycle is very costly and lengthy and engages a lot of manpower
  • Mistakes not detected at design time have a large impact on development time and cost
• FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades, as in the case of software
• Unique applications
  • reconfigurable computing
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
• Atmel
• Lattice Semiconductor
Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.
Share over 60% of the market
Xilinx
• Primary products: FPGAs and the associated CAD software
• Main headquarters in San Jose, CA
• Fabless* semiconductor and software company
• Foundries: UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}, Seiko Epson (Japan), TSMC (Taiwan)
• Programmable Logic Devices; ISE Alliance and Foundation Series Design Software
Xilinx FPGA Families
• Old families
  • XC3000, XC4000, XC5200
  • Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  • Virtex (0.22µm)
  • Virtex-E, Virtex-EM (0.18µm)
  • Virtex-II, Virtex-II PRO (0.13µm)
  • Virtex-4 (0.09µm)
• Low Cost Family
  • Spartan/XL – derived from XC4000
  • Spartan-II – derived from Virtex
  • Spartan-IIE – derived from Virtex-E
  • Spartan-3
Xilinx FPGA Block Diagram
CLB Structure
CLB Slice Structure
• Each slice contains two sets of the following:
  • Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM
    • or 16-bit shift register
  • Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  • Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control
LUT (Look-Up Table) Functionality
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
[Figure: a 4-input function y(x1, x2, x3, x4) and a 2-input function y(x1, x2) are each implemented by a single LUT that stores the function's truth table]

x1 x2 x3 x4 | y (4-input example) | y (2-input example: NOT(x1 AND x2))
0  0  0  0  | 0 | 1
0  0  0  1  | 1 | 1
0  0  1  0  | 0 | 1
0  0  1  1  | 0 | 1
0  1  0  0  | 0 | 1
0  1  0  1  | 1 | 1
0  1  1  0  | 0 | 1
0  1  1  1  | 1 | 1
1  0  0  0  | 0 | 1
1  0  0  1  | 1 | 1
1  0  1  0  | 0 | 1
1  0  1  1  | 0 | 1
1  1  0  0  | 1 | 0
1  1  0  1  | 1 | 0
1  1  1  0  | 0 | 0
1  1  1  1  | 0 | 0
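A LUT is essentially a 16-entry, 1-bit memory addressed by its inputs. The VHDL sketch below models that behavior; the entity name LUT4_MODEL and the convention that x1 is the most significant address bit (so INIT bit i holds y for input pattern i) are assumptions for illustration, not a Xilinx primitive. Under this convention the two truth tables on this slide correspond to INIT values X"32A2" and X"0FFF".

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Behavioral model of a 4-input LUT (hypothetical name, not a Xilinx primitive).
-- INIT holds the truth table: bit i is the output for input pattern i.
entity LUT4_MODEL is
    generic (INIT : std_logic_vector(15 downto 0) := X"0000");
    port (
        x1, x2, x3, x4 : in  std_logic;   -- x1 is the most significant address bit
        y              : out std_logic
    );
end LUT4_MODEL;

architecture BEHAVIORAL of LUT4_MODEL is
    signal addr : std_logic_vector(3 downto 0);
begin
    addr <= x1 & x2 & x3 & x4;                  -- the inputs form a 4-bit address
    y    <= INIT(to_integer(unsigned(addr)));   -- read one bit of the stored table
end BEHAVIORAL;
```

For example, `generic map (INIT => X"0FFF")` gives y = NOT(x1 AND x2); changing the logic function changes only the stored INIT value, not the hardware.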
5-Input Functions implemented using two LUTs
• One CLB slice can implement any function of 5 inputs
• Logic function is partitioned between two LUTs
• F5 multiplexer selects LUT
[Figure: two LUTs in a slice (each usable as LUT, ROM or RAM, with address inputs A1-A4) feed the F5 multiplexer; input BX selects between them and the result leaves the slice on F5/X]
5-Input Functions implemented using two LUTs
[Figure: the 32-row truth table of Y(X5, ..., X1) is split in half; the X5 = 0 half is stored in one LUT and the X5 = 1 half in the other, and X5 drives the multiplexer that produces OUT]

X4 X3 X2 X1 | Y when X5 = 0 (first LUT) | Y when X5 = 1 (second LUT)
0  0  0  0  | 0 | 0
0  0  0  1  | 1 | 0
0  0  1  0  | 0 | 0
0  0  1  1  | 0 | 0
0  1  0  0  | 1 | 0
0  1  0  1  | 1 | 0
0  1  1  0  | 0 | 0
0  1  1  1  | 0 | 1
1  0  0  0  | 1 | 0
1  0  0  1  | 0 | 1
1  0  1  0  | 0 | 0
1  0  1  1  | 1 | 1
1  1  0  0  | 1 | 0
1  1  0  1  | 1 | 1
1  1  1  0  | 1 | 0
1  1  1  1  | 1 | 0
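The same split can be written directly in VHDL. In this sketch (hypothetical entity FUNC5_TWO_LUTS), the two constants hold the two halves of this slide's truth table, encoded with X4 as the most significant index bit, and X5 plays the role of the F5 multiplexer select. This is an illustration of the partitioning idea; a synthesis tool is free to map the function differently.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- 5-input function built from two 16-bit truth tables and a 2:1 multiplexer
-- (hypothetical name; mirrors the two-LUT + F5 structure of the slice).
entity FUNC5_TWO_LUTS is
    port (
        x : in  std_logic_vector(5 downto 1);   -- x(5) = X5, ..., x(1) = X1
        y : out std_logic
    );
end FUNC5_TWO_LUTS;

architecture BEHAVIORAL of FUNC5_TWO_LUTS is
    -- bit i of each constant is Y for X4 X3 X2 X1 = i
    constant LUT_X5_0 : std_logic_vector(15 downto 0) := X"F932";  -- rows with X5 = 0
    constant LUT_X5_1 : std_logic_vector(15 downto 0) := X"2A80";  -- rows with X5 = 1
    signal idx  : integer range 0 to 15;
    signal f, g : std_logic;
begin
    idx <= to_integer(unsigned(x(4 downto 1)));   -- X4..X1 address both LUTs
    f   <= LUT_X5_0(idx);                          -- first LUT output
    g   <= LUT_X5_1(idx);                          -- second LUT output
    y   <= g when x(5) = '1' else f;               -- F5 multiplexer selected by X5
end BEHAVIORAL;
```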
Distributed RAM
[Figure: distributed RAM primitives built from LUTs: single-port RAM16X1S (one LUT; inputs D, WE, WCLK, A0-A3; output O), RAM32X1S (adds A4) and RAM16X2S (inputs D0/D1, outputs O0/O1), each built from two LUTs, and dual-port RAM16X1D (two LUTs; adds read address DPRA0-DPRA3 and outputs SPO/DPO)]
• CLB LUT configurable as Distributed RAM
  • A LUT equals 16x1 RAM
  • Implements Single and Dual-Ports
  • Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/Asynchronous read
  • Accompanying flip-flops used for synchronous read
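A common way to obtain distributed RAM without instantiating a primitive is to describe a small memory with a synchronous write and an asynchronous read and let the synthesis tool infer it. The sketch below assumes such inference (tools such as XST typically map this pattern onto LUT RAM); the entity name DIST_RAM_16X1 is hypothetical. DATA_REG illustrates the optional synchronous read through the accompanying flip-flop.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- 16x1 RAM written so that it can be inferred as distributed (LUT) RAM
-- (hypothetical name).
entity DIST_RAM_16X1 is
    port (
        CLK      : in  std_logic;
        WE       : in  std_logic;
        ADDR     : in  std_logic_vector(3 downto 0);
        DATA_IN  : in  std_logic;
        DATA_OUT : out std_logic;   -- asynchronous read
        DATA_REG : out std_logic    -- synchronous read through a flip-flop
    );
end DIST_RAM_16X1;

architecture BEHAVIORAL of DIST_RAM_16X1 is
    type RAM_TYPE is array (0 to 15) of std_logic;
    signal ram : RAM_TYPE := (others => '0');
    signal rd  : std_logic;
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            if WE = '1' then
                ram(to_integer(unsigned(ADDR))) <= DATA_IN;   -- synchronous write
            end if;
            DATA_REG <= rd;                                   -- registered copy of the read
        end if;
    end process;

    rd       <= ram(to_integer(unsigned(ADDR)));   -- asynchronous read of the LUT
    DATA_OUT <= rd;
end BEHAVIORAL;
```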
Shift Register
[Figure: a LUT configured as a chain of clock-enabled D flip-flops; serial input IN, clock CLK, clock enable CE, and DEPTH[3:0] selecting which stage drives OUT]
• Each LUT can be configured as a shift register
  • Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipeline
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth
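A LUT shift register can usually be inferred from plain RTL: a 16-bit shift register without reset, with a clock enable and an output tap chosen by an address. The sketch below (hypothetical entity SRL16_MODEL) assumes the synthesis tool maps this pattern onto a single LUT in shift-register mode (the SRL16E primitive on Xilinx devices). DEPTH = n gives a delay of n + 1 cycles, so the delay ranges from 1 to 16.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Dynamically addressable 16-bit shift register (hypothetical name).
-- Written without a reset so that synthesis can map it onto one LUT.
entity SRL16_MODEL is
    port (
        CLK   : in  std_logic;
        CE    : in  std_logic;                     -- clock enable
        D     : in  std_logic;                     -- serial in
        DEPTH : in  std_logic_vector(3 downto 0);  -- tap select: delay = DEPTH + 1 cycles
        Q     : out std_logic                      -- serial out from the selected tap
    );
end SRL16_MODEL;

architecture BEHAVIORAL of SRL16_MODEL is
    signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            if CE = '1' then
                sr <= sr(14 downto 0) & D;        -- shift toward the MSB
            end if;
        end if;
    end process;

    Q <= sr(to_integer(unsigned(DEPTH)));         -- DEPTH acts as the LUT address
end BEHAVIORAL;
```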
Shift Register
• Register-rich FPGA
  • Allows pipeline stages to be added to increase throughput
• Data paths must be balanced to keep the desired functionality
[Figure: a 64-bit path through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total, runs in parallel with a 64-bit path through Operation C (3 cycles), leaving a 9-cycle imbalance]
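One way to remove the 9-cycle imbalance is to delay the short path by nine extra cycles. The generic delay line below is a sketch with hypothetical names (DELAY_LINE, WIDTH, CYCLES). Because it has no reset, synthesis tools can typically implement each bit as one LUT shift register (nine stages fit in one 16-bit LUT) rather than as nine separate flip-flops.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Fixed-latency delay line for balancing a short data path (hypothetical names).
-- No reset is used, so each bit can map onto a LUT shift register.
entity DELAY_LINE is
    generic (
        WIDTH  : positive := 64;   -- bus width from the example
        CYCLES : positive := 9     -- extra latency needed: 12 - 3 = 9
    );
    port (
        CLK : in  std_logic;
        D   : in  std_logic_vector(WIDTH-1 downto 0);
        Q   : out std_logic_vector(WIDTH-1 downto 0)
    );
end DELAY_LINE;

architecture BEHAVIORAL of DELAY_LINE is
    type STAGE_ARRAY is array (1 to CYCLES) of std_logic_vector(WIDTH-1 downto 0);
    signal stages : STAGE_ARRAY := (others => (others => '0'));
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            stages(1) <= D;                    -- first stage captures the input
            for i in 2 to CYCLES loop
                stages(i) <= stages(i-1);      -- each later stage takes its predecessor
            end loop;
        end if;
    end process;

    Q <= stages(CYCLES);                       -- input appears CYCLES clock edges later
end BEHAVIORAL;
```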
Carry & Control Logic
[Figure: a slice with two look-up tables (inputs G1-G4 and F1-F4), each followed by carry & control logic and a D flip-flop; the carry chain runs from CIN to COUT, and the flip-flops share CLK, CE and SR]
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: dedicated carry logic and carry routing running through a column of CLBs from LSB to MSB]
Accessing Carry Logic
All major synthesis tools can infer carry logic for arithmetic functions
• Addition (SUM <= A + B)
• Subtraction (DIFF <= A - B)
• Comparators (if A < B then…)
• Counters (count <= count +1)
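A minimal sketch of the four cases above using the IEEE numeric_std package; the entity name CARRY_EXAMPLES and the generic width N are illustrative. Nothing in the code refers to the carry chain explicitly: the assumption is that the synthesis tool recognizes the +, - and < operators and maps them onto the fast carry logic.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Arithmetic that synthesis tools normally map onto the dedicated carry chain
-- (hypothetical entity name).
entity CARRY_EXAMPLES is
    generic (N : positive := 8);
    port (
        CLK   : in  std_logic;
        A, B  : in  unsigned(N-1 downto 0);
        SUM   : out unsigned(N-1 downto 0);   -- wraps modulo 2**N
        DIFF  : out unsigned(N-1 downto 0);   -- wraps modulo 2**N
        LESS  : out std_logic;
        COUNT : out unsigned(N-1 downto 0)
    );
end CARRY_EXAMPLES;

architecture RTL of CARRY_EXAMPLES is
    signal count_reg : unsigned(N-1 downto 0) := (others => '0');
begin
    SUM  <= A + B;                        -- addition
    DIFF <= A - B;                        -- subtraction
    LESS <= '1' when A < B else '0';      -- comparator

    process (CLK)
    begin
        if rising_edge(CLK) then
            count_reg <= count_reg + 1;   -- counter
        end if;
    end process;

    COUNT <= count_reg;
end RTL;
```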
Block RAM
[Figure: Spartan-II true dual-port block RAM with independent Port A and Port B]
• Most efficient memory implementation
  • Dedicated blocks of memory
• Ideal for most memory requirements
  • 4 to 104 memory blocks
  • 18 kbits = 18,432 bits per block
  • Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
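Block RAM is usually inferred rather than instantiated. The sketch below describes a 1K x 16 single-port memory whose read is registered (synchronous), which is the property that typically lets a synthesis tool place it in one 18-kbit block RAM instead of LUTs. The entity name BRAM_1KX16 and the read-first behavior (the old contents appear on DO during a write) are choices made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- 1K x 16 single-port RAM with a synchronous read, the usual pattern for
-- block RAM inference (hypothetical name).
entity BRAM_1KX16 is
    port (
        CLK  : in  std_logic;
        WE   : in  std_logic;
        ADDR : in  std_logic_vector(9 downto 0);
        DI   : in  std_logic_vector(15 downto 0);
        DO   : out std_logic_vector(15 downto 0)
    );
end BRAM_1KX16;

architecture BEHAVIORAL of BRAM_1KX16 is
    type RAM_TYPE is array (0 to 1023) of std_logic_vector(15 downto 0);
    signal ram : RAM_TYPE := (others => (others => '0'));
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            if WE = '1' then
                ram(to_integer(unsigned(ADDR))) <= DI;   -- synchronous write
            end if;
            DO <= ram(to_integer(unsigned(ADDR)));       -- registered read (read-first)
        end if;
    end process;
end BEHAVIORAL;
```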
Spartan-3 Block RAM Amounts
Block RAM Port Aspect Ratios
Configuration | Address range | Data bits
16k x 1 | 0 to 16,383 | 1
8k x 2 | 0 to 8,191 | 2
4k x 4 | 0 to 4,095 | 4
2k x (8+1) | 0 to 2,047 | 8 + 1 parity
1024 x (16+2) | 0 to 1,023 | 16 + 2 parity
Dual Port Block RAM
Dual-Port Bus Flexibility
[Figure: dual-port block RAM in which Port A is 1K deep and 18 bits wide (ADDRA[9:0], DIA[17:0], DOA[17:0]) while Port B is 2K deep and 9 bits wide (ADDRB[10:0], DIB[8:0], DOB[8:0]); each port has its own WE, EN, RST and CLK]
• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic
Two Independent Single-Port RAMs
[Figure: a 16K x 1 dual-port block RAM used as two 8K x 1 single-port RAMs; each port has its own ADDR[12:0], 1-bit data input and output, WE, EN, RST and CLK, and the address MSB of one port is tied to GND while that of the other is tied to VCC]
• To access the lower RAM
  • Tie the MSB address bit to Logic Low
• To access the upper RAM
  • Tie the MSB address bit to Logic High
• Added advantage of True Dual-Port
  • No wasted RAM bits
  • Can split a Dual-Port 16K RAM into two Single-Port 8K RAMs
  • Simultaneous independent access to each RAM
New 18 x 18 Embedded Multiplier
• Fast arithmetic functions
  • Optimized to implement multiply / accumulate modules
• 18 x 18 signed multiplier
• Fully combinatorial
• Optional registers with CE & RST (pipeline)
• Independent from adjacent block RAM
18 x 18 Multiplier
• Embedded 18-bit x 18-bit multiplier
• 2's complement signed operation
• Multipliers are organized in columns
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit output]
Note: see the Virtex-II data sheet for updated performance figures.
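An 18 x 18 signed multiply with the optional pipeline register can be written as below. This is a sketch that assumes the synthesis tool maps signed multiplication of these widths onto an embedded multiplier and may absorb the output register (clock enable CE, synchronous reset RST) into it; the entity name MULT18X18_REG is hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Registered 18 x 18 two's complement multiplier (hypothetical name).
entity MULT18X18_REG is
    port (
        CLK, CE, RST : in  std_logic;
        A, B         : in  signed(17 downto 0);
        P            : out signed(35 downto 0)   -- full-precision product
    );
end MULT18X18_REG;

architecture RTL of MULT18X18_REG is
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            if RST = '1' then
                P <= (others => '0');   -- synchronous reset of the pipeline register
            elsif CE = '1' then
                P <= A * B;             -- numeric_std returns an 18 + 18 = 36-bit result
            end if;
        end if;
    end process;
end RTL;
```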
Basic I/O Block Structure
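[Figure: IOB with three register paths, each a D flip-flop with clock enable (FF Enable) and set/reset (SR): a three-state control path, an output path driving the three-state output buffer, and an input path offering both a direct input and a registered input; all share Clock and Set/Reset]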
IOB Functionality
• IOB provides interface between the package pins and CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
  • advised for high-performance I/O
• Inputs can be delayed
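The sketch below describes a bidirectional pin with registered output, output-enable and input paths, matching the three register paths of the IOB. The entity and signal names are hypothetical, and whether each register is actually packed into the IOB depends on the tool settings; PAD must be a top-level port for an I/O buffer to be inferred.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Bidirectional pin with all three IOB register paths (hypothetical names).
entity BIDIR_PIN is
    port (
        CLK  : in    std_logic;
        OE   : in    std_logic;   -- '1' drives the pin, '0' releases it (high impedance)
        DOUT : in    std_logic;   -- data to drive onto the pin
        DIN  : out   std_logic;   -- registered value read from the pin
        PAD  : inout std_logic    -- package pin
    );
end BIDIR_PIN;

architecture RTL of BIDIR_PIN is
    signal out_reg, oe_reg, in_reg : std_logic := '0';
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            out_reg <= DOUT;   -- output path register
            oe_reg  <= OE;     -- three-state control register
            in_reg  <= PAD;    -- input path register
        end if;
    end process;

    PAD <= out_reg when oe_reg = '1' else 'Z';   -- three-state output buffer
    DIN <= in_reg;
end RTL;
```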
Routing Resources
[Figure: array of CLBs interconnected through Programmable Switch Matrices (PSM)]
Clock Distribution
Spartan-3 FPGA Family Members
FPGA Nomenclature
Device Part Marking
We’re Using: XC3S100-4FG256
Virtex-II 1.5V Architecture
[Figure: Virtex-II array of Configurable Logic Blocks surrounded by I/O Blocks, with four columns of Block RAMs, each paired with a column of 18 x 18 multipliers]
Virtex-II 1.5V
Device | CLB Array (rows x columns) | Slices | Maximum I/O | Block RAMs (18 kb) | Multiplier Blocks | Distributed RAM bits
XC2V40 | 8 x 8 | 256 | 88 | 4 | 4 | 8,192
XC2V80 | 16 x 8 | 512 | 120 | 8 | 8 | 16,384
XC2V250 | 24 x 16 | 1,536 | 200 | 24 | 24 | 49,152
XC2V500 | 32 x 24 | 3,072 | 264 | 32 | 32 | 98,304
XC2V1000 | 40 x 32 | 5,120 | 432 | 40 | 40 | 163,840
XC2V1500 | 48 x 40 | 7,680 | 528 | 48 | 48 | 245,760
XC2V2000 | 56 x 48 | 10,752 | 624 | 56 | 56 | 344,064
XC2V3000 | 64 x 56 | 14,336 | 720 | 96 | 96 | 458,752
XC2V4000 | 80 x 72 | 23,040 | 912 | 120 | 120 | 737,280
XC2V6000 | 96 x 88 | 33,792 | 1,104 | 144 | 144 | 1,081,344
XC2V8000 | 112 x 104 | 46,592 | 1,108 | 168 | 168 | 1,490,944
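The Slices and Distributed RAM bits columns follow directly from the CLB array size. A worked check, assuming (as in the Virtex-II architecture) four slices per CLB and two 16-bit LUTs per slice:

```latex
\begin{aligned}
\text{Slices} &= 4 \times (\text{rows} \times \text{columns}) \\
\text{Distributed RAM bits} &= \text{Slices} \times 2 \times 16 = 32 \times \text{Slices} \\
\text{XC2V1000:}\quad \text{Slices} &= 4 \times (40 \times 32) = 5{,}120 \\
\text{Distributed RAM bits} &= 32 \times 5{,}120 = 163{,}840
\end{aligned}
```

The same two formulas reproduce every row of the table, e.g. 4 x (8 x 8) = 256 slices and 32 x 256 = 8,192 bits for the XC2V40.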
Virtex-II Block SelectRAM
• Virtex-II BRAM is 18 kbits
• Additional "parity" bits available in selected configurations
[Figure: dual-port block SelectRAM; Port A has WEA, ENA, SSRA, CLKA, ADDRA, DIA, DIPA (parity in), DOA and DOPA (parity out), and Port B has the same signals with suffix B]

Width | Depth | Address | Data | Parity
1 | 16,384 | [13:0] | [0] | N/A
2 | 8,192 | [12:0] | [1:0] | N/A
4 | 4,096 | [11:0] | [3:0] | N/A
9 | 2,048 | [10:0] | [7:0] | [0]
18 | 1,024 | [9:0] | [15:0] | [1:0]
36 | 512 | [8:0] | [31:0] | [3:0]
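Every row of the table describes the same physical block. A worked check of the widths, depths and address buses: the 1-, 2- and 4-bit configurations use only the 16,384 data bits, while the 9-, 18- and 36-bit configurations also reach the 2,048 parity bits.

```latex
\begin{aligned}
\text{Address bits} &= \log_2(\text{Depth}), \quad \text{e.g. } \log_2(16{,}384) = 14 \;\Rightarrow\; \text{ADDR}[13:0] \\
16{,}384 \times 1 = 8{,}192 \times 2 = 4{,}096 \times 4 &= 16{,}384 \text{ bits} \\
2{,}048 \times (8+1) = 1{,}024 \times (16+2) = 512 \times (32+4) &= 18{,}432 \text{ bits}
\end{aligned}
```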
Using Library Components in VHDL Code
RAM 16x1 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity RAM_16X1_DISTRIBUTED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC;
        DATA_OUT : out STD_LOGIC
    );
end RAM_16X1_DISTRIBUTED;
RAM 16x1 (2)
architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is

    -- INIT attribute for the implementation tools; it must name the instance label
    attribute INIT : string;
    attribute INIT of RAM_16X1_S_1 : label is "F0C1";

    -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
    -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000"
        );
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic
        );
    end component;
RAM 16x1 (3)
begin
    RAM_16X1_S_1: ram16x1s
        generic map (INIT => X"F0C1")
        port map(O => DATA_OUT,
                 A0 => ADDR(0), A1 => ADDR(1), A2 => ADDR(2), A3 => ADDR(3),
                 D => DATA_IN, WCLK => CLK, WE => WE);
end RAM_16X1_DISTRIBUTED_STRUCTURAL;
RAM 16x8 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity RAM_16X8_DISTRIBUTED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC_VECTOR(7 downto 0);
        DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0)
    );
end RAM_16X8_DISTRIBUTED;
RAM 16x8 (2)
architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is

    -- The instance label RAM_16X1_S_1 is declared inside the generate statement,
    -- so an INIT attribute cannot be attached to it from here; the all-zero
    -- initial contents are supplied by the INIT generic of each instance.

    -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
    -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000"
        );
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic
        );
    end component;
RAM 16x8 (3)
begin
    -- One 16x1 distributed RAM per data bit; all eight share the address and write controls
    GENERATE_MEMORY: for I in 0 to 7 generate
        RAM_16X1_S_1: ram16x1s
            generic map (INIT => X"0000")
            port map(O => DATA_OUT(I),
                     A0 => ADDR(0), A1 => ADDR(1), A2 => ADDR(2), A3 => ADDR(3),
                     D => DATA_IN(I), WCLK => CLK, WE => WE);
    end generate;
end RAM_16X8_DISTRIBUTED_STRUCTURAL;
ROM 16x1 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity ROM_16X1_DISTRIBUTED is
    port(
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_OUT : out STD_LOGIC
    );
end ROM_16X1_DISTRIBUTED;
ROM 16x1 (2)
architecture ROM_16X1_DISTRIBUTED_STRUCTURAL of ROM_16X1_DISTRIBUTED is

    -- INIT attribute for the implementation tools; it must name the instance label
    attribute INIT : string;
    attribute INIT of ROM_16X1_S_1 : label is "F0C1";

    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000"
        );
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic
        );
    end component;

    -- Constant low level for the unused write port
    signal Low : std_ulogic := '0';
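ROM 16x1 (3)
begin
    -- Data input, write clock and write enable are tied low, so the contents set by INIT never change
    ROM_16X1_S_1: ram16x1s
        generic map (INIT => X"F0C1")
        port map(O => DATA_OUT,
                 A0 => ADDR(0), A1 => ADDR(1), A2 => ADDR(2), A3 => ADDR(3),
                 D => Low, WCLK => Low, WE => Low);
end ROM_16X1_DISTRIBUTED_STRUCTURAL;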
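The same ROM can also be described behaviorally with a constant, leaving the choice of primitive to the synthesis tool, which typically implements a 16x1 table in a single LUT. This is a sketch with a hypothetical entity name, using the contents X"F0C1" from the example above and assuming, as for the INIT value of ram16x1s, that bit i is the output for address i.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Behavioral 16x1 ROM (hypothetical name); no library primitive is instantiated.
entity ROM_16X1_BEHAVIORAL is
    port (
        ADDR     : in  std_logic_vector(3 downto 0);
        DATA_OUT : out std_logic
    );
end ROM_16X1_BEHAVIORAL;

architecture BEHAVIORAL of ROM_16X1_BEHAVIORAL is
    -- bit i holds the ROM word at address i
    constant ROM_INIT : std_logic_vector(15 downto 0) := X"F0C1";
begin
    DATA_OUT <= ROM_INIT(to_integer(unsigned(ADDR)));   -- combinational table look-up
end BEHAVIORAL;
```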