Pontifícia Universidade Católica do Rio Grande do Sul - Faculdade de Informática (FACIN-PUCRS)
Grupo de Apoio ao Projeto de Hardware - GAPH
Ney Laert Vilar Calazans, Fernando Gehm Moraes
March 21st, 2012
Partially Reconfigurable Systems: Past, Present and
Future Perspective
Before we turn to reconfigurable architectures, an introduction to FPGAs
3 Fernando Gehm Moraes - [email protected]
What are FPGAs? Basic architecture
[Figure: 3 × 3 array of CLBs interconnected through switch blocks]
• Array of CLBs (configurable logic blocks) interconnected by a switching matrix
FPGAs – basic concepts
[Figure: 3 × 3 array of logic blocks (Bloco K) surrounded by I/O cells (ES)]
• Configurable inputs/outputs
• Configurable connections
• Configurable functions
FPGAs – basic concepts
• Example of a connection between two nets
[Figure: logic-block array with the routed connections highlighted]
LUT - universal function generator
• LUT - look-up table
  – A piece of configurable/reconfigurable hardware able to implement any truth table of n inputs
  – For n = 4: $2^{2^4} = 65{,}536$ implementable functions
  – LUT
    » Highly flexible
    » Most widely used method (Xilinx and Altera)
FPGAs – LUT - universal function generator
[Figure: LUT4 with inputs A, B, C, D; stored truth-table column for entries 0 to 15: 1 0 0 1 0 0 0 1 1 0 1 0 1 0 1 0]
The truth table of the function is stored in a memory during FPGA configuration.
$$F(A,B,C,D) = \bar{A}\,\bar{B}\,\bar{C}\,\bar{D} + \bar{A}\,C\,D + A\,\bar{D} = \sum(0, 3, 7, 8, 10, 12, 14)$$
The inputs (Boolean variables) control a $2^n$:1 multiplexer.
Physical implementation of a LUT4
Assuming 150 transistors per LUT, 50,000 LUTs require 7,500,000 transistors!
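The LUT4 described above can be modelled in a few lines of Python. This is a minimal sketch, not the physical implementation: it assumes that A is the most significant address bit (an ordering chosen because it makes the minterm list and the product terms agree), and the names `make_lut`, `lut_read`, `f_expr` and `count_functions` are hypothetical.

```python
# Minimal software model of a 4-input LUT (assumption: A is the MSB of the address).
MINTERMS = (0, 3, 7, 8, 10, 12, 14)  # from the slide: F = sum(0, 3, 7, 8, 10, 12, 14)


def make_lut(minterms, n=4):
    """Build the configuration memory: one stored bit per truth-table row."""
    return [1 if i in minterms else 0 for i in range(2 ** n)]


def lut_read(table, a, b, c, d):
    """The inputs act as the select lines of a 16:1 multiplexer over the stored bits."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return table[index]


def f_expr(a, b, c, d):
    """Same function written as a sum of products, used as a cross-check."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (na & nb & nc & nd) | (na & c & d) | (a & nd)


def count_functions(n):
    """Number of distinct Boolean functions of n inputs."""
    return 2 ** (2 ** n)


table = make_lut(MINTERMS)
transistors = 150 * 50_000  # slide estimate: 150 transistors per LUT, 50,000 LUTs
```

Reconfiguring the LUT amounts to rewriting `table`; the multiplexer in `lut_read` never changes.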
Other ways of generating functions
• By sum of products
  – Drawback: large silicon area
  – Advantage: large number of inputs allowed
  – Advantage: large number of product terms
[Figure: AND-OR array with inputs A, B, C and registered (D/Q) outputs F0 and F1]
ULG – Universal Logic Gates
• Multiplexer: Actel, QuickLogic, Algotronix
  – structure known as a “universal logic function generator” (ULG)
  – does not implement all logic functions of n inputs
  – more complex logic functions require several ULGs
[Figure: tree of 2:1 multiplexers with data inputs A, B, C, D, select signals C1, C2, C3 and one output, shown configured for an example function F(A, B, C, D) given as a sum of two product terms; the literal polarities of the equation are not recoverable]
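To illustrate how a multiplexer can act as a logic function generator, the sketch below builds a few gates from a 2:1 multiplexer by tying its data inputs to constants or signals. It is an illustration only, not the Actel or QuickLogic cell; the function names are hypothetical.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def and2(a, b):
    return mux2(a, 0, b)  # a = 0 -> 0, a = 1 -> b


def or2(a, b):
    return mux2(a, b, 1)  # a = 0 -> b, a = 1 -> 1


def not1(a):
    return mux2(a, 1, 0)


def xor2(a, b):
    # Needs two multiplexers: one to invert b, one to select between b and not b.
    return mux2(a, b, not1(b))
```

The XOR case shows the point made on the slide: a more complex function consumes more than one multiplexer cell.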
Configuration technologies
SRAM-based FPGAs
• Xilinx Inc. - www.xilinx.com
• Altera Corp. - www.altera.com
• Atmel Corp. - www.atmel.com
• Lattice Semiconductor Corp. - www.latticesemi.com
Antifuse and flash-based FPGAs
• Actel Corp. - www.actel.com
• QuickLogic Corp. - www.quicklogic.com
Configuration technologies
Programming technologies: Antifuse (PLDs, CPLDs)
• Creates a short circuit between two metal layers
• Very small silicon area
• Best performance
• One-time programmable
The dielectric between the polysilicon and diffusion electrodes melts and forms a thin, permanent, resistive silicon link.
[Figure: antifuse cross-section with Metal 1, dielectric and Metal 2]
Configuration technologies
Programming technologies: (E)EPROM (PLDs, CPLDs, FPGAs)
• Limited number of programming cycles
• Configuration erased by UV light or electrically
Applying a potential to the upper gate transfers charge from the channel through the thin oxide layer, which charges the floating gate.
[Figure: floating-gate transistor showing the gate, the floating gate and the oxide layer]
Configuration technologies
Programming technologies: SRAM (FPGAs)
• Unlimited number of programming cycles
• Programming must be done at each power-up
[Figure: 5-transistor SRAM cell with word line, bit line and Vcc]
FPGAs – Configuration (RAM-based devices)
• An FPGA should be viewed as “two layers”
  – Configuration memory
  – User logic
• The configuration memory defines:
  – All interconnection (wiring)
  – Logic definition (LUTs)
  – DSP blocks
  – Interface to hardwired blocks, e.g. PPC
  – BRAM width and contents
  – I/O modes
• Virtex
  – Supports partial reconfiguration
[Figure: Configuration Memory Layer above the User Logic Layer]
Virtex 4: 1 MB – 4 MB
FPGA market (2005)
[Figure: two pie charts of market share. PLD segment: Xilinx 51%, Altera 33%, Lattice and Actel 5% and 7% (assignment not recoverable), QuickLogic 2%, Other 2%. FPGA sub-segment: Xilinx 58%, Altera 31%, All Others 11%.]
Source: Company reports. Latest information available; computed on a 4-quarter rolling basis.
http://teal.gmu.edu/courses/ECE448/viewgraphs_S08/lecture17_market.ppt
PLD Market Share (%)
Source: Gartner Dataquest
Calendar year   1998   1999   2000   2001   2002   2003   2004
Market size     $2.3B  $2.6B  $4.1B  $2.6B  $2.1B  $2.6B  $3.1B
Xilinx          30%    35%    38%    44%    49%    50%    51%
Altera          31%    33%    34%    32%    31%    32%    32%
All Others      39%    32%    28%    24%    20%    18%    17%
FPGA MARKET
http://www.ocoudert.com/blog/2009/09/15/why-fpga-startups-keep-failing/
Market
• Dominated by two players, Xilinx and Altera
  • With 51% and 35% share = 86% combined
• Remaining players scramble for niches
• All non-dedicated players have given up:
  • Intel, T.I., Motorola, NSC, AMD, Cypress, Philips…
• Late-comers have been absorbed or failed:
  • Dynachip, PlusLogic, Triscend, SiliconSpice (absorbed)
  • Chameleon, Quicksilver, Morphics, Adaptive Silicon (failed)
The pace of innovation is set by the leaders
FPGA families
• Xilinx
• Altera: Cyclone IV/V, Stratix IV/V, Arria II/V
Xilinx FPGA Families
• Old families
  • XC3000, XC4000, XC5200
  • Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• Low Cost Family
  • Spartan/XL – derived from XC4000
  • Spartan-II – derived from Virtex
  • Spartan-IIE – derived from Virtex-E
  • Spartan-3, Spartan-3E, Spartan-3A
  • Spartan-3AN, Spartan-3A DSP (90 nm)
• High-performance families
  • Virtex (220 nm)
  • Virtex-E, Virtex-EM (180 nm)
  • Virtex-II, Virtex-II PRO (130 nm)
  • Virtex-4 (90 nm)
  • Virtex-5 (65 nm / 45nm)
  • Virtex-6 (40 nm) and Virtex-7 (28 nm)
Source: [Xilinx Inc.]
1 million equivalent logic gates for less than $5!
Virtex architecture – CLB and interconnect
• Direct connections between neighboring CLBs
  – Carry logic
• Connection matrix
  – CLB to the routing lines
• Routing lines
  – Single
  – Hex
  – Long
  – Tri-state
[Figure: Virtex CLB with two slices, carry chains and local feedback, attached to a switch matrix with single, hex and long lines, tri-state busses and direct connects to neighboring CLBs]
CLB – Virtex-5 onwards
Virtex FPGA Editor View With All Wires
[Figure: zoomed-in FPGA Editor view showing many routing resources, a large switch box, 4 slices and 2 TBUFs]
Programmable inputs and outputs
End of the introduction to FPGAs
{ney.calazans, fernando.moraes}@inf.pucrs.br
Outline
• Introduction - Defs, History and Models
• Reconfigurable Systems - Past
  • Previous Design Flows for DRS
• Reconfigurable Systems - Present
  • Current Xilinx Design Flow
• Reconfigurable Systems - Future?
Organization X Flexibility in SoCs
SoCs
• Present Design Scenario for Systems-on-Chip (SoCs)
• Energy consumption
• Latency
• Design Complexity
• Testability
What is Hw Reconfigurability?
• Simple definition – Any piece of Hw that can behave differently at different instants, according to application or user needs, is Reconfigurable Hw
• Examples
  • Stupidly trivial – An Arithmetic Logic Unit (ALU) → each operation is a configuration
  • Less trivial – A µprocessor → each instruction is a configuration
  • ‘Normal’ – An FPGA → a bitstream file is a configuration
How to Obtain Reconfigurability?
[Figure: example hardware organization for a 4-input Look-Up Table (LUT4) with inputs A, B, C, D; truth-table output column for entries 0 to 15: 1 0 0 1 0 0 0 1 1 0 1 0 1 0 1 0]
• Truth-table output stored in register
$$F(A,B,C,D) = \bar{A}\,\bar{B}\,\bar{C}\,\bar{D} + \bar{A}\,C\,D + A\,\bar{D} = \sum(0, 3, 7, 8, 10, 12, 14)$$
• Inputs (Boolean vars) control a $2^n$:1 mux
• Single bit S controls if wires are connected (or not)
• Single bit S controls if either a or b connects to mux out
In other words, Hw reconfigurability is achievable with adequate organizations and control memory.
Remember, Hw is always fixed. Changing Hw is an abstraction. But Sw does not exist either!
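Reconfiguration Classifications
[Figure: classification tree of Reconfigurable Systems, with branches Static (configured once) and Dynamic (configured at run-time), and Total versus Partial reconfiguration; annotations: "Most FPGA implementations" and "A few works"]
Emphasis here is on Partially and Dynamically Reconfigurable Systems!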
Practical Implementations Defs
• Partition
  • A logic block (entity or instance) used for design reuse
  • User determines implementation vs. preservation for each block
• Bottom-up synthesis
  • Separate projects synthesis → multiple netlists
  • No optimization across projects
• Top-down synthesis; NOT used for Partial Reconfiguration (normal flow)
  • One project where synthesis may flatten design for optimization
More Terminology
• Reconfigurable Partition (RP)
  • Design hierarchy instance marked by the user for reconfiguration
• Reconfigurable Module (RM)
  • Portion of the logical design that occupies an RP
  • Each RP may have multiple RMs
• Static Logic
  • All logic in the design that is not reconfigurable
• A Configuration
  • A full design, consisting of Static Logic and an RM for each RP
• Partition Pins
  • Ports on a Partition; interface between Static and Reconfigurable Logic
• Proxy Logic
  • Components (e.g. LUTs) inserted on each Partition Pin to act as anchor points for RP
What is Partial Reconfiguration?
• Partial Reconfiguration is the ability to dynamically modify blocks of logic by downloading partial bit files while the remaining logic continues to operate without interruption.
[Figure: an FPGA loaded with a full bit file through the configuration port; partial bit files for Functions A1 to A3, B1 to B2 and C1 to C2 are delivered through the configuration port or ICAP]
PR Applications Analogy
[Figure: a processor (uP with MMU and stack) performing a context switch among Process 1, Process 2, ... Process N, compared with an FPGA performing a configuration switch among Partial Bitstreams A, B, ... N loaded into PR regions 1, 2, ... N]
Partial Reconfiguration: Tech and Benefits
• Partial Reconfiguration enables:
  • System Flexibility – Perform more functions while maintaining communication links
  • Size and Cost Reduction – Time-multiplex the hardware to require a smaller FPGA
  • Power Reduction – Shut down power-hungry tasks when not needed
A Little History of Reconfig. Hw
• In the early 60’s Estrin, Bussell, Turn and Bibb proposed the Fix-plus Machine
  • A fixed part – a fast computer
  • A special-purpose supervisory (reconfiguration) control
  • A variable part, composed of several math functions
• Seen as the first Reconfigurable Machine
• IEEE Trans. on Electronic Computers, Dec/1963
A Little History of Reconfig. Hw
• In 1977 Rammig (Un. Dortmund, Germany) proposed a “Hardware Editor” architecture, similar to FPGAs
• Hand-controlled reconfiguration
A Little History of Reconfig. Hw
• In the early 80’s Hartenstein (Un. Kaiserslautern, Germany) proposed the “Xputer” architecture
• Composed of
  • Data Sequencer
  • Data Memory
  • Reconfigurable ALUs
• Reconfiguration below instruction set level
A Little History of Reconfig. Hw
• PROMs and PLDs are based on sum-of-products (ANDs-ORs)
• Some patents similar to FPGAs appeared at the end of the 80’s and start of the 90’s (Casselman, Page, Peterson)
• Founders of Xilinx, Ross Freeman and Bernard Vonderschmitt, invented the first commercial FPGA in 1985 – the XC2064
• The XC2064 had 64 configurable logic blocks (CLBs) and configurable interconnect among logic blocks
• Each CLB had only two 3-input LUTs
Reconfigurable Hw Models
• André DeHon in his book proposes a generic model
  • Array of identical blocks
  • Compute unit is configurable (like an ALU)
  • Memory stores intermediate results
  • Set of muxes route data among units (upper ones) or to local compute unit (lower ones)
Reconfigurable Computing
[Figure: coupling options for reconfigurable logic relative to a CPU with memory and I/O interface: RFU (Reconfigurable Functional Unit), Coprocessor, Attached, and Standalone (standalone FPGA design)]
Reconfigurable Computing Evolution
[Figure: timeline from 1985 to 2005. Legend: reconfigurable processor type (RFU, Coprocessor, Attached) and work source (University, Industry)]
• Reconfigurable Devices: XC2064, XC4000, Virtex, Virtex 4, Spartan, Spartan 3, XC3000, Stratix, AT40k, XC6200, AT6000, CAL1024, CLi6000, CLAy, KressArray, Matrix, CHESS, DReAM, Virtex-2, Virtex-2 Pro, Spartan 2, RAW
• Reconfigurable Systems: Dyer, Palma, Gecko, R8NR, Walder, Huebner, Horta
• Reconfigurable Processors: NAPA, DISC, Nano Processor, OneChip, PRISM, GARP, PipeRench, REMARC, MorphoSys, PRISC, Chimaera, Systolic Ring
Reconfigurable Device: Virtex-II (Pro)
• ICAP maximum speed: 66 MHz
• 1 CLB = 22 frames ≈ 70 µs (XC2V40)
• One-dimensional architecture
Virtex-II 1000 frames
[Figure: column-by-column frame layout of the device, with columns numbered 0 to 36. Each CLB column (IOBs at top and bottom) has 22 frames; the central column (clock signal distribution resources) has 4 frames; the input/output resource columns have 22 frames (IOB2) and 4 frames (IOB1); each of the four SelectRAM block columns (0 to 3) has 64 frames; each of the four block RAM interconnection resource columns (0 to 3) has 22 frames]
Reconfigurable Processors
RFU (e.g. OneChip)
• Reconfigurable logic used inside processor, at execution stage
• Small, simple reconfigurable instructions
• Low latency access to reconfigurable logic
[Figure: OneChip pipeline (IF, ID, EX, ME, WB) with instruction cache, register file, data cache, and memory controller to main memory; the execution stage holds a BFU and RFUs; RFU internals: instruction buffer, controller, local storage, logic, memory interface and FPGA status controller]
Reconfigurable Processors
Coprocessor (e.g. MorphoSys)
• Dedicated connection to reconfigurable logic
• Complex and CPU-oriented reconfigurable instructions
Reconfigurable Processors
Attached (e.g. PRISM)
• Shared connection to reconfigurable logic
• Complex, IO-oriented reconfigurable instructions
• Memory mapped reconfigurable instructions
[Figure: Armstrong processing node. The PRISM core processor, with high-speed serial channels, connects over the Armstrong bus (address/control bus and 16-bit data bus) to the reconfigurable hardware: data buffers, a decoder/controller with FPGA select, and four XC3090 FPGAs with input and output data paths]
Reconfigurable Systems connected on a point-to-point basis
ETH - Zurich, 2002: Matthias Dyer, Marco Wirz, Christian Plessl, Marco Platzner
• Connected to a Leon coprocessor interface
• Computation and communication protocol in VC
• LUTs for reconfigurable interfaces
• Feed-through macros to guide the routing (i.e. I/O)
• Complete columns reconfigured
Reconfigurable Systems connected by bus
PUCRS - Porto Alegre, 2002: José Carlos Palma, Aline Mello, Leandro Möller, Fernando Moraes, Ney Calazans
• Computation and communication dissociated
• Tristates for reconfigurable interfaces
• Partial columns reconfigured
[Figure: a controller, an arbiter, a Master Core and Slave Cores 1 and 2 sharing a 32-bit data line; each core has dataIn, dataOut, request, grant, reset and clock signals, plus start signals from the controller; I/O pins drive a display]
Reconfigurable Systems connected by bus or p-to-p links
PUCRS - Porto Alegre, 2004: Leandro Möller, Ney Calazans, Fernando Moraes, Eduardo Brião, Ewerson Carvalho
• Connected to R8R coprocessor interface
• Computation and communication dissociated
• Tristates for reconfigurable interfaces
• External self-reconfiguration
• Complete columns reconfigured
Reconfigurable Systems connected by bus
KIT - Karlsruhe, 2004: Michael Hübner, Michael Ullmann, Tobias Becker, Jürgen Becker
• 2 uni-directional busses (macros)
• Computation and communication dissociated
• LUTs for reconfigurable interfaces
• Internal self-reconfiguration
• Complete columns reconfigured
Reconfigurable Systems connected by bus or point-to-point
ETH - Zurich, 2004: Herbert Walder, Marco Platzner
• Computation and communication dissociated
• Tristates for reconfigurable interfaces
• Complete columns reconfigured
Reconfigurable modules connected by a network
IMEC - Leuven, 2003: Théodore Marescaux, Jean-Yves Mignolet, Andrei Bartic, Diederik Verkest, Serge Vernalde
[Figure: platform photo labelled iPAQ 3760]
DRS Basic Project Flow for Virtex FPGAs
[Figure: flow System → Macros → Placement → Routing → Partial Bitstream]
• Hw Macros used to control routing among reconfigurable modules
• Routing cannot be easily constrained manually
• Tool needed to generate partial reconfiguration files
Macros
[Figure: macro structures used in the Dyer, Walder, R8NR and Huebner flows. Each diagram shows Modules 0 to 3 and a fixed module / arbiter, the connections to the reconfigurable module (8-bit wide in two of the diagrams), and the limits between reconfigurable regions]
DRS Flows – summary
Flow            Macro Used  Synthesis  Partial Bitstream      Generation method  Compatibility
Dyer 1          yes         simple     manual                 Frame extraction   Virtex I
Dyer 2          no          JBits 2    JBits 2                native             Virtex I
Dyer 3          yes/no      simple     JBits 2                Parameters change  Virtex I
Dyer 4          yes/no      simple     JBits 2                Frame extraction   Virtex I
GAPH 1          yes         simple     CoreUnifier            Frame extraction   Virtex I
Horta           yes         simple     ParBit                 Frame extraction   Virtex I
Modular Design  yes         advanced   BitGen                 Native             Virtex I, II, II Pro
Huebner         yes         simple     JBits 3                Frame extraction   Virtex II
GAPH 2          yes         simple     CoreUnifier-II BitGen  Frame extraction   Virtex II, II Pro
Case Study – MR2 / R8
Configurations
• A Configuration is a complete FPGA design
  • Consists of Static Logic and one variant for each reconfigurable instance
  • Maximum number of RMs for any RP determines minimum number of Configurations required
  • Example: Possible Configurations for this design
    1. Static + A1 + B1 + C1
    2. Static + A2 + B2 + C2
    3. Static + A3 + B2 + C3
    4. Static + A3 + B2 + C4
• Static Logic and repeated RMs are imported
• Any combination of RMs can be selected to create unique full bit files
[Figure: Static logic with RP “A”, RP “B” and RP “C”; Reconfigurable Modules A1 to A3, B1 to B2 and C1 to C4]
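The counting rule on this slide can be checked with a short sketch. The RM names come from the slide; the helper names `minimal_configurations` and `all_configurations` are hypothetical, and repeating the last RM of a shorter list is just one possible way to fill the remaining Configurations.

```python
from itertools import product

# Reconfigurable Modules available for each Reconfigurable Partition (from the slide).
RMS = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
    "C": ["C1", "C2", "C3", "C4"],
}


def minimal_configurations(rms):
    """Smallest set of Configurations in which every RM is implemented at least once.

    The RP with the most RMs sets the count; shorter lists repeat their last RM.
    """
    count = max(len(modules) for modules in rms.values())
    return [
        tuple(modules[min(i, len(modules) - 1)] for modules in rms.values())
        for i in range(count)
    ]


def all_configurations(rms):
    """Every combination of RMs that could be turned into a full bit file."""
    return list(product(*rms.values()))


for number, config in enumerate(minimal_configurations(RMS), start=1):
    print(f"{number}. Static + " + " + ".join(config))
```

With three, two and four RMs, four Configurations are enough to implement every RM, while 3 × 2 × 4 = 24 distinct full bit files are possible.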
Reconfigurable Elements (Xilinx)
• What is reconfigurable?
  • Most of an FPGA
    – Slice logic (LUTs, flip-flops, and carry logic, for example)
    – Memories (block RAM, distributed RAM, shift register LUTs)
    – DSP blocks
    – I/O components (IOLOGIC, IODELAY, IDELAYCTRL)
• Some logic must remain in static logic
  • Clock-modifying blocks (MMCM, DCM, PLL, PMCD)
  • Global clock buffers (BUFG)
  • Device feature blocks (BSCAN, ICAP, STARTUP, or PCIE, for example)
Reconfigurable Elements (Xilinx)
• Granularity of reconfigurable regions varies by device family
  • Boundaries recommended, but not required, to align to Clock Regions
  • Virtex-6 examples
    – Slice region: 40 CLB high by 1 CLB wide
    – BRAM region: 8 RAMB36
    – DSP region: 16 DSP48
    – IOB region: 80 IOB (one bank)
  • Virtex-5 examples
    – Slice region: 20 CLB high by 1 CLB wide
    – BRAM region: 4 RAMB36
    – DSP region: 8 DSP48
    – IOB region: 40 IOB (one bank)
  • Virtex-4 examples
    – Slice region: 16 CLB high by 1 CLB wide
    – BRAM region: 4 RAMB16 and 4 FIFO16
    – DSP region: 8 DSP48
    – IOB region: 32 IOB (one bank)
• Bit file sizes for each of these resource types will vary
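As a small worked example of these granularities, the sketch below rounds a requested region height up to a whole number of slice regions. The heights come from the slide; the function name `aligned_height` is hypothetical, and since alignment is only recommended, this is a floorplanning aid rather than a tool rule.

```python
import math

# Slice-region height in CLBs per device family (values from the slide).
SLICE_REGION_HEIGHT = {"Virtex-6": 40, "Virtex-5": 20, "Virtex-4": 16}


def aligned_height(family, requested_clbs):
    """Round a requested height (in CLBs) up to a multiple of the family granularity."""
    step = SLICE_REGION_HEIGHT[family]
    return math.ceil(requested_clbs / step) * step


# A module that needs 25 CLB rows on a Virtex-5 occupies two 20-CLB slice regions.
print(aligned_height("Virtex-5", 25))
```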
Power Reduction Techs. with PR
• Board space and resources are limited
  • Multi-chip solutions consume extra area, cost, and power
• Many techniques can be employed to reduce power
  • Swap out high-power functions for low-power functions when maximum performance is not required
  • Swap out black boxes for inactive regions
  • Swap high-power I/O standards for lower-power I/O when specific characteristics are not needed
  • Time-multiplexing functions will reduce power by reducing amount of configured logic
PR Design Flow
DF: 1. Set Up Design Structure
• Bottom-up synthesis creates netlists for static and reconfigurable logic
  • Any synthesis tool can be used
• Create PlanAhead tool PR project
  • Import static logic and constraints
• Define partitions and set as reconfigurable
• Import netlists as Reconfigurable Modules for each partition
• Set RMs active to build different configurations
DF: 2. Constrain RPs and DRC
• Floorplan partition regions by creating Pblock rectangles
  • Uses AREA_GROUP constraints to assign range
  • These declare what will be reconfigured
• Create timing constraints
  • Requirements should consider the entire design
  • Budget timing on both sides of partition boundary (if needed)
• Run DRCs in the PlanAhead tool
  • Specific sets of rules checked for partitions and PR
DF: 3. P&R Configurations
• Uses existing Run command
  • Give unique name to each Configuration
    – Select the RMs desired for each RP
  • Allows multiple runs to be created for same configuration for exploration
• Strategy: Implement most difficult configuration first
  • Once the largest or most timing-critical RMs are resolved, the other scenarios should be easier to manage
• Promote “golden” versions of each RM and static logic
  • Import these for subsequent configurations
DF: 4. Create Bit Files
• First, verify that all PR design rules have been followed
  • PR_Verify will check validity of selected Configurations
• Use Bitstream Generation command in PlanAhead
  • Will run bitgen on all selected Configurations
    – Generates full and partial bit files for each run in implementation directory
  • Can be launched for any design run created for any configuration
• Normal simulation and timing analysis can be performed on any Configuration
  • A Configuration is a complete FPGA design
  • Build any Configuration through Place & Route to simulate that combination of active Reconfigurable Modules
Configuration Details
• Partial bit files are processed just like full bit files
  • Bit file sizes will vary depending on region size and resource type
  • Contain just address & data, sync & desync words, final CRC value
    – No startup sequences, DONE flag
• Partial Reconfiguration time depends on two factors:
  1. Configuration bandwidth
  2. Partial bit file size
    – Estimate in PlanAhead, confirm in Rawbit file

Configuration Mode   Max Clock Rate   Data Width   Max Bandwidth
SelectMap / ICAP     100 MHz          32-bit       3.2 Gbps
Serial Mode          100 MHz          1-bit        100 Mbps
JTAG                 66 MHz           1-bit        66 Mbps
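The two factors above combine into a simple estimate: bandwidth is clock rate times data width, and reconfiguration time is bit file size divided by bandwidth. The sketch below uses the clock rates and widths from the table; the 100 KiB partial bit file size is a hypothetical value chosen only for illustration, and the result is a lower bound that ignores any controller or memory overhead.

```python
# (max clock rate in Hz, data width in bits) per configuration mode, from the table.
MODES = {
    "SelectMap / ICAP": (100_000_000, 32),
    "Serial Mode": (100_000_000, 1),
    "JTAG": (66_000_000, 1),
}


def max_bandwidth_bps(mode):
    """Peak configuration bandwidth in bits per second."""
    clock_hz, width_bits = MODES[mode]
    return clock_hz * width_bits


def reconfig_time_s(mode, bitfile_bytes):
    """Best-case time to deliver a partial bit file of the given size."""
    return bitfile_bytes * 8 / max_bandwidth_bps(mode)


PARTIAL_BITFILE_BYTES = 100 * 1024  # hypothetical size, not taken from the slides

for mode in MODES:
    microseconds = reconfig_time_s(mode, PARTIAL_BITFILE_BYTES) * 1e6
    print(f"{mode}: {microseconds:.0f} us")
```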
SoCs
• Present Design Scenario for Systems-on-Chip (SoCs)
• Energy consumption
• Latency
• Design Complexity
• Testability
• Future Scenario: Partially/Dynamically Reconfigurable SoCs
• Flexibility → new HW and SW can be added to the system
• Area → dynamically reconfigurable HW
• Energy consumption → only the logic in use remains in HW
• Performance → higher utilization of HW rather than SW
• Reuse → not only SW can be reused, HW too!
• Management → an intelligent OS must embed HW scheduling control
Summary
• PR enables
  • System flexibility
  • Size and cost reduction
  • Power reduction
• Modern PR flows have four basic steps
  1. Set up the design structure
  2. Constrain RPs and run DRCs
  3. Place & Route configurations
  4. Create bit files
• For more info see recent conferences
  • FPL, FPGA, ARC, FCCM and of course, SPL!!
“Normal” Configuration
[Figure: timeline Start → Vcc rise → Vcc stable → power-on reset → configure FPGA → user mode; the configuration bitstream sent to the FPGA consists of header, config. data and check sum]
‘Typical’ Configuration Mode
• Fixed configuration
  • Data loads from PROM or other source at power on
  • Configuration fixed until the end of the FPGA duty cycle
• Used extensively during traditional design flow
  • Evaluate functionality of design as it is developed
[Figure: function versus time over the device duty-cycle, from power on to shut down, with a single configuration overhead at the start]
Reconfiguration
• Configuration memory is no longer fixed during the system duty cycle
• Initial bitstream loaded at power-on
• Different, full device bitstreams loaded over time
[Figure: function versus time from power on to shut down, showing the initial configuration overhead and a reconfiguration overhead at each change of function]
Partial Configuration
[Figure: timeline Start → Vcc rise → initial config. complete → user mode → load partial bitstream → user mode; the partial configuration bitstream sent to the FPGA carries config. data]
Partial Reconfiguration
• Only a subset of configuration data is altered
• But all computation halts while modification is in progress…
• Main benefit: reduced configuration overhead
[Figure: function versus time from power on to shut down, showing the configuration overhead and shorter reconfiguration overheads]
Dynamic Reconfiguration
• A subset of the configuration data changes…
• But logic layer continues operating while configuration layer is modified…
• Configuration overhead limited to circuit that is changing…
[Figure: function versus time from power on to shut down, showing the configuration overhead and reconfiguration overheads confined to the changing circuit]
How Can We Reconfigure?
• Initiation of reconfiguration is determined by the designer
  • On-chip state machine, processor or other logic
  • Off-chip microprocessor or other controller
• Delivery of the partial bit file uses standard interfaces
  • FPGA can be partially reconfigured through the SelectMap, Serial or JTAG configuration ports, or the Internal Configuration Access Port
• Logic decoupling should be synchronized with the initiation and completion of partial reconfiguration
  • Enable registers
  • Issue local reset
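One way to picture the synchronization between decoupling and reconfiguration is the sketch below. It is a behavioural model only: the class `PRController`, its method names and the order decouple → load → local reset → re-enable are assumptions made for illustration, and `write_word` stands in for whatever configuration port (for example the ICAP) the real system uses.

```python
class PRController:
    """Hypothetical controller that wraps a partial reconfiguration in decoupling."""

    def __init__(self, write_word):
        self.write_word = write_word  # callback that delivers one bitstream word
        self.decoupled = False        # True while the region is isolated from static logic
        self.log = []                 # record of the steps taken, for inspection

    def reconfigure(self, partial_bitstream_words):
        # 1. Decouple: hold the region's interface inactive before anything changes.
        self.decoupled = True
        self.log.append("decouple")
        # 2. Deliver the partial bit file through the configuration port.
        for word in partial_bitstream_words:
            self.write_word(word)
        self.log.append("load")
        # 3. Put the new module into a known state.
        self.log.append("local_reset")
        # 4. Re-enable the interface once reconfiguration has completed.
        self.decoupled = False
        self.log.append("enable")


port = []  # stand-in for the configuration port
controller = PRController(port.append)
controller.reconfigure([0x1, 0x2, 0x3])  # placeholder words, not a real bitstream
```

Keeping the decouple and enable steps inside the same routine that loads the bit file makes it hard to start a reconfiguration without isolating the region first.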