Talk delivered to PhD students at the Tallinn Technical University in May 2009
J. M. Martins Ferreira - University of Porto (FEUP / DEEC)
Tallinn Technical University :: May 4th 2009
Tallinn Technical University :: May 5th 2009
This presentation is available at http://www.slideshare.net/josemmf
On using BS to improve the reliability and availability of reconfigurable hardware
J. M. Martins Ferreira
FEUP / DEEC - Rua Dr. Roberto Frias
4200-537 Porto - PORTUGAL
M. G. Gericota, G. R. Alves, M. Silva, J. M. Ferreira, “Reliability and Availability in Reconfigurable Computing: A Basis for a Common Solution,” IEEE Transactions on VLSI Systems, Vol. 16, No. 11, pp. 1545-1558, Nov. 2008.
Outline of this talk
1. Introduction
2. Concurrent replication of active CLBs
3. On-line structural concurrent test (better reliability)
4. Defragmentation (better availability)
5. Conclusion
Introduction
• Motivation
• Causes of failure in FPGAs
Motivation: An old problem becomes more important
• Dynamically reconfigurable FPGAs:
– Production tests cannot guarantee fault-free operation
– Application areas include mission-critical systems
– The cost / benefit of spatial redundancy is different from static implementations
Causes of failure in FPGAs
• Post-production failure modes may be permanent or temporary. Examples:
– Electromigration phenomena may lead to permanent physical damage
– Single-event upsets (SEUs) may cause permanent malfunction if not mitigated (modification of SRAM contents changes design and data information)
Concurrent replication of active CLBs
• The principle
• How it works
• Resources required (time, space)
Concurrent replication of CLBs: The principle
• The basic idea underlying release-to-test strategies consists of replicating a given functional block in another area (non-intrusively), and making the original resources available for test
[Figure: relocation, test and rotation cycle, showing replication of functionality, rotation of free resources, and resources under test]
Concurrent replication of CLBs: The principle
• Concurrent fault detection based on release-to-test approaches must provide functional and state replication
• Replication at CLB-level
– Facilitates state transfer and requires a minimal amount of spare resources
– The relative position of the replicated CLB and its replica has an impact on propagation delay
[Figure labels: CLB, IOB]
Concurrent replication of CLBs: How it works
• General replication principle, phase one:
– Copy the internal configuration of the replicated CLB into the replica CLB and place the inputs of both CLBs in parallel
[Figure: replicated CLB and CLB replica, each with In and Out]
Concurrent replication of CLBs: How it works
• General replication principle, phase two:
– Place the outputs of both CLBs in parallel (the replicated CLB may then be disconnected and made available for testing)
[Figure: two views of the replicated CLB and the CLB replica, each with In and Out]
Concurrent replication of CLBs: Replication aid block
• Supports state transfer in synchronous gated-clock circuits
[Figure: replication aid block placed between the replicated cell (from the circuit) and the replica cell (to the circuit); signals shown: FF_OUT, LOGIC_OUT, CC, BY_C, CE, RESET, CLK]
Replication flow: Time & space needed
[Flowchart: the nine replication steps tabulated below, with two wait conditions ("> 2 CLK pulse" and "> 1 CLK pulse", each with Y / N exits)]
Step | No. of bytes | Time (ms)
1. Copy the internal logic functionality and place the input signals in parallel | 11 289 | 9,705
2. BY_C=1 & CC=1 | 441 | 0,379
3. CC=0 | 277 | 0,238
4. BY_C=0 | 277 | 0,238
5. Connect the clock enable inputs of both CLBs | 2 145 | 1,844
6. Disconnect all the auxiliary relocation circuit signals | 2 217 | 1,906
7. Place the CLB outputs in parallel | 4 129 | 3,550
8. Disconnect the original CLB outputs | 1 333 | 1,146
9. Disconnect the original CLB inputs | 3 986 | 3,438
Total | 26 094 | 22,444
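The byte and time figures in the replication flow table can be cross-checked with a few lines of Python. This is only a sketch: the step labels are shortened, and the decimal commas of the slide are written as decimal points. It re-adds the two totals and derives the reconfiguration time per byte implied by each row, which comes out at roughly 0.86 µs per byte for every step.

```python
# Replication steps from the slide: (label, bytes, time in ms).
# Labels are abbreviated; decimal commas are written as decimal points.
STEPS = [
    ("Copy logic functionality, parallel inputs", 11289, 9.705),
    ("BY_C=1 & CC=1", 441, 0.379),
    ("CC=0", 277, 0.238),
    ("BY_C=0", 277, 0.238),
    ("Connect clock enable inputs", 2145, 1.844),
    ("Disconnect auxiliary relocation signals", 2217, 1.906),
    ("Place CLB outputs in parallel", 4129, 3.550),
    ("Disconnect original CLB outputs", 1333, 1.146),
    ("Disconnect original CLB inputs", 3986, 3.438),
]

# Totals, to compare with the "Total" row of the table.
total_bytes = sum(n_bytes for _, n_bytes, _ in STEPS)
total_ms = round(sum(t_ms for _, _, t_ms in STEPS), 3)

# Implied reconfiguration time per byte for each step, in microseconds.
us_per_byte = [1000.0 * t_ms / n_bytes for _, n_bytes, t_ms in STEPS]

print(total_bytes, total_ms)
print(round(min(us_per_byte), 3), round(max(us_per_byte), 3))
```

The near-constant time per byte suggests that, in these measurements, the cost of each step is dominated by the size of its partial reconfiguration file.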
On-line structural concurrent test
• Fault model, test configurations
• Test application
• Rotation and release for test strategy
• Fault detection latency
Fault model and test configurations
• A hybrid fault model (stuck-at / functional) was adopted and the two CLB slices (each with 13 inputs and 6 outputs) are tested in parallel
Configuration | Number of test vectors | No. of bytes | Time (ms)
1st | 16 | 18 392 | 15,813
2nd | 16 | 3 115 | 2,678
3rd | 2 | 623 | 0,536
4th | 2 | 634 | 0,545
5th | 2 | 613 | 0,527
6th | 2 | 512 | 0,440
Total | 40 | 23 889 | 20,539
Test application
• CLB testing via BS:
– Test vector application is done through a 13-bit user test data register
– Response capturing takes place through unused BS cells
[Figure: boundary-scan path between TDI and TDO with a MUX, bypass register, instruction register, config. register and user test register; CLBs under test with IN and OUT]
Rotation strategy
• Vertical rotation has an advantage in the case of arithmetic circuits that use the dedicated carry interconnection between (vertically) adjacent CLBs
• In the general case, we should consider such factors as the number of circuits with high fanout and the shape / orientation of the implementation
Replicate and release-to-test in a 24-bit counter (example)
[Figure: slices CLB_R21C7.S0 to CLB_R24C7.S0, each with CIN, COUT, BX and YB pins, linked by dedicated carry lines]
[Plot: maximum frequency of operation (MHz, axis 0 to 160) versus number of relocations (0 to 12), for vertical rotation and horizontal rotation; the inset repeats the four slices with delays Tbxcy and Tbyp and nets U1/C6/C12/C1/O, U1/C6/C14/C1/O, U1/C6/C16/C1/O]
Rotation strategy: ITC’99 benchmarks
Circuit reference | # PI | # PO | # gates | # FF | Carry logic lines | Carry logic segments
B01 | 2+2 | 2 | 47 | 5 | 0 | 0
B02 | 1+2 | 1 | 29 | 4 | 0 | 0
B03 | 4+2 | 4 | 150 | 30 | 0 | 0
B04 | 11+2 | 8 | 606 | 66 | 4 | 14
B05 | 1+2 | 36 | 977 | 34 | 4 | 16
B06 | 2+2 | 6 | 61 | 9 | 0 | 0
B07 | 1+2 | 8 | 422 | 49 | 2 | 6
B08 | 9+2 | 4 | 168 | 21 | 0 | 0
B09 | 1+2 | 1 | 160 | 28 | 0 | 0
B10 | 11+2 | 6 | 190 | 17 | 0 | 0
B11 | 7+2 | 6 | 484 | 31 | 1 | 4
B12 | 5+2 | 6 | 1037 | 121 | 0 | 0
B13 | 10+2 | 10 | 343 | 53 | 1 | 4
B14 | 32+2 | 54 | 4787 | 245 | 11 | 150
Rotation strategy: ∆f and size for the ITC’99 circuits
Ref. | Maximum ∆f (%), vertical | Maximum ∆f (%), horizontal | Size of the reconfiguration files (bytes), vertical | Size of the reconfiguration files (bytes), horizontal | Ratio size of the reconf. files by CLB (%) (horizontal > vertical)
B01 | -5,5 | 0,0 | 48 350 | 56 102 | 16,0
B02 | 0,0 | 0,0 | 7 016 | 10 623 | 51,4
B03 | -1,9 | -4,9 | 120 705 | 138 484 | 14,7
B04 | -6,1 | -29,3 | 548 595 | 665 419 | 21,3
B05 | -17,3 | -36,9 | 1 130 985 | 1 286 031 | 13,7
B06 | -2,7 | 0,0 | 45 291 | 53 503 | 18,1
B07 | -23,6 | -37,8 | 354 367 | 425 214 | 20,0
B08 | -5,8 | -5,8 | 150 093 | 178 339 | 18,8
B09 | -1,8 | -4,9 | 112 107 | 129 855 | 15,8
B10 | -7,5 | -7,6 | 195 571 | 245 455 | 25,5
B11 | -10,5 | -36,0 | 500 261 | 614 093 | 22,8
B12 | 0,0 | -1,2 | 1 275 804 | 1 631 953 | 27,9
B13 | -4,3 | -42,8 | 258 827 | 332 954 | 28,6
B14 | -13,5 | -47,8 | 5 195 444 | 6 070 485 | 16,8
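The last column of this table appears to be the relative increase of the horizontal reconfiguration file size over the vertical one. That reading is an assumption, not something the slide states, but the short sketch below reproduces the slide's percentages for the rows it checks. The function name is illustrative.

```python
def size_increase_percent(vertical_bytes: int, horizontal_bytes: int) -> float:
    """Percentage by which the horizontal file size exceeds the vertical one."""
    return 100.0 * (horizontal_bytes - vertical_bytes) / vertical_bytes


# (vertical, horizontal) reconfiguration file sizes in bytes, from the table.
SIZES = {
    "B01": (48350, 56102),
    "B02": (7016, 10623),
    "B14": (5195444, 6070485),
}

for ref, (vertical, horizontal) in SIZES.items():
    print(ref, round(size_increase_percent(vertical, horizontal), 1))
```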
Fault detection latency
• The duration of a complete rotation cycle depends on the device size and on the reconfiguration and test times
• The fault detection latency alternates between a minimum and a maximum value, according to the rotation direction:
– MAX_FDL = [(#CLB_ROWS × #CLB_COLS) - 1] × 2 × (Δ_RECONF + Δ_TEST)
– MIN_FDL = 2 × (Δ_RECONF + Δ_TEST)
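The two latency expressions above translate directly into code. In the sketch below, the function names and the sample reconfiguration and test times (30 ms and 10 ms) are placeholders chosen for illustration, not values from the talk. The 28 × 42 array shape is likewise an assumption, picked because it yields the 1176 CLBs quoted for the XCV200 on the worst-case latency slide.

```python
def min_fdl(reconf_ms: float, test_ms: float) -> float:
    """Minimum fault detection latency: 2 x (reconfiguration + test time)."""
    return 2 * (reconf_ms + test_ms)


def max_fdl(clb_rows: int, clb_cols: int, reconf_ms: float, test_ms: float) -> float:
    """Maximum fault detection latency: [(rows x cols) - 1] x 2 x (reconf + test)."""
    return (clb_rows * clb_cols - 1) * min_fdl(reconf_ms, test_ms)


# Hypothetical per-CLB times (ms) and an assumed 28 x 42 CLB array (1176 CLBs).
print(min_fdl(30.0, 10.0))
print(max_fdl(28, 42, 30.0, 10.0))
```

Written this way, it is easy to see that the maximum latency is the minimum latency multiplied by the number of CLBs minus one, so it grows linearly with device size.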
Fault detection latency
Synchronous circuits with clock enable [with the replication aid circuit]
Step | # of bytes | Time (ms), 20 MHz TCK
Copy logic functionality and parallel input signals | 11 289 | 9,705
BY_C=1, CC=1 | 441 | 0,379
CC=0 | 277 | 0,238
BY_C=0 | 277 | 0,238
Connect the clock enable inputs of both CLBs | 2 145 | 1,844
Disconnect all the auxiliary relocation circuit signals | 2 217 | 1,906
Place the CLB outputs in parallel | 4 129 | 3,550
Disconnect the original CLB outputs | 1 333 | 1,146
Disconnect the original CLB inputs and set up the test configuration | 18 392 | 15,813
Total | 40 500 | 34,820
Synchronous circuits with free-running clock and combinational circuits [without the replication aid circuit]
Step | # of bytes | Time (ms), 20 MHz TCK
Copy the internal logic functionality and place the input signals in parallel | 12 163 | 10,457
Place the CLB outputs in parallel | 3 993 | 3,433
Disconnect the original CLB outputs | 1 073 | 0,923
Disconnect the original CLB inputs and set up the test configuration | 18 392 | 15,813
Total | 35 621 | 30,625
Worst-case fault detection latency (XCV200)
File size and reconfiguration time of the test configurations
Configuration | # of bytes | Time (ms), 20 MHz TCK
2nd | 3 115 | 2,678
3rd | 623 | 0,536
4th | 634 | 0,545
5th | 613 | 0,527
6th | 512 | 0,440
Total | 5 497 | 4,726
Shifting time for test vector application
# of test vectors | Length (bits) | Total (bits) | Time (ms), 20 MHz TCK
40 | 13 | 520 | 0,066
Shifting time for the test vector responses from a CLB under test
# of cells of the BS register in a XCV200 | # of test vectors | Time (ms), 20 MHz TCK
1 022 | 40 | 4,088
Mean time for the test of a 1176-CLB matrix
Occupation type: 25% synchronous, 50% combinational, 25% empty
43 679,188 ms @ TCK = 20 MHz
26 472,235 ms @ TCK = 33 MHz
The mean time to test the full CLB matrix is also the worst-case fault detection latency
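The two full-matrix figures are consistent with the test time scaling inversely with the TCK frequency. The sketch below assumes that simple proportionality (the slide does not state it explicitly) and uses it to rescale the 20 MHz figure to 33 MHz and to derive the mean time spent per CLB. The function and constant names are illustrative.

```python
def rescale_time_ms(time_ms: float, tck_from_mhz: float, tck_to_mhz: float) -> float:
    """Rescale a TCK-bound duration, assuming time is inversely proportional to TCK."""
    return time_ms * tck_from_mhz / tck_to_mhz


FULL_MATRIX_MS_AT_20MHZ = 43679.188  # from the slide (decimal comma as point)
N_CLBS = 1176                        # CLB matrix size quoted on the slide

at_33mhz = rescale_time_ms(FULL_MATRIX_MS_AT_20MHZ, 20.0, 33.0)
per_clb_ms = FULL_MATRIX_MS_AT_20MHZ / N_CLBS

print(round(at_33mhz, 3))    # compare with the 33 MHz figure on the slide
print(round(per_clb_ms, 3))  # mean time per CLB at 20 MHz, in ms
```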
Defragmentation
• The importance of floor planning
• Why (de)fragmentation?
• Can concurrent replication help?
Availability vs. floor planning performance
• Good dynamic floor planning management may enable the implementation of applications that in total would require more than 100% of the FPGA resources
[Figure: available resource space versus time, starting from the initial configuration, for the applications running in the FPGA: Appl. A (Functions A1, A2), Appl. B (Functions B1, B2) and Appl. C (Functions C1 to C4); legend: rt - reconfiguration interval, arrows - data transfer between different functions]
Fragmentation: Why?
• The absence of faults does not guarantee acceptable availability, namely when function swapping / partial reconfiguration occurs frequently
• Insufficient contiguous resources will delay incoming functions
[Figure: resource allocation (2-D spatial, x and y axes) across reconfigurations over time (temporal dimension): initial config., 1st partial reconfig., 2nd partial reconfig., ..., nth partial reconfig.]
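A toy example may help make the contiguity problem concrete. The sketch below is illustrative only: the 4 × 4 occupancy maps and the helper `fits` are hypothetical and do not come from the talk. Both maps have the same number of free CLBs, yet an incoming function needing a 2 × 2 block can be placed only in the compacted one.

```python
from typing import List


def fits(free: List[List[bool]], height: int, width: int) -> bool:
    """Return True if some height x width rectangle of free CLBs exists."""
    rows, cols = len(free), len(free[0])
    for r in range(rows - height + 1):
        for c in range(cols - width + 1):
            if all(free[r + dr][c + dc]
                   for dr in range(height) for dc in range(width)):
                return True
    return False


# Hypothetical occupancy maps: True = free CLB, False = occupied CLB.
fragmented = [
    [True, False, True, False],
    [False, True, False, True],
    [True, False, True, False],
    [False, True, False, True],
]
compacted = [
    [False, False, False, False],
    [False, False, False, False],
    [True, True, True, True],
    [True, True, True, True],
]

free_fragmented = sum(cell for row in fragmented for cell in row)
free_compacted = sum(cell for row in compacted for cell in row)

print(free_fragmented, fits(fragmented, 2, 2))
print(free_compacted, fits(compacted, 2, 2))
```

A brute-force scan is enough for illustration; a real floor planner would need a more efficient search and would also have to account for routing.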
Can concurrent replication help?
• Concurrent replication of active CLBs may be used to defragment the FPGA and minimise the implementation delay to incoming functions
– Defragmentation is performed concurrently with all running functions (no need to halt their execution)
– Coherency of the register contents is guaranteed, preserving all state information
Conclusion
• Summary
• Research topics
Summary
• Concurrent replication offers a powerful and non-intrusive solution to improve reliability and availability of reconfigurable hardware
• Paralleling CLB inputs and outputs doesn’t create any problem
• Boundary-scan provides a valuable contribution to implement an on-line concurrent structural test strategy
Research topics
• Concurrent replication of active CLBs offers a powerful tool for defragmentation purposes, but the higher-level strategy is still missing
• All aspects of the proposed solutions were validated in practice (lab experimentation), but a software tool to fully automate the reconfiguration process is still missing
Thanks for your attention!