SEU Mitigation of a Soft Embedded Processor in the Virtex-II FPGAs
Sana Rezgui (1), Jeffrey George (2), Gary Swift (3), Kevin Somervill (4), Carl Carmichael (1) and Gregory Allen (3)
(1) Xilinx, Inc., San Jose, CA
(2) The Aerospace Corporation, El Segundo, CA
(3) Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA
(4) NASA Langley, Hampton, VA
For the North American Xilinx Test Consortium
Rezgui 2 MAPLD 2005/E238
Objective
• Use of embedded system applications built on S-FPGAs in radiation environments => SEU mitigation and design implementation
• Mitigated Design Performance
― Simplicity, flexibility and automation
― Area and timing performance
• Upset Sensitivity in Radiation Environment
― Characterization of the FPGA sensitivity in beam
― Evaluation of the proposed mitigation solution for the embedded design
Measure the in-beam performance of an upset mitigation technique applied to a complex design (a processor) implemented on an FPGA and running a computationally intensive benchmark program
Studied Case
Mitigation to SEUs of the Xilinx soft IP processor MicroBlaze by means of the Triple Modular Redundancy (TMR) technique
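[Figure: Virtex-II FPGA architecture showing the Configurable Logic Blocks (CLBs), Block RAM, 18-bit multipliers, programmable I/Os and Digital Clock Manager, with the MicroBlaze soft processor implemented in the fabric]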
MicroBlaze Mitigation
1. Use the TMR technique to mitigate the design against SEUs
• MicroBlaze designs consist of I/Os, Look-Up Tables (LUTs), Flip-Flops (FFs) and user memory elements
• For the TMR Tool (developed by Xilinx), MicroBlaze is no different from any other design
2. Run Active Readback and Continuous Scrubbing of all the static used resources for error detection and correction
• This is transparent to, and independent of, the running design
• User memory elements cannot be scrubbed from the configuration port
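The voting idea behind TMR can be illustrated in software. The following is a minimal Python sketch, not the TMR Tool's implementation (which inserts voters into the netlist); the function names and the 8-bit example word are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_domains(a: int, b: int, c: int) -> list[int]:
    """Indices (0, 1, 2) of the copies that differ from the voted value."""
    voted = majority_vote(a, b, c)
    return [i for i, word in enumerate((a, b, c)) if word != voted]


golden = 0xB2          # hypothetical correct value held by all three domains
upset = golden ^ 0x08  # hypothetical single-bit upset in one domain

# One upset domain is outvoted by the two healthy ones.
assert majority_vote(golden, upset, golden) == golden
assert disagreeing_domains(golden, upset, golden) == [1]

# The same bit upset in two domains defeats the voter.
assert majority_vote(upset, upset, golden) == upset
```

The last assertion shows why upsets have to be repaired before a second one lands in another domain, which is the role of the readback and scrubbing in item 2.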
Internal Architecture
User memory elements: SRL16s, Distributed Memory (LUT-RAM), BRAMs
• Active Readback causes problems with user memory elements (dynamic content)
• Static partial reconfiguration of a BRAM is not possible if it stores program data in addition to the code
[Figure: MicroBlaze internal architecture with the LUT-RAMs, SRL16s and BRAMs marked]
User Memory Mitigation
• Error Detection and Correction (EDAC)
― Additional decoding logic would be required
― Depends on the speed of detection and correction of upsets
• Replacement of the user memory elements by FFs and LUTs
― SRL16s are automatically replaced by FFs and LUTs by the TMR Tool
― Distributed RAMs (LUT-RAMs) are not set to be automatically replaced: a custom macro is then required for their replacement by FFs and LUTs
• Triple Modular Redundancy and Self-Correction of the BRAMs
― Done automatically through the TMR Tool by replacing each BRAM with a custom macro that scrubs the BRAM itself
• EDAC and TMR can be defeated by error accumulation
BRAM Mitigation Methodology
1. Apply TMR to the used BRAMs
2. Insert an internal scrub controller that refreshes the 3 BRAMs with their voted output value
• Mitigation requirement: only one BRAM port can be used by the MicroBlaze design
• Each Block RAM is replaced with the TMRed BRAMs and the internal BRAM scrub controller
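The two steps above can be modelled behaviourally. This is a rough Python sketch under stated assumptions, not the macro generated by the TMR Tool: the class `TmrBram`, its methods, and the one-address-per-step scrub order are all hypothetical. The scrubber stands in for the second BRAM port, which is why the design itself may use only one.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


class TmrBram:
    """Three redundant memory copies plus a scrubber that rewrites voted data."""

    def __init__(self, depth: int) -> None:
        self.copies = [[0] * depth for _ in range(3)]
        self.scrub_addr = 0

    def write(self, addr: int, value: int) -> None:
        for copy in self.copies:
            copy[addr] = value

    def read(self, addr: int) -> int:
        a, b, c = (copy[addr] for copy in self.copies)
        return vote(a, b, c)

    def scrub_step(self) -> None:
        """Rewrite one address in all copies with its voted value, then advance."""
        addr = self.scrub_addr
        voted = self.read(addr)
        for copy in self.copies:
            copy[addr] = voted
        self.scrub_addr = (addr + 1) % len(self.copies[0])

    def inject_upset(self, copy_index: int, addr: int, bit: int) -> None:
        """Flip one bit in one copy (test helper, models an SEU)."""
        self.copies[copy_index][addr] ^= 1 << bit


# With scrubbing between two upsets, the stored word survives.
scrubbed = TmrBram(depth=4)
scrubbed.write(2, 0x5A)
scrubbed.inject_upset(0, 2, 3)
for _ in range(4):          # one full pass over the memory
    scrubbed.scrub_step()
scrubbed.inject_upset(1, 2, 3)
assert scrubbed.read(2) == 0x5A

# Without scrubbing, the same two upsets accumulate and defeat the voter.
unscrubbed = TmrBram(depth=4)
unscrubbed.write(2, 0x5A)
unscrubbed.inject_upset(0, 2, 3)
unscrubbed.inject_upset(1, 2, 3)
assert unscrubbed.read(2) == 0x52
```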
EDK / TMR Tool Design Flow
[Figure: design flow in three stages. Design Entry (EDK/ISE): system design. XTMR Conversion (TMR Tool): LUT-RAM and BRAM macro replacement. Implementation (ISE): NGDBuild, MAP, PAR, BitGen / BitInit. Files shown in the flow: .ngc, .bmm, .elf, .edf, .ngo and a manually edited .ucf]
Implementation and Performance (1)
Virtex-II 6000 Used Internal Resources
[Figure: bar chart of the percentage of Virtex-II 6000 resources used (FFs, LUTs, GCLK, IOs, MULTs, BRAMs) for each design type: Single-String MicroBlaze, Mitigated MicroBlaze design with LUT-RAM, Mitigated MicroBlaze design without LUT-RAMs, and Full Mitigated Design]
Implementation and Performance (2)
Timing Performance and Core Voltage Current Consumption

Tested Design                                                           Maximum Frequency (MHz)   Current Consumption (A)
Single-string MicroBlaze (Phase 1)                                      77                        0.37
Mitigated MicroBlaze design before replacement of LUT-RAM (Phase 2)     66                        0.78
Mitigated MicroBlaze design after replacement of LUT-RAM (Phase 3)      66                        0.83
Full Mitigated Design (Phase 4)                                         66                        0.99
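To put the table in relative terms, the short Python sketch below computes the frequency loss and the current ratio of each phase against the single-string design. The numbers are copied from the table; the dictionary and function names are illustrative only.

```python
# (maximum frequency in MHz, core current in A), copied from the table above
PHASES = {
    "Phase 1": (77, 0.37),
    "Phase 2": (66, 0.78),
    "Phase 3": (66, 0.83),
    "Phase 4": (66, 0.99),
}


def relative_cost(phase: str, baseline: str = "Phase 1") -> tuple[float, float]:
    """Return (fractional frequency loss, current ratio) versus the baseline."""
    base_freq, base_current = PHASES[baseline]
    freq, current = PHASES[phase]
    return (base_freq - freq) / base_freq, current / base_current


for name in PHASES:
    loss, ratio = relative_cost(name)
    print(f"{name}: frequency loss {loss:.1%}, current x{ratio:.2f}")
```

For the full mitigated design this works out to a frequency loss of about 14% and a core current roughly 2.7 times that of the single-string design.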
Experimental Test Designs
Service FPGA: XC2V3000
1. Configuration Monitor
• DUT configuration
• Continuous alternating scrubbing and readback at a rate of 4 per second
• SEFI detection
2. Functional Monitor
• Sends input vectors to the DUT
• Detects errors based on the DUT outputs
• Records errors and exception occurrences
• Runs continuous handshaking with the DUT to ensure its full synchronization with external peripherals
DUT FPGA: XQR2V6000
MicroBlaze design running
• Integer-based FFT software
• 33 MHz MicroBlaze clock speed
• 0.25 MHz GPIO bus
Two mitigated design versions:
1. Without BRAM Scrubber
2. With BRAM Scrubber
DUT/Service FPGAs Communication
[Figure: block diagram of the link between the DUT (XQR2V6000, TMRed MicroBlaze with its GPIO bus) and the Service FPGA (XC2V3000, Functional Monitor and DUT Configuration Monitor). The functional interface bus carries signals triplicated across domains TR0, TR1 and TR2, with majority voters on the interface: 16-bit Data_In and Data_Out buses, Clk, Rst, DVld-In, DVld-Out, DVld-Exc-In and DVld-Exc-Out. The Functional Monitor handles handshaking, exception detection and data transfer. The DUT Configuration Monitor handles configuration, readback (SEU counting), scrubbing and SEFI detection through the SelectMAP port]
Experimental Setup
Tested at Crocker Nuclear Laboratory at UC Davis using a 63.3 MeV proton beam
[Figure: test setup showing the DUT and the Service FPGA]
Proton Beam Results (1)
• Error Classification
― Type 1: FFT program calculates an incorrect result
― Type 2: MicroBlaze communication sequence is wrong or stops (timeout)
― Type 3: An exception or interrupt is invoked
• Error Recovery Types
― The MicroBlaze recovers on the next iteration of the program
― The MicroBlaze recovers when the processor is reset
― The MicroBlaze recovers after scrubbing the FPGA logic
• Non-Recovery Types (Type -R)
― Runaway Resets: upsets in the MicroBlaze code (stored in the BRAM) in at least two domains
― Runaway Exceptions: illegal operation on the MicroBlaze detected by the exception handler (DUT/Service)
― Runaway Errors: illegal code in the FFT computation code
Proton Beam Results (2)

Proton-Induced Cross Sections of Design 1 at Various Fluxes (error cross sections in cm^2)

Run   Flux [p/cm^2/s]   CLB Upsets / Scrub Cycle   Fluence [p/cm^2]   Type 1           Type 1R           Type 2           Type 2R           Type 3
(1)   1.70 x 10^7       2 to 7                     1.00 x 10^11       7.00 x 10^-11    <1.00 x 10^-11    5.00 x 10^-11    <1.00 x 10^-11    <1.00 x 10^-11
(2)   1.70 x 10^8       15 to 30                   1.03 x 10^11       2.92 x 10^-10    9.74 x 10^-12     2.05 x 10^-10    6.82 x 10^-11     <9.70 x 10^-12
(3)   1.70 x 10^9       150 to 190                 4.86 x 10^10       1.07 x 10^-9     <2.05 x 10^-11    7.82 x 10^-10    1.65 x 10^-10     3.60 x 10^-11

Proton-Induced Cross Sections of Design 2 at Various Fluxes (error cross sections in cm^2)

Run   Flux [p/cm^2/s]   CLB Upsets / Scrub Cycle   Fluence [p/cm^2]   Type 1           Type 1R           Type 2           Type 2R           Type 3
(1)   1.94 x 10^7       2 to 7                     9.79 x 10^10       7.56 x 10^-10    2.04 x 10^-11     6.34 x 10^-10    1.43 x 10^-10     8.17 x 10^-11
(2)   3.87 x 10^7       4 to 15                    2.49 x 10^10       8.44 x 10^-10    <4.02 x 10^-11    6.03 x 10^-10    2.01 x 10^-10     1.61 x 10^-10
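The slides do not spell out how the cross sections were computed. The sketch below assumes the usual definition, observed events divided by fluence, and assumes that a "<" entry is the one-event upper limit (1 / fluence) for a run with no observed events. Both assumptions are inferences, checked only against the tabulated values; the function names are hypothetical, and the event counts returned by `implied_event_count` are back-calculated, not reported data.

```python
def cross_section(n_events: int, fluence: float) -> tuple[float, bool]:
    """Return (cross section in cm^2, is_upper_limit) for one run.

    Assumes sigma = events / fluence; with zero events the result is the
    one-event upper limit 1 / fluence.
    """
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    if n_events == 0:
        return 1.0 / fluence, True
    return n_events / fluence, False


def implied_event_count(sigma: float, fluence: float) -> float:
    """Back-calculate the number of events implied by a tabulated cross section."""
    return sigma * fluence


# First table, run (1): fluence 1.00e11 p/cm^2, Type 1 cross section 7.00e-11 cm^2
print(implied_event_count(7.00e-11, 1.00e11))

# First table, run (2): no Type 3 events would give an upper limit near 9.7e-12 cm^2
print(cross_section(0, 1.03e11))
```

Under these assumptions the tabulated Type 1 and Type 2 values of the first run correspond to about 7 and 5 events, so the low-flux figures rest on small counts and carry a large statistical uncertainty.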
Conclusion
• A complete solution to mitigate an embedded processor implemented on a Xilinx Virtex-II FPGA, based on:
― Continuous external configuration scrubbing,
― Functional-block design triplication,
― Independent internal BRAM scrubbing (also triplicated).
• High area and power dissipation penalties after replacement of the distributed RAMs
• At low flux: very low error cross-section (1.2 x 10^-10 cm^2)
• The error cross-section increases rapidly with increasing flux
• For the space environment, it is predicted that the error rate of a MicroBlaze design should be lower than the SEFI rate, which demonstrates the high efficacy of this solution
Learned Lessons
• Check whether your design includes SRL16s or distributed RAMs; they must be replaced to allow active scrubbing
• Do the SMOKE test: break one domain and ensure that the design is still running
• Reduce the flux to respect the first rule of the TMR mitigation technique (1 upset / scrub cycle)
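The last rule can be made quantitative with a rough Python sketch. It rests on two assumptions not stated in the slides: upsets arrive as a Poisson process, and the mean number of upsets per scrub cycle scales linearly with flux. The function names are hypothetical. More than one upset in a cycle does not by itself defeat TMR (the upsets must affect the same logic in different domains), so the probability below is a conservative indicator rather than an error rate.

```python
import math


def prob_multiple_upsets(mean_upsets: float) -> float:
    """P(N >= 2) in one scrub cycle for a Poisson-distributed upset count."""
    return 1.0 - math.exp(-mean_upsets) * (1.0 + mean_upsets)


def flux_for_target(flux: float, upsets_per_cycle: float, target: float = 1.0) -> float:
    """Scale a measured flux linearly to reach a target mean upsets per scrub cycle."""
    return flux * target / upsets_per_cycle


# Chance of two or more upsets in a cycle when the mean is exactly one
print(f"{prob_multiple_upsets(1.0):.3f}")

# A run at 1.70e7 p/cm^2/s showed 2 to 7 CLB upsets per scrub cycle
print(flux_for_target(1.70e7, 2.0), flux_for_target(1.70e7, 7.0))
```

Even at a mean of one upset per scrub cycle, a Poisson model leaves roughly a one-in-four chance of two or more upsets in a given cycle, which is one way to read why the measured cross sections keep growing with flux.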