144_c4 / mapld04swift and roosta1 tradeoffs in flight-design upset mitigation in state-of-the-art...
TRANSCRIPT
Swift and Roosta 1 144_C4 / MAPLD04
Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAs
Hardened By Designvs.
Design-Level Hardening
Gary M. Swift and Ramin RoostaJet Propulsion Laboratory / California Institute of Technology
The research done in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration (NASA) and was partially sponsored by the NASA Electronic Parts and Packaging Program.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.
Swift and Roosta 2 144_C4 / MAPLD04
In the beginning was Actel …
• Leveraging from a commercial product line ONO anti-fuse based one-time programmable
(OTP)
• “beginning” = 1993 Reference:
Katz, R.; Barto, R.; McKerracher, P.; Carkhuff, B.; Koga, R.; “SEU hardening of field programmable gate arrays (FPGAs) for space applications and device characterization,” IEEE Transactions on Nuclear Science, Dec. 1994
Swift and Roosta 3 144_C4 / MAPLD04
Later, Xilinx
Leveraging from a commercial product line SRAM based reconfigurable
“later” = 1998 Reference: Guertin, S.M.; Swift, G.M.; Nguyen, D.; “Single-event
upset test results for the Xilinx XQ1701L PROM”, Radiation Effects Data Workshop Record, 1999
Quote:(Xilinx SRAM-based FPGAs)… “do appear suited to a broad
range of other (non-critical) applications, such as sensor and camera controllers.”
Swift and Roosta 4 144_C4 / MAPLD04
OUTLINE
• FPGAs: A key enabling technology for modern spacecraft
• Background in radiation testing of FPGAs▫ Earlier, Katz/Swift collaboration▫ Recently, Xilinx Consortium
• Feature Comparison
• Triple Modular Redundancy (TMR) - hardware approach vs. software approach
• Concluding Remarks
Swift and Roosta 5 144_C4 / MAPLD04
FPGAs: A key “enabling technology”
Like custom ASICs, FPGAs can replace whole boards Saving mass, volume, power Achieving extra functionality
FPGAs are much cheaper than ASICs Design efforts can be later in the schedule Design mistakes don’t require a re-spin through the
foundry
Swift and Roosta 6 144_C4 / MAPLD04
MER Pyro-Controller
Used self-checking of configuration to initiate a re-configuration after spotting an upset
Swift and Roosta 7 144_C4 / MAPLD04
MER Pyro-Controller
Nearing Mars Xilinx XQR4062XL
0
5
10
15
20
25
30
0 50 100 150Days after Launch
# of
Ups
ets
predicted
MER-A
MER-B Nov. 23MER-B
Nov. 23MER-A
Oct. 28MER-A
Oct. 28MER-B
Swift and Roosta 8 144_C4 / MAPLD04
My Background
• Actel experience is older No direct involvement in radiation tests since the
ONO anti-fuse was replaced Results here are from others’ work
• Xilinx experience is recent Active participant in Xilinx Rad Test Consortium Currently, finishing two+ year test campaign
targeting the Virtex II family
Swift and Roosta 9 144_C4 / MAPLD04
Currently Available Devices
Actel RT54SX-S family vs. Xilinx Virtex II family
(-SU)
Note: both are essentially immune to single-event latchup
and have good total ionizing dose tolerance,
[ Actel > 135 krad(Si); Xilinx > 200 krad(Si) ]
Swift and Roosta 10 144_C4 / MAPLD04
Main Feature Comparison
Actel Xilinx
RT54SX72S XQR2V6000
Gates: 72,000 ~6M ( /~3.2 )
flip/flops: 2012 67,584 / 3.2 = ~20k
I/O Pins: 360 824 / 3 = 274
Speed external : 230 MHz 622 Mb/s (I-mode LVDS)
Speed internal : 310 MHz 360 MHz
Swift and Roosta 11 144_C4 / MAPLD04
Extra Features Comparison
Actel Xilinx
RT54SX72S XQR2V6000
Block RAM: no 2.5 Mb
I/O Standards: many many
Others: hardwired TMR Clock Manager
Multipliers
Swift and Roosta 12 144_C4 / MAPLD04
Actel: What bits can upset?
User flip-flops only Direct hits of same flip/flop in multiple domains
▫ Very unlikely due to layout
Clock domain hits
SEFI modes essentially eliminated
Swift and Roosta 13 144_C4 / MAPLD04
Xilinx: What bits can upset?
• Configuration Bits Logical
Function Routing User Options
• Block RAM
• User Flip-flops
• Control Registers
× Type of I/O× Mode of Block RAM Access× Clock Manager× etc…
× NAND× Ex-OR× Flip-Flop type× etc…
Swift and Roosta 14 144_C4 / MAPLD04
Xilinx: Heavy Ion Test Results
Resulting in fairly low in-space rates:
~6 per day for 2V6000 in GCRmin.
1.E-11
1.E-10
1.E-09
1.E-08
1.E-07
0 10 20 30 40 50 60 70
LET (MeV-cm2/mg)
Cro
ss S
ect
ion
pe
r B
it (c
m2 )
X-2V1000 configuration bitsWeibull Curve Fit
Low Threshold(soft)
Low Susceptibility(hard)
Swift and Roosta 15 144_C4 / MAPLD04
Actel: Heavy Ion Test Results
Very low in-space rates (assume LETth > 40 achieved):
~1 per 6800 years for SX72-S in GCRmin.
1.E-10
1.E-09
0 20 40 60 80 100 120
LET (MeV-cm2/mg)
Cro
ss S
ecti
on
(cm
2 )
307
315
10-9
10-101.E-10
1.E-09
0 20 40 60 80 100 120
LET (MeV-cm2/mg)
Cro
ss S
ecti
on
(cm
2 )
307
315
10-9
10-10
Where’s Threshold???
Low Susceptibility(~100x harder)
Data for twoRTAX2000Sprototypesat 1 MHz usingcheckerboardpattern
from Fig. 12,JJ Wang et al., NSREC 2003[Ref. 1]
Swift and Roosta 16 144_C4 / MAPLD04
Actel-style TMR
SX-A “R” cell
triplicates to:
RTSX-S
“R” cell
Swift and Roosta 17 144_C4 / MAPLD04
Actel-style TMR
Actel-style TMR is fairly straightforward:
Each flip-flop is replaced by three plus feedback voter
Triplicated elements spread out physically
Uses one clock/inverse-clock domain
No external parts needed
Swift and Roosta 18 144_C4 / MAPLD04
Xilinx-style TMR
Xilinx-style TMR is more complicated: First, it’s not too useful without
configuration scrubbing Whole functional blocks are triplicated,
not individual flip-flops Three voters are used Three clock domains Elimination of:
▫ Weak keepers (aka half latches)
▫ Use of configuration cells as part of the design- For example, SRL16
Needs some external circuitry
(at least, a watchdog timer + PROMs)
Swift and Roosta 19 144_C4 / MAPLD04
Xilinx-style TMR
Swift and Roosta 20 144_C4 / MAPLD04
Xilinx-style TMR
In Xilinx-style TMR, I/O’s use three pins tied externally :
Minority Voter
P
Minority Voter
P
Minority Voter
P
D0
D1
D2
D
Pins
Board Traces
Swift and Roosta 21 144_C4 / MAPLD04
Xilinx TMRtool
• Xilinx-style TMR done by hand is difficult and tedious
• An automated tool which integrates into the design flow has been developed (“now” available)
• In-beam testing shows tool is very effective
Design Entry
NGC
XILINX ImplementationTranslate, Map, Floorplan, Par,
BitGen
XTMR
EDIF
NCD
BIT
XILINX
Back-AnnotationTiming, ncd2edif,
ncd2vhdl, ncd2verilog
Simulation
FPGA
NGO
EDIF TMR
Design Entry
NGCNGC
XILINX ImplementationTranslate, Map, Floorplan, Par,
BitGen
XTMRXTMR
EDIFEDIF
NCD
BIT
XILINX
Back-AnnotationTiming, ncd2edif,
ncd2vhdl, ncd2verilog
Simulation
FPGA
NGONGO
EDIF TMR
Swift and Roosta 22 144_C4 / MAPLD04
Upset Comparison• ATMR now has eliminated:
Upsets of static storage elements, and SEFIs
• ATMR upsets from: Transients that are clocked into storage Clock tree hits
• Xilinx FPGAs have a small susceptibility to two types of SEFIs Reset (sometimes only partial) Disable scrub port
• XTMR in combination with scrubbing can lower system upset rates below the SEFI rate
Swift and Roosta 23 144_C4 / MAPLD04
Rate Comparison
GCR = Galactic Cosmic Ray background (interplanetary space)
almost identical to geosynchronous orbit
• Actel• Dominated by transients
• Roughly one system error per thousand years (GCRmin)
• Xilinx• Dominated by SEFI rate
• Expect one SEFI per ~65 years in GCRmin
• Expect one system error ~5-20x less often
Swift and Roosta 24 144_C4 / MAPLD04
CONCLUSIONSFor the present –
Both can achieve very acceptable radiation tolerance
Actel wins on:▫ Less burden on the designer▫ No auxiliary components▫ Lower SEFI susceptibility
Xilinx wins on:▫ Designer control of the resources vs. hardness tradeoff▫ On-chip feature set▫ Re-configurability
Competition is good.
Swift and Roosta 25 144_C4 / MAPLD04
AcronymsFPGA - Field Programmable Gate Array ASIC - Application Specific Integrated CircuitSEU - Single Event UpsetSEFI - Single Event Functionality InterruptTMR - Triple Modular RedundancyATMR - Actel-style TMRXTMR - Xilinx-style TMRLET - Linear Energy Transfer (proportional to deposited
charge per micron for a heavy ion strike on an active node)
GCRmin - Galactic Cosmic Ray background (highest during “solar minimum” period of ~11-yr cycle of
sunspots)MER - Mars Exploration Rovers
(i.e., Spirit and Opportunity)
Swift and Roosta 26 144_C4 / MAPLD04
Additional References
[1] J.J. Wang, W. Wong, S. Wolday, B. Cronquist, J. McCollum, R. Katz, I. Kleyner, “Single event upset and hardening in 0.15 antifuse-based field programmable gate array,” IEEE Transactions on Nuclear Science, Dec. 2003
[2] Jih-Jong Wang, R.B. Katz, F. Dhaoui, J.L. McCollum, W. Wong, B.E. Cronquist, R.T. Lambertson, E. Hamdy, I. Kleyner, W. Parker, “Clock buffer circuit soft errors in antifuse-based field programmable gate arrays,” IEEE Transactions on Nuclear Science, Dec. 2000
[3] R. Katz, J.J. Wang, R. Koga, K.A. LaBel, J. McCollum, R. Brown, R.A. Reed, B. Cronquist, S. Crain, T. Scott, W. Paolini, B. Sin, “Current radiation issues for programmable elements and devices,” IEEE Transactions on Nuclear Science, Dec. 1998