Wang-110 D/MAPLD 2004
SEU Mitigation Techniques for Xilinx Virtex-II Pro FPGA
Mandy M. Wang
JPL R&TD Mobility Avionics
Agenda
Project Background
SEU Sensitive Areas and Mitigation Approaches
Design Details
Conclusion
Project Objective
The Mobility Avionics project aims to develop a scalable, configurable embedded platform for space flight instruments and systems that can withstand low- to medium-radiation environments.
Multi-Tiered Strategy
Figure: target applications arranged by mission criticality (not mission critical to mission critical) and time criticality (not time critical to time critical), covered by three tiers: Simple Strategy, Robust Strategy, and Always Available Strategy. Applications shown: EDL Controller, Micro-Mobility Controller, Science Data Processor, Image Processor, Orbiter Command Data Handler, Motor Control, and Ground Support Equipment. Low to medium radiation tolerance is assumed.
Strategies
Simple Strategy: A quick-and-dirty approach. It relies on less desirable techniques, such as device reset and reconfiguration, for error correction. It may require an external computer to check the configuration.
Robust Strategy: A refinement of the simple strategy. It uses an SEU-immune FPGA to monitor a system board based on a Xilinx FPGA. As a result, no external computer is needed.
SEU Sensitive Areas
Chart: normalized data based on predicted upset rates (XC2VP20)
PPC L1 Cache: 10.8 (64%)
Configuration Memory: 3.61 (22%)
Block SelectRAM: 1.78 (11%)
Registers: 0.46 (3%)
Xilinx Virtex-II Pro SEU-sensitive areas include:
PPC405 core registers
Configuration memory (LUT equations and routing)
Datapath registers
User memory (block or distributed RAMs)
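The chart's percentages are each region's share of the total predicted upset rate. A minimal Python sketch of that arithmetic, using only the four normalized rates from the chart; computed directly, the shares come to about 64.9%, 21.7%, 10.7%, and 2.8%, which the chart's whole-number labels approximate.

```python
# Normalized predicted upset rates for the XC2VP20, as read from the chart.
RATES = {
    "PPC L1 cache": 10.8,
    "Configuration memory": 3.61,
    "Block SelectRAM": 1.78,
    "Registers": 0.46,
}


def shares(rates):
    """Return each region's fraction of the total predicted upset rate."""
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


if __name__ == "__main__":
    # Print regions from most to least sensitive.
    for name, frac in sorted(shares(RATES).items(), key=lambda kv: -kv[1]):
        print(f"{name:22s} {100 * frac:5.1f}%")
```

The ranking is what drives the design: the processor side dominates, followed by configuration memory, which is why both get dedicated detection hardware.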
Mitigation Approaches
Region | Detection | Indicator | Mitigation | Fault Injection
Processor Registers | Processor comparison at the CoreConnect bus | Internal FF | Processor reset | Serial port
User Memory | EDAC | Internal FF | EDAC | Serial port
Configuration Memory | CRC | (None) | FPGA reconfiguration | Serial port
Data Registers | TMR | (None) | TMR | (None)
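For data registers, TMR serves as both detection and mitigation: three copies are kept and a 2-of-3 majority voter masks a single upset, which is why no separate indicator is listed. A minimal Python model of a triplicated register with a bitwise voter; `TmrRegister`, `upset`, and `scrub` are hypothetical names for illustration, and in the actual design TMR is applied in FPGA logic, not software.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant register copies."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """Hypothetical software model of a triplicated data register."""

    def __init__(self, value: int = 0):
        self.copies: List[int] = [value, value, value]

    def write(self, value: int) -> None:
        self.copies = [value, value, value]

    def read(self) -> int:
        # The voter output is what downstream logic sees.
        return majority(*self.copies)

    def scrub(self) -> None:
        # Rewrite all three copies with the voted value, clearing a masked upset.
        self.write(self.read())

    def upset(self, copy: int, bit: int) -> None:
        """Flip one bit in one copy, emulating a single event upset."""
        self.copies[copy] ^= 1 << bit
```

A single upset is masked on read; two upsets in the same bit position of different copies defeat the vote, so scrubbing (or voting in the feedback path) keeps errors from accumulating.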
System Design - Overview
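Figure: system block diagram. Components: two PPC405 cores (PPC405 1 and PPC405 2) with a comparator (C); PLB arbiter; PLB-to-OPB bridge; OPB arbiter; critical and non-critical interrupt controllers; DDR SDRAM controller with external memory (128 MB); UARTs (external devices); OCM BRAM (8 KB) with EDC; status BRAMs (4 KB); PLB BRAMs holding firmware (32 KB) with an EDC controller; and a serial port decoder that injects fault signals at the fault-injection (FI) points.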
Dual-processor Comparator
Figure: PPC405 Block 1 and PPC405 Block 2 (each with cache units, MMU, CPU, and timers and debug) share the PLB through the arbiter, and a comparator (C) monitors both processors' PLB master signals. The PLB IPIF connects to the DDR SDRAM controller and the external SDRAM in the off-chip area.
Note: Yellow lines: PLB master read/write signals for the D-cache. Green lines: PLB master read signals for the I-cache.
FI: fault insertion point
PC: parity check
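The comparator detects a processor-register upset when the two lockstepped PPC405s stop producing identical bus activity. A minimal cycle-level Python model, assuming the comparator sees a snapshot of each processor's PLB master outputs every clock; `PlbMasterOut` and its four fields are hypothetical simplifications, not the real PLB signal set.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class PlbMasterOut:
    """Hypothetical snapshot of one PPC405's PLB master outputs in one clock cycle."""
    request: bool
    rnw: bool      # read-not-write
    addr: int
    wr_data: int


def first_mismatch(trace: Iterable[Tuple[PlbMasterOut, PlbMasterOut]]) -> Optional[int]:
    """Return the first cycle at which the two processors disagree, or None."""
    for cycle, (cpu1, cpu2) in enumerate(trace):
        if cpu1 != cpu2:  # dataclass equality compares every field
            return cycle
    return None
```

Comparing at the bus rather than inside the cores means the check needs no access to internal CPU state; the cost is latency, since an upset is only seen once it reaches the bus. A detected mismatch maps to a CPU reset in the mitigation state machine.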
Dual-Processor Voting Simulation
EDAC OCM BRAMs (Read/Write)
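Hamming code (39,32): 32 data bits protected by 7 check bits
Read-modify-write supports the byte-enable feature
Error information is stored in a separate memory space
A single-bit error triggers a CPU interrupt
A double-bit error triggers a CPU reset
Figure: both PPC405 cores (#1 and #2) reach 8 KB of BRAMs through glue logic. The parity encoder (ENCIN, ENOUT, PARITY_OUT) generates 7 check bits for each 32-bit write; on reads, the error detection and correction block (DECIN, PARITY_IN, DECOUT, ERROR, FORCE ERROR) checks and corrects the data, and the parity bits are discarded on data out. BRAM signals: ADDR, EN, W_EN[3:0], CLK.
Reference: Xilinx XAPP645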
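The 7 check bits per 32-bit word are what an extended Hamming (SEC-DED) code needs: 6 Hamming bits locate a single error and one overall parity bit distinguishes single from double errors. A minimal Python sketch of such a code, assuming the textbook bit layout (check bits at power-of-two positions, overall parity at bit 0); the bit ordering in Xilinx XAPP645 may differ, and the function names are illustrative.

```python
from typing import Tuple


def check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1: Hamming check bits needed for k data bits."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def encode(data: int, k: int = 32) -> int:
    """SEC-DED codeword: bit 0 holds overall parity, bits 1..k+r hold the Hamming code."""
    r = check_bits(k)
    n = k + r
    code, d = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):                   # not a power of two: data position
            code |= ((data >> d) & 1) << pos
            d += 1
    for i in range(r):                        # check bit at position 2**i
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= (code >> pos) & 1
        code |= parity << p
    return code | (bin(code).count("1") & 1)  # make the total weight even


def decode(code: int, k: int = 32) -> Tuple[int, str]:
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    r = check_bits(k)
    n = k + r
    syndrome = 0
    for pos in range(1, n + 1):
        if (code >> pos) & 1:
            syndrome ^= pos
    odd = bin(code).count("1") & 1
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= n:
        code ^= 1 << syndrome                 # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:
        status = "uncorrectable"              # e.g. a double-bit error
    data, d = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= ((code >> pos) & 1) << d
            d += 1
    return data, status


# Response for the OCM BRAMs as listed on the slide.
OCM_ACTION = {"ok": None, "corrected": "CPU interrupt", "uncorrectable": "CPU reset"}
```

For k = 32 this gives 6 Hamming bits plus the overall parity bit, a 39-bit codeword, matching the 32 + 7 widths in the datapath.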
EDAC PLB BRAMs (Read Only)
Hamming code (72,64): 64 data bits protected by 8 check bits
Read-modify-write supports the byte-enable feature
Single-bit error information is stored in a separate memory space
A single-bit error triggers a CPU interrupt
A double-bit error triggers a device reconfiguration
Figure: the PLB interface connects through the PLB BRAM controller and glue logic on the Processor Local Bus to 32 KB + 8 KB of BRAMs. The parity encoder (ENCIN, ENOUT, PARITY_OUT) generates 8 check bits for each 64-bit word; the error detection and correction block (DECIN, PARITY_IN, DECOUT, ERROR, FORCE ERROR) checks reads, and the parity bits are discarded on data out. BRAM signals: ADDR, EN, W_EN, CLK.
Reference: Xilinx XAPP645
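Because the check bits cover a whole 64-bit word, a write that enables only some bytes cannot just store those bytes: the controller must read the word, correct it, merge the new bytes, and re-encode. A minimal Python sketch of that read-modify-write path over a (72,64) SEC-DED code, assuming a textbook bit layout, that byte-enable bit b selects byte b counting from the least significant byte (PLB byte lanes may be ordered differently), and hypothetical names such as `EdacMemory` and `error_log`.

```python
from typing import Dict, List, Tuple

K, R = 64, 7   # 64 data bits + 7 Hamming check bits + 1 overall parity bit = 72
N = K + R
DATA_POS = [p for p in range(1, N + 1) if p & (p - 1)]  # non-power-of-two positions


def encode(data: int) -> int:
    code = 0
    for d, pos in enumerate(DATA_POS):
        code |= ((data >> d) & 1) << pos
    for i in range(R):                        # check bit at position 2**i
        p = 1 << i
        parity = 0
        for pos in range(1, N + 1):
            if pos & p:
                parity ^= (code >> pos) & 1
        code |= parity << p
    return code | (bin(code).count("1") & 1)  # overall parity at bit 0


def decode(code: int) -> Tuple[int, str]:
    syndrome = 0
    for pos in range(1, N + 1):
        if (code >> pos) & 1:
            syndrome ^= pos
    odd = bin(code).count("1") & 1
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= N:
        code ^= 1 << syndrome
        status = "corrected"
    else:
        status = "uncorrectable"
    data = 0
    for d, pos in enumerate(DATA_POS):
        data |= ((code >> pos) & 1) << d
    return data, status


class EdacMemory:
    """Hypothetical model of a 64-bit-wide EDAC-protected memory."""

    def __init__(self):
        self.words: Dict[int, int] = {}
        self.error_log: List[Tuple[int, str]] = []  # stands in for the error-information memory

    def read(self, addr: int) -> int:
        data, status = decode(self.words.get(addr, encode(0)))
        if status != "ok":
            self.error_log.append((addr, status))
        if status == "uncorrectable":
            raise RuntimeError("double-bit error: escalate to device reconfiguration")
        return data

    def write(self, addr: int, data: int, byte_en: int = 0xFF) -> None:
        if byte_en != 0xFF:                   # partial write: read-modify-write
            old = self.read(addr)
            mask = 0
            for b in range(8):
                if (byte_en >> b) & 1:
                    mask |= 0xFF << (8 * b)
            data = (old & ~mask) | (data & mask)
        self.words[addr] = encode(data)
```

Reading through `read` during the merge also means a single-bit upset in the old word is corrected before the new word is encoded, rather than being sealed in with fresh check bits.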
EDAC DDR SDRAM
Hamming code (72,64): 64 data bits protected by 8 check bits
Read-modify-write supports the byte-enable and 2-word burst features
Single-error information is stored in a separate memory space
A single error triggers a CPU interrupt
A double error triggers a device reconfiguration
Figure: the PLB interface modules connect through glue logic and the DDR SDRAM controller to DDR SDRAM (128 MB + 32 MB). A parity encoder (ENCIN, ENOUT, PARITY_OUT) and an error detection and correction block (DECIN, PARITY_IN, DECOUT, ERROR, FORCE ERROR) sit on the data path, with a mux and demux between them and the memory (clocks CLK and CLKn); the parity bits are discarded on data out. Bus widths shown: 64 and 32 bits of data, 8 and 4 check bits.
Reference: Xilinx XAPP645
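One reading of the widths in this datapath (64 data + 8 check bits on one side, 32 data + 4 check bits on the other) is that each 72-bit codeword travels as a 2-word DDR burst. A minimal Python sketch of that split and merge, assuming the low half is sent first and that each beat carries half the check bits; both are assumptions, not details confirmed by the slide.

```python
from typing import List, Tuple


def split_beats(data64: int, check8: int) -> List[Tuple[int, int]]:
    """Split a 64-bit word and its 8 check bits into two (32-bit data, 4-bit check) beats.

    Assumption: the low half of both fields goes out in the first beat.
    """
    return [
        (data64 & 0xFFFF_FFFF, check8 & 0xF),
        ((data64 >> 32) & 0xFFFF_FFFF, (check8 >> 4) & 0xF),
    ]


def merge_beats(beats: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Reassemble the 64-bit word and its 8 check bits from a 2-beat burst."""
    (lo, c_lo), (hi, c_hi) = beats
    return (hi << 32) | lo, (c_hi << 4) | c_lo
```

Since one codeword spans both beats, a write that changes only one beat or only some bytes still has to go through read-modify-write so the 8 check bits stay consistent with all 64 data bits.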
Self Configuration Checker
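Figure: on the Virtex-II Pro, an ICAP controller drives the ICAP using readback commands (44 bytes) and frame addresses held in a frame address memory (BRAMs, 4-byte entries); a CRC checker processes the readback data. The frame addresses come from the implementation flow: the digital design produces top.bit and top.ll (which contains the frame addresses used by the design), and a C script formats the frame address data for the BRAMs.
This portion can be ported to a radiation-hardened FPGA under the robust strategy.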
Self Configuration Checker: Design Highlights
No external I/O access required
Frame-by-frame readback required
32-bit CRC algorithm implemented (a CRC signature is generated after device power-up)
No SRL16s or distributed SelectRAMs are used in the design, since their contents change during operation and would not match the power-up signature
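The checker reduces the configuration to one 32-bit signature taken after power-up and compares later readbacks against it. A minimal Python sketch, assuming `zlib.crc32` (the IEEE 802.3 CRC-32) as a stand-in for the slide's unspecified 32-bit CRC, a hypothetical `read_frame` callable in place of an ICAP frame readback, and a list of frame addresses in place of those extracted from top.ll.

```python
import zlib
from typing import Callable, Iterable, List


def config_signature(frame_addrs: Iterable[int], read_frame: Callable[[int], bytes]) -> int:
    """Fold one CRC-32 over all configuration frames, read back one frame at a time."""
    crc = 0
    for fa in frame_addrs:
        crc = zlib.crc32(read_frame(fa), crc)
    return crc


class ConfigChecker:
    """Captures a golden signature at power-up, then re-checks it on demand."""

    def __init__(self, frame_addrs: Iterable[int], read_frame: Callable[[int], bytes]):
        self.frame_addrs: List[int] = list(frame_addrs)
        self.read_frame = read_frame
        self.golden = config_signature(self.frame_addrs, read_frame)

    def check(self) -> bool:
        """True if the current readback still matches the power-up signature."""
        return config_signature(self.frame_addrs, self.read_frame) == self.golden
```

Keeping only a running CRC means no copy of the bitstream has to be stored on chip, which is what lets the checker run without external I/O; the trade-off is that a failed check says only that something changed, not which frame.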
LabVIEW Fault Injection Panel
Screenshot of the fault injection emulator that interfaces with the prototype board. Callouts:
Fault injection error counters
Processor bus fault injection buttons
Processors mismatch LED indicator
Fault location map
ASCII command input window
Program counter resets to zero when a CPU reset occurs.
XC2VP20 Device Utilization (without TMR)
Number of External IOBs: 57 out of 564 (10%)
Number of PPC405s: 2 out of 2 (100%)
Number of RAMB16s: 30 out of 88 (34%)
Number of SLICEs: 4334 out of 9280 (46%)
Number of BUFGMUXs: 6 out of 16 (37%)
Number of DCMs: 2 out of 8 (25%)
Number of ICAPs: 1 out of 1 (100%)
Number of JTAGPPCs: 1 out of 1 (100%)
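The percentages in this report are whole-number utilization figures; for example, 4334 / 9280 is about 46.7% and is shown as 46%, so the values appear to be truncated rather than rounded. A minimal Python sketch that reproduces them with integer floor division, using only the counts listed above.

```python
# Resource counts from the XC2VP20 utilization report: (used, available).
USAGE = {
    "External IOBs": (57, 564),
    "PPC405s": (2, 2),
    "RAMB16s": (30, 88),
    "SLICEs": (4334, 9280),
    "BUFGMUXs": (6, 16),
    "DCMs": (2, 8),
    "ICAPs": (1, 1),
    "JTAGPPCs": (1, 1),
}


def percent_used(used: int, available: int) -> int:
    """Utilization as a whole percentage, truncated toward zero."""
    return 100 * used // available


if __name__ == "__main__":
    for name, (used, avail) in USAGE.items():
        print(f"{name:14s} {used:5d}/{avail:<5d} {percent_used(used, avail):3d}%  free: {avail - used}")
```

The free-slice column is the more useful number when judging whether full TMR would still fit in the XC2VP20.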
Slice Utilization (without TMR)
Chart: slice utilization (without TMR) on the VP20, VP40, and VP100 devices (axis 0 to 45,000 slices)
OPB Arbiter: 40 (2.0%)
OPB2DCR Bridge: 90 (4.5%)
PLB BRAM Controller: 163 (8.2%)
OCM BRAM Controller: 278 (14.0%)
Interrupt Controller: 341 (17.2%)
UART Transceiver: 368 (18.6%)
PLB2OPB Bridge: 700 (35.4%)
PLB Arbiter: 1005 (50.8%)

Fault Injection Module: 93 (5.4%)
Configuration Checker: 156 (9.0%)
Dual-processor Comparator: 178 (10.3%)
OCM EDAC 32-bit Module: 233 (13.4%)
PLB EDAC 64-bit Module: 467 (26.9%)
Hardware Status Memory Controller: 606 (35.0%)
Note: The shaded modules can be replaced by other approaches.
Mitigation State Machine
Mitigation severity increases from Normal through CPU Interrupt, CPU Reset, and System Reset to FPGA Reconfiguration:
Normal
CPU Interrupt: 1) OCM BRAM single-bit error 2) PLB BRAM single-bit error 3) DDR SDRAM single-bit error
CPU Reset: 1) CPU mismatch 2) CPU watchdog timer 3) OCM EDC double-bit error
System Reset: 1) OPB bus error 2) PLB bus error; also entered when CPU reset counter == full
FPGA Reconfiguration: 1) Configuration check fail 2) PLB EDC double-bit error 3) DDR SDRAM double-bit error; also entered when system reset counter == full
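The state machine maps each detected event to a minimum response and escalates when repeated resets fail to clear the problem. A minimal Python sketch, assuming hypothetical event names, counter limits of 3, and a policy of clearing the lower-level counters after each escalation; the slide specifies only the events, the severity order, and the "counter == full" escalation conditions.

```python
from enum import IntEnum


class Action(IntEnum):
    NORMAL = 0
    CPU_INTERRUPT = 1
    CPU_RESET = 2
    SYSTEM_RESET = 3
    FPGA_RECONFIGURATION = 4


# Event -> minimum response, following the severity levels on the slide.
EVENT_ACTION = {
    "ocm_bram_single_bit": Action.CPU_INTERRUPT,
    "plb_bram_single_bit": Action.CPU_INTERRUPT,
    "ddr_sdram_single_bit": Action.CPU_INTERRUPT,
    "cpu_mismatch": Action.CPU_RESET,
    "cpu_watchdog": Action.CPU_RESET,
    "ocm_edc_double_bit": Action.CPU_RESET,
    "opb_bus_error": Action.SYSTEM_RESET,
    "plb_bus_error": Action.SYSTEM_RESET,
    "config_check_fail": Action.FPGA_RECONFIGURATION,
    "plb_edc_double_bit": Action.FPGA_RECONFIGURATION,
    "ddr_sdram_double_bit": Action.FPGA_RECONFIGURATION,
}


class MitigationFsm:
    """Hypothetical model of the escalation logic; limits are assumptions."""

    def __init__(self, cpu_reset_limit: int = 3, system_reset_limit: int = 3):
        self.cpu_reset_limit = cpu_reset_limit
        self.system_reset_limit = system_reset_limit
        self.cpu_resets = 0
        self.system_resets = 0

    def handle(self, event: str) -> Action:
        action = EVENT_ACTION[event]
        if action == Action.CPU_RESET:
            self.cpu_resets += 1
            if self.cpu_resets >= self.cpu_reset_limit:        # CPU reset counter full
                action = Action.SYSTEM_RESET
        if action == Action.SYSTEM_RESET:
            self.cpu_resets = 0
            self.system_resets += 1
            if self.system_resets >= self.system_reset_limit:  # system reset counter full
                action = Action.FPGA_RECONFIGURATION
        if action == Action.FPGA_RECONFIGURATION:
            self.cpu_resets = 0
            self.system_resets = 0
        return action
```

Escalating on a counter rather than on every event keeps the cheap responses in use for isolated upsets, while a persistent fault (for example, a corrupted configuration bit behind repeated mismatches) still ends in a full reconfiguration.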
Conclusion
Identified and categorized the SEU-prone regions of the Virtex-II Pro into four types.
Developed a mitigation strategy for each region.
A radiation test of the overall system is in progress.
Acronyms
• SEU : Single Event Upset
• FPGA: Field Programmable Gate Array
• LUT: Look Up Table
• PLB: Processor Local Bus
• OPB: On-Chip Peripheral Bus
• OCM: On-Chip Memory
• EDAC: Error Detection And Correction
• ICAP: Internal Configuration Access Point