Improving FPGA Design Robustness with Partial TMR

Brian Pratt (1,2)
Michael Caffrey, Paul Graham (2)
Eric Johnson, Keith Morgan, Michael Wirthlin (1)

1 Brigham Young University, Department of Electrical Engineering
2 Los Alamos National Laboratory

MAPLD 2005/202


Motivation for Partial TMR

• Factors of fault-tolerant computing:
  – Availability
  – Reliability
  – Mitigation cost
• Full TMR
  – Expensive in terms of power, speed, area, etc.
  – Worthwhile if affordable!

[Figure: MTBF versus area cost, with a reliability constraint and an area constraint marked.]

Motivation for Partial TMR

• Partial TMR offers:
  – Mitigation of the most sensitive design structures
  – Increased availability of a system by decreasing the number of system resets
  – Decreased mitigation cost compared with full TMR
• Suitability of partial TMR is application dependent
  – Reduced reliability compared to full TMR

Scrubbing

• Must be included with partial mitigation
• Continuously ‘read’ and ‘clean’ configuration memory
• A single bit will be upset for no longer than t_s, where t_s = time for one scrub

[Figure: configuration memory bit pattern.]
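The bound on upset duration can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the scrubber used in the talk: it assumes an idealized scrubber that revisits each configuration bit exactly once per scrub period and repairs it instantly. The names `upset_duration`, `scrub_period`, and `bit_offset` are hypothetical.

```python
import random

def upset_duration(upset_time, scrub_period, bit_offset):
    """Time a flipped bit stays wrong under an idealized periodic scrubber.

    Assumption: the scrubber revisits this bit at times
    bit_offset + k * scrub_period (k = 0, 1, 2, ...) and repairs it instantly.
    """
    # Time elapsed since the most recent visit to this bit.
    since_last_visit = (upset_time - bit_offset) % scrub_period
    # The bit stays upset until the next visit.
    return scrub_period - since_last_visit

random.seed(0)
T_S = 1.0  # hypothetical scrub period (arbitrary time units)
durations = [
    upset_duration(random.uniform(0, 100), T_S, random.uniform(0, T_S))
    for _ in range(10_000)
]
print(max(durations) <= T_S)  # True: never longer than one scrub period
```

Under these assumptions the upset lifetime depends only on where in the scrub cycle the upset lands, so a shorter scrub period directly shortens the worst-case exposure.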


Non-Persistent Errors

• An SEU in the non-persistent cross-section will cause a temporary interruption of service

• Requires partial reconfiguration to correct

[Figure: error magnitude versus time (cycles), annotated “Scrubbing Repairs Configuration” and “Correct Output”.]

error = delta between the outputs of a golden circuit and the DUT circuit

Persistent Errors

• An SEU in the persistent cross-section will cause a permanent interruption of service

• Requires full system reset to correct

[Figure: error magnitude versus time (cycles), annotated “Scrubbing Repairs Configuration” and “Incorrect Output”.]

error = delta between the outputs of a golden circuit and the DUT circuit

Non-Persistent Circuit Structures

• Generally consist of circuit components and routing in a feed-forward path

[Figure: block diagram of Logic blocks and flip-flops (FF).]

Persistent Circuit Structures

• Generally consist of circuit components and routing in, or contributing to, a feed-back path

[Figure: block diagram of Logic blocks and flip-flops (FF).]
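The difference between the two kinds of structure can be shown with a toy simulation. This is a sketch under stated assumptions, not a circuit from the talk: the feed-forward path is modeled as a registered doubler, the feedback path as an accumulator, and the upset as a +1 corruption of the logic output during the cycles before scrubbing repairs the configuration. The function `run` and its parameters are hypothetical.

```python
def run(inputs, feedback, fault_cycles=frozenset()):
    """Simulate a toy registered circuit and return its output per cycle.

    feedback=False: out[t] = 2 * in[t]            (feed-forward path)
    feedback=True:  out[t] = out[t-1] + in[t]     (accumulator, feedback path)
    During fault_cycles the logic output is corrupted by +1, modeling a
    configuration upset that scrubbing later repairs.
    """
    state = 0
    outputs = []
    for t, x in enumerate(inputs):
        value = state + x if feedback else 2 * x
        if t in fault_cycles:
            value += 1  # upset logic produces a wrong value
        state = value   # flip-flop captures the logic output
        outputs.append(value)
    return outputs

inputs = [1] * 10
fault = {3, 4}  # configuration is wrong during cycles 3 and 4, then scrubbed

for feedback in (False, True):
    golden = run(inputs, feedback)
    dut = run(inputs, feedback, fault)
    error = [d - g for d, g in zip(dut, golden)]
    print("feedback" if feedback else "feed-forward", error)
# feed-forward [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
# feedback [0, 0, 0, 1, 2, 2, 2, 2, 2, 2]
```

In the feed-forward case the error disappears as soon as the configuration is repaired. In the feedback case the corrupted state keeps circulating, so the error remains until the state is reset.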


Partial Mitigation

• Apply a mitigation technique to just the persistent cross-section

[Figure: block diagram of Logic blocks and flip-flops (FF), with a region labeled TMR.]
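TMR masks a fault in one copy by voting across three copies. The talk does not give a voter implementation, so the following is only a minimal sketch of a bitwise 2-of-3 majority function; the name `majority` and the example bit patterns are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote across three redundant copies."""
    return (a & b) | (a & c) | (b & c)

correct = 0b1011_0010
upset = correct ^ 0b0000_1000   # one copy has a single flipped bit

# Any single corrupted copy is outvoted by the other two.
assert majority(upset, correct, correct) == correct
assert majority(correct, upset, correct) == correct
assert majority(correct, correct, upset) == correct

# Two copies corrupted in the same bit defeat the voter.
assert majority(upset, upset, correct) == upset
```

The last assertion shows why scrubbing still matters with TMR: an unrepaired upset in one copy leaves the voter exposed to a second upset in another copy.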


Limitations of Partial Mitigation

• Does not prevent all errors
  – System must be corrected with configuration bitstream scrubbing
  – Circuit configuration can be incorrect between scrubs
• Non-persistent errors remain

Automated Partial TMR

• Analyze an EDIF source file for feedback structures
  – Protect these sections with TMR to reduce the persistent cross-section

[Figure: block diagram of Logic blocks and flip-flops (FF), with part of the design triplicated and Voter blocks inserted.]
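One way to find feedback structures in a netlist graph is to look for cells that can reach themselves. The talk does not say which graph algorithm BLTmr uses, so this is a sketch under assumptions: the netlist is a plain dictionary mapping each cell to the cells it drives, and `reachable_from`, `feedback_cells`, and the example netlist are hypothetical. A production tool would more likely use a strongly connected components algorithm than this simple per-cell search.

```python
def reachable_from(graph, start):
    """Return every cell reachable from `start` by following one or more edges."""
    seen = set()
    stack = list(graph.get(start, []))
    while stack:
        cell = stack.pop()
        if cell not in seen:
            seen.add(cell)
            stack.extend(graph.get(cell, []))
    return seen

def feedback_cells(graph):
    """Cells that lie on a cycle, i.e. cells that can reach themselves."""
    return {cell for cell in graph if cell in reachable_from(graph, cell)}

# Hypothetical netlist: in -> logic_a -> ff_a -> logic_b -> ff_b -> out,
# with ff_b feeding back into logic_b.
netlist = {
    "in": ["logic_a"],
    "logic_a": ["ff_a"],
    "ff_a": ["logic_b"],
    "logic_b": ["ff_b"],
    "ff_b": ["logic_b", "out"],
    "out": [],
}
print(sorted(feedback_cells(netlist)))  # ['ff_b', 'logic_b']
```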


BLTmr Partial TMR Tool

• BYU-LANL Triple Modular Redundancy: Configurable Reliability
  – Limit mitigation to minimize:
    • design resource requirements
    • power consumption
  – Mitigation focused on persistent circuit structures

BLTmr Partial TMR Tool

• Design divided into three sections:
  – Feedback, Input to FB, Output

[Figure: block diagram of Logic blocks and flip-flops (FF).]
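The three-way split can be sketched as a graph traversal. The talk names the sections but does not define them precisely, so this sketch assumes one plausible reading: "Input to FB" is every cell outside the feedback set that drives it directly or transitively, and "Output" is everything else. The function `classify` and the example netlist are hypothetical.

```python
def classify(graph, feedback):
    """Split cells into 'feedback', 'input_to_fb', and 'output' sections.

    graph maps each cell to the cells it drives; `feedback` is the set of
    cells already identified as lying in feedback loops.
    """
    # Build the reverse graph so we can walk from sinks back to drivers.
    drivers = {cell: [] for cell in graph}
    for src, sinks in graph.items():
        for dst in sinks:
            drivers.setdefault(dst, []).append(src)

    # Walk upstream from the feedback cells to find everything that feeds them.
    upstream = set()
    stack = list(feedback)
    while stack:
        cell = stack.pop()
        for src in drivers.get(cell, []):
            if src not in feedback and src not in upstream:
                upstream.add(src)
                stack.append(src)

    sections = {}
    for cell in drivers:
        if cell in feedback:
            sections[cell] = "feedback"
        elif cell in upstream:
            sections[cell] = "input_to_fb"
        else:
            sections[cell] = "output"
    return sections

netlist = {
    "in": ["logic_a"],
    "logic_a": ["ff_a"],
    "ff_a": ["logic_b"],
    "logic_b": ["ff_b"],
    "ff_b": ["logic_b", "out"],
    "out": [],
}
sections = classify(netlist, feedback={"logic_b", "ff_b"})
for cell in netlist:
    print(cell, sections[cell])
```

Here `in`, `logic_a`, and `ff_a` fall into the input-to-feedback section, `logic_b` and `ff_b` form the feedback section, and `out` is the only output-section cell.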


BLTmr Tool Options

• BLTmr Tool applies TMR mitigation to subsections of the design:
  – Feedback only
  – Feedback + Input to Feedback
  – FB + Input to FB + Output (Full TMR)

[Figure: block diagram of Logic blocks and flip-flops (FF).]

[Figures: three further builds of this slide, each with a block diagram showing triplicated Logic and FF elements and Voter blocks.]

BLTmr Tool Flow

• BYU EDIF development environment reads in the user design
• Design organized into a graph structure for analysis
• User may direct mitigation
• Design analyzed to classify components as described
• Circuit elements triplicated
• Voters inserted
• Mitigated design written in EDIF format

[Figure: tool flow diagram with the blocks Original Design, Parse EDIF, Create Design Database, User Constraints, Analysis (Feedback, Input to FB, etc.), Cell Triplication, Voter Insertion, and Partially Mitigated Design.]
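The triplication and voter-insertion steps can be sketched on a netlist expressed as (driver, sink) edges. This is a simplified illustration, not BLTmr's implementation: it assumes a voter is inserted only where a triplicated cell drives a non-triplicated cell, and the function `triplicate`, the `_tmr` and `_voter` naming scheme, and the example edges are all hypothetical.

```python
def triplicate(edges, selected):
    """Triplicate the `selected` cells of a netlist given as (driver, sink) edges.

    Simplifying assumption: a voter is inserted only where a triplicated cell
    drives a non-triplicated cell. TMR flows commonly also place voters inside
    feedback loops; that is omitted here.
    """
    new_edges = set()
    for src, dst in edges:
        if src in selected and dst in selected:
            # Stay inside the triplicated region: connect copy i to copy i.
            for i in range(3):
                new_edges.add((f"{src}_tmr{i}", f"{dst}_tmr{i}"))
        elif dst in selected:
            # Unmitigated driver fans out to all three copies.
            for i in range(3):
                new_edges.add((src, f"{dst}_tmr{i}"))
        elif src in selected:
            # Leaving the triplicated region: reduce three copies through a voter.
            voter = f"{src}_voter"
            for i in range(3):
                new_edges.add((f"{src}_tmr{i}", voter))
            new_edges.add((voter, dst))
        else:
            new_edges.add((src, dst))  # untouched part of the design
    return new_edges

edges = [
    ("in", "logic_b"),
    ("logic_b", "ff_b"),
    ("ff_b", "logic_b"),  # feedback edge
    ("ff_b", "out"),
]
mitigated = triplicate(edges, selected={"logic_b", "ff_b"})
for edge in sorted(mitigated):
    print(edge)
```

Only the selected cells grow threefold, which is the source of the area savings that partial TMR aims for.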


Example Circuits

• Tests on two designs
  1. DSP Kernel
  2. Synthetic Design
     – LFSR modules feeding into an add-multiply tree

Unmitigated Fault Analysis

Design             FPGA Editor Layout    Sensitivity Map        Persistence Map
DSP Kernel         5,746 slices (46%)    575,448 bits (9.9%)    13,841 bits (0.24%)
Synthetic Design   2,538 slices (20%)    189,835 bits (3.3%)    77,159 bits (1.3%)

Experimental Results – Design #1: DSP Kernel

Mitigation                                      FPGA Editor Layout    Sensitivity Map          Persistence Map
Unmitigated                                     5,746 slices (46%)    575,448 bits (9.90%)     13,841 bits (0.24%)
Partial TMR applied to Feedback & Input to FB   8,036 slices (65%)    569,700 bits (9.81%)     152 bits (0.0026%)

Experimental Results – Design #2: Synthetic (LFSR/Mult)

Mitigation         FPGA Editor Layout     Sensitivity Map          Persistence Map
Unmitigated        2,538 slices (20%)     189,835 bits (3.27%)     77,159 bits (1.33%)
Full TMR applied   11,961 slices (97%)    20,256 bits (0.35%)      671 bits (0.012%)

Experimental Results

[Figure: Virtex 1000 proton cross section (cm²) on a log scale from 1.00E-12 to 1.00E-05, for Unmitigated, TMR Feedback, TMR Feedback + Input to FB, and Max TMR. Series: Static X-Section, DSP Kernel* Dynamic, DSP Kernel Persistent, Synthetic Dynamic, Synthetic Persistent.]

* Full TMR could not be applied to DSP Kernel due to FPGA resource constraints

“Qpro Virtex 2.5V radiation hardened FPGAs”, Xilinx Inc., DS028 (v1.2), Nov. 5, 2001.

Experimental Results

• GPS orbit (22,200 km altitude, 55° inclination)
• AP-8 Solar Minimum, JPL Solar Proton Quiet, CREME96 Solar Minimum

[Figure: MTBF (days) on a log scale from 1 to 100000 for the DSP Kernel and Synthetic designs. Categories: Static X-Section - Sensitive, Unmitigated - Sensitive, Unmitigated - Persistent, Feedback TMR - Persistent, Feedback + Input TMR - Persistent, Max TMR - Persistent.]

Summary of Results

Design              Size Increase   Sensitivity Decrease   Persistence Decrease   Average MTBF Increase‡‡
DSP Kernel*         40%             3%                     99%                    90x
Synthetic Design‡   370%            89%                    99%                    114x

* Unmitigated to Partial TMR of Feedback + Input to FB
‡ Unmitigated to Full TMR
‡‡ GPS orbit; AP-8 Solar Minimum, JPL Solar Proton Quiet, CREME96 Solar Minimum
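The summary percentages can be cross-checked against the slice and bit counts reported on the per-design results slides. The sketch below only redoes that arithmetic. Its last line rests on an assumption that is not from the talk: that MTBF scales inversely with the number of persistent bits, which ignores the orbit and environment models behind the reported MTBF figures. The names `percent_change` and `results` are hypothetical.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# (slices, sensitive bits, persistent bits) taken from the results slides.
results = {
    "DSP Kernel": {"unmitigated": (5746, 575448, 13841),
                   "mitigated": (8036, 569700, 152)},    # Feedback + Input to FB
    "Synthetic":  {"unmitigated": (2538, 189835, 77159),
                   "mitigated": (11961, 20256, 671)},    # Full TMR
}

for name, r in results.items():
    (s0, sens0, pers0), (s1, sens1, pers1) = r["unmitigated"], r["mitigated"]
    print(name)
    print(f"  size increase:        {percent_change(s0, s1):.1f}%")
    print(f"  sensitivity decrease: {-percent_change(sens0, sens1):.1f}%")
    print(f"  persistence decrease: {-percent_change(pers0, pers1):.1f}%")
    # Assumption: MTBF scales as 1 / (number of persistent bits).
    print(f"  persistent-bit ratio: {pers0 / pers1:.0f}x")
```

The computed size increases (about 39.9% and 371.3%) and persistence decreases (about 98.9% and 99.1%) match the table within rounding, and the persistent-bit ratios (about 91x and 115x) are close to the reported MTBF increases. The one exception is the DSP Kernel sensitivity decrease: the bit counts give about 1.0%, while the table lists 3%, and the slides do not explain the difference.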


Conclusions

• Pros: Partial TMR (BLTmr) as fault mitigation offers:
  – Increased system availability due to fewer system resets
  – More “affordable” fault mitigation than full TMR
  – Critical design areas are mitigated with an automated tool
• Cons:
  – Much of the design may be unmitigated, leaving sensitive sections
    • May result in temporary errors