
Pratt 1 MAPLD 2005/202

Improving FPGA Design Robustness with Partial TMR

Brian Pratt 1,2

Michael Caffrey, Paul Graham 2

Eric Johnson, Keith Morgan, Michael Wirthlin 1

1 Brigham Young University, Department of Electrical Engineering
2 Los Alamos National Laboratory

Motivation for Partial TMR

• Factors of fault-tolerant computing:
  – Availability
  – Reliability
  – Mitigation cost

• Full TMR
  – Expensive in terms of power, speed, area, etc.
  – Worthwhile if affordable!

[Figure: MTBF vs. area cost, with a reliability constraint and an area constraint marked]

Motivation for Partial TMR

• Partial TMR offers:
  – Mitigation of the most sensitive design structures
  – Increased system availability through fewer system resets
  – Lower mitigation cost than full TMR

• Suitability of partial TMR is application dependent
  – Reduced reliability compared to full TMR

Scrubbing

• Must be included with partial mitigation
• Continuously 'read' and 'clean' configuration memory
• A single bit will be upset no longer than t_s, where t_s = time for one scrub

[Figure: configuration bitstream being scrubbed]
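The following minimal Python sketch illustrates the t_s bound under a simplified, assumed model. Configuration memory is a list of integer "frames", and the scrubber visits the frames in order, completing one full pass every t_s cycles. `scrub_pass` and `upset_duration` are hypothetical helpers, not a vendor API. The worst-case upset lifetime they produce never reaches t_s.

```python
# Minimal model of configuration scrubbing (illustrative; frame contents,
# frame counts, and timings are hypothetical, not taken from any device).
from typing import List


def scrub_pass(config: List[int], golden: List[int]) -> List[int]:
    """One full scrub: read back each frame, compare it with the golden
    bitstream, and rewrite any frame that differs. Returns repaired indices."""
    repaired = []
    for i, ref in enumerate(golden):
        if config[i] != ref:      # readback mismatch: the frame was upset
            config[i] = ref       # 'clean' it by rewriting the golden frame
            repaired.append(i)
    return repaired


def upset_duration(upset_cycle: int, frame: int, n_frames: int, t_s: int) -> int:
    """Cycles an upset in `frame` lasts before a continuously running
    scrubber (one full pass every t_s cycles) rewrites that frame."""
    offset = frame * t_s // n_frames   # when this frame is visited in each pass
    x = upset_cycle - offset
    passes = -((-x) // t_s)            # ceil(x / t_s) using integer arithmetic
    return passes * t_s + offset - upset_cycle


golden = [0b1010, 0b1100, 0b0110, 0b0001]
config = list(golden)
config[2] ^= 0b0100                    # single-event upset in frame 2
print(scrub_pass(config, golden))      # [2]
print(config == golden)                # True

# Worst case over many upset times and frames stays below t_s.
t_s, n_frames = 1000, 4
print(max(upset_duration(c, f, n_frames, t_s)
          for c in range(3000) for f in range(n_frames)))  # 999
```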

Non-Persistent Errors

• An SEU in the non-persistent cross-section causes a temporary interruption of service
• Requires partial reconfiguration (scrubbing) to correct

[Figure: error magnitude vs. time (cycles); once scrubbing repairs the configuration, the output returns to correct. Error = delta between the outputs of a golden circuit and the DUT.]
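The golden-versus-DUT comparison in the figure can be reproduced with a toy feed-forward circuit. In this sketch, all names and the stuck-at-0 upset model are assumptions. A corrupted logic function is active from `upset_cycle` until scrubbing repairs it at `repair_cycle`. Data only moves forward through two flip-flops, so the error shows up two cycles later and drains out of the pipeline once the configuration is repaired.

```python
# Illustrative model (assumed, not from the slides): a feed-forward
# pipeline Logic -> FF -> FF, run as a golden copy and as a DUT whose
# logic is corrupted from upset_cycle until scrubbing repairs it.
from typing import Callable, List

Logic = Callable[[int], int]


def golden_logic(x: int) -> int:
    return (3 * x) & 0xFF


def upset_logic(x: int) -> int:
    return 0  # hypothetical SEU effect: LUT output stuck at 0


def run_pipeline(inputs: List[int], logic_at: Callable[[int], Logic]) -> List[int]:
    ff1 = ff2 = 0
    outputs = []
    for cycle, x in enumerate(inputs):
        outputs.append(ff2)                  # circuit output is the last register
        ff2, ff1 = ff1, logic_at(cycle)(x)   # clock edge: data moves forward only
    return outputs


def error_trace(inputs: List[int], upset_cycle: int, repair_cycle: int) -> List[int]:
    """|golden - DUT| per cycle, following the slide's error definition."""
    golden = run_pipeline(inputs, lambda c: golden_logic)
    dut = run_pipeline(
        inputs,
        lambda c: upset_logic if upset_cycle <= c < repair_cycle else golden_logic,
    )
    return [abs(g - d) for g, d in zip(golden, dut)]


errors = error_trace(list(range(16)), upset_cycle=5, repair_cycle=9)
print([cycle for cycle, e in enumerate(errors) if e])  # [7, 8, 9, 10]
```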

Persistent Errors

• An SEU in the persistent cross-section causes a permanent interruption of service
• Requires a full system reset to correct

[Figure: error magnitude vs. time (cycles); even after scrubbing repairs the configuration, the output remains incorrect. Error = delta between the outputs of a golden circuit and the DUT.]
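Replacing the pipeline with a register that feeds back into itself shows why such an error persists. This hypothetical accumulator model uses the same assumed stuck-at-0 upset. The corrupted sum is captured in the feedback loop, so the DUT keeps disagreeing with the golden circuit after the configuration is repaired, until both are reset.

```python
# Illustrative model (assumed, not from the slides): an 8-bit accumulator
# whose register output feeds back into its own input.
from typing import Callable, List, Optional

Logic = Callable[[int], int]


def golden_logic(x: int) -> int:
    return (3 * x) & 0xFF


def upset_logic(x: int) -> int:
    return 0  # hypothetical SEU effect: LUT output stuck at 0


def run_accumulator(inputs: List[int], logic_at: Callable[[int], Logic],
                    reset_cycle: Optional[int] = None) -> List[int]:
    acc = 0
    outputs = []
    for cycle, x in enumerate(inputs):
        if cycle == reset_cycle:
            acc = 0                                   # full system reset
        outputs.append(acc)
        acc = (acc + logic_at(cycle)(x)) & 0xFF       # feedback: FF output re-enters
    return outputs


def error_trace(inputs: List[int], upset_cycle: int, repair_cycle: int,
                reset_cycle: Optional[int] = None) -> List[int]:
    golden = run_accumulator(inputs, lambda c: golden_logic, reset_cycle)
    dut = run_accumulator(
        inputs,
        lambda c: upset_logic if upset_cycle <= c < repair_cycle else golden_logic,
        reset_cycle,
    )
    return [abs(g - d) for g, d in zip(golden, dut)]


errors = error_trace(list(range(20)), upset_cycle=5, repair_cycle=9, reset_cycle=15)
print([cycle for cycle, e in enumerate(errors) if e])
# [6, 7, 8, 9, 10, 11, 12, 13, 14]  (wrong long after repair at 9, until reset at 15)
```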

Non-Persistent Circuit Structures

• Generally consist of circuit components and routing in a feed-forward path

[Figure: chain of logic and flip-flop (FF) stages with no feedback]

Persistent Circuit Structures

• Generally consist of circuit components and routing in, or contributing to, a feedback path

[Figure: logic and flip-flop (FF) stages including a feedback path]
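One common way to find feedback structures in a netlist is to look for strongly connected components: a cell lies on a feedback path exactly when it belongs to a cycle. The sketch below applies Tarjan's algorithm to a toy netlist, given as a dictionary that maps each cell to the cells it drives. This format and the cell names are assumptions, and the slides do not say which graph algorithm BLTmr uses.

```python
# Sketch: flag cells on a feedback path by finding strongly connected
# components (Tarjan's algorithm). The netlist format is hypothetical.
from typing import Dict, List, Set

Netlist = Dict[str, List[str]]   # cell -> cells driven by its output


def strongly_connected_components(netlist: Netlist) -> List[Set[str]]:
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)      # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in netlist.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in list(netlist):
        if v not in index:
            visit(v)
    return components


def feedback_cells(netlist: Netlist) -> Set[str]:
    """Cells in a multi-cell cycle or with a direct self-loop."""
    cells: Set[str] = set()
    for comp in strongly_connected_components(netlist):
        v = next(iter(comp))
        if len(comp) > 1 or v in netlist.get(v, []):
            cells |= comp
    return cells


netlist = {
    "logic1": ["ff1"], "ff1": ["logic2"], "logic2": ["ff2"],
    "ff2": ["logic2", "logic3"], "logic3": ["ff3"], "ff3": [],
}
print(sorted(feedback_cells(netlist)))  # ['ff2', 'logic2']
```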

Partial Mitigation

• Apply a mitigation technique to just the persistent cross-section

[Figure: logic/FF circuit with TMR applied only to the feedback structure]
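A small sketch shows why TMR on the feedback section removes persistence. It assumes the common arrangement of a bitwise 2-of-3 majority voter in each copy's feedback path; the helper names and the MSB-flip upset are hypothetical. An upset in one copy is outvoted, and because every copy reloads from its voter, the corrupted copy resynchronizes on the next clock. The unmitigated accumulator, by contrast, stays wrong.

```python
# Sketch (assumed model): an 8-bit accumulator with and without TMR on its
# feedback path. The upset flips bit 7 of one register copy at upset_cycle.
from typing import List, Optional


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def accumulate(inputs: List[int], upset_cycle: Optional[int] = None) -> List[int]:
    """Unmitigated accumulator: an upset stays in the feedback loop."""
    acc, out = 0, []
    for cycle, x in enumerate(inputs):
        if cycle == upset_cycle:
            acc ^= 0x80                       # hypothetical SEU in the register
        out.append(acc)
        acc = (acc + x) & 0xFF
    return out


def accumulate_tmr(inputs: List[int], upset_cycle: Optional[int] = None,
                   upset_copy: int = 0) -> List[int]:
    """Three register copies, each fed back through its own voter."""
    regs, out = [0, 0, 0], []
    for cycle, x in enumerate(inputs):
        if cycle == upset_cycle:
            regs[upset_copy] ^= 0x80          # SEU hits only one copy
        voted = [majority(*regs) for _ in range(3)]   # one voter per copy
        out.append(voted[0])
        regs = [(v + x) & 0xFF for v in voted]        # each copy reloads voted value
    return out


inputs = list(range(1, 21))
golden = accumulate(inputs)
print(accumulate_tmr(inputs, upset_cycle=5, upset_copy=1) == golden)  # True
print(accumulate(inputs, upset_cycle=5)[5:] == golden[5:])            # False
```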

Limitations of Partial Mitigation

• Does not prevent all errors
  – System must be corrected with configuration bitstream scrubbing
  – Circuit configuration can be incorrect between scrubs
• Non-persistent errors remain

Automated Partial TMR

• Analyze an EDIF source file for feedback structures
  – Protect these sections with TMR to reduce the persistent cross-section

[Figure: original logic/FF circuit and the same circuit with its feedback section triplicated and voters inserted]

BLTmr Partial TMR Tool

• BYU-LANL Triple Modular Redundancy (BLTmr): configurable reliability
  – Limit mitigation to minimize:
    • design resource requirements
    • power consumption
  – Mitigation focused on persistent circuit structures

BLTmr Partial TMR Tool

• Design divided into three sections:
  – Feedback
  – Input to feedback (FB)
  – Output

[Figure: logic/FF circuit partitioned into feedback, input-to-feedback, and output sections]
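The three-way split can be expressed with plain reachability. Feedback cells are those that can reach themselves. Input-to-feedback cells are the remaining cells whose outputs eventually reach a feedback cell. Everything else is output logic. This is one possible reading of the slide (and of "in, or contributing to, a feedback path"), not BLTmr's documented rule; the netlist format and `classify` are hypothetical.

```python
# Hypothetical sketch: split a netlist (cell -> driven cells) into the three
# BLTmr sections using reachability. An interpretation, not the tool's rule.
from typing import Dict, List, Set

Netlist = Dict[str, List[str]]


def fanout_cone(netlist: Netlist, start: str) -> Set[str]:
    """All cells reachable from `start` through one or more connections."""
    seen: Set[str] = set()
    stack = list(netlist.get(start, []))
    while stack:
        cell = stack.pop()
        if cell not in seen:
            seen.add(cell)
            stack.extend(netlist.get(cell, []))
    return seen


def classify(netlist: Netlist) -> Dict[str, Set[str]]:
    cones = {cell: fanout_cone(netlist, cell) for cell in netlist}
    feedback = {c for c in netlist if c in cones[c]}               # lies on a cycle
    input_to_fb = {c for c in netlist
                   if c not in feedback and cones[c] & feedback}   # drives the loop
    output = set(netlist) - feedback - input_to_fb                 # neither of the above
    return {"feedback": feedback, "input_to_feedback": input_to_fb, "output": output}


netlist = {
    "logic1": ["ff1"], "ff1": ["logic2"], "logic2": ["ff2"],
    "ff2": ["logic2", "logic3"], "logic3": ["ff3"], "ff3": [],
}
for section, cells in classify(netlist).items():
    print(section, sorted(cells))
```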


BLTmr Tool Options

• BLTmr applies TMR mitigation to subsections of the design:
  – Feedback only
  – Feedback + input to feedback
  – Feedback + input to feedback + output (full TMR)

[Figures: the logic/FF circuit under each option, with triplicated cells and inserted voters]
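Each of the three options amounts to choosing which sections to triplicate. The sketch below uses assumed names (`TmrOption`, `cells_to_triplicate`) and made-up per-cell slice costs. Its area estimate is deliberately crude: it counts two extra copies of each chosen cell and ignores voters. Even so, it shows how area grows from one option to the next.

```python
# Sketch of the three BLTmr options as section selections. Option names,
# section sets, and per-cell slice costs are hypothetical.
from enum import Enum
from typing import Dict, Set


class TmrOption(Enum):
    FEEDBACK_ONLY = "feedback"
    FEEDBACK_AND_INPUT = "feedback + input to feedback"
    FULL = "full TMR"


def cells_to_triplicate(sections: Dict[str, Set[str]], option: TmrOption) -> Set[str]:
    chosen = set(sections["feedback"])
    if option in (TmrOption.FEEDBACK_AND_INPUT, TmrOption.FULL):
        chosen |= sections["input_to_feedback"]
    if option is TmrOption.FULL:
        chosen |= sections["output"]
    return chosen


def estimated_slices(cost: Dict[str, int], chosen: Set[str]) -> int:
    """Every cell once, plus two extra copies of each chosen cell (voters ignored)."""
    return sum(cost.values()) + 2 * sum(cost[c] for c in chosen)


sections = {
    "feedback": {"logic2", "ff2"},
    "input_to_feedback": {"logic1", "ff1"},
    "output": {"logic3", "ff3"},
}
cost = {"logic1": 4, "ff1": 1, "logic2": 6, "ff2": 1, "logic3": 10, "ff3": 1}
for option in TmrOption:
    print(option.value, estimated_slices(cost, cells_to_triplicate(sections, option)))
# feedback 37
# feedback + input to feedback 47
# full TMR 69
```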


BLTmr Tool Flow

• BYU EDIF development environment reads in the user design
• Design organized into a graph structure for analysis
• User may direct mitigation
• Design analyzed to classify components as described
• Circuit elements triplicated
• Voters inserted
• Mitigated design written in EDIF format

[Figure: tool flow: Original Design → Parse EDIF → Create Design Database → Analysis (Feedback, Input to FB, etc.) → Cell Triplication → Voter Insertion → Partially Mitigated Design; User Constraints are an additional input]
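The triplication and voter-insertion steps can be sketched as a netlist transformation. The voter placement here is an assumption chosen for illustration:

- Each triplicated flip-flop gets one voter per redundant domain when it drives triplicated logic, so feedback loops resynchronize.
- A single voter reduces a triplicated signal back to one wire when it drives unmitigated logic.
- Unmitigated sources simply fan out to all three copies.

The function names and the `_tmr`/`_vote` naming scheme are hypothetical.

```python
# Hypothetical sketch of "cell triplication" and "voter insertion" on a toy
# netlist (cell -> driven cells). The voter-placement policy is an assumption
# for illustration, not BLTmr's documented algorithm.
from typing import Dict, List, Set

Netlist = Dict[str, List[str]]


def triplicate(netlist: Netlist, tmr: Set[str], flip_flops: Set[str]) -> Dict[str, Set[str]]:
    new: Dict[str, Set[str]] = {}

    def connect(src: str, dst: str) -> None:
        new.setdefault(src, set()).add(dst)
        new.setdefault(dst, set())

    def copies(cell: str) -> List[str]:
        return [f"{cell}_tmr{i}" for i in range(3)] if cell in tmr else [cell]

    def voter(cell: str, name: str) -> str:
        for c in copies(cell):          # a voter takes all three copies as inputs
            connect(c, name)
        return name

    for src, sinks in netlist.items():
        for c in copies(src):
            new.setdefault(c, set())
        for dst in sinks:
            if dst in tmr:
                for i, d in enumerate(copies(dst)):
                    if src not in tmr:
                        connect(src, d)                            # fan out to every copy
                    elif src in flip_flops:
                        connect(voter(src, f"{src}_vote{i}"), d)   # voted FF output per domain
                    else:
                        connect(copies(src)[i], d)                 # stay inside domain i
            else:
                single = voter(src, f"{src}_vote") if src in tmr else src
                connect(single, dst)                               # back to one signal
    return new


netlist = {
    "logic1": ["ff1"], "ff1": ["logic2"], "logic2": ["ff2"],
    "ff2": ["logic2", "logic3"], "logic3": ["ff3"], "ff3": [],
}
mitigated = triplicate(netlist, tmr={"logic1", "ff1", "logic2", "ff2"},
                       flip_flops={"ff1", "ff2", "ff3"})
print(sorted(n for n in mitigated if "_vote" in n))
# ['ff1_vote0', 'ff1_vote1', 'ff1_vote2', 'ff2_vote', 'ff2_vote0', 'ff2_vote1', 'ff2_vote2']
```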

Example Circuits

• Tests on two designs:
  1. DSP Kernel
  2. Synthetic Design
     – LFSR modules feeding into an add-multiply tree
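The slides describe the synthetic design only at this level, so the following is a hypothetical behavioral model. Four 16-bit Galois LFSRs (feedback mask 0xB400, a known maximal-length choice) drive an (a + b) × (c + d) tree with their low bytes. The LFSR state registers form feedback, while the tree is purely feed-forward, which is what makes this kind of design useful for separating persistent from non-persistent cross-sections. The widths, taps, number of LFSRs, and tree shape are all assumptions.

```python
# Hypothetical behavioral model of the synthetic test design: LFSR modules
# (feedback) feeding an add-multiply tree (feed-forward). Widths, taps,
# number of LFSRs, and tree shape are assumptions.
from typing import List, Tuple

TAPS = 0xB400  # Galois mask for x^16 + x^14 + x^13 + x^11 + 1 (maximal length)


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Galois LFSR by one clock."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def add_multiply_tree(a: int, b: int, c: int, d: int) -> int:
    return (a + b) * (c + d)      # purely combinational, no state


def synthetic_cycle(states: List[int]) -> Tuple[List[int], int]:
    """Clock four LFSRs and feed their low bytes into the tree."""
    states = [lfsr16_step(s) for s in states]
    a, b, c, d = (s & 0xFF for s in states)
    return states, add_multiply_tree(a, b, c, d)


def period(seed: int) -> int:
    """Number of clocks before the LFSR returns to `seed`."""
    state, n = lfsr16_step(seed), 1
    while state != seed:
        state, n = lfsr16_step(state), n + 1
    return n


print(period(0xACE1))                 # 65535
print(synthetic_cycle([1, 2, 3, 4]))  # ([46080, 1, 46081, 2], 3)
```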

Unmitigated Fault Analysis

Design             FPGA Editor Layout   Sensitivity Map      Persistence Map
                   (slices)             (sensitive bits)     (persistent bits)
DSP Kernel         5,746 (46%)          575,448 (9.9%)       13,841 (0.23%)
Synthetic Design   2,538 (20%)          189,835 (3.3%)       77,159 (1.3%)

Experimental Results – Design #1: DSP Kernel

Configuration                         FPGA Editor Layout   Sensitivity Map   Persistence Map
                                      (slices)             (bits)            (bits)
Unmitigated                           5,746 (46%)          575,448 (9.90%)   13,841 (0.24%)
Partial TMR, Feedback & Input to FB   8,036 (65%)          569,700 (9.81%)   152 (0.0026%)

Experimental Results – Design #2: Synthetic (LFSR/Mult)

Configuration      FPGA Editor Layout   Sensitivity Map   Persistence Map
                   (slices)             (bits)            (bits)
Unmitigated        2,538 (20%)          189,835 (3.27%)   77,159 (1.33%)
Full TMR applied   11,961 (97%)         20,256 (0.35%)    671 (0.012%)

Experimental Results

[Figure: Virtex 1000 proton cross-section (cm², log scale) for Unmitigated, TMR Feedback, TMR Feedback + Input to FB, and Max TMR; series: DSP Kernel* Dynamic, DSP Kernel Persistent, Synthetic Dynamic, Synthetic Persistent; the static cross-section is also shown]

* Full TMR could not be applied to the DSP Kernel due to FPGA resource constraints.

"QPro Virtex 2.5V Radiation Hardened FPGAs", Xilinx Inc., DS028 (v1.2), Nov. 5, 2001.

Experimental Results

• GPS orbit (22,200 km altitude, 55° inclination)
• AP-8 Solar Minimum, JPL Solar Proton Quiet, CRÈME 96 Solar Minimum

[Figure: MTBF (days, log scale) for the DSP Kernel and Synthetic designs: Static X-Section – Sensitive, Unmitigated – Sensitive, Unmitigated – Persistent, Feedback TMR – Persistent, Feedback + Input TMR – Persistent, Max TMR – Persistent]

Summary of Results

Design              Design Size   Sensitivity   Persistence   Average MTBF
                    Increase      Decrease      Decrease      Increase‡‡
DSP Kernel*         40%           3%            99%           90x
Synthetic Design‡   370%          89%           99%           114x

* Unmitigated to partial TMR of Feedback + Input to FB
‡ Unmitigated to full TMR
‡‡ GPS orbit; AP-8 Solar Minimum, JPL Solar Proton Quiet, CRÈME 96 Solar Minimum
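Several columns of the summary can be roughly reproduced from the counts on the result slides. The MTBF column relies on an assumption not stated in the slides. If every persistent bit is upset at the same per-bit rate λ_bit, then MTBF = 1/(N_persistent · λ_bit), so the MTBF gain reduces to N_before / N_after. Under that assumption, the ratios come out close to the reported averages.

```python
# Back-of-the-envelope check of the summary table using the slide counts.
# Assumes (hypothetically) a uniform per-bit upset rate, so MTBF is
# inversely proportional to the number of persistent configuration bits.


def pct_increase(before: float, after: float) -> float:
    return 100.0 * (after - before) / before


def pct_decrease(before: float, after: float) -> float:
    return 100.0 * (before - after) / before


def mtbf_gain(persistent_before: int, persistent_after: int) -> float:
    # MTBF = 1 / (N_persistent * lambda_bit)  =>  gain = N_before / N_after
    return persistent_before / persistent_after


designs = {
    "DSP Kernel": {"slices": (5746, 8036), "persistent": (13841, 152)},
    "Synthetic Design": {"slices": (2538, 11961), "persistent": (77159, 671)},
}

for name, d in designs.items():
    print(f"{name}: size +{pct_increase(*d['slices']):.0f}%, "
          f"persistence -{pct_decrease(*d['persistent']):.1f}%, "
          f"MTBF x{mtbf_gain(*d['persistent']):.0f}")
# DSP Kernel: size +40%, persistence -98.9%, MTBF x91
# Synthetic Design: size +371%, persistence -99.1%, MTBF x115
```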

Conclusions

• Pros: partial TMR (BLTmr) as fault mitigation offers:
  – Increased system availability due to fewer system resets
  – More "affordable" fault mitigation than full TMR
  – Critical design areas are mitigated with an automated tool
• Cons:
  – Much of the design may be unmitigated, leaving sensitive sections
    • May result in temporary errors
