On the reliability of SRAM-based FPGAs

Luca Sterpone <[email protected]>
www.cad.polito.it


Post on 04-Jan-2016


TRANSCRIPT

Page 1: On the reliability of SRAM-based FPGAs Luca Sterpone Luca Sterpone

On the reliability of SRAM-based FPGAs

Luca Sterpone <[email protected]>

www.cad.polito.it

Page 2

Outline

Introduction
Previous works
  Scrubbing with partial reconfiguration
  Triple Module Redundancy
Nowadays Trends
Proposed approaches and methodology
  High Level Functional VHDL
  RPAR algorithm
Conclusions

Page 3

Introduction

What is an SRAM-based FPGA?

An SRAM-based FPGA is an array of island-style blocks. Each block consists of an array of logic elements and routing channels programmed by a static-RAM configuration memory.

[Figure labels: logic blocks, I/O blocks, routing resources, configuration bitstream]

Page 4

Introduction

The major vendors of SRAM-based FPGAs:

Altera families
  Cyclone and Acex
    Low cost
  Stratix-II
    High-density FPGA
    90 nm technology
Xilinx families
  Spartan
    90 nm technology
    Up to 5 million system gates
    Lower cost per gate and per pin
  Virtex
    High performance

Page 5

Introduction

SRAM-based FPGAs are very convenient because of:
  High flexibility in meeting the multiple requirements of different applications
  Low cost
  High performance
  Fast turnaround time
  Re-configurability

Page 6

Introduction

The performance and capacity of FPGAs suitable for space flight are increasing steadily
  Increase from tens of thousands to millions of system gates

[Figures: Spartan 90 nm die; Virtex-4 die]

Page 7

Introduction

The application of FPGAs has moved from glue logic to complete subsystems that combine real-time functions on a single chip, including microprocessors and memories
The potential for FPGA use in space is steadily increasing and opening up new application areas
FPGAs are increasingly being used in critical applications and are replacing ASICs on a regular basis.

[Figure labels: SRAM-based FPGA, re-configurable server]

Page 8

Introduction

What happens in the space environment?

Page 9

Introduction

High-energy particles can hit the sensitive silicon area of an SRAM-based FPGA
  High sensitivity to Single Event Upsets (SEUs)
  The configuration memory elements can change their content (bit-flip)
SEUs may drastically alter the correct operation of the FPGA, causing unexpected outputs called Single Event Functional Interrupts (SEFIs).

Page 10

Introduction

iRoC Technologies conducted a series of tests to determine the failure rate of five different FPGA architectures:
  SRAM-based Virtex-II and Spartan-3 from Xilinx
  SRAM-based Cyclone FPGA from Altera
  Antifuse-based Axcelerator and flash-based ProASIC Plus devices from Actel
FIT (failure in time) is defined as one failure in 10^9 hours.

Page 11

Introduction

The results were:
  Antifuse- and flash-based FPGAs suffered no loss of configuration under neutron bombardment
  The tested SRAM-based FPGAs demonstrated a FIT rate ranging from 1,150 at sea level to 3,900 at 5,000 feet to 540,000 at 60,000 feet.
Please note that:
  Integrated circuits typically have FIT rates lower than 100
  High-reliability applications require a FIT rate of 10 to 20.
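To give a feel for what these FIT rates mean, the sketch below converts a FIT rate into a mean time between failures, using only the definition given earlier (one FIT is one failure in 10^9 hours). The function name and the constant-failure-rate assumption are ours, not part of the slides.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year


def fit_to_mtbf_hours(fit: float) -> float:
    """Mean time between failures, in hours, for a given FIT rate.

    One FIT is one failure per 10**9 device-hours, so MTBF = 10**9 / FIT.
    Assumes a constant failure rate.
    """
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return 1e9 / fit


# FIT rates quoted on the slide for the tested SRAM-based FPGAs.
reported = {"sea level": 1_150, "5,000 feet": 3_900, "60,000 feet": 540_000}
for altitude, fit in reported.items():
    hours = fit_to_mtbf_hours(fit)
    print(f"{altitude}: {hours:,.0f} h ({hours / HOURS_PER_YEAR:.1f} years)")
```

The same function applied to the 10 to 20 FIT target shows how far the measured SRAM-based figures are from high-reliability requirements.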

Page 12

Introduction

Safety-critical applications such as space applications must consider the effects that energetic particles (radiation) can have on electronic components
The use of SRAM-based FPGAs in safety-critical applications requires the development of techniques able to decrease the FIT rate.

Page 13

Previous works

SEU scrubbing
  The configuration bitstream is simply reloaded at a chosen interval.

+ Scrubbing requires a low overhead in the system
- The configuration logic is in “write mode” for a greater percentage of time
- The chosen interval for scrub cycles should be based on the expected static upset rate and could be very short.
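One way to relate the scrub interval to the expected upset rate is sketched below. It assumes upsets arrive as a Poisson process with a constant rate, which the slides do not state; the function names and the example numbers are illustrative only.

```python
import math


def prob_upset_within(interval_h: float, upsets_per_hour: float) -> float:
    """Probability of at least one upset during one scrub interval.

    Assumes Poisson arrivals: P = 1 - exp(-rate * interval).
    """
    return 1.0 - math.exp(-upsets_per_hour * interval_h)


def max_scrub_interval(upsets_per_hour: float, max_prob: float) -> float:
    """Longest scrub interval (hours) keeping P(upset in interval) <= max_prob."""
    if upsets_per_hour <= 0:
        raise ValueError("upset rate must be positive")
    if not 0.0 < max_prob < 1.0:
        raise ValueError("max_prob must be strictly between 0 and 1")
    return -math.log1p(-max_prob) / upsets_per_hour


# Illustrative numbers only: 0.01 upsets/hour, at most 5% chance per interval.
interval = max_scrub_interval(upsets_per_hour=0.01, max_prob=0.05)
print(f"scrub at least every {interval:.2f} h")
```

A higher upset rate shortens the interval proportionally, which is why scrubbing may have to run very frequently in harsh environments.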

Page 14

Previous works

Partial Reconfiguration + SEU Scrubbing
  The configuration memory array is divided into separate segments
  Thanks to an error detection and correction (EDAC) architecture, only the segment affected by SEUs is reloaded

- The architecture overhead is very high
- The power consumption is excessive for space/mission-critical applications.
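The idea of reloading only the affected segment can be sketched in software as follows. This is a simplified model, not the EDAC hardware the slide refers to: detection uses a CRC-32 per segment and correction copies the segment back from a golden image, both of which are our assumptions.

```python
import zlib


def scrub_segments(config: list[bytearray], golden: list[bytes]) -> list[int]:
    """Reload only the segments whose checksum no longer matches the golden copy.

    Returns the indices of the segments that were rewritten.
    """
    repaired = []
    for index, (segment, reference) in enumerate(zip(config, golden)):
        # Detection: compare checksums (assumed stand-in for the EDAC check).
        if zlib.crc32(segment) != zlib.crc32(reference):
            # Correction: partial reconfiguration of this segment only.
            segment[:] = reference
            repaired.append(index)
    return repaired


# Toy configuration memory: four segments of eight bytes each.
golden = [bytes([i] * 8) for i in range(4)]
config = [bytearray(segment) for segment in golden]
config[2][5] ^= 0x10  # inject a single bit-flip (SEU) into segment 2
```

Calling `scrub_segments(config, golden)` on this toy memory rewrites segment 2 only and leaves the other three untouched.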

Page 15

Previous works - TMR technique

The purpose is to remove all single points of failure from the design
How to protect the design against SEUs?
  A circuit can be hardened by designing three copies of the same circuit and building a majority voter on the outputs of the replicated circuits.
  The implementation depends on the type of data structure to be mitigated:
    Throughput Logic
    State-machine Logic
    I/O Logic
    Special Features
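The majority voter at the heart of TMR can be illustrated with a bitwise 2-of-3 vote. This is a behavioural sketch in Python rather than the VHDL a real design would use, and the 8-bit example values are arbitrary.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant outputs."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1011_0010
upset = correct ^ 0b0000_1000  # one replica has a single flipped bit

# A fault in one replica is outvoted by the other two.
print(bin(majority_vote(correct, upset, correct)))
```

The vote only masks the fault; the faulty replica stays wrong until its state is refreshed.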

Page 16

Previous works - TMR technique

Although TMR-based approaches can tolerate one SEU, they cannot tolerate a second one before being refreshed
The refresh cycle of the configuration memory and of the flip-flops can be compared with the scrubbing of a memory protected by an EDAC architecture
The refresh period needs to be shorter than the expected bit error period
The TMR-based design is not as efficient as presumed.
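The interplay between refresh period and upset timing can be sketched with a small model. It assumes that a refresh happens at every multiple of the period and fully repairs all three modules, and that the design fails as soon as two different modules are upset between two refreshes; these simplifications and the names are ours.

```python
def tmr_fails(upsets: list[tuple[float, int]], refresh_period: float) -> bool:
    """Return True if two different TMR modules are upset in the same refresh window.

    upsets: (time, module index 0..2) pairs, in any order.
    """
    modules_hit: dict[int, set[int]] = {}
    for time, module in upsets:
        window = int(time // refresh_period)  # refreshes occur at k * period
        modules_hit.setdefault(window, set()).add(module)
    return any(len(modules) >= 2 for modules in modules_hit.values())


upsets = [(1.0, 0), (4.0, 1)]  # illustrative upset times, arbitrary units
print(tmr_fails(upsets, refresh_period=5.0))  # both upsets fall before one refresh
print(tmr_fails(upsets, refresh_period=2.0))  # a refresh separates the two upsets
```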

Page 17

Previous works - TMR technique

There are two kinds of TMR methodologies:

Functional Triple Modular Redundancy (FTMR) (2002), developed by Gaisler Research.
  A VHDL design methodology that provides TMR at different design levels:
    Device
    Modular
    Gate

Concurrent Error Detection - Duplication with Comparison for the user combinational logic (2003), presented by Lima et al.
  A VHDL design methodology that provides an application-oriented architecture able to detect SEUs.

Fernanda Lima, Luigi Carro, Ricardo Reis, “Designing fault tolerant systems into SRAM-based FPGAs”, DAC 2003
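Duplication with comparison can be sketched behaviourally as two copies of the same combinational function plus a comparator. This illustrates the general principle only, not the architecture of Lima et al.; the incrementer and the injected fault are made up for the example.

```python
from typing import Callable


def duplicate_with_comparison(
    copy0: Callable[[int], int], copy1: Callable[[int], int], x: int
) -> tuple[int, bool]:
    """Evaluate two copies of a function and flag any disagreement.

    Returns (output of copy 0, error flag). A raised flag detects a fault
    but does not say which copy is wrong.
    """
    y0, y1 = copy0(x), copy1(x)
    return y0, y0 != y1


def incrementer(x: int) -> int:
    """Fault-free 8-bit incrementer."""
    return (x + 1) & 0xFF


def faulty_incrementer(x: int) -> int:
    """Same logic with one output bit inverted, standing in for an SEU."""
    return incrementer(x) ^ 0x04


print(duplicate_with_comparison(incrementer, incrementer, 41))
print(duplicate_with_comparison(incrementer, faulty_incrementer, 41))
```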

Page 18

Previous works - TMR technique

Functional Triple Modular Redundancy
  Triple Module Redundancy flip-flops
  Triple Module Redundancy sequential logic

Page 19

Previous works - TMR technique

Functional Triple Modular Redundancy

GAISLER Research Group Report on FPGA for ESA activities 2002

Page 20

Previous works - TMR technique

Concurrent Error Detection - Duplication with Comparison for the user combinational logic

Page 21

Previous works - TMR technique

Evaluation of the SEU sensitivity of the basic TMR architecture by simulation (BYU SEU simulator)

Nathan Rollins, Michael J. Wirthlin, Michael Caffrey and Paul Graham, “Evaluating TMR Techniques in the Presence of Single Event Upsets”, Department of Electrical and Computer Engineering, Brigham Young University.

Page 22

Previous works - TMR technique

Nathan Rollins, Michael J. Wirthlin, Michael Caffrey and Paul Graham, “Evaluating TMR Techniques in the Presence of Single Event Upsets”, Department of Electrical and Computer Engineering, Brigham Young University.

Page 23

Previous works - TMR technique

Evaluation of the SEU sensitivity of the basic TMR architecture by fault injection

P. Bernardi, M. Sonza Reorda, L. Sterpone, M. Violante “On the evaluation of SEU sensitiveness in SRAM-based FPGAs”, 12-14 July, IOLTS 2004.
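A fault injection campaign of this kind can be pictured as the loop below: flip one configuration bit, run the design, compare with the fault-free output, restore the bit. The sketch is a generic illustration and not the authors' tool; `run` is a hypothetical stand-in for configuring and exercising the device, and the toy design is a single look-up table.

```python
from typing import Callable, Sequence


def injection_campaign(
    config: list[int], run: Callable[[Sequence[int]], object]
) -> list[int]:
    """Return the configuration bits whose single flip changes the observed output."""
    golden = run(config)  # fault-free reference
    sensitive = []
    for bit in range(len(config)):
        config[bit] ^= 1  # inject the SEU
        if run(config) != golden:
            sensitive.append(bit)
        config[bit] ^= 1  # restore the original value
    return sensitive


# Toy design: a 2-input AND stored as a 4-entry LUT; the (hypothetical)
# workload only ever applies input patterns 0, 1 and 3.
lut = [0, 0, 0, 1]


def run(cfg: Sequence[int]) -> tuple[int, ...]:
    return tuple(cfg[pattern] for pattern in (0, 1, 3))


print(injection_campaign(lut, run))
```

Bits that the workload never exercises are reported as non-sensitive, which is why the result depends on the applied stimuli as well as on the design.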

Page 24

TMR design flow

1. User TMR design (VHDL – EDF)
2. Synthesize
   1. Synthesis
   2. RTL schematic
   3. Check Syntax
3. Implement Design
   1. Map
   2. Place & Route (PAR)
4. Generate Programming File
   1. Native Circuit Description
   2. Configuration memory file

Page 25

TMR design flow

The place-and-route tools provided by the FPGA vendors are capable of optimising the number of modules used in the design by recombining the modules and compacting the design.

Page 26

TMR design flow

It is important to analyse the results of the synthesis and the place-and-route at the netlist level to ensure that the intended SEU protection has been implemented.

[Figure label: Implement design]

Page 27

The TMR fault scenario

The investigations are made at the architectural level of the SRAM-based FPGAs manufactured by Xilinx
The main macro-element is the TILE
  CLB
  Tri-state buffer
  Routing switchbox

Page 28

The fault scenario
The investigation methodology

Page 29

Xilinx – TMR design flow

A possible Configurable Logic Block (CLB) in a Xilinx TMR design

[Figure label: Implement design]

Page 30

Configurable Logic Block
The fault scenario

[Figure: CLB schematic with labels SRMUX, CKINV, CEMUX, BXMUX, CY0F, BYMUX, CY0G, CYSELG, GYMUXG, FXMUX, CYINIT, CYSELF]

Page 31

Configurable Logic Block
The fault scenario

Critical components for the TMR architecture within the CLB:

Combinational TMR design
  MUX fault
    CKINV, CY0G, CY0F
Sequential TMR design
  MUX fault
    CKINV, CY0G, CY0F, BYMUX, BXMUX, CEMUX, SRMUX, CYINIT, CYSELF, CYSELG
  Initialization
    SYNC_ATTR

Page 32

Configurable Logic Block
The fault scenario: Combinational Design

MUX fault: CKINV

This MUX is not used before the configuration memory upset.
An SEU can activate it!
The upset then becomes a SEFI in the TMR circuitry, as this component controls both TMR LUTs!

[Figure labels: TMR 1 bit j, TMR 2 bit j]

Please note that the two TMR modules handle signals that refer to the same bit (j) within the circuitry!
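Why a single upset in a shared resource defeats TMR can be shown with a toy model of bit j: one configuration fault corrupts the outputs of two modules at once, and the voter then sides with the wrong majority. Modelling the fault as an inversion of the affected outputs is our simplification.

```python
def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority of three single-bit values."""
    return (a & b) | (a & c) | (b & c)


def voted_output(correct_bit: int, corrupted_modules: set[int]) -> int:
    """Voter output for bit j when the listed TMR modules (0..2) are corrupted.

    Assumption: a corrupted module outputs the inverted bit.
    """
    outputs = [
        correct_bit ^ 1 if module in corrupted_modules else correct_bit
        for module in range(3)
    ]
    return majority(*outputs)


print(voted_output(1, {0}))     # fault confined to one module: masked
print(voted_output(1, {0, 1}))  # shared resource hits two modules: not masked
```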

Page 33

Configurable Logic Block
The fault scenario: Combinational Design

MUX fault: CY0G/CY0F

The upset alters the output YB of TMR 1 and the output COUT. COUT is used by another TMR module in a different CLB.
The configuration memory upset provokes a misconfiguration of the CY0G MUX!

Page 34

Configurable Logic Block
The fault scenario: Sequential Design

MUX fault: BYMUX/BXMUX

Page 35

Configurable Logic Block
The fault scenario: Sequential Design

MUX fault: CYINIT

Page 36

Configurable Logic Block
The fault scenario: Sequential Design

MUX fault: CYSELF/CYSELG

Page 37

Configurable Logic Block
The fault scenario: Sequential Design

Initialization: SYNC_ATTR

Page 38

Routing Switchbox
The fault scenario

The routing switchboxes provide the interconnection between all the logic resources implemented on the SRAM-based FPGA.

Page 39

Routing Switchbox
The fault scenario

The fault scenario of the Routing Switchbox is based on basic events:
  Unrouted net
  Antenna net
  Bridge net
  Short net
  Open net

Page 40

Routing Switchbox
The fault scenario

Critical cases for the TMR interconnection architecture:

Combinational TMR design
  Multiple basic events provoked by a common control bit
  Non-TMR signals routed by the PAR algorithm
Sequential TMR design
  Multiple basic events provoked by a common control bit
  Short event
  Non-TMR signals routed by the PAR algorithm

Page 41

Routing Switchbox
The fault scenario: Combinational Design & Sequential Design

(I) Multiple basic events provoked by a common control bit: OPEN-OPEN

The upset in the configuration memory provokes the OPEN of both the connections OUT1->H6W0 and H6M4->V6S4!
Please note that the two faulty signals are related only to different TMR modules in sequential circuits!

dev15335.bit of Elliptic Filter

Page 42

Routing Switchbox
The fault scenario: Combinational Design & Sequential Design

(II) Multiple basic events provoked by a common control bit: OPEN-SHORT

dev10984.bit of Elliptic Filter

Page 43

Routing Switchbox
The fault scenario: Combinational Design & Sequential Design

(III) Multiple basic events provoked by a common control bit: OPEN-BRIDGE

dev3992.bit of Adder 16

Page 44

Routing Switchbox
The fault scenario: Combinational Design & Sequential Design

(IV) Multiple basic events provoked by a common control bit: BRIDGE-BRIDGE

dev16568.bit of Elliptic Filter

Page 45

Routing Switchbox
The fault scenario: Combinational Design & Sequential Design

Non-TMR signal routed by the PAR algorithm

The upset in the configuration memory provokes a bit-flip within a MUX that controls a CONSTANT value used by different TMR modules.

Page 46

Routing Switchbox
The fault scenario: Combinational Design & Sequential Design

Page 47

TMR fault scenario classification

P. Bernardi, M. Sonza Reorda, L. Sterpone, M. Violante “Analysis of the robustness of the TMR architecture in SRAM-based FPGAs”, 22-24 Sept, RADECS 2004.

Page 48

Routing Switchbox
The fault scenario: Sequential Design

Short event

The upset in the configuration memory provokes a conflict on the HEX LINE between two different TMR modules. In this case the bad nodes are the HEX LINE nodes.
The nodes related to the hex lines are very critical within the SRAM-based FPGA.

Page 49

Routing Switchbox – Hex lines

The hex lines are part of the general-purpose interconnection provided by Xilinx devices. They route a TILE's signals to other TILEs six blocks away in each of the four directions
Hex-line signals can be accessed either at the endpoints or at the midpoint (three blocks from the source).
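The reach of the hex lines can be made concrete with a small helper that lists the tiles a source tile can drive: the midpoint three blocks away and the endpoint six blocks away, in each direction. The (row, column) coordinate convention, with rows growing southwards, is our assumption, and device edges are ignored.

```python
def hex_line_taps(row: int, col: int) -> dict[str, dict[str, tuple[int, int]]]:
    """Tiles reachable from (row, col) over hex lines, per direction.

    Each direction offers a midpoint tap (3 tiles away) and an endpoint (6 tiles away).
    """
    directions = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}
    return {
        name: {
            "midpoint": (row + 3 * d_row, col + 3 * d_col),
            "endpoint": (row + 6 * d_row, col + 6 * d_col),
        }
        for name, (d_row, d_col) in directions.items()
    }


for direction, taps in hex_line_taps(10, 10).items():
    print(direction, taps)
```

Because one hex line touches tiles up to six blocks apart, a fault on it can couple logic that the floorplan places far from each other.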

Page 50

Routing Switchbox – GRM

The connectivity of a General Routing Matrix is formed by:
  108 hex lines for each TILE
  96 bidirectional interconnections to the TILEs in each of the four directions.

Page 51

Nowadays Trends

+ Antifuse-based FPGAs have so far dominated in space applications, but the SRAM-based families offer high gate counts
- The SRAM configuration memory has a level of SEU sensitivity that cannot be ignored
- The careful application of TMR and complementary techniques could have an overhead of 4.5 – 7.5 gates and a performance reduction of about 50% (S. Habinc, Microelectronics Final Presentation Days, ESA-ESTEC, Feb. 4-5, 2004). Also reported by the GAISLER Research group.

Page 52

Nowadays Trends

[Flow diagram: VHDL – EDF TMR device, Synthesis, MAP, Place & Route, Reliable ??]

The designer has no capability to control the process result
The designer can define the project constraints in terms of
• area occupation for each hierarchy
• timing delay

Page 53

Nowadays Trends

The idea is to develop a Reliable Place And Route process (RPAR algorithm) able to perform a dependable placement of both the interconnection and logic resources.

Page 54

Nowadays Trends – RPAR algorithm

RPAR(D)
{
  for each area constraint Ai
    for each logic node LN within Ai do
    {
      find the destination logic nodes list NODE_D_LIST in Ai
      for each destination node DN in NODE_D_LIST
      {
        (NT) = connect_node_2_node(LN, DN)
        if no connection is available then
          re_place(DN)
        else
          update_avoid_node_graph(NT, Ai)
      }
    }
}
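The control flow of the pseudocode can be sketched in Python as below. Only the loop structure comes from the slide: the design is reduced to a mapping from area constraints to logic nodes, the three helpers are passed in as callables, and the avoid-node graph is modelled as a plain set of routing nodes already used in the area. All of these are assumptions made for illustration.

```python
from typing import Callable, Hashable, Iterable, Optional

Node = Hashable


def rpar(
    area_constraints: dict[str, list[Node]],
    destinations: Callable[[Node, str], Iterable[Node]],
    connect: Callable[[Node, Node, set], Optional[list]],
    re_place: Callable[[Node], None],
) -> dict[tuple, list]:
    """Route every logic node to its destinations, area by area.

    connect(ln, dn, avoid) returns the routed net as a node list, or None
    when no connection is available (hypothetical signature).
    """
    routed: dict[tuple, list] = {}
    for area, logic_nodes in area_constraints.items():
        avoid: set = set()  # assumed stand-in for the avoid-node graph of this area
        for ln in logic_nodes:
            for dn in destinations(ln, area):
                net = connect(ln, dn, avoid)
                if net is None:
                    re_place(dn)  # no route found: move the destination node
                else:
                    routed[(ln, dn)] = net
                    avoid.update(net[1:-1])  # keep later nets off these routing nodes
    return routed
```

In this sketch a re-placed node is not routed again; a fuller version would retry the connection after `re_place`.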

Page 55

Nowadays Trends – RPAR algorithm

(NT) = connect_node_2_node(LN, DN)

It supports the routing by exploiting the Versatile Place and Route algorithm (VPR)
  It performs a shortest-path connection between logic nodes
  It is controlled by different parameters that permit good flexibility
    The maximum length of the interconnection
    The maximum delay of the interconnection

Vaughn Betz and Jonathan Rose, “VPR: A New Packing, Placement and Routing Tool for FPGA research”, International Workshop on Field Programmable Logic and Applications 1997
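A shortest-path connection with a length limit can be sketched with a breadth-first search over the routing graph. This swaps in plain BFS for illustration and is not VPR's router; the `avoid` and `max_length` parameters and the example graph are hypothetical, and the delay limit mentioned on the slide is not modelled.

```python
from collections import deque
from typing import Hashable, Optional

Node = Hashable


def connect_node_2_node(
    graph: dict[Node, list[Node]],
    source: Node,
    target: Node,
    avoid: frozenset = frozenset(),
    max_length: Optional[int] = None,
) -> Optional[list[Node]]:
    """Shortest path (in hops) from source to target, skipping nodes in `avoid`.

    Returns None when no path exists within max_length hops.
    """
    parents: dict[Node, Optional[Node]] = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        if max_length is not None and dist >= max_length:
            continue  # do not extend paths beyond the length limit
        for nxt in graph.get(node, ()):
            if nxt not in parents and nxt not in avoid:
                parents[nxt] = node
                queue.append((nxt, dist + 1))
    return None


# Hypothetical routing graph, not taken from the slides.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(connect_node_2_node(graph, "a", "e"))
print(connect_node_2_node(graph, "a", "e", avoid=frozenset({"b"})))
```

Passing the nodes already used by another TMR module as `avoid` forces the new net onto different routing resources, at the cost of a possibly longer path.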

Page 56

Nowadays Trends – RPAR algorithm

1) (NT) = connect_node_2_node(LN, DN)
2) update_avoid_node_graph(NT, Ai)

[Figure: graph with nodes a, b, c, d, e, f, g, h, i]

Page 57

Nowadays Trends – RPAR algorithm

The RPAR algorithm is applied only to the range of interconnections and logic resources that were involved in faults during the TMR fault injection campaign.

Page 58

Nowadays Trends – RPAR algorithm

TMR Xilinx
Circuit   # SEUs that provoke a fault   Prog. bits
ADD8      882                           9785
ADD16     1350                          11963
MUL8      2300                          17448
FILTER    2764                          33888

RPAR
Circuit   # SEUs that provoke a fault   Prog. bits
ADD8      16                            7867
ADD16     28                            10091
MUL8      29                            18974
FILTER    2760                          33078

TMR Dedicated Floorplanning
Circuit   # SEUs that provoke a fault   Prog. bits
ADD8      25                            7855
ADD16     38                            10036
MUL8      38                            18927
FILTER    2800                          33057
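The tables are easier to compare as relative reductions with respect to the plain Xilinx TMR flow. The short script below does only that arithmetic on the fault counts reported above; the dictionary layout and function name are ours.

```python
# Number of SEUs that provoke a fault, copied from the three tables.
faults = {
    "TMR Xilinx": {"ADD8": 882, "ADD16": 1350, "MUL8": 2300, "FILTER": 2764},
    "RPAR": {"ADD8": 16, "ADD16": 28, "MUL8": 29, "FILTER": 2760},
    "TMR Dedicated Floorplanning": {"ADD8": 25, "ADD16": 38, "MUL8": 38, "FILTER": 2800},
}


def reduction_percent(circuit: str, technique: str, baseline: str = "TMR Xilinx") -> float:
    """Percentage reduction in fault-provoking SEUs relative to the baseline flow."""
    base = faults[baseline][circuit]
    return 100.0 * (base - faults[technique][circuit]) / base


for technique in ("RPAR", "TMR Dedicated Floorplanning"):
    for circuit in faults[technique]:
        print(f"{technique:28s} {circuit:6s} {reduction_percent(circuit, technique):6.1f}%")
```

A negative value means the technique produced more fault-provoking SEUs than the baseline for that circuit.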

Page 59

Nowadays Trends – Placement

Each redundant module can be partitioned into different parts
The placement is then performed keeping the hierarchy of each partition

[Figure: placement diagrams labelled PAR, showing the arrangement of modules A, B and C]

Page 60

Nowadays Trends

[Figure: two floorplans on the FPGA tile grid (rows 11 to 28, columns 28 to 42), each tile labelled with the TMR module (1, 2 or 3) or the pin (p0 to p26, vc, vco) it hosts. In the first floorplan the three modules share the same rows; in the second, module 1 occupies rows 17 to 20, module 2 rows 21 to 24 and module 3 rows 25 to 28.]

Page 61: On the reliability of SRAM-based FPGAs Luca Sterpone Luca Sterpone

Conclusions

The results obtained encourage the application and further improvement of the RPAR algorithm.

The reliability enhancement is not finished!

[Figure labels: High level strategy; Reliability; Placement strategy; RPAR algorithm; Hierarchical optimization; Dedicated floorplanning; Efficient PAR]

Page 62: On the reliability of SRAM-based FPGAs Luca Sterpone Luca Sterpone

Future works

Validation of the obtained results by radiation testing

Evaluation of the impact of these strategies on
    Power consumption
    Timing delay
    Area overhead
    Applicability to different FPGAs


Page 65: On the reliability of SRAM-based FPGAs Luca Sterpone Luca Sterpone

Spare slides

Page 66: On the reliability of SRAM-based FPGAs Luca Sterpone Luca Sterpone

Nowadays Trends – RPAR algorithm

/* pre-algorithm Mapping operations */
for each defined area constraint
    set_area constraints Ai
for each logic_node LN of the design D do
    place LN for each area constraint Ai

RPAR(D)
{
    for each area constraint Ai
        for each logic node LN within Ai do
        {
            find the destination logic nodes list NODE_D_LIST in Ai
            for each destination node DN in NODE_D_LIST
            {
                (NT) = connect_node_2_node(LN, DN)
                if no connection is available then
                {
                    re_place(DN)
                }
                else
                {
                    update_avoid_node_graph(NT, Ai)
                }
            }
        }
}
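The control flow of the pseudocode above can be sketched as a small runnable toy. It is not the author's implementation. The `Area` class, the `connect` and `re_place` callbacks, and the reading of `update_avoid_node_graph` as "record the routing resource just used so later connections in the same area avoid it" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    """One area constraint Ai with the logic nodes placed inside it."""
    name: str
    nodes: list
    avoid: set = field(default_factory=set)  # routing resources already taken


def rpar(areas, fanout, connect, re_place):
    """Toy RPAR loop; returns the destination nodes that had to be re-placed."""
    replaced = []
    for area in areas:                      # for each area constraint Ai
        for ln in list(area.nodes):         # for each logic node LN within Ai
            # NODE_D_LIST: destinations of LN that lie in the same area
            dests = [dn for dn in fanout.get(ln, []) if dn in area.nodes]
            for dn in dests:
                net = connect(ln, dn, area.avoid)  # NT, or None if unroutable
                if net is None:
                    re_place(dn, area)
                    replaced.append(dn)
                else:
                    area.avoid.add(net)     # assumed update_avoid_node_graph
    return replaced


def demo():
    """Route three connections over only two free wires (made-up resources)."""
    wires = ["w0", "w1"]

    def connect(src, dst, avoid):
        # Hypothetical router: take the first wire not yet in the avoid set.
        free = [w for w in wires if w not in avoid]
        return free[0] if free else None

    moved = []
    area = Area("A1", ["a", "b", "c"])
    fanout = {"a": ["b", "c"], "b": ["c"]}
    replaced = rpar([area], fanout, connect, lambda dn, ai: moved.append(dn))
    return replaced, sorted(area.avoid), moved
```

In `demo()` the first two connections consume both wires, so the third one (b to c) cannot be routed and node c is handed to `re_place`, mirroring the "if no connection is available" branch.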

Page 67: On the reliability of SRAM-based FPGAs Luca Sterpone Luca Sterpone

Nowadays Trends – RPAR algorithm

TMR Xilinx
Circuit / Technique    # SEU that provokes a fault    Prog. Bits
ADD8                   882                            9785
ADD16                  1350                           11963
MUL8                   2300                           17448
FILTER                 2764                           33888

RPAR
Circuit / Technique    # SEU that provokes a fault    Prog. Bits
ADD8                   16                             7867
ADD16                  28                             10091
MUL8                   29                             18974
FILTER                 2760                           33078

TMR Dedicated Floorplanning
Circuit / Technique    # SEU that provokes a fault    Prog. Bits
ADD8                   25                             7855
ADD16                  38                             10036
MUL8                   38                             18927
FILTER                 2800                           33057

The RPAR algorithm is applied only to the range of interconnections and logic involved in the faults observed during the TMR Dedicated Floorplanning fault injection campaign.