1

Fault Tolerant FPGA Co-processing Toolkit

Oral defense in partial fulfillment of the requirements for the degree of Master of Science 2006

Douglas Michael DiSabello

2

Overview

Computation in space radiation environments is slow compared to terrestrial computation

FPGAs can close and even exceed this gap for specific applications

The Fault Tolerant FPGA Co-processing Toolkit provides a general FPGA co-processing capability for space-based applications

3

Outline

The Space Computation Performance Gap

Ionizing Radiation and FPGA Background

Development Board

Fault Tolerant Co-Processing Toolkit: Support System, Fault Tolerant Support System, End User Environment

4

The Space Computation Performance Gap

Radiation Hardened Microprocessor: BAE Systems 133 MHz PowerPC, built with special and proprietary design techniques

Laptop running this presentation: 1,500 MHz PowerPC

5

Computation Gap Problems

Bare minimum of computations performed in space

Large and slow data transfers to a limited number of receiving stations

6

Field Programmable Gate Arrays

200 – 1600 fold speedups: exploits fine grain parallelism of algorithms

Speed up computations in space beyond what can even be offered by normal terrestrial computers

Fast development cycles

In flight reprogramming to adapt to changing mission requirements

7

FPGA Details

SRAM Based: configuration memory stores and implements the design; programmed using JTAG or SelectMap interfaces

Configurable Logic Blocks: Look Up Tables and supporting logic

Input/Output Blocks: used for all general I/O package pins

Block RAMs

General Routing Matrix

8

Ionizing Radiation

Low Earth Orbit contains ionizing particles trapped in the Van Allen Belts

Particles cause direct and secondary nuclear reactions in the silicon substrate

Decreasing feature sizes and shrinking threshold voltages increase the probability of these interactions causing errors in circuits

9

Ionizing Radiation Definitions

Single Event Transient (SET): temporary change in logic value

Single Event Upset (SEU): SET that is latched into a memory

Single Event Functional Interrupt (SEFI): component stops service

10

Ionizing Radiation and FPGAs

Architecture: SEUs in configuration memory change the instantiated design; usually results in a SEFI

Data: SEUs in flip-flops, latches, BRAMs, etc. cause incorrect computation results and/or SEFIs

Off Chip Communication: package pins are non-dedicated, configurable inputs/outputs; SEUs to input/output blocks can disable a package pin

11

Typical FPGA Cross Section

Configuration Memory accounts for 91% of a typical FPGA cross section

78% - 84.8% Routing Structure

20% Control bits and CLB LUT values

Michael Caffrey, Paul Graham, Eric Johnson, Michael Wirthlin, Nathan Rollins, and Carl Carmichael, “Single-Event Upsets in SRAM FPGAs,” MAPLD, Sep. 2002

12

Fault Mitigation Techniques

Scrubbing fixes architectural upsets

Continuously rewrite static portions of configuration memory

Active partial reconfiguration bitstream

SEUs corrected at a given reconfiguration rate (scrub period shorter than the expected time between upsets for the given orbit)

13

Fault Mitigation Techniques

Triple Modular Redundancy

Allows continuous service when architectural upsets occur

Majority voters determine final output

Inherent data redundancy
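As an illustration of the majority voting named on this slide, here is a minimal behavioral sketch in Python (not the toolkit's VHDL). The function name and the sample values are assumptions made for this example only.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise two-of-three majority of three redundant words."""
    return (a & b) | (a & c) | (b & c)


# One redundant copy suffers a single-bit upset; the other two outvote it.
good = 0b1011_0010
upset = good ^ 0b0000_1000
voted = majority_vote(good, upset, good)
```

Each output bit follows whichever value at least two of the three copies agree on, so a fault confined to one copy never reaches the voted output.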

14

Hardware Development Board

Design developed by the Naval Postgraduate School

Modified by the Naval Research Laboratory with a Virtex II FPGA

Designed for Configurable Fault Tolerant Computing

15

Hardware Development Board

[Block diagram labels: Embedded x86 PC, PC104/ISA Bus, Xilinx Virtex FPGA, Xilinx Virtex II FPGA, Support FPGA, Co-processing FPGA, SelectMap (two links), JTAG, FLASH, EEPROM, General I/O, "Only off-chip memory"]

16

Development Board Radiation Testing

NPS and NRL conducted tests at the Crocker Nuclear Lab, U.C. Davis

Protons were emitted from a cyclotron to interact with the FPGAs

Both the Virtex and Virtex II were irradiated

17

Development Board Radiation Testing

Results:

1 upset every 5 days for the Virtex in orbit

Order of magnitude greater for the Virtex II in orbit

Scrubbing was proven to repair configuration memory upsets

TMR was proven to allow continuous service between SEU scrubbing intervals

James C. Coudeyras, “Radiation Testing of The Configurable Fault Tolerant Processor (CFTP) for Space-Based Applications,” Thesis, United States Naval Postgraduate School, 2005
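To relate the measured upset rate to a scrubbing period, the sketch below estimates how likely upsets are within one scrub interval. It assumes upsets arrive as a Poisson process, which the slides do not state; the function names and any scrub period passed in are likewise assumptions for illustration. Only the Virtex figure of 1 upset per 5 days comes from the results above.

```python
import math

SECONDS_PER_DAY = 86_400
VIRTEX_UPSETS_PER_DAY = 1 / 5  # "1 upset for every 5 days" (Virtex, in orbit)


def expected_upsets(interval_s, rate_per_day=VIRTEX_UPSETS_PER_DAY):
    """Mean number of upsets in one scrub interval of interval_s seconds."""
    return rate_per_day * interval_s / SECONDS_PER_DAY


def p_any_upset(interval_s, rate_per_day=VIRTEX_UPSETS_PER_DAY):
    """P(at least one upset in one interval) under the Poisson assumption."""
    return -math.expm1(-expected_upsets(interval_s, rate_per_day))


def p_multiple_upsets(interval_s, rate_per_day=VIRTEX_UPSETS_PER_DAY):
    """P(two or more upsets accumulate before the next scrub pass)."""
    mean = expected_upsets(interval_s, rate_per_day)
    return p_any_upset(interval_s, rate_per_day) - mean * math.exp(-mean)
```

When the scrub period is much shorter than the mean time between upsets, the probability of two upsets accumulating in one period is roughly half the expected upset count times the probability of one. That is the case in which TMR, which tolerates a fault in one copy, can carry the design between scrub passes.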

18

The Toolkit Objective

Build a suite of VHDL designs, C++ software, and tools to give a general FPGA co-processing capability

Modular design for easy integration into new hardware platforms and with new HDL modules

Allow designers to concentrate mainly on the co-processing algorithm and design instead of fault tolerance

19

How the Objective Was Achieved

1. Create a support system to interface between all components

2. Modify the support system into a fault tolerant version

3. Provide co-processing designer templates and interfaces

20

Support System

Interpret and execute all commands from a host embedded computer

Route data between all components on the board

Program FPGAs

Software for the embedded x86 to interact with the support system instantiated in an FPGA

21

Support System

[Block diagram labels: PC104/ISA Bus Interfaces, FLASH Control Interface, FLASH Interface, Flash Arbitrator, Virtex SelectMap Interface, Virtex II SelectMap Interface, Inter-FPGA Communication Interface]

22

PC104 / ISA Bus Interface

Responsible for all data transfers on and off the FPGA board

Two addresses from the host PC are used:

Data address: 8-bit data words

Control address: a write causes the interface to reset and send a reset to any other modules; a read gives the status of the buffer FIFOs

Each main support module uses a copy of this interface
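The two-address protocol above can be sketched as a small behavioral model. This is an illustrative Python stand-in, not the toolkit's VHDL or C++ host code: the address values, FIFO depth, status bit layout, and all names are hypothetical, since the slides do not give them.

```python
from collections import deque

DATA_ADDR = 0x300     # hypothetical data address
CONTROL_ADDR = 0x301  # hypothetical control address

# Hypothetical status bits returned by a read of the control address.
IN_EMPTY, IN_FULL, OUT_EMPTY, OUT_FULL = 0x01, 0x02, 0x04, 0x08


class BusInterfaceModel:
    """Behavioral stand-in for one copy of the PC104 / ISA bus interface."""

    def __init__(self, depth=16):
        self.depth = depth
        self.input_fifo = deque()   # host -> support module
        self.output_fifo = deque()  # support module -> host
        self.module_resets = 0      # resets forwarded to the attached module

    def write(self, addr, value):
        if addr == CONTROL_ADDR:
            # Any write to the control address resets the interface
            # and passes the reset on to the attached module.
            self.input_fifo.clear()
            self.output_fifo.clear()
            self.module_resets += 1
        elif addr == DATA_ADDR and len(self.input_fifo) < self.depth:
            self.input_fifo.append(value & 0xFF)  # 8-bit data words

    def read(self, addr):
        if addr == CONTROL_ADDR:
            return self.status()
        if addr == DATA_ADDR and self.output_fifo:
            return self.output_fifo.popleft()
        return 0

    def status(self):
        flags = 0
        if not self.input_fifo:
            flags |= IN_EMPTY
        if len(self.input_fifo) >= self.depth:
            flags |= IN_FULL
        if not self.output_fifo:
            flags |= OUT_EMPTY
        if len(self.output_fifo) >= self.depth:
            flags |= OUT_FULL
        return flags


def send_bytes(bus, payload):
    """Host-side helper: write bytes until the input FIFO reports full."""
    sent = 0
    for byte in payload:
        if bus.read(CONTROL_ADDR) & IN_FULL:
            break
        bus.write(DATA_ADDR, byte)
        sent += 1
    return sent
```

The host polls the control address before each data write, so the same two addresses cover data movement, flow control, and reset.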

23

PC104 / ISA Bus Interface

[Block diagram: Bus Control Logic sits between the bus signals (ADDRESS, BUS DATA, BUS WRITE, BUS READ, AEN) and an Input FIFO and Output FIFO (Data In, Data Out), each FIFO with FULL, EMPTY, Write Enable, and Read Enable signals]

24

Flash Components

Interface: translates commands and data into a series of signals to interact with a Flash chip

Control: facilitates interaction between the Flash interface and the Bus Interface

Arbitrator: each module that requires Flash access is given a priority number; a module must relinquish control before another module can be given access
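The arbitration rule above (fixed priority numbers, no preemption) can be sketched as follows. This is a hypothetical Python model: the class and method names are invented for illustration, and treating a lower number as higher priority is an assumption the slides do not confirm.

```python
class FlashArbitrator:
    """Non-preemptive, fixed-priority arbiter for a single Flash interface."""

    def __init__(self):
        self.pending = set()  # priority numbers of modules waiting for access
        self.owner = None     # priority number of the current Flash owner

    def request(self, priority):
        """A module asks for the Flash; it waits if another module owns it."""
        if priority != self.owner:
            self.pending.add(priority)
        self._grant()

    def release(self, priority):
        """The owner relinquishes control, or a waiting module withdraws."""
        if self.owner == priority:
            self.owner = None
            self._grant()
        else:
            self.pending.discard(priority)

    def _grant(self):
        # Only hand out the Flash when nobody owns it (no preemption).
        # Assumption: the lowest priority number wins.
        if self.owner is None and self.pending:
            self.owner = min(self.pending)
            self.pending.remove(self.owner)
```

A higher-priority request arriving mid-transfer simply waits, which keeps a Flash command sequence from being interrupted partway through.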

25

SelectMap Interfaces

Control configuration data flow to the FPGAs' configuration interfaces

Configuration Clock is used to allow non-uniform data loading

Actual FPGA configuration commands are contained in the configuration data

Two versions: Virtex and Virtex II (and Virtex 4)

[Block diagram: SelectMap Interface with signals Flash Address, Flash Interface CMD, Flash Data, Flash Data Valid, Flash Interface Busy, Flash Control Request, Current Flash Owner, SelectMap Data Out <7:0>, Write, Chip Select, Configuration Clock]

26

SelectMap Interface Commands

Load Start Address

Load Stop Address

Program Using Flash

Load Bus Word Number

Program Using Bus

Scrub Using Flash

Abort (Virtex version only)

27

Inter-FPGA Communication Interface

ISA Bus interface modified to account for delays between physical FPGA chips

Co-processing FPGA has direct access to the bus and its own memory space

Designed specifically to use resources of the support FPGA

28

x86 Host PC Programs

Flash program

Flash verify

SelectMap Configuration

Scrub On and Scrub Off

Co-processing echo check program

29

Fault Tolerant Support System

Add fault tolerance to original support system

Specialized fault mitigation techniques

FPGA configuration, scrubbing, and bitstream manipulation

Support System End User Environment

30

Support System Main Fault Tolerance Methods

Triple Modular Redundancy

All modules are made in triplicate

Majority voter determines correct output

Between HDL modules three voters are used to keep redundancy of signal paths

Place and Route to keep redundant modules separate

31

Majority Voters

Tri-State Buffers

Not made from SRAM material

Only interconnects are susceptible and correct operation still results

Takes multiple SEUs for incorrect function

Carl Carmichael, “Triple Modular Redundancy Design Techniques for Virtex FPGAs,” Xilinx Application Note 197, 2001

32

FSMs and TMR

Method keeps FSM synchronized if SEU occurs in state register

Outputs are also majority voted

[Diagram labels: three Current State Registers, three Next State Logic blocks, shared INPUTS, three voters (V)]
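The resynchronization behavior described on this slide can be sketched behaviorally. In the Python model below, the three state registers are voted before feeding the three next-state logic copies; that placement of the voters follows common TMR practice and is an assumption here, as are the example 2-bit counter and all names.

```python
def vote(a, b, c):
    """Bitwise two-of-three majority."""
    return (a & b) | (a & c) | (b & c)


def next_state(state, inp):
    """Example next-state logic: a 2-bit counter that advances when inp is 1."""
    return (state + inp) & 0b11


class TmrFsm:
    """Three state registers, three voters, three next-state logic copies."""

    def __init__(self):
        self.regs = [0, 0, 0]

    def voted_state(self):
        return vote(*self.regs)

    def clock(self, inp):
        # Each next-state copy is fed by its own voter over all three
        # registers, so no single voter is shared between the copies.
        voted = [vote(*self.regs) for _ in range(3)]
        self.regs = [next_state(v, inp) for v in voted]
        return self.voted_state()


fsm = TmrFsm()
fsm.clock(1)
fsm.regs[1] ^= 0b10       # inject an SEU into one state register
masked = fsm.voted_state()  # the voters hide the upset
fsm.clock(1)              # the next clock edge overwrites the bad register
```

Because every copy computes its next state from the voted value rather than from its own register, an upset register is replaced with the correct state on the following clock edge instead of drifting away from the other two.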

33

Block RAM Fault Mitigation

TMR

Necessary for routing structure

Inherent data redundancy

Method is fine for short term data storage

SEUs are not corrected during long term data storage in individual BRAM

34

BRAM TMR w/ Refresh

All BRAMs are dual ported

Second ports are used to constantly read data values from three copies, vote, and rewrite the values

Data write collision avoidance

Not needed for support system, but useful for co-processing applications

Carl Carmichael, “Triple Modular Redundancy Design Techniques for Virtex FPGAs,” Xilinx Application Note 197, 2001
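The read, vote, and rewrite loop on the second BRAM ports can be sketched as below. This Python model uses plain lists in place of the three BRAM copies; the function names are assumptions, and the write collision avoidance mentioned on the slide is not modeled.

```python
def vote(a, b, c):
    """Bitwise two-of-three majority."""
    return (a & b) | (a & c) | (b & c)


def refresh_word(copies, addr):
    """Read one address from all three copies, vote, and rewrite the result."""
    voted = vote(copies[0][addr], copies[1][addr], copies[2][addr])
    for mem in copies:
        mem[addr] = voted
    return voted


def refresh_sweep(copies):
    """One full pass of the refresh port over every address."""
    for addr in range(len(copies[0])):
        refresh_word(copies, addr)


# Three redundant copies of a 4-word memory, with one upset in copy 2.
copies = [[0xAB] * 4 for _ in range(3)]
copies[2][1] ^= 0x10
refresh_sweep(copies)
```

Without the sweep, a later upset of the same bit in a second copy would leave two of three copies wrong and the vote would return the corrupted value; rewriting the voted word keeps single upsets from accumulating during long-term storage.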

35

Specialized Techniques

Off FPGA transfers

TMR of package pins to a single trace

TMR of package pins not available on development board

Carl Carmichael, “Triple Modular Redundancy Design Techniques for Virtex FPGAs,” Xilinx Application Note 197, 2001

36

Specialized Techniques

Inter-FPGA transfers

Not enough pins available to triplicate all signals

8 data bit and 4 redundant bit Hamming code used for data

Double Error Detection and Single Error Correction

Triplication of all other signals

37

Specialized Techniques

PC104 / ISA Bus transfers

Updated fault tolerant module allows for extended data transfer sizes

All data sent is encoded in (8,4) Hamming code

No redundancy available for other signals
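For readers unfamiliar with the (8,4) code named above, here is a minimal Python sketch of an extended Hamming code: 4 data bits, 3 Hamming parity bits, and 1 overall parity bit, giving single error correction and double error detection. The bit ordering, function names, and status strings are assumptions for illustration; the slides do not give the toolkit's actual bit layout.

```python
def encode_nibble(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the LSB
    bits = [0] * 8  # bits[1..7]: Hamming positions, bits[0]: overall parity
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1            # makes total parity even
    return sum(b << i for i, b in enumerate(bits))


def decode_nibble(code):
    """Return (nibble, status); nibble is None for an uncorrectable error."""
    bits = [(code >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    parity_even = (sum(bits) & 1) == 0
    if syndrome and parity_even:
        return None, "double error detected"
    status = "ok"
    if not parity_even:
        # Single error: the syndrome is the flipped position
        # (0 means the overall parity bit itself was hit).
        bits[syndrome] ^= 1
        status = "corrected"
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, status
```

A single flipped bit makes the overall parity odd and the syndrome points at it; two flipped bits leave the parity even but the syndrome nonzero, which is how the decoder tells a correctable error from one it can only report.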

38

Flash Memory Data Reads

CRC values embedded every 512 16-bit flash words by the Flash program

Fault tolerant Flash interface uses a BRAM buffer cache to hold each 512-word block of data when data is requested

If the CRC value is incorrect, the data cache is flushed and the data is read again
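The read, check, and re-read behavior above can be sketched in Python. Several details are assumptions not given on the slide: the CRC polynomial (CRC-CCITT via the standard library's `binascii.crc_hqx`), the CRC word being stored directly after its 512-word block, the retry limit, and all function names.

```python
import binascii

BLOCK_WORDS = 512  # 16-bit data words per CRC-protected block (from the slide)


def crc16(words):
    """CRC-CCITT over the block's 16-bit words (polynomial is an assumption)."""
    data = b"".join(w.to_bytes(2, "big") for w in words)
    return binascii.crc_hqx(data, 0xFFFF)


def add_crc(words):
    """What the Flash program would store: the block followed by its CRC word."""
    return list(words) + [crc16(words)]


def read_block(read_word, base, max_retries=3):
    """Fill the cache from Flash; on a CRC mismatch, flush it and read again."""
    for _ in range(max_retries):
        cache = [read_word(base + i) for i in range(BLOCK_WORDS + 1)]
        payload, stored_crc = cache[:-1], cache[-1]
        if crc16(payload) == stored_crc:
            return payload
        # Mismatch: the cache contents are discarded and the block is re-read.
    raise IOError("CRC mismatch persisted after retries")
```

Here `read_word` stands for whatever function fetches one 16-bit word from the Flash at a given address. Re-reading helps when the corruption happened in transit or in the cache rather than in the Flash contents themselves.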

39

Configuration and Scrubbing

Three modes of configuration determined by the bit file created using the Xilinx Bitgen tool

Initial Configuration: contains startup commands

Reconfiguration: contains shutdown and startup commands; doesn’t require a power cycle, but the FPGA is taken out of service

Active Partial Reconfiguration: used for scrubbing; rewrites static portions of the bitstream; removes initial BRAM contents; any portion of the design could be masked out

40

Support System End User Environment

Flash program and verify host programs: store initial co-processing, co-processing scrubbing, and support scrubbing bitstreams at designated flash addresses

Scrub On and Off: SelectMap interfaces begin scrubbing at given periods using flash data; configuration can also still be done across the bus

C++ example programs and header files for data transfers to the co-processing FPGA

41

Toolkit User Environment

Fault Tolerant Support System: controls all data flow; modular design for quick integration into different physical systems with varying numbers of FPGAs; C++ programs and header files

Co-Processing Tools: templates for HDL co-processing components; HDL “hooks” for data transfers; C++ programs and header files

42

Future Toolkit Additions

Different bus modules for interaction with radiation hardened microprocessors

Co-processing library

Use support system on single and multiple FPGA boards