A Combinatorial Group Testing Method for FPGA Fault Location

Ronald F. DeMara, Carthik A. Sharma
University of Central Florida



Introduction

• Field Programmable Gate Arrays
  - Gate-array-based reconfigurable architecture
  - Matrix of logic cells (look-up tables) surrounded by peripheral I/O cells
  - Capabilities: runtime reconfiguration; on-chip processor core and millions of gate-equivalent logic elements
• Millions of FPGA devices are produced annually, most of them SRAM-based
• Used in mission-critical applications
  - Remote systems and hazardous environments
  - Space applications: satellites, probes, and shuttles

Group Testing Algorithms

• Origin: World War II blood testing
  - Problem: test samples from millions of new recruits
  - Solution: test pooled blocks of samples before testing individual samples
• Problem definition
  - Identify the subset Q of defectives from the set P
  - Minimize the number of tests
  - Test v-subsets of P; form suitable blocks
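To make the problem definition concrete, here is a minimal sketch of adaptive group testing by binary splitting for the simplest case: exactly one defective in P. It assumes an error-free pooled test, modeled by a hypothetical oracle `pool_has_defective`; it illustrates the halving idea in general, not the specific method of these slides.

```python
import math
from typing import Callable, Sequence, Tuple


def find_single_defective(
    items: Sequence[int], pool_has_defective: Callable[[Sequence[int]], bool]
) -> Tuple[int, int]:
    """Binary splitting, assuming exactly one defective item in `items`.

    `pool_has_defective` is a hypothetical, error-free group test that returns
    True when the pooled block contains the defective.
    Returns (defective item, number of pooled tests used).
    """
    candidates = list(items)
    tests = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        tests += 1
        if pool_has_defective(half):
            candidates = half                                # defective is in the tested block
        else:
            candidates = candidates[len(candidates) // 2 :]  # defective is in the remainder
    return candidates[0], tests


P = list(range(1000))
defective = 737
found, used = find_single_defective(P, lambda block: defective in block)
print(found, used, math.ceil(math.log2(len(P))))
```

Because the surviving candidate set is at most ceil(|P|/2) after each pooled test, 1,000 items need at most ceil(log2 1000) = 10 tests instead of up to 999 individual ones.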

Previous Work

• Pre-compiled column-based dual FPGA architecture [Mitra04]
  - Autonomous detection; repair by shifting pre-compiled columns
  - Isolation using distributed CED checkers and "blind" reconfiguration attempts
• Overview of combinatorial group testing and applications [Du00]
  - Provides a taxonomy and general algorithms for applying CGT
  - Examples of CGT applications: DNA clone library filtering, vaccine screening, computer fault diagnosis, etc.
• CGT-enhanced circuit diagnosis [Kahng04]
  - Presents doubling, halving, etc. for circuit fault diagnosis using BIST
  - CGT requires the ability to test resources individually
• Chinese Remainder Sieve technique [Eppstein05]
  - Efficient non-adaptive and two-stage CGT based on prime-number-driven test formation
  - Improved algorithms for practical problem sizes (n < 10^80) with a small number of defectives (d < 4)
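The prime-number-driven test formation behind the Chinese Remainder Sieve can be illustrated in its simplest, single-defective form (d = 1). This is a minimal sketch under our own assumptions: distinct primes whose product is at least n, and error-free pooled tests. [Eppstein05] extends the idea to several defectives and to two-stage variants, which this code does not attempt; `sieve_pools`, `crt`, and `decode_single` are illustrative names.

```python
from math import prod
from typing import List, Sequence, Set, Tuple

Pool = Tuple[int, int, Set[int]]  # (prime p, residue r, items i with i % p == r)


def sieve_pools(n: int, primes: Sequence[int]) -> List[Pool]:
    """Non-adaptive design: one pool per (prime, residue) pair, all issued up front."""
    if prod(primes) < n:
        raise ValueError("product of the primes must be at least n")
    return [(p, r, {i for i in range(n) if i % p == r}) for p in primes for r in range(p)]


def crt(residues: Sequence[int], moduli: Sequence[int]) -> int:
    """Chinese Remainder Theorem for pairwise-coprime moduli."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the modular inverse (Python 3.8+)
    return x % M


def decode_single(pools: Sequence[Pool], outcomes: Sequence[bool], primes: Sequence[int]) -> int:
    """With exactly one defective, each prime has exactly one positive pool,
    whose residue is the defective's index modulo that prime."""
    residue_of = {p: r for (p, r, _), positive in zip(pools, outcomes) if positive}
    return crt([residue_of[p] for p in primes], primes)


n, primes = 1000, [7, 11, 13]   # 7 * 11 * 13 = 1001 >= n
pools = sieve_pools(n, primes)  # 7 + 11 + 13 = 31 pooled tests
defective = 737
outcomes = [defective in items for (_, _, items) in pools]
print(len(pools), decode_single(pools, outcomes, primes))
```

All 31 tests are fixed before any result is known, which is what makes the design non-adaptive; the price is more tests than adaptive binary splitting needs for the same n.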

Device Failure

[Table: Fault-Handling Techniques, compared by duration, target, detection, isolation, diagnosis, and recovery]

• Duration: transient (SEU); permanent (SEL, oxide breakdown, electron migration, LPD)
• Target: device configuration; processing datapath
• Approaches and methods compared: TMR, BIST, STARS, CED, and the CGT-based Dueling approach
• Detection, isolation, diagnosis, and recovery entries include: repetitive readback, bitwise comparison, invert bit value, ignore discrepancy, majority vote, supplementary testbench, Cartesian intersection, worst-case clock period dilation, replicate in spare resource, duplex output comparison, fast run-time location, select spare resource, "unnecessary", repetitive intersections, and evolutionary algorithm using intrinsic fitness evaluation

Isolation Problem Outline

• Objectives
  - Locate the faulty logic and/or interconnect resource; a single stuck-at fault model is assumed
  - Online fault isolation: the device is not entirely removed from service
• Features
  - Runtime reconfiguration: FPGA resources are configured dynamically
  - Utilize runtime inputs: avoid special test vectors, improve availability
• Constraints
  - Use pre-designed configurations, defined by the target application
  - Subsets under test have a constant resource-utilization range for a given isolation problem
  - Resource grouping influences fault articulation: resource mapping and input vectors might mask hardware faults
  - Do not use specialized "block designs"
  - Runtime reconfiguration is limited to column swapping
  - "Non-reasonable" algorithm: "tests" may be repeated without gaining new isolation information

Fault Location Using Dueling

The set of all competing configurations is represented by $S$.

Set $C_k$ represents the resources utilized by configuration $k$.

Each competing configuration $k$, $1 \le k \le |S|$, has a unique binary Usage Matrix $U_k$.

$U_k$ has elements $U_k[i,j]$, $1 \le i \le m$, $1 \le j \le n$, where $m$ and $n$ are the numbers of rows and columns in the device layout, respectively.

$U_k[i,j] = 1$ denotes the usage of resource $(i, j)$ by $C_k$.

The History Matrix $H$, with elements $H[i,j]$, $1 \le i \le m$, $1 \le j \le n$, is an integer matrix used to represent the relative fitness of individual resources.

$H[i,j]$ provides instantaneous relative fitness values of resources.

Dueling Example

H[i,j] at t = 0: all elements are 0 (10 × 10 layout)

U1 (usage matrix of C1):
0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 1 0 0
0 0 0 1 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

U2 (usage matrix of C2):
0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 1 0 0 0
0 0 1 1 0 0 1 0 0 0
0 0 1 0 1 0 0 0 0 0
0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

H[i,j] at t = 2:
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 2 1 0 0 1 0 0 0
0 0 1 0 1 1 0 1 0 0
0 0 1 1 0 1 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

• H[i,j] changes after C1 and C2 are loaded
• U1 and U2 are the corresponding usage matrices
• (3,3) is identified as the faulty resource
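The example can be reproduced directly. The sketch assumes, as the H values at t = 2 imply, that both C1 and C2 showed a discrepancy, so every resource they use is incremented once; coordinates are the slide's 1-based (row, column) indices, and `usage_matrix` is an illustrative helper.

```python
N = 10  # 10 x 10 device layout used in the example

# 1-based (row, column) coordinates of the 1-entries in U1 and U2.
U1_USED = [(2, 5), (3, 3), (4, 6), (4, 8), (5, 4), (6, 3), (6, 6), (6, 7), (7, 5), (8, 3), (8, 8)]
U2_USED = [(2, 4), (2, 6), (2, 7), (3, 3), (3, 4), (3, 7), (4, 3), (4, 5), (5, 3), (5, 6)]


def usage_matrix(used, size=N):
    """Binary usage matrix: U[i][j] = 1 when the configuration uses resource (i, j)."""
    U = [[0] * size for _ in range(size)]
    for i, j in used:
        U[i - 1][j - 1] = 1
    return U


H = [[0] * N for _ in range(N)]  # H at t = 0: all elements 0
for U in (usage_matrix(U1_USED), usage_matrix(U2_USED)):
    # Assumption: both loads were discrepant, so their used resources are incremented.
    for i in range(N):
        for j in range(N):
            H[i][j] += U[i][j]

top = max(max(row) for row in H)
suspects = [(i + 1, j + 1) for i in range(N) for j in range(N) if H[i][j] == top]
print(top, suspects)  # 2 [(3, 3)]
```

Resource (3,3) is the only one used by both discrepant configurations, so it is the unique maximum of H.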

Dueling Flowchart with Modified Halving

• Initiate the H matrix: initially all H[i,j] = 0
• Select and load competing configurations (the selection process can be adaptive)
• Discrepancy? Yes: increment the corresponding H matrix elements. No: decrement the corresponding H matrix elements (fitness augmentation can be non-linear)
• Unique max in H? Yes: return the indices of the faulty element. No: check for stasis
• Stasis after n iterations? Yes: swap 50% of the suspect columns (modified halving; columns can be swapped with any other columns). No: select and load the next competing configurations
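The flowchart steps can be sketched as a small simulation. Assumptions (ours, not the slides'): configurations are random sets of grid cells, selection is uniformly random rather than adaptive, the fault articulates on every load (a discrepancy occurs exactly when the loaded configuration uses the faulty cell), fitness augmentation is linear (+1/-1), and "stasis" means the number of tied maxima has not shrunk for `stasis_limit` iterations. `dueling_isolate` and `swap_columns` are hypothetical helpers.

```python
import random
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]  # 0-based (row, column)


def swap_columns(config: Set[Cell], a: int, b: int) -> Set[Cell]:
    """Exchange physical columns a and b in one configuration's resource set."""
    return {(i, b if j == a else a if j == b else j) for (i, j) in config}


def dueling_isolate(rows: int, cols: int, fault: Cell, population: List[Set[Cell]],
                    rng: random.Random, stasis_limit: int = 5,
                    max_iters: int = 1000) -> Optional[Tuple[Cell, int]]:
    """Return (isolated cell, iterations used), or None if max_iters is reached."""
    H = [[0] * cols for _ in range(rows)]           # initially all H[i][j] = 0
    population = [set(c) for c in population]
    best_count, stalled = rows * cols, 0
    for it in range(1, max_iters + 1):
        cfg = rng.choice(population)                # select & load (random selection here)
        delta = 1 if fault in cfg else -1           # discrepancy -> +1, otherwise -1
        for i, j in cfg:
            H[i][j] += delta
        top = max(max(row) for row in H)
        suspects = [(i, j) for i in range(rows) for j in range(cols) if H[i][j] == top]
        if len(suspects) == 1:                      # unique max in H
            return suspects[0], it
        if len(suspects) < best_count:              # progress: reset the stasis counter
            best_count, stalled = len(suspects), 0
        else:
            stalled += 1
        if stalled >= stasis_limit:                 # stasis: modified halving
            suspect_cols = sorted({j for _, j in suspects})
            others = [j for j in range(cols) if j not in suspect_cols]
            if others:
                k = max(1, len(suspect_cols) // 2)  # about 50% of the suspect columns
                for a in rng.sample(suspect_cols, k):
                    b = rng.choice(others)
                    population = [swap_columns(c, a, b) for c in population]
            stalled = 0
    return None


rng = random.Random(1)
rows = cols = 8
cells = [(i, j) for i in range(rows) for j in range(cols)]
population = [set(rng.sample(cells, len(cells) // 2)) for _ in range(20)]  # 50% utilization
print(dueling_isolate(rows, cols, (3, 3), population, rng))
```

Because the faulty cell is incremented on every discrepant load and never decremented, its H value is always a maximum under these assumptions, so a unique maximum can only be the fault. Halving changes which physical columns the configurations occupy, which can separate cells that the fixed population always uses together.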

FPGA Arrangement for Dueling

Configurations in population:
• $C = C_L \cup C_R$
• $C_L$ = subset of left-half configurations; $C_R$ = subset of right-half configurations
• $|C_L| = |C_R| = |C|/2$

[Figure: block diagram of an SRAM-based FPGA holding an L half-configuration and an R half-configuration. Labels: Reconfiguration Algorithm, Function Logic L, Function Logic R, Discrepancy Check L, Discrepancy Check R, Configuration Bit Stream, Input Data, Data Output, Feedback, Control, Off-Chip EEPROM. Note: a non-volatile memory is already required to boot any SRAM FPGA from cold start ... this is not an additional chip.]

Isolation Progress without Halving

[Figure: number of suspected faulty elements (log scale, 100 to 10,000) vs. number of iterations (0 to 30), without halving]

• Initially |S| = 20,000
• Resource utilization = 40%
• The number of suspected faulty elements stays constant at 36 after 23 iterations
• No subsequent improvement, due to a lack of differentiating information between competing configurations

Temporary stasis in isolation due to insufficient design diversity

Dueling with Modified Halving

[Figure: number of suspected faulty elements (log scale, 100 to 10,000) vs. number of iterations (0 to 25), dueling with halving]

• Halving works by swapping half of the used columns with unused ones
• Halving progressively reduces the size of the set of suspected faulty elements
• Isolation proceeds until a single faulty element is isolated
• Fault isolated after 19 iterations

Symptoms of stasis invoke the halving procedure for fast isolation

Effect of Total Number of Elements

Increased problem size:
• Number of elements = number of rows × number of columns
• As the size of the array containing the fault increases, the required number of iterations grows only slightly
• For 1 million elements, only 27.4 iterations are required

[Figure: average number of iterations for fault isolation (0 to 30) vs. number of rows and columns in the device (0 to 1,100); population size = 40, resource utilization = 50%]

Effect of Population Size

• A single fault in S is assumed
• As population size increases, isolation is expected to be faster
• An increased population size implies more initial designs
• A population size of 30 seems to be an ideal tradeoff between ease of isolation and the difficulty of generating more individuals

[Figure: average number of iterations for fault isolation (10 to 28) vs. population size (0 to 100); resource utilization = 50%, number of resources = 40,000]

Increasing the population size provides minimal added benefit

Effect of Resource Utilization

• Moderate resource utilization is ideal for isolation
• The rate of isolation progress is low at extreme utilization levels
• Isolation takes longer when less than 20% or more than 80% of the available resources are utilized

[Figure: average number of iterations for fault isolation (15 to 45) vs. resource utilization (10% to 90%), for population size = 40 and population size = 20; number of resources = 40,000]

Future Work

• Conducting tests using benchmark circuits
  - ISCAS89 s38584 with 11,448 gates: sequential logic
  - ISCAS85 circuits with at most 3,513 gates: combinational logic
  - Compression/signal processing algorithms, such as the Lempel-Ziv (LZ) compression scheme [Mitra04]
• Development of an architecture to enable column swapping
  - Multi-layer Runtime Reconfigurable Architecture (MRRA) being prototyped

Backup Slides

• On the following pages …

Online Dueling Evaluation

• Objective
  - Isolate faults by successive intersection between sets of FPGA resources used by configurations
  - Analyze the complexity of the isolation process
• Variables
  - Total resources available: measured in number of LUTs
  - Number of competing configurations: number of initial "seed" designs in the CRR process
  - Degree of articulation: some inputs may not manifest faults, even if the faulty resource is used by an individual
  - Resource utilization factor: percentage of FPGA resources required by the target application/design
  - Number of iterations for isolation: a measure of the complexity and time involved in isolating the fault

Discrepancy Mirror Circuit

Fault coverage:

Component         | Fault in Output A | Fault in Output B | Fault in XNOR_A     | Fault in XNOR_B     | Fault-Free
Function Output A | Fault             | Correct           | Correct             | Correct             | Correct
Function Output B | Correct           | Fault             | Correct             | Correct             | Correct
XNOR_A            | Disagree (0)      | Disagree (0)      | Fault: Disagree (0) | Agree (1)           | Agree (1)
XNOR_B            | Disagree (0)      | Disagree (0)      | Agree (1)           | Fault: Disagree (0) | Agree (1)
Buffer_A          | 0                 | 0                 | High-Z              | 0                   | 1
Buffer_B          | 0                 | 0                 | 0                   | High-Z              | 1
Match Output      | 0                 | 0                 | 0                   | 0                   | 1

Influence of LUT Utilization

[Figures: perpetually articulating inputs with equiprobable distribution; intermittently articulating inputs with equiprobable distribution]

• The expected number of pairings grows sub-linearly in the number of resources
• Utilization below 20% or above 80% implicates (or exonerates) a smaller subset of resources per pairing
• At 50% utilization, the expected numbers of pairings for 1,000, 10,000, and 100,000 resources are 11.1, 14.9, and 17.6, respectively
• At 90% utilization, a mean of 258 pairings is required to isolate the faulty resource
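The utilization trend can be explored with a small Monte Carlo model. This is our simplified sketch, not the simulator behind the reported figures: each pairing uses a fresh uniformly random set of resources, the fault articulates whenever it is used, and the suspect set is updated by intersection (discrepancy) or exoneration (agreement). Its averages are therefore not expected to reproduce 11.1, 14.9, 17.6, or 258 exactly; `pairings_to_isolate` is a hypothetical name.

```python
import random
import statistics


def pairings_to_isolate(n: int, util: float, rng: random.Random) -> int:
    """Count pairings until only the faulty resource remains suspect.

    Assumed model: each pairing uses round(util * n) uniformly random resources,
    and a discrepancy appears exactly when the faulty resource is among them.
    """
    k = round(util * n)
    if not 0 < k < n:
        raise ValueError("utilization must leave some resources used and some unused")
    fault = rng.randrange(n)
    suspects = set(range(n))
    pairings = 0
    while len(suspects) > 1:
        used = set(rng.sample(range(n), k))
        pairings += 1
        if fault in used:
            suspects &= used  # discrepancy implicates only the used resources
        else:
            suspects -= used  # agreement exonerates every used resource
    return pairings


rng = random.Random(0)
for util in (0.1, 0.5, 0.9):
    mean = statistics.mean(pairings_to_isolate(1000, util, rng) for _ in range(50))
    print(f"utilization {util:.0%}: {mean:.1f} pairings on average")
```

At 50% utilization each pairing splits the suspects roughly in half whatever the outcome; at 10% or 90% the likely outcome removes only a small fraction, which is why the averages rise at extreme utilization in this model too.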

Accommodating Multi-bit Word Widths

• Proof of concept
  - The present circuit works efficiently
  - Demonstrates the important Dueling-enabled isolation method
• Strategies
  - Use an array of detectors: attempt to minimize points of failure as the word width increases; the number of logic resources used is acceptable for smaller circuits
  - Create a new circuit or scheme combining fault-tolerant coding-based methods with a single-fault-secure circuit
  - Current research is focused on improving the detector by investigating codes and fault-secure circuits

Pull-down Resistor Considerations

• Proof of concept
  - The present circuit works in a verifiably correct manner
  - Can utilize synthesized (digital) pull-down resistors, which simulate the behavior of analog resistors
  - Demonstrates the Dueling-enabled isolation method
  - Can be utilized without implementation problems for custom-VLSI designs
• Alternative approach
  - Alternate detector circuits for FPGA implementation are under investigation
  - Avoid using tri-state buffers and pull-down resistors; use native digital components available on FPGAs

Competitive Runtime Reconfiguration (CRR)

Conceptual innovation: novel fitness assessment via pairwise discrepancy, without any preconceived oracle for correctness (emergent behavior).

Evolutionary computation strategies are effective for more than just the repair phase: they continually detect, rank, and isolate faults entirely within the underlying data throughput flow.

• No test vectors
• Diverse alternatives working a priori
• Fault detection by robust consensus over time
• Device remains online during repair
• No reconfiguration when fault-free
• Fault isolation is model-free and self-calibrating
• "Completely repaired" criteria can be ignored
• Performance readily adjustable
• Graceful degradation via ranking of alternatives

[Figure: SRAM-based FPGA block diagram with L and R half-configurations, as on the FPGA Arrangement for Dueling slide. Annotations: checking logic is part of the individual and hence also competes for correctness; failures in population memory are covered.]

[Flowchart: CRR primary loop of Initialization, Selection of L and R, Detection, a discrepancy-free (L = R) check, Fitness Adjustment, a repair-threshold check, genetic operators, and Adjust Controls]

Configuration Health States

[Figure: state transitions during the lifetime of the i-th half-configuration. States: primordial, pristine, suspect, under repair, refurbished (partial repair, complete repair). Transitions 1 to 11 are labeled with the L = R or L ≠ R outcome and with comparisons of the fitness $f_i$ against the thresholds $f_{OT}$ and $f_{RT}$ (the repair threshold), for example $f_i < f_{RT}$ and $f_i \ge f_{OT}$. The diagram is divided into a competition phase and an evolution phase.]

Discrepancy Operator

• The baseline discrepancy operator is a dyadic operator with binary output:

$$C_i^L \odot C_i^R = \begin{cases} 0 & \text{if } Z(C_i^L) = Z(C_i^R) \\ 1 & \text{otherwise} \end{cases}$$

• $Z(C_i)$ is the FPGA data throughput output of configuration $C_i$
• RS (Hamming distance): $\sum_j C^L_{i,j} \oplus C^R_{i,j}$
• WTA (equivalence): $\bigvee_j C^L_{i,j} \oplus C^R_{i,j}$
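The two operator variants can be sketched on integer output words. Assumptions: $j$ indexes output bits, each output $Z$ is an unsigned integer of a given width, and a discrepancy value is accumulated over an evaluation window of functional inputs. The 3x3 multiplier mirrors the W = 64 input space used in the experiments, but the fault (output bits 0 and 1 stuck at 1) and all function names are hypothetical.

```python
from typing import Callable, Iterable

Operator = Callable[[int, int, int], int]


def hamming(z_left: int, z_right: int, width: int) -> int:
    """RS-style score: number of differing output bits (Hamming distance)."""
    return bin((z_left ^ z_right) & ((1 << width) - 1)).count("1")


def equivalence(z_left: int, z_right: int, width: int) -> int:
    """Baseline/WTA-style score: 0 if the outputs are equal, 1 otherwise."""
    return 0 if hamming(z_left, z_right, width) == 0 else 1


def discrepancy_value(left: Callable[[int], int], right: Callable[[int], int],
                      inputs: Iterable[int], width: int, op: Operator) -> int:
    """Accumulate the operator over an evaluation window of functional inputs."""
    return sum(op(left(x), right(x), width) for x in inputs)


def mult3x3(x: int) -> int:
    """6-bit input: the high 3 bits and low 3 bits are the two operands."""
    return (x >> 3) * (x & 0b111)


def mult3x3_faulty(x: int) -> int:
    """Hypothetical fault: output bits 0 and 1 stuck at 1."""
    return mult3x3(x) | 0b11


window = range(64)  # every input of the W = 64 input space once
print(discrepancy_value(mult3x3, mult3x3_faulty, window, 6, hamming),
      discrepancy_value(mult3x3, mult3x3_faulty, window, 6, equivalence))
```

Over the full input space the Hamming variant scores 88 (48 even products plus 40 products with bit 1 clear), while the equivalence variant scores 56, since only the 8 products congruent to 3 mod 4 agree. The Hamming variant therefore grades how wrong an output is, while the equivalence variant only records whether it is wrong.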

Procedural Flow under Consensus-Based Evaluation

[Flowchart: the CRR primary loop. Initialization: population partitioned into functionally-identical yet physically-distinct half-configurations. Selection: choose FPGA configuration(s) labeled L and R. Detection: apply functional inputs to compute FPGA outputs using L, R. Decision: are the L, R results discrepancy-free (L = R)? Fitness Adjustment: update the fitness of only L and R based on detection results. Decision: is either L's or R's fitness < Repair Threshold? If so, invoke genetic operators only once and only on L or R. Adjust Controls: detection mode, overlap interval, ...]

• Initialization: partition P into sub-populations of size |P|/2 to designate physical FPGA left-half or right-half resource utilization
• Consensus-based evaluation: discrepancy operator applied to $C_L$ and $C_R$; four fitness states: pristine, suspect, under repair, refurbished
• Regeneration: genetic operators recover based on the reintroduction rate; operators are applied only once, then offspring are returned to "service" without concern about increasing fitness

GA Parameters & Experiments

• Speciation
  - Two-point crossover between individuals from the same sub-group
  - Crossover points chosen to prevent intra-CLB crossover
  - Breeding occurs exclusively among members of sub-populations
  - Maintains non-interfering resource use among L, R
• GA operators: External-Module-Crossover, Internal-Module-Crossover, Internal-Module-Mutation
• GA parameters
  - Population size: 20 individuals
  - Crossover rate: 5%
  - Mutation rate: up to 80% per bit
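The speciation rule "crossover points chosen to prevent intra-CLB crossover" can be sketched as a two-point crossover whose cut points fall only on CLB boundaries. This assumes a hypothetical genome layout in which each CLB occupies `clb_bits` consecutive bits and both parents come from the same sub-population (L or R); the real bitstream layout and the 5% crossover-rate gating are not modeled.

```python
import random
from typing import List, Tuple

Genome = List[int]


def clb_two_point_crossover(parent_a: Genome, parent_b: Genome, clb_bits: int,
                            rng: random.Random) -> Tuple[Genome, Genome]:
    """Two-point crossover with cut points restricted to CLB boundaries,
    so no CLB's configuration bits are ever split between parents."""
    if len(parent_a) != len(parent_b) or len(parent_a) % clb_bits:
        raise ValueError("parents must be equal length and a whole number of CLBs")
    n_clbs = len(parent_a) // clb_bits
    if n_clbs < 3:
        raise ValueError("need at least 3 CLBs for two interior cut points")
    c1, c2 = sorted(rng.sample(range(1, n_clbs), 2))  # interior CLB boundaries
    lo, hi = c1 * clb_bits, c2 * clb_bits
    child_a = parent_a[:lo] + parent_b[lo:hi] + parent_a[hi:]
    child_b = parent_b[:lo] + parent_a[lo:hi] + parent_b[hi:]
    return child_a, child_b


# 10 CLBs of 4 bits each; all-0 and all-1 parents make the swapped CLBs visible.
child_a, child_b = clb_two_point_crossover([0] * 40, [1] * 40, 4, random.Random(42))
print([child_a[k:k + 4] for k in range(0, 40, 4)])
```

Every 4-bit chunk of a child comes whole from one parent, which is the property that keeps a CLB's internal configuration intact across breeding.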

• Fault isolation characteristics; regenerative experiments
• Demonstrate …
  - Objective fitness function replaced by the consensus-based evaluation approach and relative fitness
  - Elimination of additional test vectors
• Experiments …

Impact of Fault on Viable Individuals

• Existence of a positive test vector
  - Input $I_p$ comprises a positive test vector iff the discrepancy operator applied to $C_v(I_p)$ and $C_f(I_p)$ yields 1, where $C_v$ denotes a viable configuration and $C_f$ denotes a faulty configuration
  - So if a discrepancy is visible, then some $I_p$ exists that manifests the fault
• Minimal case when $I_p$ is unique
  - $I_p$ is unique if the fault is observable under exactly one test vector
• Probability mass function for encountering $I_p$ in the minimal case
  - Consider $E_w = 600$, yielding 99.5% coverage for a module with input space $W = 64$
  - The number of input occurrences $i$, $0 \le i \le 600$, that randomly encounter $I_p$ and identify the fault is governed by the probability mass function

$$\text{p.m.f.}(i) = \binom{D}{i}\left(\frac{n}{W}\right)^{i}\left(\frac{W-n}{W}\right)^{D-i}$$

where $D$ is the length of $E_w$, $n$ is the number of fault-exposing inputs, and here $D = 600$, $W = 64$, $n = 1$.
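Assuming the D inputs in the window are drawn independently and uniformly from the W-element input space, the p.m.f. above is binomial with success probability n/W. The sketch evaluates it for D = 600 and W = 64 and computes its mean, D·n/W; `pmf` is our helper name.

```python
from math import comb


def pmf(i: int, D: int = 600, W: int = 64, n: int = 1) -> float:
    """Binomial p.m.f.: probability that exactly i of D random inputs hit one of
    the n fault-exposing inputs in an input space of size W."""
    p = n / W
    return comb(D, i) * p**i * (1 - p) ** (D - i)


D = 600
for n in (1, 10):  # 1-out-of-64 and 10-out-of-64 fault impact
    mean = sum(i * pmf(i, D=D, n=n) for i in range(D + 1))
    print(n, round(mean, 3), pmf(0, D=D, n=n))
```

The means are 9.375 for n = 1 and 93.75 for n = 10, the expected discrepancy values quoted for 1-out-of-64 and 10-out-of-64 fault impact; pmf(0) is the chance that a window never exercises the fault at all.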

Isolation of a Single Faulty Individual with 1-out-of-64 Impact

• Outliers are identified after $E_W$ iterations have elapsed
• Expected discrepancy value (DV) = (1/64) × 600 = 9.375 for an individual impacted by the fault
• The isolated individual's DV differs from the average DV by 33 after one or more observation intervals of length $E_W$

Isolation of a Single Faulty L Individual with 10-out-of-64 Impact

• Compared with 1-out-of-64 fault impact:
  - Expected DV of (10/64) × 600 = 93.75 for the faulty configuration
  - One isolation will be complete approximately once every 93.75/5 ≈ 19 sliding windows
  - Fault isolation achieved is 100%

Isolation of 8 Faulty Individuals (L4 & R4) with 1-out-of-64 Impact

• Expected isolations do not occur approximately 40% of the time
  - The average discrepancy value of the population is higher
  - Outlier isolation is difficult
  - With multiple faulty individuals, discrepancies are scattered

Regeneration Performance

Parameters: Difference (vs. Hamming distance); evaluation window $E_w = 600$; suspect threshold $DV_S = 1 - 6/600 = 99\%$; repair threshold $DV_R = 1 - 4/600 = 99.3\%$; re-introduction rate $r = 0.1$

Repairs evolved in situ, in real time, without additional test vectors, while allowing the device to remain partially online.

3x3 Multiplier Experiment

Number  | Fault Location     | Failure Type | Correctness after Fault | Total Iterations | Discrepant Iterations | Repair Iterations | Final Correctness | Effective Throughput (%)
1       | CLB3, LUT0, Input1 | Stuck-at-1   | 52 / 64                 | 17920100         | 421123                | 1194              | 64 / 64           | 97.65
2       | CLB6, LUT0, Input1 | Stuck-at-0   | 33 / 64                 | 802050           | 17034                 | 47                | 64 / 64           | 97.87
3       | CLB5, LUT2, Input0 | Stuck-at-1   | 22 / 64                 | 3134660          | 68027                 | 193               | 64 / 64           | 97.83
4       | CLB7, LUT2, Input0 | Stuck-at-0   | 38 / 64                 | 8158280          | 185193                | 513               | 64 / 64           | 97.73
5       | CLB9, LUT0, Input1 | Stuck-at-0   | 40 / 64                 | 2332670          | 71613                 | 219               | 64 / 64           | 96.93
Average |                    |              | 32.6 / 64               | 6469550          | 152598                | 433               | 64 / 64           | 97.6
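As a consistency check, the Effective Throughput column appears to match 100 × (1 − Discrepant Iterations / Total Iterations) to within 0.01 percentage points. This is our reading of the table, not a definition given on the slide; the sketch recomputes the column from the other two.

```python
# (total iterations, discrepant iterations, reported effective throughput in %)
ROWS = [
    (17920100, 421123, 97.65),
    (802050, 17034, 97.87),
    (3134660, 68027, 97.83),
    (8158280, 185193, 97.73),
    (2332670, 71613, 96.93),
]

for number, (total, discrepant, reported) in enumerate(ROWS, start=1):
    estimate = 100 * (1 - discrepant / total)  # share of iterations without discrepancy
    print(f"experiment {number}: {estimate:.3f}% vs. reported {reported}%")
```

Under this reading, effective throughput is simply the fraction of iterations during which the L and R outputs agreed, which is why it stays near 98% even though repairs take hundreds of iterations.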

Multilayer Runtime Reconfiguration Architecture (MRRA)

[Figure: MRRA block diagram. Labels: Fault-Repair Genetic Algorithm, Reconfiguration Engine, Microprocessor, System Bus, Virtex-II Pro FPGA, RAM, Control System.]

• Develop the MRRA fast reconfiguration paradigm for the CRR approach
• Validate with a real hardware platform, along with detailed performance analysis
• First general-purpose framework for a wide variety of applications requiring dynamic reconfiguration
• Extend existing theories on reconfiguration

Avnet FPGA Development Board

[Figure: loosely coupled solution. Labels: PCI Interface, Virtex-II Pro FPGA, Off-Chip RAM, Control hosted on PC, FPGA Output, Bit file, Input Data.]

• The entire system operates on a 32-bit basis
• The Virtex-II Pro is mounted on a development board, which can then be interfaced with a workstation running Xilinx EDK and ISE

For further info … EH website: http://cal.ucf.edu