![Page 1: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/1.jpg)
Ronald F. DeMara, Carthik A. SharmaUniversity of Central Florida
Ronald F. DeMara, Carthik A. SharmaUniversity of Central Florida
Self-Checking Fault Detection Self-Checking Fault Detection using
Discrepancy Mirrors
PDPTA 2005PDPTA 2005 Las Vegas Las Vegas
PDPTA 2005PDPTA 2005 Las Vegas Las Vegas
![Page 2: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/2.jpg)
Fault Handling Overview
• FailureFailure Manifestation of a fault Deviation from expected behavior
• DetectionDetection Identify occurrence of fault
Fully articulating inputs Intermittently articulating inputs
Methods Coding based schemes Redundancy
• IsolationIsolation Physical location of fault PCI-based card used for Xilinx
Virtex II-Pro Based Autonomous Repair Testbed
![Page 3: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/3.jpg)
Ideal Detection Characteristics
• Faults in the detector are covered by itselfFaults in the detector are covered by itself Fault-secure Self-testing No “Golden Elements”
• Multiple types of faults handled by same detectorMultiple types of faults handled by same detector Transient and Permanent faults Logic and Interconnect faults
• Minimum number of false-positivesMinimum number of false-positives Accuracy and reliability
• Minimal power consumptionMinimal power consumption
• Verifiable correctnessVerifiable correctness
• Practical AssessmentPractical Assessment Fitness assessment should be tractable
![Page 4: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/4.jpg)
Discrepancy Mirror
Fault CoverageFault Coverage
• Mechanism for Checking-the-Checker (“golden element” problem)
• Makes checker part of configuration that competes for correctness [DeMara PDPTA-05]
![Page 5: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/5.jpg)
Discrepancy Mirror Circuit
Fault CoverageFault CoverageComponent Fault Scenarios Fault-Free
Function Output A Fault Correct Correct Correct Correct
Function Output B Correct Fault Correct Correct Correct
XNORA Disagree (0) Disagree (0) Fault : Disagree(0) Agree (1) Agree (1)
XNORB Disagree (0) Disagree (0) Agree (1) Fault : Disagree(0) Agree (1)
BufferA 0 0 High-Z 0 1
BufferB 0 0 0 High-Z 1
Match Output 0 0 0 0 1
![Page 6: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/6.jpg)
Discrepancy Mirror Truth Table
A B XNORA XNORB ENBA ENBB TRIA TRIB MATCH
0 0 1 1 1 1 1 1 1
0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1
• Discrepancy Mirror Truth Table ensures complete coverage of detector.
• Single Point of Failure reduced to a stuck-at fault exposure for MATCH output (Wired-Or)
![Page 7: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/7.jpg)
Discrepancy-Enabled Isolation
![Page 8: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/8.jpg)
Discrepancy Mirror Approach
• Selection PhaseSelection Phase Two candidates chosen from population Use mutually exclusive resources Carry out computation in tandem
• Detection PhaseDetection Phase Discrepancy Mirror compares outputs MATCH output signifies fault free configurations Faults in the detector also covered
• Preference Adjustment ProcessPreference Adjustment Process Detector output over time indicates relative fitness Relative fitness can be used to choose candidates
![Page 9: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/9.jpg)
CRR Arrangement in SRAM FPGA
Configurations in PopulationConfigurations in Population• C = CL CR
• CL = subset of left-half configurations• CR = subset of right-half configurations• |CL|=|CR |= |C|/2
Discrepancy OperatorDiscrepancy Operator• Baseline Discrepancy Operator is dyadic operator with binary output:
• Z(Ci) is FPGA data throughput output of configuration Ci
• Each half-configuration evaluates using embedded checker (XNOR gate) within each individual
• Any fault in checker lowers that individual’s fitness so that individual is no longer preferred and eventually undergoes repair
Othewise
CZCZCC
Ri
LiR
iLi
)()(
1
0
Reconfiguration Algorithm
`
SR A M-based FPGA
LHalf-Configuration
Discrepancy Check L Discrepancy Check R
Function Logic L
CONFIGURATION BIT STREAM
INPUT DATA
Function Logic R
DATA OUTPUT
FEE
DB
AC
K
RHalf-Configuration
CONTROL
OFF
-CH
IP E
EPR
OM
( NO
TE: a
non
-vol
atile
mem
ory
is a
lread
y re
quire
d to
boo
t any
SR
AMFP
GA
from
col
d st
art .
.. th
is is
not
an
addi
tiona
l chi
p )
Rji
Ljii CEORC ,,j =RS:
(Hamming Distance)
Rji
Ljii CEORC ,,j ^ =WTA:
(Equivalence)
![Page 10: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/10.jpg)
Overview of FPGA operation
Competing ConfigurationsCompeting Configurations• Configurations A and B are physically distinct• CA = subset consisting of ‘A’ configurations• CB = subset consisting of ‘B’ configurations• |CA|=|CB |= |C|/2
Discrepancy OperatorDiscrepancy Operator• Baseline Discrepancy Operator is dyadic operator with binary output:
• Z(Ci) is FPGA data throughput output of configuration Ci
• Each half-configuration evaluates using embedded checker (XNOR gate) within each individual
• Any fault in checker or functional logic lowers fitness of resources used by that individual leading to isolation
Otherwise
CZCZCC
Bi
AiB
iA
i
)()(
1
0
Reconfiguration Algorithm
`
SRAM-based FPGA
Configuration A
Discrepancy Mirror A Discrepancy Mirror B
Function Logic A
CONFIGURATION BIT STREAM
INPUT DATA
Function Logic B
DATA OUTPUT
FE
ED
BA
CK
Configuration B
CONTROL
OF
F-C
HIP
EE
PR
OM
( N
OT
E:
a no
n-vo
latil
e m
emor
y is
alre
ady
requ
ired
to b
oot
any
SR
AM
FP
GA
fro
m c
old
star
t ..
. th
is is
not
an
addi
tiona
l chi
p )
![Page 11: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/11.jpg)
Discrepancy Mirror Schematic:CMOS
Pspice SchematicPspice Schematic
• 44 p- and n-channel MOS Transistors
• 1.5 micron minimum width
• 600 nm length
• Width of p-mos transistors = 3*width of n-mos trans.
![Page 12: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/12.jpg)
Discrepancy Mirror Schematic:Xilinx
Xilinx SchematicXilinx Schematic
• Virtex-II Pro FPGA
• ModelSim-II Simulator
• Emulated (digital) Pull-down Resistor
![Page 13: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/13.jpg)
Discrepancy Mirror Simulation:CMOS Circuit
Transient ResponseTransient Response
• Behavior conforms to specifications
• Correct identification of Discrepancy
![Page 14: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/14.jpg)
Discrepancy Mirror Simulation:Xilinx ModelSim-II
Circuit ResponseCircuit Response
Output ‘High’ == 1 when input q1 == q2
Output ‘Low’ when input q1 != q2. In Xilinx FPGAs, ‘Low’ is not exactly equal to zero, but is a Logic ‘zero’ nevertheless.
![Page 15: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/15.jpg)
Fault Location Experiments
• Two experiments conductedTwo experiments conducted C-language program simulator Locate fault by successive intersections
v-subsets or groups of resources Fault identified after m comparisons – what is the value of m?
Identify number of iterations required to identify single-fault Random inputs, Single stuck-at fault Expected number of pairings over 100 simulations One ‘resource’ equivalent to one CLB ( > 10 gates)
• Experiment 1Experiment 1 Perpetually articulating inputs
• Experiment 2Experiment 2 Intermittently articulating inputs
![Page 16: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/16.jpg)
Fault Location Using Dueling
Let UU denote the set of all logic resources on the FPGASS denote the pool of resources suspected of being faultyInitially denotes the set of resources used by ith configuration.
To isolate the fault, m successive intersections,
are performed at the end of which |S| = 1
With pre-designed partitions to achieve maximal isolation• Isolation can be completed in 2n iterations, where n = | |
|||| US
UCi
),( mkjiCC jkj
iC
![Page 17: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/17.jpg)
Analysis with Perpetually Articulating Inputs
Perpetually Articulating Perpetually Articulating InputsInputs• No observed discrepancy implies fault-free resources
Best Case (50% Utilized Capacity):• 11.1 pairings for 1,000 resources• 17.6 pairings for 100,000 resources
Most Demanding Case:63.7 pairings for 100,000 resources with 5% capacity utilization.
![Page 18: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/18.jpg)
Analysis with Intermittently Articulating Inputs
Intermittently Articulating Intermittently Articulating InputsInputs• Inputs may be such that fault is not articulated at the outputs• No observed discrepancy does not imply fault-free resources • Only discrepant outputs provide fault-location information
Best Case (45% Utilized Capacity):• 42 pairings for 1,000 resources• 64.1 pairings for 100,000 resources
Most Demanding Case:478 pairings for 100,000 resources with 95% capacity utilization.50% of the inputs articulate the fault
![Page 19: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/19.jpg)
Experimental Results Summary
• Number of iterations to detect faults depends on Number of iterations to detect faults depends on Utilized CapacityUtilized Capacity Designs that utilize only a very few resources ( < 20%), or
almost all ( > 80%) the resources on the FPGA pose difficult isolation problems
Each intersection exonerates (implicates) fewer individual resources
• Method scales wellMethod scales well 11.1, 14.9, 17.6 pairings required for 1,000, 10,000, and
100,000 resources. Sub-linear increase in location time. • Current WorkCurrent Work
Competitive Runtime Reconfiguration (CRR) framework under development which will utilize methods outlined
Investigation of Competitive Group Testing methods to enable faster fault isolation
Analysis of characteristics of isolation, dependency on parameters, optimal partitioning methods.
![Page 20: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/20.jpg)
Backup Slides Follow
![Page 21: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/21.jpg)
Accommodating Multi-bit Word Widths
• Proof of conceptProof of concept The present circuit works efficiently Demonstrates important Dueling-enabled isolation method
• StrategiesStrategies Use an array of detectors
attempt to minimize points of failure as word-width increases Number of logic resources used is acceptable for smaller
circuits Create new circuit or scheme, combining fault tolerant
coding-based methods with single-fault secure circuit Current research focused on improving detector by
investigating codes, and fault-secure circuits
![Page 22: Self-Checking Fault Detection using Discrepancy Mirrors](https://reader036.vdocument.in/reader036/viewer/2022062321/56813221550346895d9880dc/html5/thumbnails/22.jpg)
Pull-down Resistor Considerations
• Proof of conceptProof of concept The present circuit works in a verifiable correct manner Can utilize synthesized (digital) pull-down resistor which
simulate the behavior of analog resistors Demonstrates Dueling-enabled isolation method Can be utilized without implementation problems for
Custom-VLSI designs
• Alternative ApproachAlternative Approach Alternate detector circuits for FPGA implementation are Alternate detector circuits for FPGA implementation are
under investigationunder investigation Avoid using Tri-state buffers, pull-down resistors and use Avoid using Tri-state buffers, pull-down resistors and use
native digital components available on FPGAsnative digital components available on FPGAs