Performance Implications of Faults in Prediction Arrays
Nikolas Ladas, Yiannakis Sazeides, Veerle Desmet
University of Cyprus, Ghent University
DFR'10, Pisa, Italy, 24/1/2010
HiPEAC 2010
Motivation
● Technology scaling: opportunities and challenges
● Reliability and computing tomorrow:
  ● Failures will not be exceptional
  ● Various sources of failures:
    ● Manufacturing: imperfections, process variation
    ● Physical phenomena: soft errors, wear-out
    ● Power constraints: operation below Vcc-min
● Key challenge: provide reliable operation with little or no performance degradation in the presence of faults, using low-overhead solutions
Nikolas Ladas 24/1/2010
Architectural vs Non-Architectural Faults
● So far, research has mainly focused on correctness
  ● Emphasis on architectural structures, e.g. caches, registers, buses, ALUs
● However, faults can also occur in non-architectural structures, e.g. predictor and replacement arrays
● Faults in non-architectural structures may degrade performance
  ● Not an issue for soft errors
  ● Can be a problem for persistent faults: wear-out, process variation, operation below Vcc-min
Non-architectural Resources
Arrays
• line predictor
• branch direction predictor
• return address stack
• indirect jump predictor
• memory dependence prediction
• way, hit/miss, bank predictors
• replacement arrays (various caches)
• hysteresis arrays (various predictors)
• ...
Non-Arrays
• branch target address adder
• memory prefetch adder
• ...
[Figure: EV6-like core array bits breakdown]
This talk…
● Quantify performance implications of faults in non-architectural array structures
● Identify which non-architectural array structures are the most sensitive to faults
● Do we need to worry about protecting these structures?
Outline
● Fault model / experimental framework
● Performance implications of faults when all non-architectural arrays are faulty
● Criticality of the non-architectural arrays studied
● Fault semantics
● Conclusions and future direction
Faults and Arrays
● Faults may occur in different parts of an array
● We only consider cell faults
[Figure: array organization: a decoder and wordline drivers select one wordline (WL) per row of cells; each column of cells connects to a bitline pair (BL, BL')]
Array Fault Modeling Key Parameters
● Number of faults:
  • consider the % of cells that are faulty: 0.125% and 0.5%
  • understand performance trends with an increasing number of faults
● Fault locations:
  • consider random fault locations, each affecting 1 cell
  • try to capture average behavior
● Model for each fault:
  • each faulty cell is randomly set to either stuck-at-1 or stuck-at-0
Processor Model
• EV7-like processor with a 15-stage pipeline
• 4-way out-of-order, mispredictions resolved at commit
• Non-architectural arrays considered:
  • Line predictor array: 4K entries, 11 bits/entry
  • Line predictor hysteresis array: 4K entries, 2 bits/entry
  • LRU array for 2-way 64KB 64B/block I$: 512 entries, 1 bit/entry
  • LRU array for 2-way 64KB 64B/block D$: 512 entries, 1 bit/entry
  • Gshare direction predictor: 32K entries, 2 bits/entry
  • Return address stack: 16 entries, 31 bits/entry
  • Memory dependence predictor (load-wait): 1024 entries, 1 bit/entry
• sim-alpha simulator
• SPEC CPU 2000 benchmarks: 100M instructions
  • Representative regions
Experiments
● Baseline performance: runs with no faults
● For experiments with faults:
  • In each run, all faulty arrays have the same % of faulty bits: 0.125% or 0.5%
  • ALL experiments use the same 100 randomly generated fault maps (50 for each % of faulty bits)
Faulty bits per array:
Array                              Size (bits)  0.125%  0.5%
Gshare direction predictor               65536      82   328
Line predictor array                     45056      56   225
Line predictor hysteresis array           8192      10    41
Memory dependence predictor               1024       1     5
2-way 64KB 64B/block I$ LRU array          512       1     3
2-way 64KB 64B/block D$ LRU array          512       1     3
Return address stack                       496       1     3
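The counts in the table are consistent with multiplying each array's size by the faulty-cell percentage. The helper below uses an assumed rule that the talk does not state: round to the nearest integer and keep at least one fault per array. It reproduces every entry except the return address stack at 0.5%, where it yields 2 (2.48 rounded) rather than the listed 3.

```python
def faulty_bits(bits: int, pct: float) -> int:
    """Assumed rule: nearest-integer share of faulty cells, at least one."""
    return max(1, round(bits * pct / 100))

# (array, size in bits) computed from the processor model's entries x bits/entry
ARRAYS = [
    ("Gshare direction predictor", 32 * 1024 * 2),
    ("Line predictor array", 4 * 1024 * 11),
    ("Line predictor hysteresis array", 4 * 1024 * 2),
    ("Memory dependence predictor", 1024 * 1),
    ("I$ LRU array", 512 * 1),
    ("Return address stack", 16 * 31),
]

for name, bits in ARRAYS:
    print(f"{name:32s} {bits:6d} {faulty_bits(bits, 0.125):4d} {faulty_bits(bits, 0.5):4d}")
```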
Performance with 0.125% Faulty Bits (all arrays faulty)
Performance with 0.5% of Faulty Bits (all arrays faulty)
Observations with all arrays faulty
• Performance degradation is substantial even with a small % of faulty bits
• Both INT and FP benchmarks can degrade
• Average degradation: 1% with 0.125% faulty bits, 3.5% with 0.5% faulty bits
• Max degradation: 39% with 0.125% faulty bits, 53% with 0.5% faulty bits
• Degradation is benchmark-specific:
  • Instruction mix (different number and type of vulnerable instructions)
  • Programs with high prediction accuracy are more vulnerable than those with low accuracy
  • When a program accesses few array entries, it takes a large number of faults before faulty entries are accessed
  • Some benchmarks are memory-dominated
• Worst-case degradation is much greater than average
  • Will cause performance variation between otherwise identical cores/chips
• Are all bits equally vulnerable? Which unit(s) matter the most?
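To make the average versus worst-case distinction concrete, here is a small sketch of how per-run degradation could be summarized. The metric is an assumption (the talk does not define it): degradation is taken as 1 - IPC_faulty / IPC_baseline for the same benchmark and instruction count, and the summary pools every benchmark × fault-map pair. The IPC values in the usage line are made up purely to exercise the code.

```python
from statistics import mean

def degradation(base_ipc: float, faulty_ipc: float) -> float:
    """Fractional performance loss of one faulty run relative to the
    fault-free run of the same benchmark (0.0 means no loss)."""
    return 1.0 - faulty_ipc / base_ipc

def summarize(runs: list[tuple[float, float]]) -> tuple[float, float]:
    """runs: (baseline IPC, faulty IPC) for every benchmark x fault-map pair.
    Returns (average degradation, worst-case degradation)."""
    losses = [degradation(base, faulty) for base, faulty in runs]
    return mean(losses), max(losses)

# Illustrative numbers only: most fault maps barely matter, one hurts a lot.
avg, worst = summarize([(2.0, 2.0), (2.0, 1.98), (1.5, 1.5), (1.5, 0.9)])
```

A single badly placed fault map dominates the maximum while barely moving the mean, which is why the two statistics diverge.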
Performance for Each Structure (0.125% faulty bits)
26 benchmarks x 50 experiments for each section
Performance for Each Structure (0.5% faulty bits)
26 benchmarks x 50 experiments for each section
Observations
• For the processor configuration used in this study, the various non-architectural units are not equally vulnerable to the same fraction of faults
• RAS and BPRED are the most sensitive to faults
• The line predictor and load-wait predictor degrade performance significantly when there are 0.5% faults
• The 2-way I$ and D$ are not sensitive, even with 0.5% of faults in the LRU array
Reasons for Variable Vulnerability across Units
● Semantics of faults vary across units
● Some faults cause a pipeline flush, others delay the execution of an instruction, others cause a one-cycle bubble
● Faults causing delays can be less severe, since they can be hidden in the shadow of a misprediction or by out-of-order execution
● Units with typically higher accuracy are more vulnerable (RAS and conditional predictor)
● Even within a unit, faults can have different semantics
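Semantics of Faults for a 2-bit Replacement State
State   Action
0x      Replace (R)
1x      No replace (N)
0/1     Stuck-at value
[Figure: state diagrams over the four states 00 (R), 01 (R), 10 (N), 11 (N), without and with a stuck-at cell. With the high-order bit stuck at 0 only 00 and 01 remain (Always Replace); with it stuck at 1 only 10 and 11 remain (Never Replace).]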
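The effect of a single stuck bit on this 2-bit state can be enumerated directly. The sketch below assumes, as in the legend, that the high-order bit alone selects the action; it only checks which actions remain reachable and does not model the policy that updates the state.

```python
def action(state: int) -> str:
    """Legend from the slide: states 0x -> Replace (R), 1x -> No replace (N)."""
    return "R" if (state >> 1) == 0 else "N"

def reachable_actions(stuck_bit=None, stuck_value=0) -> set[str]:
    """Actions observable over all four written states when bit `stuck_bit`
    (1 = high-order, 0 = low-order) always reads as `stuck_value`."""
    actions = set()
    for written in range(4):
        state = written
        if stuck_bit is not None:
            if stuck_value:
                state |= 1 << stuck_bit
            else:
                state &= ~(1 << stuck_bit)
        actions.add(action(state))
    return actions

print(sorted(reachable_actions()))      # ['N', 'R']  fault-free
print(sorted(reachable_actions(1, 0)))  # ['R']  Always Replace
print(sorted(reachable_actions(1, 1)))  # ['N']  Never Replace
```

A stuck low-order bit leaves both actions reachable; in this model it only affects how the state moves between them.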
Repair mechanism: XOR Remapping
[Figure: 8-entry example. Access map: 0, 40, 3, 20, 50, 100, 0, 0. Fault map: 1, 0, 0, 1, 1, 0, 0, 0. The index is remapped with XOR 1.]
• Access map: counts accesses per entry during an interval
• Fault map: indicates which entries are faulty (can be determined at manufacturing test, or at very coarse intervals using BIST)
• Remap the index using XOR to minimize faulty accesses
• At regular intervals, search for the optimal XOR value using the access map and fault map
Faulty accesses: 143 before remapping, 70 after remapping
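The search step can be sketched as an exhaustive scan over XOR keys. Assumptions not taken from the talk: logical index i is stored at physical entry i XOR key, the table size is a power of two, the fault map is a set of faulty physical entries, and `min_gain` is a hypothetical threshold that keeps the current key unless remapping saves enough faulty accesses (one way to avoid remapping when there is no need). The access counts below are invented for illustration.

```python
def faulty_accesses(access_map: list[int], faulty: set[int], key: int) -> int:
    """Accesses that hit a faulty physical entry when logical index i is
    mapped to physical entry i ^ key (faulty entry p is reached by p ^ key)."""
    return sum(access_map[p ^ key] for p in faulty)

def choose_xor(access_map: list[int], faulty: set[int],
               current: int = 0, min_gain: int = 1) -> tuple[int, int]:
    """Return (key, faulty accesses) minimizing faulty accesses; keep the
    current key unless the best key saves at least `min_gain` accesses."""
    n = len(access_map)
    assert n > 0 and n & (n - 1) == 0, "table size must be a power of two"
    current_cost = faulty_accesses(access_map, faulty, current)
    best_key, best_cost = current, current_cost
    for key in range(n):
        cost = faulty_accesses(access_map, faulty, key)
        if cost < best_cost:
            best_key, best_cost = key, cost
    if current_cost - best_cost < min_gain:
        return current, current_cost
    return best_key, best_cost

# Hypothetical 8-entry table: per-entry access counts and faulty entries.
accesses = [5, 80, 10, 2, 30, 0, 60, 15]
faulty = {1, 4, 6}
key, cost = choose_xor(accesses, faulty)
```

Here the unremapped table sees 170 faulty accesses, and key 4 reduces that to 15 by steering the hot logical entries away from the faulty physical ones.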
Results
• 26 benchmarks × 10 fault maps per category
• Recovers most of the performance degradation
• Possible to make things worse if we remap when there is no need
Summary and Conclusions
● Faults in non-architectural arrays can degrade processor performance
● Not all faults are equally important; fault semantics vary
● The RAS and the conditional branch predictor are the most critical
● Faults can cause performance non-determinism across otherwise identical chips, or across the cores of the same chip
Future Work
● Develop an analytical model to predict the performance distribution for a given failure rate
● Understand the implications of faults for other architectural and non-architectural structures
Acknowledgments
Costas Kourougiannis
Funding: University of Cyprus, Ghent University, HiPEAC, Intel
Thanks!
BACKUP SLIDES
Fault Semantics
● Line predictor array:
  • incorrect prediction
  • conditional branches and returns are corrected within a cycle; indirect jumps are resolved much later
● Line predictor hysteresis array:
  • always update the prediction on a misprediction, or
  • never update
● 2-way 64KB 64B/block I$ and D$ LRU arrays:
  • sets with a faulty LRU bit become direct-mapped sets: more misses, but these can be hidden
● Gshare direction predictor:
  • faulty entries always predict taken or always not-taken
  • an incorrect prediction gets resolved late (25% chance of being lucky)
● Return address stack:
  • a return misprediction is resolved late
● Memory dependence predictor (load-wait):
  • an independent load waits (the common case, where it should not wait): the delay can be partially hidden
  • a dependent load does not wait (this should rarely be a serious problem)
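The LRU case can be reproduced with a toy model of one 2-way set. In this sketch (an assumption, not the sim-alpha code) a single bit names the way to evict next. When that cell is stuck, every miss evicts the same way, so the set behaves like a direct-mapped one.

```python
from typing import Optional

def count_misses(tags: list[str], stuck_lru: Optional[int] = None) -> int:
    """Misses for one 2-way set with a 1-bit LRU pointer.
    stuck_lru: None for a fault-free bit, else the value the cell is stuck at."""
    ways: list[Optional[str]] = [None, None]
    lru = 0  # way to evict on the next miss
    misses = 0
    for tag in tags:
        if tag in ways:
            lru = 1 - ways.index(tag)          # the other way becomes LRU
        else:
            misses += 1
            victim = lru if stuck_lru is None else stuck_lru
            ways[victim] = tag
            lru = 1 - victim
    return misses

pattern = ["A", "B"] * 4   # two blocks that map to the same set
```

For this alternating pattern the fault-free set misses twice, while the stuck set misses on every access. Access streams that rarely need both ways at once are barely affected.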
Processor Pipeline
[Figure: front-end pipeline diagram: PC, line predictor, instruction cache (hit/miss, L2), branch predictor, RAS and adders across cycles CT1 to CT4, with updates to the line prediction and program counter; NLS_PC compared with the correct PC; fetch, slot, writeback and commit stages; PC assignment for indirect jumps]
Line Predictor Logical Structure
[Figure: logical structure of the line predictor and the instruction cache it indexes: TAG, predecode bits, way0/way1, inst0 to inst3, sb0 to sb3, valid sb]
Functional Faults and Array Logical View
[Figure: logical view of an array: cell, row address, data_in, output bit]
● Not practical to study faults at the physical level
● Functional models: abstractions that ease the study of faults
● Fault locations: cell, input address, input/output data
● We only consider cell faults
BIST for Detecting Faults and Updating Fault Map
Example Remapping Search Algorithm
Interleaved vs Non-Interleaved Design Style (1)
● Each array wordline contains many entries
● Entries in the physical implementation are bit-interleaved
  • More area-efficient
Interleaved vs Non-Interleaved Design Style (2)
● But a cluster fault affects more entries in an interleaved design
● For architectural structures:
  • Soft errors: prefer interleaved
  • Hard errors: map to a spare, or disable the block/set
● For non-architectural structures:
  • Soft errors: no need for protection
  • Hard errors: prefer non-interleaved (if area is not an issue)
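Why a cluster fault touches more entries in an interleaved layout can be checked with a toy column mapping. The mapping is an assumption for illustration: in the interleaved case column c holds a bit of entry c mod E (E entries per wordline); in the non-interleaved case each entry occupies B adjacent columns.

```python
def entries_touched(columns: range, entries_per_row: int,
                    bits_per_entry: int, interleaved: bool) -> set[int]:
    """Entries of one wordline that contain at least one of the faulty columns."""
    touched = set()
    for col in columns:
        if interleaved:
            touched.add(col % entries_per_row)    # bit col // E of entry col % E
        else:
            touched.add(col // bits_per_entry)    # bit col % B of entry col // B
    return touched

# One wordline with 4 entries of 8 bits; a cluster of 4 adjacent faulty cells.
cluster = range(8, 12)
```

With this cluster the interleaved layout corrupts all four entries, while the non-interleaved layout confines the damage to a single entry.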
4K LP: No Interleaving vs Interleaving (average random)
Random results without and with remapping
Expected Invariants
● With increasing faults, more performance degradation
● Frequently accessed entries are more critical than less frequently accessed entries
● A cell stuck-at-1 is more critical if the bits stored in the cell are biased towards zero
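The last two invariants can be combined into a simple expected-cost estimate for one faulty cell. This is a back-of-the-envelope sketch under our own assumption that a read is wrong exactly when the correct bit differs from the stuck-at value; the function name and inputs are hypothetical.

```python
def expected_wrong_reads(accesses: int, p_one: float, stuck_value: int) -> float:
    """Expected number of reads of one stuck cell that return the wrong bit.

    accesses:    reads of the entry that contains the cell
    p_one:       fraction of those reads for which the correct bit is 1
    stuck_value: 0 or 1
    """
    p_wrong = (1.0 - p_one) if stuck_value == 1 else p_one
    return accesses * p_wrong

# A bit that is 1 only 10% of the time: stuck-at-1 corrupts far more reads.
hot_stuck1 = expected_wrong_reads(1000, 0.1, 1)
hot_stuck0 = expected_wrong_reads(1000, 0.1, 0)
cold_stuck1 = expected_wrong_reads(10, 0.1, 1)
```

The estimate grows with the access count (frequently accessed entries matter more) and with how often the stored bit disagrees with the stuck value (bias towards zero makes stuck-at-1 worse).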
Worst-case: Hit Rate
Random results without and with remapping