
Performance Implications of Faults in Prediction Arrays

Nikolas Ladas, Yiannakis Sazeides, Veerle Desmet

University of Cyprus, Ghent University

DFR’10, Pisa, Italy - 24/1/2010

HiPEAC2010

Motivation
● Technology scaling: opportunities and challenges
● Reliability and computing tomorrow
  ● Failures will not be exceptional
  ● Various sources of failures
    ● Manufacturing: imperfections, process variation
    ● Physical phenomena: soft errors, wear-out
    ● Power constraints: control operation below Vcc-min
● Key challenge: provide reliable operation, with little or no performance degradation in the presence of faults, using low-overhead solutions

Nikolas Ladas 24/1/2010

Architectural vs Non-Architectural Faults
● So far, research has mainly focused on correctness
● Emphasis on architectural structures, e.g. caches, registers, buses, ALUs, etc.
● However, faults can also occur in non-architectural structures, e.g. predictor and replacement arrays
● Faults in non-architectural structures may degrade performance
  ● Not an issue for soft errors
  ● Can be a problem for persistent faults: wear-out, process variation, operation below Vcc-min

Non-architectural Resources
Arrays
• line predictor
• branch direction predictor
• return-address-stack
• indirect jump predictor
• memory dependence prediction
• way, hit/miss, bank predictors
• replacement arrays (various caches)
• hysteresis arrays (various predictors)
• ...
Non-Arrays
• branch target address adder
• memory prefetch adder
• ...
[Figure: EV6-like core array bits breakdown]

This talk…
● Quantify the performance implications of faults in non-architectural array structures
● Identify which non-architectural array structures are the most sensitive to faults
● Do we need to worry about protecting these structures?

Outline
● Fault model / Experimental framework
● Performance implications of faults when all non-architectural arrays are faulty
● Criticality of the non-architectural arrays studied
● Fault semantics
● Conclusions and future direction

Faults and Arrays
● Faults may occur in different parts of an array
● We only consider cell faults
[Figure: physical array organization showing the decoder, wordlines, driver, bitline pairs (BL, BL’), and cells]

Array Fault Modeling Key Parameters
● Number of faults
  • Consider the % of cells that are faulty: 0.125 and 0.5
  • Understand performance trends with an increasing number of faults
● Fault locations
  • Consider random fault locations, each affecting 1 cell
  • Try to capture average behavior
● Model for each fault
  • Each faulty cell is randomly set to either stuck-at-1 or stuck-at-0
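The fault model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`make_fault_map`, `read_entry`), the dictionary representation of a fault map, and the rounding of the fault count are hypothetical and are not taken from the authors' simulator.

```python
import random

def make_fault_map(entries, bits_per_entry, faulty_fraction, seed=None):
    """Pick random faulty cells and give each a random stuck-at value.

    Returns a dict mapping (entry, bit) -> stuck value (0 or 1).
    Assumption: the number of faults is the rounded product of the
    cell count and the faulty fraction, with a minimum of one fault.
    """
    rng = random.Random(seed)
    total_cells = entries * bits_per_entry
    n_faults = max(1, round(total_cells * faulty_fraction))
    # sample() draws distinct cells, so each fault affects exactly one cell
    cells = rng.sample(range(total_cells), n_faults)
    return {divmod(cell, bits_per_entry): rng.randint(0, 1) for cell in cells}

def read_entry(stored_value, entry, bits_per_entry, fault_map):
    """Return the value seen when reading an entry through its faulty cells."""
    value = stored_value
    for bit in range(bits_per_entry):
        stuck = fault_map.get((entry, bit))
        if stuck is None:
            continue              # healthy cell: stored bit is read as-is
        if stuck:
            value |= 1 << bit     # stuck-at-1
        else:
            value &= ~(1 << bit)  # stuck-at-0
    return value
```

For example, `make_fault_map(32 * 1024, 2, 0.00125, seed=1)` would produce a map for a 65536-bit array at 0.125% faulty bits; using the same seed reproduces the same map, which is one way to reuse identical fault maps across experiments.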

Processor Model
• EV7-like processor with a 15-stage pipeline
• 4-way out-of-order (ooo), mispredictions resolved at commit
• Non-architectural arrays considered
  • Line Predictor Array: 4K entries, 11 bits/entry
  • Line Predictor Hysteresis Array: 4K entries, 2 bits/entry
  • LRU array for 2-way 64KB 64B/block I$: 512 entries, 1 bit/entry
  • LRU array for 2-way 64KB 64B/block D$: 512 entries, 1 bit/entry
  • Gshare Direction Predictor: 32K entries, 2 bits/entry
  • Return address stack: 16 entries, 31 bits/entry
  • Memory dependence predictor (load-wait): 1024 entries, 1 bit/entry
• sim-alpha simulator
• SPEC CPU 2000 benchmarks, 100 M instructions
  • Representative regions

Experiments
● Baseline performance: runs with no faults
● For experiments with faults:
  • For each run, all arrays with faults have the same % of faulty bits: 0.125 or 0.5
  • All experiments are performed using the same 100 randomly generated fault maps (50 for each % of faulty bits)

Number of faulty bits per array:

Array                                  Total bits   0.125%   0.5%
Gshare Direction Predictor             65536        82       328
Line Predictor Array                   45056        56       225
Line Predictor Hysteresis Array        8192         10       41
Memory dependence predictor            1024         1        5
2-way 64KB 64B/block I$ LRU array      512          1        3
2-way 64KB 64B/block D$ LRU array      512          1        3
Return address stack                   496          1        3
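The array sizes in the table follow from the processor model (entries times bits per entry). The short sketch below recomputes them and the expected, non-integer number of faulty bits at each percentage. The names `ARRAYS`, `total_bits` and `expected_faulty_bits` are illustrative; the slides do not state how fractional counts were converted to whole faults, so the values here are left unrounded.

```python
# (entries, bits per entry) as listed in the processor model
ARRAYS = {
    "Gshare Direction Predictor": (32 * 1024, 2),
    "Line Predictor Array": (4 * 1024, 11),
    "Line Predictor Hysteresis Array": (4 * 1024, 2),
    "Memory dependence predictor": (1024, 1),
    "I$ LRU array": (512, 1),
    "D$ LRU array": (512, 1),
    "Return address stack": (16, 31),
}

def total_bits(name):
    """Total number of cells in the named array."""
    entries, bits_per_entry = ARRAYS[name]
    return entries * bits_per_entry

def expected_faulty_bits(name, percent):
    """Expected (fractional) number of faulty cells at a given percentage."""
    return total_bits(name) * percent / 100.0

for name in ARRAYS:
    print(f"{name:34s} {total_bits(name):6d} "
          f"{expected_faulty_bits(name, 0.125):8.2f} "
          f"{expected_faulty_bits(name, 0.5):8.2f}")
```

The small arrays (LRU arrays, return address stack) have an expected count below one at 0.125%, which is consistent with the table assigning them a single fault.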

[Figure: Performance with 0.125% Faulty Bits (all arrays faulty)]

[Figure: Performance with 0.5% Faulty Bits (all arrays faulty)]

Observations with all arrays faulty
• Performance degradation is substantial even with a small % of faulty bits
• Both INT and FP benchmarks can degrade

  % faulty bits          0.125   0.5
  Average degradation    1%      3.5%
  Max degradation        39%     53%

• Degradation is benchmark specific
  • Instruction mix (different number and type of vulnerable instructions)
  • Programs with high accuracy are more vulnerable than those with low accuracy
  • When a program accesses few array entries, it takes a large number of faults before faulty entries are accessed
  • Some benchmarks are memory dominated
• Worst-case degradation is much greater than average
  • Will cause performance variation between otherwise identical cores/chips
• Are all bits equally vulnerable? Which unit(s) matter the most?

[Figure: Performance for Each Structure (0.125% faulty bits). 26 benchmarks x 50 experiments for each section]

[Figure: Performance for Each Structure (0.5% faulty bits). 26 benchmarks x 50 experiments for each section]

Observations
• For the processor configuration used in this study, the various non-architectural units are not equally vulnerable to the same fraction of faults
• RAS and BPRED are the most sensitive to faults
• Line predictor and load-wait predictor degrade performance significantly when there are 0.5% faults
• 2-way I$ and D$ are not sensitive even at 0.5% of faults in the LRU array

Reasons for Variable Vulnerability across units
● Semantics of faults vary across units
● Some faults cause a pipeline flush, others delay the execution of an instruction, others cause a one-cycle bubble
● Faults causing delays can be less severe, since they can be hidden in the shadow of a misprediction or by ooo execution
● Units with typically higher accuracy are more vulnerable (RAS and conditional predictor)
● Even within a unit, faults can have different semantics

Semantics of Faults for a 2-bit Replacement

State   Action
0x      Replace
1x      No replace

Stuck-at value   Effect
0                Always Replace
1                Never Replace
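The table can be read as a decision on the upper bit of the 2-bit state. The sketch below assumes the fault is in that upper bit (the slide does not say which bit is stuck); `should_replace` and `apply_stuck_at` are illustrative names, not part of any simulator.

```python
def should_replace(state):
    """2-bit state: 0x -> replace, 1x -> no replace."""
    return ((state >> 1) & 1) == 0

def apply_stuck_at(state, bit, value):
    """Return the state as read when one cell is stuck at `value`."""
    if value:
        return state | (1 << bit)
    return state & ~(1 << bit)

# Assumed fault location: the upper (decision) bit of the counter.
for stored in range(4):
    always = should_replace(apply_stuck_at(stored, bit=1, value=0))
    never = should_replace(apply_stuck_at(stored, bit=1, value=1))
    print(f"stored={stored:02b} stuck-at-0 -> replace={always}, "
          f"stuck-at-1 -> replace={never}")
```

Whatever value is stored, a stuck-at-0 upper bit yields "always replace" and a stuck-at-1 upper bit yields "never replace", which is why the two stuck-at values have different performance semantics.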

Repair mechanism: XOR Remapping
• Access map: counts accesses per entry during an interval
• Fault map: indicates which entries are faulty (can be determined at manufacturing test or at very coarse intervals using BIST)
• Remap the index using XOR to minimize faulty accesses
• At regular intervals, search for the optimal XOR value using the access map and fault map
[Figure: access map and fault map, before and after remapping. Faulty accesses: 143 before remapping, 70 after remapping]
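One way to picture the XOR search is the exhaustive sketch below. It is not the authors' search algorithm, which the slide text does not describe; it assumes a power-of-two array size, a per-entry access count, a per-entry fault flag, and hypothetical function names.

```python
def faulty_accesses(access_map, fault_map, xor_value):
    """Count accesses that land on faulty entries for a given XOR value.

    access_map[i]: accesses to logical index i during the interval.
    fault_map[p]:  True if physical entry p is faulty.
    Logical index i is remapped to physical entry i ^ xor_value.
    """
    return sum(count for index, count in enumerate(access_map)
               if fault_map[index ^ xor_value])

def best_xor(access_map, fault_map):
    """Exhaustively search for the XOR value with the fewest faulty accesses."""
    size = len(access_map)
    # XOR keeps indices in range only when the size is a power of two
    assert size == len(fault_map) and size & (size - 1) == 0
    # min() returns the first minimum, so 0 (no remapping) wins ties
    return min(range(size),
               key=lambda x: faulty_accesses(access_map, fault_map, x))

access_map = [10, 0, 5, 1]
fault_map = [True, False, False, False]
x = best_xor(access_map, fault_map)
print(x, faulty_accesses(access_map, fault_map, 0),
      faulty_accesses(access_map, fault_map, x))
```

In this toy case the hot logical index 0 initially maps onto the faulty entry; the search moves the unused index 1 onto it instead. Preferring 0 on ties is a deliberate choice here, in line with the observation that remapping when there is no need can make things worse.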

Results
• 26 benchmarks x 10 fault maps per category
• Recovers most of the performance degradation
• Possible to make things worse if we remap when there is no need


Summary-Conclusions
● Faults in non-architectural arrays can degrade processor performance
● Not all faults are equally important. Fault semantics vary.
● RAS and conditional branch predictor are the most critical
● Faults can cause performance non-determinism across otherwise identical chips or within the cores of the same chip

Future Work
● Develop an analytical model to predict the performance distribution for a given failure rate
● Understand the implications of faults for other architectural and non-architectural structures

Acknowledgments
Costas Kourougiannis

Funding: University of Cyprus, Ghent University, HiPEAC, Intel

Thanks!

BACKUP SLIDES

Fault Semantics
● Line Predictor Array
  • Incorrect prediction
  • Conditionals and returns get corrected within a cycle; indirects are resolved much later
● Line Predictor Hysteresis Array
  • Always update prediction on a misprediction
  • Never update
● 2-way 64KB 64B/block I$ and D$ LRU arrays
  • Converts sets with a faulty LRU bit to direct-mapped sets; more misses, but these can be hidden
● Gshare Direction Predictor
  • Faulty entries always predict taken or always not-taken
  • Incorrect prediction that gets resolved late (25% chance of being lucky)
● Return address stack
  • Return misprediction is resolved late
● Memory dependence predictor (load-wait)
  • Independent load waits (in the common case it should not wait); can be partially hidden
  • Dependent load does not wait (this should rarely be a serious problem)
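The LRU case can be illustrated with a toy model of a single 2-way set. This is a simplified sketch, not the simulated cache: the class `TwoWaySet`, its fill policy (the victim is always the way named by the LRU bit, with no preference for empty ways), and the access sequence are all assumptions made for illustration.

```python
class TwoWaySet:
    """One set of a 2-way cache with a single LRU bit (toy model)."""

    def __init__(self, stuck_lru=None):
        self.ways = [None, None]    # tags held in way 0 and way 1
        self.lru = 0                # way to evict next
        self.stuck_lru = stuck_lru  # None = healthy, 0/1 = stuck-at value

    def _read_lru(self):
        # A stuck cell ignores whatever was written to it
        return self.lru if self.stuck_lru is None else self.stuck_lru

    def access(self, tag):
        """Access a tag; return True on a hit, False on a miss."""
        if tag in self.ways:
            self.lru = 1 - self.ways.index(tag)  # other way becomes LRU
            return True
        victim = self._read_lru()
        self.ways[victim] = tag
        self.lru = 1 - victim
        return False

def count_misses(tags, stuck_lru=None):
    cache_set = TwoWaySet(stuck_lru)
    return sum(not cache_set.access(tag) for tag in tags)

sequence = ["A", "B", "A", "B", "A", "B"]
print(count_misses(sequence), count_misses(sequence, stuck_lru=0))
```

With a healthy LRU bit, two blocks that alternate in the same set miss only on their first access. With the bit stuck, every fill goes to the same way, so the set holds one block at a time and every access in this sequence misses, which is the direct-mapped behavior described above.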

Processor Pipeline
[Figure: processor pipeline. Stages: Fetch, Slot, Writeback, Commit. Labeled elements: Instruction cache, Line predictor, Branch predictor, Update line prediction, Update program counter, NLS_PC, Correct PC, Assign value to PC (indirect jump), 4 nops]

Line predictor Logical structure
[Figure: line predictor logical structure. Labeled elements: TAG, predecode bits, instruction cache with way0 and way1, each holding inst0-inst3 with sb0-sb3, valid sb]

Functional Faults and Array Logical View
[Figure: array logical view with row address, data_in, and output bit]
● Not practical to study faults at the physical level
● Functional models: abstractions that ease the study of faults
● Fault locations: cell, input address, input/output data
● We only consider cell faults

BIST for Detecting Faults and Updating Fault Map

Example Remapping Search Algo

Interleaved vs Non-Interleaved Design Style (1)
● Each array wordline contains many entries
● Entries in the physical implementation are bit-interleaved
  • More area efficient

Interleaved vs Non-Interleaved Design Style (2)
● But a cluster of faults affects more entries in an interleaved design
● For architectural structures:
  • Soft errors: prefer interleaved
  • Hard errors: map to spare, or disable block/set
● For non-architectural structures:
  • Soft errors: no need for protection
  • Hard errors: prefer non-interleaved (if area is not an issue)

4K LP: No Interleaving vs Interleaving (average random)

Random results without and with remapping

Expected Invariants
● More faults cause more performance degradation
● Frequently accessed entries are more critical than less frequently accessed entries
● A cell stuck-at-1 is more critical if the bits stored in the cell are biased towards zero

Worst-case - Hit rate

Random results without and with remapping
