Cost-Effective Register File Soft Error Reduction

Pablo Montesinos, Wei Liu and Josep Torrellas, University of Illinois at Urbana-Champaign

Post on 04-Feb-2016

Overview

Study of register file vulnerability to Silent Data Corruption (SDC)

Shield: cost-effective protection for register files

Highlighting the policies and techniques used in Shield

Experiments and results

Register File AVF

RF-AVF (the register file's Architectural Vulnerability Factor) is the probability that a fault in the register file leads to an error.

A register's lifetime is divided into PreWrite, Useful, and PostLastRead parts.

For the AVF calculation, the lifetime of each bit is divided into ACE (required for Architecturally Correct Execution) and un-ACE cycles.

Register File AVF

During the PreWrite period, the register is un-ACE.

If the register is read at least once after the write, it is in the ACE state from the write until the last read.

After the last read, it switches back to un-ACE for the PostLastRead period.
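
The ACE / un-ACE breakdown above can be turned into a small calculation. The following is a minimal sketch, not the authors' tooling: the `RegVersion` record, its field names, and the per-register (rather than per-bit) granularity are all assumptions made for illustration. It counts only the Useful period (write to last read) as ACE and divides by the total number of register-cycles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegVersion:
    """Hypothetical record of one register version's lifetime, in cycles."""
    alloc: int                                       # register allocated
    write: int                                       # value written
    reads: List[int] = field(default_factory=list)   # cycle of each read
    free: int = 0                                    # register released


def ace_cycles(v: RegVersion) -> int:
    """Useful (ACE) cycles: write to last read, or 0 if never read."""
    return max(v.reads) - v.write if v.reads else 0


def register_file_avf(versions, num_regs: int, total_cycles: int) -> float:
    """Fraction of all register-cycles that are ACE."""
    return sum(ace_cycles(v) for v in versions) / (num_regs * total_cycles)


versions = [
    RegVersion(alloc=0, write=2, reads=[5, 12], free=20),  # ACE for 10 cycles
    RegVersion(alloc=0, write=3, reads=[], free=20),       # never read: un-ACE
]
avf = register_file_avf(versions, num_regs=2, total_cycles=20)
```

In this toy trace, 10 of the 40 register-cycles are ACE, so most of the time a fault would land in a PreWrite or PostLastRead period and be harmless.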

Highlighting Insights (1)

The combined %-USEFUL time of all registers is small

Highlighting Insights (1)

The average number of useful (live) registers is less than 20 (SPECint) and 17 (SPECfp).

It is thus possible to reduce the vulnerability of the register file by protecting only a subset of carefully chosen registers at a time.

Highlighting Insights (2)

A few long-lived register versions contribute a large share of the total useful time.

On average, less than 10% of register versions are long-lived.

Highlighting Insights (2)

On average 40% of useful time comes from the few long-lived versions.

In SPECfp, 5% of long-lived versions account for 46% of the useful time.

Motivation

Register files have a very high access rate.

The resulting high temperature lowers the critical charge (Qcrit) of the devices.

An error in the register file can propagate with a high probability of failure.

If we isolate a few register versions by predicting their lifetimes and protect only those versions, high reliability can be achieved with limited overhead.

Shield - Architecture

Life-Time Prediction

Shielding Decision

Register Error Check

Error Recovery

Reg-Version Lifetime Prediction

P12 => Used(1) , Renamed(1)

P7 => Used(0) , Renamed(1)

Shielding Decision

These prediction bits are stored as status in the ECC table.

The decision to shield an incoming register version is based on:

Availability of a free ECC table entry.

If the same register number is already present in the ECC table, its entry is replaced with the new one.

An existing register version with a shorter lifetime than the incoming register version is replaced.

Replacement policy: (figure not included in this transcript)
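
The three shielding rules listed above can be sketched as a small table-update routine. This is an illustrative sketch only, not the replacement policy from the slide: the `EccTable` class, the `try_shield` method, the integer "lifetime rank" standing in for the prediction bits, and the order in which the rules are tried are all assumptions.

```python
from typing import Dict


class EccTable:
    """Hypothetical ECC table: register number -> predicted lifetime rank."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: Dict[int, int] = {}

    def try_shield(self, reg: int, lifetime: int) -> bool:
        """Apply the three rules; return True if the new version is shielded."""
        # Rule: the same register number is already present, so overwrite it.
        if reg in self.entries:
            self.entries[reg] = lifetime
            return True
        # Rule: a free entry is available.
        if len(self.entries) < self.capacity:
            self.entries[reg] = lifetime
            return True
        # Rule: evict the shortest-lived entry if the newcomer outlives it.
        victim = min(self.entries, key=self.entries.get)
        if self.entries[victim] < lifetime:
            del self.entries[victim]
            self.entries[reg] = lifetime
            return True
        return False  # the incoming version stays unprotected


table = EccTable(capacity=2)
r1 = table.try_shield(12, lifetime=3)  # free entry
r2 = table.try_shield(7, lifetime=1)   # free entry
r3 = table.try_shield(9, lifetime=2)   # evicts register 7 (rank 1 < 2)
r4 = table.try_shield(4, lifetime=1)   # no shorter-lived victim: rejected
```

Checking for a matching register number first keeps the table from holding two entries for the same register; a real design could order the checks differently.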

Register Error Check & Recovery

On a read request, the register data is sent to both the original datapath and Shield.

If the register number matches a tag entry, the register data is checked for errors by the ECC checker.

If an error is detected:

The processor stalls the instruction I that is reading register P.

The register data is corrected and written back into the RF.

The oldest instruction in the ROB that reads register P, and all succeeding instructions, are flushed.

The processor resumes from the flushed instruction.
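
The read-check and recovery steps above can be sketched as control flow. This is a simplified model under loud assumptions: the ECC checker is stood in for by a stored reference copy of the value (real ECC stores check bits and corrects from them), the ROB is a plain list in program order, and the names `Instr`, `read_with_shield`, and `shield_copy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instr:
    name: str
    src_regs: Tuple[int, ...]   # physical registers this instruction reads


def read_with_shield(reg: int, regfile: Dict[int, int],
                     shield_copy: Dict[int, int], rob: List[Instr]):
    """Return (value, flushed instructions) for a read of `reg`."""
    value = regfile[reg]
    # No tag match, or the check passes: the read proceeds normally.
    if reg not in shield_copy or value == shield_copy[reg]:
        return value, []
    # Error detected: correct the data and write it back into the RF.
    regfile[reg] = shield_copy[reg]
    # Flush the oldest ROB instruction reading `reg` and everything after it.
    first = next((i for i, ins in enumerate(rob) if reg in ins.src_regs),
                 len(rob))
    flushed = rob[first:]
    del rob[first:]
    # The processor would then resume by re-fetching from flushed[0].
    return regfile[reg], flushed


regfile = {12: 0xBEEF ^ 0x4}            # one bit flipped in register 12
shield_copy = {12: 0xBEEF}              # stand-in for the ECC-protected value
rob = [Instr("i0", (3,)), Instr("i1", (12,)), Instr("i2", (5,))]
value, flushed = read_with_shield(12, regfile, shield_copy, rob)
```

Here the corrupted register is repaired in place, `i1` and `i2` are flushed, and `i0`, which is older and does not read the register, stays in the ROB.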

Experiments - Results

AVF computation for the RF with Shield

Experiments - Results

AVF reduction for the integer register file (intREG) under different replacement policies:

LRU = 31%

Effective = 63%

OptEffective = 84% (Effective plus pinning of global pointers to particular ECC entries)

The AVF of the floating-point register file (fpREG) can be reduced by up to 100%, because fewer fp registers are in the useful state.

Power and Area Impact

Shield uses only 3 ECC generators and 3 ECC checkers.

Shield has a 45% power overhead over a plain register file (full ECC has 2X).

Shield introduces an overall 10% area overhead.

Conclusion

A cost-effective architectural technique has been proposed that reduces the vulnerability of the RF by 84%.

The area and power overheads are a modest price for the reliability achieved.