Methodology to Compute Architectural Vulnerability Factors Chris Weaver 1, 2 Shubhendu S. Mukherjee 1 Joel Emer 1 Steven K. Reinhardt 1, 2 Todd Austin 2 1 Fault Aware Computing Technology (FACT), VSSAD, Intel 2 University of Michigan


Post on 17-Jan-2018


Page 1

Methodology to Compute Architectural Vulnerability Factors

Chris Weaver (1, 2), Shubhendu S. Mukherjee (1), Joel Emer (1), Steven K. Reinhardt (1, 2), Todd Austin (2)

(1) Fault Aware Computing Technology (FACT), VSSAD, Intel
(2) University of Michigan

Page 2

Overview
- Background
- Previous reliability estimation methodology
- Proposed methodology for early reliability estimates
- Sample analysis
- Conclusion

Page 3

Strike Changes State

[Figure: a particle strike changes the state of a stored bit, 0 to 1]

Page 4

Failure Rate Definitions
- Interval-based: MTBF = Mean Time Between Failures
- Rate-based: FIT = Failure in Time = 1 failure in a billion hours
  - 1 year MTBF = 10^9 / (24 * 365) FIT = 114,155 FIT
  - Additive: Cache (0 FIT) + IQ (114K FIT) + FU (114K FIT) = total of 228K FIT
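The arithmetic on this slide can be checked with a short Python sketch. The function names are illustrative and not from the talk; a year is taken as 365 days, as on the slide.

```python
HOURS_PER_YEAR = 24 * 365   # the slide uses a 365-day year
FIT_HOURS = 10**9           # 1 FIT = 1 failure per billion hours


def mtbf_years_to_fit(mtbf_years):
    """Convert a mean time between failures (in years) to a FIT rate."""
    return FIT_HOURS / (mtbf_years * HOURS_PER_YEAR)


def total_fit(component_fits):
    """FIT rates are additive, so a system's FIT is the sum over its parts."""
    return sum(component_fits.values())


one_year = mtbf_years_to_fit(1)
parts = {"cache": 0.0, "IQ": one_year, "FU": one_year}
print(round(one_year))                       # 114155
print(f"{total_fit(parts) / 1000:.0f}K")     # 228K
```

Additivity is what makes FIT the convenient unit here: MTBF values of individual structures cannot simply be summed, but their FIT rates can.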

Page 5

Motivation

[Figure: Data Corruption FIT (log scale, 1 to 100,000) by year, 2003 to 2012. Series: "1000 MTBF Goal"; "FIT if all flips manifest as errors"; "FIT if 10% of flips manifest as errors".]

Page 6

Results of precise & early analysis
- If we meet goal: we are done
- If we don’t meet goal: add error protection schemes

Page 7

Objectives
- Determine which bits matter
- Compute FIT rate

Page 8

Strike on state bit
- Bit read?
  - no: benign fault, no error
  - yes: bit has error protection?
    - yes, error is only detected (e.g., parity + no recovery): detected, but unrecoverable error (DUE)
    - yes, error can be corrected (e.g., ECC): no error
    - no: does bit matter?
      - yes: Silent Data Corruption (SDC)
      - no: benign fault, no error

* We only focus on SDC FIT
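The decision flow on this slide can be written as a small Python function. This is a sketch of the flowchart as it reads in the transcript; the function name and the `protection` encoding (`None`, `"detect"`, `"correct"`) are assumptions made for illustration.

```python
def classify_strike(bit_read, protection, bit_matters):
    """Classify the outcome of a particle strike on one state bit.

    protection is None (unprotected), "detect" (e.g., parity without
    recovery), or "correct" (e.g., ECC). These labels are hypothetical.
    """
    if not bit_read:
        return "benign fault, no error"   # the flipped bit is never consumed
    if protection == "correct":
        return "no error"                 # the error is corrected
    if protection == "detect":
        return "DUE"                      # detected, but unrecoverable
    if protection is not None:
        raise ValueError(f"unknown protection scheme: {protection!r}")
    # Unprotected bit that was read: the outcome depends on whether it matters.
    return "SDC" if bit_matters else "benign fault, no error"


print(classify_strike(bit_read=True, protection=None, bit_matters=True))   # SDC
```

Only the last branch contributes to the SDC FIT that the talk focuses on.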

Page 9

Architectural Vulnerability Factor (AVF)

$$\mathrm{AVF}_{bit} = \text{Probability bit matters} = \frac{\#\text{ of visible errors}}{\#\text{ of bit flips from particle strikes}}$$

$$\mathrm{FIT}_{bit} = \text{intrinsic } \mathrm{FIT}_{bit} \times \mathrm{AVF}_{bit}$$
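A minimal Python sketch of these two definitions follows. The counts are made up for illustration, and the function names are not from the talk.

```python
def avf_bit(visible_errors, bit_flips):
    """AVF of a bit: the fraction of bit flips that become visible errors."""
    if bit_flips <= 0:
        raise ValueError("need at least one bit flip")
    return visible_errors / bit_flips


def fit_bit(intrinsic_fit, avf):
    """Derate the intrinsic (raw) FIT of a bit by its AVF."""
    return intrinsic_fit * avf


# Hypothetical counts: 29 visible errors out of 100 flips.
avf = avf_bit(visible_errors=29, bit_flips=100)
print(avf)                    # 0.29
print(fit_bit(0.001, avf))    # effective FIT for a bit with intrinsic FIT 0.001
```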

Page 10

Previous AVF Methodology

Statistical Fault Injection with RTL

[Figure: a strike is simulated on a latch feeding a block of logic, and the output is observed]
- Simulate strike on latch
- Does fault propagate to architectural state?

Page 11

Characteristics of SFI with RTL
- Naturally characterizes all logical structures
- RTL is not available until late in the design cycle
- Numerous experiments are needed to flip all bits
- Generally done at the chip level
  - Limited structural insight

Page 12

Objectives
- Determine which bits matter
  - Earlier in the design cycle
  - With fewer experiments
  - At the structural level
- Compute FIT rate
  - Intrinsic FIT per bit
  - Architectural Vulnerability Factor

Page 13

Our Analysis: Which bits matter?
- Branch Predictor: doesn’t matter at all (AVF = 0%)
- Program Counter: almost always matters (AVF ~ 100%)

Page 14

Architecturally Correct Execution (ACE)
- ACE path requires only a subset of values to flow correctly through the program’s data flow graph (and the machine)
- Anything else (un-ACE path) can be derated away

[Figure: data flow graph from program input to program outputs]

Page 15

Example of un-ACE instruction: Dynamically Dead Instruction

[Figure: a dynamically dead instruction]

Most bits of an un-ACE instruction do not affect program output
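To make the idea concrete, here is a minimal Python sketch that finds dynamically dead instructions in a register trace. It assumes that "dynamically dead" means an instruction whose result is overwritten before anything reads it; the trace format and function name are invented for illustration. It detects only this first-level case, not instructions whose results are read solely by other dead instructions.

```python
def find_dynamically_dead(trace):
    """Return indices of instructions whose result is overwritten unread.

    trace is a list of (dest, sources) pairs, e.g. ("r1", ["r2", "r3"]).
    Results still unread at the end of the trace are NOT reported as dead,
    in keeping with a conservative (ACE unless proven otherwise) stance.
    """
    pending = {}   # register -> index of its latest write, not yet read
    dead = set()
    for index, (dest, sources) in enumerate(trace):
        for reg in sources:
            pending.pop(reg, None)        # a read makes the writer live
        if dest is not None:
            if dest in pending:
                dead.add(pending[dest])   # overwritten before any read
            pending[dest] = index
    return sorted(dead)


trace = [
    ("r1", []),        # 0: r1 is written, then overwritten by 1 -> dead
    ("r1", []),        # 1: r1 is written, read by 2 -> live
    ("r2", ["r1"]),    # 2: read by 3 -> live
    ("r3", ["r2"]),    # 3: unread at end of trace -> conservatively live
]
print(find_dynamically_dead(trace))   # [0]
```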

Page 16

Dynamic Instruction Breakdown

Average across all of Spec2K slices

Category             Share
Dynamically dead     20%
Performance inst     1%
NOP                  26%
ACE                  46%
Predicated false     7%

Page 17

Mapping ACE & un-ACE Instructions to the Instruction Queue

[Figure: instruction queue entries (Idle, NOP, Prefetch, ACE Inst, Wrong-Path Inst, Ex-ACE Inst), each marked as ACE, architectural un-ACE, or micro-architectural un-ACE]

Page 18

Vulnerability of a structure

AVF = fraction of cycles a bit contains ACE state

Cycle    ACE bits / total bits
T = 1    2/4
T = 2    1/4
T = 3    0/4
T = 4    3/4

$$\mathrm{AVF} = \frac{\text{average number of ACE bits in a cycle}}{\text{total number of bits in the structure}} = \frac{(2 + 1 + 0 + 3)/4}{4} = 37.5\%$$
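The worked example on this slide can be reproduced with a few lines of Python. This is a sketch; the function name and argument layout are illustrative rather than taken from the talk.

```python
def structure_avf(ace_bits_per_cycle, total_bits):
    """AVF of a structure: average ACE bits per cycle over total bits."""
    if total_bits <= 0 or not ace_bits_per_cycle:
        raise ValueError("need a non-empty trace and a positive bit count")
    average_ace_bits = sum(ace_bits_per_cycle) / len(ace_bits_per_cycle)
    return average_ace_bits / total_bits


# ACE bit counts for T = 1..4 from the slide, in a 4-bit structure.
print(structure_avf([2, 1, 0, 3], total_bits=4))   # 0.375
```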

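Page 19

Little’s Law for ACEs

$$N_{ace} = T_{ace} \times L_{ace}$$

$$\mathrm{AVF} = \frac{N_{ace}}{N_{total}}$$

Here $N_{ace}$ is the average number of ACE bits resident in the structure, $T_{ace}$ is the average throughput of ACE bits into the structure, $L_{ace}$ is their average latency in the structure, and $N_{total}$ is the total number of bits in the structure.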
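As a sketch of how Little's Law yields an AVF, the Python below multiplies throughput by latency to get the average ACE occupancy and divides by the structure size. The numbers are hypothetical and the function name is illustrative; occupancy is counted in entries rather than bits, which gives the same ratio when all entries have the same width.

```python
def avf_from_littles_law(ace_throughput, ace_latency, total_capacity):
    """AVF = N_ace / N_total, with N_ace = T_ace * L_ace (Little's Law).

    ace_throughput: average ACE items entering the structure per cycle
    ace_latency:    average cycles an ACE item stays in the structure
    total_capacity: structure size, in the same unit as the items
    """
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    n_ace = ace_throughput * ace_latency
    return n_ace / total_capacity


# Hypothetical queue: 0.5 ACE entries arrive per cycle, each stays 8 cycles,
# and the queue holds 16 entries, so 4 of 16 entries are ACE on average.
print(avf_from_littles_law(0.5, 8, 16))   # 0.25
```

The appeal of this form is that throughput and latency are quantities a performance model already reports, so no per-cycle bit tracking is needed.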

Page 20

Computing AVF
- Our approach is conservative
  - We assume every bit is ACE unless proven otherwise
- Data Analysis
  - Try to prove that data held in a structure is un-ACE
- Timing Analysis
  - Tracks the time this data spent in the structure

Page 21

Computing FIT rate of a Chip

$$\text{Total FIT} = \sum_i \left(\text{FIT per bit}_i \times \#\text{ of bits}_i \times \mathrm{AVF}_i\right)$$

Structure            FIT per bit   # of bits   AVF    Total FIT
Branch Predictor     .001*         1K          0      0
Program Counter      .001*         64          1      0.064
Instruction Queue    .001*         6400        ?      ?
Functional Units     .001*         4000        ?      ?
…                                                     …

Total FIT of whole chip = sum of the Total FIT column

* Intrinsic FIT per bit from externally published data

Page 22

Results: Experimental Setup
- Used ASIM modeling infrastructure
- Model of an Itanium® 2-like processor
- Ran all Spec2K benchmarks
  - Compiled with highest level of optimization with the Intel electron compiler
- Simulated under a full OS
- Simulation points chosen using SimPoint (Sherwood et al.)

Page 23

Instruction Queue

ACE percentage = AVF = 29%

Category             Share
NOP                  15%
ACE                  29%
Idle                 31%
Ex-ACE               10%
Wrong path           3%
Dynamically dead     8%
Predicated false     3%
Performance inst     1%

Page 24

Functional Units

ACE percentage = AVF = 9%

Category             Share
Speculative issue    1%
Performance inst     0%
Predicated false     1%
Dynamically dead     4%
Wrong path           1%
NOP                  6%
ACE                  9%
Logical masking      0%
Datapath idle        1%
Unit idle            77%

Page 25

Computing FIT rate of Chip

Structure            FIT per bit   # of bits   AVF    Total FIT
Branch Predictor     .001*         1K          0      0
Program Counter      .001*         64          1      0.064
Instruction Queue    .001*         6400        .29    1.856
Functional Units     .001*         4000        .09    0.360
…                                                     …

Total FIT of whole chip = sum of the Total FIT column

* Intrinsic FIT per bit from externally published data
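The table rows can be recomputed with a short Python sketch. It covers only the four structures listed on the slide, so the printed sum is a partial total and not the whole-chip figure, which the slide leaves elided. Reading "1K" as 1024 bits is an assumption; it does not affect the result because that row's AVF is 0.

```python
# (structure, intrinsic FIT per bit, number of bits, AVF), from the slide.
STRUCTURES = [
    ("Branch Predictor", 0.001, 1024, 0.0),   # "1K" read as 1024 (assumption)
    ("Program Counter", 0.001, 64, 1.0),
    ("Instruction Queue", 0.001, 6400, 0.29),
    ("Functional Units", 0.001, 4000, 0.09),
]


def structure_fit(fit_per_bit, num_bits, avf):
    """Total FIT of one structure: FIT per bit x number of bits x AVF."""
    return fit_per_bit * num_bits * avf


def partial_chip_fit(structures):
    """Sum the per-structure FIT over the structures provided."""
    return sum(structure_fit(f, n, a) for _, f, n, a in structures)


for name, fit, bits, avf in STRUCTURES:
    print(f"{name:<18} {structure_fit(fit, bits, avf):.3f}")
print(f"{'Listed structures':<18} {partial_chip_fit(STRUCTURES):.3f}")
```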

Page 26

Summary
- Determine which bits matter
  - ACE (Architecturally Correct Execution)
- Compute FIT rate
  - Intrinsic FIT per bit
  - AVF (Architectural Vulnerability Factor)

Page 27

Questions?

Page 28

Statistical Fault Injection (SFI) Algorithm
- Find a statistically significant set of bits
- Randomly select a bit
- Flip the bit
- Run two simulations: one with bit flip and one without bit flip
- Run for pre-defined # cycles
- Compare architectural state of two simulations (e.g., register file)
- If mismatch, declare an error
- Repeat algorithm with different bit flip
- AVF = # mismatches observed / total # experiments

Widely used, and has provided useful AVF numbers to date
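The steps above can be sketched in Python against a toy machine. Everything here is a stand-in: the real method runs an RTL model, whereas `simulate` below is an invented 16-bit state in which only the first 4 bits are ever read, so the expected AVF is 4/16.

```python
import random

TOTAL_BITS = 16   # size of the toy machine's state (hypothetical)
LIVE_BITS = 4     # only these bits are ever read by the toy machine


def simulate(bits, cycles):
    """Toy machine: each cycle adds the value of the live bits to a register.

    The returned accumulator stands in for the architectural state.
    """
    accumulator = 0
    for _ in range(cycles):
        accumulator += sum(b << i for i, b in enumerate(bits[:LIVE_BITS]))
    return accumulator


def estimate_avf(num_experiments, cycles=10, seed=0):
    """AVF = mismatches observed / total experiments."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(num_experiments):
        golden = [rng.randint(0, 1) for _ in range(TOTAL_BITS)]
        target = rng.randrange(TOTAL_BITS)      # randomly select a bit
        faulty = list(golden)
        faulty[target] ^= 1                     # flip the bit
        # Run both simulations and compare architectural state.
        if simulate(golden, cycles) != simulate(faulty, cycles):
            mismatches += 1                     # mismatch: declare an error
    return mismatches / num_experiments


print(estimate_avf(2000))   # close to LIVE_BITS / TOTAL_BITS = 0.25
```

Even in this toy setting the estimate is statistical: it approaches 0.25 only as the number of experiments grows, which is the cost the slide alludes to.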

Page 29

SFI vs. ACE analysis

Accuracy of microarchitectural un-ACE
- SFI: better than ACE analysis
- ACE: conservative

Accuracy of architectural un-ACE
- SFI: conservative
- ACE: better than SFI (e.g., covers dynamically dead instructions)

Insight
- SFI: per-structure insights harder
- ACE: Little’s Law & per-structure breakdown easier

# of experiments
- SFI: large # required to be statistically significant
- ACE: small # of experiments can give good accuracy