NC STATE UNIVERSITY

DSN 2007 © Vimal Reddy 2007

Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance

Vimal Reddy

Eric Rotenberg

Center for Efficient, Secure and Reliable Computing (CESR)
Dept. of Electrical & Computer Engineering
North Carolina State University
Raleigh, NC

Processor Fault Tolerance

• Main approaches: time or space redundancy
  – Explicitly duplicate in time or space
  – Match values/signals
  – E.g., AR-SMT (time), IBM S/390 G5 (space)
• High overheads
  – Area/power/performance
  – Unsuitable for commodity high-performance processors

New Idea: Inherent Time Redundancy

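[Figure: two panels side by side. Conventional time redundancy: a program and an explicit duplicate of the program, with results matched (==). Inherent time redundancy: a single program, with matching (==) performed within the program itself.]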

Outline

• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results
  – Repetition
  – Fault Coverage Based on Cache Hits/Misses
  – Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion

Exploiting ITR for Fault Tolerance

• Key ideas:
  – Record input-independent signals for instructions
  – Confirm their correctness upon repetition
• This paper focuses on decode signals
  – Provides protection for fetch and decode stages
• Longer-term goal
  – Apply ITR to other input-independent stages
  – My thesis combines ITR with other microarchitecture-level checks for a comprehensive fault regimen

ITR Cache for Checking Decode Signals

• Create decode signatures at trace granularity
  – Trace ends at a branch or after 16 instructions (an arbitrary limit)
  – Combine decode signals of instructions in a trace
• Record decode signatures in an ITR cache
  – Indexed by start PC (program counter) of a trace
• For each new trace, compare with ITR cache
  – Fault if new signature mismatches ITR cache
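The trace-formation rule above (a trace ends at a branch or after 16 instructions) can be sketched in Python. This is only an illustration under stated assumptions: the `Instr` record, the `split_into_traces` helper and the 4-byte instruction spacing in the example are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_TRACE_LEN = 16  # trace length limit stated on the slide


@dataclass(frozen=True)
class Instr:
    pc: int          # program counter of the instruction
    is_branch: bool  # True if the instruction is a branch


def split_into_traces(stream: Iterable[Instr]) -> List[List[Instr]]:
    """Cut a dynamic instruction stream into traces.

    A trace ends after a branch or once it holds MAX_TRACE_LEN instructions.
    """
    traces: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in stream:
        current.append(instr)
        if instr.is_branch or len(current) == MAX_TRACE_LEN:
            traces.append(current)
            current = []
    if current:  # trailing partial trace at the end of the stream
        traces.append(current)
    return traces


# Hypothetical stream: 20 straight-line instructions followed by one branch.
stream = [Instr(pc=0x1000 + 4 * i, is_branch=(i == 20)) for i in range(21)]
traces = split_into_traces(stream)
start_pcs = [t[0].pc for t in traces]  # these start PCs would index the ITR cache
```

In this example the first trace is cut by the 16-instruction limit and the second by the branch, so the stream yields two traces whose start PCs serve as ITR cache indices.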

Scenario 1: ITR Cache Hit

[Figure: trace (I, J, K) passes through signature generation to produce S2 (I, J, K). The ITR cache (tags, signatures) holds S1 (I, J, K) under tag pc(I). The lookup with pc(I) hits, the two signatures are compared, and a fault is detected.]

Notes:
-- Faults on traces that hit (S2) are detectable
-- These faults are recoverable by flushing the processor pipeline and restarting from the faulting trace

Scenario 2: ITR Cache Miss Followed by Hit

[Figure, step 1: trace (I, J, K) passes through signature generation to produce S1 (I, J, K). The lookup with pc(I) misses, so S1 (I, J, K) is recorded in the ITR cache under tag pc(I).]

[Figure, step 2: a later instance of the trace produces S2 (I, J, K). The lookup with pc(I) hits the stored S1 (I, J, K), the two signatures are compared, and a fault is detected.]

Notes:
-- Faults on traces that miss but later hit (faulty S1) are detectable
-- Faults on S1 need a checkpoint for recovery; otherwise execution must abort

Scenario 3: ITR Cache Miss Followed by Eviction

[Figure, step 1: trace (I, J, K) passes through signature generation to produce S1 (I, J, K). The lookup with pc(I) misses, so S1 (I, J, K) is recorded in the ITR cache under tag pc(I).]

[Figure, step 2: trace (M, N, O) produces S2 (M, N, O). Recording it under tag pc(M) evicts the entry S1 (I, J, K) / pc(I). Fault not detected.]

Notes:
-- Faults on missed & evicted traces (S1) are not detected
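The three scenarios can be replayed with a toy model. The sketch below assumes a direct-mapped cache, 4-byte instructions for the index computation, and the full start PC as the tag; the `ITRCache` class and its `unchecked_evictions` counter are hypothetical names introduced for illustration, not the hardware design from the talk.

```python
from typing import Dict, Tuple


class ITRCache:
    """Direct-mapped signature cache indexed by trace start PC (illustrative)."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        # set index -> (tag, signature); the full start PC serves as the tag
        self.sets: Dict[int, Tuple[int, int]] = {}
        # set index -> True once the stored signature has been compared on a hit
        self._checked: Dict[int, bool] = {}
        # signatures that were recorded on a miss and evicted before any hit
        self.unchecked_evictions = 0

    def access(self, start_pc: int, signature: int) -> str:
        index = (start_pc >> 2) % self.num_sets  # assumes 4-byte instructions
        entry = self.sets.get(index)
        if entry is not None and entry[0] == start_pc:
            self._checked[index] = True
            return "hit-match" if entry[1] == signature else "hit-fault"
        if entry is not None and not self._checked[index]:
            self.unchecked_evictions += 1  # scenario 3: never compared
        self.sets[index] = (start_pc, signature)
        self._checked[index] = False
        return "miss"


cache = ITRCache(num_sets=4)
r1 = cache.access(0x100, 0xABCD)  # miss: signature recorded
r2 = cache.access(0x100, 0xABCD)  # hit, signatures agree
r3 = cache.access(0x100, 0xABCF)  # hit, signatures differ: fault detected
r4 = cache.access(0x200, 0x1111)  # maps to the same set: miss, evicts 0x100
r5 = cache.access(0x100, 0x2222)  # miss again, evicts 0x200 before any hit
```

The last access evicts a signature that was never compared against a second instance, which is the case the slides mark as "fault not detected".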

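Microarchitecture Extensions to Superscalar

[Figure: superscalar pipeline with next PC logic, PC, I$, decode, rename, dispatch, out-of-order issue, register read, functional units (FU), writeback and commit, together with the reorder buffer (ROB), store buffer, branch checkpoints, register file, architectural state and memory hierarchy. ITR additions: a signature generator fed by the decode stage, a trace-end test ("branch or 16 instr.?"), an ITR ROB with "Start PC", "Signature", "chk", "miss" and "retry" fields, and an ITR cache with check (==) and record paths. The slides that follow annotate this same figure.]

• Decode signals redirected for signature generation and ITR cache access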

• Signature generation: bitwise XOR of the decode signals of the instructions in a trace
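A minimal sketch of XOR-based signature generation follows. The 64-bit word width is taken from the "Decode Signals Modeled" slide; the `trace_signature` helper and the sample values are hypothetical. The second half of the example illustrates a general property of XOR (not a claim made in the slides): a single flipped bit always changes the signature, while the same bit flipped in two instructions of one trace cancels out.

```python
from functools import reduce
from operator import xor
from typing import Sequence

SIGNAL_BITS = 64  # total decode-signal width per instruction
MASK = (1 << SIGNAL_BITS) - 1


def trace_signature(decode_signals: Sequence[int]) -> int:
    """Bitwise XOR of the decode-signal words of all instructions in a trace."""
    return reduce(xor, decode_signals, 0) & MASK


# Hypothetical decode-signal words for a three-instruction trace.
clean = [0x0123_4567_89AB_CDEF, 0x0000_0000_FFFF_0000, 0x8000_0000_0000_0001]

faulty = list(clean)
faulty[1] ^= 1 << 17  # flip one bit in one instruction's decode signals

doubled = list(clean)
doubled[0] ^= 1 << 5  # flip the same bit position in two instructions
doubled[2] ^= 1 << 5

sig_clean = trace_signature(clean)
sig_faulty = trace_signature(faulty)
sig_doubled = trace_signature(doubled)
```

Here `sig_clean ^ sig_faulty` isolates the flipped bit, whereas `sig_doubled` equals `sig_clean`, so an even number of flips in the same bit position within one trace would go unnoticed by a pure XOR signature.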

• Signatures are dispatched into the ITR ROB (the ITR cache is more effective without wrong-path trace signatures)

• Access ITR cache with trace start PC

• Signature mismatch: set retry bit, flush the pipeline and retry from the faulting trace

• After the retry:
  – Fault re-detected: the old signature is faulty, so abort or roll back to a safe checkpoint
  – Fault disappears: successful recovery
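The retry decision can be written as a small function. This is a sketch under assumptions: a single retry, and an `execute_trace` callable that stands in for fetching and decoding the trace and returns its signature. The names `Outcome`, `check_with_retry` and `replay` are hypothetical.

```python
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    NO_FAULT = "signatures match"
    RECOVERED = "fault disappeared after retry"
    STORED_SIGNATURE_FAULTY = "fault re-detected: abort or roll back"


def check_with_retry(stored_sig: int, execute_trace: Callable[[], int]) -> Outcome:
    """Compare a fresh trace signature with the stored one, retrying once.

    Calling execute_trace() a second time models the flush-and-retry step.
    """
    if execute_trace() == stored_sig:
        return Outcome.NO_FAULT
    # Mismatch: set the retry bit, flush the pipeline, re-execute the trace.
    if execute_trace() == stored_sig:
        return Outcome.RECOVERED  # the fault was in the first new instance
    return Outcome.STORED_SIGNATURE_FAULTY  # the recorded signature is suspect


def replay(signatures: Iterable[int]) -> Callable[[], int]:
    """Return a callable that yields the given signatures one call at a time."""
    it = iter(signatures)
    return lambda: next(it)


clean_run = check_with_retry(0xAAAA, replay([0xAAAA]))
transient = check_with_retry(0xAAAA, replay([0xAAAB, 0xAAAA]))
bad_record = check_with_retry(0xAAAB, replay([0xAAAA, 0xAAAA]))
```

The `transient` case corresponds to a fault in the new trace instance that vanishes on retry; the `bad_record` case corresponds to a faulty stored signature, where only a checkpoint or an abort can help.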

• ITR cache miss: write the new signature to the ITR cache

• No fault detected (or possible loss of detection due to the miss-followed-by-eviction case)

Results: Repetition in SPEC2K INT

[Figure: % of total dynamic instructions (y-axis, 0 to 100) versus number of static traces (x-axis, 0 to 1000). Benchmarks: bzip, gap, gcc, gzip, parser, perl, twolf, vortex, vpr.]

Results: Repetition in SPEC2K FP

[Figure: % of total dynamic instructions (y-axis, 0 to 100) versus number of static traces (x-axis, 0 to 500). Benchmarks: applu, apsi, art, equake, mgrid, swim, wupwise.]

Results: Proximity in SPEC2K INT

[Figure: % of total dynamic instructions (y-axis, 0 to 100) versus number of dynamic instructions separating repetitive traces (x-axis). Benchmarks: bzip, gzip, parser, gap, vpr, gcc, twolf, perl, vortex.]

Results: Proximity in SPEC2K FP

[Figure: % of total dynamic instructions (y-axis, 0 to 100) versus number of dynamic instructions separating repetitive traces (x-axis). Benchmarks: applu, apsi, art, equake, mgrid, swim, wupwise.]
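One way to measure the proximity metric plotted above is sketched here. The slides do not define the distance precisely, so this example assumes it is the number of dynamic instructions executed between the start of one instance of a trace and the start of its next instance; `repetition_distances`, `percent_within` and the sample trace list are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Trace = Tuple[int, int]  # (start_pc, number of instructions in the trace)


def repetition_distances(traces: Sequence[Trace]) -> List[Tuple[int, int]]:
    """Return (distance, length) for every trace instance that repeats an earlier one."""
    last_start: Dict[int, int] = {}  # start PC -> dynamic position of last instance
    executed = 0                     # dynamic instructions executed so far
    result: List[Tuple[int, int]] = []
    for start_pc, length in traces:
        if start_pc in last_start:
            result.append((executed - last_start[start_pc], length))
        last_start[start_pc] = executed
        executed += length
    return result


def percent_within(traces: Sequence[Trace], limit: int) -> float:
    """% of all dynamic instructions in repeated traces with distance <= limit."""
    total = sum(length for _, length in traces)
    near = sum(length for dist, length in repetition_distances(traces) if dist <= limit)
    return 100.0 * near / total


# Hypothetical execution order of traces A, B and C.
A, B, C = 0xA0, 0xB0, 0xC0
run = [(A, 10), (B, 5), (A, 10), (C, 16), (B, 5), (A, 10)]
distances = repetition_distances(run)
close_pct = percent_within(run, limit=20)
```

Sweeping `limit` over a range of values would produce one cumulative curve per benchmark, in the spirit of the proximity charts.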

Fault Coverage Based on ITR Cache Misses

• Loss in fault detection coverage
  – Miss-followed-by-eviction causes loss in fault detection
  – (# of missed & evicted traces) x instructions/trace
• Loss in fault recovery coverage (w/o checkpoints)
  – Misses cause loss in fault recovery
  – (# of missed traces) x instructions/trace
• Tried various ITR cache configurations
  – Direct-mapped (DM), 2-, 4-, 8-, 16-way, fully-associative (FA)
  – 256, 512 and 1024 signatures
  – bzip, gzip, art, mgrid and wupwise had <1% loss in coverage (results not shown)
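The two loss formulas above reduce to simple arithmetic. The sketch below expresses each loss as a percentage of total dynamic instructions, matching the y-axis of the result charts; the function name and every number in the example are made up for illustration and are not measurements from the talk.

```python
def coverage_loss_percent(
    num_traces: int, instrs_per_trace: float, total_dynamic_instrs: int
) -> float:
    """(# of traces) x instructions/trace, as a % of all dynamic instructions."""
    return 100.0 * num_traces * instrs_per_trace / total_dynamic_instrs


# Hypothetical counts for one benchmark run.
total_instrs = 1_000_000
instrs_per_trace = 8.0
missed_traces = 2_000             # every ITR cache miss
missed_and_evicted_traces = 800   # misses whose signature was evicted before a hit

detection_loss = coverage_loss_percent(
    missed_and_evicted_traces, instrs_per_trace, total_instrs
)
recovery_loss = coverage_loss_percent(missed_traces, instrs_per_trace, total_instrs)
```

Because missed-and-evicted traces are a subset of missed traces, the recovery loss computed this way can never be smaller than the detection loss.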

Results: Loss in Fault Detection Coverage

[Figure: loss in detection coverage as % of total dynamic instructions (y-axis, 0 to 55) for gap, gcc, parser, perl, twolf, vortex, vpr, applu, apsi, equake and swim, with series for 256, 512 and 1024 signatures.]

• Coverage loss correlates with proximity of repetition, e.g., perl and vortex
• Capacity is important for poor performers
• For 2-way, 1024 signatures: 8% max. (vortex) and 1.3% avg. loss in detection coverage

Results: Loss in Fault Recovery Coverage

[Figure: loss in recovery coverage as % of total dynamic instructions (y-axis, 0 to 55) for gap, gcc, parser, perl, twolf, vortex, vpr, applu, apsi, equake and swim, with series for 256, 512 and 1024 signatures.]

• Recovery coverage loss is bigger than loss in detection coverage
• For 2-way, 1024 signatures: 15% max. (vortex) and 2.5% avg. loss in recovery coverage

Fault Injection Experiments

• Injected faults on decode signals of a cycle-accurate simulator

• Based on MIPS R10K processor

• 1000 random faults per benchmark

Decode Signals Modeled

Field      Width  Description
opcode     8      instruction opcode
flags      12     decoded control flags (is_int, is_fp, is_signed/unsigned, is_branch, is_uncond, is_ld, is_st, mem_left/right, is_RR, is_disp, is_direct, is_trap)
shamt      5      shift amount
rsrc1      5      source register operand
rsrc2      5      source register operand
rdst       5      destination register operand
lat        2      execution latency
imm        16     immediate
num_rsrc   2      number of source operands
num_rdst   1      number of destination operands
mem_size   3      size of memory word
Total      64
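The field widths in the table can be checked and packed into one 64-bit word with a short sketch. The field names and widths come from the table; the bit layout (table order, first field in the most significant bits), the `pack` and `unpack` helpers, and the sample values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (field name, width in bits), in the order listed in the table
FIELDS: List[Tuple[str, int]] = [
    ("opcode", 8), ("flags", 12), ("shamt", 5), ("rsrc1", 5), ("rsrc2", 5),
    ("rdst", 5), ("lat", 2), ("imm", 16), ("num_rsrc", 2), ("num_rdst", 1),
    ("mem_size", 3),
]
TOTAL_WIDTH = sum(width for _, width in FIELDS)


def pack(values: Dict[str, int]) -> int:
    """Concatenate the fields, first field in the most significant bits."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)  # fields not supplied default to zero
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> Dict[str, int]:
    """Inverse of pack(): split a word back into named fields."""
    values: Dict[str, int] = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


# Hypothetical decoded instruction.
decoded = {"opcode": 0x23, "flags": 0b000000100001, "rsrc1": 4, "rdst": 9,
           "imm": 0x10, "num_rsrc": 1, "num_rdst": 1, "mem_size": 4}
word = pack(decoded)
flipped = unpack(word ^ 1)  # flip the least significant bit of the word
```

With this layout, flipping bit 0 lands in `mem_size`, which shows how a fault on a single decode signal maps back to a named field.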

Fault Injection Outcomes

[Figure: outcome tree. A fault leads to either SDC or Mask; each branch is further split into ITR+SDC, MayITR+SDC and Undet+SDC, and ITR+Mask, MayITR+Mask and Undet+Mask. Percentages appearing in the figure, in extraction order: 92%, 97%, 7%, 1%, 37%, 3%, 0%, 63%.]

Refer to the paper for other interesting fault injection results

Area Overhead

• The IBM S/390 G5 replicates the fetch and decode units (I-unit)

• ITR cache configuration similar to BTB in the picture (2K entries, 2-way assoc., 35 bits per entry)

• ITR cache is 1/7th the size of the I-unit

*From [slegel-IEEE-Micro-1999]

Power Overhead

• Redundant fetching is a major power overhead
• Compare I-cache of IBM POWER4 (64KB, direct-mapped, 128B line, 1 rd./wr. port) with ITR cache (16KB, 2-way, 4B line, 1 rd. and 1 wr. port)
• Excluded redundant decoding energy (more savings in reality)

[Figure: energy in mJ (y-axis, 0 to 100) per benchmark (bzip, gap, gcc, gzip, parser, perl, twolf, vortex, vpr, applu, apsi, art, equake, mgrid, swim, wupwise) for three configurations: ITR cache 1 rd/wr, ITR cache 1 rd + 1 wr, I-cache 1 rd/wr.]

Conclusion

• Inherent Time Redundancy is a new opportunity for efficient fault tolerance

• 96% of faults on decode signals detected by small ITR cache (~16 KB)

• Future work:
  – Handling benchmarks with less repetition
  – Exploiting ITR for other input-independent parts of the processor
  – Evaluating a comprehensive fault regimen that includes ITR