nc state university dsn 2007 © vimal reddy 2007 1 inherent time redundancy (itr): using program...
TRANSCRIPT
NC STATE UNIVERSITY
DSN 2007© Vimal Reddy 2007 1
Inherent Time Redundancy (ITR): Using Program Repetition for
Low-Overhead Fault Tolerance
Vimal Reddy
Eric Rotenberg
Center for Efficient, Secure and Reliable Computing (CESR)Dept. of Electrical & Computer EngineeringNorth Carolina State UniversityRaleigh, NC
NC STATE UNIVERSITY
DSN 2007© Vimal Reddy 2007 2
Processor Fault Tolerance
• Main approaches: Time or space redundancy– Explicitly duplicate in time or space
– Match values/signals
– E.g., AR-SMT (time), IBM S/390 G5 (space)
• High overheads– Area/power/performance
– Unsuitable for commodity high-performance processors
NC STATE UNIVERSITY
DSN 2007
New Idea: Inherent Time Redundancy
© Vimal Reddy 2007 3
==
==
==
==
==
==
==
==
program program duplicate program
Conventional time redundancy Inherent time redundancy
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 4
NC STATE UNIVERSITY
DSN 2007
Exploiting ITR for Fault Tolerance
• Key ideas:– Record input-independent signals for instructions
– Confirm their correctness upon repetition
• This paper focuses on decode signals– Provides protection for fetch and decode stages
• Longer term goal– Apply ITR to other input-independent stages
– My thesis combines ITR with other microarch.-level checks for a comprehensive fault regimen
© Vimal Reddy 2007 5
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 6
NC STATE UNIVERSITY
DSN 2007
ITR Cache for Checking Decode Signals
• Create decode signatures at trace granularity– Trace ends at branch or 16 instructions (arbitrary)
– Combine decode signals of instructions in a trace
• Record decode signatures in an ITR cache– Indexed by start PC (program counter) of a trace
• For each new trace, compare with ITR cache– Fault if new signature mismatches ITR cache
© Vimal Reddy 2007 7
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 8
NC STATE UNIVERSITY
DSN 2007
pc(I)
Scenario 1: ITR Cache Hit
© Vimal Reddy 2007 9
Signature GenerationIJK S2 (I, J, K)S2 (I, J, K)
Tags Signatures
S1 (I, J, K)pc(I)
HitI
≠
Fault Detected
Notes:-- Faults on traces that hit (S2) are detectable-- These faults are recoverable by flushing the processor pipeline and restarting from the faulting trace
ITR Cache
NC STATE UNIVERSITY
DSN 2007
Scenario 2: ITR Cache Miss Followed by Hit
© Vimal Reddy 2007 10
Signature GenerationIJK S1 (I, J, K)S1 (I, J, K)
Tags Signatures
S1 (I, J, K)pc(I)
Miss
pc(I)I
ITR Cache
NC STATE UNIVERSITY
DSN 2007
pc(I)
© Vimal Reddy 2007 11
Signature GenerationIJK S2 (I, J, K)S2 (I, J, K)
Tags Signatures
S1 (I, J, K)pc(I)
Hit
≠
Scenario 2: ITR Cache Miss Followed by Hit
Fault Detected
Notes:-- Faults on traces that miss but later hit (faulty S1) are detectable-- Faults on S1 need checkpoint for recovery or must abort
ITR Cache
NC STATE UNIVERSITY
DSN 2007
Scenario 3: ITR Cache Miss Followed by Eviction
© Vimal Reddy 2007 12
Signature GenerationIJK S1 (I, J, K)S1 (I, J, K)
Tags Signatures
S1 (I, J, K)pc(I)
Miss
pc(I)I
ITR Cache
NC STATE UNIVERSITY
DSN 2007
pc(M)
© Vimal Reddy 2007 13
Signature GenerationMNO S2 (M, N, O)S2 (M, N, O)
Tags Signatures
Evict
Notes:-- Faults on missed & evicted traces (S1) are not detected
Scenario 3: ITR Cache Miss Followed by Eviction
S2 (M, N, O)S1 (I, J, K)pc(M)pc(I)
Fault not detected
ITR Cache
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 14
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 15
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
Decode signals redirected for signature generation and ITR cache access
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 16
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
Signature generation: Bitwise XOR decode signals of instr. from a trace
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 17
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
Signatures dispatched into ITR ROB(ITR cache more effective without wrong-path trace signatures)
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 18
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
Access ITR cache with trace start PC
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 19
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
Signature mismatch, set retry bit -- Flush pipeline and retry from faulting trace
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 20
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
Fault re-detected, old signature is faulty -- Abort or rollback to a safe checkpoint
Fault disappears, successful recovery
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 21
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
ITR cache miss -- Write new signature to ITR cache
NC STATE UNIVERSITY
DSN 2007
Microarchitecture Extensions to Superscalar
© Vimal Reddy 2007 22
I$
Next PC Logic
PC
Decode
Rename
Store Buffer
Reorder Buffer (ROB)
FU FU FU
Memory Hierarchy
RegisterFile
Arch.State
ooo issue
dispatch
commit
loadreg
read
store regwriteback
BranchChkpts
Signature
ITR ROB
Start PCBranch or 16 instr.?
ITRCache
== checkrecord
Start PC Signature chk miss retry
No fault detected (or possible loss of detection due to miss-followed-by-eviction case)
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 23
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 24
NC STATE UNIVERSITY
DSN 2007
Results: Repetition in SPEC2K INT
© Vimal Reddy 2007 25
0
10
20
30
40
50
60
70
80
90
100
0 100 200 300 400 500 600 700 800 900 1000Number of static traces
bzipgapgccgzipparserperltwolfvortexvpr
% o
f tot
al d
ynam
ic in
stru
ctio
ns
NC STATE UNIVERSITY
DSN 2007© Vimal Reddy 2007 26
Results: Repetition in SPEC2K FP%
of t
otal
dyn
amic
inst
ruct
ions
0
10
20
30
40
50
60
70
80
90
100
0 50 100 150 200 250 300 350 400 450 500Number of static traces
appluapsiartequakemgridswimwupwise
NC STATE UNIVERSITY
DSN 2007© Vimal Reddy 2007 27
Results: Proximity in SPEC2K INT%
of t
otal
dyn
amic
inst
ruct
ions
# dynamic instructions separating repetitive traces
0
10
20
30
40
50
60
70
80
90
100
bzipgzipparsergapvprgcctwolfperlvortex
NC STATE UNIVERSITY
DSN 2007© Vimal Reddy 2007 28
Results: Proximity in SPEC2K FP
# dynamic instructions separating repetitive traces
% o
f tot
al d
ynam
ic in
stru
ctio
ns
0
10
20
30
40
50
60
70
80
90
100
appluapsiartequakemgridswimwupwise
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 29
NC STATE UNIVERSITY
DSN 2007
Fault Coverage Based on ITR Cache Misses• Loss in fault detection coverage
– Miss-followed-by-eviction causes loss in fault detection– (# of missed & evicted traces) x instructions/trace
• Loss in fault recovery coverage (w/o checkpoints)– Misses cause loss in fault recovery– (# of missed traces) x instructions/trace
• Tried various ITR cache configurations– Direct-mapped (DM), 2,4,8,16-way, fully-associative (FA)
– 256, 512 and 1024 signatures– Bzip, gzip, art, mgrid and wupwise had <1% loss in
coverage (results not shown)
© Vimal Reddy 2007 30
NC STATE UNIVERSITY
DSN 2007
Results: Loss in Fault Detection Coverage
© Vimal Reddy 2007 31
Coverage loss correlates with proximity of repetitione.g., perl and vortex
05
10152025303540455055
gap gcc parser perl twolfvortex vpr applu apsi equaswim
256 signatures
512 signatures
1024 signatures
Capacity is important for poor performers
For 2-way, 1024 signatures: 8% max. (vortex) and 1.3% avg. loss in detection coverage
% o
f to
tal d
yna
mic
inst
ruct
ion
s
NC STATE UNIVERSITY
DSN 2007
Results: Loss in Fault Recovery Coverage
© Vimal Reddy 2007 32
Recovery coverage loss is biggerthan loss indetection coverage
For 2-way, 1024signatures: 15% max. (vortex) and 2.5% avg. loss in recovery coverage
% o
f to
tal d
yna
mic
inst
ruct
ion
s
05
10152025303540455055
gap gcc parser perl twolfvortex vpr applu apsi equaswim
256 signatures
512 signatures
1024 signatures
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 33
NC STATE UNIVERSITY
DSN 2007
Fault Injection Experiments
© Vimal Reddy 2007 34
• Injected faults on decode signals of a cycle-accurate simulator
• Based on MIPS R10K processor
• 1000 random faults per benchmark
NC STATE UNIVERSITY
DSN 2007
Decode Signals Modeled
© Vimal Reddy 2007 35
Field Description Widthopcode instruction opcode 8
flags
decoded control flags (is_int, is_fp, is_signed/unsigned, is_branch, is_uncond, is_ld, is_st, mem_left/right, is_RR, is_disp, is_direct, is_trap)
12
shamt shift amount 5rsrc1 source register operand 5rsrc2 source register operand 5rdst destination register operand 5lat execution latency 2
imm immediate 16num_rsrc number of source operands 2num_rdst number of destination operands 1mem_size size of memory word 3
Total width 64
NC STATE UNIVERSITY
DSN 2007
Fault Injection Outcomes
© Vimal Reddy 2007 36
Fault
SDC Mask
ITR+SDC
MayITR+SDC
Undet+SDC
ITR+Mask
Undet+Mask
MayITR+Mask
92% 97%
7%
1%
37%
3%
0%
63%
Refer paper for other interesting fault injection results
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 37
NC STATE UNIVERSITY
DSN 2007Vimal Reddy © 2007 38
Area Overhead
• The IBM S/390 G5 replicates the fetch and decode units (I-unit)
• ITR cache configuration similar to BTB in the picture (2K entries, 2-way assoc., 35 bits per entry)
• ITR cache is 1/7th the I-unit
*From [slegel-IEEE-Micro-1999]
NC STATE UNIVERSITY
DSN 2007Vimal Reddy © 2007 39
Power Overhead• Redundant fetching is a major power overhead• Compare I-cache of IBM power4 (64KB, dm, 128B line, 1
rd./wr. port) with ITR cache (16KB, 2-way, 4B line, 1 rd. and 1wr. port)
• Excluded redundant decoding energy (more savings in reality)
0
10
20
30
40
50
60
70
80
90
100
bzip
gap
gcc
gzip
pars
er perl
twolf
vorte
xvp
r
applu ap
si art
equa
kem
grid
swim
wupwise
En
erg
y (m
J)
ITR cache 1rd/wrITR cache 1rd+1wrI-cache 1rd/wr
NC STATE UNIVERSITY
DSN 2007
Outline• Exploiting ITR for Fault Tolerance
• ITR Cache for Checking Decode Signals
• ITR Cache Hit/Miss Scenarios
• Microarchitecture Extensions to Superscalar
• Results– Repetition
– Fault Coverage Based on Cache Hits/Misses
– Fault Coverage Based on Fault Injection
• Area and Power Overhead
• Conclusion© Vimal Reddy 2007 40
NC STATE UNIVERSITY
DSN 2007
Conclusion
• Inherent Time Redundancy is a new opportunity for efficient fault tolerance
• 96% of faults on decode signals detected by small ITR cache (~16 KB)
• Future work:– Handling benchmarks with less repetition
– Exploiting ITR for other input-independent parts of the processor
– Evaluating a comprehensive fault regimen that includes ITR
© Vimal Reddy 2007 41