Reliability Aware Circuit Optimization
Submitted in partial fulfillment of the requirements for
the degree of
Doctor of Philosophy
in
Electrical and Computer Engineering
Kai-Chiang Wu
B.S., Computer Science, National Tsing Hua University
M.S., Computer Science, National Tsing Hua University
Carnegie Mellon University Pittsburgh, PA
August 2011
Acknowledgements
I am very grateful to my advisor, Prof. Diana Marculescu, for her guidance and support, professional and personal, throughout these years. I have been blessed to have her as my Ph.D. advisor at Carnegie Mellon. I deeply appreciate the experience and wisdom she imparted to me. Also, special thanks go to the sources of my financial aid, including the National Science Foundation, Carnegie Mellon CyLab, and the Liang Ji-Dian Fellowship. This dissertation would not have been possible without this assistance and support.
I would like to express my gratitude to the members of my thesis committee, Prof. Rob
Rutenbar, Prof. Shawn Blanton, Dr. Frank Liu at IBM, and Dr. Vikas Chandra at ARM, for
their valuable time and constructive feedback, which greatly enriched the dissertation from
various aspects.
To my research group, the EnyAC-ers, Natasa Miskov-Zivanov, Siddharth Garg, Puru Choudhary, Sebastian Herbert, Lavanya Subramanian, Da-Cheng Juan, Wan-Ping Lee, Ming-Chao Lee, and Yi-Lin Chuang: I am thankful for the many discussions we had, about our research and beyond. My time with you is an unforgettable memory.
I am forever grateful to my parents for their endless love and encouragement. Finally, I would like to thank my wife, Sung-En Huang, who has always been there and given me the courage to move forward.
Abstract
Due to current technology scaling trends such as shrinking feature sizes and decreasing supply voltages, nanoscale integrated circuits are becoming increasingly sensitive to radiation-induced transient faults (soft errors). Logical masking, electrical masking, and latching-window masking, which used to effectively prevent transient events in logic circuits from being latched into memory elements, are weakened by continuous scaling trends. Therefore, soft errors, which have long been a great concern in memories, are now a main factor in the reliability degradation of logic circuits. Unless explicitly dealt with, the soft error rate (SER) of logic is expected to become comparable to that of unprotected memories.
Negative Bias Temperature Instability (NBTI), a PMOS aging phenomenon causing significant loss of circuit performance and lifetime, is becoming a critical challenge for temporal reliability in nanoscale designs. In the literature, NBTI-induced PMOS aging has been demonstrated to be an exponential function of oxide thickness and operating temperature. With aggressive technology scaling trends such as thinner gate oxide without proportional downscaling of supply voltage, the need emerges for an optimization flow that considers NBTI effects during early design stages.
This dissertation presents low-cost methodologies for reducing circuit SER and mitigating NBTI-induced performance degradation. For SER reduction, three approaches based on redundancy addition and removal (RAR), selective voltage scaling (SVS), and clock skew scheduling (CSS) are proposed to provide compounding improvements. For NBTI mitigation, joint logic restructuring (LR) and pin reordering (PR) are exploited to combat performance degradation, with path sensitization explicitly considered. Finally, the recovery mechanism of NBTI and the use of reverse body bias are explored to achieve lifetime extension for power-gated circuits.
Table of Contents
Chapter 1 Introduction ..........................................................................................................1
  1.1 Thesis Motivation ...................................................................................................1
  1.2 Thesis Overview and Contribution ........................................................................6
Chapter 2 Background and Related Work ........................................................................11
  2.1 Soft Error Rate (SER) Modeling and Analysis .....................................................11
    2.1.1 Problem Statement .....................................................................................17
    2.1.2 Prior Work on SER Reduction (for Soft Error Tolerance) .........................18
  2.2 Negative Bias Temperature Instability (NBTI) Modeling and Analysis ...............20
    2.2.1 Problem Statement .....................................................................................22
    2.2.2 Prior Work on NBTI Mitigation (against NBTI-Induced Performance Degradation) ..............23
SER REDUCTION .....................................................................................................................25
Chapter 3 SER Reduction via Redundancy Addition and Removal (RAR) ...................26
  3.1 RAR-Based Approach for SER Reduction ............................................................32
    3.1.1 Wire Addition Constraint ...........................................................................35
    3.1.2 Wire Removal Constraint ...........................................................................38
    3.1.3 Topology Constraint on Candidate Addition and Removal .......................41
  3.2 Gate Resizing for SER Reduction ........................................................................50
  3.3 Experimental Results ...........................................................................................52
  3.4 Concluding Remarks ............................................................................................57
Chapter 4 SER Reduction via Selective Voltage Scaling (SVS) .......................................59
  4.1 Effects of Voltage Scaling .....................................................................................62
  4.2 Problem Formulation ...........................................................................................64
  4.3 Dual-VDD SER Reduction Framework .................................................................65
  4.4 Bi-Partitioning for Power-Planning Awareness ..................................................73
    4.4.1 Problem Description ..................................................................................73
    4.4.2 Cost Function .............................................................................................76
  4.5 Experimental Results ...........................................................................................79
  4.6 Concluding Remarks ............................................................................................87
Chapter 5 SER Reduction via Clock Skew Scheduling (CSS) .........................................89
  5.1 A Motivating Example ..........................................................................................91
    5.1.1 Implication-Based Masking .......................................................................95
    5.1.2 Mutually-Exclusive Propagation ...............................................................98
  5.2 Clock Skew Scheduling Based on Piecewise Linear Programming (PLP) ........101
    5.2.1 Problem Formulation ...............................................................................102
    5.2.2 Interaction with Other Techniques ...........................................................108
  5.3 Experimental Results .........................................................................................109
  5.4 Concluding Remarks ..........................................................................................114
  5.5 Impact of Technology Scaling and Process Variability on SER ........................115
NBTI MITIGATION ................................................................................................................117
Chapter 6 NBTI Mitigation via Joint Logic Restructuring (LR) and Pin Reordering (PR) ...................118
  6.1 Proposed Methodology ......................................................................................121
    6.1.1 Logic Restructuring .................................................................................122
    6.1.2 Pin Reordering .........................................................................................131
  6.2 Interplay between NBTI and Hot Carrier Injection (HCI) ................................133
  6.3 Experimental Results .........................................................................................134
  6.4 Concluding Remarks ..........................................................................................138
Chapter 7 NBTI Mitigation Considering Path Sensitization .........................................140
  7.1 Impact of Path Sensitization on Aging-Aware Timing Analysis .........................141
    7.1.1 Sensitizable Paths vs. False Paths ...........................................................141
    7.1.2 Aging-Aware Timing Analysis Considering Path Sensitization ..............143
  7.2 Proposed Methodology for Aging-Aware Timing Optimization .........................146
    7.2.1 Efficient Identification of Critical Sub-Circuits Considering Path Sensitization ..............147
    7.2.2 Achieving Full Coverage of Critical Sensitizable Paths ..........................152
    7.2.3 Proposed Algorithm Description .............................................................154
    7.2.4 Impact of Process Variability ...................................................................156
  7.3 Experimental Results .........................................................................................157
  7.4 Concluding Remarks ..........................................................................................161
Chapter 8 NBTI Mitigation for Power-Gated Circuits ..................................................163
  8.1 Aging Analysis for Power-Gated Circuits ..........................................................167
    8.1.1 NBTI Degradation Model for Logic Networks .......................................167
    8.1.2 NBTI Degradation Model for Sleep Transistors ......................................167
  8.2 Lifetime Extension for Power-Gated Circuits ....................................................172
    8.2.1 Problem Formulation ...............................................................................173
    8.2.2 Exploring NBTI Recovery via ST Redundancy ......................................174
    8.2.3 Applying Reverse Body Bias ...................................................................177
  8.3 Experimental Results .........................................................................................179
  8.4 Concluding Remarks ..........................................................................................184
Chapter 9 Summary...........................................................................................................186
Bibliography .........................................................................................................................188
Glossary (Index of Terms)...................................................................................................195
List of Figures
Figure 1-1: Thesis scope for SER reduction .............................................................................7
Figure 1-2: Thesis scope for NBTI mitigation..........................................................................9
Figure 2-1: An example circuit (C17) from the ISCAS’85 benchmark suite .........................15
Figure 2-2: Duration ADDs for a glitch originating at gate G2, and passing through gates G3 and G5, respectively .............................................16
Figure 3-1: Duration ADDs associated with mean masking impact on duration of gate G5 ..29
Figure 3-2: An example of redundancy addition and removal [46]........................................33
Figure 3-3: Changes in MEI and MMI after adding wire w (s → t) .......................................35
Figure 3-4: Changes in MEI and MMI after removing wire w’ (u → v) ................................38
Figure 3-5: An example of Constraint 4 and the effect of redundancy on soft error robustness ......42
Figure 3-6: The overall algorithm of our RAR-based approach for SER reduction...............49
Figure 3-7: Output failure probabilities of all primary outputs before and after optimization ......55
Figure 3-8: SER-aware optimization using: (i) the proposed RAR-based approach only (blue), (ii) the gate resizing strategy only (purple), and (iii) integrated RAR and gate resizing methodology (yellow) ............................56
Figure 4-1: HSPICE simulations for glitch generation and propagation: the plots on the top are for the low supply voltage (1.0V) and those on the bottom are for the high supply voltage (1.2V). ...............................63
Figure 4-2: An illustrative example of scaling criticality (SC): SC(G2) estimates the decrease in MEI of gate G1 after gate G2 has been scaled up to VDDH ......................66
Figure 4-3: Effects of two refinement techniques: in both cases, the numbers of required LCs decrease by one in terms of output loading. .........70
Figure 4-4: The overall algorithm of our SVS-based approach for SER reduction................72
Figure 4-5: An example of a move in the FM-based bi-partitioning framework: switch the supply voltage of gate G3 from VDDH to VDDL ....................................76
Figure 4-6: Cost function: a weighted combination of the cut size (|cut|) and the number of required LCs (#LC) ..................................77
Figure 4-7: The proposed FM-based methodology for power-planning awareness ...............78
Figure 4-8: SER reduction vs. power and delay overheads....................................................84
Figure 4-9: Mean error impact (MEI) distributions................................................................85
Figure 4-10: SER reduction with different lower and upper bounds......................................87
Figure 5-1: An example circuit (s27) from the ISCAS’89 benchmark suite ..........................92
Figure 5-2: Overlapping of error-latching windows...............................................................94
Figure 5-3: Illustrative relationships between a pair of flip-flops (X and Y) as candidates for clock skew scheduling .............................................................98
Figure 5-4: Generalized clock skew scheduling of a candidate pair of flip-flops (FFi and FFj) for MBU-aware soft error tolerance .......................103
Figure 5-5: fij versus sij, with four intervals that are piecewise linear: sij = (di – dj) – (tsu + th), sij = (di – dj), and sij = (di – dj) + (tsu + th) ....................106
Figure 5-6: SER reduction vs. normalized absolute adjustment in clock signal ..................113
Figure 5-7: Mitigation of MBU effects during clock cycles subsequent to particle hits ......114
Figure 6-1: NBTI effect vs. signal probability......................................................................119
Figure 6-2: NBTI effect vs. transistor stacking ....................................................................121
Figure 6-3: A supergate (SG) and its most critical path segment (MCPS) ...........................124
Figure 6-4: An example of logic restructuring......................................................................130
Figure 6-5: The overall algorithm for NBTI mitigation .......................................................133
Figure 6-6: Recovery of NBTI-induced performance degradation ......................................137
Figure 6-7: Number of critical PMOS transistors vs. stress probability...............................138
Figure 7-1: Criteria of path sensitization ..............................................................................142
Figure 7-2: A longest topological path that is false (un-sensitizable)...................................143
Figure 7-3: An example circuit (C17) for illustrating our methodology ..............................148
Figure 7-4: A case of missing sensitizable paths ..................................................................153
Figure 7-5: The overall algorithm for aging-aware timing optimization..............................155
Figure 7-6: Aging-aware timing optimization with path sensitization considered...............159
Figure 7-7: Incremental recovery of aging-induced performance degradation ....................161
Figure 8-1: A header-based power gating structure ..............................................................164
Figure 8-2: Analysis results of the proposed model for power-gated circuits ......................171
Figure 8-3: HSPICE validation with a chain of inverters .....................................................172
Figure 8-4: NBTI-aware power gating design......................................................................175
Figure 8-5: Aging behaviors of PMOS transistors with different Vth values ........................178
Figure 8-6: Comparison of aging behaviors with various settings .......................................180
Figure 8-7: Lifetime vs. Vb (bulk voltage) ............................................................................181
Figure 8-8: Lifetime vs. ST redundancy ...............................................................................183
List of Tables
Table 3-1: MEI and MMI of gates in Figure 3-5: the second and third columns are for gates in Figure 3-5(a), the fourth and fifth for gates in Figure 3-5(b), and the sixth to eighth for gates in Figure 3-5(c). ..........................................................47
Table 3-2: Average mean error susceptibility (MES) improvement and overall soft error rate (SER) reduction............................................................54
Table 4-1: Average mean error susceptibility (MES) improvement and overall soft error rate (SER) reduction............................................................80
Table 5-1: Average mean error susceptibility (MES) improvement and overall soft error rate (SER) reduction..........................................................111
Table 6-1: Recovery of NBTI-induced performance degradation ........................................136
Table 7-1: Aging-aware timing analysis with and without path sensitization considered ....145
Table 7-2: Aging-aware timing optimization with path sensitization considered.................158
Table 8-1: Optimization results of lifetime and leakage .......................................................182
Chapter 1 Introduction
1.1 Thesis Motivation
Circuit reliability, usually measured in failures in time (FIT), has become a critical challenge for achieving robustness in nanoscale designs. The 2009 International Technology Roadmap for Semiconductors (ITRS) [1] projects that the long-term reliability of sub-100nm integrated circuits is on the order of 1000 FITs (failures per billion hours of operation). Soft errors, process variations, and device aging phenomena are currently some of the main factors in reliability degradation. With the continuous scaling of transistor dimensions, soft errors, which cause unpredictable transient circuit failures, are becoming increasingly dominant for functional reliability concerns [2]. On the other hand, device aging phenomena, which cause significant loss of circuit performance and lifetime, are becoming increasingly dominant for temporal reliability concerns [3]. Therefore, the need emerges for an optimization flow that considers soft errors and aging effects in early design stages.
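As a rough sanity check on what such a FIT budget implies, the failure rate can be converted to a mean time to failure (MTTF). The short sketch below is illustrative only; the 1000-FIT figure is the ITRS projection quoted above, and the conversion itself is just the definition of the FIT unit.

```python
# FIT (failures in time) counts failures per billion (1e9) device-hours,
# so a FIT budget converts to mean time to failure by simple division.

def fit_to_mttf_hours(fit_rate: float) -> float:
    """Mean time to failure (hours) for a device with the given FIT rate."""
    return 1e9 / fit_rate

mttf_hours = fit_to_mttf_hours(1000.0)   # the ITRS long-term projection
mttf_years = mttf_hours / (24 * 365)     # ~114 years per part
print(f"1000 FITs -> MTTF of {mttf_hours:.0f} hours (about {mttf_years:.0f} years)")
```

Note that a system integrating many such parts fails far sooner than any single part, which is why per-device FIT budgets must be so tight.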
A radiation-induced charged particle passing through a microelectronic device ionizes
the material along its path and generates free pairs of electrons and holes. The free (ionized)
carriers deposited around the particle track can be attracted or repelled by an internal electric
field of the device and lead to an electrical pulse, referred to as a single-event transient (SET)
or a glitch. A single-event upset (SEU) or a soft error refers to transient bit corruption that
occurs when a single-event transient is large enough to flip the state of a storage node. The
rate at which soft errors occur is called soft error rate (SER).
Traditionally, soft errors in both static and dynamic memories have drawn much attention due to their regularity and vulnerability. Unlike SETs in logic, which need to be propagated to outputs before being captured, soft errors occur in memories whenever particles with high enough energy strike. During SET propagation, three mechanisms have traditionally provided logic circuits with effective protection against soft errors:
1) Logical masking: A SET that is not on a sensitized path from the location where it originates is logically masked. Once a SET is logically masked, it no longer has any influence on the target circuit; i.e., both its amplitude and duration become zero.
2) Electrical masking: A SET that is attenuated until it becomes too small in amplitude or duration to be latched is electrically masked. While a SET may still be latched if its attenuated amplitude and duration remain large enough, electrical masking reduces the overall impact of SETs.
3) Latching-window (timing) masking: A SET that does not arrive “on time” is also masked, depending on the setup and hold times of the target memory element. The basic condition for a SET to be latched is that its duration be greater than the sum of the setup and hold times and that it reach the memory element during the latching window.
These three mechanisms prevent some SETs from being latched and alleviate the effects of soft errors in digital systems. However, continuous scaling trends have a negative impact on all three. Decreasing gate count and logic depth in super-pipelined designs reduce the probability of logical masking, since the path from where a SET originates to a latch is more easily sensitized. The lower supply voltage and node capacitance needed by ultra-low power designs not only decrease the critical charge for SETs, but also diminish the pulse attenuation due to electrical masking. Higher clock frequency increases the number of latching windows per unit of time and thus facilitates SET latching. As a result, soft errors in logic become as great a concern as in memories, where soft errors can be mitigated by conventional error detecting and correcting codes. A recent study [4] showed that soft errors significantly degrade the robustness of logic circuits, while the nominal SER of SRAMs tends to be nearly constant from 130nm to 65nm technologies. Unless explicitly dealt with, the SER of logic circuits was predicted to be comparable to that of unprotected memory elements by 2011 [5]. Therefore, not only mission-critical applications, but also mainstream commercial applications should be capable of soft error tolerance/resilience.
As for device aging, negative bias temperature instability (NBTI), on which this thesis work focuses, is known to prevail over other device aging phenomena. NBTI [6] is a PMOS aging phenomenon that occurs when PMOS transistors are stressed under Negative Bias (Vgs = -Vdd) at elevated Temperature. NBTI-induced PMOS aging refers to the generation of interface traps along the silicon-oxide (Si-SiO2) interface due to the dissociation of Si-H bonds. These traps manifest themselves as an increase in the magnitude of the PMOS threshold voltage (|Vth|, by as much as 50mV over 10 years [7]), which in turn slows down the rising propagation of logic gates. If the performance degradation continues and finally exceeds a tolerable limit, the circuit lifetime is also affected, since the timing specification is no longer met. Conversely, the aging mechanism can be partially reversed by annealing the generated interface traps when the stress condition is relaxed (Vgs = 0).
At older technology nodes, the NBTI problem is not severe because the electric field across the gate oxide is small. However, as technology scaling proceeds aggressively, e.g., thinner gate oxide without proportional downscaling of supply voltage and higher operating temperature due to higher power density, the dissociation of Si-H bonds is accelerated and thus the rate of NBTI-induced performance degradation increases. Experiments on PMOS aging [8] indicate that NBTI effects grow exponentially with thinner gate oxide and higher operating temperature. If the thickness of the gate oxide shrinks down to 4nm, the circuit performance can degrade by as much as 15% after 10 years of stress, and lifetime will be dominated by NBTI [9].
In addition to oxide thickness and operating temperature, NBTI-induced performance degradation strongly depends on the amount of time during which a PMOS transistor is stressed. In [10][11][12], the increase in threshold voltage has been demonstrated to be a logarithmic function of the corresponding stress time. A PMOS under DC stress (i.e., duty cycle = 1) suffers from static NBTI and ages very rapidly. Under a real AC stress condition (i.e., duty cycle < 1), the NBTI impact is periodic and can be recovered, which results in a lower extent of degradation. The stress time of a PMOS under AC stress is associated with its stress probability, that is, the probability that Vgs is equal to -Vdd. For a NAND gate with parallel pull-up PMOS transistors, the stress probability of any PMOS is simply the probability of its input signal being logic “0”; for a NOR gate with series pull-up PMOS transistors, the stress probability of a PMOS is the product of the probabilities that its own input and the input(s) to its upper PMOS transistor(s) in the stack are logic “0”. This parameter, determined by the circuit topology and input vectors, is distributed non-uniformly from transistor to transistor. The asymmetric distribution may lead to a 2-5X difference in the degradation rate of threshold voltage [13].
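The stress-probability rules for NAND and NOR pull-up networks described above can be captured in a few lines of code. The sketch below is a simplified illustration under an input-independence assumption; the logarithmic aging model and its fitting constants (a_mv, t0) are hypothetical placeholders, not the calibrated models of [10][11][12].

```python
import math

def p_zero(p_one: float) -> float:
    # Signal probability is conventionally P(logic "1"); a PMOS gate input
    # stresses the transistor when it is logic "0".
    return 1.0 - p_one

def nand_pullup_stress(p_ones):
    """Parallel pull-up (NAND): each PMOS is stressed whenever its own
    input is logic "0"."""
    return [p_zero(p) for p in p_ones]

def nor_pullup_stress(p_ones):
    """Series pull-up (NOR), inputs ordered from the transistor nearest VDD
    downward: a PMOS sees Vgs = -Vdd only when its own input is "0" and all
    transistors above it in the stack conduct (their inputs are "0").
    Independent input signals are assumed in this sketch."""
    stress, acc = [], 1.0
    for p in p_ones:
        acc *= p_zero(p)
        stress.append(acc)
    return stress

def delta_vth_mv(t_hours, stress_prob, a_mv=8.0, t0=1.0):
    """Toy logarithmic aging model (hypothetical constants): the threshold
    shift grows with the logarithm of the effective stress time."""
    return a_mv * math.log(1.0 + stress_prob * t_hours / t0)

# Two-input gates with 50/50 inputs: the NAND pull-ups are stressed
# uniformly, while the lower PMOS in the NOR stack is stressed less.
print(nand_pullup_stress([0.5, 0.5]))   # [0.5, 0.5]
print(nor_pullup_stress([0.5, 0.5]))    # [0.5, 0.25]
```

The non-uniform stress along the NOR stack is one source of the asymmetric, transistor-to-transistor degradation the text refers to, and it is the kind of asymmetry that pin reordering (Chapter 6) exploits.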
1.2 Thesis Overview and Contribution
Having discussed the importance of soft errors and NBTI in logic, which motivates the work on “reliability aware circuit optimization” (as this dissertation is titled), the main goal of this dissertation research is to develop a low-cost, integrated framework that, given a logic circuit, can optimize both its (i) functional reliability, by reducing the overall SER, and (ii) temporal reliability, by mitigating NBTI-induced performance degradation.
The scope of my thesis for SER reduction is outlined in Figure 1-1. Three approaches for SER reduction are presented. The first one, based on redundancy addition and removal (RAR), estimates the effects of redundancy manipulations and accepts only those with a positive impact on circuit SER. Several metrics and constraints are proposed to guide the RAR algorithm toward SER reduction in an efficient manner. The second approach, based on selective voltage scaling (SVS), assigns a higher supply voltage to gates that have a large error impact and contribute most to the overall SER. The number of gates operating at the higher voltage level, positively correlated with the power overhead, can be bounded by the appropriate use of level converters. The third approach, based on clock skew scheduling (CSS),
[Figure 1-1: Thesis scope for SER reduction. Redundancy addition and removal (RAR) changes the structure of the logic block; selective voltage scaling (SVS) involves modification of the power distribution; clock skew scheduling (CSS) involves modification of the clock network.]
adjusts the arrival times of clock signals to memory elements (latches or flip-flops) such that
the probability of capturing unwanted transient pulses is decreased, as a result of more latch-
ing-window masking.
The major advantages over existing techniques are twofold: (i) lower design costs and (ii) compounding results. Unlike some existing SER reduction techniques based on duplication or resizing, which monotonically increase hardware resources without eliminating any, the RAR-based approach focuses on restructuring the combinational block of a logic circuit and incurs very little area overhead. By bounding the number of gates operating at the high supply voltage using level converters, the SVS-based approach significantly decreases the power overhead and introduces only a marginal delay penalty. As a post-processing procedure, the CSS-based approach involves only a minor degree of clock network modification without touching the logic block; thus, existing SER benefits from the two aforementioned approaches or from other techniques such as duplication and resizing, if applied, will not be affected.
The scope of my thesis for NBTI mitigation is outlined in Figure 1-2. Joint logic restructuring and pin reordering are first exploited to combat performance degradation. Based on detecting functional symmetries and transistor stacking effects, the proposed methodology involves only wire perturbation and introduces no gate area overhead; therefore, it can be adopted as a pre-processing step when considering path sensitization for more accurate optimization. It has been shown that mitigating aging effects while ignoring path sensitization may lead to underestimation of circuit lifetime, thus pointing to the need to consider path sensitization for aging-aware optimization as the impact of device aging becomes more severe. Finally, by exploiting the recovery mechanism of NBTI, a scheduling algorithm that minimizes the NBTI effects on the sleep transistors of a power-gated circuit is developed to extend its lifetime for a longer period of reliable operation.
The salient feature of overall research contributions is that none of the reliability-aware
[Figure 1-2: Thesis scope for NBTI mitigation. Joint logic restructuring (LR) and pin reordering (PR) are applied to the combinational block; path sensitization is considered for more accurate and effective optimization; redundant sleep transistors with RBB are applied to the power gating network.]
optimization techniques described above involves aggressive changes in logic circuits: all of them incur modest and affordable design penalties while remarkably improving circuit reliability. Furthermore, since all of the proposed approaches can be embedded in existing design flows, they can synergistically provide additive improvements when used together or in conjunction with other techniques.
The rest of this dissertation is organized as follows: Chapter 2 reviews the background of reliability modeling and analysis for SER (Chapter 2.1) and for NBTI (Chapter 2.2), and also gives an overview of related work on SER reduction and NBTI mitigation. Three approaches for SER reduction, based on redundancy addition and removal, selective voltage scaling, and clock skew scheduling, are presented in Chapter 3, Chapter 4, and Chapter 5, respectively. An NBTI mitigation framework employing joint logic restructuring and pin reordering is explained in Chapter 6; the NBTI-aware methodology is extended to consider path sensitization in Chapter 7; in Chapter 8, a novel strategy addressing the NBTI issue in power-gated circuits is proposed. Finally, Chapter 9 summarizes this thesis work.
Chapter 2 Background and Related Work
Used throughout this dissertation for our objective of reliability optimization, the modeling and analysis frameworks for SER and NBTI are introduced in Chapter 2.1 and Chapter 2.2, respectively. Each is followed by a general statement of the corresponding optimization problem and ends with an overview of prior solutions.
2.1 Soft Error Rate (SER) Modeling and Analysis
Analyzing the soft error rate of a circuit accurately and efficiently is a crucial step for
SER reduction. Intensive research has been done so far in the area of SER modeling and
analysis. Among various existing modeling frameworks, we choose the symbolic one
presented in [14]-[19] as the SER analysis engine. This symbolic SER analyzer, which provides
a unified treatment of three masking mechanisms through decision diagrams, enables us to
quantify the error impact and the masking impact of each gate in logic circuits. Hence, all
masking mechanisms, rather than one or two of them, are jointly considered as criteria for
SER reduction. To model a transient glitch originating at gate G to be latched at output F, the
following events can be defined:
A (Amplitude condition): The amplitude of a glitch at the output is larger than the
switching threshold of the latch (if the correct output value is “0”) or smaller than
the switching threshold (if the output value is “1”).
D (Duration condition): The duration of a glitch at the output is larger than the sum of
setup and hold times of the latch.
T (Timing condition): The glitch appears at the output on time; more specifically, it
satisfies the setup time and hold time requirements when the rising edge of the
clock occurs.
In this model, logical and electrical masking are implicitly included in A and D, while
latching-window masking is included in T. More formally, one can express these events as
follows:
A: A > Vs (if the correct output is “0”) or
A < Vs (if the correct output is “1”)
where A is the amplitude of the glitch and Vs is the switching threshold of the latch.
D: D > tsetup + thold
where D is the duration of the glitch, and tsetup and thold are the setup and hold times
of the latch.
T: t ∈ [T + thold – tp – D, T – tsetup – tp]
where t is the time when the initial glitch occurs, tp is the propagation delay from
gate G to output F, and T is the moment of a latch trigger (i.e., a clock edge).
The three events are necessary conditions for a soft error to happen. In addition, D is
satisfied only if A is satisfied (i.e., D ⊂ A). Under the assumption that t is uniformly distrib-
uted [20], the probability that a soft error occurs can be derived as:
P(A ∩ D ∩ T) = P(D ∩ T) = P(T | D)⋅P(D)
  = Σk P(t ∈ [T + thold – tp – Dk, T – tsetup – tp] | D = Dk)⋅P(D = Dk)
  = Σk [(Dk – tsetup – thold) / (Tclk – dinit)]⋅P(D = Dk)    (1)
where {Dk} is the set of possible glitch durations, Tclk is the clock period, dinit is the initial
glitch duration, and t is uniformly distributed in the interval [T, T + Tclk – dinit].
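As a sanity check of Equation (1), the worst-case latching probability can be computed directly from a discrete duration distribution. The sketch below uses placeholder timing values, not data from this dissertation:

```python
# Sketch of Equation (1): worst-case probability that a glitch is latched,
# given a discrete distribution over attenuated glitch durations {Dk}.
# All timing values below are illustrative placeholders.

def latching_probability(duration_dist, t_setup, t_hold, t_clk, d_init):
    """duration_dist: dict mapping duration Dk -> P(D = Dk)."""
    p_error = 0.0
    for d_k, p_k in duration_dist.items():
        if d_k <= t_setup + t_hold:
            continue  # too short: always masked by the latching window
        # Overlap of the latching window, normalized by the length of the
        # uniform arrival interval [T, T + Tclk - dinit].
        p_latch = (d_k - t_setup - t_hold) / (t_clk - d_init)
        p_error += min(p_latch, 1.0) * p_k
    return p_error

# Example: two possible attenuated durations (arbitrary normalized values).
p = latching_probability({0.10: 0.6, 0.25: 0.4},
                         t_setup=0.02, t_hold=0.02, t_clk=1.0, d_init=0.05)
```

Since this is the worst-case overlap, the result is an upper bound on the true latching probability, matching the discussion below.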
Equation (1) is the worst-case derivation where [T + thold – tp – D, T – tsetup – tp] lies in [T,
T + Tclk – dinit], leading to the largest overlap between two intervals. In other words, the error
probability obtained by Equation (1) provides an upper bound on SER analysis. To find out
the possible values for duration, {Dk}, the attenuation model in [20] depending mainly on
gate propagation delay is used. To determine the probability of having a glitch with duration
Dk, the authors of [14][15] employ binary decision diagrams (BDDs) and algebraic decision
diagrams (ADDs). The detailed methodology of [14][15] is described next.
Terminal node “0” of the ADD associated with a gate represents all cases where a glitch
is logically or electrically masked; other terminal nodes represent the remaining values for
duration or amplitude after a glitch passes through the gate. The initial ADD of each gate is
built for the glitch originating at that gate. It consists of only one terminal node – initial
duration or amplitude value. These initial ADDs are propagated to respective fanout gates,
which use them to create new ADDs based on the attenuation model and related sensitization
BDDs.
Sensitization BDDs include information about logical masking. The sensitization BDD
of gate G to gate G’ is just the Boolean difference of G’ with respect to G (∂G’/∂G). Input
vectors that make the sensitization BDD of path G → G’ go to terminal node “0” logically
mask glitches from gate G at gate G’. Therefore, only paths ending up in terminal node “1”
of the sensitization BDD and a node different from “0” of the associated ADD, need to be
considered for calculating new values relying on the attenuation model. All other cases,
which indicate either logical or electrical masking, go to terminal node “0”. Figure 2-2 dem-
onstrates the overall process of building duration ADDs for a glitch originating at gate G2 in
Figure 2-1.
Figure 2-1: An example circuit (C17) from the ISCAS’85 benchmark suite
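To make the sensitization condition concrete: the Boolean difference ∂G’/∂G is the XOR of the two cofactors of G’ with respect to G. The toy sketch below evaluates it by exhaustive enumeration rather than with BDDs, on a made-up two-input fragment (not the actual C17 netlist):

```python
# Illustrative computation of the Boolean difference dG'/dG used for
# sensitization: G' is sensitized to G exactly under those input vectors
# for which flipping G's value flips G''s output. This toy version
# enumerates input vectors instead of building BDDs.
from itertools import product

def boolean_difference(g_prime, num_inputs):
    """g_prime(inputs, g_value) -> output of G' given the primary inputs
    and the value injected at gate G. Returns sensitizing vectors."""
    sensitized = []
    for vec in product([0, 1], repeat=num_inputs):
        if g_prime(vec, 0) != g_prime(vec, 1):  # XOR of the two cofactors
            sensitized.append(vec)
    return sensitized

# Made-up fragment: G' = (a AND G) OR b, sensitized to G iff a=1 and b=0.
vecs = boolean_difference(lambda v, g: (v[0] & g) | v[1], num_inputs=2)
```

Vectors outside the returned set are exactly those that logically mask glitches from G at G’.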
Based on Equation (1), a key metric, mean error susceptibility (MES), for evaluating the
soft error rate of a circuit can be defined as follows: For each primary output Fj, initial dura-
tion d and initial amplitude a, MES(Fj) is the probability of output Fj failing due to errors at
internal gates. More formally, MES(Fj) can be expressed as:
MESd,a(Fj) = Σk=1..nf Σi=1..nG P(Fj fails ∩ Gi fails(init_glitch(d, a))) / (nG⋅nf)    (2)
where nG is the cardinality of the set of internal gates in the circuit, {Gi}, and nf is the cardi-
nality of the set of input probability distributions, {fk}.
In [14], the authors compute the MES value of each primary output in combinational
logic for a discrete set of pairs (d, a) of initial glitch durations and amplitudes. Therefore, the
probability of output Fj failing due to glitches with various durations and amplitudes at
different internal gates is:

P(Fj) = Σn Σm MESd,a(Fj)⋅Δd⋅Δa / [(dmax – dmin)⋅(amax – amin)],
where d = dmin + n⋅Δd and a = amin + m⋅Δa    (3)

Figure 2-2: Duration ADDs for a glitch originating at gate G2, and passing through gates G3 and G5, respectively
Finally, the soft error rate (SER) of primary output Fj can be derived as:
SER(Fj) = P(Fj)⋅RPH⋅REFF⋅ACIRCUIT    (4)
where RPH is the particle hit rate per unit of area, REFF is the fraction of particle hits that
result in charge disturbance, and ACIRCUIT is the total silicon area of the circuit.
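Equations (3) and (4) amount to a grid average followed by a linear scaling. A minimal sketch, with `mes` standing in for a precomputed MES table and every constant a placeholder:

```python
# Sketch of Equations (3)-(4): averaging MES over a (duration, amplitude)
# grid, then scaling by particle-hit parameters. The mes callable and all
# numeric constants below are illustrative stand-ins.

def output_failure_probability(mes, d_min, d_max, a_min, a_max, dd, da):
    """Equation (3): grid average of MES over durations and amplitudes."""
    total = 0.0
    n_d = int((d_max - d_min) / dd)
    n_a = int((a_max - a_min) / da)
    for n in range(n_d):
        for m in range(n_a):
            total += mes(d_min + n * dd, a_min + m * da) * dd * da
    return total / ((d_max - d_min) * (a_max - a_min))

def ser(p_fail, r_ph, r_eff, a_circuit):
    """Equation (4): SER(Fj) = P(Fj) * RPH * REFF * ACIRCUIT."""
    return p_fail * r_ph * r_eff * a_circuit

# A constant MES of 0.2 over the grid averages back to 0.2, as expected.
p = output_failure_probability(lambda d, a: 0.2, 0.0, 1.0, 0.0, 1.0, 0.25, 0.25)
```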
2.1.1 Problem Statement
Typically, two types of methods are used for soft error hardening (i.e., SER reduction).
The first, fault avoidance, consists of minimizing the occurrence of SETs at the most
sensitive nodes, which in effect reduces SET generation. The second, fault correction,
attempts to maximize the probabilities of three masking mechanisms, which reduces the
likelihood of generated SETs being latched. The objective of our SER reduction framework,
as explained later in Chapter 3 - Chapter 5, is to achieve the highest level of soft error toler-
ance by enhancing the circuit robustness/resilience to SETs/SEUs while incurring relatively
low design penalties. On one hand, belonging to the first category (fault avoidance), we
manipulate the operating voltage for smaller SET generation so as to make circuits more
robust to particle hits. On the other hand, belonging to the second category (fault correction),
we modify the logic structure and clock network for higher masking probabilities so as to
make circuits more resilient to already-existing SETs and SEUs as a result of particle hits.
2.1.2 Prior Work on SER Reduction (for Soft Error Tolerance)
Triple modular redundancy (TMR), consisting of three identical copies of an original
circuit feeding a majority voter, is the most well-known technique to realize soft error toler-
ance. However, TMR is extremely expensive and offers more protection than transient faults require. To reduce the
overall cost, partial duplication [21] and gate resizing [22] strategies target only nodes with
high error susceptibility and ignore nodes with low error susceptibility. A potentially large
overhead in area and power is still needed for a higher degree of soft error tolerance. In [23],
voltage assignment is exploited to enhance the circuit robustness to soft errors. This method
trades power penalty for SER reduction by applying a higher supply voltage to a certain
portion of gates. A related method [24] uses optimal assignments of gate size, supply voltage,
threshold voltage, and output capacitive load to get better results with smaller area overhead.
Nevertheless, such a method increases design complexity and may make resulting circuits
hard to optimize at the physical design stage. Approaches based on rewiring or resynthesis
[25][26] can achieve relatively smaller SER improvement while incurring little overhead.
Sequential circuits, as opposed to combinational circuits, have received less attention in
terms of soft error tolerance. Since a sequential circuit has a feedback loop leading back to
state inputs of the circuit, it is possible that errors latched at state lines propagate through the
circuit for multiple clock cycles. Therefore, SER-aware sequential circuit optimization
should consider transient faults during successive cycles. The intuitive way to address this
problem is by replacing sequential elements with hardened latches or flip-flops that are less
sensitive to soft errors, as developed in [27]. A flip-flop sizing scheme [28] increases the
probability of latching-window (timing) masking by lengthening the latching window inter-
vals of vulnerable flip-flops. However, this scheme does not take into account logical mask-
ing and electrical masking, which are also important factors in determining circuit SER. To
deal with this, the authors of [29] proposed a hybrid approach combining gate and flip-flop
sizing (selection) to obtain more SER reduction. In [30], gates are locally relocated such that,
for each gate, delays to different outputs are balanced as much as possible. In effect, this
strategy minimizes the probability that an error originating at a gate is registered by any of
the flip-flops. The error, however, may reach more than one output simultaneously due to
balanced path delays and be registered by multiple flip-flops, resulting in so-called
multiple-bit upsets (MBUs). For sequential circuits, MBUs imply that there will be multiple errors
propagating in subsequent cycles, further degrading circuit reliability. This is a crucial reli-
ability concern in sequential circuits that has not been addressed so far.
Instead of exploring spatial redundancy as mentioned above, several techniques for soft
error hardening based on temporal redundancy were presented in [31][32]. Nevertheless,
such techniques employing time-domain majority voting are very sensitive to delay varia-
tions and fail to cope with large-duration SETs because a sufficiently large slack time is
required.
2.2 Negative Bias Temperature Instability (NBTI) Modeling and
Analysis
The NBTI modeling and analysis framework used in this work is the one developed in
[12][13][33][34]. The framework provides a mathematical model, taking into account both
aging and recovery mechanisms, for predicting the long-term PMOS degradation due to
NBTI.
First, the degradation of threshold voltage at a given time t can be predicted as:
ΔVth = (Kv²⋅α⋅Tclk / (1 – βt^(1/2n)))^n    (5)

where Kv is a function of temperature, electrical field, and carrier concentration, α is the
stress probability, n is the time-exponential constant (0.16 for the technology used), and

βt = 1 – [2⋅ξ1⋅te + √(ξ2⋅C⋅(1 – α)⋅Tclk)] / [2⋅tox + √(C⋅t)]
The detailed explanation of each parameter can be found in [33].
Next, the authors of [34] simplify this predictive model to be:
ΔVth = b⋅α^n⋅t^n = b⋅(α⋅t)^n    (6)

where b = 3.9×10⁻³ V·s^(–1/6).
Finally, the rising propagation delay of a gate through the degraded PMOS can be de-
rived as a first-order approximation:
τp′ = τp + a⋅(α⋅t)^n    (7)

where τp is the intrinsic delay of the gate without NBTI degradation and a is a constant.
We apply Equation (7) to calculate the delay of each gate under NBTI, and then estimate
the performance of a circuit. The coefficient a in Equation (7) for each gate type and each
input pin is extracted by fitting SPICE simulation results in 65nm, Predictive Technology
Model (PTM) [35]. The simplified model successfully analyzes the long-term behavior of
NBTI-induced PMOS degradation with negligible error, within 5% versus the cycle-by-cycle
(short-term) simulation. Hence, the performance (timing) estimation in our methodology is
more accurate and efficient than those in existing techniques which ignore the recovery
mechanism or employ expensive cycle-by-cycle simulations. For more details about this
mathematical NBTI model, please refer to [12][13][33][34].
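Under the simplified model, Equations (6) and (7) can be evaluated directly. The sketch below assumes n = 1/6 (consistent with the units of b); the stress probability, per-gate coefficient a, and nominal delay are placeholders, not fitted values from this work:

```python
# Sketch of Equations (6)-(7): long-term NBTI-induced Vth shift and the
# resulting rise-delay degradation. b is the fitted constant quoted in the
# text; alpha, a_coeff, and tau_p below are illustrative placeholders.

N = 1.0 / 6.0          # time-exponential constant n
B = 3.9e-3             # V * s**(-1/6)

def delta_vth(alpha, t_seconds):
    """Equation (6): Delta-Vth = b * (alpha * t)^n."""
    return B * (alpha * t_seconds) ** N

def degraded_delay(tau_p, a_coeff, alpha, t_seconds):
    """Equation (7): first-order rise-delay increase under NBTI."""
    return tau_p + a_coeff * (alpha * t_seconds) ** N

# Evaluate the degradation at the 10-year point used throughout this work.
ten_years = 10 * 365 * 24 * 3600.0
dv = delta_vth(alpha=0.5, t_seconds=ten_years)
```

Note how a larger stress probability α monotonically worsens both the Vth shift and the delay, which is exactly the lever the later restructuring and reordering techniques manipulate.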
2.2.1 Problem Statement
The objective of our NBTI mitigation framework, as explained later in Chapter 6 -
Chapter 8, is to minimize the circuit delay under NBTI over 10 years while incurring as little
area overhead as possible. We manipulate stress probabilities by using logic restructuring
and pin reordering such that NBTI effects on those gates (transistors) along timing-critical
paths can be reduced. Subsequently, transistor resizing is integrated for further reduction in
NBTI-induced performance degradation, with less design penalty than stand-alone
NBTI-aware resizing, especially when path sensitization is considered for more accurate
optimization.
2.2.2 Prior Work on NBTI Mitigation (against NBTI-Induced Performance
Degradation)
Traditional design methods add guard-bands or adopt worst-case margins to account for
aging phenomena, which in practice implies over-design and can be costly. To avoid
overly conservative design, the mitigation of NBTI-induced performance degradation can be
formulated as a timing-constrained area minimization problem with consideration of NBTI
effects. Recent NBTI-aware techniques basically follow this formulation. The authors of [36]
proposed a gate sizing algorithm based on Lagrangian relaxation (LR). The LR-based algo-
rithm determines the optimal values of gate sizes, which are assumed to be continuous, by
solving a non-linear area minimization problem. An average of 8.7% area penalty is required
to ensure reliable operation for 10 years. Other methods related to gate sizing can be found in
[37][38][39].
A novel technology mapper considering signal probabilities for NBTI was developed in
[40]. This technique first characterizes each gate in a given standard cell library in terms of
its NBTI impact, as a function of its input signal probabilities. Then, the technology mapper
takes signal probabilities as one of the arguments when searching for the best matching in the
library. About 10% area recovery and 12% power saving are accomplished, as compared to
the most pessimistic case assuming static NBTI on all PMOS transistors in a design. In [41],
a reconfigurable flip-flop design based on time borrowing is introduced for aging detection
and correction. Among all of the aforementioned approaches, only the one in [39] considers
path sensitization for more accurate optimization. However, the approach involves path
enumeration (on a path-wise basis) of exponential complexity and is not scalable for large
benchmarks.
Instead of reducing NBTI effects during active mode as described above, an idea of
NBTI-aware optimization during standby mode was presented in [42]. Input vectors for
minimum standby-mode leakage are selected to minimize PMOS aging. Moreover, for gates
that are deep in a large circuit and cannot be well controlled by primary input vectors, inter-
nal node control [43] intrusively assigns logic “1” to those gates if they are on the critical
paths. The logic “1” relaxes the stress condition and can thus relieve the NBTI impact. In
[44], power gating (PG) is exploited for aging optimization by shutting off the power supply
to a circuit. However, the continuous Vth degradation of sleep transistors during active mode
in the case of header-based PG design is ignored in [42][43][44].
Chapter 3 SER Reduction via Redundancy Addition and Removal (RAR)
Before introducing the proposed SER reduction approaches, we define two metrics as-
sociated with SER analysis in the sequel. The first, mean error impact (MEI), characterizes
each gate in terms of its contribution to the overall SER; the second, mean masking impact
(MMI), characterizes each gate in terms of its capability of filtering glitches propagated
through its inputs.
Definition 1 (mean error impact): For each internal gate Gi, initial duration d and initial
amplitude a, mean error impact (MEI) over all primary outputs Fj that are affected by a
glitch occurring at the output of gate Gi is defined as:
MEId,a(Gi) = Σk=1..nf Σj=1..nF P(Fj fails ∩ Gi fails(init_glitch(d, a))) / (nF⋅nf)    (8)
where nF is the cardinality of the set of primary outputs in the circuit, {Fj}, and nf is the
cardinality of the set of input probability distributions, {fk}.
The MEI value of a gate quantifies the probability that at least one primary output is af-
fected by a glitch originating at this gate. The larger MEI a gate has, the higher the probabil-
ity that a glitch occurring at this gate will be latched. This implies that those gates with
higher MEI make the circuit more vulnerable to soft errors. Thus, it is beneficial for SER if
gates with large MEI are removed from the circuit.
We need the following notations for defining mean masking impact.
D(Gi): the attenuated duration of a glitch at gate Gi
C(Gi): the set of gates in the fanin cone of gate Gi
F(Gi): the set of gates in the immediate fanin of gate Gi
p(Gj, Gi): the set of gates on the paths between gates Gj and Gi
Definition 2 (mean masking impact): For each internal gate Gi, initial duration d and initial
amplitude a, we define mean masking impact on duration (MMID) as:

MMID(Gid,a) = Σk=1..nf Σj=1..nG MID(Gjd,a → Gi) / (nG⋅nf⋅d)    (9)
where nG is the cardinality of C(Gi), nf is the cardinality of the set of input probability distri-
butions, {fk}, and MID(Gjd,a → Gi), masking impact on duration of gate Gi with respect to
(w.r.t.) gate Gj, denotes the absolute duration attenuation contributed by gate Gi on a glitch
with duration d and amplitude a originating at gate Gj. MID(Gjd,a → Gi) can be formally
defined as:
MID(Gjd,a → Gi) = Σk P(D(Gi) = Dk ∩ Gj fails(init_glitch(d, a)))⋅(d – Dk)
  – ΣGl∈F(Gi)∩p(Gj,Gi) Σk P(D(Gl) = Dk ∩ Gj fails(init_glitch(d, a)))⋅(d – Dk)    (10)
where {Dk} is the set of possible values for glitch duration, as in Equation (1). The second
summation represents the total weighted attenuation attributed to gate Gi’s immediate fanin
gates on the paths between gates Gj and Gi, instead of just gate Gi itself. Intuitively, MID(Gjd,a
→ Gi) quantifies how much attenuation is attributed to gate Gi alone, on the duration of
glitches originating at gate Gj.
Example: In Figure 2-1, assume only one set of input probability distributions is applied to
the circuit: {P1 = 0.5, P2 = 0.5, P3 = 0.5, P4 = 0.5, P5 = 0.5} where Pi is the probability of
logic “1” for the ith primary input. The duration ADDs associated with mean masking impact
on duration of gate G5 are shown in Figure 3-1, where those values for attenuated duration in
the terminal nodes are assigned arbitrarily for the sake of simplicity. In the real case, the
values are found using the attenuation model presented in [20]. Given initial duration d and
initial amplitude a, the mean masking impact on duration of G5, MMID(G5d,a), is computed as
follows. Since there are three gates G1, G2 and G3 in G5’s fanin cone, there will be three
masking impact values for MMID(G5d,a).
According to Figure 3-1(a), the masking impact on duration of gate G5 w.r.t. gate G1 is:
MID(G1d,a → G5) = P(ADDG1→G5 = 0)⋅(d – 0) + P(ADDG1→G5 = 2d/3)⋅(d – 2d/3)
  = (3/8)⋅d + (5/8)⋅(d – 2d/3) = (7/12)⋅d    (11)
According to Figure 3-1(b), the masking impact on duration of gate G5 w.r.t. gate G2 is:
MID(G2d,a → G5) = P(ADDG2→G3→G5 = 0)⋅(d – 0) + P(ADDG2→G3→G5 = 4d/9)⋅(d – 4d/9)
  – P(ADDG2→G3 = 0)⋅(d – 0) – P(ADDG2→G3 = 2d/3)⋅(d – 2d/3)
  = (5/8)⋅d + (3/8)⋅(d – 4d/9) – (1/2)⋅d – (1/2)⋅(d – 2d/3)
  = (5/6)⋅d – (2/3)⋅d = (1/6)⋅d    (12)

Figure 3-1: Duration ADDs associated with mean masking impact on duration of gate G5: (a) duration ADDs for path G1 → G5; (b) duration ADDs for path G2 → G3 → G5; (c) duration ADDs for path G3 → G5
According to Figure 3-1(c), the masking impact on duration of gate G5 w.r.t. gate G3 is:
MID(G3d,a → G5) = P(ADDG3→G5 = 0)⋅(d – 0) + P(ADDG3→G5 = 2d/3)⋅(d – 2d/3)
  = (1/4)⋅d + (3/4)⋅(d – 2d/3) = (1/2)⋅d    (13)
One can note that the gate at which a glitch originates has no masking impact on that
glitch. In Equation (12), the third and fourth terms are the amount of attenuation attributed to
gate G3 and should be subtracted. By Equation (9), we can obtain the mean masking impact
on duration of gate G5:
MMID(G5d,a) = [MID(G1d,a → G5) + MID(G2d,a → G5) + MID(G3d,a → G5)] / (3⋅d)
  = [(7/12)⋅d + (1/6)⋅d + (1/2)⋅d] / (3⋅d) = 5/12    (14)
Similarly, we can also define mean masking impact on amplitude (MMIA) by replacing
the normalization factor, d, in Equation (9) with the initial amplitude, a, and {Dk} in Equation (10) with {Ak}, the
set of possible values for glitch amplitude. Basically, the associated amplitude ADDs for
mean masking impact on amplitude of gate G5 are isomorphic to those duration ADDs in
Figure 3-1. The only difference is in the values of terminal nodes. As a result, the way to
compute mean masking impact on amplitude of G5 is the same as shown in the above exam-
ple – except that one has to replace the attenuated duration (Dk) with the attenuated amplitude
(Ak). We found that the duration of a glitch is proportional to the probability of a soft error
being registered, but the amplitude of a glitch is not. Therefore, it makes sense to use only
mean masking impact on duration (MMID) as a guideline for SER reduction.
The MMI value of a gate, defined by Equation (9) and shown in the above example,
denotes the normalized expected attenuation on the duration (or amplitude) of all glitches
passing through the gate. Every MMI value ranges from 0 to 1 as a result of normalization.
The larger MMI a gate has, the more capable of masking glitches this gate is. A gate with
MMI equal to 0 will not attenuate any glitch at all; in contrast, a gate with MMI equal to 1
will entirely mask glitches passing through it. This implies that those gates with higher MMI
make the circuit more robust to soft errors. In general, high MMI of a gate is due to its large
gate delay or considerable effect of logical masking on the gate. Thus, it is also beneficial for
SER if gates with large MMI are kept in the circuit.
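The arithmetic of the worked example (Equations (11)-(14)) can be reproduced mechanically from the terminal-node probabilities and durations read off the ADDs, normalizing the glitch duration to d = 1:

```python
# Reproducing the worked example (Equations (11)-(14)) with exact rational
# arithmetic, normalizing the glitch duration to d = 1. Each list holds
# (probability, attenuated_duration) pairs read off a duration ADD.
from fractions import Fraction as F

def expected_attenuation(add_terminals, d=F(1)):
    """Sum of P(ADD = Dk) * (d - Dk) over the ADD's terminal nodes."""
    return sum(p * (d - dk) for p, dk in add_terminals)

mid_g1 = expected_attenuation([(F(3, 8), F(0)), (F(5, 8), F(2, 3))])
# Equation (12): subtract the attenuation already attributed to gate G3.
mid_g2 = (expected_attenuation([(F(5, 8), F(0)), (F(3, 8), F(4, 9))])
          - expected_attenuation([(F(1, 2), F(0)), (F(1, 2), F(2, 3))]))
mid_g3 = expected_attenuation([(F(1, 4), F(0)), (F(3, 4), F(2, 3))])

# Equation (14): normalize by the number of fanin-cone gates (3) and d.
mmid_g5 = (mid_g1 + mid_g2 + mid_g3) / 3
```

Exact fractions make it easy to confirm the values 7/12, 1/6, 1/2, and the final MMID of 5/12 from the example.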
3.1 RAR-Based Approach for SER Reduction
In this subchapter, we present our SER reduction approach based on redundancy addi-
tion and removal (RAR). RAR is a logic minimization technique which performs a series of
wire/gate addition and removal operations by searching for redundant wires/gates in a circuit.
Candidate wires for addition can be identified according to the mandatory assignments made
during automatic test pattern generation (ATPG). Mandatory assignments [45] are those
value assignments which are required for a test to exist and must be satisfied by any test
vector. For example in Figure 3-2(a), the mandatory assignments for gate G6 stuck-at-1 fault
are {f = 1, G3 = 1, G4 = 0, G6 = 0}, from which we can get the implications {d = 0, G1 = 0, G2
= 0, G5 = 0}. If a wire from gate G5 to gate G9 is added into the circuit, there will be a con-
flicting assignment because gate G5 should be set to be “1” to make gate G6 stuck-at-1 fault
observable at outputs. So wire G5 → G9 is a candidate for wire addition.
One still needs to check if the candidate wire is indeed redundant; i.e., the wire does not
change the circuit functionality. In the above example, wire G5 → G9 is redundant. The newly
added wire could cause one or more existing irredundant wires to become redundant (re-
movable). ATPG is again used for redundancy checking of each wire except the one just
inserted (e.g., wire G5 → G9 in Figure 3-2(b)) by finding compatible mandatory assignments.
If a set of mandatory assignments for a wire cannot be derived, the wire is said to be redun-
dant and can be removed. Consider the same example in Figure 3-2: after adding wire G5
→ G9 into the circuit, wires G1 → G4 and G6 → G7 become redundant as compatible mandatory
assignments do not exist for either of them. So they can be removed, as shown in Figure
3-2(b).
Note that gates with only one fanin and gates without fanout can also be deleted. Figure
3-2(c) shows the resulting circuit after redundancy removal. The circuit becomes smaller if
the removed redundancies are more than the added redundancies. For the goal of logic opti-
mization, the wire addition and removal procedures iterate until no further improvement can
be found.
For our objective of SER reduction, using RAR in an unsystematic manner may increase
SER by reducing the number of gates or the depth of circuits: a smaller gate count will affect
the impact of logical masking, while smaller logic depth will reduce the impact of both
logical and electrical masking.

Figure 3-2: An example of redundancy addition and removal [46]: (a) the original circuit; (b) the circuit after redundancy addition (G5 → G9); (c) the circuit after redundancy removal (G1 → G4 and G6 → G7)

The basic principle of our RAR-based approach is to keep
wires/gates with high masking impact and to remove wires/gates with high error impact.
The RAR technique has two major parts: wire addition and wire removal. Each wire ad-
dition step is followed by a wire removal step, irrespective of whether or not there are any
removable wires other than the added one available. For logic minimization, where the goal
is to minimize the total literal count, it is easy to track the change in the number of literals after an itera-
tion of addition and removal by simply calculating the number of added and removed
wires/gates. However, for SER reduction, it is not efficient to track the change in the soft
error rate of a circuit by re-computing it every time. Instead, during each step of wire addi-
tion/removal, we define criteria or constraints to guide us in the wire addition/removal proc-
ess and check whether the step is advantageous for SER reduction.
Several constraints on the RAR algorithm are introduced to ensure that our proposed
approach can significantly mitigate the soft error rate of a logic circuit. In the beginning of
this chapter, we have demonstrated the relationship between MEI/MMI and circuit vulner-
ability/robustness. Intuitively, one can use MEI and MMI as metrics to guide RAR toward
SER reduction.
3.1.1 Wire Addition Constraint
Let wire w (s → t) be an addible (redundant) candidate wire whose source node is gate s
and destination node is gate t, as shown in Figure 3-3. The following three effects take place
after adding wire w into the circuit:
1) The MEI values of gate s and its fanin neighbors are likely to increase because the new
connection w from gate s to gate t provides an additional path for propagating erroneous
values to primary outputs.
2) The MEI values of fanin neighbors of gate t are likely to decrease because, to a certain
extent, the new connection w logically masks glitches from those fanin neighbors. The
MEI values of some gates which are in the fanin cones of both gates s and t may
increase, but these increases are incorporated into effect 1) above.

Figure 3-3: Changes in MEI and MMI after adding wire w (s → t): MEI(s), MEI(a), MEI(b), and MEI(fanin neighbors of gates a and b) ↗ (ADVERSE!); MMI(t) ↗; MEI(c), MEI(d), and MEI(fanin neighbors of gates c and d) ↘
3) The MMI value of gate t becomes larger due to increased logical masking and propaga-
tion delay. The MMI values of fanout neighbors of gate t may also change (increase or
decrease), but these changes will not degrade the circuit robustness since fewer glitches
(with smaller duration and amplitude) pass through gate t.
Based on the definitions of MEI and MMI, the first effect (shown within the highlighted
region in Figure 3-3) is adverse, but the second and third ones are beneficial for SER reduc-
tion. Hence, we introduce a constraint to minimize the adverse effect.
Constraint 1 (wire addition constraint): Wire w (s → t) can be added into the circuit if
MEI(t) < T1 and MMID(t) > T2 where T1 and T2 are pre-specified thresholds.
Intuitively, those wires having small MEI and large MMID for their destination gates can
be added. This constraint will keep gates with large MMI in the circuit. To simplify the
following discussion without loss of generality, we omit initial duration d and amplitude a
from the notations of MEI (Equation (8)) and MMI (Equation (9)), but keep in mind that they
actually exist.
After adding wire w into the circuit, no matter how small MEI(s) is, a complete glitch
with the initial duration and amplitude is propagated from gate s to gate t once an effective
particle strikes gate s. That is, the resulting increase in error impact of gate s due to glitches
propagated along the new connection w does not depend on MEI(s). More precisely, assume
that the initial duration of a glitch occurring at gate s is d. After passing through gate t, the
attenuated duration of the glitch can be quantified as:
d′ = d⋅[1 – MMID(t)]    (15)
If d’ is smaller than or equal to the sum of setup and hold times, the glitch will be
masked; otherwise, the increase in MEI(s) due to the addition of wire w is estimated to be:
ΔMEI(s) = MEI(t)⋅(d′/d)
  = MEI(t)⋅d⋅[1 – MMID(t)] / d
  = MEI(t)⋅[1 – MMID(t)]    (16)
This observation is based on the fact that the duration of a glitch (if large enough) is
proportional to the probability of the glitch being latched. From Equation (16), one can
minimize the increases in the MEI values of gate s and its fanin neighbors by specifying a
sufficiently small T1 and a sufficiently large T2 for MEI(t) and MMID(t), respectively. Al-
though we can also specify additional thresholds for MEI(s) and MMID(s) to further mini-
mize the increases in the MEI values of those fanin neighbors, doing so greatly restricts the
search space for RAR and typically, does not lead to better results.
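A minimal sketch of how Constraint 1 and the ΔMEI(s) estimate of Equations (15)-(16) might be combined into a wire-addition filter. Here `mei_t` and `mmid_t` stand in for precomputed per-gate metrics, and every numeric value is a placeholder:

```python
# Sketch of the wire-addition check (Constraint 1) together with the
# Delta-MEI(s) estimate of Equation (16). The thresholds T1, T2 and all
# metric values below are illustrative design parameters, not real data.

def can_add_wire(mei_t, mmid_t, t1, t2):
    """Constraint 1: destination gate needs small MEI and large MMID."""
    return mei_t < t1 and mmid_t > t2

def delta_mei_source(mei_t, mmid_t, d, t_setup, t_hold):
    """Equation (16): estimated MEI increase at the source gate s."""
    d_prime = d * (1.0 - mmid_t)      # Equation (15): attenuated duration
    if d_prime <= t_setup + t_hold:   # glitch fully masked at gate t
        return 0.0
    return mei_t * (1.0 - mmid_t)

ok = can_add_wire(mei_t=0.05, mmid_t=0.7, t1=0.1, t2=0.5)
inc = delta_mei_source(mei_t=0.05, mmid_t=0.7, d=1.0, t_setup=0.1, t_hold=0.1)
```

The filter captures the intuition in the text: a small MEI(t) and a large MMID(t) jointly bound the adverse increase at the source gate.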
3.1.2 Wire Removal Constraint
Let wire w’ (u → v) be a removable (redundant) candidate wire whose source node is
gate u and destination node is gate v, as shown in Figure 3-4. Three other effects take place
after removing wire w’ from the circuit:
1) The MEI values of gate u and its fanin neighbors are likely to decrease because errone-
ous values propagated along the removed connection w’ from gate u to gate v are elimi-
nated.
2) The MEI values of fanin neighbors of gate v are likely to increase because logical
masking impact at gate v is decreased by the removal of wire w’.

Figure 3-4: Changes in MEI and MMI after removing wire w’ (u → v): MEI(u), MEI(a), MEI(b), and MEI(fanin neighbors of gates a and b) ↘; MMI(v) ↘ (ADVERSE!); MEI(c), MEI(d), and MEI(fanin neighbors of gates c and d) ↗ (ADVERSE!)
3) The MMI value of gate v becomes smaller due to decreased logical masking. At the
same time, the MMI values of fanout neighbors of gate v may also change (increase or
decrease).
Based on the definitions of MEI and MMI, the first effect is beneficial, but the second
and third ones (shown within the highlighted region in Figure 3-4) are adverse for SER re-
duction. Hence, we set up two additional constraints: one is to maximize effect 1), the other
to minimize effects 2) and 3).
Constraint 2 (wire removal constraint I): Wire w’ (u → v) can be removed from the circuit
if MEI(v) > T3 ≧ T1 and MMID(v) < T4 ≦ T2 where T3 and T4 are pre-specified thresholds.
Intuitively, those wires having large MEI and small MMID for their destination gates can
be removed. This constraint will try to remove gates with large MEI from the circuit. Again,
without loss of generality, we omit initial duration d and amplitude a, which actually exist,
from the notations of MEI and MMI. Similar to the argument for Equation (16), the decrease
in MEI(u) due to the removal of wire w’ is estimated to be:
ΔMEI(u) = MEI(v)⋅[1 – MMID(v)]    (17)
From Equation (17), one can maximize the decreases in the MEI values of gate u and its
fanin neighbors by specifying T3 and T4 where T3 ≧ T1 and T4 ≦ T2. The lower bound for T3
and the upper bound for T4 are set such that we can gain more from wire removal (e.g.,
ΔMEI(u) in Equation (17)) than lose from wire addition (e.g., ΔMEI(s) in Equation (16)).
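For concreteness, the threshold checks of Constraints 1 and 2 and the estimate of Equation (17) can be sketched in Python as follows; the `Gate` class and all function names are illustrative, not from the actual implementation.

```python
# Illustrative sketch of Constraints 1 and 2 as threshold checks.
# All names (Gate, mei, mmi_d) are hypothetical, not thesis code.

from dataclasses import dataclass

@dataclass
class Gate:
    name: str
    mei: float    # mean error impact
    mmi_d: float  # mean masking impact (with respect to duration d), in [0, 1]

def can_add_wire(dest: Gate, t1: float, t2: float) -> bool:
    """Constraint 1: add wire wa (s -> t) only if destination gate t has
    small MEI and large MMI_D, bounding the adverse Delta-MEI(s)."""
    return dest.mei < t1 and dest.mmi_d > t2

def can_remove_wire(dest: Gate, t3: float, t4: float, t1: float, t2: float) -> bool:
    """Constraint 2: remove wire w' (u -> v) only if MEI(v) > T3 >= T1 and
    MMI_D(v) < T4 <= T2, so the gain from removal outweighs the loss from addition."""
    assert t3 >= t1 and t4 <= t2, "threshold ordering required by Constraint 2"
    return dest.mei > t3 and dest.mmi_d < t4

def delta_mei_removal(dest: Gate) -> float:
    """Estimated decrease in MEI(u) after removing w' (u -> v), Equation (17)."""
    return dest.mei * (1.0 - dest.mmi_d)
```

With, say, MEI(v) = 0.5 and MMID(v) = 0.2, the estimated decrease ΔMEI(u) is 0.5 · 0.8 = 0.4.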
Constraint 3 (wire removal constraint II): Wire w’ (u → v) can be removed from the circuit
if P̂(u = cv(v)) < Tcv across all probability distributions, where u here denotes the output value of
gate u, cv(v) is the controlling value of gate v, and Tcv is a pre-specified threshold.
The necessary condition of logical masking at gate v is that at least one of the side in-
puts must be the controlling value of gate v, expressed by cv(v). Side inputs are those inputs
on which no glitch is propagated. For instance, gate v in Figure 3-4 is assumed to be an OR
gate (i.e., cv(v) = 1). If a glitch is propagated from gate c to gate v and the output value of
gate u is “1”, the glitch will be logically masked by the controlling value “1” from gate u.
The higher the probability that gate u’s output takes the controlling value cv(v), the more likely
glitches from gate v’s fanin gates (other than gate u itself) will be logically masked at gate v. Therefore, this constraint
is introduced to minimize the loss on logical masking as a result of wire removal. When
P̂(u = cv(v)) is large, wire w’ (u → v) plays an important role in logically masking glitches at
gate v and should not be removed.
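The probability P̂(u = cv(v)) used by Constraint 3 can, for example, be estimated by random logic simulation; the sketch below assumes gate u is given as a Boolean function of the primary inputs, and all names are illustrative.

```python
# Illustrative Monte Carlo estimate of P(u = cv(v)) for Constraint 3.
import random

def estimate_cv_probability(eval_u, cv_value, n_inputs, trials=10000, seed=1):
    """Fraction of random input vectors for which gate u's output equals
    the controlling value cv(v) of gate v (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        vec = [rng.randint(0, 1) for _ in range(n_inputs)]
        if eval_u(vec) == cv_value:
            hits += 1
    return hits / trials

# Example: u is a 2-input AND feeding an OR gate v (cv(v) = 1);
# under uniform inputs, P(u = 1) should be near 0.25.
p = estimate_cv_probability(lambda v: v[0] & v[1], 1, 2)
```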
Furthermore, for an added wire, there may be more than one corresponding removable
wire, and such wires cannot necessarily all be removed together. In other words,
removing one redundant wire may cause another one(s) to become irredundant. We sort these
removable wires by the MEI values of their source gates, from the largest to the smallest. The
removable wire with the largest MEI value for its source gate will be removed first. We can
thus further maximize the beneficial effect 1) of wire removal and potentially remove gates
with large MEI.
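The removal ordering described above amounts to a simple sort over the candidate wires; a minimal sketch with hypothetical wire and MEI data:

```python
# Sketch of the removal ordering: among removable wires that may exclude
# one another, try removal in decreasing order of source-gate MEI.
def order_removals(removable, mei):
    """removable: list of (source, dest) wire pairs; mei: dict gate -> MEI.
    Returns wires sorted so the largest source-gate MEI is tried first."""
    return sorted(removable, key=lambda w: mei[w[0]], reverse=True)

wires = [("u1", "v1"), ("u2", "v2"), ("u3", "v3")]
mei = {"u1": 0.10, "u2": 0.35, "u3": 0.22}
ordered = order_removals(wires, mei)  # wire from u2 (largest MEI) comes first
```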
3.1.3 Topology Constraint on Candidate Addition and Removal
Two types of mandatory assignments (MAs) are distinguished in the original RAR paper
[45]: backward MA and forward MA. If a mandatory assignment of gate G is obtained by
backward implication from G’s fanout gates, the mandatory assignment is a backward MA. If
a mandatory assignment of gate G is obtained by forward implication from G’s fanin gates,
the mandatory assignment is a forward MA. Assume that a pair of candidate wires for addi-
tion wa (s → t) and for removal wr (u → v) are extracted. Gate t either (i) has a backward MA
due to the redundancy checking of wire wr, or (ii) has to be a dominator of gate v along with
a forward MA. Here, gate D is said to be a dominator of gate G with respect to output O iff
all paths from G to O must pass through D. Also, we say that D dominates G or G is domi-
nated by D, with respect to O. For example in Figure 3-5(b), gate G7 is a dominator of gate
G2 w.r.t. output y while gate G6 is not, since G6 does not have to lie on the paths from G2 to y
(e.g., the path G2 → G5 → G7 → y).
The aforementioned three constraints focus on finding redundant wires for addition and
removal such that the positive influences on circuit SER (e.g., ΔMEI(u) in Equation (17)) are
greater than the negative influences (e.g., ΔMEI(s) in Equation (16)). To satisfy ΔMEI(u) >
ΔMEI(s), however, these constraints filter out most candidate pairs falling into the second (ii)
category described above. The reason is that a dominator, which is closer to primary outputs,
has higher MEI than the gate being dominated [15]. Let wire wa (s → t) for addition and wire
wr (u → v) for removal be a candidate pair where gate t is a dominator of gate v. As in [45],
wires wa and wr are regarded as alternatives of each other and supposed to be implemented
Figure 3-5: An example of Constraint 4 and the effect of redundancy on soft error robustness: (a) the original circuit with candidate wire wa (G2 → G5) for addition; (b) the circuit with candidate wire wr (G1 → G3) for removal after adding wire wa; (c) the resulting circuit after removing wire wr
together. Since gate t is a dominator, MEI(t) is usually larger than MEI(v), and ΔMEI(s) in
Equation (16) will easily exceed ΔMEI(u) in Equation (17). More specifically, given MEI(t) >
MEI(v) and the thresholds T1 and T3 used in Constraints 1 and 2, the condition MEI(v)
> T3 ≧ T1 > MEI(t) can never hold, meaning that the two constraints cannot be met simultane-
ously and this pair of candidate wires (wa and wr) will be discarded. But such a pair is not
always adverse for SER. To keep such potential redundancy manipulations and explore more
solution space for our methodology, we introduce the last constraint.
Constraint 4 (topology constraint): Given candidate wire wa (s → t) for addition and wire wr
(u → v) for removal, the addition and removal steps can be performed together if gate t is a
dominator of gate v and also a dominator of gate u assuming wa and wr have been imple-
mented already.
Consider the circuit in Figure 3-5 where wire wa (G5 → G7) is an alternative of wire wr
(G2 → G4), which suggests that wa can be added for removing wr, as shown from Figure
3-5(a) to Figure 3-5(c). That is, wires wa and wr are recognized as a pair of candidates for
addition and removal, respectively. In this example, wire wa’s destination node, G7, is a
dominator of both gates G2 and G4 (as in Figure 3-5(c), after the current RAR operations).
Therefore, it is very likely that removal-of-wr-induced adverse impact, stemming from gates
G2 and G4, will be blocked at dominator G7 due to the addition of wire wa, which reflects
more logical masking, larger propagation delay and in effect, more electrical masking. This is
basically true if wire wa can be used to realize a more complex logic cell at gate G7 with
longer delay [47]. For instance, gate G7 in Figure 3-5(c) can be remapped with wire wa to a
3-input AND whose delay is 43.33ps, while its original realization (without wa) in Figure
3-5(a), a 2-input AND, has a delay of 34.67ps. The delay numbers are found using logical
effort [48] in 70nm Predictive Technology Model (PTM). As mentioned earlier, high MMI
results from large propagation delay or considerable logical masking. We can thus expect to
see a significant increase in the MMI value of gate G7, which has been known as a dominator
and will stop more error impact from those gates being dominated.
Note that one still needs to quantitatively check if such a pair of redundancy manipula-
tions is indeed beneficial. An extended strategy of estimation from Equations (16) and (17) is
discussed as follows. The basic idea is to look at the dominator only. In the case exemplified
by Figure 3-5, we check whether or not gate G7, given the addition of wire wa, is powerful
enough to block additional error impact as a result of wa-addition and wr-removal. More
precisely, the following steps need to be followed:
1) Update MMID(G7) locally and incrementally: To do this, we first renew the propagation
delay of gate G7, and apply the new delay on the attenuation model to recalculate
non-zero terminal nodes of those ADDs which have been propagated to G7. Next, the
ADD structures also need to be transformed; these transformations can be accomplished
incrementally because wire wa brings supplementary patterns of logical masking without
shrinking the original, i.e., one-way expansion of logical masking patterns. Then, we
propagate ADDs from gate G5 to gate G7 (along wire wa) and compute corresponding
new ADDs attenuated by G7. Finally, updated MMID(G7), denoted by MMID’(G7), can
be obtained.
2) Calculate the changes in MEI of gate G7’s immediate fanin neighbors, namely,
ΔMEI(G3), ΔMEI(G5), and ΔMEI(G6):
ΔMEI(G3) = ΔMEI(G6) = MEI(G7) · [MMID′(G7) − MMID(G7)]
ΔMEI(G5) = MEI(G7) · [1 − MMID′(G7)]    (18)
where ΔMEI(G3) and ΔMEI(G6) are advantageous and ΔMEI(G5) is disadvantageous.
The cumulative estimation of absolute MEI changes is:
ΣΔMEI = ΔMEI(G3) − ΔMEI(G5) + ΔMEI(G6)    (19)
For the same reason as in Constraints 1 and 2, those gates beyond the first-level (immediate) fanin of the dominator are not taken into account in order to relax the restriction
on RAR, reduce the computational complexity and keep our methodology tractable.
This heuristic of considering only immediate fanin gates is experimentally verified to be
representative enough for analysis and estimation of impact on circuit SER.
3) Evaluate the validity of this candidate pair (wire wa for addition and wire wr for re-
moval):
If ΣΔMEI ≥ 0, accept wa and wr; otherwise, discard wa and wr.    (20)
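Assuming MEI(G7) and the old and updated MMID(G7) are available, the three-step evaluation reduces to a few arithmetic operations. The following sketch mirrors Equations (18)-(20); the function and variable names are illustrative, not from the thesis implementation.

```python
def evaluate_dominator_pair(mei_g7, mmi_d_old, mmi_d_new):
    """Sketch of Equations (18)-(20): decide whether the (wa, wr) pair guarded
    by dominator G7 is beneficial. mmi_d_new is the updated MMI_D'(G7) after
    re-mapping G7 with wire wa; names follow the Figure 3-5 example."""
    d_g3 = d_g6 = mei_g7 * (mmi_d_new - mmi_d_old)  # beneficial decreases, Eq. (18)
    d_g5 = mei_g7 * (1.0 - mmi_d_new)               # adverse increase, Eq. (18)
    total = d_g3 - d_g5 + d_g6                      # cumulative change, Eq. (19)
    return total >= 0.0, total                      # accept/discard, Eq. (20)
```

For example, with MEI(G7) = 0.5, MMID(G7) = 0.4, and MMID′(G7) = 0.9, the beneficial decreases (0.25 each) outweigh the adverse increase (0.05), so the pair is accepted.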
In wire addition and removal constraints, MMI does not require updating due to the fol-
lowing reasons: ΔMEI(s) in Constraint 1 is the worst-case (pessimistic) estimation, which is
reasonable for estimating adverse effects; ΔMEI(u) in Constraint 2 is the average-case
estimation, which is suitable for estimating beneficial effects. In Equation (18),
ΔMEI(G3) and ΔMEI(G6) both belong to the second effect of wire addition. We do not con-
sider this beneficial effect when applying wire addition constraint (Constraint 1) so MMI
updating is not necessary. As for wire removal constraint, Constraint 3 has been introduced to
assist Constraint 2 in minimizing adverse effects of wire removal. Hence, we do not need
new MMI to estimate these effects, either.
Constraint 4 catches those candidates missed by the first three constraints, which allows
for higher likelihood to get a better solution. Table 3-1 lists the MEI and MMI values of gates
in Figure 3-5, where wires wa and wr are the current pair of candidates for addition and re-
moval, respectively. With a set of appropriate thresholds, it is obvious that wa and wr will be
filtered out by Constraints 1 and 2. However, this candidate pair satisfies Constraint 4 and
can be performed for a reduction of 33.2% in average MEI w.r.t. output y, equivalent to
33.2% reduction in SER of output y. As it can be seen, the MEI values of gates in the fanin
cone feeding wire wa (e.g., G2 and G5) increase marginally while those in the original fanin
cone of gate G7 (e.g., G3, G4 and G6) decrease significantly. According to the test results with
other benchmark circuits, most of the cases satisfying Constraint 4 are beneficial for SER as
Table 3-1: MEI and MMI of gates in Figure 3-5: the second and third columns are for gates in Figure 3-5(a),
the fourth and fifth for gates in Figure 3-5(b), and the sixth to eighth for gates in Figure 3-5(c).
long as their dominators have sufficient (>20%) increases in MMI.
Constraint 4, exemplified by Figure 3-5, particularly distinguishes the proposed meth-
odology from [25][26]. As discussed earlier, circuit SER can benefit from redundancy ma-
nipulations satisfying Constraint 4 when the MMI values of those dominators increase sig-
nificantly. The increases in MMI result not only from more logical masking but also from
more electrical masking due to larger gate delay. In [25], electrical masking is not considered
at all so such potential rewiring operations will be discarded unless, in a few cases, the im-
pact of increased logical masking predominates. On the other hand, the greedy heuristic in
[25] processes wires as targets to be removed in decreasing order of sensitization probability
(Psens, only logical masking considered as well). However, wires/gates to be removed ac-
cording to Constraint 4 are those being dominated and always have small Psens, implying that
they are hardly targeted as candidates for removal.
In [26], the authors use a derating factor to account for electrical masking separately
beyond logical masking. Besides the SER overestimation because of separate treatment of
masking mechanisms [14], the use of a derating factor without a generalized attenuation
model cannot accurately reflect the effect of gate delay change on masking impact (MMI)
and thereby, will rarely catch the benefit of Constraint 4. One should note that the
comparison between our work and [26] is not perfect since the resynthesis technique (SiDeR)
in [26] actually adds new wires and gates without identifying and removing any possible
Algorithm 1: RAR-based SER reduction (circuit, T1, T2, T3, T4, Tcv)
// T1-T4 and Tcv: thresholds for Constraint 1 (T1, T2), Constraint 2 (T3, T4), and Constraint 3 (Tcv)
01  Compute MEI and MMID for each internal gate in circuit;
02  WHILE (pair of candidate wires wa and wr identified by RAR) {
      // wa for addition and wr for removal
      // Constraint 4: topology constraint, applied first
03    s ← source gate of wire wa;
04    t ← destination gate of wire wa;
05    u ← source gate of wire wr;
06    v ← destination gate of wire wr;
07    IF (gate t is not a dominator of both gate u and gate v)
08      GOTO notDominator;
09    IF (wires wa and wr can be performed for SER reduction, based on Equation (20)) {
10      Add wa into circuit;
11      Remove wr from circuit;
12      CONTINUE;
      }
13  notDominator:
      // Wire addition procedure
14    IF ((MEI(t) ≧ T1) or (MMID(t) ≦ T2)) CONTINUE;  // Constraint 1
15    Add wa into circuit;
      // Wire removal procedure
16    gain ← 0;
17    sorted_wires ← all removable wires due to the addition of wa, sorted by the MEI values of their source gates, from the largest to the smallest;
18    FOR EACH (wire wr’ in sorted_wires) {
19      IF (wire wr’ is no longer redundant) CONTINUE;  // mutual irredundancy
20      u ← source gate of wire wr’;
21      v ← destination gate of wire wr’;
22      IF ((MEI(v) ≦ T3) or (MMID(v) ≧ T4)) CONTINUE;  // Constraint 2
23      IF (P(gate u goes to cv(v)) ≧ Tcv) CONTINUE;  // Constraint 3
24      Remove wr’ from circuit;
25      gain ← gain + 1;
      }
26    IF ((gain > 0) or (MEI(t) is extremely small))
27      Keep wa in circuit;
28    ELSE
29      Remove wa from circuit;
30    Update MEI and MMID for affected gates;
    }
Figure 3-6: The overall algorithm of our RAR-based approach for SER reduction
hardware redundancies. Consequently, SiDeR cannot achieve such a case as in Figure 3-5
(i.e., the added wire lies on the critical path of the circuit) without increasing the circuit delay
if no wire is removed, while we can.
To wrap up four proposed constraints, our overall algorithm for RAR-based SER reduc-
tion is given in Figure 3-6. Note that Constraint 4 has to be applied prior to Constraints 1-3 in
order to ensure that beneficial redundancy manipulations satisfying Constraint 4 are not
discarded by the other three constraints.
3.2 Gate Resizing for SER Reduction
Up to this point, we have proposed a systematic algorithm based on RAR for SER re-
duction. One can note that this RAR-based approach aims at the combinational block of a
logic circuit, by manipulating MEI and MMI of internal gates. The underlying motivation is to
keep wires/gates with high MMI and to remove wires/gates with high MEI.
In this subchapter, we illustrate the efficacy of gate resizing via MEI and MMI as
post-RAR SER optimization. Gate resizing for soft error robustness was first presented in
[22]. The strategy reduces circuit SER by ranking gates in increasing order of logical mask-
ing probability and then modifying the W/L ratios of transistors in gates whose logical
masking probabilities are within the lowest percentile. The logical masking probabilities are
extracted by running fault simulation, which involves an inevitable tradeoff between accu-
racy and efficiency. The authors of [22] take only logical masking into account since they
claim that the distribution of logical masking probabilities across all gates is highly asym-
metric, but electrical and latching-window masking probabilities do not exhibit a similar
asymmetry. Moreover, potentially large costs in area and power are incurred to harden a
circuit against large radiation-induced upsets.
In [14], a gate with MEI greater than a specified threshold is resized such that the same
amount of charge collection can no longer produce an effective glitch at this gate, making the
gate immune to soft errors. Consider the circuit in Figure 3-5(c) where the MEI values of gates over
all primary outputs are shown in the last column of Table 3-1. If the resizing threshold is 0.2,
gates G5, G7 and G8, which have MEI greater than 0.2, will be chosen for resizing. As op-
posed to [22], the resizing technique proposed in [14] considers three masking mechanisms
jointly via MEI and thus, can identify truly critical gates to resize in a more accurate manner.
We apply a similar resizing technique for additive SER improvement after a circuit is optimized by our RAR-based approach. To compare the results fairly, the same threshold is
specified for stand-alone gate resizing and gate resizing as a post-RAR procedure. In Chapter
3.3, we will demonstrate that gate resizing is orthogonal to the proposed approach and can
provide additive benefits without affecting existing SER-aware optimization.
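A minimal sketch of the MEI-threshold selection from [14] described above; the gate names and MEI values are illustrative, loosely modeled on the Table 3-1 example, and the function name is hypothetical.

```python
def select_gates_to_resize(mei, threshold=0.2):
    """Sketch of the MEI-threshold selection in [14]: gates whose MEI over
    all primary outputs exceeds the threshold are chosen for resizing, so
    the same collected charge can no longer produce an effective glitch."""
    return sorted(g for g, v in mei.items() if v > threshold)

# Illustrative MEI values (not the actual Table 3-1 numbers).
mei = {"G5": 0.31, "G7": 0.27, "G8": 0.22, "G3": 0.08}
chosen = select_gates_to_resize(mei)  # only gates with MEI > 0.2 are resized
```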
3.3 Experimental Results
We have implemented the RAR-based SER reduction framework in C/C++ and con-
ducted experiments on a set of benchmarks from the ISCAS and MCNC suites. The technol-
ogy used is 70nm, Predictive Technology Model (PTM) [35]. The clock period (Tclk) used for
probability computation by Equation (1) is 250ps, and the setup (tsetup) and hold (thold) times
of output latches are both assumed to be 15ps. The supply voltage is 1.0V. To calculate SER
by Equations (3) and (4), the allowed intervals for initial duration and amplitude are (dmin,
dmax) = (60ps, 120ps) and (amin, amax) = (0.8V, 1.0V) with incremental steps Δd = 20ps and Δa
= 0.1V, respectively.
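The resulting grid of glitch sizes can be enumerated as follows; the defaults mirror the parameters stated above, and the function name is illustrative.

```python
# Sketch of the (duration, amplitude) grid used when evaluating SER via
# Equations (3) and (4): d in [60, 120] ps, step 20 ps; a in [0.8, 1.0] V, step 0.1 V.
def glitch_grid(d_min=60, d_max=120, dd=20, a_min=0.8, a_max=1.0, da=0.1):
    durations = range(d_min, d_max + dd, dd)
    amplitudes = [round(a_min + i * da, 1)
                  for i in range(int(round((a_max - a_min) / da)) + 1)]
    return [(d, a) for d in durations for a in amplitudes]

grid = glitch_grid()  # 4 durations x 3 amplitudes = 12 glitch sizes
```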
For glitches with initial duration smaller than 60ps, the gates that will influence outputs
are mostly the output gates and their fanin gates. For glitches with initial duration greater
than 120ps, there are a considerable number of gates that will almost certainly have negative
impact on outputs. This is the reason we choose (60ps, 120ps) as duration sizes for our ex-
periments. The RPH used is 56.5 m⁻²s⁻¹ and REFF is 2.2×10⁻⁵.
Table 3-2 reports the experimental results for SER reduction and area overhead. The
area numbers are found using SIS technology mapping tool with the MCNC library
(mcnc.genlib). For each benchmark listed in Table 3-2, various glitch sizes and different input
distributions are applied. We demonstrate the MES (Equation (2)) improvements from 60ps
to 120ps duration sizes, as shown in columns four and five. For circuit C432, which has 36
primary inputs, 7 primary outputs, and 156 internal gates, the average MES of the baseline
(original) circuit attacked by glitches with 60ps duration is 0.00357, while that of the radia-
tion-hardened version (optimized by our approach) is 0.00260. When the initial glitch dura-
tion is 120ps, the average MES values of the original and optimized circuits are 0.02954 and
0.02137, respectively. For initial glitches with small duration, the average MEI is small and
the average MMI is large. In this case, there are more candidate wires satisfying wire addi-
tion constraint (Constraint 1) than the case when the initial duration is large. Hence, more
added and removed wires can be expected. When considering all possible glitch sizes, in the
case of circuit C432, the total area overhead is 3.85% and the overall SER reduction is
29.63%. The absolute SER in FITs (failures-in-time) drops from 12.9 FITs to 9.1 FITs. On
average across all benchmarks, 22.76% SER reduction can be achieved with 3.54% area
Table 3-2: Average mean error susceptibility (MES) improvement and overall soft error rate (SER) reduction
overhead.
At the bottom of Table 3-2, we also report the results of two related SER reduction
frameworks using Rewiring [25] and resynthesis (Rewriting and SiDeR) [26]. Rewiring fol-
lows a greedy heuristic which performs every potential rewiring operation to see if the over-
all SER can be improved; Rewriting focuses on locally restructuring 4-input sub-circuits to
enhance soft error robustness, while SiDeR globally but monotonically adds wires and gates
without removing anything else. The methodology we present is guided, in a systematic and
less restricted manner, by the four constraints based on MEI and MMI.
We also perform experiments on the probabilities of output failure as in Equation (3)
over all primary outputs before and after optimization, as shown in Figure 3-7. In order to
Figure 3-7: Output failure probabilities of all primary outputs before and after optimization: (a) alu2, (b) x4
make the plots more readable, we sort all primary outputs according to their original prob-
abilities of output failure, from the smallest to the largest. As it can be seen, in both cases, a
maximum reduction of 35-70% is achieved in output failure probability.
Figure 3-8 compares three different infrastructures for SER-aware optimization: (i) our
proposed RAR-based approach, (ii) the gate resizing strategy proposed in [14], and (iii)
integrated RAR and gate resizing method where gate resizing is applied as a post-RAR pro-
cedure. We fix the initial glitch size to be (d, a) = (100ps, 1.0V). For (ii) and (iii) involving
gate resizing, the same threshold is applied on each listed benchmark in order for a fair comparison.
Figure 3-8: SER-aware optimization using: (i) the proposed RAR-based approach only (blue), (ii) the gate resizing strategy only (purple), and (iii) the integrated RAR and gate resizing methodology (yellow)
As shown in Figure 3-8, post-RAR gate resizing can provide additive benefits on top
of our approach based on RAR. For most of the circuits, the combined MES improvement
(the yellow bar) is close to the sum of the other two (the blue and purple bars), which implies
that gate resizing does not affect existing SER-aware optimization by the proposed
RAR-based approach. In this experiment, we do not constrain the additional area introduced
by gate resizing and thus the area overhead may be relatively significant, ranging from 9%
(for circuit t481, less SER reduction achieved) to 16% (for circuit alu4, more SER reduction
achieved). The overhead of the combined algorithm is 17% on average. By adjusting the
threshold for gate resizing, we can always trade between area overhead and SER reduction.
3.4 Concluding Remarks
In this chapter, we propose a RAR-based SER reduction framework for combinational
circuits. Two metrics, mean error impact (MEI) and mean masking impact (MMI), are used
for accurate estimation of SER changes during RAR iterations. According to the estimation
through MEI and MMI, we introduce four constraints to guide the RAR technique toward
SER reduction. Experiments on a set of ISCAS’85 and MCNC’91 benchmarks reveal the
effectiveness of our methodology. Furthermore, a gate resizing strategy is integrated as a
post-RAR procedure to provide additive SER improvement.
Chapter 4 SER Reduction via Selective Voltage Scaling (SVS)
In the power optimization domain, voltage scaling is a well-known technique for reduc-
ing energy costs by applying lower supply voltages to those gates off critical paths. Toward
this end, dual-VDD design is the most common methodology to implement voltage scaling for
power reduction. For SER reduction, voltage scaling is a possible technique which can miti-
gate SET generation. More specifically, the same amount of charge disturbance produces a
smaller (less harmful) SET at gates with a high supply voltage (VDDH) than at gates with a
low supply voltage (VDDL). Accordingly, voltage scaling becomes effective against soft errors
by scaling up soft-error-critical gates. Soft-error-critical gates are those gates that have large
error impact and account for a large portion of the total SER. Level converters (LCs),
which impose delay and energy penalties, are needed on the connections from VDDL-gates to
VDDH-gates for preventing short-circuit leakage current in VDDH-gates. To minimize the cost
for level conversion (using LCs), some existing methods, whether focusing on power or SER
optimization, do not allow any VDDL-to-VDDH connection in a circuit. In such a case, the optimized circuit is basically partitioned into two voltage islands: the one (closer to primary inputs) operating at VDDH and the other (closer to primary outputs) operating at VDDL. However, as we will see later, most of the soft-error-critical gates are near primary outputs, which
means that restricting the use of VDDH only near primary inputs cannot prove advantageous
for SER improvement in an energy-efficient manner.
A related method [24] determines optimal assignments of gate size, supply voltage,
threshold voltage, and output capacitive load to achieve soft error tolerance. Nevertheless,
their results show that, for all benchmarks, all sub-circuits finally operate at the highest VDD
(1.2V), which dissipates unnecessary power even though LC insertion can be avoided. The
algorithm described by Choudhury et al. [23] is another work employing voltage assignment
(dual-VDD) for single-event upset robustness. No LC is needed under the restriction that only
high-VDD gates are allowed to drive low-VDD gates, but not vice versa. This implies that
soft-error-critical gates, which are of great importance to the soft error rate of a circuit and
always close to primary outputs, may not operate at the high VDD unless all gates in the fanin
cones are scaled up. Therefore, the resulting voltage assignment is likely to introduce unreasonable power penalty.
In order to avoid incurring LCs, the aforementioned two methodologies scale up too
many gates or even the whole circuit. We will point out quantitatively that such a scenario is
pessimistic; scaling up only a few of those gates in the presence of LCs can also yield
promising results, with much less power dissipation. Then, we propose a power-aware SER
reduction framework using dual supply voltages. A higher supply voltage (VDDH) is assigned
selectively to gates that have large error impact and contribute most to the overall SER. Since
the soft error rate may vary after each voltage assignment, we estimate the effects of VDDH
assignments on circuit SER and power consumption, and accept those which minimize SER
while keeping the power overhead below a prescribed limit. The key contribution of our
approach based on selective voltage scaling (SVS) is on the appropriate use of LCs such that
the number of up-scaled gates is bounded for power awareness. In addition, a bi-partitioning
technique is developed to further alleviate the common physical-level power-planning issues
coming with dual-VDD design style, by minimizing the number of nets with terminal nodes
operating at different voltages.
4.1 Effects of Voltage Scaling
Before presenting the SVS-based approach for SER reduction, we explain the effects of
voltage scaling in terms of glitch generation and glitch propagation. By changing the supply
voltage (VDD) of a gate, the critical charge for transient glitches and the propagation delay of
the gate also change. The former, inversely correlated with glitch generation, is proportional
to VDD; the latter, inversely correlated with glitch propagation, is proportional to
VDD/(VDD − VTH)^α, where α is the technology-dependent velocity saturation factor. When a gate
is scaled up, the same amount of collected charge at its output load will generate a smaller
glitch (i.e., lower glitch generation) owing to increased critical charge. On the other hand, the
glitches generated in its fanin cone may be propagated with less attenuation (i.e., higher
glitch propagation) owing to decreased propagation delay. A chain of fanout-of-4 (FO4)
inverters simulated by HSPICE in 70nm Predictive Technology Model (PTM) indicates that
the effect on glitch generation prevails over the one on glitch propagation.
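Under the alpha-power-law delay dependence quoted above, the speed-up from scaling a gate to the higher supply can be estimated as follows; VTH = 0.2 V and α = 1.3 are illustrative values, not parameters taken from the thesis.

```python
# Sketch of the delay dependence cited above: delay ~ VDD / (VDD - VTH)**alpha.
# VTH = 0.2 V and alpha = 1.3 are illustrative placeholders.
def relative_delay(vdd, vth=0.2, alpha=1.3):
    """Alpha-power-law delay of a gate at supply voltage vdd, up to a constant."""
    return vdd / (vdd - vth) ** alpha

# Ratio of delays at 1.0 V vs 1.2 V: a value above 1 means the
# up-scaled (1.2 V) gate is faster, weakening electrical masking.
speedup = relative_delay(1.0) / relative_delay(1.2)
```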
In Figure 4-1, we plot the generated and propagated glitches of a transient glitch occur-
ring at the first inverter with 15fC injected charge. The plots on the top (bottom) are made
when all inverters operate at VDDL = 1.0V (VDDH = 1.2V). As shown in the figure, after scaling up all inverters, glitch generation of the first inverter decreases and glitch propagation of
the remaining inverters also decreases, even though these gates become faster. The main
reason for lower glitch propagation in this example is the decreasing glitch amplitude, which
can enhance the effect of electrical masking (attenuation). In other words, electrical masking
will be weakened (by the speed-up) only if the collected charge is large enough to produce a
glitch with amplitude at least equal to the supply voltage, i.e., full swing. However, based on
the used attenuation model [20], electrical masking will become ineffective once the glitch
duration exceeds 2X the gate delay, in which case the speed-up of a single gate due to the
up-scaling of its supply voltage hardly has negative impact on electrical masking. As a result,
voltage scaling is certainly feasible for soft error hardening because a higher supply voltage
(i) can significantly reduce the generation of transient glitches and (ii) will adversely affect
Figure 4-1: HSPICE simulations for glitch generation and propagation: the plots on the top are for the low supply voltage (1.0V) and those on the bottom are for the high supply voltage (1.2V).
the propagation of generated glitches only within a limited range of glitch sizes.
4.2 Problem Formulation
By using Equation (4), the proposed SER reduction problem based on selective voltage
scaling (SVS) is formulated as:
Minimize   ΣFj∈POs SER(Fj)
Subject to   #(Gates @ VDDH) ≤ f · #(Gates)    (21)
where f is the allowable percentage of gates operating at VDDH.
HSPICE simulation using 70nm PTM shows that scaling up three 3-input FO4 NOR
gates (or four 3-input FO4 NAND gates) can simply compensate for the delay imposed by a
LC implemented, for example, as in [49]. That is, the delay of a LC plus three 3-input FO4
VDDH-NORs is smaller than the delay of the three NORs when operating at VDD
L. Hence, the
circuit delay will not be significantly increased even if additional LCs are inserted, especially
for a circuit with more than 30 FO4 inverter delay [50]. Note that in the minimization prob-
lem in Equation (21), SER is a joint function of three masking mechanisms, which are pattern-dependent and probabilistic in essence. It may not be possible to solve this problem
effectively in an analytical form, or to develop a tractable algorithm for finding an exact
solution, thereby necessitating a heuristic approach for fine-grained exploration of solutions.
The number of gates operating at VDDH is constrained by a fraction f of total gate count for
bounded energy increase. In the next subchapter, we propose a very efficient algorithm to
minimize SER while keeping the numbers of VDDH-gates and required LCs quantifiably low.
The basic principle of our approach is to quantify the scaling criticality (SC) of each gate and,
under a given power budget, scale up as many gates with maximum cumulative scaling criti-
cality as possible.
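This selection principle can be sketched as a greedy routine; the sketch ignores LC insertion and power modeling, which the full framework accounts for, and all names and values are illustrative.

```python
def select_vddh_gates(sc, f):
    """Greedy sketch of the SVS principle: scale up the gates with the
    largest scaling criticality (SC) until the budget f * #gates is reached.
    LC count and power estimation from the thesis are omitted for brevity."""
    budget = int(f * len(sc))
    ranked = sorted(sc, key=sc.get, reverse=True)
    return set(ranked[:budget])

# Illustrative SC values for five gates; with f = 0.4, at most two
# gates may operate at VDDH, so the two largest-SC gates are chosen.
sc = {"G1": 0.02, "G2": 0.40, "G3": 0.15, "G4": 0.33, "G5": 0.05}
chosen = select_vddh_gates(sc, 0.4)
```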
4.3 Dual-VDD SER Reduction Framework
We first define scaling criticality (SC) for each internal gate. To simplify the following
discussion without loss of generality, we omit initial duration d and amplitude a from the
notations of MEI (Equation (8)) and MMI (Equation (9)), but keep in mind that they actually
exist. In the circuit in Figure 4-2 where all gates operate at VDDL, the MEI value of gate G1
can be expressed as:
MEIL(G1) = Δ + MEIL(G2) · [1 − MMIDL(G2)]    (22)
where MEIL(G2) and MMIDL(G2) are the MEI and MMI values of gate G2 when gate G2
operates at VDDL, and Δ is the amount of gate G1’s error impact propagated to primary outputs
through its fanout neighbors except gate G2 – gates G3 and G4 in this example.
If gate G2 is scaled up to VDDH, the MEI value of gate G1, still operating at VDDL, becomes:
MEIL′(G1) = Δ + MEIH(G2) · [1 − MMIDH(G2)]    (23)
where MEIH(G2) and MMIDH(G2) are the MEI and MMI values of gate G2 when gate G2
operates at VDDH.
By subtracting Equation (23) from Equation (22), we have:
MEIL(G1) − MEIL′(G1) = MEIL(G2) · [1 − MMIDL(G2)] − MEIH(G2) · [1 − MMIDH(G2)]    (24)
Figure 4-2: An illustrative example of scaling criticality (SC): SC(G2) estimates the decrease in MEI of gate G1 after gate G2 has been scaled up to VDDH.
The difference between Equations (22) and (23), as shown in Equation (24), is the scaling criticality (SC) of gate G2. The larger this difference, the more critical it is to scale gate G2 up to VDDH.
Definition 3 (scaling criticality): The scaling criticality (SC) of gate G is defined as:
SC(G) = MEIL(G) · [1 − MMIDL(G)] − MEIH(G) · [1 − MMIDH(G)]    (25)
MEIL and MMIDL are obtained during SER analysis for the standard voltage level, VDDL (= 1.0V in our case). Every time the ADD computation and propagation for a gate operating at VDDL are completed, we change the voltage level from VDDL to VDDH (= 1.2V in our case) and then calculate MEIH and MMIDH. It is not necessary to rebuild the ADDs for VDDH since they are isomorphic to those for VDDL. All we need to do is re-compute the attenuated duration and amplitude in the terminal nodes of the ADDs by applying the new supply voltage (VDDH) to the attenuation model.
The scaling criticality of gate G represents the decrease in MEI of gate G's immediate fanin neighbors after gate G has been scaled up. Based on the definition of MEI, we know that the SER of a circuit greatly depends on the MEI values of its internal gates. This implies that gates with high SC are the most critical ones to scale up for soft error robustness.
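To make Equation (25) concrete, the SC computation can be sketched in a few lines; the per-gate MEI/MMI values below are hypothetical placeholders, not data from this work:

```python
def scaling_criticality(mei_L, mmi_L, mei_H, mmi_H):
    """SC per Equation (25): MEI_L*(1 - MMI_D_L) - MEI_H*(1 - MMI_D_H)."""
    return mei_L * (1.0 - mmi_L) - mei_H * (1.0 - mmi_H)

# Hypothetical (MEI_L, MMI_D_L, MEI_H, MMI_D_H) tuples per gate --
# placeholders for values produced by the ADD-based SER analysis.
gates = {
    "G1": (0.040, 0.30, 0.025, 0.55),
    "G2": (0.060, 0.20, 0.030, 0.50),
    "G3": (0.010, 0.70, 0.008, 0.80),
}
sc = {g: scaling_criticality(*v) for g, v in gates.items()}
# Gates are then ranked by decreasing SC for voltage assignment.
```

Here G2 ranks highest: its error impact drops the most when scaled up to VDDH.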
Definition 4 (soft-error-critical gate): A gate is called soft-error-critical if its SC is within the highest l% of overall SC values, where l is a specified lower bound.
Definition 5 (soft-error-relevant gate): A gate is called soft-error-relevant if its SC is within the next (u−l)% of overall SC values (i.e., between the top l% and the top u%), where u is a specified upper bound greater than l.
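Definitions 4 and 5 amount to a percentile split over the sorted SC values; a minimal sketch (gate names and SC values are made up for illustration):

```python
def classify_by_sc(sc, l, u):
    """Split gates by SC rank: top l% are soft-error-critical,
    the next (u - l)% are soft-error-relevant (Definitions 4 and 5)."""
    ranked = sorted(sc, key=sc.get, reverse=True)
    n = len(ranked)
    critical = set(ranked[:int(l * n)])
    relevant = set(ranked[int(l * n):int(u * n)])
    return critical, relevant

# Hypothetical SC values for a 10-gate circuit.
sc = {f"G{i}": v for i, v in enumerate([0.9, 0.7, 0.5, 0.4, 0.3,
                                        0.2, 0.15, 0.1, 0.05, 0.01])}
crit, rel = classify_by_sc(sc, l=0.2, u=0.5)
```

With l = 0.2 and u = 0.5, the top two gates are critical and the next three are relevant.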
Our objective is to develop a framework which can scale up all soft-error-critical gates
and as many soft-error-relevant gates as possible, while incurring the smallest number of LCs
and lowest power overhead. The lower bound l for soft-error-critical gates guarantees a significant reduction in SER; the upper bound u for soft-error-relevant gates sets up a power
constraint. The algorithm is described in the sequel.
First, we sort all gates (the total number of gates being denoted by n) according to their SC values in decreasing order. For each soft-error-relevant gate in the sorted list, we calculate the number of required LCs assuming that all gates between the first gate (a soft-error-critical gate) and the current gate (a soft-error-relevant gate) are scaled up. Next, we choose the ith gate (a soft-error-relevant gate; l·n+1 ≤ i ≤ u·n) that requires the fewest LCs when the 1st through the ith gates are scaled up. Finally, we assign VDDH to the first i gates and VDDL to the remaining gates.
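This selection step can be sketched as follows, assuming an LC is required wherever a VDDL-gate drives a VDDH-gate; the toy netlist and the SC ordering are hypothetical:

```python
def count_lcs(fanout, high):
    """An LC sits on every connection from a VDDL-gate to a VDDH-gate."""
    return sum(1 for u, outs in fanout.items() if u not in high
               for v in outs if v in high)

def select_vddh_gates(gates_by_sc, fanout, n, l, u):
    """Scan i in [l*n+1, u*n] and up-scale the prefix of the SC-sorted
    list whose length i needs the fewest LCs."""
    lo, hi = int(l * n), int(u * n)
    best_i, best_lcs = lo + 1, None
    for i in range(lo + 1, hi + 1):
        high = set(gates_by_sc[:i])
        lcs = count_lcs(fanout, high)
        if best_lcs is None or lcs < best_lcs:
            best_i, best_lcs = i, lcs
    return set(gates_by_sc[:best_i]), best_lcs

# Hypothetical 10-gate chain-like netlist: gate -> fanout gates.
fanout = {"G0": ["G2"], "G1": ["G2"], "G2": ["G4"], "G3": ["G4"],
          "G4": ["G6"], "G5": ["G6"], "G6": ["G8"], "G7": ["G8"],
          "G8": ["G9"], "G9": []}
order = ["G4", "G2", "G6", "G8", "G0", "G1", "G3", "G5", "G7", "G9"]  # by SC
high, lcs = select_vddh_gates(order, fanout, n=10, l=0.2, u=0.5)
```

Here the scan settles on the smallest prefix (i = 3) because longer prefixes do not reduce the LC count.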
Up to this point, all soft-error-critical gates and some soft-error-relevant gates are scaled
up so that a significant amount of SER reduction is expected. Nevertheless, there may still be
an undesirable number of LCs in the current circuit. Besides extra design costs, (i) soft error
susceptibility and (ii) physical design issues will also arise if we do not carefully control the
number and distribution of LCs. Decreasing the number of required LCs not only reduces the
error impact of LCs themselves, but also alleviates potential layout issues at the physical
design stage. As a result, we present the following two refinement techniques to remove
unnecessary LCs.
Refinement 1: Scale up some VDDL-gates which are not soft-error-critical to minimize the
number of LCs.
Scaling up a VDDL-gate which is not soft-error-critical leads to little improvement in
SER, but could reduce the number of LCs needed in the circuit. For example in Figure 4-3(a),
if we scale up gate G2, LC1-2 needs to be inserted but LC2-3 and LC2-4 can be removed. The
number of LCs decreases by one in this case. We try to remove as many LCs as possible
using Refinement 1, because the power penalty resulting from an LC is larger than that from the up-scaling of a single gate. This was confirmed by HSPICE simulation (70nm, PTM), during which we found that the power consumption of an LC [49] is 3.55X the additional power from the up-scaling of a 3-input FO4 NAND gate.
Refinement 2: Scale down some VDDH-gates which are no longer soft-error-critical due to
the up-scaling of other gates to further minimize the number of LCs.
A soft-error-critical gate may become non-soft-error-critical if one or more of its fanout
neighbors are scaled up. For example, let gates G3 and G4 in Figure 4-3(b) be
soft-error-critical and assume that both have been scaled up. However, as a result of the fact
that gate G4 has been scaled up, gate G3 may become non-soft-error-critical since its MEI and
Figure 4-3: Effects of the two refinement techniques: in both cases, the number of required LCs decreases by one in terms of output loading. (a) Refinement 1: up-scaling of gate G2. (b) Refinement 2: down-scaling of gate G3.
SC decrease, and it may no longer need to be scaled up. Thus, we can scale gate G3 back down to VDDL and save one LC. We do not avoid scaling such gates up before applying Refinement 2 because early use of this technique can easily cause fragmented voltage assignments – a small cluster of VDDH-gates followed by a small cluster of VDDL-gates, followed by another small cluster of VDDH-gates, and so forth. Evidently, such voltage scaling is not satisfactory in terms of the extra design costs imposed by the required LCs.
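The two refinements can be sketched as greedy passes over the same LC-counting model; the strict-decrease test in the up-scaling pass is a deliberately conservative variant of the "will not increase" condition, and the four-gate netlist is a hypothetical stand-in for the Figure 4-3(a) scenario:

```python
def count_lcs(fanout, high):
    """An LC sits on every connection from a VDDL-gate to a VDDH-gate."""
    return sum(1 for u, outs in fanout.items() if u not in high
               for v in outs if v in high)

def refine(fanout, high, critical):
    """Refinements 1 and 2 as greedy passes (sketch)."""
    # Refinement 1: up-scale non-critical VDDL-gates when that strictly
    # reduces the LC count (conservative variant of "will not increase").
    for g in list(fanout):
        if g not in high and g not in critical:
            if count_lcs(fanout, high | {g}) < count_lcs(fanout, high):
                high.add(g)
    # Refinement 2: down-scale VDDH-gates that are not soft-error-critical
    # whenever removing them does not increase the LC count.
    for g in sorted(high):
        if g not in critical:
            if count_lcs(fanout, high - {g}) <= count_lcs(fanout, high):
                high.discard(g)
    return high

# Hypothetical netlist mirroring Figure 4-3(a): G1 -> G2 -> {G3, G4}.
fanout = {"G1": ["G2"], "G2": ["G3", "G4"], "G3": [], "G4": []}
high = refine(fanout, {"G3", "G4"}, critical={"G3", "G4"})
# Up-scaling G2 replaces LC2-3 and LC2-4 with a single LC1-2.
```

On this toy netlist, Refinement 1 up-scales G2 (two LCs become one), and Refinement 2 then finds nothing profitable to scale down.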
Refinement 1 may increase the percentage of VDDH-gates to exceed the upper bound u,
which is specified for limiting the power overhead. Hence, the allowable percentage f of
VDDH-gates in our problem formulation (Equation (21)) should be slightly larger than the
upper bound u. In Chapter 4.5, we will illustrate how the pair (l, u) is decided and how f
varies with (l, u). Our overall algorithm of selective voltage scaling for SER reduction, which
includes one efficient heuristic and two iterative refinements, is given in Figure 4-4. The time
complexity of the dual-VDD SER reduction algorithm (Algorithm 1) is analyzed as follows.
Let n be the number of gates in the target circuit. Given that the MEI and MMI values of
each gate are available, the heuristic (Lines 1-8) takes O(n lg n) time due to sorting. The two
refinement techniques (Lines 9-11 and Lines 12-16) can both run in O(n) time. The total time
of Algorithm 1 is thus O(n lg n). The time complexity of ADD traversal for MEI and MMI
computation is O(p) where p is the ADD size. To compute the MEI (MMI) value of gate G,
one has to traverse O(q) ADDs where q is the number of primary outputs (the number of G’s
fanin neighbors). The entire methodology works well as long as all duration and amplitude
Algorithm 1: Dual-VDD SER reduction (circuit, n, l, u)
// n: gate count, l: lower bound, u: upper bound
// Heuristic: O(n lg n)
01 Compute scaling criticality (SC) given MEI/MMI for each gate in circuit;
02 sorted_gate_list ← Sort all gates in decreasing order of their SC values;
   // 1 ~ l*n: soft-error-critical gates, l*n+1 ~ u*n: soft-error-relevant gates
03 FOR (i = 1; i <= u*n; i = i+1) {
04   Scale up the ith gate in sorted_gate_list;
05   num_of_LCs[i] ← Calculate the number of LCs needed in circuit;
   }
   // Find the least required LCs
06 index ← Extract the index of the minimum in num_of_LCs[l*n+1 : u*n];
07 FOR (i = index+1; i <= u*n; i = i+1)   // Keep the first index gates up-scaled
08   Scale down the ith gate in sorted_gate_list;
// Refinement 1: O(n)
09 FOR EACH (VDDL-gate G in circuit)
10   IF (scaling up gate G will not increase the number of required LCs)
11     Scale up gate G;
// Refinement 2: O(n)
12 FOR EACH (VDDH-gate G in circuit) {
13   IF (gate G is soft-error-critical)   // Do not touch soft-error-critical gates
14     CONTINUE;
15   IF (scaling down gate G will not increase the number of required LCs)
16     Scale down gate G;
   }
Figure 4-4: The overall algorithm of our SVS-based approach for SER reduction
ADDs associated with a circuit can be built with a good primary input ordering so the sizes
of necessary ADDs are tractable. An extended scheme for further speeding up MEI/MMI
analysis was presented in [19].
Despite the limited number of required LCs, as demonstrated later, physical-level floorplanning and power network routing for a dual-VDD design may still be a challenge, especially when the connectivity between two voltage islands is complex. To address the layout
issues during physical design implementation, we use the result provided by Algorithm 1 as
an initial solution to a bi-partitioning framework. The detailed idea of exploiting partitioning
for further layout considerations is described in the next subchapter.
4.4 Bi-Partitioning for Power-Planning Awareness
4.4.1 Problem Description
The proposed formulation of voltage scaling using dual supply voltages (VDDL and VDDH) can be directly transformed into a bi-partitioning problem [51] where each partition is simply
the set of gates operating at a single VDD. Herein, a "voltage island" denotes a topological cluster of gates operating at the same VDD, rather than a physical region containing a portion of gates in the design floorplan. In a typical partitioning problem, the total number of nets (hyperedges) with terminal nodes in different partitions is minimized so that subsequent physical design steps, such as floorplanning and placement, can be optimized more easily. For our voltage scaling problem, it is worth noting that the fewer the connections across voltage islands operating at different supply voltages, the more likely it is that those islands can be "physically" separated, rather than interwoven or encompassed by each other, during floorplanning optimization (e.g., wire-length minimization); the less complex the planning of power network synthesis will be; and, finally, the less cost/effort it takes to generate a feasible layout for a dual-VDD design.
concepts have been adopted in [50] to lower the physical design overhead by minimizing the
number of LCs and assigning the same VDD to a group of gates which tend to be physically
adjacent.
Therefore, through such a problem transformation, minimizing the cut set between two partitions (i.e., the number of nets with terminal nodes operating at different voltages) not only accounts for physical-level layout concerns, but also implicitly decreases the number of LCs needed on connections from the VDDL-partition to the VDDH-partition. Toward this
end, we develop a bi-partitioning framework based on the Fiduccia-Mattheyses (FM) algorithm [52]. The initial solution to our bi-partitioning problem is the result obtained by Algorithm 1. To ensure that a significant SER reduction is maintained, we fix the gates with the largest SC values in the VDDH-partition, since they are always the most soft-error-critical and must be scaled up to VDDH. Next, the FM-based framework is applied to further optimize the result of voltage scaling in terms of power-planning awareness, design penalty, and SER gain. The basic operation of our FM-based bi-partitioning is to find, according to a joint cost function, a sequence of best "moves" leading to the greatest benefit. Here, a move is defined as switching a gate's supply voltage from one level to the other.
Figure 4-5 demonstrates a move in circuit C17 that switches gate G3's supply voltage from VDDH to VDDL. Before the move (see Figure 4-5(a)), the cut size and the number of required LCs are both 4, and the SER reduction is 18%. After the move (see Figure 4-5(b)), the cut size becomes 3, the number of required LCs remains 4, and the SER reduction is 14%.
These parameters form the cost function which determines whether a move is beneficial (or
the best) and will affect the overall quality of selective voltage scaling for SER reduction.
4.4.2 Cost Function
The cost function used is a weighted combination of the cut size (|cut|) and the number of required LCs (#LC). The cut size stands for the complexity of power network implementation, and the number of required LCs represents the design penalty.
COST = α · (|cutnew| / |cutinit|) + (1 − α) · (#LCnew / #LCinit), where 0 ≤ α ≤ 1.    (26)
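A direct transcription of Equation (26), with the Figure 4-6 style move plugged in; the initial sizes |cutinit| = #LCinit = 10 are hypothetical:

```python
def fm_cost(cut_new, lc_new, cut_init, lc_init, alpha):
    """Joint cost of Equation (26), with weight 0 <= alpha <= 1."""
    return alpha * cut_new / cut_init + (1.0 - alpha) * lc_new / lc_init

# A move that drops |cut| by 1 and raises #LC by 1, starting from the
# (hypothetical) initial sizes |cut_init| = #LC_init = 10.
adverse = fm_cost(9, 11, 10, 10, alpha=0.4)     # cost rises above 1.0
beneficial = fm_cost(9, 11, 10, 10, alpha=0.6)  # cost drops below 1.0
```

This reproduces the α threshold discussed later: when |cutinit| = #LCinit, such a move hurts for α < 0.5 and helps for α > 0.5.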
Our bi-partitioning framework aims at minimizing the cost by moving gates between
two partitions, based on the FM algorithm. To avoid analyzing exact power consumption for
each move, the power overhead owing to voltage up-scaling itself is not included in the cost
Figure 4-5: An example of a move in the FM-based bi-partitioning framework: switch the supply voltage of gate G3 from VDDH to VDDL (VDDL = 1.0V, VDDH = 1.2V). (a) Before the move: |cut| = 4 (nets), #LC = 4 (pin-to-pin wires), ΔSER = 18% (LCs' error impact considered). (b) After the move: |cut| = 3 (nets), #LC = 4 (pin-to-pin wires), ΔSER = 14% (LCs' error impact considered).
function. Instead, we specify an allowed range of the ratio between two partitions such that
the number of gates operating at VDDH is bounded. By doing so, we can also guarantee that
the SER reduction which has been accomplished by Algorithm 1 is maintained. This is because those gates critical for being scaled up (i.e., with high SC) will very likely stay in the VDDH-partition, given that the most soft-error-critical gates have been pre-assigned and are fixed with VDDH.
Consider the example in Figure 4-6(a) where gate G is identified as the next move from the VDDH-partition to the VDDL-partition and, as mentioned, most soft-error-critical gates are
Figure 4-6: Cost function: a weighted combination of the cut size (|cut|) and the number of required LCs (#LC). (a) Move gate G from the VDDL-partition to the VDDH-partition. (b) After the move: Δ(|cut|) = −1 and Δ(#LC) = +1.
fixed with VDDH (on the right of the dotted red line). After moving gate G (see Figure 4-6(b)),
|cut| decreases by 1 (better power-planning awareness) but #LC increases by 1 (higher design
penalty). The weight α in Equation (26) has significant impact on the result of selective
Figure 4-7: The proposed FM-based methodology for power-planning awareness
Algorithm 2: FM-based bi-partitioning (circuit, α)
// α: weight in Equation (26)
01 Use the result of Algorithm 1 as the initial solution;
02 cutSize ← cut size of the initial solution, i.e., |cutinit| in Equation (26);
03 noLC ← number of required LCs in the initial solution, i.e., #LCinit;
04 WHILE (TRUE) {
05   IF (no improvement for 2 consecutive iterations) BREAK;
06   Unlock all gates in circuit;
07   Lock/fix the most soft-error-critical gates with VDDH;   // usually the first 50%
08   WHILE (TRUE) {
09     IF (all gates locked) BREAK;
10     IF (no unlocked gate can be moved while maintaining the allowed range of partitioning ratio) BREAK;
11     Find the best move according to the gain in the cost;   // Equation (26)
12     Move and then lock the gate corresponding to the best move;
13     Update Δ(|cut|) and Δ(#LC) for affected gates/moves;
14     Calculate the cost gain of each unlocked gate using α, cutSize, noLC, Δ(|cut|), and Δ(#LC);
     }   // End of the inner WHILE loop
     /* Given that the first m moves out of a total of n moves lead to the largest cumulative cost gain: */
15   Keep the first m moves and undo the last (n − m) moves;
}   // End of the outer WHILE loop
voltage scaling. Assuming that |cutinit| is equal to #LCinit, if we have α smaller than 0.5, the
move increases the cost defined by Equation (26) and thus is an adverse move. However, if
we have α greater than 0.5, the move decreases the cost and will be regarded as a beneficial
one. By choosing an appropriate α, the proposed methodology can be either more
power-planning-aware or more power-aware (overhead-aware). Note that power awareness
and power-planning awareness do not conflict and can be realized simultaneously by our
methodology (Algorithm 2, as depicted in Figure 4-7), which minimizes the joint cost function in Equation (26). The whole algorithm usually converges within four iterations.
4.5 Experimental Results
The experimental settings for SVS-based SER reduction are the same as those in Chapter 3.3, except that two supply voltages, VDDL = 1.0V and VDDH = 1.2V, are available for voltage scaling.
Table 4-1 reports the experimental results of our proposed approach when the lower
bound l is 8 and the upper bound u is 16. That is, we will certainly scale up the first 8% of
internal gates (soft-error-critical gates) and minimize the overall SER and the number of
required LCs by manipulating the next 8% (soft-error-relevant gates). The inserted LCs are
Table 4-1: Average mean error susceptibility (MES) improvement and overall soft error rate (SER) reduction
also considered as potential sources of radiation-induced transient glitches. For each benchmark in Table 4-1, various glitch sizes and different input distributions are applied. We list
the numbers of VDDH-gates and required LCs in columns four and five. Synchronous LCs,
which may be needed at the outputs of sequential elements, are not incorporated as in
[23][24]. The average MES values over all primary outputs before and after selective voltage
scaling are shown in columns six and seven. Columns eight and nine demonstrate the MES
improvement and possible maximum improvement which are obtained by assigning VDDH to
all gates in the circuit.
For instance, circuit C432 has 32 primary inputs, 7 primary outputs, and 156 internal
gates. For soft error hardening against glitches with duration of 60ps, the numbers of
VDDH-gates and required LCs are 31 and 12, respectively. The average MES of the original
circuit is 0.00357, while that of the radiation-hardened version is 0.00205. The percentage of
the MES improvement is 42.50%; the possible maximum improvement by scaling up all (156)
gates in C432 is 62.02%. When considering all possible glitch sizes, the overall SER reduc-
tion for C432 is 35.28%. The absolute SER in FITs (failures-in-time) drops from 12.9 FITs to
8.4 FITs. On average across all benchmarks, 33.45% SER reduction can be achieved with 18.89% (slightly larger than the upper bound u) of total gates scaled up and 3.86% LCs inserted, as a fraction of the gate count.
In some cases, for example circuit x4, the SER reduction is 27.12%, below the average
(33.45%). However, one can note that the MES improvements for 80-120ps duration sizes
are very close to the possible maximum improvements. The results reveal that, by scaling up a small portion of internal gates in a circuit, we can reduce the overall SER either by a significant percentage or to near the theoretical minimum. On average, more than three-fifths (33.45% out of 52.85%) of the maximum SER reduction is accomplished with less than one-fifth (18.89%) of gates being scaled up.
As for scalability, Algorithms 1 and 2 for selective voltage scaling have been experimentally verified to be efficient, requiring only a few minutes, or even seconds, when applied to all benchmarks considered. The benchmarks listed in Table 4-1 are those for which the analysis of MEI, MES, and then SER can be completed within a reasonable amount of runtime. The SER analysis engine on which our SVS-based approach relies is a
symbolic one based on binary and algebraic decision diagrams. To improve the scalability of
the SER analyzer used, the authors of [19] proposed to partition a circuit into smaller pieces
such that the number of gates in each sub-circuit is below a certain limit and/or the number of
nets crossing the cuts between sub-circuits is minimized. For the purpose of runtime efficiency, it is important that the sub-circuits have the smallest possible numbers of inputs, on which the size and the manipulation of BDDs/ADDs greatly depend. Once a given circuit is
partitioned, we apply the analysis framework on each sub-circuit, instead of the circuit as a
whole, and combine the probabilistic results to derive those SER statistics including MEI and
MMI. Without a significant loss of accuracy, this partitioning strategy can drastically reduce
the runtime for the computation of MEI and MMI, which are necessary for identifying
soft-error-critical and soft-error-relevant gates based on their SC values (Equation (25)). For
example, according to the MEI and MMI results from the aforementioned flow, 94% (95%) of those gates in C432 (C1908) that were supposed to operate at VDDH for soft error hardening are indeed scaled up. Meanwhile, the analysis of MEI/MMI/SER is sped up by two orders of magnitude while the amount of SER reduction is affected only marginally.
The corresponding power and delay overheads are shown in Figure 4-8, where power
and timing are measured by using Synopsys® PrimeTime PX. Input probability distributions
used for the results in Table 4-1 are also applied for switching activity analysis in PrimeTime
PX. Our approach incurs an average of 11.74% power overhead, which is much smaller than
those introduced by other frameworks applying voltage scaling/assignment where LCs are
avoided. As mentioned earlier, the circuit performance does not change much, or even becomes better, except for circuit vda, which has a delay overhead of 6.34% with the largest SER
reduction of 43.09%. Overall, the overhead in normalized power-delay-area product per 1%
SER reduction is 0.64%, while that of [24] is 0.85%. One key point must be clarified: our results are from voltage scaling alone, while the results of [24] are jointly from gate sizing, VDD and VTH scaling, and output load attaching. Using MEI and MMI described in Chapter 3,
we can easily characterize each gate and also exploit these techniques, for example, gate
sizing [14] for further SER reduction without much additional effort.
The goal of this methodology is to assign VDDH to gates with high scaling criticality.
Therefore, after those gates are scaled up, the MEI values of internal gates will become
smaller. In Figure 4-9, the distributions of overall MEI values for circuit x2 are presented.
Each point in the figure denotes the number of gates (y-axis) having MEI within the interval
Figure 4-8: SER reduction vs. power and delay overheads
(x-axis). As can be seen, the MEI distribution after optimization shifts toward the left,
which means the MEI values of internal gates become smaller due to selective voltage scal-
ing.
To validate the efficacy of applying Algorithm 2, we use benchmarks alu4 and vda, in
which more LCs are inserted. For alu4, after applying the FM-based partitioning algorithm
given α = 0.5, the number of VDDH-gates decreases by 2%, the number of required LCs (#LC)
decreases by 23%, the cut size between two voltage islands (|cut|) decreases by 16%, and the
amount of SER reduction decreases by only 2%. Those results for vda are 4%, 40%, 24%,
and 2%, respectively. Note that, before applying Algorithm 2, we also try to remove unnecessary LCs by employing the two refinement techniques, which implicitly reduces the cut size. Hence, the aforementioned results in terms of #LC and |cut| are additive improvements
Figure 4-9: Mean error impact (MEI) distributions for circuit x2
on top of Algorithm 1, which demonstrate that we can further optimize our selective voltage
scaling problem for both power and power-planning awareness while maintaining the SER
reduction.
In this work, it is assumed that both VDDL and VDDH are needed for converting a VDDL-signal to a VDDH-signal, as presented in [49]. LCs are thus restricted to be placed physically around the boundaries between VDDL-regions and VDDH-regions during physical design implementation. In [50], single-supply LCs, which realize level conversion with only VDDH, are developed such that LCs can be placed "in" the VDDH-regions without increasing
the complexity of power network routing. Given a dual-VDD design with required LCs speci-
fied, it is evident that using single-supply LCs for layout generation can simply alleviate the
power-planning issues by relaxing the restriction on LC placement. As a pre-layout proce-
dure, the proposed idea of exploiting FM-based partitioning focuses on the exploration of
voltage scaling/assignment for power-planning awareness before implementing a dual-VDD
design at the physical level. The benefit of power-planning-aware partitioning can be further
strengthened with the appropriate use of single-supply LCs, which is beyond the scope of this
work and not particularly addressed in this chapter.
We also perform experiments with different lower and upper bounds. As shown in
Figure 4-10, the SER reductions when using (l, u) smaller than (8, 16) are not as significant
as the case when (l, u) is (8, 16). On the other hand, using (l, u) greater than (8, 16) may induce more VDDH-gates and LCs. More VDDH-gates will result in larger power penalty; more
LCs will lead not only to larger overhead in terms of area and power, but also to higher error
impact since LCs are also vulnerable to particle hits.
4.6 Concluding Remarks
In this chapter, we propose a power- and power-planning-aware soft error hardening
framework via selective voltage scaling using dual supply voltages for combinational logic.
A novel metric, scaling criticality (SC), is used to estimate the effects of VDDH assignments
Figure 4-10: SER reduction with different lower and upper bounds (circuit alu2)
on circuit SER. Based on the estimation through SC, we introduce an efficient heuristic and
two refinement techniques for SER reduction while keeping the numbers of VDDH-gates and
required LCs sufficiently low. In addition, an FM-based partitioning algorithm is developed to
further address potential physical-level layout issues. Various experiments on a set of stan-
dard benchmarks demonstrate that the entire methodology can effectively reduce the circuit
susceptibility to radiation-induced transient errors, with both power and power-planning
awareness explicitly considered.
Chapter 5 SER Reduction via Clock Skew Scheduling (CSS)
When the combinational block of a sequential circuit can propagate SETs/SEUs freely,
the sequential circuit may become very sensitive to such transient events. This is because,
once latched, soft errors can circulate through the circuit in subsequent clock cycles and
affect more than one output, more than once, resulting in so-called multiple-bit upsets
(MBUs). The untraceable propagation of soft errors greatly affects the circuit operation for
consecutive cycles and thus, necessitates design methods for soft error tolerance of sequential
circuits, in a similar manner to classic design constraints such as performance and power
consumption.
Soft error tolerance for sequential circuits cannot be perfectly addressed without tackling MBUs. This chapter presents an SER mitigation framework where the MBU impact is explicitly considered and alleviated. To the best of our knowledge, this is the first work addressing MBU-aware soft error tolerance in sequential circuits. On one hand, for an original
error (SET/SEU) in the clock cycle when a particle strikes, we maximize the probability of
timing masking via clock skew scheduling (CSS). On the other hand, during clock cycles
following the particle hit, we prevent multiple errors (MBUs) from propagating repeatedly by
exploring the effects of (i) implication-based masking and (ii) mutually-exclusive propaga-
tion, as explained later in Chapter 5.1.1 and Chapter 5.1.2, respectively. CSS is a sequential
optimization technique which borrows time from adjacent combinational blocks by adjusting
skews of corresponding clock signals. These skews, also known as useful skews [53][54], are
often exploited to minimize the delay (clock period) of a sequential circuit. For more details
about CSS for delay minimization, please refer to [53][54].
We take advantage of useful skews to increase the probability of timing masking via
CSS, while accounting for the MBU impact to further enhance soft error tolerance. The end
result of this methodology is a net reduction in soft error rate, not only during clock cycles
when particles strike, but also during cycles subsequent to them. The proposed framework
involves only minor modifications of the clock tree synthesis step and does not touch the
combinational logic of sequential circuits. Hence, this CSS-based approach can also act as a
post-processing procedure for additional SER improvement on top of techniques targeting
only combinational logic, which typically change the circuit timing and topology (e.g., resizing [22] and rewiring [25]).
5.1 A Motivating Example
To motivate the use of clock skew scheduling for soft error tolerance, we use benchmark
s27 (see Figure 5-1) from the ISCAS'89 suite, where flip-flops (FFs) are positive-edge-triggered. Without loss of generality, we assume that the delay of each gate is 1
(unit delay model) and wires do not contribute to the circuit delay. The assumption can be
relaxed for a generic delay model, with consideration of wire loads.
It is important to note that, once latched, a particle-induced SET will become a
full-cycle error. Therefore, in cycles following the particle hit, one should take only logical
masking into account because electrical masking and timing masking are ineffective against
full-cycle errors. In this example, we focus on a SET which occurs at gate G8 and may be
captured by flip-flops FF2 and/or FF3.
Definition 6 (skew): Given two flip-flops FFi and FFj for which the arrival times to clock
pins are ci and cj respectively, the skew between FFi and FFj, denoted by skew(FFi, FFj), is
(ci – cj).
Definition 7 (error-latching window): The error-latching window of a flip-flop is a time
interval, [t − tsu, t + th], where t is the moment when a clock edge happens, and tsu and th are the setup and hold times of the flip-flop. An error must be present during this interval to be
latched; otherwise, it is filtered by latching-window (timing) masking. The error-latching
window associated with a flip-flop can be backwards propagated to internal gates (according
to respective propagation delays) to determine when an error has to occur to be latched by
that flip-flop.
Figure 5-1: An example circuit (s27) from the ISCAS’89 benchmark suite
Under the unit delay model1, the delays from G8 to FF2 and to FF3 are 0 and 1, respectively.
Our goal is to overlap the error-latching windows of FF2 and FF3 at G8 by adjusting the
arrival times of clock signals to FF2 and/or FF3, which in effect decreases the probability that
an error at G8 is latched with increased impact of timing masking. The idea of overlapping
error-latching windows, first proposed in [30], is based on the fact that the probability of
timing masking is inversely proportional to the sum of the sizes of disjoint error-latching windows. As a result, the more the error-latching windows overlap, the smaller the sum of window sizes and the larger the probability of timing masking will be.
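This inverse relation can be sketched by measuring the union of error-latching intervals at a gate before and after skewing; the window width (tsu + th = 0.2) and the unit time scale below are hypothetical:

```python
def union_length(intervals):
    """Total length covered by a list of (start, end) intervals; the chance
    an error is latched scales with this total, so the probability of
    timing masking scales inversely with it."""
    total, last_end = 0.0, float("-inf")
    for s, e in sorted(intervals):
        s = max(s, last_end)       # skip any part already covered
        if e > s:
            total += e - s
            last_end = e
    return total

# Error-latching windows back-propagated to G8 (hypothetical numbers).
w = 0.2  # assumed window width tsu + th
before = union_length([(-1.0, -1.0 + w), (0.0, w)])  # FF3 window at t-1, FF2 at t
after = union_length([(0.0, w), (0.0, w)])           # FF3 skewed by +1: windows coincide
```

Overlapping the two windows halves the total latching interval, i.e., errors have half the exposure to being captured.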
For example, in Figure 5-2(a), there are two separate error-latching windows at G8 (one
at time t-1 and the other at t) before skewing any flip-flop. If we delay the clock signal arriving at FF3 by 1 (its new error-latching window is shown in the upper-right diagram of Figure 5-2(b)), there will be only one joint error-latching window at G8 (at time t)
due to complete overlapping. This implies that, after skewing FF3, only errors occurring at
G8 during the error-latching window at time t will be latched, while errors occurring during
the no-longer-existing window at time t-1 will be filtered by timing masking, leading to a
significant reduction in SER. The general solution to completely overlap the error-latching
1 The unit delay model is assumed here only for ease of illustration; the assumption is relaxed later for experimentation.
windows of FF2 and FF3 is to adjust the arrival times of clock signals to FF2 and/or FF3 such
that skew(FF2, FF3) = -1. Since the overlapped error-latching window (at time t) can be
backwards propagated up to the primary inputs, the positive impact on circuit SER also holds
for those gates in the fanin cone of G8.
Figure 5-2: Overlapping of error-latching windows. (a) Before skewing: two separate error-latching windows at G8. (b) After skewing: one joint error-latching window at G8.
However, in the case where FF3 has been skewed, MBUs may become more frequent
because an error occurring at G8 during the joint error-latching window at time t will be
latched by both FF2 and FF3 simultaneously. Instead of using all flip-flops in a sequential
circuit as candidates for clock skew scheduling, we carefully pick pairs of flip-flops that are
beneficial for MBU elimination. In the sequel, we demonstrate how to identify pairs of
flip-flops that are capable of alleviating MBU effects (during clock cycles subsequent to
particle hits) and suitable to be managed by CSS for MBU-aware soft error tolerance.
5.1.1 Implication-Based Masking
We consider the following example to illustrate the concept of implication-based mask-
ing required for our methodology. The function of primary output O of circuit s27 is:
O = (a + f ’ + g)(c + d’ + e + g) (27)
The complement of the Boolean difference of O with respect to (w.r.t.) FF2’s present-state
line f is:
F = (∂O/∂f)’ = a + c’de’ + g (28)
Equation (28) represents the Boolean expression of logical masking patterns for errors
propagated from f to O. For instance, a full-cycle error originating at f will be logically
masked at gate G2 and cannot be propagated to O if a is “1”, in which case Boolean function
F evaluates to “1”.
Similarly, the complement of the Boolean difference of O w.r.t. FF3’s present-state line g is:
G = (∂O/∂g)’ = (a + f’)(c + d’ + e) (29)
Equation (29) represents the Boolean expression of logical masking patterns for errors
propagated from g to O. For instance, a full-cycle error originating at g will be logically
masked at gate G8 and cannot be propagated to O if a is “1” and c is “1”, in which case Boolean function G evaluates to “1”.
Note that F is a function of g and G is a function of f, where f and g are the present-state
lines of FF2 and FF3 respectively and may be corrupt due to the presumed SET at G8. Thus, f
and g may not accurately reflect logical masking and should be removed from Equations (28)
and (29). To remove these variables while keeping the logical masking patterns, we apply
universal quantification.
The universal quantification of F w.r.t. g is:
∀g F = F|g=1 · F|g=0 = a + c’de’ (30)
Equation (30) describes the patterns for logical masking of errors from f to O, for all
possible values of g (0 and 1). Since we do not know whether g is corrupt, applying universal
quantification makes sense and will correctly reflect logical masking of errors from f to O,
irrespective of g.
Similarly, the universal quantification of G w.r.t. f is:
∀f G = G|f=1 · G|f=0 = a · (c + d’ + e) (31)
Up to now, Equations (30) and (31), which no longer include f or g, have been functions
of inputs a, c, d, and e. In addition, one can easily find that (31) is a subset of (30); that is to
say, with respect to O, the logical masking of an error on g implies the logical masking of an
error on f. More precisely in this case, both errors on f and g will be masked when (31) is
satisfied.
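The Boolean-difference and universal-quantification steps above can be checked mechanically by truth-table enumeration. The sketch below implements O from Equation (27); the helper names are ours, and exhaustive enumeration stands in for the symbolic (e.g., BDD-based) manipulation a real implementation would use.

```python
from itertools import product

def O(a, c, d, e, f, g):
    """Primary output O of s27, per Equation (27)."""
    return bool((a or (not f) or g) and (c or (not d) or e or g))

def masking_F(a, c, d, e, g):
    """F = (dO/df)': patterns that logically mask an error on f (Eq. (28))."""
    return O(a, c, d, e, 0, g) == O(a, c, d, e, 1, g)

def forall_g_F(a, c, d, e):
    """Universal quantification of F w.r.t. g: F|g=1 . F|g=0 (Eq. (30))."""
    return masking_F(a, c, d, e, 1) and masking_F(a, c, d, e, 0)

# The quantified function equals a + c'de' for every input pattern:
for a, c, d, e in product([0, 1], repeat=4):
    assert forall_g_F(a, c, d, e) == bool(a or ((not c) and d and (not e)))
```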
Definition 8 (implication-based masking): A pair of flip-flops X and Y is called an implica-
tion-based masking (IM) pair if, with respect to all outputs and flip-flops,
(i) the set of logical masking patterns for errors propagated from X (denoted by LM(X))
contains the one for errors from Y (denoted by LM(Y)), i.e., LM(X) ⊇ LM(Y) as illus-
trated in Figure 5-3(a), or
(ii) the set of logical masking patterns for errors propagated from Y (LM(Y)) contains the
one for errors from X (LM(X)), i.e., LM(Y) ⊇ LM(X) as illustrated in Figure 5-3(b).
Based on Definition 8, the first category of candidates for CSS can be identified. In cir-
cuit s27, as shown in Figure 5-1, {FF2 and FF3} is a pair of candidates falling into this cate-
gory. By overlapping the error-latching windows of these two flip-flops via CSS (see Figure
5-2(b)), not only can SER be reduced, but also CSS-induced MBUs will, to a certain extent,
be eliminated by implication. This will be demonstrated later in Chapter 5.3.
5.1.2 Mutually-Exclusive Propagation
In the previous section, we looked at primary output O in circuit s27 for determining the
first type of candidate flip-flops. For the second type, mutually-exclusive propagation, we
look at next-state line R. As opposed to implication-based masking, mutually-exclusive
propagation in s27 can be explicitly identified by a single side-input assignment, where a side
Figure 5-3: Illustrative relationships between a pair of flip-flops (X and Y) as candidates for clock skew scheduling: (a) implication-based masking, LM(X) ⊇ LM(Y); (b) implication-based masking, LM(Y) ⊇ LM(X); (c) mutually-exclusive propagation, LM(X) ⊇ LM(Y)’ ≡ LM(Y) ⊇ LM(X)’
input is a wire along which no error is propagated. Again, we focus on a SET which occurs at
gate G8 and may be captured by flip-flops FF2 and/or FF3.
To propagate errors from FF3’s present-state line g to R, gate G10 needs a
non-controlling value “0” on its side input coming from G1. As seen in Figure 5-1, the value assignment at the output of gate G1 is a controlling value for gate G2, at which errors from FF2’s
present-state line f are thus logically masked. Therefore, with respect to R, the propagation of
an error on g implies that an error propagated from f is logically masked. In other words,
errors on f and g cannot be observed at R simultaneously.
Definition 9 (mutually-exclusive propagation): A pair of flip-flops X and Y is called a mutu-
ally-exclusive propagation (MEP) pair if, with respect to all outputs and flip-flops, the set of
logical masking patterns for errors propagated from X (LM(X)) contains the complement of
the one for errors from Y (LM(Y)’), i.e., LM(X) ⊇ LM(Y)’ as illustrated in Figure 5-3(c).
Intuitively, the sets of patterns for propagating errors from X and Y, represented as LM(X)’
and LM(Y)’ respectively in Figure 5-3(c), are disjoint.
Based on Definition 9, the second category of candidates for CSS can be identified. As
in the case of implication-based masking, Boolean algebra is used to identify MEP pairs.
Similar to IM pairs, we can overlap the error-latching windows of two flip-flops falling into
this category (e.g., FF2 and FF3 in s27) to achieve MBU-aware soft error tolerance because,
due to the property of mutually-exclusive propagation, at least one of the two errors propa-
gated from this pair of flip-flops will be logically masked before reaching a primary output or
a flip-flop. The mutually-exclusive property guarantees that the MBU impact after applying
CSS is at most equivalent to the case of not applying CSS, whereas circuit SER can be sig-
nificantly reduced as a result of increased timing masking. It is also possible that the two errors
from an MEP pair are both masked, in which case even less MBU impact is expected.
Any two flip-flops are regarded as candidates and will be beneficial for SER reduction
as long as they are either IM or MEP pairs. These two properties are the major motivation for
our framework aiming at soft error tolerance, and both address the MBU issue by mitigating
the occurrence of multiple-bit upsets. More precisely, as mentioned earlier, overlapping the
error-latching windows of flip-flops increases the probability of timing masking and in turn
decreases the soft error rate of a circuit. Furthermore, overlapping the error-latching win-
dows of a candidate pair of flip-flops, which meet the IM or MEP condition, can not only
reduce circuit SER but also alleviate potential MBU effects. Hence, for our objective of
MBU-aware soft error tolerance, we check all possible pairs of flip-flops and extract as candidates for the proposed CSS-based approach those satisfying the IM or MEP property.
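Given the masking pattern sets, the IM/MEP classification reduces to set containment. A toy sketch, assuming LM(X) and LM(Y) have already been computed as explicit sets of input patterns; the function and the pattern sets are hypothetical:

```python
from itertools import product

def classify_pair(lm_x, lm_y, all_patterns):
    """Classify a flip-flop pair from its logical-masking pattern sets
    (Definitions 8 and 9): 'IM', 'MEP', or None."""
    if lm_x >= lm_y or lm_y >= lm_x:          # one masking set contains the other
        return "IM"
    if lm_x >= (all_patterns - lm_y):         # LM(X) contains LM(Y)' :
        return "MEP"                          # propagation sets are disjoint
    return None

# Toy 2-input pattern space with hypothetical masking sets:
pats = set(product([0, 1], repeat=2))
assert classify_pair({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1)}, pats) == "IM"
assert classify_pair({(1, 0), (1, 1), (0, 0)}, {(0, 0), (0, 1)}, pats) == "MEP"
assert classify_pair({(0, 0)}, {(1, 1)}, pats) is None
```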
5.2 Clock Skew Scheduling Based on
Piecewise Linear Programming (PLP)
Chapter 5.1 described what pairs of flip-flops can be identified as candidates for the
mitigation of MBU effects and manipulated by our CSS-based approach. It also explained
how circuit SER can be reduced by overlapping error-latching windows of candidate
flip-flops via clock skew scheduling. However, the motivating example in Chapter 5.1 is a
special case of CSS for MBU-aware soft error tolerance. A fundamental assumption in the
example is that we can completely overlap the error-latching windows of a given pair of
flip-flops (FFs) which have been recognized as candidates for CSS. This assumption is not
realistic because it is not always possible to completely overlap error-latching windows
without incurring any timing violations, i.e., setup time violations owing to long paths or
hold time violations owing to short paths. Moreover, adjusting the skew between two FFs
may also change skews between affected FFs and unaffected FFs. For a large sequential
circuit with hundreds of FFs, optimal skew scheduling, shown to be a signomial problem [55],
is difficult to determine algorithmically. To address this problem, we develop an analytical method which can apply CSS with a global view of all extracted candidate FFs while suppressing timing violations. A generalized problem formulation, based on piecewise linear programming (PLP), is presented in the sequel.
5.2.1 Problem Formulation
Given a non-skewed sequential circuit (i.e., skew(FFi, FFj) = 0 for all i and j) and all
possible pairs of flip-flops as candidates beneficial for MBU elimination, our objective is to
achieve the highest level of MBU-aware soft error tolerance by maximizing the overlap
between error-latching windows of each flip-flop pair via clock skew scheduling.
Definition 10 (intersecting gate): The intersecting gate of two flip-flops FFi and FFj is the
root gate for the intersection of FFi’s and FFj’s fanin cones. In case of more than one such
gate, the one with the largest MEI value (Equation (8)) is selected.
In Figure 5-4, flip-flops FFi and FFj are a pair of candidates whose intersecting gate is
gate Gij. The propagation delays from Gij to FFi and to FFj are denoted by di and dj respec-
tively. Let the amounts of adjustments in the arrival times of clock signals to FFi and FFj be
si and sj, where si and sj can be positive or negative. To completely overlap the error-latching
windows of FFi and FFj at Gij, we have to determine si and sj such that skew(FFi, FFj) = (si –
sj) = (di – dj). However, complete overlapping may need significantly large |si| and/or |sj| and
thereby, may induce timing violations, which must be avoided in the resulting design. To
suppress timing violations, we set up the first two constraints as follows.
For each possible pair of flip-flops FFx (skewed by sx) and FFy (skewed by sy) between
which there exist combinational paths from FFx to FFy, Equation (32) expresses the setup
time constraint and Equation (33), the hold time constraint:
Figure 5-4: Generalized clock skew scheduling of a candidate pair of flip-flops (FFi and FFj) for MBU-aware soft error tolerance
sx + tcq + Axy + tsu < sy + Tclk (32)
sx + tcq + axy > sy + th (33)
where Tclk is the clock period of the sequential circuit, tcq, tsu and th are respectively the
clock-to-output delay, setup and hold times of flip-flops, and Axy and axy are the maximum
and minimum delays of combinational paths from FFx to FFy, which can be obtained by
performing static timing analysis.
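Constraints (32) and (33) can be checked directly for a candidate skew assignment. A minimal sketch with hypothetical timing parameters:

```python
def timing_ok(s_x, s_y, A_xy, a_xy, T_clk, t_cq=0.05, t_su=0.1, t_h=0.1):
    """Check constraints (32) and (33) for a skewed flip-flop pair with
    combinational paths from FFx to FFy (all parameter values are
    illustrative placeholders)."""
    setup_ok = s_x + t_cq + A_xy + t_su < s_y + T_clk   # (32): long paths
    hold_ok = s_x + t_cq + a_xy > s_y + t_h             # (33): short paths
    return setup_ok and hold_ok

# A modest skew is fine, but clocking FFy too early violates setup:
assert timing_ok(0.2, 0.0, A_xy=1.5, a_xy=0.4, T_clk=2.0)
assert not timing_ok(0.2, -1.6, A_xy=1.5, a_xy=0.4, T_clk=2.0)
```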
Due to the above two constraints, it may become impossible to overlap error-latching
windows of all flip-flop pairs completely and to realize the theoretical optimum in the un-
constrained case. A generalized methodology accommodating partial (incomplete) overlap-
ping of error-latching windows is thus required.
Let wij denote the reduction in SER of the given circuit obtained by completely overlap-
ping the error-latching windows of FFi and FFj at Gij. We apply the increased timing mask-
ing to calculate the new MEI values (Equation (8)) of Gij and those gates in its fanin cone,
and update the MES values (Equation (2)) of corresponding outputs. The difference between
the old MES and the new MES can be used to derive wij based on Equations (3) and (4). The
reason for selecting an intersecting gate with the largest MEI is that, by doing so, it is very
likely to obtain the largest wij for CSS.
Note that for sequential circuits, Equation (2) (MES) needs to be modified for evaluat-
ing the error susceptibility of next-state lines, and also extracting the error correlations be-
tween different state lines to find the probability of two or more next-state lines failing due to
a SET at a given gate. Given error probabilities of those state lines in the clock cycle when a
SET happens, the average probability of MBUs during the following cycles is modeled in [18]
using conditional probabilities.
The theoretical optimal SER reduction is:
∑(FFi, FFj) ∈ Candidates wij (34)
Since the optimum (Equation (34)) may be unachievable due to constraints (32) and
(33), we use another variable, fij (0 ≦ fij ≦ wij), to denote the actual reduction in SER result-
ing from the overlapping (complete or partial) of FFi’s and FFj’s error-latching windows.
Figure 5-5 shows fij as a function of sij (= skew(FFi, FFj) = si – sj) where tsu and th are the
setup and hold times of flip-flops. The rationale is that, once overlapped, fij is linearly
proportional to the size of the overlap between FFi’s and FFj’s error-latching windows, and fij
= wij when completely overlapped at sij = (di – dj). This is based on the fact that the size of the
overlapping interval is proportional to the probability of timing masking, and inversely pro-
portional to the probability of a SET being registered.
From Figure 5-5, one can note that the relationship of fij versus sij is neither convex nor
concave. Instead, the formulation becomes piecewise linear if fij(sij) is broken into four intervals at the breakpoints sij = (di – dj) – (tsu + th), sij = (di – dj), and sij = (di – dj) + (tsu + th). By introducing four
new binary variables pij,1, pij,2, pij,3, and pij,4 such that
pij,1 + pij,2 + pij,3 + pij,4 = 1 (35)
and four new floating variables rij,1, rij,2, rij,3, and rij,4 where
0 ≦ rij,k ≦ pij,k for k = 1, 2, 3, and 4 (36)
we can re-express sij as:
Figure 5-5: fij versus sij, piecewise linear over four intervals with breakpoints sij = (di – dj) – (tsu + th), sij = (di – dj), and sij = (di – dj) + (tsu + th)
sij = si – sj
    = pij,1 × [LB] + rij,1 × [(di – dj) – (tsu + th) – LB]
    + pij,2 × [(di – dj) – (tsu + th)] + rij,2 × [(tsu + th)]
    + pij,3 × [(di – dj)] + rij,3 × [(tsu + th)]
    + pij,4 × [(di – dj) + (tsu + th)] + rij,4 × [UB – (di – dj) – (tsu + th)] (37)
where LB and UB are the lower and upper bounds on sij. As a pessimistic but valid case
obtained by rearranging Equations (32) and (33) with Axy = 0 and axy = Tclk, LB and UB can
be th – Tclk and Tclk – tsu respectively.
Similarly, fij can be rewritten as:
fij = pij,1 × [0] + rij,1 × [0]
    + pij,2 × [0] + rij,2 × [wij – 0]
    + pij,3 × [wij] + rij,3 × [0 – wij]
    + pij,4 × [0] + rij,4 × [0] (38)
Geometrically, as shown in Figure 5-5, pij,k = 1 means sij is within the kth interval of fij(sij)
and rij,k indicates the fractional position of sij within the kth interval. For a valid solution, exactly
one of the four binary variables (pij,k) equals 1, and at most one of the four floating
variables (rij,k) is greater than 0. All of the other variables are 0.
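The piecewise-linear function that Equations (37) and (38) encode for the MILP solver can also be evaluated directly, which is convenient for sanity-checking a solution. A sketch with hypothetical, binary-exact parameter values:

```python
def f_ij(s_ij, d_i, d_j, w_ij, t_su, t_h):
    """Direct evaluation of the piecewise-linear function of Figure 5-5:
    fij is 0 when the windows are disjoint, ramps linearly, and peaks at
    wij when sij = di - dj (complete overlap)."""
    offset = abs(s_ij - (d_i - d_j))   # distance from complete overlap
    width = t_su + t_h                 # span over which the overlap shrinks
    if offset >= width:
        return 0.0                     # no overlap: no SER-reduction credit
    return w_ij * (1.0 - offset / width)

# Illustrative numbers (di - dj = 1, window span 0.5):
assert f_ij(1.0, d_i=0, d_j=-1, w_ij=4.0, t_su=0.25, t_h=0.25) == 4.0
assert f_ij(1.25, d_i=0, d_j=-1, w_ij=4.0, t_su=0.25, t_h=0.25) == 2.0
assert f_ij(2.0, d_i=0, d_j=-1, w_ij=4.0, t_su=0.25, t_h=0.25) == 0.0
```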
Lastly, our proposed PLP-based SER mitigation framework, for MBU-aware soft error
tolerance, is formulated as:
Maximize ∑(FFi, FFj) ∈ Candidates fij (39)
Subject to (32), (33), (35), (36), and (37)
where Equations (32) and (33) ensure no timing violation in the resulting circuit, and Equa-
tions (35), (36), and (37) are used to transform the original formulation to a piecewise linear
representation.
The optimal solution to (39) can be found by existing mixed integer linear programming
(MILP) solvers. The worst-case problem size of our PLP formulation is O(n^2), where n is the
number of flip-flops in a sequential circuit. For most of the benchmarks used, the complexity is
far below the worst case because not all flip-flops are identified as candidates for clock skew
scheduling. More precisely, assuming m is the number of candidate pairs of flip-flops, the
problem size can be simplified to O(m) in terms of the number of variables and constraints.
This PLP-based methodology has been experimentally verified to be very efficient in runtime,
usually on the order of one minute for all benchmarks considered.
5.2.2 Interaction with Other Techniques
The efficacy of our approach highly depends on how much we can overlap er-
ror-latching windows of candidate flip-flops, which is basically bounded by Equations (32)
and (33) after the combinational logic of a sequential circuit has been fixed. However,
choosing candidate pairs of flip-flops is based only on circuit functionality, not timing or
topology. If those candidates are extracted earlier and then fed to front-end optimization steps,
we can try to balance the propagation delays to each pair of flip-flops from their intersecting
gate. Consider the same example in Figure 5-4 where FFi and FFj have been known as can-
didates for clock skew scheduling. If di and dj could be made as close as possible during
optimization of combinational logic, the error-latching windows of these two flip-flops are
more probable to be overlapped via CSS. On the other hand, fine-grained design techniques
such as wire resizing and delay insertion can also be applied as post-optimization tuning to
minimize the delay difference between two paths, especially for the shorter one due to its
flexibility in being lengthened.
5.3 Experimental Results
Again, the experimental settings for CSS-based SER reduction are the same as those in
Chapter 3.3, except that the setup (tsu) and hold (th) times of flip-flops are both assumed to be
10ps. Also, a larger interval for initial glitch duration, (dmin, dmax) = (60ps, 140ps), is used to
get a higher occurrence rate of MBUs for demonstrating the effectiveness in terms of
MBU-aware soft error tolerance. The problem formulated as piecewise linear programming
is solved by GNU Linear Programming Kit (GLPK) version 4.33 on a 3GHz Pentium 4
workstation running Linux.
Table 5-1 reports the experimental results for average MES improvement and SER re-
duction. For each benchmark in Table 5-1, we list the numbers of primary inputs, primary
outputs and internal gates in column two, and the numbers of flip-flops, candidate pairs along
with the corresponding percentage among all possible pairs in column three. For a circuit
with n FFs, we check all possible (n*(n-1)/2) pairs and extract those satisfying the IM or
MEP property as candidates for clock skew scheduling. The average MES values over all
primary outputs before and after applying our PLP-based CSS are shown in columns five and
six, for three different initial duration sizes (small: 60ps, medium: 100ps, and large: 140ps).
Columns seven and eight demonstrate the MES improvement and the overall SER reduction.
The runtime spent on solving the PLP problem, which is not included in the table, is about 1
minute for circuits s1196 and s1238, and a few seconds or less for all the others.
For example, circuit s208 has 10 primary inputs, 1 primary output, 68 internal gates, and
8 flip-flops. Among 28 (= 8*7/2) pairs of FFs, 21 pairs (75%) can be identified as candidates
for CSS. Based on Equation (39), we formulate the CSS problem with these 21 pairs and then
find its optimal solution by using GLPK. The MES improvements for small (60ps), medium
(100ps), and large (140ps) duration sizes are 15.89%, 35.69%, and 36.05%, respectively.
When considering all possible sizes of glitches, the overall SER reduction is 29.21%. On
average across all benchmarks, 35.75% SER reduction can be achieved.
Table 5-1 also shows the corresponding amount of skews due to CSS. This is measured
Table 5-1: Average mean error susceptibility (MES) improvement and overall soft error rate (SER) reduction
by normalized absolute adjustment in clock signal, which is defined as:
∑i |ΔAT(FFi)| / (#FFs × Tclk) (40)
where ΔAT(FFi) is the amount of adjustment in the arrival time of clock signal to FFi and Tclk
is the clock period of the circuit.
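Equation (40) is straightforward to compute; a minimal sketch with hypothetical adjustment values:

```python
def normalized_abs_adjustment(delta_at, t_clk):
    """Equation (40): sum of |dAT(FFi)| over all flip-flops, normalized by
    the number of flip-flops times the clock period."""
    return sum(abs(d) for d in delta_at) / (len(delta_at) * t_clk)

# Four flip-flops, clock period 2.0 time units (illustrative numbers):
assert normalized_abs_adjustment([0.25, -0.5, 0.0, 0.25], 2.0) == 0.125
```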
Normalized absolute adjustment (Equation (40)) quantifies the cost imposed by CSS in
terms of the degree of clock network modification. Intuitively, the larger the value of normalized absolute adjustment, the more aggressive the modification the clock network may require.
As can be seen in the last column of Table 5-1, on average 4.44% normalized absolute
adjustment is needed by our CSS-based approach. Note that the adjustment does not neces-
sarily imply additional logic on the clock tree. For an H-tree structure, we can just unbalance
wire loads during tree connection/construction to implement the skews between pairs of FFs.
This is practically feasible, especially for those circuits which need significantly low adjust-
ments in clock signals. For those circuits needing higher adjustments, wire sizing/rerouting
and buffer sizing/relocation [56] are always the very first schemes for creating intentional
skews.
Furthermore, CSS itself involves only modifications of clock tree synthesis during the
physical design stage. In other words, the difference between original and optimized designs
lies in their clock trees, whereas the combinational network remains identical. Hence, our
CSS-based approach, when applied as a post-processing procedure, can provide additive SER
reduction without destroying existing SER improvements. As shown in Figure 5-6, an extra
30-40% reduction in SER can be achieved with a drastic decline of MBU effects, while the
clock network suffers a minor degree of modification ranging from 1% to 7%.
Figure 5-6: SER reduction vs. normalized absolute adjustment in clock signal
Figure 5-7 shows the mitigation of MBU effects during clock cycles subsequent to par-
ticle hits (SETs). In addition to the SER reduction for the first clock cycle via CSS, the po-
tential CSS-induced MBU effects during the following cycles can be significantly mitigated
by using IM and MEP pairs of flip-flops as candidates for CSS. On average across all subsequent cycles (from the 2nd to the 7th) in Figure 5-7, the MBU effects of circuits s208 (see
Figure 5-7(a)) and s298 (see Figure 5-7(b)) can be mitigated by 43% and 63%, respectively.
5.4 Concluding Remarks
In this chapter, we propose an analytical method for MBU-aware soft error tolerance of
sequential circuits. The approach adjusts the arrival times of clock signals such that er-
ror-latching windows of flip-flops can be overlapped, which in effect increases the probabil-
ity of timing masking and decreases the soft error rate of a sequential circuit. Moreover, two
types of candidate pairs of flip-flops, beneficial for MBU elimination, are introduced. The
(a) s208 (b) s298
Figure 5-7: Mitigation of MBU effects during clock cycles subsequent to particle hits
overall methodology using clock skew scheduling is formulated as a piecewise linear pro-
gramming problem and can be solved efficiently by GLPK. Experiments on a set of
ISCAS’89 benchmarks reveal the effectiveness of our approach.
5.5 Impact of Technology Scaling and Process Variability on SER
As technology scales further, variations become prominent as well. Technology nodes
beyond 90nm experience increasingly high levels of device parameter variations, which
change the design flows from deterministic to probabilistic. The performance of a chip is
heavily dependent on the manufacturing process variations. When considering transient
faults and their impact on circuit reliability, it is important to take into account the fact that
the delay of a particular gate is no longer fixed across dies or within the same die, but instead
should be characterized by a probability distribution. Furthermore, the propagation of a
transient fault is a function of gate delay. In other words, variations in gate delays, resulting
from process variations, can affect the size of the glitch propagated through the circuit and
the circuit error rate.
116
We also conducted experiments for process variability-aware SER analysis. The experimental results show that using the nominal case (variability-unaware analysis) can underestimate circuit SER by 5% (10%) when compared to the 50% (90%) yield point. The standard deviation of circuit SER varies from circuit to circuit, due to differences in circuit topology, in the number of gates and gate types, and in the resulting gate delay variations under process variations.
Chapter 6 NBTI Mitigation via Joint Logic Restructuring (LR) and Pin Reordering (PR)
In this chapter, we propose an optimization framework employing joint logic restruc-
turing (LR) and pin reordering (PR) against NBTI-induced performance degradation. Pin
reordering is used to change the order of input signals belonging to a single gate, while logic
restructuring is used to exchange two wires feeding different gates. The two wires to be
exchanged must be functionally symmetric to keep the circuit behavior unaltered. Before
presenting the overall NBTI-aware methodology, we illustrate two key observations which
motivate our proposed approaches.
Observation 1 (NBTI effects vs. signal probability):
Figure 6-1 shows the NBTI effect versus the probability of an input signal to a PMOS
transistor over 3×10^6 seconds. The circuit in Figure 6-1(a) is simply equivalent to a
3-input NAND gate. Signals S and T are both inputs to this NAND3 and can be swapped
with each other while maintaining the NAND3 functionality. Throughout this chapter,
S:P denotes the fact that the probability of signal S being logic “0” is P. The signal probability of being “0”, denoted by SP, is defined such that SP(S) = P.
If we swap signals S and T in Figure 6-1(a), SP(b) decreases from 1/8 to 1/16 while SP(c)
increases from 7/8 to 15/16. As shown in Figure 6-1(b), the NBTI effect increases very
rapidly when SP is close to 0 and tends to saturate when SP approaches 1. Therefore, it
is beneficial to make the probability of a signal (e.g., signal b in Figure 6-1(a)) which is
small even smaller, by exchanging a signal (e.g., signal S) whose probability is large
with another signal (e.g., signal T) whose probability is even larger, assuming that S
Figure 6-1: NBTI effect vs. signal probability. (a) An equivalent NAND3 (S:P – the probability of signal S being “0” is P). (b) NBTI-induced Vth degradation, showing ΔNBTI(P) and ΔNBTI(Q).
and T are functionally exchangeable. In this case, the NBTI effect on pin Q is worsened
only marginally (i.e., ΔNBTI(Q)), but we can obtain a significant reduction in the NBTI
effect on pin P (i.e., ΔNBTI(P)); namely, ΔNBTI(P) is significantly larger than
ΔNBTI(Q).
This observation is the major motivation for our logic restructuring approach.
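The shape of this trade-off can be illustrated with a simple concave proxy for the degradation curve. The 1/6 exponent below is borrowed from common reaction-diffusion NBTI models and merely stands in for the measured curve of Figure 6-1(b); it is not the model used in this work.

```python
def delta_nbti(sp):
    """Concave proxy for NBTI-induced Vth degradation as a function of a
    signal's probability of being '0' (steep near 0, saturating toward 1).
    The 1/6 exponent is an assumption standing in for Figure 6-1(b)."""
    return sp ** (1.0 / 6.0)

# Swapping S and T in Figure 6-1(a): SP(b) drops 1/8 -> 1/16 while SP(c)
# rises 7/8 -> 15/16; the total degradation proxy decreases:
before = delta_nbti(1 / 8) + delta_nbti(7 / 8)
after = delta_nbti(1 / 16) + delta_nbti(15 / 16)
assert after < before
```

Because the curve is steep near 0 and flat near 1, shrinking an already-small SP buys more than enlarging an already-large one costs, which is exactly the argument above.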
Observation 2 (NBTI effects vs. transistor stacking):
In the pull-up network of a NOR gate where PMOS transistors are connected in series,
the NBTI effect of a PMOS transistor closer to the output signal is smaller than that of a
PMOS transistor closer to the power supply (VDD), due to the stacking effect. There-
fore, it is beneficial to connect an input signal whose probability is small to a pin
(PMOS) closer to VDD for protecting the PMOS transistors below it. Figure 6-2 shows
the NBTI effect versus the time of operation with two opposite pin orders in a 3-input
NOR gate. As can be seen, the overall degradation is slower if the input signal with
the smallest probability is assigned to the highest pin (i.e., the PMOS closest to VDD).
Nevertheless, the arrival time of the signal with a small probability may be large. Con-
necting such a signal to a higher pin will increase the arrival time of the output signal,
even though the NBTI effects of PMOS transistors below are effectively mitigated. In
order to obtain the input ordering for the least NBTI-induced performance degradation,
not only signal probabilities but also arrival times of input signals should be considered.
This observation is the major motivation for our logic restructuring and pin reordering
approaches.
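The stacking heuristic of Observation 2 (smallest-SP signal nearest VDD) can be sketched as a simple sort. This deliberately ignores arrival times, which, as noted above, must also be considered in practice; the function name is hypothetical.

```python
def pin_order_by_sp(signals):
    """Stacking heuristic from Observation 2: assign the signal with the
    smallest probability of being '0' to the pin closest to VDD, the next
    smallest below it, and so on (arrival times ignored in this sketch).
    signals: list of (name, SP); returned order is VDD-side first."""
    return [name for name, sp in sorted(signals, key=lambda s: s[1])]

assert pin_order_by_sp([("x", 0.9), ("y", 0.1), ("z", 0.5)]) == ["y", "z", "x"]
```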
6.1 Proposed Methodology
The objective of our methodology is to minimize the circuit delay under 10-year NBTI
while incurring as little area overhead as possible. The main procedure iteratively performs
Figure 6-2: NBTI effect vs. transistor stacking. (a) Original input pin ordering. (b) Opposite input pin ordering.
logic restructuring and pin reordering, with minimum area penalty, until no further improvement can be made. These two approaches are synergistic and can provide potential
benefits for each other. Transistor resizing is an optional post-processing procedure for addi-
tional NBTI reduction, with low area overhead.
6.1.1 Logic Restructuring
The logic restructuring approach is based on functional symmetries. Functional symme-
tries (FSs) provide substantial benefits for various synthesis and verification applications. In
the domain of synthesis, FSs are used for timing/power optimization at the logic/gate level
[57] or for circuit refinement at the post-placement stage [58][59]; in the domain of verifica-
tion, FSs can be exploited to reduce the size of a binary decision diagram (BDD), which is a
crucial step for symbolic model checking. Generally, FSs are classified into two categories:
non-equivalence symmetry (NES) and equivalence symmetry (ES), as defined in the sequel.
Definition 11 (non-equivalence symmetry): Two variables x and y in a Boolean function
F(…, x,…, y,…) are non-equivalence symmetric (NES) if and only if:
F(…, x, …, y, …) = F(…, y, …, x, …) ⇔ F|x=0,y=1 = F|x=1,y=0 (41)
Definition 12 (equivalence symmetry): Two variables x and y in a Boolean function F(…,
x,…, y,…) are equivalence symmetric (ES) if and only if:
F(…, x, …, y, …) = F(…, y’, …, x’, …) ⇔ F|x=1,y=1 = F|x=0,y=0 (42)
Definition 13 (functional symmetry): Two variables x and y in a Boolean function F(…, x,…,
y,…) are functionally symmetric if they are either NES or ES.
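Definitions 11 and 12 can be checked by comparing cofactors over a truth table. A brute-force sketch (helper names are ours; practical implementations avoid exhaustive enumeration):

```python
from itertools import product

def cofactor(f, i, j, vi, vj, n):
    """Truth table of f with input i fixed to vi and input j fixed to vj."""
    return [f(*[(vi if k == i else vj if k == j else x[k]) for k in range(n)])
            for x in product([0, 1], repeat=n)]

def symmetry_type(f, i, j, n):
    """Classify inputs i and j of an n-input function f per Definitions
    11-13; returns the list of symmetries found, or None."""
    kinds = []
    if cofactor(f, i, j, 0, 1, n) == cofactor(f, i, j, 1, 0, n):
        kinds.append("NES")   # f unchanged when x and y are swapped
    if cofactor(f, i, j, 0, 0, n) == cofactor(f, i, j, 1, 1, n):
        kinds.append("ES")    # f unchanged when x and y are swapped and inverted
    return kinds or None

assert symmetry_type(lambda a, b: a & b, 0, 1, 2) == ["NES"]
assert symmetry_type(lambda a, b: a ^ b, 0, 1, 2) == ["NES", "ES"]
assert symmetry_type(lambda a, b, c: (a & b) | c, 0, 2, 3) is None
```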
Traditional methods of detecting functional symmetries are mainly based on automatic
test pattern generation (ATPG) or binary decision diagrams (BDDs). However, these two techniques often suffer from either high computational cost or space explosion. Instead of using ATPG- or BDD-based methods, which demand extensive computing resources, we use the
concept of generalized implication supergates (GISGs), proposed in [59], to identify func-
tional symmetries in a given circuit. The GISG-based algorithm is very efficient in run time
and memory usage, and thus will not become a bottleneck of our framework.
A generalized implication supergate (GISG) [59] is a group of connected gates that is
logically equivalent to a big AND/OR gate with a large number of inputs. For simplicity, we
will use only supergate (SG) to refer to a generalized implication supergate in the rest of this
dissertation. In practice, maximal supergates, which include a maximal number of gates and
cannot be expanded any further, are extracted for symmetry identification. To extract all maximal
supergates from a gate-level netlist, we first assign non-controlling values to all primary
output gates and treat them as SG roots. For each gate in a reverse topological order (from
primary outputs to primary inputs), backward implication is applied to determine the values
of all input gates until no more implication can be made or the current gate is not fanout-free.
Gates at which backward implication stops are treated as new SG roots. We then assign
non-controlling values to those new SG roots and apply backward implication recursively.
The whole process terminates when all primary inputs are reached. Figure 6-3 shows a supergate with 9 inputs and 11 gates. This 9-input supergate behaves as a big 9-input NAND gate with some inputs inverted.
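The defining property of a supergate, equivalence to a big AND/OR with possibly inverted inputs, can be verified behaviorally by truth-table comparison. This is a much-simplified stand-in for GISG extraction, which finds such cones structurally via backward implication; note that ORs with inverted inputs are covered by the NAND case through De Morgan's laws.

```python
from itertools import product

def is_supergate(f, n):
    """Check whether an n-input cone f is equivalent to a big AND of its
    inputs, with some inputs and/or the output possibly inverted (which
    also covers OR/NAND/NOR by De Morgan), via exhaustive comparison."""
    tt = [f(*x) for x in product([0, 1], repeat=n)]
    for inv in product([0, 1], repeat=n):          # per-input inversions
        for out_inv in (0, 1):                     # output inversion
            big_and = [out_inv ^ all(xi ^ iv for xi, iv in zip(x, inv))
                       for x in product([0, 1], repeat=n)]
            if tt == big_and:
                return True
    return False

assert is_supergate(lambda a, b, c: 1 - (a & b & c), 3)   # a NAND3 cone
assert not is_supergate(lambda a, b: a ^ b, 2)            # XOR is not
```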
Functional symmetries can be easily identified after all maximal supergates are ex-
tracted. Two wires S (to gate P) and T (to gate Q) are symmetric if (i) P and Q belong to the
same supergate rooted at gate R, and (ii) S (T) is not on the path from Q (P) to R. More spe-
cifically, S and T are non-equivalence symmetric if P and Q are assigned the same value;
otherwise, S and T are equivalence symmetric. To swap two wires (S and T) that are equivalence symmetric without changing the circuit behavior, two inverters fed by S and T are required.

Figure 6-3: A supergate (SG) and its most critical path segment (MCPS)

For example, in Figure 6-3, two wires f (to gate P) and a (to gate Q) are
non-equivalence symmetric because P and Q are both assigned “0” while being extracted.
Note that the "fanout-free" property must hold for symmetry identification; that is,
every gate in a supergate except the root can have only one fanout gate. Although this
algorithm can find only local symmetries, i.e., those covered by a supergate, it brings
several advantages, listed below. Moreover, our experimental results reveal that SG-based
symmetry identification is powerful enough to achieve significant NBTI mitigation.
1) Efficient identification of swappable wires: Due to the efficiency of supergate extraction
as described earlier, swappable wires can be obtained by traversing extracted
tree-structured supergates. Once a swappee (a wire to be swapped) is located, we perform a depth-first search from the swappee to find the best swapper (a wire to swap with) in the
same supergate. It is common to have a supergate with more than ten gates in a large
circuit. Hence, there exist many possible swappers for logic restructuring, leading to
great potential for NBTI-aware optimization.
2) Localized impact on power consumption: Given a supergate G, the changes of switching
activities resulting from the swap of any two symmetric wires in G are bounded within
the supergate. This can be intuitively explained by the fact that: (i) all gates in the su-
pergate except its root are fanout-free, and (ii) irrespective of the swap, the signal prob-
ability of the root is constant. The formal proof was presented in [60]. Unlike other
techniques which manipulate functional symmetries with a global view, the proposed
methodology bounds the scope where switching activities are affected and can thus lo-
calize the impact on power consumption. Furthermore, consider a simple analysis in which
the switching activity of a signal is computed as 2 × SP × (1 − SP), where SP is the signal
probability. As will be seen later, our methodology tries to push SP's toward 0 or 1.
Therefore, the switching activity, and in turn the power consumption, can even be reduced.
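Under this simple temporal-independence model, a minimal numeric check confirms that pushing SP away from 0.5 strictly lowers switching activity:

```python
def switching_activity(sp):
    # probability-based switching activity model: SA = 2 * SP * (1 - SP)
    return 2.0 * sp * (1.0 - sp)

# SA peaks at SP = 0.5 and falls off as SP is pushed toward 0 or 1
activities = [switching_activity(sp) for sp in (0.5, 0.75, 0.9, 0.99)]
```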
The time and space complexities of the supergate extraction algorithm are both linear in
the gate count. We can extract maximal supergates for efficient symmetry identification.
However, not all functional symmetries are effective against NBTI-induced performance
degradation. Subsequently, we develop a NBTI-aware optimization flow, guided by the first
key observation, to identify pairs of symmetric wires which have positive impact on NBTI.
Given a network as a gate-level netlist, the probability of each signal being "0" is calculated using logic simulation. Based on the signal probabilities, we derive the stress prob-
ability of each PMOS transistor and do a static timing analysis under NBTI over 10 years
where degraded propagation delays are predicted by Equation (7).
Definition 14 (NBTI-critical path): After timing analysis under NBTI, a path is called an
NBTI-critical path if and only if its delay is larger than the delay of the longest path without
consideration of NBTI effects.
Definition 15 (NBTI-critical node): After timing analysis under NBTI, a node is called an
NBTI-critical node if and only if it lies on an NBTI-critical path.
Theorem 1: A non-NBTI-critical node will not degrade the circuit performance even if the
node itself is degraded by NBTI.
Proof: Let D be the delay of the longest path in a circuit without consideration of NBTI
effects. According to Definition 14 and Definition 15, a non-NBTI-critical node lies on a path
whose delay under NBTI is smaller than or equal to D. In other words, even if this node is
degraded by NBTI, all paths passing through it still have delays smaller than or equal to D,
and thus will not dominate the circuit performance. Q.E.D.
For each extracted supergate rooted at gate R, we check whether gate R is an
NBTI-critical node. Only those supergates whose roots are NBTI-critical nodes need
to be considered for logic restructuring; the other supergates, whose roots are not NBTI-critical
nodes, cannot degrade the circuit performance and can be discarded. Deciding whether a
node is NBTI-critical is trivial as long as its slack time is stored during static timing analysis.
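The check can be sketched as a small static timing analysis pass; the data layout below (topologically ordered gate list, per-gate aged delays) is an illustrative assumption, not the dissertation's implementation:

```python
def nbti_critical_nodes(order, fanins, delay_aged, d_nominal):
    """order: gates in topological order; delay_aged: NBTI-degraded gate
    delays; d_nominal: fresh (pre-aging) longest-path delay D."""
    arrival = {}
    for g in order:
        arrival[g] = delay_aged[g] + max((arrival[f] for f in fanins[g]),
                                         default=0.0)
    # Required times are computed against the FRESH longest delay D
    # (Definition 14): negative slack means some aged path through the
    # node exceeds D, i.e., the node is NBTI-critical (Definition 15).
    required = {g: d_nominal for g in order}
    for g in reversed(order):
        for f in fanins[g]:
            required[f] = min(required[f], required[g] - delay_aged[g])
    return {g for g in order if required[g] - arrival[g] < 0}
```

On a toy chain a → b → c with aged delays 1, 2, 2 and a fresh longest delay D = 4, all three nodes come out NBTI-critical, while a side branch that stays under D does not.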
Definition 16 (NBTI-critical supergate): A maximal supergate is called an NBTI-critical
supergate if and only if it is rooted at an NBTI-critical node.
Up to this point, we have a list of NBTI-critical supergates, and the signal probability and
timing information (arrival, required, and slack times) of each gate/wire inside them can
be retrieved. For each NBTI-critical supergate, we trace the most critical path segment
(MCPS) backwards from its root according to slack times. Rather than the longest local path
in the supergate, the MCPS is the intersection of the supergate and the longest global path
passing through its root. Slack times allow the MCPS to be traced with constant work per step.
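The trace can be sketched as a walk that, at each gate, follows the minimum-slack fanin wire (constant work per step, since slacks are precomputed during static timing analysis); the data layout is hypothetical:

```python
def trace_mcps(root, fanins, slack, in_supergate):
    """Follow the smallest-slack fanin wire from the supergate root;
    stop when no fanin remains inside the supergate."""
    path, g = [root], root
    while True:
        cands = [f for f in fanins.get(g, []) if in_supergate(f)]
        if not cands:
            return path
        g = min(cands, key=lambda f: slack[f])
        path.append(g)
```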
Definition 17 (NBTI-aware swappee): Given an NBTI-critical supergate G, a wire S (to gate
P, belonging to G) is an NBTI-aware swappee if (i) S is a side input to the MCPS of G, or (ii)
P is in the fanin cone of a side input to the MCPS of G.
Definition 18 (NBTI-aware swapper): Given an NBTI-critical supergate G and an NBTI-aware
swappee S, a wire T (to gate Q, belonging to G) is an NBTI-aware swapper if (i) S and T are
functionally symmetric, (ii) the swap of S and T does not cause any timing violation, and (iii)
the swap of S and T is beneficial in terms of NBTI effects, as discussed in Observation 1.
We process the MCPS downstream to locate NBTI-aware swappees. For each
NBTI-aware swappee, a depth-first search is performed to find the best NBTI-aware swapper
in the current NBTI-critical supergate. The best swapper here is the wire that, once swapped,
can realize the most positive impact on NBTI. To minimize area overhead, we skip the
equivalence-symmetric (ES) case, in which two extra inverters are required for swapping. Finally, the
identified swappee and swapper are swapped to obtain an improvement in NBTI-induced
performance degradation. Every time a swap is done, we update the affected arrival times
and signal probabilities incrementally, also in constant time.
Consider the supergate in Figure 6-3 where the highlighted region is the MCPS. The
first NBTI-aware swappee is wire f and its best NBTI-aware swapper is wire a. The swap of
these two wires (see Figure 6-4(a)) makes SP(m) and SP(p) become smaller, which is benefi-
cial for NBTI mitigation, as illustrated in Figure 6-1(b). Moreover, SP(m) and SP(p) can
become even smaller by swapping wires o and k (see Figure 6-4(b)).
NBTI-aware swappees and swappers identified with respect to the original MCPS may
lead to a local optimum. In order to escape the local optimum and explore a larger solution
space for our NBTI-aware logic restructuring approach, we also redistribute paths in a supergate
based on functional symmetries. The path redistribution must not degrade the circuit timing.

Figure 6-4: An example of logic restructuring. (a) Swap f and a; (b) swap o and k; (c) swap h and g

For example, in Figure 6-3, we may exchange wire m with wire h to generate a new MCPS,
which in effect allows for more possibilities to reach a better solution.
6.1.2 Pin Reordering
The pin reordering approach is guided by the second key observation on the NBTI effect
versus transistor stacking. This observation indicates that, due to the transistor stacking effect,
the farther from the power supply a PMOS in a transistor stack is, the less NBTI impact this
PMOS suffers. For the smallest overall degradation, it is reasonable to assign inputs to the
series PMOS transistors of a NOR gate in increasing order of signal probabilities, from the
top to the bottom. However, our concern is the resulting circuit “timing” itself instead of the
timing “degradation.” To minimize the circuit delay under NBTI, not only signal probabili-
ties but also arrival times of input signals should be considered for pin reordering.
In our proposed framework, the NBTI-aware pin reordering approach is basically an
exhaustive search for the best input ordering. For each gate in a topological order, we enu-
merate all possible permutations of its input signals and find out the one resulting in the
smallest arrival time of its output signal, with NBTI effects taken into account. This strategy
is tractable because every gate type in our cell library has at most four input pins, i.e., at most 4! = 24 permutations per gate.
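The exhaustive search can be sketched with itertools.permutations. The delay function below, a position-dependent, NBTI-aware pin-to-output delay, is a hypothetical stand-in for the fitted model of Equation (7):

```python
from itertools import permutations

def best_pin_order(signals, arrival, delay):
    """delay(sig, pos): aged pin-to-output delay when `sig` drives pin
    position `pos` (position matters because of transistor stacking).
    Returns the input ordering minimizing the output arrival time."""
    best_order, best_at = None, float('inf')
    for perm in permutations(signals):          # at most 4! = 24 orderings
        at = max(arrival[s] + delay(s, i) for i, s in enumerate(perm))
        if at < best_at:
            best_order, best_at = perm, at
    return best_order, best_at
```

For instance, with a late-arriving signal it is better to place that signal on the faster pin position, which the search below discovers automatically.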
Note that pin reordering is always synergistic with logic restructuring. Pin reordering
changes the input order of a gate and thereby, may also change the most critical path segment
(MCPS) used for NBTI-aware swappee/swapper identification in logic restructuring. On the
other hand, logic restructuring exchanges wires between gates so that it may bring about a
better solution when doing the next run of pin reordering. For example, the next NBTI-aware
swappee-swapper pair in Figure 6-4(b) is comprised of wires h and g. By swapping these two
wires (see Figure 6-4(c)), SP(r) decreases from 1/4 to 1/8, which is beneficial for the subse-
quent pin reordering procedure since a signal with smaller SP can better protect the PMOS
(connected to wire p) on the MCPS. The synergistic influences indeed help reduce circuit
delay under NBTI. This is the main reason why logic restructuring and pin reordering work-
ing together can succeed in combating NBTI-induced performance degradation.
The proposed approaches involve only wire reconnection, without touching gates or
transistors, and thus introduce no gate area overhead at all. Our overall NBTI-aware optimi-
zation flow, which includes joint logic restructuring and pin reordering, and optional transis-
tor resizing, is given in Figure 6-5.
6.2 Interplay between NBTI and Hot Carrier Injection (HCI)
Hot carrier injection (HCI) is recognized as another key aging mechanism which in-
creases the threshold voltages of both PMOS and NMOS transistors with time, and in turn
causes performance degradation as NBTI does. Whereas the NBTI effect is a function of
stress probability, the impact of HCI depends more on switching activity (density).

Figure 6-5: The overall algorithm for NBTI mitigation (joint logic restructuring and pin reordering, followed by optional transistor resizing)

The proposed methodology focuses on manipulating stress probabilities such that NBTI-induced performance degradation can be mitigated. As shown in Figure 6-4, our methodology tries to
push signal probabilities toward 0 or 1. Under the simple model in which the switching
activity of a signal is 2 × SP × (1 − SP), where SP is the signal probability, pushing SP's
toward 0 or 1 decreases switching activity. Therefore, the overall HCI effect, and even power
consumption, both of which are highly correlated with switching activity, can potentially be
reduced by our NBTI mitigation framework.
6.3 Experimental Results
We have implemented the proposed framework for NBTI mitigation and conducted ex-
periments on a set of benchmarks from the ISCAS and MCNC suites. The technology used is
the 65nm Predictive Technology Model (PTM) [35]. The supply voltage is 1.2V and the operat-
ing temperature is assumed to be 300K. The standard cell library consists of inverter, NAND
and NOR gates with 2 to 4 inputs. Our framework aims at enhancing circuit temporal reli-
ability under 10-year NBTI, with marginal design penalty. For each benchmark, logic simu-
lation with 10,000 random patterns, assuming that the probabilities of all primary inputs are
0.5, is applied to calculate the probability of each signal. In the case of real applications with
various workloads, we can apply different sets of input probabilities and use average signal
probabilities instead. Given signal probability α of the input to a PMOS, the 10-year Vth
degradation of the PMOS can be predicted by Equation (6). For each gate type and input pin
(PMOS), HSPICE simulations with its nominal and degraded threshold voltages are per-
formed for a discrete set of signal probabilities from 0 to 1. We fit these HSPICE results to
obtain coefficients a’s for Equation (7). Therefore, the gate delay and circuit timing under
NBTI can be estimated.
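The fitting step can be sketched as a least-squares fit. The sample points below are made up, and the quadratic form is assumed purely for illustration, since the exact form of Equation (7) is not reproduced in this excerpt:

```python
import numpy as np

# Hypothetical HSPICE samples: NBTI-degraded gate delay (ps) versus the
# signal probability of the input driving the PMOS under test.
sp_samples = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
delay_ps   = np.array([12.0, 12.7125, 13.35, 13.9125, 14.4])

# Assume, for illustration only, a quadratic dependence on SP; the a's
# play the role of the fitted coefficients for Equation (7).
a = np.polyfit(sp_samples, delay_ps, deg=2)
aged_delay = np.poly1d(a)   # delay estimate at any signal probability
```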
Table 6-1 and Figure 6-6 report the experimental results of our NBTI-aware methodol-
ogy. All baseline circuits, listed in column one, are pre-optimized and mapped in terms of
delay, and their nominal delays (without consideration of NBTI effects) are shown in column
two. Columns three and four show the circuit delays under NBTI and percentages of degra-
dation (blue bars in Figure 6-6) compared to the nominal cases. Columns five and six demon-
strate the improved delays and corresponding percentages (purple bars in Figure 6-6) when
only pin reordering is used, while columns seven and eight demonstrate those (yellow bars in
Figure 6-6) when logic restructuring and pin reordering are applied jointly.
For example, the nominal delay of circuit alu2 is 1,128ps and the delay considering
NBTI effects is 1,237ps, which corresponds to 9.66% performance degradation. The pin reordering
approach can reduce the circuit delay to 1,212ps (7.45% degradation). If we apply joint logic
restructuring and pin reordering, the circuit delay becomes 1,171ps and the performance
degradation is recovered to 3.81%. On average across all listed benchmarks, 56% of
NBTI-induced performance degradation can be recovered by our methodology.
Table 6-1: Recovery of NBTI-induced performance degradation (delays in ps)
Figure 6-7 shows the number of critical transistors versus stress probability for a com-
binational circuit C5315 and a sequential circuit s9234. Each point in the plot denotes the
number of critical PMOS transistors (y-axis) whose stress probabilities are in the interval
(x-axis) with a step of 0.1. As can be seen, the number of critical PMOS transistors in the
optimized circuit is significantly reduced, by an average of 36%. If one considers utilizing
transistor resizing for further NBTI mitigation, the required area overhead will be smaller
than that incurred by applying the resizing technique alone.
Figure 6-6: Recovery of NBTI-induced performance degradation
6.4 Concluding Remarks
In this chapter, we present an NBTI mitigation framework using joint logic restructuring
and pin reordering. Two principal observations motivating the proposed methodology are
introduced. The logic restructuring approach relies on detecting functional symmetries which
can mitigate NBTI-induced performance degradation; the pin reordering approach depends
on finding the best input ordering so that critical PMOS transistors can be protected due to
stacking effects. Experiments reveal that our framework successfully recovers benchmark
circuits from performance degradation with minimum cost. In addition, the recovered circuits
have fewer critical transistors, leading to low overhead for post-processing transistor resizing.

Figure 6-7: Number of critical PMOS transistors vs. stress probability. (a) C5315 (combinational circuit); (b) s9234 (sequential circuit)
Chapter 7 NBTI Mitigation Considering Path Sensitization
When dealing with the problem of aging-induced performance degradation, it is impor-
tant to consider path sensitization because (i) only a small portion of long paths can determine the delay of a circuit, whether or not aging applies, and (ii) a path that is not critical/sensitizable before aging may become critical/sensitizable after aging and affect circuit
performance (or vice versa). A path is sensitizable if it can be activated by at least one com-
bination of primary input transitions.
In this chapter, by employing timed automatic test pattern generation (timed ATPG)
[61], we examine the impact of path sensitization on aging-aware timing analysis and also
explore the benefits of considering path sensitization for aging-aware timing optimization.
Timed ATPG, based on the satisfiability (SAT) problem, is used to generate input patterns
activating critical paths. In this way, we can efficiently trace the longest sensitizable path,
which determines the performance of a circuit, and identify those gates along the critical
sensitizable paths as critical gates. A subset of critical gates is finally selected as candidates
for aging-aware timing optimization, and more importantly, with path sensitization explicitly
addressed.
7.1 Impact of Path Sensitization on Aging-Aware Timing Analysis
7.1.1 Sensitizable Paths vs. False Paths
A path is defined as a sensitizable path if there is at least one primary input vector acti-
vating the path. From the timing perspective, a sensitizable path can propagate a transition
(rising or falling) to at least one primary output, which may determine the delay of a circuit.
Figure 7-1 shows two conditions of path sensitization for a 3-input AND gate. As indicated
by red dotted lines, a path to be sensitized must hold either the earliest controlling transition
(i.e., falling transition for an AND gate, see Figure 7-1(a)) or the latest non-controlling (rising)
transition if all input transitions are non-controlling (see Figure 7-1(b)). In contrast, a path
that is not sensitizable is called a false path; its delay cannot affect the circuit performance.
For example, in Figure 7-2, the highlighted gates depict the longest topological path (f –
i – j – k – l – m – n – o). Since there does not exist a combination of input transitions activat-
ing the path, the longest topological path is a false path and will not determine the delay of
the circuit. Note that, in this case, no other path except the highlighted false path passes
through the gates feeding wires i and j; in other words, they do not lie on any sensitizable
path. Therefore, any amount of aging-induced delay increase at these two gates will never
reflect performance degradation on the circuit. Speeding up these gates is of no benefit in
terms of circuit performance. To enable more accurate and efficient optimization, the basic
principle of our methodology is to extract and manipulate the sub-circuit covering only sensi-
tizable paths which are critical or near-critical. The effective circuit delay (i.e., the delay of
the longest sensitizable path) can be minimized by focusing on optimizing the sub-circuit and disregarding anything else beyond it.

Figure 7-1: Criteria of path sensitization. (a) Earliest controlling transition on the middle pin; (b) latest non-controlling transition on the middle pin, given that all transitions are non-controlling
As reported in [62], less than 10% of long (critical and near-critical) paths should be
selected for performance optimization if false paths are excluded. Shortening the small por-
tion of long paths, e.g., by speeding up some of the gates they cover, suffices to reduce the effective
circuit delay, and those long paths that are false can be left un-optimized without affecting
the overall circuit performance.
7.1.2 Aging-Aware Timing Analysis Considering Path Sensitization
We exploit the NBTI prediction model, as introduced in Chapter 2.2, on top of timed
ATPG to analyze the effective delay of a circuit while accounting for both aging awareness
and path sensitization. Timed ATPG itself was presented as a false-path-aware timing analyzer.

Figure 7-2: A longest topological path that is false (un-sensitizable)

Given a timing specification (Tspec) for a target circuit, the timed ATPG algorithm will
construct a corresponding timed characteristic function (TCF) in conjunctive normal form
(CNF). The TCF characterizes the timing behavior of the circuit as a Boolean equation and
its on-set specifies input vectors that, when applied, can propagate transitions stabilizing
later than or equal to Tspec at any of the outputs (i.e., with propagation delays greater than or
equal to Tspec). Because of the CNF (product-of-sums) representation, existing SAT solvers are
used to derive one set of input patterns if the TCF is satisfiable; otherwise solvers return
nothing, meaning that no such input vector exists to activate a path with delay greater than or
equal to Tspec. By actually applying the derived input vector to the circuit, the corresponding
sensitizable path(s) can be traced. Then, we can identify critical and near-critical sensitizable
paths if a timing specification smaller than (but close to) the delay of the longest topological
path is chosen.
One major concern for the unified treatment of aging awareness and path sensitization is
that, due to the asymmetric rate of aging, a path which is not critical/sensitizable at the be-
ginning of lifetime may become critical/sensitizable and affect circuit performance during the
lifetime span (or vice versa). Thanks to the support of timed ATPG, we simply need to plug in the
aging model such that timed ATPG can calculate the change in each pin-to-pin delay based
on manufacturing and operating parameters. To obtain the effective delay of a circuit, we use
the same stepping method as that in [61] which adjusts Tspec dynamically. The maximum Tspec
achieved for constructing a satisfiable TCF is the effective circuit delay. Table 7-1 demon-
strates the results of aging-aware timing analysis for standard benchmarks whose effective
delays are not determined by longest topological paths. We list in the table the values of fresh
circuit delay (at time 0) and aged delay under a generic stress condition of 10 years. For
circuit alu2, the difference in fresh delay between the longest topological path (column 2)
and the longest sensitizable path (column 5) is 36ps, while that in 10-year aged delay (col-
umns 3 and 6) is 53ps. As it can be seen, the difference increases (except circuit C7552) as a
result of aging. Moreover, the percentage of aging-induced performance degradation decreases if path sensitization is taken into account. For more accurate timing analysis, and to avoid underestimating circuit lifetime, it is necessary to consider path sensitization as aging effects become severe.

Table 7-1: Aging-aware timing analysis with and without path sensitization considered
7.2 Proposed Methodology for Aging-Aware Timing Optimization
The objective of our methodology is to minimize the circuit delay under 10-year NBTI
by incurring as little area overhead as possible, while taking into account and taking advan-
tage of the impact of path sensitization. The pre-processing task iteratively performs logic
restructuring and pin reordering [74], with minimum area penalty, until no more improve-
ment can be made. As the main procedure, transistor resizing is integrated with [74] for
further mitigation of NBTI-induced performance degradation, with low area overhead. From
the discussion in Chapter 7.1.1, it is evident that considering path sensitization can reduce the
overall design penalty for timing optimization. Efficient identification of candidates to be
manipulated (including gates, transistors, and wires) becomes a more challenging issue. In
the sequel, we present an efficient approach for identifying the critical sub-circuit, which
consists of potential candidates, for explicit consideration of path sensitization during aging-aware timing optimization.
7.2.1 Efficient Identification of Critical Sub-Circuits Considering Path Sensitization
We use benchmark circuit C17 (see Figure 7-3) from the ISCAS’85 suite to explain the
key idea of our proposed methodology based on timed ATPG. Note that the timed ATPG
algorithm presented in [61] adopts the floating-mode operation where a transition of a node is
defined as a switch of its state from an unknown value to a known value. Without loss of
generality, we assume that wires do not contribute to the circuit delay and the delay of each
node is its intrinsic delay plus the fanout delay (unit fanout delay model). The intrinsic delay
of an internal gate is 1, while that of a primary input is 0. The fanout delay is calculated as
0.2 × the number of fanout gates. These assumptions can be relaxed for a generic delay
model, with consideration of wire loads.
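Under these assumptions, arrival times can be computed by a simple topological pass. The C17 connectivity below is my reconstruction of Figure 7-3 (standard C17 with inputs a–e and NAND gates G1–G6), so the exact wire labeling is an assumption:

```python
def arrival_times(order, fanins, fanouts, intrinsic):
    """Unit fanout delay model: node delay = intrinsic + 0.2 * #fanouts."""
    at = {}
    for n in order:                        # topological order
        d = intrinsic[n] + 0.2 * fanouts[n]
        at[n] = d + max((at[f] for f in fanins[n]), default=0.0)
    return at

inputs = ('a', 'b', 'c', 'd', 'e')
fanins = {'a': [], 'b': [], 'c': [], 'd': [], 'e': [],
          'G1': ['a', 'c'], 'G2': ['c', 'd'], 'G3': ['b', 'G2'],
          'G4': ['G2', 'e'], 'G5': ['G1', 'G3'], 'G6': ['G3', 'G4']}
fanouts = {'a': 1, 'b': 1, 'c': 2, 'd': 1, 'e': 1,
           'G1': 1, 'G2': 2, 'G3': 2, 'G4': 1, 'G5': 0, 'G6': 0}
intrinsic = {n: (0.0 if n in inputs else 1.0) for n in fanins}
order = ['a', 'b', 'c', 'd', 'e', 'G1', 'G2', 'G3', 'G4', 'G5', 'G6']
at = arrival_times(order, fanins, fanouts, intrinsic)
# both outputs (j at G5, k at G6) arrive at 0.4 + 1.4 + 1.4 + 1.0 = 4.2
```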
Under the unit fanout delay model, there are two longest topological paths in C17 (i.e., c –
G2 – G3 – G5 – j and c – G2 – G3 – G6 – k, as highlighted) with delays of 4.2 (=
0.4+1.4+1.4+1.0). By choosing Tspec = 4.2, the on-set of the corresponding TCF specifies
input vectors activating these two paths since they are both sensitizable. However, a typical
SAT solver derives “one” input vector satisfying the TCF at a time and may not enumerate all
satisfying vectors to activate all possible sensitizable paths. To extract the sub-circuit cover-
ing all sensitizable paths with delays greater than or equal to Tspec, we modify the TCF by
adding new clauses into its CNF such that a SAT solver, if used repeatedly, can generate
different input vectors and identify possible sensitizable paths in an efficient manner.
Let F be the TCF for C17 given Tspec = 4.2 and CNF(F) be the CNF representation of F.
Clearly, CNF(F) is satisfiable due to the existence of two sensitizable paths whose delays are
4.2. By running a SAT solver on CNF(F), we obtain a set of satisfying input patterns {a, b, c,
d, e} = {0, 1, 0, 1, 1}, which evaluates F to a “1”. The set of input patterns, when actually
applied to C17, can activate the critical path along c – G2 – G3 – G5 – j. Without modifying
CNF(F) or the implementation of the SAT solver, it is not possible to obtain a different set of
input patterns activating the other critical path along c – G2 – G3 – G6 – k. As a naïve solution,
we can append a clause (a + ¬b + c + ¬d + ¬e) to CNF(F) so the new CNF, denoted by
CNF'(F), is CNF(F) × (a + ¬b + c + ¬d + ¬e). Intuitively, the same vector {a, b, c, d, e} = {0, 1, 0, 1, 1} evaluates the new clause to "0", making CNF'(F) unsatisfiable. Therefore, the SAT solver will find a different input vector, which may or may not activate the other critical path.

Figure 7-3: An example circuit (C17) for illustrating our methodology
One may note that the complexity of this naïve strategy grows exponentially with the
number of primary inputs. The exponential complexity implies clause explosion of the CNF
and an intractable approach with prohibitive SAT-solving runtime. To reduce the com-
plexity to a feasible extent, we introduce the following theorem to modify CNF(F). The goal
is to find a minimum set of new clauses that, when added one by one, will make CNF(F)
un-satisfiable, which means that we can gradually identify critical and near-critical sensitiz-
able paths given a Tspec and eventually extract the critical sub-circuit.
Definition 19 (side input): For each gate on an activated path, a side input is an input pin of
the gate through which the activated path does not pass.
Definition 20 (side-input assignment): For each side input, its value assignment, called
side-input assignment, is the value evaluated by propagating a particular input vector.
Theorem 2: For each activated path with side-input assignments {xp, …, xq, …, xr} = {vp, …,
vq, …, vr}, a new clause

Σ_{i ∈ {p, …, q, …, r}} (x_i ⊕ v_i) = (x_p ⊕ v_p) + … + (x_q ⊕ v_q) + … + (x_r ⊕ v_r)    (43)
can be added into CNF(F) such that different input vectors will be derived for activating
critical or near-critical paths which have not been identified yet.
Proof: Every sensitizable path can be activated only if its corresponding requirement of
side-input assignments is satisfied. If the requirement for a sensitizable path, which has been
identified, is no longer satisfiable with the current CNF(F), the path will certainly not be
activated again by any other satisfying input vector. Hence, by adding new clauses based on
Equation (43), which in effect dissatisfy the requirements of side-input assignments for al-
ready-identified paths, SAT solvers will generate different satisfying vectors (if any exist) to
activate not-yet-identified sensitizable paths. Q.E.D.
Consider path c – G2 – G3 – G5 – j activated by input vector {a, b, c, d, e} = {0, 1, 0, 1,
1}. By propagating the input vector, the side-input assignments for this activated path are {b,
d, f} = {1, 1, 1}. According to the theorem, the new clause to be added is ((b ⊕ 1) + (d ⊕ 1) +
(f ⊕ 1)) = (¬b + ¬d + ¬f). After adding (¬b + ¬d + ¬f) into CNF(F) (CNF’(F) = CNF(F) ×
(¬b + ¬d + ¬f)), input vectors which evaluate b to a “1”, d to a “1”, and f to a “1” cannot
satisfy CNF’(F) and thus will not be generated. The next set of input patterns derived by the
solver will be {a, b, c, d, e} = {1, 1, 1, 1, 0}, which activates the other critical path along c –
G2 – G3 – G6 – k whose side-input assignments are {b, d, i} = {1, 1, 1}. Finally, by adding
the corresponding clause, (¬b + ¬d + ¬i), the resulting CNF of F becomes un-satisfiable,
meaning that all critical sensitizable paths have been identified. Note that a single input
vector may activate several paths and for each activated path, a new clause should be added.
For example, input vector {a, b, c, d, e} = {0, 1, 0, 1, 0} can activate the two critical sensi-
tizable paths in C17 simultaneously.
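The clause-blocking loop of Theorem 2 can be illustrated with a brute-force search over a toy CNF. The variables and clauses below are a hypothetical stand-in for the real TCF, and a practical implementation would call an off-the-shelf SAT solver instead of enumerating assignments:

```python
from itertools import product

def satisfied(clause, assign):
    # literal 'x' is true when assign['x'] is True; '~x' when it is False
    return any(assign[lit.lstrip('~')] != lit.startswith('~')
               for lit in clause)

def solve(cnf, variables):
    # brute-force stand-in for a SAT solver (fine for a few variables)
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(satisfied(c, assign) for c in cnf):
            return assign
    return None

variables = ['b', 'd', 'f', 'i']
cnf = [['b'], ['d'], ['f', 'i']]     # toy stand-in for a satisfiable TCF
m1 = solve(cnf, variables)           # first satisfying vector
cnf.append(['~b', '~d', '~i'])       # block side inputs {b, d, i} = {1, 1, 1}
m2 = solve(cnf, variables)           # a different satisfying vector
cnf.append(['~b', '~d', '~f'])       # block side inputs {b, d, f} = {1, 1, 1}
m3 = solve(cnf, variables)           # None: no further path to identify
```

Each appended clause follows Equation (43): it rules out every input vector that reproduces an already-identified path's side-input assignment, so successive solver calls either reveal a new sensitizable path or prove that none remain.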
Compared to the naïve approach of exponential complexity, the proposed methodology
significantly decreases the number of added clauses and the number of SAT runs. In the case
of C17 under the unit fanout delay model, only two additional clauses are added and three runs
of the SAT solver are needed. Hence, we can efficiently extract the sub-circuit consisting of
critical and near-critical sensitizable paths. The extracted sub-circuit, called critical
sub-circuit, is the main focus of our integrated framework using logic restructuring, pin
reordering, and transistor resizing for aging-aware timing optimization. Anything beyond the
critical sub-circuit is either non-critical or un-sensitizable. On these non-critical/un-sensitizable portions, timing optimization may not be effective; consequently, they can be excluded to lower the design penalty.
Let us use the circuit in Figure 7-2 to summarize our methodology for aging-aware timing optimization. Assuming the unit fanout delay model, the delay of the longest topological path
(f – i – j – k – l – m – n – o) in the circuit is 8.4 (= 0.2+1.2*6+1.0). As mentioned, it is a false
path and will not be identified as part of the critical sub-circuit given Tspec = 8.4. The delay of
the circuit is determined by two longest sensitizable paths from d and e, via k – l – m – n, to o
with delays of 7.4 (= 0.2+1.4+1.2*4+1.0). By choosing Tspec = 7.4, the critical sub-circuit
consisting of these two paths can be extracted to be manipulated by logic restructuring, pin
reordering, and transistor resizing. For logic restructuring [74], we will swap c and p, instead
of c and j, which would be chosen if path sensitization were not considered. Here, wires c, j, and p are functionally
symmetric and any two of them can be swapped with each other while maintaining the circuit
functionality. For pin reordering, we may change the input order of gate G (wires h, m, and s)
to minimize the circuit delay under aging. For transistor resizing, we apply a similar algo-
rithm to that in [63] on the critical sub-circuit and will not touch the transistors connected to
wires f and i that are on the longest topological (but false) path.
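The two delay figures quoted in the example can be checked by simple arithmetic; the stage delays (0.2, 1.2, 1.4, 1.0) are taken from the sums given in the text for the assumed unit fanout delay model.

```python
# Delays for the Figure 7-2 example under the unit fanout delay model.
# Longest topological (but false) path f - i - j - k - l - m - n - o:
longest_topological = 0.2 + 1.2 * 6 + 1.0

# Longest sensitizable paths from d and e, via k - l - m - n, to o:
longest_sensitizable = 0.2 + 1.4 + 1.2 * 4 + 1.0

print(round(longest_topological, 1))   # 8.4
print(round(longest_sensitizable, 1))  # 7.4
```

Choosing Tspec = 7.4 therefore captures exactly the sensitizable paths while excluding the false path of delay 8.4.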
7.2.2 Achieving Full Coverage of Critical Sensitizable Paths
Up to this point, the efficient methodology for critical sub-circuit extraction does not
guarantee identifying all critical and near-critical sensitizable paths. In fact, identifying all
sensitizable paths given a Tspec is not necessary for our purpose of extracting the critical
sub-circuit, as long as the extracted sub-circuit already covers all of them. This is usually
the case because a large fraction of those paths overlap and share many segments. In a few
cases, missing sensitizable paths may lead to incomplete extraction of critical sub-circuits.
Figure 7-4 shows a case where a sensitizable path may be missed. In this example, input
vector V1 activates paths P1 and P2 while V2 activates P2 and P3. Note that P2 can be activated
by both V1 and V2. Suppose V1 is generated first by the SAT solver based on timed ATPG; then P1 and
P2 will be activated and their corresponding clauses C1 and C2 will be added into the CNF.
However, after C2 has been added, V2 will no longer satisfy the new CNF and thus, P3 will
not be identified – a miss.
To deal with this issue, we apply the same Tspec in timed ATPG repeatedly until we extract
all possible critical sub-circuits and optimize them. That is to say, if there are indeed
some missed sensitizable paths for a given Tspec, we use timed ATPG with the same Tspec for
another run of critical sub-circuit extraction. Because the number of unidentified
sensitizable paths decreases drastically after each run of extraction and optimization, this
strategy for achieving full coverage of sensitizable paths works well and does not impose
significant runtime overhead.

Figure 7-4: A case of missing sensitizable paths (input vectors V1 and V2 activate sensitizable paths P1/P2 and P2/P3, whose corresponding clauses to be added are C1, C2, and C3)
7.2.3 Proposed Algorithm Description
Our overall algorithm for aging-aware timing optimization, including all ideas presented
in Chapter 7.2.1 and Chapter 7.2.2, is given in Figure 7-5. As a pre-processing procedure,
joint logic restructuring (LR) and pin reordering (PR), which introduce no gate area overhead,
are performed to shorten the circuit delay under aging considering only topology information
but no path sensitization, for reduced computational complexity. Then, we iterate the pro-
posed methodology based on timed ATPG with decreasing Tspec until a specified performance
target is met or no further improvement can be made. In each iteration, transistor resizing, as
well as joint LR and PR, are applied on the extracted critical sub-circuit to optimize the
effective circuit delay, while explicitly considering path sensitization. Lines 16-17 are used
for guaranteeing full coverage of sensitizable paths by not decreasing Tspec if there are still
sensitizable paths identified during the current run of timed ATPG. The complexity of our
algorithm is bounded by that of satisfiability-based ATPG, which is a known NP-complete
problem but can be addressed efficiently by existing solvers using a wide combination of
techniques. In the worst case, the algorithm is of exponential complexity. In practice, however, it is
far more scalable than approaches based on path enumeration, whose average-case complexity
is exponential.
Figure 7-5: The overall algorithm for aging-aware timing optimization
Input: circuit netlist, delay model, and performance target
Output: optimized circuit netlist
Algorithm: aging-aware timing optimization
01  Apply joint LR and PR without considering path sensitization
02  D ← delay of the longest topological path, without aging applied
03  D' ← delay of the longest topological path, with aging applied
04  Δ ← (D' − D) / n        // n: number of iterations, usually specified to be 10
05  Tspec ← D' − Δ
06  DO {
07      C ← ∅               // critical sub-circuit, a set of "gates" instead of "paths"
08      F ← construct TCF given Tspec
09      WHILE (CNF(F) is satisfiable) {
10          V ← derive a satisfying input vector
11          P ← trace sensitizable path(s) by propagating V
12          C ← C ∪ (gates along P)        // not on a path-wise basis
13          Add corresponding clause(s) into CNF(F)
14      }
15      Apply transistor resizing and LR/PR on C only
16      IF (no clause is added)            // for guaranteeing full coverage
17          Tspec ← Tspec − Δ
18  } WHILE (performance target is not met)
// Also terminates if there is no improvement for two consecutive iterations.
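The control flow of Figure 7-5 can be sketched as below. Timed ATPG and the optimization moves are replaced here by deterministic toy stand-ins (a table of sensitizable-path delays, and a fixed 5% delay reduction per optimization pass), so only the Tspec schedule and the full-coverage rule of lines 16-17 are modeled; it is not the dissertation's actual engine.

```python
def aging_aware_optimization(paths, d_fresh, target, n_iters=10, gain=0.05):
    """Toy rendering of the Figure 7-5 loop.

    paths:   dict mapping a sensitizable path name to its aged delay
             (stand-in for timed ATPG over the real netlist)
    d_fresh: longest topological delay without aging
    target:  performance target on the effective circuit delay
    """
    paths = dict(paths)
    d_aged = max(paths.values())
    step = (d_aged - d_fresh) / n_iters        # line 04: Tspec decrement
    t_spec = d_aged - step                     # line 05
    while max(paths.values()) > target and t_spec > 0:
        # Stand-in for timed ATPG: find paths sensitizable at delay >= Tspec.
        critical = [p for p, d in paths.items() if d >= t_spec]
        for p in critical:
            paths[p] *= 1 - gain               # "resizing + LR/PR" on sub-circuit
        if not critical:                       # lines 16-17: no clause added,
            t_spec -= step                     # so it is safe to lower Tspec
    return max(paths.values())

final = aging_aware_optimization({"P1": 7.4, "P2": 7.4, "P3": 6.8},
                                 d_fresh=6.0, target=7.0)
```

Note that Tspec is lowered only in runs where no new path is identified, matching the full-coverage guarantee of Chapter 7.2.2.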
7.2.4 Impact of Process Variability
By extracting the corresponding critical sub-circuit before performing each run of opti-
mization, the proposed algorithm can exclude gates that do not need to be manipulated for
lower optimization effort and design penalty. A gate must be excluded from the critical
sub-circuit if it is on un-sensitizable or non-critical paths only, where “non-critical” paths are
in contrast to “critical” and “near-critical” paths. In the presence of process variations, the
fresh threshold voltage of each transistor (before aging, i.e., at time 0) is no longer a fixed
value but a random variable, which makes the problem of aging-aware timing optimization
non-deterministic across silicon instances of a design. More precisely, the circuit delay may
be different from one silicon instance to another because of different fresh threshold voltages,
different behaviors of transistor aging, and different patterns of path sensitization. However,
as indicated in [13][64], the impact of process variability can be compensated by the NBTI
effect. Due to the compensation effect of device aging on process variability, a non-critical
path will hardly dominate the circuit delay in the long term unless process variations incur a
significant delay increase on the path. This is particularly uncommon when the focus is, as
proposed, on minimizing the long-term (10-year) circuit delay. In addition, since
our algorithm involves an iterative process of exploiting timed ATPG with decreasing Tspec to
gradually reduce the effective circuit delay, a gate which is not covered by the critical
sub-circuit in the previous run will be covered in the current run if it is now on a critical
sensitizable path (but not previously). Hence, all potential candidate gates are guaranteed to
be identified, sooner or later, for the purpose of aging-aware timing optimization considering
path sensitization. It is also possible that a path is sensitizable in a silicon instance of a design,
but not sensitizable in another instance as a result of process variability. Similarly, every
potentially-sensitizable path, if long enough, will be part of a critical sub-circuit in the itera-
tive process.
7.3 Experimental Results
The experimental settings for aging-aware timing optimization considering path sensiti-
zation are the same as those in Chapter 6.3, except that transistor resizing is integrated with
logic restructuring and pin reordering to combat performance degradation.
Table 7-2: Aging-aware timing optimization with path sensitization considered

Table 7-2 and Figure 7-6 report the experimental results of our proposed methodology
for aging-aware timing optimization. All baseline circuits, listed in column one, are
pre-optimized and mapped for delay, and their nominal delays (without consideration
of aging effects) are shown in column two. Columns three and four show the circuit delays
under aging and the percentages of degradation (blue bars in Figure 7-6) relative to the nominal
cases. Columns five and six show the improved delays and corresponding percentages
(purple bars in Figure 7-6) after optimization by the pre-processing procedure using joint
logic restructuring (LR) and pin reordering (PR). Columns seven and eight show those
(yellow bars in Figure 7-6) after optimization by the integrated framework using transistor
resizing as well as joint LR and PR. Columns nine and ten show the area overheads (green
bars in Figure 7-6) and runtimes (numbers below the names of benchmarks). The runtimes, in-
cluding the time spent on logic simulation and the whole algorithm in Figure 7-5, are meas-
ured on a 3GHz Pentium 4 workstation running Linux. Every delay number in Table 7-2 is
found with path sensitization considered, i.e., it is the delay of the longest sensitizable path in a
circuit (denoted by D), which is the maximum Tspec achieving a satisfiable TCF in timed
ATPG. Any Tspec greater than D fails to derive a satisfiable TCF after our algorithm
finishes, meaning that no path with delay greater than D can be sensitized and D therefore
determines the circuit performance.
Figure 7-6: Aging-aware timing optimization with path sensitization considered. Benchmarks (runtimes in parentheses): alu2 (19s), alu4 (28s), C3540 (2m5s), C5315 (2m39s), C7552 (15m3s), s1196 (40s), s1238 (37s), s9234 (58s), and the average (AVG.); bars: 10-year aging, LR+PR, TR & LR+PR, and area overhead.

For example, the nominal delay of circuit alu2 is 1,092ps and the delay considering
10-year NBTI effects is 1,184ps, an 8.42% performance degradation. The
pre-processing LR and PR reduce the circuit delay to 1,127ps (3.20% degradation). After
optimization by the proposed methodology in Figure 7-5, the circuit delay
becomes 1,086ps; we even achieve a performance improvement of 0.52% while
incurring 2.35% area overhead. On average across all listed benchmarks, aging-induced
performance degradation is reduced to 1.21%, only about one-seventh of the
un-optimized case, with less than 2% area overhead. Compared to existing aging-aware sizing
techniques, our methodology is not only more cost-efficient than
[36][37][38], which do not address path sensitization, but also more runtime-efficient than
[39], which addresses path sensitization on a path-wise basis. The runtimes of the proposed
framework range from under 10 seconds to 15 minutes, whereas the largest ISCAS
benchmark [39] can handle is C880.
Figure 7-7 depicts the incremental recovery of aging-induced performance degradation
by our iterative optimization algorithm. For circuit C1908 (C5315), it takes six (seven) itera-
tions of joint LR and PR to reduce performance degradation to 5.11% (3.18%) and takes
another five (four) iterations (Lines 6-18 in Figure 7-5) to reach 1.41% (0.17%). We employ
the same perturbation techniques as those in [63][74] to prevent the algorithm from being
trapped in local optima. The effect of perturbation is included in the results; Figure 7-7
nevertheless exhibits monotonic decreases in the overall degradation because we keep track
of only the best solution in each iteration.
7.4 Concluding Remarks
In this chapter, we present an efficient methodology for aging-induced timing analysis
and optimization considering path sensitization. The analysis results reveal the importance
and benefit of considering path sensitization for aging-aware timing optimization.

Figure 7-7: Incremental recovery of aging-induced performance degradation

Based on timed ATPG, we can identify the critical sub-circuit of a target circuit, which truly
needs to be manipulated, and then apply transistor resizing as well as joint LR and PR to
mitigate aging-induced performance degradation. Experiments demonstrate that our framework
successfully recovers benchmark circuits from performance degradation with marginal cost.
Lastly, the proposed methodology is scalable to large designs due to its runtime efficiency.
Chapter 8 NBTI Mitigation for Power-Gated Circuits
In order to minimize static power dissipation, which accounts for a large portion of total
power consumption at the 90nm technology node and below, high-Vth sleep transistors [65] are
employed as switches to disconnect a circuit from VDD (see Figure 8-1(a)) or GND when
the circuit is inactive, i.e., in standby mode (sleep = “1”). A PMOS/NMOS sleep transistor is
referred to as a header/footer inserted between VDD/GND and the circuit. Despite smaller
size required for the same driving strength, a footer has to be placed in an isolated p-well,
which involves a twin-well manufacturing process and, for cell-based design, re-modeling
the cell library. Generally, the header-based style of using PMOS is fairly popular due to its
ease of manufacturing and library design. This technique, called power gating (PG), is a
coarse-grained application of multi-threshold CMOS (MTCMOS) and widely used for re-
ducing sub-threshold leakage current [66], so that static power can be minimized. However,
in a header-based PG design, the PMOS sleep transistors suffer continuous NBTI stress
during active mode (sleep = “0”) and age very rapidly. The relentless aging impact on the
headers will aggravate the performance degradation of the logic circuit in a PG structure. As
a result, not only the NBTI effects on logic networks but also those on sleep transistors need
to be addressed when header-based PG is exploited. In this chapter, for power-gated circuits,
we present an integrated NBTI degradation model for accurate analysis of the long-term
performance behavior. Afterwards, an optimization methodology is proposed to mitigate the
overall performance degradation for a longer period of reliable operation.
The first work addressing the aging of sleep transistors was outlined in [67]. The authors
proposed to realize NBTI-aware power gating through (i) sleep transistor over-sizing, (ii)
forward body-biasing, and (iii) stress time reduction. As opposed to [42][43][44], the aging
of logic networks is not considered in [67]. In the sequel, we will show the interdependence
between the degradation effects on logic networks and sleep transistors. We have also
experimentally verified that, without joint modeling of these interdependent effects, the
overall performance degradation of power-gated circuits cannot be precisely estimated.

Figure 8-1: A header-based power gating structure. (a) Power gating using PMOS sleep transistors; (b) equivalent RC model
Based on the characterization of NBTI effects on both logic networks (LNs) and sleep
transistors (STs), we present an analysis and optimization methodology for header-based
power-gated circuits in terms of performance-centric lifetime reliability. The contributions
and advantages of this work are threefold:
Joint modeling of interdependent degradation effects on logic networks and sleep
transistors: Due to the increasing Vth of sleep transistors during active mode, the voltage
level (denoted by VVdd as depicted in Figure 8-1) at which the logic network operates
gradually decreases, therefore imposing additive performance loss. On the other hand,
the decrease in VVdd can be offset, to a certain extent, by the smaller current required for
normal operation of the logic network due to its own degradation. These two effects are
interdependent and should not be treated separately. In this chapter, for the first time, a
joint model considering the interdependency is developed for accurate analysis of aging
behavior for power-gated circuits.
Exploration of ST redundancy and NBTI recovery: We introduce redundant STs and
implement a scheduling architecture such that the original STs can be shut off periodi-
cally during active mode. The proposed methodology explores the recovery mechanism
by taking STs’ turns recovering from NBTI. Hence, the VVdd decrease is slowed down,
which mitigates the long-term performance degradation and extends the circuit lifetime.
Significant lifetime extension while retaining the purpose of power gating – leakage
saving: To minimize the additional leakage current flowing through those redundant STs,
reverse body bias is applied to increase their fresh Vth values (at time 0). Based on the
observation in [13] that a high-Vth transistor ages slower than a low-Vth transistor, the
use of redundant STs with reverse body bias can achieve significant lifetime extension
for power-gated circuits without incurring too much overhead in leakage power. This is
in contrast to using forward body bias (as in [67]) which can increase leakage power by
197%.
8.1 Aging Analysis for Power-Gated Circuits
8.1.1 NBTI Degradation Model for Logic Networks
The same model introduced in Chapter 2.2 is used to predict NBTI effects in terms of
performance degradation for logic networks. The predictive model is not repeated here.
Please refer to Chapter 2.2 for more details.
8.1.2 NBTI Degradation Model for Sleep Transistors
To analyze the performance degradation of power-gated circuits due to the NBTI impact
on sleep transistors, the voltage level of virtual VDD should be the main focus. Virtual VDD,
which supplies the logic circuit in a PG structure with required operating voltage, is a virtual
bus connecting the drain terminals of all sleep transistors [65]. Because of the resistance
between VDD and virtual VDD when sleep transistors operate in the linear region during
active mode, a voltage drop at virtual VDD can be observed. Typically, sleep transistors are
sized such that a tradeoff among voltage drop, leakage saving, and area overhead is obtained
[66].
In the presence of NBTI, the effective resistance between VDD and virtual VDD in-
creases due to the increasing Vth of sleep transistors; thus, the voltage drop grows with
NBTI stress, imposing additive performance loss beyond that of the logic itself. The model
for performance degradation resulting from the increasing voltage drop
is described as follows.
Consider the example of header-based power gating in Figure 8-1(a). An equivalent RC
model is shown in Figure 8-1(b) where the resistor characterizes the network of sleep tran-
sistors (between VDD and virtual VDD) and the current source characterizes the logic net-
work (between virtual VDD and GND). Note that a finer-grained RC model with various
resistors and current sources can be employed for more realistic analysis if the detailed in-
formation about physical implementation is available.
The increase in Vth of sleep transistors can be determined by Equation (5). Given the
degraded threshold voltage (Vth’ = Vth + ΔVth), we update the current flowing through a sleep
transistor STi using the MOSFET current equation:
I′STi ≈ μp Cox (WSTi / L) (Vgs − V′th) VST,   as VST is small    (44)
where μp is the hole mobility, Cox is the oxide capacitance, WSTi is the width of sleep
transistor STi, and VST is its drain-to-source voltage, i.e., the voltage drop at virtual VDD,
which is supposed to be small (e.g., 5% of Vdd).
Next, the effective resistance of the network of sleep transistors under aging can be de-
rived as:
R′STs = (Vdd − VVdd) / Σi I′STi = VST / Σi I′STi ≈ L / [μp Cox (Vgs − V′th) Σi WSTi]    (45)
where VVdd is the voltage level of virtual VDD.
We can then calculate the new (lower) VVdd:
V′Vdd = Vdd − ION · R′STs    (46)
where ION is the active (turned-on) current drained by the logic network, which is the maxi-
mum cumulative switching current of a set of gates that switch simultaneously.
Finally, the propagation delay of each gate in the power-gated circuit can be estimated
based on the alpha-power law:
τp ∝ VVdd / (VVdd − Vth)^αf    (47)
where αf is the technology-dependent velocity saturation factor.
In prior art, only the NBTI degradation effect of VVdd on the logic network has been
examined, and the aging of the logic itself, which leads to a decreasing ION, is not included. It
is evident from Equation (46) that the performance degradation of power-gated circuits will
be overestimated without taking the ION decrease into account. The dependence of ION on the
degradation of logic networks is based on the charge-current formula:
I = dQ/dt = C (dV/dt)  ⇒  IGate(Vdd) ∝ ΔV / τp    (48)
According to Equation (48), we can trace the change in the current drained by a gate and
further derive the degraded ION by summing up the current of those gates that switch simul-
taneously. In terms of the VVdd degradation (see Equation (46)), the decrease in ION is actually
beneficial since it partially (but not fully) offsets the increase in RSTs.
Here, we summarize the interdependence between the degradation effects on sleep tran-
sistors (STs) and logic networks (LNs):
(i) The effect of ST aging (i.e., decreasing VVdd) aggravates the performance degradation of LNs.
(ii) The effect of LN aging (i.e., decreasing ION) alleviates the effect of ST aging.
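The coupling of Equations (45)-(48) can be sketched as a fixed-point iteration: ST aging raises R′STs, the lower VVdd slows the logic, the slower logic drains less ION, and the smaller ION partially restores VVdd. All parameter values below (normalized currents and resistances, hypothetical ΔVth shifts, αf) are illustrative stand-ins, not the dissertation's characterized models.

```python
def analyze_power_gated(vdd=1.0, vth_st=0.30, dvth_st=0.06,
                        vth_ln=0.18, dvth_ln=0.03,
                        ion0=1.0, r0=0.05, alpha_f=1.3, iters=50):
    """Fixed-point sketch of the interdependent ST/LN aging model.

    - ST aging raises V'th of the headers, scaling up R'STs (Eq. 45);
    - the aged logic runs slower (Eq. 47, alpha-power law), so it drains
      a smaller I_ON (Eq. 48);
    - V'Vdd = Vdd - I_ON * R'STs (Eq. 46) couples the two effects.
    Returns the converged (V'Vdd, I_ON), both normalized."""
    # Aged effective resistance of the sleep-transistor network (Eq. 45):
    r_aged = r0 * (vdd - vth_st) / (vdd - vth_st - dvth_st)
    vvdd = vdd - ion0 * r_aged            # initial guess ignoring LN aging
    for _ in range(iters):
        # Gate delay at the current virtual-VDD level (Eq. 47):
        tau = vvdd / (vvdd - (vth_ln + dvth_ln)) ** alpha_f
        tau0 = vdd / (vdd - vth_ln) ** alpha_f
        ion = ion0 * tau0 / tau           # slower logic drains less current (Eq. 48)
        vvdd = vdd - ion * r_aged         # Eq. (46)
    return vvdd, ion

vvdd, ion = analyze_power_gated()
```

In this toy setting the converged VVdd sits between the pessimistic estimate (full ION through the aged headers) and the fresh Vdd, mirroring the "in-between" trend of the joint model in Figure 8-2.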
The interdependent effects are particularly important for accurate analysis of
NBTI-induced performance degradation for power-gated circuits and have been incorporated
in our analysis framework. Figure 8-2 shows the analysis results of the proposed framework
in 65nm PTM for an industrial benchmark AES assuming that it is in active mode 60% of the
time. As it can be seen, considering LN aging only (“LN only”) or considering ST aging only
(“ST only”) underestimates the performance degradation, while ignoring the interdependency
(“LN and ST independently”) overestimates the performance degradation.
Note that power gating can remove the stress condition for logic devices by pulling
down VVdd toward 0V several clock cycles after the circuit goes standby [44]. Therefore, the
10-year performance loss of “LN only” is smaller than that reported in the literature, where
circuits are not power-gated. In the case of joint modeling of interdependent LN and ST
aging (“interdep. LN and ST jointly”), the results exhibit an in-between degradation trend. It
is worth mentioning that, because sleep transistors are always on during active mode and
suffer more severe NBTI than logic devices, the VVdd degradation will never be stopped by
the decrease in ION.

Figure 8-2: Analysis results of the proposed model for power-gated circuits

A chain of inverters simulated in HSPICE (see Figure 8-3) indicates that
the normalized error of the proposed NBTI degradation model is always within 1.5% over
the 10-year performance prediction. Note that the error increases from the 2nd to the 7th
year but tends to saturate after 7 years.
8.2 Lifetime Extension for Power-Gated Circuits
It has been demonstrated in Chapter 8.1.2 that the overall performance degradation of a
power-gated circuit can reach significant levels (>12%). If the timing margin of a design is
10%, the design under power gating will likely wear out within two years, as shown in Figure
8-2. This is definitely unacceptable for most state-of-the-art applications of power gating.

Figure 8-3: HSPICE validation with a chain of inverters

In
this subchapter, we propose to introduce redundant STs and develop a scheduling framework
such that the original STs can be shut off periodically during active mode. The ultimate goal
of our methodology is to maximize the lifetime of power-gated circuits while retaining the
purpose of power gating, i.e., leakage saving.
8.2.1 Problem Formulation
The proposed methodology is formulated as an area-constrained optimization problem
for concurrent lifetime extension and leakage saving. Given an allowable percentage p% on
the total width of redundant STs, the objective is to determine an optimal value of reverse
body bias such that, when applied on the redundant STs, the lifetime of a power-gated circuit
can be significantly extended with minimal leakage overhead. The lifetime is measured as the
duration of time during which the circuit can operate with its performance loss not exceeding
10% (wear-out if exceeding 10%). The problem formulation is given as:
Maximize
    w · [Lifetime(Cr%, Vb, d) − Lifetime(C0%, Vb, d)] / Lifetime(C0%, Vb, d)
    − (1 − w) · [Leakage(Cr%, Vb, d) − Leakage(C0%, Vb, d)] / Leakage(C0%, Vb, d)    (49)

Subject to    r ≤ p    and    Vdd < Vb ≤ Vmax
where w (0 < w < 1) is the weight for lifetime extension, Cr% is the circuit with r% ST re-
dundancy introduced (thus C0% is the original power-gated circuit), Vb is the bulk voltage
assigned to redundant STs (for reverse body-biasing), and d is the duty cycle of the circuit
(defined as the ratio of active time to total time).
8.2.2 Exploring NBTI Recovery via ST Redundancy
Given a power-gated circuit with the number and total width of STs optimally deter-
mined, a certain number of STs are introduced as redundant STs to combat NBTI-induced
performance degradation. Since the current circuit has more STs than necessary, not all of
them need to be turned on for normal operation during active mode, especially before the STs
experience significant aging. With the existence of ST redundancy, we can explore the re-
covery mechanism by shutting off STs by turns during active mode, giving them extra time to
recover from NBTI. Hence, the VVdd decrease due to ST aging is slowed down, which miti-
gates the long-term performance degradation and extends the circuit lifetime.
Figure 8-4 shows the hardware architecture of our NBTI-aware power gating design [68]
(the width of each ST is specified below it), where ST1-ST4 and ST6-ST9 are the original STs, and
ST5 and ST10 (highlighted) are the redundant STs, i.e., 25% ST redundancy in terms of total
ST width (4W/16W). Shift registers (SRs) are deployed to drive groups of STs such that,
during active mode, one or more of the ST groups can be shut off by intermittent “sleep”
signals (logic “1”) sent from the power management unit (PMU). ST grouping is
pre-determined based on the wakeup scheduling [69] and the redundant STs are evenly dis-
tributed to the groups in which the subtotal widths of STs are smaller. By doing so, every
group has more balanced subtotal ST width and subsequently, we will have more flexibility
in exploring NBTI recovery by switching STs on and off. After introducing redundant STs,
the wakeup scheduling can be further refined for better behavior during power mode transi-
tion. The refinement of wakeup scheduling is beyond the scope of this work and not particu-
larly addressed here. The voltage sensor (VS) compares VVdd with a reference value and
outputs a signal on which the PMU decides whether to adjust the “sleep” patterns.
Figure 8-4: NBTI-aware power gating design (SR: shift register; PMU: power management unit; VS: voltage sensor; ST widths S1-S10: 3W, 3W, 1W, 1W, 2W, 3W, 3W, 1W, 1W, 2W)

In this example, where 25% redundant STs have been placed, the logic circuit can operate
properly under the 10% performance bound with part of the original and redundant STs
turned on, as long as the total width of turned-on STs is sufficient. To this end, round-robin
scheduling is adopted in the PMU to assert a “sleep” signal (logic “1”) every five cycles
during active mode, thus rendering a duty cycle of 80% for each ST while satisfying the
requirement on the total width of turned-on STs. Once the VS detects a significant voltage
drop, the PMU will assert “sleep” signals less frequently to realize a higher duty cycle and on
average, more STs can be turned on for guaranteeing reliable operation. In this hardware
configuration with the support of SRs, PMU, and VS, the stress probability of each ST is as
low as 80%, meaning that the STs no longer suffer continuous NBTI stress during active
mode and can recover from NBTI within the 20% time intervals.
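The round-robin schedule above can be sketched numerically: with one of five ST groups asleep in each scheduling slot, every group ends up on for 80% of active time. The group count, period, and cycle count below are the example's numbers, and the simulator is an illustrative sketch, not the PMU's actual logic.

```python
def round_robin_stress(n_groups=5, period=5, cycles=1000):
    """Simulate round-robin sleep scheduling: every cycle the PMU asserts
    "sleep" to one ST group in turn, so each group is off 1/period of the
    active time (duty cycle 1 - 1/period = 80% for period = 5).
    Returns the per-group stress probability (fraction of cycles on)."""
    on_cycles = [0] * n_groups
    for cycle in range(cycles):
        sleeping = cycle % period          # the group shut off this cycle
        for g in range(n_groups):
            if g != sleeping:
                on_cycles[g] += 1
    return [on / cycles for on in on_cycles]

probs = round_robin_stress()
# Each ST group is stressed 80% of active time, leaving 20% for NBTI recovery.
```

If the voltage sensor flags a large VVdd drop, the same loop would simply be re-run with a longer period (fewer sleep assertions), raising the duty cycle.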
Not shown in Figure 8-4, to avoid excessive glitches on the virtual VDD resulting from
the switching of STs when logic devices are draining current, the clock signals to SRs are
delayed and/or frequency-divided, so that STs will not be triggered at the same time as logic
devices. Without affecting the circuit behavior (timing and functionality) and cumulative
amount of time for NBTI recovery, this strategy effectively diminishes the likelihood of
excessive glitches occurring on the virtual VDD.
8.2.3 Applying Reverse Body Bias
The major drawback of introducing ST redundancy is the additional leakage current
flowing through those redundant STs, which is proportional to the total width of redundant
STs in a power-gated circuit. In order to not incur too much leakage overhead, reverse body
bias (RBB) is applied on the redundant STs to increase their fresh Vth values. It is
well-known that sub-threshold leakage current decreases exponentially with higher Vth:
Isub ≈ I0 · e^(q(Vgs − Vth) / (nkT))    (50)
where I0 is the current at Vgs = Vth, n is the sub-threshold slope factor, k is the Boltzmann
constant, T is the absolute temperature, and q is the electron charge.
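Equation (50) makes the payoff of RBB concrete: a modest increase in the fresh Vth of the redundant STs cuts their standby leakage exponentially. The Vth values, I0, and the slope factor n below are illustrative assumptions, not characterized device data.

```python
import math

def subthreshold_leakage(vgs, vth, i0=1.0, n=1.5, temp=300.0):
    """Sub-threshold current per Eq. (50): I_sub ≈ I0 * exp(q(Vgs - Vth)/(n k T)).
    i0 and n are illustrative; q/k ≈ 11604.5 K/V from physical constants."""
    q_over_k = 1.602176634e-19 / 1.380649e-23   # q/k in kelvin per volt
    return i0 * math.exp(q_over_k * (vgs - vth) / (n * temp))

# Raising the fresh Vth of an off redundant ST (Vgs = 0) by 100 mV via RBB
# reduces its leakage by roughly e^(q * 0.1 / (n k T)) ≈ 13x at 300 K, n = 1.5:
ratio = subthreshold_leakage(0.0, 0.30) / subthreshold_leakage(0.0, 0.40)
```

This exponential sensitivity is why even a small reverse body bias keeps the leakage overhead of ST redundancy far below the 25% that the added width alone would suggest.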
Therefore, the use of reverse body-biasing greatly reduces the overhead in leakage
power. Meanwhile, the benefit of ST redundancy along with the proposed scheduling scheme
is only marginally affected, as discussed in the following.
As previously indicated [13][64], the variation in Vth can be compensated by the NBTI
effect. Figure 8-5 shows the NBTI-induced aging behaviors of three PMOS transistors with
diverse fresh Vth values. As it can be seen, the transistor with a higher (lower) fresh Vth ages
at a lower (higher) rate and thus, the high Vth (blue solid line) and the low Vth (red dashed line)
tend to converge toward the nominal case (black dotted line) as the stress of NBTI continues.
As shown in the figure, the high, nominal, and low Vth values at time 0 are 210mV, 180mV,
and 150mV, respectively. The difference between the high Vth and the low Vth remarkably
shrinks from 60mV at time 0, to 11.9mV at 10 years. For an 11-stage ring oscillator in 65nm
PTM, the 60mV Vth difference at time 0 leads to a performance (or frequency) variation of
6.2%, which is reduced to 1.7% after 10 years of operation.
As a result, we can minimize the leakage overhead due to the introduction of ST redun-
dancy, by assigning an optimal value of bulk voltage (Vb) greater than Vdd to the redundant
STs. On the other hand, based on the aforementioned fact that a high-Vth transistor ages
slower than a low-Vth transistor, the mitigation of “long-term” performance degradation and
the extension of circuit lifetime are still comparable to the case where RBB is not applied.
The comparison will be demonstrated later in Chapter 8.3.
Figure 8-5: Aging behaviors of PMOS transistors with different Vth values
By determining the optimal Vb value which maximizes the cost as a joint function of
Lifetime and Leakage (see Equation (49)), we can achieve significant lifetime extension for a
power-gated circuit without incurring too much leakage overhead. Typically, w is chosen to
be smaller than 0.5 because the lifetime extension is always larger than the leakage overhead.
Due to the efficiency of our analysis framework, we can afford to exhaustively search for the
optimal Vb with a discrete step of 50mV from Vdd to Vmax.
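The exhaustive search over Vb can be sketched as below. The Lifetime and Leakage functions here are toy stand-ins (a linear lifetime penalty and an exponential leakage decay versus Vb), as are all numeric defaults; only the 50 mV grid and the weighted Eq. (49)-style objective reflect the text.

```python
def optimize_vb(vdd=1.0, vmax=1.5, step=0.05, w=0.3,
                lifetime=lambda vb: 4.45 - 1.5 * (vb - 1.0),
                leakage=lambda vb: 0.25 * 2.0 ** (-(vb - 1.0) / 0.1),
                base_lifetime=1.47, base_leakage=1.0):
    """Exhaustive 50 mV grid search for the bulk voltage maximizing a weighted
    lifetime-vs-leakage objective in the spirit of Eq. (49). The lifetime and
    leakage models and every constant are hypothetical, for illustration only."""
    best_vb, best_cost = None, float("-inf")
    vb = vdd + step                        # Vdd < Vb <= Vmax
    while vb <= vmax + 1e-9:
        gain = (lifetime(vb) - base_lifetime) / base_lifetime   # relative lifetime gain
        overhead = leakage(vb) / base_leakage                   # redundant-ST leakage
        cost = w * gain - (1 - w) * overhead
        if cost > best_cost:
            best_vb, best_cost = vb, cost
        vb += step
    return best_vb, best_cost

best_vb, best_cost = optimize_vb()
```

With these toy models the optimum lands at an intermediate Vb: a stronger reverse bias keeps squeezing leakage but eventually costs more lifetime than it saves, which is the trade-off the exhaustive search resolves.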
8.3 Experimental Results
We have implemented the proposed methodology of mitigating NBTI-induced per-
formance degradation for power-gated circuits. Experiments are conducted on an industrial
circuit (AES) and a set of benchmarks from the ISCAS and MCNC suites. The technology
used is 65nm, Predictive Technology Model (PTM) [35]. The supply voltage is 1.0V and the
operating temperature is assumed to be 300K.
Figure 8-6 depicts the normalized aging behaviors of circuit AES with various settings:
(i) the nominal PG design of AES, (ii) applying RBB on all STs in the nominal design (no ST
redundancy), (iii) introducing 25% ST redundancy (RBB applied) with no ST scheduling
implemented, i.e., all STs (original and redundant) on during active mode, and (iv) 25% ST
redundancy (RBB applied) with round-robin scheduling implemented. As it can be seen, the
nominal design (dotted line) has a performance degradation of more than 10% after 1.47
years (lifetime = 1.47 yrs). If 25% body-biased redundant STs are introduced (blue line), the
lifetime becomes 3.33 yrs. If the proposed ST scheduling architecture is incorporated (red
line), the lifetime is further extended to 4.45 yrs. The upward bounce of red line around year
2 happens because we discard the scheduling scheme when it first reaches the margin of 10%
performance loss. By doing so, all original and redundant STs are constantly (rather than
periodically) on during active mode for redeeming the PG design from wear-out failure. The
Figure 8-6: Comparison of aging behaviors with various settings
181
case of applying RBB on all STs in the nominal design (black line) is considered to demon-
strate that, even though the aging “rate” of STs with RBB is slower, the overall performance
degradation is still larger than the other cases due to its lower VVdd at time 0. Accordingly, it
does not make much sense to use RBB alone.
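The round-robin scheme of setting (iv) can be illustrated with a short sketch. The function below is a hypothetical illustration, not the PMU implementation: in each active period only a rotating subset of the STs is turned on, so every ST periodically spends time off and benefits from NBTI recovery.

```python
from collections import deque

def round_robin_schedule(num_sts, on_fraction, periods):
    """Hypothetical sketch of round-robin sleep-transistor (ST)
    scheduling: each active period turns on a rotating subset of STs,
    so every ST regularly rests in the off state, during which NBTI
    stress partially recovers."""
    sts = deque(range(num_sts))
    n_on = max(1, int(num_sts * on_fraction))
    schedule = []
    for _ in range(periods):
        # The first n_on STs in the rotation are on this period.
        schedule.append(sorted(sts[i] for i in range(n_on)))
        sts.rotate(-n_on)  # advance the rotation for the next period
    return schedule

# With 5 STs and 80% on at a time, each ST is off once every 5 periods.
sched = round_robin_schedule(num_sts=5, on_fraction=0.8, periods=5)
```

The design point matches the chapter's tradeoff: keeping most STs on preserves the virtual supply (VVdd) during active mode, while the rotation guarantees every ST an equal share of recovery time.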
Figure 8-7 shows the relationship between the circuit lifetime and the amount of bulk voltage (Vb) applied. By assigning a Vb greater than 1.0V (RBB) to the redundant STs, the aging curve shifts slightly toward the left, implying a shorter lifetime. However, the difference is not significant, which provides an excellent opportunity to trade a small amount of lifetime extension for leakage savings, since leakage is far more sensitive to Vb via an exponential dependency. This is the key motivation for our methodology, which exploits RBB to reduce the leakage current flowing through the redundant STs.

Figure 8-7: Lifetime vs. Vb (bulk voltage), showing no significant difference among the curves
Table 8-1 tabulates the experimental results of our NBTI-aware power gating methodology, where columns 2-4 correspond to the aforementioned 1st case (dotted line), 3rd case (blue line), and 4th case (red line) in Figure 8-6, respectively. Column 5 shows the leakage overhead incurred by the redundant STs. Note that, if RBB is not applied, the leakage overhead is approximately equal to the percentage of ST redundancy (25% in this experiment). To realize RBB, the bulk voltage (Vb) shown in the last column is assigned to all redundant STs.

Table 8-1: Optimization results of lifetime and leakage

For example, as also depicted in Figure 8-6, the nominal PG design of circuit AES has a lifetime of 1.47 yrs. It is extended to 3.33 yrs if 25% ST redundancy is introduced but no scheduling is used. By employing the proposed round-robin scheduling framework, the lifetime of the power-gated AES can be further extended to 4.45 yrs, with a leakage overhead of 5.32% when Vb = 1.5V is assigned. On average across all benchmarks considered, we achieve a 3.04X lifetime extension with only 5.95% leakage overhead. In contrast to existing work [67], where 20-200% overhead in leakage power is incurred for a 1.85X lifetime extension, our methodology reveals superior benefits by jointly (i) exploring NBTI recovery via ST redundancy (Chapter 8.2.2) and (ii) applying reverse body bias (Chapter 8.2.3). The area overhead, which comes from the redundant STs, SRs, and VS (see Figure 8-4), is also small compared to the whole circuit. Considering AES again, about 5% area overhead is needed for the lifetime extension from 1.47 yrs to 4.45 yrs.
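The notion of lifetime used throughout this comparison, i.e., the first time the aging curve crosses the 10% performance-loss margin, can be sketched in a few lines. The power-law aging model below is a hypothetical stand-in, calibrated only to reproduce the nominal 1.47-year figure; it is not the framework's actual NBTI model.

```python
def lifetime_years(degradation, margin=0.10, step_years=0.01):
    """Return the first time (in years) at which an aging curve
    crosses the allowed performance-degradation margin (10% here).
    `degradation` is any monotone model: time -> fractional slowdown."""
    t = 0.0
    while degradation(t) < margin:
        t += step_years
    return t

# Hypothetical power-law NBTI aging model, calibrated so the nominal
# design reaches 10% degradation after about 1.47 years (Table 8-1).
nominal = lambda t: 0.10 * (t / 1.47) ** 0.16

t_fail = lifetime_years(nominal)  # first crossing of the 10% margin
```

Slowing the aging (e.g., via ST redundancy and scheduling) stretches the curve along the time axis, so the same margin is crossed later, which is exactly the lifetime extension reported in Table 8-1.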
Finally, the impact of increasing ST redundancy on lifetime extension is demonstrated in Figure 8-8. Despite notable improvements (right shifts) in the circuit lifetime, the overhead in area and power becomes a major concern when a higher degree of ST redundancy is deployed. The use of 25% ST redundancy is a good tradeoff since it brings sufficient lifetime extension while keeping the leakage overhead at an acceptable level.

Figure 8-8: Lifetime vs. ST redundancy
8.4 Concluding Remarks
In this chapter, we present an analysis and mitigation methodology for NBTI-induced performance degradation in power-gated circuits. For the first time, joint modeling of the interdependent degradation effects on logic networks and sleep transistors is included. Based on exploring NBTI recovery, redundant STs are introduced to mitigate the aging of STs and thus extend the lifetime of power-gated circuits. Furthermore, we formulate an optimization problem for concurrent lifetime extension and leakage saving by applying RBB on the redundant STs. Experiments demonstrate that the proposed methodology can accurately analyze the performance degradation and effectively extend the circuit lifetime.
As a future direction, we plan to investigate the effectiveness of adaptive scheduling and body biasing that can be dynamically adjusted according to system profiles. In such a scenario, multiple upward bounces will occur on the aging curves, as shown in Figure 8-6 (the red line), and hence we expect to obtain additional lifetime improvements. This, however, involves a more complex PMU design, in either hardware or software implementation.
Chapter 9 Summary
This dissertation research addresses an important issue arising from continuous scaling trends: reliability. Two problems in reliability-aware circuit optimization, SER reduction and NBTI mitigation, are explored and formulated. For SER reduction, we present three approaches based on redundancy addition and removal (RAR), selective voltage scaling (SVS), and clock skew scheduling (CSS). All of them rely on the symbolic SER analyzer, which provides a unified treatment of the three masking mechanisms, while each of them targets a different part of logic circuits, leading to orthogonal relationships and compounding results. Various experiments on a set of standard benchmarks reveal the effectiveness of our framework and demonstrate that the normalized joint cost per unit of SER reduction is relatively low when compared to other state-of-the-art techniques. For NBTI mitigation, we first develop a methodology using logic restructuring and pin reordering to combat NBTI-induced performance degradation with marginal design penalty. The impact of path sensitization on aging-aware timing analysis and optimization is then investigated. Finally, we move our focus to lifetime extension for power-gated designs. By introducing redundant sleep transistors with reverse body bias, not only can the lifetime of a power-gated circuit be significantly extended, but the leakage overhead can also be minimized.
Bibliography
[1] International Technology Roadmap for Semiconductors, 2009.
[2] R. Baumann, “Soft errors in advanced computer systems,” IEEE Design and Test of Computers, vol. 22, no. 3, pp. 258-266, May 2005.
[3] J. W. McPherson, “Reliability challenges for 45nm and beyond,” in Proc. of Design Automation Conf. (DAC), pp. 176-181, July 2006.
[4] S. Mitra et al., “Robust system design with built-in soft-error resilience,” IEEE Computer Magazine, vol. 38, no. 2, pp. 43-52, Feb. 2005.
[5] P. Shivakumar et al., “Modeling the effect of technology trends on the soft error rate of combinational logic,” in Proc. of Int’l Conf. on Dependable Systems and Networks, pp. 389-399, June 2002.
[6] D. K. Schroder and J. A. Babcock, “Negative bias temperature instability: road to cross in deep submicron silicon semiconductor manufacturing,” Journal of Applied Physics, vol. 94, no. 1, Jul. 2003.
[7] J. H. Stathis and S. Zafar, “The negative bias temperature instability in MOS devices: a review,” Microelectronics Reliability, vol. 46, no. 2-4, Feb.-April 2006.
[8] S. Chakravarthi et al., “A comprehensive framework for predictive modeling of negative bias temperature instability,” in Proc. of Int’l Reliability Physics Symp. (IRPS), pp. 273-282, April 2004.
[9] N. Kimizuka et al., “The impact of bias temperature instability for direct-tunneling ultra-thin gate oxide on MOSFET scaling,” in Proc. of Symp. on VLSI Technology, pp. 73-74, June 1999.
[10] V. Reddy et al., “Impact of negative bias temperature instability on product parametric drift,” in Proc. of Int’l Test Conf. (ITC), pp. 148-155, Oct. 2004.
[11] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, “An analytical model for negative bias temperature instability,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 493-496, Nov. 2006.
[12] W. Wang et al., “The impact of NBTI on the performance of combinational and sequential circuits,” in Proc. of Design Automation Conf. (DAC), pp. 364-369, June 2007.
[13] W. Wang et al., “The impact of NBTI effect on combinational circuit: modeling, simulation, and analysis,” IEEE Trans. on Very Large Scale Integration Systems (TVLSI), vol. 18, no. 2, pp. 173-183, Feb. 2010.
[14] N. Miskov-Zivanov and D. Marculescu, “MARS-C: modeling and reduction of soft errors in combinational circuits,” in Proc. of Design Automation Conf. (DAC), pp. 767-772, July 2006.
[15] N. Miskov-Zivanov and D. Marculescu, “Circuit reliability analysis using symbolic techniques,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 25, no. 12, pp. 2638-2649, Dec. 2006.
[16] N. Miskov-Zivanov and D. Marculescu, “Soft error rate analysis for sequential circuits,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 1436-1441, April 2007.
[17] N. Miskov-Zivanov and D. Marculescu, “Modeling and optimization for soft-error reliability of sequential circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 27, no. 5, pp. 803-816, May 2008.
[18] N. Miskov-Zivanov and D. Marculescu, “A systematic approach to modeling and analysis of transient faults in logic circuits,” in Proc. of Int’l Symp. on Quality Electronic Design (ISQED), pp. 408-413, March 2009.
[19] N. Miskov-Zivanov and D. Marculescu, “Multiple transient faults in combinational and sequential circuits: a systematic approach,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 29, no. 10, pp. 1614-1627, Oct. 2010.
[20] M. Omana et al., “A model for transient fault propagation in combinational logic,” in Proc. of Int’l On-Line Testing Symp. (IOLTS), pp. 111-115, July 2003.
[21] K. Mohanram and N. A. Touba, “Cost-effective approach for reducing soft error failure rate in logic circuits,” in Proc. of Int’l Test Conf. (ITC), pp. 893-901, Sep. 2003.
[22] Q. Zhou and K. Mohanram, “Gate sizing to radiation harden combinational logic,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 25, no. 1, pp. 155-166, Jan. 2006.
[23] M. R. Choudhury, Q. Zhou, and K. Mohanram, “Design optimization for single-event upset robustness using simultaneous dual-VDD and sizing technique,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 204-209, Nov. 2006.
[24] Y. S. Dhillon et al., “Analysis and optimization of nanometer CMOS circuits for soft-error tolerance,” IEEE Trans. on Very Large Scale Integration Systems (TVLSI), vol. 14, no. 5, pp. 514-524, May 2006.
[25] S. Almukhaizim et al., “Seamless integration of SER in rewiring-based design space exploration,” in Proc. of Int’l Test Conf. (ITC), pp. 1-9, Oct. 2007.
[26] S. Krishnaswamy et al., “Enhancing design robustness with reliability-aware resynthesis and logic simulation,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 149-154, Nov. 2007.
[27] M. Zhang et al., “Sequential element design with built-in soft error resilience,” IEEE Trans. on Very Large Scale Integration Systems (TVLSI), vol. 14, no. 12, pp. 1368-1378, Dec. 2006.
[28] V. Joshi et al., “Logic SER reduction through flipflop redesign,” in Proc. of Int’l Symp. on Quality Electronic Design (ISQED), pp. 611-616, March 2006.
[29] R. R. Rao, D. Blaauw, and D. Sylvester, “Soft error reduction in combinational logic using gate resizing and flipflop selection,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 502-509, Nov. 2006.
[30] S. Krishnaswamy, I. L. Markov, and J. P. Hayes, “On the role of timing masking in reliable logic circuit design,” in Proc. of Design Automation Conf. (DAC), pp. 924-929, June 2008.
[31] M. Nicolaidis, “Time redundancy based soft-error tolerance to rescue nanometer technologies,” in Proc. of VLSI Test Symp. (VTS), pp. 86-94, April 1999.
[32] S. Krishnamohan and N. R. Mahapatra, “A highly-efficient technique for reducing soft errors in static CMOS circuits,” in Proc. of Int’l Conf. on Computer Design (ICCD), pp. 126-131, Oct. 2004.
[33] S. Bhardwaj et al., “Predictive modeling of the NBTI effect for reliable design,” in Proc. of Custom Integrated Circuits Conference (CICC), pp. 189-192, Sep. 2006.
[34] W. Wang et al., “An efficient method to identify critical gates under circuit aging,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 735-740, Nov. 2007.
[35] Predictive Technology Model (PTM), 2007. [Online]. Available: ptm.asu.edu
[36] B. C. Paul et al., “Temporal performance degradation under NBTI: estimation and design for improved reliability of nanoscale circuits,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 780-785, March 2006.
[37] K. Kang et al., “Efficient transistor-level sizing technique under temporal performance degradation due to NBTI,” in Proc. of Int’l Conf. on Computer Design (ICCD), pp. 216-221, Oct. 2006.
[38] R. Vattikonda, W. Wang, and Y. Cao, “Modeling and minimization of PMOS NBTI effect for robust nanometer design,” in Proc. of Design Automation Conf. (DAC), pp. 1047-1052, July 2006.
[39] X. Yang and K. Saluja, “Combating NBTI degradation via gate sizing,” in Proc. of Int’l Symp. on Quality Electronic Design (ISQED), pp. 47-52, March 2007.
[40] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, “NBTI-aware synthesis of digital circuits,” in Proc. of Design Automation Conf. (DAC), pp. 370-375, June 2007.
[41] H. Dadgour and K. Banerjee, “Aging-resilient design of pipelined architectures using novel detection and correction circuits,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 244-249, March 2010.
[42] Y. Wang et al., “Temperature-aware NBTI modeling and the impact of input vector control on performance degradation,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 546-551, April 2007.
[43] D. R. Bild, G. E. Bok, and R. P. Dick, “Minimization of NBTI performance degradation using internal node control,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 148-153, April 2009.
[44] A. Calimera, E. Macii, and M. Poncino, “NBTI-aware power gating for concurrent leakage and aging optimization,” in Proc. of Int’l Symp. on Low Power Electronics and Design (ISLPED), pp. 127-132, Aug. 2009.
[45] S.-C. Chang, M. Marek-Sadowska, and K.-T. Cheng, “Perturb and simplify: multilevel Boolean network optimizer,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 15, no. 12, pp. 1494-1504, Dec. 1996.
[46] L. A. Entrena and K.-T. Cheng, “Combinational and sequential logic optimization by redundancy addition and removal,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 14, no. 7, pp. 909-916, July 1995.
[47] Q. Ding, Y. Wang, H. Wang, R. Luo, and H. Yang, “Output remapping technique for soft-error rate reduction in critical paths,” in Proc. of Int’l Symp. on Quality Electronic Design (ISQED), pp. 74-77, March 2008.
[48] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1999.
[49] S. H. Kulkarni and D. Sylvester, “High performance level conversion for dual VDD design,” IEEE Trans. on Very Large Scale Integration Systems (TVLSI), vol. 12, no. 9, pp. 926-936, Sep. 2004.
[50] R. Puri et al., “Pushing ASIC performance in a power envelope,” in Proc. of Design Automation Conf. (DAC), pp. 788-793, June 2003.
[51] C. Chen, A. Srivastava, and M. Sarrafzadeh, “On gate level power optimization using dual-supply voltages,” IEEE Trans. on Very Large Scale Integration Systems (TVLSI), vol. 9, no. 5, pp. 616-629, Oct. 2001.
[52] C. M. Fiduccia and R. M. Mattheyses, “A linear-time heuristic for improving network partitions,” in Proc. of Design Automation Conf. (DAC), pp. 175-181, June 1982.
[53] W. Chuang, S. S. Sapatnekar, and I. N. Hajj, “Timing and area optimization for standard-cell VLSI circuit design,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 14, no. 3, March 1995.
[54] S.-H. Huang and Y.-T. Nieh, “Synthesis of nonzero clock skew circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 25, no. 6, June 2006.
[55] J. P. Fishburn, “Clock skew optimization,” IEEE Trans. on Computers, vol. 39, no. 7, July 1990.
[56] J. L. Neves and E. G. Friedman, “Design methodology for synthesizing clock distribution networks exploiting nonzero localized clock skew,” IEEE Trans. on Very Large Scale Integration Systems (TVLSI), June 1996.
[57] K. S. Chung and C. L. Liu, “Local transformation techniques for multi-level logic circuits utilizing circuit symmetries for power reduction,” in Proc. of Int’l Symp. on Low Power Electronics and Design (ISLPED), pp. 215-220, Aug. 1998.
[58] K.-H. Chang, I. L. Markov, and V. Bertacco, “Post-placement rewiring and rebuffering by exhaustive search for functional symmetries,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 56-63, Nov. 2005.
[59] C.-W. Chang et al., “Fast post-placement rewiring using easily detectable functional symmetries,” in Proc. of Design Automation Conf. (DAC), pp. 286-289, Jun. 2000.
[60] C.-W. Chang et al., “Fast post-placement optimization using functional symmetries,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 23, no. 1, Jan. 2004.
[61] Y.-M. Kuo, Y.-L. Chang, and S.-C. Chang, “Efficient Boolean characteristic function for timed automatic test pattern generation,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 28, no. 3, pp. 417-425, March 2009.
[62] H.-C. Chen, D. H.-C. Du, and L.-R. Liu, “Critical path selection for performance optimization,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 12, no. 2, pp. 185-195, Feb. 1993.
[63] O. Coudert, “Gate sizing for constrained delay/power/area optimization,” IEEE Trans. on Very Large Scale Integration Systems (TVLSI), vol. 5, no. 4, pp. 465-472, Dec. 1997.
[64] W. Wang et al., “Statistical prediction of circuit aging under process variations,” in Proc. of Custom Integrated Circuits Conference (CICC), pp. 13-16, Sep. 2008.
[65] C. Long and L. He, “Distributed sleep transistor network for power reduction,” in Proc. of Design Automation Conf. (DAC), pp. 181-186, June 2003.
[66] D.-S. Chiou, S.-H. Chen, and S.-C. Chang, “Timing driven power gating,” in Proc. of Design Automation Conf. (DAC), pp. 121-124, July 2006.
[67] A. Calimera, E. Macii, and M. Poncino, “NBTI-aware sleep transistor design for reliable power-gating,” in Proc. of Great Lakes Symp. on VLSI (GLSVLSI), pp. 333-338, May 2009.
[68] M.-C. Lee et al., “NBTI-aware power gating design,” in Proc. of Asia and South Pacific Design Automation Conf. (ASP-DAC), pp. 609-614, Jan. 2011.
[69] M.-C. Lee et al., “An efficient wakeup scheduling considering resource constraint for sensor-based power gating designs,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 457-460, Nov. 2009.
[70] K.-C. Wu and D. Marculescu, “Soft error rate reduction using redundancy addition and removal,” in Proc. of Asia and South Pacific Design Automation Conf. (ASP-DAC), pp. 559-564, Jan. 2008.
[71] K.-C. Wu and D. Marculescu, “Power-aware soft error hardening via selective voltage scaling,” in Proc. of Int’l Conf. on Computer Design (ICCD), pp. 301-306, Oct. 2008.
[72] K.-C. Wu and D. Marculescu, “Clock skew scheduling for soft-error-tolerant sequential circuits,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 717-722, March 2010.
[73] N. Miskov-Zivanov, K.-C. Wu, and D. Marculescu, “Process variability-aware transient fault modeling and analysis,” in Proc. of Int’l Conf. on Computer-Aided Design (ICCAD), pp. 685-690, Nov. 2008.
[74] K.-C. Wu and D. Marculescu, “Joint logic restructuring and pin reordering against NBTI-induced performance degradation,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 75-80, April 2009.
[75] K.-C. Wu and D. Marculescu, “Aging-aware timing analysis and optimization considering path sensitization,” in Proc. of Design, Automation, and Test in Europe (DATE), pp. 1572-1577, March 2011.
[76] K.-C. Wu, D. Marculescu, M.-C. Lee, and S.-C. Chang, “Analysis and mitigation of NBTI-induced performance degradation for power-gated circuits,” to appear in Proc. of Int’l Symp. on Low Power Electronics and Design (ISLPED), pp. xxx-yyy, Aug. 2011.
[77] K.-C. Wu and D. Marculescu, “A low-cost, systematic methodology for soft error ro-bustness of logic circuits,” submitted to IEEE Trans. on Very Large Scale Integration Systems (TVLSI), 2011.
Glossary (Index of Terms)
Mean error susceptibility (MES) 16 For each primary output Fj, initial duration d and initial amplitude a, MES(Fj) is the probability of output Fj failing due to errors at internal gates. More formally, MES(Fj) is defined in Equation (2).
Mean error impact (MEI) 26 Mean error impact (MEI), as defined in Equation (8), characterizes each gate in terms of its contribution to the overall SER. The MEI value of a gate quantifies the probability that at least one primary output is affected by a glitch originating at this gate.
Mean masking impact (MMI) 27 Mean masking impact (MMI), as defined in Equation (9), characterizes each gate in terms of its capability of filtering passing glitches. The MMI value of a gate denotes the normalized expected attenuation on the duration (or amplitude) of all glitches passing through the gate.
Scaling criticality (SC) 67 The scaling criticality (SC), as defined in Equation (25), of gate G represents the decrease in MEI of gate G’s immediate fanin neighbors after gate G has been scaled up.
Soft-error-critical gate 68 A gate is called soft-error-critical if its SC is within the highest l% of overall SC values where l is a specified lower bound.
Soft-error-relevant gate 68 A gate is called soft-error-relevant if its SC is within the next l%-u% of overall SC values where u is a specified upper bound and greater than l.
Skew 92 Given two flip-flops FFi and FFj for which the arrival times to clock pins are ci and cj respectively, the skew between FFi and FFj, denoted by skew(FFi, FFj), is (ci – cj).
Error-latching window 92 The error-latching window of a flip-flop is a time interval, [t–tsu, t+th], where t is the moment when a clock edge happens, and tsu and th are the setup and hold times of the flip-flop.
Implication-based masking (IM) 97 See Definition 8.
Mutually-exclusive propagation (MEP) 99 See Definition 9.
Intersecting gate 102 The intersecting gate of two flip-flops FFi and FFj is the root gate for the intersection of FFi’s and FFj’s fanin cones.
Normalized absolute adjustment 112 Normalized absolute adjustment, as formally defined in Equation (40), quantifies the cost imposed by clock skew scheduling in terms of the degree of clock network modification.
Non-equivalence symmetry (NES) 122 See Definition 11.
Equivalence symmetry (ES) 122 See Definition 12.
Functional symmetry 123 Two variables x and y in a Boolean function F(…, x,…, y,…) are functionally symmetric if they are either NES or ES.
Generalized implication supergate (GISG) 123 A generalized implication supergate (GISG) is a group of connected gates that is logically equivalent to a big AND/OR gate with a large number of inputs. For simplicity, we use the term supergate (SG) to refer to a generalized implication supergate.
NBTI-critical path 127 After the timing analysis under NBTI, a path is called an NBTI-critical path if and only if its delay is larger than the delay of the longest path without consideration of NBTI effects.
NBTI-critical node 127 After the timing analysis under NBTI, a node is called an NBTI-critical node if and only if it is on an NBTI-critical path.
NBTI-critical supergate 128 A maximal supergate is called an NBTI-critical supergate if and only if it is rooted at an NBTI-critical node.
Most critical path segment (MCPS) 128 The most critical path segment (MCPS) associated with a supergate is the intersection of the supergate and the longest global path passing through its root.
NBTI-aware swappee 128 Given an NBTI-critical supergate G, a wire S (to gate P, belonging to G) is an NBTI-aware swappee if (i) S is a side input to the MCPS of G, or (ii) P is in the fanin cone of a side input to the MCPS of G.
NBTI-aware swapper 128 Given an NBTI-critical supergate G and an NBTI-aware swappee S, a wire T (to gate Q, belonging to G) is an NBTI-aware swapper if (i) S and T are functionally symmetric, (ii) the swap of S and T does not cause any timing violation, and (iii) the swap of S and T is beneficial in terms of NBTI effects.
Side input 149 For each gate on an activated path, a side input is an input pin of the gate through which the activated path does not pass.
Side-input assignment 149 For each side input, its value assignment, called side-input assignment, is the value evaluated by propagating a particular input vector.