Optimizing soft error detection by employing derating methods
A thesis submitted in partial fulfillment of the requirements for the award
of the degree of
B.Tech in
Electronics and Communication Engineering
By
R. Deepak Dhanavel Kumar (108110022)
ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY
TIRUCHIRAPPALLI-620015
MAY 2014
BONAFIDE CERTIFICATE
This is to certify that the project titled “Optimizing soft error detection by
employing derating methods” is a bonafide record of the work done by
R. Deepak Dhanavel Kumar (108110022)
in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology in Electronics and Communication Engineering of
NATIONAL INSTITUTE OF TECHNOLOGY, TIRUCHIRAPPALLI, during the
year 2013-2014.
Dr. G. Lakshminarayanan          Dr. D. Sriram Kumar
Project Guide                    Head of Department
Project viva voce held on
Internal Examiner                External Examiner
ABSTRACT
Soft errors are random, temporary errors that can occur in a circuit due to
external atmospheric phenomena such as particle strikes. They are one of the biggest
reliability challenges for present-day electronic devices, and their contribution to the
overall device failure rate is increasing with technology scaling. Since they are caused
by charge build-up, the reduction in transistor length has made it more likely for a
particle strike to result in a soft error. The chance of combinational elements being
affected by particle strikes is also increasing. Circuits that are required to run with a
high level of reliability must therefore be designed to detect and prevent these errors.
In this report, we analyze one such method, the delayed capture method. The method
cannot be used as is, since it incurs a large overhead in additional flip-flops.
Therefore, intelligent grouping of flip-flops is required to minimize the overhead. The
flip-flops are grouped, and the parity of each group is computed and compared. The
grouping has to be done carefully, as two flip-flops that can be in error at the same
time must not be placed in the same group. The grouping can be further optimized by
using derating methods. The three derating methods are electrical, timing and logic
derating. They allow more flip-flops to be grouped together by identifying flip-flops
that will not be in error at the same time. These derating methods are implemented, and
the reduction in overhead flip-flops is documented to evaluate their effectiveness.
Keywords: soft error, delayed capture method, derating effects
ACKNOWLEDGEMENT
We are deeply indebted to Dr. G. Lakshminarayanan, Assistant Professor,
Department of Electronics and Communication Engineering, NIT Trichy, for his
valuable guidance and constant motivation during the entire course of the project.
We take this opportunity to thank Dr. D. Sriram Kumar, Professor and Head of the
Department, Electronics and Communication Engineering, for allowing us to undertake
this project.
We sincerely thank and acknowledge G. Swaminathan, Ph.D. scholar, for helping
us complete this project on time.
We are also very thankful to all the teaching and non-teaching staff of this department.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 INTRODUCTION
1.1 Introduction
1.2 Soft error basics
1.3 Soft error effects
1.4 Errors resulting from SEU
Chapter 2 DELAYED CAPTURE METHOD AND INTELLIGENT GROUPING
2.1 Delayed capture method
2.2 Intelligent grouping
2.3 Graph colouring method
2.4 Derating methods
2.4.1 Electrical derating
2.4.2 Timing derating
2.4.3 Logical derating
2.5 Placement consideration
Chapter 3 IMPLEMENTATION OF THE METHODS
3.1 The problem statement
3.2 Implementation of the method
3.2.1 The fanin input format
3.2.2 The DIMACS graph format
3.2.3 Create meaningful output files
3.2.4 Implementation of electrical derating
3.2.5 Implementation of timing derating
3.2.6 Placement consideration
3.3 Program flow of grouping programs
Chapter 4 SEARCH ALGORITHM
4.1 Brute-force method
4.2 Improved search algorithm
4.3 Speed improvement
Chapter 5 RESULTS
5.1 Overhead reduction results
Chapter 6 CONCLUSION
6.1 Summary
6.2 Conclusion
REFERENCES
APPENDIX
LIST OF FIGURES
Figure no.   Title
1.1          Soft error types
1.2          Particle strike on logic gates
1.3          Delayed capture method
2.1          Simultaneous errors in flip-flops
2.2          Electrical derating
2.3          Timing derating
2.4          IC design without placement consideration
3.1          fanin file generated by DC shell
3.2          DIMACS output file format
3.3          Output of the graph colouring program
3.4          Output file with group numbers
3.5          Output file with group information
3.6          Final output file with all information
3.7          Implementing timing derating
3.8          Flip-flops grouping based on placement information
3.9          The program flow of the grouping programs
LIST OF TABLES
Table no.    Title
5.1          Electrical derating results
5.2          Timing derating results
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
In the past few decades, the exponential growth in the number of transistors per chip
has brought tremendous progress in the performance and functionality of semiconductor
devices and, in particular, microprocessors. Each succeeding technology generation
has, however, introduced new obstacles to maintaining this exponential growth rate in
the number of transistors per chip. Packing more and more transistors on a chip requires
printing ever-smaller features. This led the industry to change lithography—the
technology used to print circuits onto computer chips—multiple times. Radiation-
induced transient faults arise from energetic particles, such as alpha particles from
packaging material and neutrons from the atmosphere, generating electron–hole pairs
(directly or indirectly) as they pass through a semiconductor device. This causes
temporary flips in the stored values. The necessity to find cheaper reliability solutions
has driven a whole new class of quantitative analysis of soft errors and corresponding
solutions that mitigate their effects. Here we analyze the delayed capture method.
1.2 SOFT ERROR BASICS
The cost of recovery from a soft error depends on the specific nature of the
error arising from the particle strike. Soft errors can result in either silent data
corruption (SDC) or a detected unrecoverable error (DUE). Corrupted data that
go unnoticed by the user are benign and excluded from the SDC category. But
corrupted data that eventually result in a visible error that the user cares about
cause an SDC event. In contrast, a DUE event is one in which the computer
system detects the soft error and potentially crashes the system but avoids
corruption of any data the user cares about. An SDC event can also crash a
computer system, besides causing data corruption. However, it is often hard, if
not impossible, to trace back where the SDC event originally occurred. Besides
SDC and DUE, a third category of benign errors exists. These are corrected
errors that may be reported back to the operating system (OS). Because the
system recovers from the effect of the errors, these are usually not a cause of
concern. Nevertheless, many vendors use the reported rate of correctable errors
as an early warning that a system may have an impending hardware problem.
Fig.1.1 Soft error types
1.3 SOFT ERROR EFFECTS
Soft errors are random, temporary errors in integrated circuits caused by atmospheric
particle strikes. They are primarily caused by radiation emanating from packaging
materials (alpha particles) and by cosmic radiation (neutrons). The impact of radiation
on an integrated circuit results in excess carrier generation. The excess carriers are
deposited in the internal capacitances of the transistors in the circuit, which can flip
the value stored in a flip-flop. The charge required to disrupt a flip-flop is decreasing
due to technology scaling and the consequent reduction in supply voltage.
1.4 ERRORS RESULTING FROM SEU
A particle strike can happen directly on a flip-flop or on a logic gate leading to a
flip-flop. In the first case, only the affected flip-flop changes value. In the second
case, all the flip-flops in the fan-out of the gate can be affected. The chance of soft
errors affecting the logic gates was insignificant in previous technologies. However,
as technology scales and logic gates are made from smaller transistors, they
become more susceptible to particle strikes.
When a particle strike occurs on a logic gate, more than one flip-flop may be in
error. Thus a single strike can cause multiple errors. This is the new threat to
modern ICs that needs to be combated.
Fig.1.2 Particle strike on logic gates
CHAPTER 2
DELAYED CAPTURE METHOD AND INTELLIGENT GROUPING
2.1 DELAYED CAPTURE METHOD
The delayed capture method uses the fact that the value of the flip-flop will
remain constant during the contamination time unless a soft error upset occurs
at the flip-flop during this time. Therefore, a separate flip-flop is used to save the
parity before the clock edge, and this parity is then compared with the output parity
to ensure that no error has occurred.
Figure 1.3 Delayed capture method
Parity is calculated at two different points: at the inputs of the flip-flops and
at the outputs of the flip-flops. The input parity is latched at a time Δseu + Δparity
with respect to the functional clock. Assume that a radiation strike occurs and causes a
transient in the circuit. The transient will die down within Δseu. If the transient is
captured in a functional flip-flop, the pulse will have attenuated significantly by the
time the input parity is calculated and the parity flip-flop captures the value using
the delayed clock. The output parity is calculated from the data values captured by the
functional flip-flops. Since the flip-flop captured the wrong value due to the SEU pulse,
the output parity will be different from the input parity latched using the delayed
clock. On comparing the input and the output parity, as shown in Figure 1.3, we get an
indication of the presence of soft errors of width less than Δseu.
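The comparison of input and output parity can be illustrated with a small Python sketch. It is only a behavioural model under simplifying assumptions: the function names and the example bit vectors are hypothetical, and timing is abstracted into two lists (the settled input values and the values actually captured).

```python
from functools import reduce
from operator import xor

def parity(bits):
    """Return the XOR parity of a sequence of 0/1 values."""
    return reduce(xor, bits, 0)

def detects_error(settled_inputs, captured):
    """Compare the delayed input parity with the output parity.

    settled_inputs: flip-flop input values after the transient has died down
    captured:       values actually stored by the functional flip-flops
    """
    return parity(settled_inputs) != parity(captured)

settled = [1, 0, 1, 1]   # input values seen by the delayed parity flip-flop
clean = [1, 0, 1, 1]     # fault-free capture
upset = [1, 1, 1, 1]     # flip-flop 1 captured the transient pulse

print(detects_error(settled, clean))  # False
print(detects_error(settled, upset))  # True
```

A single wrong capture changes the output parity, so the mismatch flags the error.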
2.2 INTELLIGENT GROUPING
Grouping is done to reduce the flip-flop overhead, but it cannot be done randomly.
Consider a particle strike on a gate leading to two flip-flops. This can result in both
flip-flops being in error at the same time. If they are in the same group, the parity
calculation will mask the error, because two flipped bits leave the parity unchanged.
Common gates in the fan-in of flip-flops in the same group therefore lead to aliasing
errors. Intelligent grouping is needed so that the effect of an SEU is captured in only
a single flip-flop within a group. At the same time, the number of groups must be kept
to a minimum to reduce the overhead. This is the critical aspect of the method: the
intelligent grouping must be performed efficiently to achieve a better reduction in
overhead.
Fig.2.1 Simultaneous errors in flip-flops
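The aliasing problem can be reproduced in a few lines of Python. This is a minimal sketch with hypothetical names and data: one strike flips flip-flops 0 and 1, and the error is visible only when the two are placed in different parity groups.

```python
def group_parities(values, groups):
    """Parity bit per group; groups maps group id -> list of flip-flop indices."""
    return {g: sum(values[i] for i in members) % 2 for g, members in groups.items()}

def error_flagged(expected, captured, groups):
    """True if any group's parity differs between expected and captured values."""
    return group_parities(expected, groups) != group_parities(captured, groups)

expected = [0, 1, 1, 0]
captured = [1, 0, 1, 0]   # flip-flops 0 and 1 upset by one strike on a shared gate

same_group = {0: [0, 1], 1: [2, 3]}     # conflicting flip-flops grouped together
split_groups = {0: [0, 2], 1: [1, 3]}   # conflicting flip-flops kept apart

print(error_flagged(expected, captured, same_group))    # False (aliased)
print(error_flagged(expected, captured, split_groups))  # True
```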
2.3 GRAPH COLOURING
The problem of finding the minimum number of groups while maintaining the
effectiveness of the parity checking is solved using the graph colouring method. Two
flip-flops are said to be in conflict if they share a common gate in their fan-in. Each
node in the graph is a flip-flop, and two nodes in conflict are connected by an edge.
Graph colouring assigns a colour to each node such that no two adjacent nodes have the
same colour, while aiming to minimize the number of colours used. Flip-flops with the
same colour are not in conflict, so all flip-flops of one colour can be placed in one
group. Graph colouring thus leads to large groups with a minimal number of colours.
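As an illustration of colouring a conflict graph, the following Python sketch uses a simple greedy largest-degree-first heuristic. This is an assumption chosen for brevity and is not necessarily the colouring algorithm used in this work; the function name and the example graph are hypothetical.

```python
def greedy_colouring(num_nodes, edges):
    """Assign each node the smallest colour not used by its coloured neighbours."""
    adj = {n: set() for n in range(num_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colour = {}
    # Visit high-degree nodes first (largest-first heuristic).
    for node in sorted(adj, key=lambda n: -len(adj[n])):
        used = {colour[m] for m in adj[node] if m in colour}
        c = 0
        while c in used:
            c += 1
        colour[node] = c
    return colour

# Five flip-flops; an edge means the two flip-flops are in conflict.
edges = [(0, 1), (1, 2), (2, 3)]
colours = greedy_colouring(5, edges)
print(colours)
```

Every pair of conflicting flip-flops receives different colours, so each colour class can serve as one parity group.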
2.4 DERATING METHODS
To further optimize the grouping, derating methods are used to reduce the
conflicts between the flip-flops. This reduction relies on certain phenomena that lower
the chance of flip-flops being in error simultaneously. Only very few of the particle
strikes that happen will reach the flip-flops. The logic gates on the path from the
location of the strike to the flip-flop attenuate the pulse induced by the particle
strike, and successive logic levels have a cascading effect on this attenuation. A pulse
originating beyond a certain logic depth from the capture flip-flops will not have any
impact. Therefore, common gates that are more than a particular number of logic levels
away need not be considered when determining conflicts. Furthermore, even if the particle
strike occurs in the vulnerable time period, the pulse has to reach the flip-flops at
the same time to cause a simultaneous error. Even if two flip-flops have a common gate
in their fan-in, the propagation delay from that gate to each of the flip-flops must be
equal for the error to be registered at the same time. These methods can therefore be
used to reduce the number of conflicts among the flip-flops.
2.4.1 Electrical Derating
For a transient at a logic gate to change the flip-flop value, it must have
sufficient amplitude when it reaches the flip-flop. The logic gates on the path from the
location of the strike to the flip-flop attenuate the pulse induced by the particle
strike, and successive logic levels have a cascading effect on this attenuation. A pulse
originating beyond a certain logic depth from the capture flip-flops will not have any
impact. Therefore, common gates that are more than a particular number of logic levels
away need not be considered when determining conflicts. Experiments have shown that
particle strikes more than 3-4 logic levels away from a flip-flop do not have any effect
on it.
Figure 2.2: electrical derating
These concepts can be used to reduce the number of fan-in levels that need
to be checked for conflicts. Graph colouring is more effective with fewer
interconnected nodes.
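The effect of limiting the fan-in depth can be sketched in Python as follows. The netlist representation (a dictionary from each node to the gates driving it), the function names and the example circuit are all hypothetical; the sketch only shows how a shared gate that lies deeper than the chosen level stops producing a conflict.

```python
def fanin_cone(drivers, start, max_levels):
    """Gates reachable backwards from `start` within `max_levels` logic levels."""
    cone, frontier = set(), {start}
    for _ in range(max_levels):
        frontier = {g for n in frontier for g in drivers.get(n, [])} - cone
        cone |= frontier
    return cone

def in_conflict(drivers, ff_a, ff_b, max_levels):
    """Two flip-flops conflict if their depth-limited fan-in cones share a gate."""
    return bool(fanin_cone(drivers, ff_a, max_levels)
                & fanin_cone(drivers, ff_b, max_levels))

# Hypothetical netlist: g_shared is 5 levels from ff_a and 2 levels from ff_b.
drivers = {
    "ff_a": ["g1"], "g1": ["g2"], "g2": ["g3"], "g3": ["g4"], "g4": ["g_shared"],
    "ff_b": ["g5"], "g5": ["g_shared"],
}

print(in_conflict(drivers, "ff_a", "ff_b", 9))  # True: no derating
print(in_conflict(drivers, "ff_a", "ff_b", 3))  # False: derated to 3 levels
```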
2.4.2 Timing Derating
A transient caused by a particle strike on a logic gate will be captured only if
it reaches the destination flip-flop at the clock edge. This lowers the chance of a
pulse resulting in an error, so each logic element is vulnerable only for a fraction of
the clock period.
Even if the particle strike occurs in the vulnerable time period, the pulse has to reach
the flip-flops at the same time to cause a simultaneous error. This fact can be
exploited: even if two flip-flops have a common gate in their fan-in, the propagation
delay from that gate to each flip-flop must be equal for the error to be registered at
the same time. A transient will not be captured in two downstream flip-flops if the
difference in delay from the gate to the two flip-flops in its fan-out is significant.
Figure 2.3: Timing derating
2.4.3 Logic Derating
A transient caused by a particle strike at a gate will reach a flip-flop only if
the path from the gate to the flip-flop is sensitized. If any gate on the path has
a side input at its controlling value, the pulse will be masked and will not reach
the flip-flop. The circuit has to be analyzed to find out whether, for two
flip-flops in conflict, the pulse is masked on the path to one flip-flop whenever
it propagates to the other. In such a scenario, one of the flip-flops will
have the pulse masked and hence both will not be in error at the same time.
This can be used to improve the efficiency of the grouping.
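Masking by a controlling side input can be sketched as below for a single input vector. The path representation (a list of gate type and side-input value pairs), the names and the example paths are hypothetical, and a complete analysis would have to consider all input vectors rather than one.

```python
# Controlling value of each gate type: an input at this value fixes the output.
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}

def path_sensitized(path):
    """path: list of (gate_type, side_input_value) from strike site to flip-flop.

    The pulse propagates only if no gate on the path has a side input at its
    controlling value. Gate types without a controlling value (e.g. XOR) never mask.
    """
    return all(CONTROLLING.get(gate) != side for gate, side in path)

def both_in_error(path_a, path_b):
    """Both flip-flops can capture the pulse only if both paths are sensitized."""
    return path_sensitized(path_a) and path_sensitized(path_b)

path_to_ff_a = [("AND", 1), ("OR", 0)]   # side inputs non-controlling: pulse passes
path_to_ff_b = [("AND", 0)]              # side input 0 on an AND gate: pulse masked

print(path_sensitized(path_to_ff_a))              # True
print(path_sensitized(path_to_ff_b))              # False
print(both_in_error(path_to_ff_a, path_to_ff_b))  # False
```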
2.5 PLACEMENT CONSIDERATION
If only the fan-in of the flip-flops is considered, flip-flops that are far apart
may be grouped together. This makes wiring in an IC design difficult. Therefore,
the grouping has to take the location of the flip-flops into consideration. The
grouping programs accept an optional argument to include the coordinates of the
flip-flops while grouping them. Flip-flops that are separated by more than a
critical distance are automatically not placed in the same group.
Figure 2.4: IC design without placement consideration
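One way to express the distance rule is to treat every pair of flip-flops that lies too far apart as an additional conflict, so that the colouring step keeps them in different groups. The Python sketch below assumes Euclidean distance and uses hypothetical names and coordinates; the distance metric used by the actual programs is not stated here.

```python
from itertools import combinations
from math import dist

def placement_conflicts(coords, critical_distance):
    """Pairs of flip-flops farther apart than critical_distance.

    coords: dict flip-flop name -> (x, y) placement coordinates.
    Euclidean distance is an assumption made for this sketch.
    """
    return [(a, b) for a, b in combinations(sorted(coords), 2)
            if dist(coords[a], coords[b]) > critical_distance]

coords = {"ff0": (0, 0), "ff1": (3, 4), "ff2": (30, 40)}
print(placement_conflicts(coords, 10))
# [('ff0', 'ff2'), ('ff1', 'ff2')]
```

These pairs can simply be appended to the fan-in conflict edges before colouring.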
CHAPTER 3
IMPLEMENTATION OF THE METHODS
3.1 PROJECT OBJECTIVES
The objective is to group the flip-flops in a circuit design using a minimal number of
groups while ensuring that no two flip-flops in a group are in conflict. This is the
primary condition to be achieved. The efficiency depends on the reduction achieved in
the number of redundant flip-flops. Derating conditions can be applied to further
optimize the grouping.
The available data are:
1. Fan-in information for different levels, obtained from DC shell
2. ITC benchmark circuits, ranging from 10 to 3,000 flip-flops
3.2 IMPLEMENTATION OF THE METHODS
This section details the flow adopted to achieve the objectives of
the project. The various formats and programs used are explained.
3.2.1 The fanin input format
The Verilog file is read in DC shell to find the fan-in information of the outputs
present in the circuit. This is stored as a text file. A Perl program then reads the
file and stores the fan-in of each output as an array, creating a 2D array of the entire
fan-in information. An efficient search algorithm has to be implemented to scan all
the arrays and find pairs of arrays that are in conflict.
Figure 3.1: fanin file generated by DC shell
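The search for conflicting pairs can be sketched as a straightforward pairwise comparison of fan-in sets. The project itself uses Perl; Python is used here only for illustration, and the function name, the gate names and the in-memory dictionary (standing in for the parsed fan-in file) are hypothetical.

```python
from itertools import combinations

def find_conflicts(fanin):
    """Return all pairs of flip-flops whose fan-in lists share at least one gate.

    fanin: dict flip-flop name -> iterable of gate names in its fan-in.
    """
    sets = {ff: set(gates) for ff, gates in fanin.items()}
    return [(a, b) for a, b in combinations(sets, 2) if sets[a] & sets[b]]

fanin = {
    "ff0": ["U1", "U2"],
    "ff1": ["U2", "U3"],   # shares U2 with ff0
    "ff2": ["U4"],
}
print(find_conflicts(fanin))  # [('ff0', 'ff1')]
```

This compares every pair of flip-flops, so its cost grows quadratically with the number of flip-flops.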
3.2.2 The DIMACS graph format
The Perl program creates a conflict graph, which is then fed as input to a
graph colouring program. Since the colouring program requires a standard graph format,
the DIMACS format was used. DIMACS is a well-known format for describing an
undirected graph in a text file. The first line contains information about the graph,
and each of the remaining lines corresponds to an edge in the graph.
A graph colouring program is used to colour the graph. Most graph colouring algorithms
are heuristic and do not always provide an optimal result. The heuristic colouring
algorithm called TabuCol was used here. The source code of the program was
modified to work on large circuits.
Figure 3.2: The DIMACS output file format
Figure 3.3: Output of the graph colouring program
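For illustration, a conflict graph can be written in the DIMACS edge format with a few lines of Python. The sketch assumes the common form of the format, a problem line `p edge <nodes> <edges>` followed by one `e <u> <v>` line per edge with 1-based node numbers; the function name is hypothetical.

```python
def to_dimacs(num_nodes, edges):
    """Serialize an undirected graph (0-based node ids) as DIMACS edge-format text."""
    lines = [f"p edge {num_nodes} {len(edges)}"]        # problem line
    lines += [f"e {u + 1} {v + 1}" for u, v in edges]   # DIMACS nodes are 1-based
    return "\n".join(lines) + "\n"

text = to_dimacs(3, [(0, 1), (1, 2)])
print(text, end="")
# p edge 3 2
# e 1 2
# e 2 3
```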
3.2.3 Producing meaningful output files
The graph colouring program only assigns a colour to each node. This
information has to be then processed to group the flip-flops. The program also provides
all the information for different derating methods in one file so that they can be
compared.
Figure 3.4: Output file with group numbers
Figure 3.5: Output file with group information
Figure 3.6: Final output file with all information
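Turning the colour assignment into groups amounts to inverting the node-to-colour mapping. A minimal Python sketch follows, with hypothetical names and data; the actual output file layout is the one shown in the figures and is not reproduced here.

```python
from collections import defaultdict

def groups_from_colours(colour_of):
    """Invert a flip-flop -> colour mapping into colour -> list of flip-flops."""
    groups = defaultdict(list)
    for ff, colour in colour_of.items():
        groups[colour].append(ff)
    return dict(groups)

colour_of = {"ff0": 0, "ff1": 1, "ff2": 0, "ff3": 2, "ff4": 1}
groups = groups_from_colours(colour_of)
print(groups)       # {0: ['ff0', 'ff2'], 1: ['ff1', 'ff4'], 2: ['ff3']}
print(len(groups))  # 3 groups for 5 flip-flops
```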
3.2.4 Implementation of electrical derating
The Perl program called generate_electrical.pl was used for this purpose. The program
reads a single fan-in file and creates a conflict graph in DIMACS format. DC shell
was used to produce fan-in files for different derating levels. Electrical derating is
applied according to the input fan-in file (for example level3, or level9 for the
underated case). Therefore, all the fan-in level files must be present in the working
directory, and the program reads the file corresponding to the required derating level.
The program can perform intelligent grouping without any derating or with any level of
electrical derating. Experimentally, electrical derating means that only the fan-in up
to level 3 needs to be considered.
3.2.5 Implementation of timing derating
The Perl program called generate_timing.pl was used for this purpose. The program
reads all the fan-in level files for the particular module. These are generated from
DC shell before the start of the program. In order to implement timing derating,
each logic gate is assumed to have unit delay. Therefore, if the gate that causes the
conflict is at two different levels for the two flip-flops, they are not in conflict,
since by timing derating they will not be in error at the same time. The fan-in file
from DC shell contains all the gates in the fan-in up to that level, so the fan-in of
each level separately is required for timing derating. The program reads all the fan-in
files and creates a 3D array holding the fan-in information of each level alone. The
search algorithm is then run separately for each level.
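The per-level separation and the level-wise conflict check can be sketched in Python as follows (the project itself uses Perl). The names and the example data are hypothetical, and as a simplification a gate reachable at several depths is assigned only to its shallowest level.

```python
def per_level(cumulative):
    """Split cumulative fan-in sets into the gates found at each level alone.

    cumulative[k] holds every gate within k + 1 levels of the flip-flop, as in
    the fan-in files described above.
    """
    levels, seen = [], set()
    for gates in cumulative:
        levels.append(set(gates) - seen)
        seen |= set(gates)
    return levels

def timing_conflict(cum_a, cum_b):
    """Under unit gate delay, conflict only if a gate is shared at the same level."""
    return any(a & b for a, b in zip(per_level(cum_a), per_level(cum_b)))

ff_a = [{"g1"}, {"g1", "g_shared"}]   # g_shared is at level 2 for ff_a
ff_b = [{"g_shared"}, {"g_shared", "g2"}]   # g_shared is at level 1 for ff_b
ff_c = [{"g3"}, {"g3", "g_shared"}]   # g_shared is at level 2 for ff_c

print(timing_conflict(ff_a, ff_b))  # False: different levels, different delays
print(timing_conflict(ff_a, ff_c))  # True: same level for both flip-flops
```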
Figure 3.7: Implementing timing derating (figure labels: Level3 fan-in, Level2 fan-in, Only Level3 fan-in)