nitt project report-nocode

Optimizing soft error detection by employing derating methods

A thesis submitted in partial fulfillment of the requirements for the award

of the degree of

B.Tech in

Electronics and Communication Engineering

By

R. Deepak Dhanavel Kumar (108110022)

ELECTRONICS AND COMMUNICATION ENGINEERING

NATIONAL INSTITUTE OF TECHNOLOGY

TIRUCHIRAPPALLI-620015

MAY 2014

BONAFIDE CERTIFICATE

This is to certify that the project titled “Optimizing soft error detection by

employing derating methods” is a bonafide record of the work done by

R. Deepak Dhanavel Kumar (108110022)

in partial fulfilment of the requirements for the award of the degree of

Bachelor of Technology in Electronics and Communication Engineering of

NATIONAL INSTITUTE OF TECHNOLOGY, TIRUCHIRAPPALLI, during the

year 2013-2014.

Dr. G. Lakshminarayanan (Project Guide)

Dr. D. Sriram Kumar (Head of Department)

Project viva voce held on

Internal Examiner

External Examiner

ABSTRACT

Soft errors are random, temporary errors that occur in a circuit due to external phenomena such as particle strikes. They are one of the biggest reliability challenges for present-day electronic devices, and their contribution to the overall device failure rate is increasing with technology scaling. Since they occur due to charge build-up, the reduction in transistor length means that more particle strikes now result in soft errors, and the chance of combinational elements being affected by particle strikes is increasing. Circuits that are required to run with a high level of security must therefore be optimized to detect and prevent these errors. In this report, we analyze one such detection method, the delayed capture method. The method cannot be used as is, since it incurs a large overhead in additional flip-flops. To minimize this overhead, flip-flops are grouped, and a single parity is computed and compared for each group. The grouping has to be done intelligently, because two flip-flops that can be in error at the same time must not be placed in the same group. The grouping can be optimized further by using derating methods, of which there are three: electrical, timing and logic derating. These allow more flip-flops to be grouped together by identifying flip-flops that will not be in error at the same time. The derating methods are implemented, and the resulting reduction in overhead flip-flops is documented to assess their effectiveness.

Keywords: soft error, delayed capture method, derating effects


ACKNOWLEDGEMENT

We are deeply indebted to Dr. G. Lakshminarayanan, Assistant Professor, Department of Electronics and Communication Engineering, NIT Trichy, for his valuable guidance and constant motivation during the entire course of the project.

We take this opportunity to thank Dr. Sriram Kumar, Professor and Head of the Department, Electronics and Communication Engineering, for allowing us to undertake this project.

We sincerely thank and acknowledge G. Swaminathan, Ph.D. scholar, for helping us complete this project on time.

We are also very thankful to all the teaching and non-teaching staff of this department.


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

Chapter 1 INTRODUCTION
1.1 Introduction
1.2 Soft error basics
1.3 Soft error effects
1.4 Errors resulting from SEU

Chapter 2 DELAYED CAPTURE METHOD AND INTELLIGENT GROUPING
2.1 Delayed capture method
2.2 Intelligent grouping
2.3 Graph colouring method
2.4 Derating methods
2.4.1 Electrical derating
2.4.2 Timing derating
2.4.3 Logical derating
2.5 Placement consideration

Chapter 3 IMPLEMENTATION OF THE METHODS
3.1 The problem statement
3.2 Implementation of the method
3.2.1 The fanin input format
3.2.2 The DIMACS graph format
3.2.3 Create meaningful output files
3.2.4 Implementation of electrical derating
3.2.5 Implementation of timing derating
3.2.6 Placement consideration
3.3 Program flow of grouping programs

Chapter 4 SEARCH ALGORITHM
4.1 Brute-force method
4.2 Improved search algorithm
4.3 Speed improvement

Chapter 5 RESULTS
5.1 Overhead reduction results

Chapter 6 CONCLUSION
6.1 Summary
6.2 Conclusion

REFERENCES

APPENDIX


LIST OF FIGURES

Figure no.  Title

1.1  Soft error types
1.2  Particle strike on logic gates
1.3  Delayed capture method
2.1  Simultaneous errors in flip-flops
2.2  Electrical derating
2.3  Timing derating
2.4  IC design without placement consideration
3.1  fanin file generated by DC shell
3.2  DIMACS output file format
3.3  Output of the graph colouring program
3.4  Output file with group numbers
3.5  Output file with group information
3.6  Final output file with all information
3.7  Implementing timing derating
3.8  Flip-flops grouping based on placement information
3.9  The program flow of the grouping programs


LIST OF TABLES

Table no.  Title

5.1  Electrical derating results
5.2  Timing derating results


CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

In the past few decades, the exponential growth in the number of transistors per chip

has brought tremendous progress in the performance and functionality of semiconductor

devices and, in particular, microprocessors. Each succeeding technology generation

has, however, introduced new obstacles to maintaining this exponential growth rate in

the number of transistors per chip. Packing more and more transistors on a chip requires

printing ever-smaller features. This led the industry to change lithography—the

technology used to print circuits onto computer chips—multiple times. Radiation-

induced transient faults arise from energetic particles, such as alpha particles from

packaging material and neutrons from the atmosphere, generating electron–hole pairs

(directly or indirectly) as they pass through a semiconductor device. This causes

temporary flips in the stored values. The necessity to find cheaper reliability solutions

has driven a whole new class of quantitative analysis of soft errors and corresponding

solutions that mitigate their effects. Here we analyze the delayed capture method.

1.2 SOFT ERROR BASICS

The cost of recovery from a soft error depends on the specific nature of the

error arising from the particle strike. Soft errors can either result in a silent data

corruption (SDC) or detected unrecoverable error (DUE). Corrupted data that

go unnoticed by the user are benign and excluded from the SDC category. But

corrupted data that eventually result in a visible error that the user cares about

cause an SDC event. In contrast, a DUE event is one in which the computer

system detects the soft error and potentially crashes the system but avoids

corruption of any data the user cares about. An SDC event can also crash a

computer system, besides causing data corruption. However, it is often hard, if

not impossible, to trace back where the SDC event originally occurred. Besides

SDC and DUE, a third category of benign errors exists. These are corrected

errors that may be reported back to the operating system (OS). Because the

system recovers from the effect of the errors, these are usually not a cause of

concern. Nevertheless, many vendors use the reported rate of correctable errors

as an early warning that a system may have an impending hardware problem.

Fig.1.1 Soft error types

1.3 SOFT ERROR EFFECTS

Soft errors are random, temporary errors in integrated circuits caused by particle strikes. They are primarily caused by radiation emanating from packaging materials (alpha particles) and by cosmic radiation (neutrons). The impact of radiation on an integrated circuit results in excess carrier generation. The excess carriers are deposited in the internal capacitances of the transistors in the circuit, and this can flip the value stored in a flip-flop. The charge required to disrupt a flip-flop is decreasing due to technology scaling and the consequent reduction in supply voltage.


1.4 ERRORS RESULTING FROM SEU

A particle strike can happen on a flip-flop directly or on a logic gate leading to a flip-flop. In the first case, only the affected flip-flop will change value. In the second case, all the flip-flops in the fan-out of the gate can be affected. The chance of soft errors affecting logic gates was insignificant in previous technologies. However, as technology scales and logic gates are made from smaller transistors, they become more susceptible to particle strikes that cause errors.

When a particle strike occurs on a logic gate, more than one flip-flop may be in error. Thus a single strike can cause multiple errors. This is the new danger posed to modern ICs that needs to be combated.

Fig.1.2 Particle strike on logic gates


CHAPTER 2

DELAYED CAPTURE METHOD AND INTELLIGENT GROUPING

2.1 DELAYED CAPTURE METHOD

The delayed capture method uses the fact that the value at the flip-flop will remain constant during the contamination time, unless a soft error upset occurs at the flip-flop during this time. Therefore, a separate flip-flop is used to save the parity before the clock edge, and this parity is then compared with the parity of the outputs to ensure that no error has occurred.

Figure 1.3 Delayed capture method

Parity is calculated at two different points: at the inputs of the flip-flops and at the outputs of the flip-flops. The input parity is latched at a time dseu + dparity with respect to the functional clock. Assume that a radiation strike occurs and causes a transient in the circuit. The transient will die down within Δseu. If the transient is captured in a functional flip-flop, the pulse will have attenuated significantly by the time the input parity is calculated and the parity flip-flop captures the value using the delayed clock. The output parity is calculated from the data values captured by the functional flip-flops. Since the flip-flop has captured the wrong value due to the SEU pulse, the output parity will differ from the parity latched using the delayed clock. Comparing the input and output parity, as shown in Figure 1.3, therefore indicates the presence of soft errors of width less than Δseu.
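The parity comparison described above can be illustrated with a minimal behavioural sketch. It models only the logical comparison, not the clock timing, and the function names and bit values are assumptions made for illustration rather than part of the project's programs.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """Return the XOR (even parity) of a sequence of 0/1 values."""
    return reduce(xor, bits, 0)


def soft_error_detected(d_inputs, q_outputs):
    """Compare the delayed input parity with the output parity.

    d_inputs  -- settled D-input values, as sampled by the delayed clock
    q_outputs -- values actually captured by the functional flip-flops
    """
    return parity(d_inputs) != parity(q_outputs)


# Hypothetical group of four flip-flops.
settled = [1, 0, 1, 1]
captured_ok = [1, 0, 1, 1]
# A transient corrupts the value captured by the second flip-flop.
captured_bad = [1, 1, 1, 1]

print(soft_error_detected(settled, captured_ok))   # False
print(soft_error_detected(settled, captured_bad))  # True
```

A single wrong capture changes the output parity, so the mismatch flags the error.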

2.2 INTELLIGENT GROUPING

Grouping is done to reduce the flip-flop overhead, but it cannot be done randomly. Consider a particle strike on a gate leading to two flip-flops. This will result in two flip-flops being in error at the same time. If these are in the same group, the parity calculation will mask the error, since an even number of flipped bits leaves the parity unchanged. Therefore, common gates in the fan-in of flip-flops in the same group lead to aliasing errors. Intelligent grouping is needed so that the effect of an SEU is captured in only a single flip-flop within a group. At the same time, the number of groups must be kept to a minimum to reduce the overhead. This is the critical aspect of the method: the better the grouping, the greater the reduction in overhead.
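The aliasing problem can be seen in a small sketch, given here under the assumption of four hypothetical flip-flops (ff0 to ff3) where one strike flips both ff0 and ff1. The names and groupings are illustrative only.

```python
def parity(bits):
    """XOR of a sequence of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p


def groups_flagging_error(groups, expected, captured):
    """Return the names of the groups whose parity check fails.

    groups   -- dict: group name -> list of flip-flop names
    expected -- dict: flip-flop name -> correct value
    captured -- dict: flip-flop name -> value actually captured
    """
    flagged = []
    for name, members in groups.items():
        good = parity(expected[f] for f in members)
        seen = parity(captured[f] for f in members)
        if good != seen:
            flagged.append(name)
    return flagged


expected = {"ff0": 0, "ff1": 1, "ff2": 0, "ff3": 1}
# A strike on a gate feeding both ff0 and ff1 flips both of them.
captured = dict(expected, ff0=1, ff1=0)

naive = {"g0": ["ff0", "ff1"], "g1": ["ff2", "ff3"]}
careful = {"g0": ["ff0", "ff2"], "g1": ["ff1", "ff3"]}

print(groups_flagging_error(naive, expected, captured))    # []
print(groups_flagging_error(careful, expected, captured))  # ['g0', 'g1']
```

With the naive grouping the double error goes unnoticed, while keeping the two conflicting flip-flops in separate groups exposes it.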

Fig.2.1 Simultaneous errors in flip-flops


2.3 GRAPH COLOURING

The problem of finding the minimum number of groups while maintaining the effectiveness of the parity checking is solved using the graph colouring method. Two flip-flops are said to be in conflict if they share a common gate in their fan-in. Each node in the graph is a flip-flop, and two nodes in conflict are connected by an edge.

Graph colouring assigns a colour to each node such that no two adjacent nodes have the same colour, while aiming to minimize the number of colours used. Flip-flops with the same colour are not in conflict, so all flip-flops of one colour can be placed in one group. Graph colouring therefore leads to large groups with a minimal number of colours.
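As an illustration of the idea, the sketch below colours a small hypothetical conflict graph with a simple greedy heuristic. This heuristic is chosen only for brevity and is not necessarily the colouring algorithm used in this project; the node names and edges are assumptions.

```python
def greedy_colouring(nodes, edges):
    """Give each node the smallest colour not used by a coloured neighbour."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    colour = {}
    # Colour high-degree nodes first, a common heuristic ordering.
    for n in sorted(neighbours, key=lambda n: (-len(neighbours[n]), n)):
        used = {colour[m] for m in neighbours[n] if m in colour}
        c = 0
        while c in used:
            c += 1
        colour[n] = c
    return colour


# Hypothetical conflict graph: ff1 conflicts with both ff0 and ff3.
nodes = ["ff0", "ff1", "ff2", "ff3"]
edges = [("ff0", "ff1"), ("ff1", "ff3")]

colour = greedy_colouring(nodes, edges)
print(colour)                      # {'ff1': 0, 'ff0': 1, 'ff3': 1, 'ff2': 0}
print(len(set(colour.values())))   # 2
```

Two colours suffice here, so the four flip-flops can share two parity groups instead of needing four.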

2.4 DERATING METHODS

To further optimize the grouping, derating methods are used to reduce the conflicts between the flip-flops. This reduction is achieved by observing certain phenomena that lower the chance of simultaneous errors in flip-flops. Only very few of the particle strikes that happen will reach the flip-flops. The logic gates on the path from the location of the strike to the flip-flop attenuate the pulse induced by the particle strike, and successive logic levels have a cascading effect on this attenuation. A pulse originating beyond a certain logic depth from the capture flip-flops will not have any impact. Therefore, common gates that are more than a particular number of logic levels away need not be considered when determining conflicts. Further, even if the particle strike occurs in the vulnerable time period, the pulse has to reach the flip-flops at the same time to cause a simultaneous error. Even if two flip-flops have a common gate in their fan-in, the propagation delay from that gate to each of the flip-flops must be equal for the error to be registered at the same time. These methods can therefore be used to reduce the number of conflicts among the flip-flops.


2.4.1 Electrical Derating

For a transient at a logic gate to change a flip-flop value, it must have sufficient amplitude when it reaches the flip-flop. The logic gates on the path from the location of the strike to the flip-flop attenuate the pulse induced by the particle strike, and successive logic levels have a cascading effect on this attenuation. A pulse originating beyond a certain logic depth from the capture flip-flops will not have any impact. Therefore, common gates that are more than a particular number of logic levels away need not be considered when determining conflicts. Experiments have shown that particle strikes more than 3-4 logic levels away from a flip-flop do not have any effect on it.

Figure 2.2: electrical derating

These concepts can be used to reduce the number of fan-in levels that need to be checked for conflicts. Graph colouring is more effective when there are fewer interconnected nodes.
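A minimal sketch of this depth limit is given below. It assumes that the logic level of each fan-in gate is known for every flip-flop; the data layout, gate names and levels are hypothetical.

```python
def derated_fanin(fanin_levels, max_level):
    """Keep only the fan-in gates within max_level logic levels.

    fanin_levels -- dict: flip-flop -> dict: gate -> logic level (1 = nearest)
    """
    return {
        ff: {gate for gate, level in gates.items() if level <= max_level}
        for ff, gates in fanin_levels.items()
    }


def in_conflict(fanin, a, b):
    """Two flip-flops conflict if their fan-in sets share a gate."""
    return bool(fanin[a] & fanin[b])


# Hypothetical fan-in: the only shared gate, g9, is deep in both cones.
fanin_levels = {
    "ff0": {"g1": 1, "g2": 2, "g9": 5},
    "ff1": {"g3": 1, "g4": 3, "g9": 6},
}

full = derated_fanin(fanin_levels, max_level=9)
shallow = derated_fanin(fanin_levels, max_level=3)

print(in_conflict(full, "ff0", "ff1"))     # True
print(in_conflict(shallow, "ff0", "ff1"))  # False
```

Limiting the check to three levels removes the edge between the two flip-flops, so they become eligible for the same group.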


2.4.2 Timing Derating

A transient caused by a particle strike on a logic gate will be captured only if it reaches the destination flip-flop at the clock edge. This reduces the chance of a pulse resulting in an error. Each logic element is therefore vulnerable for only a fraction of the clock period.

Even if the particle strike occurs in the vulnerable time period, the pulse has to reach the flip-flops at the same time to cause a simultaneous error. This fact can be exploited: even if two flip-flops have a common gate in their fan-in, the propagation delay from that gate to each of the flip-flops must be equal for the error to be registered at the same time. A transient will not be captured in two downstream flip-flops if the difference in delay from the gate to the two flip-flops in its fan-out is significant.
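The delay condition can be sketched as follows, assuming that the propagation delay from each fan-in gate to each flip-flop is available. The delay values, their units and the `window` parameter are hypothetical; `window` stands for the largest delay difference that is still treated as simultaneous.

```python
def timing_conflict(delays_a, delays_b, window):
    """True if some common gate reaches both flip-flops almost simultaneously.

    delays_a, delays_b -- dict: gate -> propagation delay to that flip-flop
    window             -- largest delay difference still treated as simultaneous
    """
    common = delays_a.keys() & delays_b.keys()
    return any(abs(delays_a[g] - delays_b[g]) <= window for g in common)


# Hypothetical delays in arbitrary integer units.
ff0 = {"g1": 2, "g2": 3}
ff1 = {"g2": 7, "g3": 1}   # g2 arrives 4 units later than at ff0
ff2 = {"g2": 3, "g4": 5}   # g2 arrives at the same time as at ff0

print(timing_conflict(ff0, ff1, window=1))  # False
print(timing_conflict(ff0, ff2, window=1))  # True
```

Only pairs for which the shared gate arrives within the window remain in conflict.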

Figure 2.3: Timing derating

2.4.3 Logic Derating

A transient from a particle strike at a gate will reach a flip-flop only if the path from the gate to the flip-flop is sensitized. If any gate on the path has a side input at its controlling value, the pulse will be masked and will not reach the flip-flop. The circuit has to be analyzed to find out whether the paths to two flip-flops in conflict are never sensitized together. In such a scenario, the pulse is masked for at least one of the flip-flops, and hence they will not both be in error at the same time. This can be used to improve the efficiency of the grouping.
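The masking condition for a single path can be sketched as below. The sketch checks one assignment of side-input values only, whereas a full analysis would have to cover every input condition of the circuit; the gate list and values are hypothetical.

```python
# Controlling values: a 0 on any AND/NAND input, or a 1 on any OR/NOR input,
# fixes the gate output regardless of the other inputs.
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def path_sensitized(path):
    """True if no gate on the path has a side input at its controlling value.

    path -- list of (gate_type, side_input_values), ordered from the struck
            gate towards the flip-flop
    """
    for gate_type, side_inputs in path:
        control = CONTROLLING_VALUE.get(gate_type)  # None for XOR, NOT, BUF
        if control is not None and control in side_inputs:
            return False  # the pulse is masked at this gate
    return True


# Hypothetical paths from one struck gate to two flip-flops.
path_to_ff0 = [("AND", [1]), ("OR", [0])]   # no controlling side input
path_to_ff1 = [("AND", [0])]                # side input 0 masks the pulse

both_in_error = path_sensitized(path_to_ff0) and path_sensitized(path_to_ff1)
print(path_sensitized(path_to_ff0))  # True
print(path_sensitized(path_to_ff1))  # False
print(both_in_error)                 # False
```

For this assignment the pulse reaches only ff0, so the two flip-flops are not in error together.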


2.5 PLACEMENT CONSIDERATION

If only the fan-in of the flip-flops is considered, flip-flops that are far apart may be grouped together. This makes wiring in an IC design difficult. Therefore, the grouping has to take the location of the flip-flops into consideration. The grouping program accepts an optional argument containing the coordinates of the flip-flops. Flip-flops that are separated by more than a critical distance are automatically not placed in the same group.
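One way to express this constraint is to add an extra edge to the conflict graph for every pair of flip-flops that is too far apart. The sketch below assumes Euclidean distance and hypothetical coordinates; the distance measure actually used in the project is not stated here.

```python
from itertools import combinations
from math import hypot


def placement_edges(coords, critical_distance):
    """Return the flip-flop pairs that are farther apart than critical_distance.

    coords -- dict: flip-flop name -> (x, y) placement coordinates
    """
    edges = set()
    for a, b in combinations(sorted(coords), 2):
        dx = coords[a][0] - coords[b][0]
        dy = coords[a][1] - coords[b][1]
        if hypot(dx, dy) > critical_distance:
            edges.add((a, b))
    return edges


# Hypothetical placement: ff2 is far from the other two flip-flops.
coords = {"ff0": (0, 0), "ff1": (3, 4), "ff2": (30, 40)}

print(sorted(placement_edges(coords, critical_distance=10)))
# [('ff0', 'ff2'), ('ff1', 'ff2')]
```

These edges can be merged with the fan-in conflict edges before colouring, so that distant flip-flops never receive the same group.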

Figure 2.4: IC design without placement consideration


CHAPTER 3

IMPLEMENTATION OF THE METHODS

3.1 PROJECT OBJECTIVES

The primary objective is to group the flip-flops in a circuit design using a minimal number of groups while ensuring that no two flip-flops in a group are in conflict. The efficiency is measured by the reduction in the number of redundant flip-flops. Derating conditions can then be applied to further optimize the grouping.

The available data are:

1. Fan-in information at different levels, obtained from DC shell

2. ITC benchmark circuits, ranging from 10 to 3,000 flip-flops

3.2 IMPLEMENTATION OF THE METHODS

This section describes the flow adopted to achieve the objectives of the project. The various formats and programs used are explained.

3.2.1 The fanin input format

The Verilog file is read in DC shell to find the fan-in information of the outputs present in the circuit. This is stored as a text file. A Perl program then reads the file and stores the fan-in of each output as an array, creating a 2D array of the entire fan-in information. An efficient search algorithm has to be implemented to scan all the arrays and find pairs of arrays that are in conflict.
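One possible way to avoid comparing every pair of arrays is to build an index from each gate to the flip-flops it feeds. The sketch below is written in Python rather than Perl and is an assumption about how such a search could be organized, not the project's actual algorithm; the fan-in data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_conflicts(fanin):
    """Return all flip-flop pairs that share at least one fan-in gate.

    fanin -- dict: flip-flop name -> list of gate names in its fan-in
    """
    # Inverted index: gate -> set of flip-flops that have it in their fan-in.
    driven = defaultdict(set)
    for ff, gates in fanin.items():
        for gate in gates:
            driven[gate].add(ff)

    # Every pair of flip-flops listed under the same gate is in conflict.
    conflicts = set()
    for ffs in driven.values():
        conflicts.update(combinations(sorted(ffs), 2))
    return conflicts


# Hypothetical fan-in arrays for four flip-flops.
fanin = {
    "ff0": ["g1", "g2"],
    "ff1": ["g2", "g3"],
    "ff2": ["g4"],
    "ff3": ["g3", "g2"],
}

print(sorted(find_conflicts(fanin)))
# [('ff0', 'ff1'), ('ff0', 'ff3'), ('ff1', 'ff3')]
```

Each fan-in array is scanned once, and pairs are generated only among flip-flops that actually share a gate.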

Figure 3.1: fanin file generated by DC shell


3.2.2 The DIMACS graph format

The Perl program creates a conflict graph. This graph is then fed as input to a graph colouring program. Since the colouring program requires a standard graph format, the DIMACS format was used. The DIMACS format is a well-known format for representing an undirected graph in a text file. The first line contains information about the graph (the number of nodes and edges), and each remaining line corresponds to an edge in the graph.
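A minimal sketch of writing a conflict graph in the DIMACS edge format is shown below: a problem line `p edge <nodes> <edges>` followed by one `e <u> <v>` line per edge, with vertices numbered from 1. The function name and the example graph are assumptions for illustration.

```python
def to_dimacs(nodes, edges):
    """Render an undirected graph as DIMACS edge-format text.

    Returns the text and the name -> number mapping, since DIMACS vertices
    are integers starting at 1 rather than names.
    """
    index = {name: i for i, name in enumerate(sorted(nodes), start=1)}
    lines = [f"p edge {len(index)} {len(edges)}"]
    for a, b in sorted(edges):
        lines.append(f"e {index[a]} {index[b]}")
    return "\n".join(lines) + "\n", index


# Hypothetical conflict graph with four flip-flops and two conflicts.
nodes = ["ff0", "ff1", "ff2", "ff3"]
edges = {("ff0", "ff1"), ("ff1", "ff3")}

text, index = to_dimacs(nodes, edges)
print(text, end="")
# p edge 4 2
# e 1 2
# e 2 4
```

The returned mapping is needed later to translate the coloured node numbers back into flip-flop names.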

A graph colouring program is used to colour the graph. Most graph colouring algorithms are heuristic and are not guaranteed to find the minimum number of colours. The heuristic colouring algorithm TabuCol was used in this project. The source code of the program was modified to work on large circuits.

Figure 3.2: The DIMACS output file format

Figure 3.3: Output of the graph colouring program


3.2.3 Producing meaningful output files

The graph colouring program only assigns a colour to each node. This information then has to be processed to group the flip-flops. The program also writes the results for the different derating methods to a single file so that they can be compared.
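The post-processing step can be sketched as follows. The overhead figure assumes one extra parity flip-flop per group, which is an assumption made for this illustration; the colour assignment is hypothetical.

```python
from collections import defaultdict


def groups_from_colours(colour):
    """Collect the flip-flops of each colour into one group.

    colour -- dict: flip-flop name -> colour number
    """
    groups = defaultdict(list)
    for ff, c in sorted(colour.items()):
        groups[c].append(ff)
    return dict(groups)


def overhead_ratio(colour):
    """Parity flip-flops per functional flip-flop, assuming one per group."""
    return len(set(colour.values())) / len(colour)


# Hypothetical output of the colouring step.
colour = {"ff0": 1, "ff1": 0, "ff2": 0, "ff3": 1}

groups = groups_from_colours(colour)
print(groups)                  # {1: ['ff0', 'ff3'], 0: ['ff1', 'ff2']}
print(overhead_ratio(colour))  # 0.5
```

Running the same step on the colourings obtained with and without derating gives directly comparable group counts.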

Figure 3.4: Output file with group numbers

Figure 3.5: Output file with group information

Figure 3.6: Final output file with all information


3.2.4 Implementation of electrical derating

The Perl program generate_electrical.pl was used for this purpose. The program reads a single fan-in file and creates a conflict graph in DIMACS format. DC shell was used to produce a fan-in file for each derating level, and electrical derating is applied according to the fan-in file given as input (level 3, or level 9 for the un-derated case). Therefore, all the different fan-in level files are required to be present in the working directory, and the program reads the file corresponding to the derating level required. The program can perform intelligent grouping without any derating or with any level of electrical derating required. Experimentally, electrical derating means that only fan-in up to level 3 needs to be considered.

3.2.5 Implementation of timing derating

The Perl program generate_timing.pl was used for this purpose. The program reads all the fan-in level files for the particular module. These are generated from DC shell before the start of the program. In order to implement timing derating, each logic gate is assumed to have unit delay. Therefore, if the gate that causes the conflict is at different levels for the two flip-flops, they are not in conflict, since by timing derating they will not be in error at the same time. The fan-in file from DC shell contains all the gates in the fan-in up to that level, so the fan-in of each level alone has to be extracted for timing derating. The program reads all the fan-in files and creates a 3D array holding the fan-in information of each level alone. The search algorithm is then run separately for each level.
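The extraction of per-level fan-in and the level-by-level search can be sketched as below, in Python rather than Perl. The sketch assigns each gate to the shallowest level at which it appears, which is an assumption, and the fan-in sets are hypothetical.

```python
def exclusive_levels(cumulative):
    """Convert cumulative fan-in sets into per-level sets.

    cumulative -- list where cumulative[k] holds every gate within k + 1
                  levels of the flip-flop, as in the "up to level" files
    """
    levels, seen = [], set()
    for gates in cumulative:
        levels.append(set(gates) - seen)  # gates first reached at this level
        seen |= set(gates)
    return levels


def conflict_with_timing(levels_a, levels_b):
    """True if a common gate sits at the same level for both flip-flops."""
    return any(a & b for a, b in zip(levels_a, levels_b))


# Hypothetical cumulative fan-in for three flip-flops, levels 1 to 3.
ff0 = exclusive_levels([{"g1"}, {"g1", "g2"}, {"g1", "g2", "g5"}])
ff1 = exclusive_levels([{"g3"}, {"g3", "g5"}, {"g3", "g5", "g6"}])
ff2 = exclusive_levels([{"g7"}, {"g7", "g8"}, {"g7", "g8", "g5"}])

print(ff0)                             # [{'g1'}, {'g2'}, {'g5'}]
print(conflict_with_timing(ff0, ff1))  # False: g5 is at level 3 vs level 2
print(conflict_with_timing(ff0, ff2))  # True: g5 is at level 3 for both
```

Under the unit-delay assumption, only a shared gate at equal depth keeps two flip-flops in conflict.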

Figure 3.7: Implementing timing derating (figure labels: Level3 fan-in, Level2 fan-in, only Level3 fan-in)
