[ieee 11th ieee international on-line testing symposium - french riviera, france...


On Implementing a Soft Error Hardening Technique By Using an Automatic Layout Generator: Case Study

Cristiano Lazzari∗, Lorena Anghel
TIMA Laboratory
INPG - Institute National Polytechnique de Grenoble
Grenoble - France
{cristiano.lazzari,lorena.anghel}@imag.fr

Ricardo A. L. Reis
UFRGS - Universidade Federal do Rio Grande do Sul
PPGC - Instituto de Informática
Porto Alegre - RS, Brazil

[email protected]

Abstract

Soft error rates induced by cosmic radiation become unacceptable in future very deep sub-micron technologies. Many hardening techniques at different abstraction levels have been proposed to cope with increased soft error rates. Depending on the abstraction level, some techniques modify the design at the architecture, circuit or transistor level, while others require modifying the circuit layout or using newly defined cells within the circuit. In this paper, an automatic layout generator is presented that completes the system design process by easily generating the hardened design layout, thus reducing system design time. This work presents a case study of a complete soft error tolerant integrated circuit built using an automatic layout generator called Parrot Punch.

1. Introduction

In deep submicron technologies, decreasing feature sizes and lower voltage levels cause an increase in the soft error failure rate of integrated circuits. Historically, single event upsets have been a concern mainly for memories. Efficient solutions for memory protection are presented in [1, 2, 3, 4, 5]. However, since the transition time of logic gates is very short and clock frequencies increase significantly in future nanometric technologies, error rates in logic parts will rise almost to the same levels as error rates in memories. Thus, it is mandatory to design logic that is tolerant to transient faults.

∗Supported by CAPES Brazilian Agency

In [6], a new technique is presented for implementing perturbation tolerant circuits that cope with soft errors while maintaining acceptable levels of reliability for commodity applications. The technique takes advantage of the temporal nature of transient faults and achieves transient fault tolerance by using time redundancy.

Some of the mitigation techniques proposed in the literature use structures and modules that are not implemented in standard cell libraries. Thus, a classic ASIC standard cell based design cannot be completed. Automatic layout generators may be used to create the required robust structures and complete the design flow.

An automatic layout generator builds each element (transistors and connections) according to a layout pattern that is intrinsically programmed within its algorithms. Furthermore, automatic generation is flexible enough to create layouts optimized for each situation in which they are inserted. In this work, the implementation of this soft error tolerance technique using the automatic layout generator Parrot Punch [7] is presented. With Parrot Punch, any kind of static CMOS circuit can be generated on the fly. Generated layouts have transistors and nets optimized according to the logic characteristics of the circuit. This tool is used to implement hardened solutions on complex structures such as processors.

This paper is organized as follows. Section 2 presents an overview of the layout generation tool. Section 3 describes the soft error tolerance technique and Section 4 presents its implementation using Parrot Punch. Finally, Section 5 presents a case study in which the cost of CWSP cells in microprocessors is estimated, and Section 6 draws some conclusions.

Proceedings of the 11th IEEE International On-Line Testing Symposium (IOLTS’05)

1530-1591/05 $20.00 © 2005 IEEE

Figure 1. Main characteristics of a layout generated by Parrot Punch

2. Layout Generation Strategy Overview

[Figure 2: flowchart. Inputs: Design Rules, User Parameters, Circuit Description, Placement. Steps: Parsers, Transistors Placement, Apply Gate Sizing (TICTAC::Sizing Integration), Connection Analysis, Contacts Organization, Routing (Routing File, Call Router Integration), Internal Routing, Layout. Decision: Constraints are Satisfied (Y/N).]

Figure 2. The Parrot Punch design flow

The main characteristics of a layout generated by Parrot Punch are presented in Figure 1. The figure shows the folding technique, the internal connections, the body tie insertion in the diffusion gaps, the use of the minimum allowed transistor spacing where possible, and the input/output contacts between PMOS and NMOS transistors.

The main capabilities of Parrot Punch are the optimization of the circuit logic onto a wide range of complex gates, the integration with a timing analysis and gate sizing tool, a new row-based folding algorithm and a new routing strategy. The Parrot Punch design flow is shown in Figure 2.

Parrot Punch takes a set of input files as the basis for layout generation. Four files are used:

• User Parameters: parameters such as the initial transistor widths, the power supply lines and the number of tracks inside the cell can be chosen by the designer at layout generation time;

• Design Rules: the technology rules supplied by the foundry are specified in this file, essentially the width, spacing and enclosure of the process layers;

• Circuit Description: the tool uses a SPICE-like description of the transistors and the connections between them;

• Placement: Parrot Punch can be used to generate macro blocks with thousands of transistors. In this case, a file with the relative position of each cell is necessary.

In the Parrot Punch design flow, transistors are placed using the Euler path algorithm in order to maintain the linear matrix layout style. Another important characteristic is the possibility of applying the folding technique through the integration with a transistor sizing tool.

The modules of the Parrot Punch tool are detailed in the following.

2.1. Transistors Placement

The transistor placement module searches for an Euler path in order to choose the position of each transistor in a row. This algorithm is responsible for ordering the transistors in such a way that PMOS and NMOS transistors with a common gate signal are easily connected.
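The ordering step can be illustrated with a small sketch. It assumes a single transistor network given as a list of (node, node, gate) tuples and uses Hierholzer's algorithm to walk an Euler path through the diffusion-node graph, so that consecutive transistors share a diffusion region. This is not Parrot Punch's implementation: the function name `euler_order` and the netlist format are hypothetical, and a real generator must also find an ordering that is consistent between the PMOS and NMOS networks.

```python
from collections import defaultdict


def euler_order(transistors):
    """Return gate names in an order that follows an Euler path.

    transistors: list of (node_a, node_b, gate) tuples for one network
    (hypothetical format). Returns None when no Euler path exists.
    """
    adj = defaultdict(list)              # node -> list of (neighbour, edge index)
    for idx, (a, b, _gate) in enumerate(transistors):
        adj[a].append((b, idx))
        adj[b].append((a, idx))

    odd = [n for n in adj if len(adj[n]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None                      # a diffusion gap would be required
    start = odd[0] if odd else next(iter(adj))

    used = [False] * len(transistors)
    stack = [(start, None)]              # (node, edge used to reach it)
    path = []
    while stack:
        node, edge = stack[-1]
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()              # drop edges that were already walked
        if adj[node]:
            nxt, idx = adj[node].pop()
            used[idx] = True
            stack.append((nxt, idx))
        else:
            stack.pop()
            if edge is not None:
                path.append(edge)
    if len(path) != len(transistors):
        return None                      # the network is disconnected
    return [transistors[i][2] for i in reversed(path)]
```

For the pull-down network of a two-input NAND, `euler_order([("out", "n1", "A"), ("n1", "gnd", "B")])` yields an ordering in which A and B share the internal node `n1`.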


2.2. The Gate Sizing Tool Integration

The accuracy of timing verification depends directly on the effectiveness of the circuit model used. Timing analysis associated with layout generation can improve the timing optimization of the circuit design. Besides, timing analysis can be used to improve layout optimization through gate sizing.

Gate sizing is a technique for optimizing each individual gate of the circuit, given certain characteristics. These characteristics refer to the widths of the transistors, the relations between gates and their implications for timing. A tool called TICTAC:Sizing [8] is able to analyze a circuit and perform gate sizing based on constraints imposed by an automatic layout generation tool. TICTAC:Sizing and Parrot Punch are integrated in such a way that the sized transistors given by the sizing tool can be easily applied during layout generation.

Once the transistors are sized, a list of gates and their transistor widths is generated. This list is then used to implement the gate sizing through the gate folding technique [9]. The folding technique consists of breaking a large transistor into smaller ones, connecting them in parallel and placing them contiguously with diffusion sharing. This is especially important in row-based layouts, because different transistor sizes in a row cause non-uniform cell heights, which may lead to a significant waste of area.
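The arithmetic behind folding can be sketched as follows. The example assumes that every leg of a folded transistor gets the same width and that the row imposes a single maximum leg width; the function name `fold_transistor` and the parameter `max_leg_width` are hypothetical and do not come from the folding algorithm of [9].

```python
import math


def fold_transistor(width, max_leg_width):
    """Split one transistor of `width` into equal parallel legs.

    Returns (number_of_legs, leg_width). Both widths use the same unit
    (e.g. micrometres); `max_leg_width` is a hypothetical row limit.
    """
    if width <= 0 or max_leg_width <= 0:
        raise ValueError("widths must be positive")
    legs = math.ceil(width / max_leg_width)   # fewest legs that fit the row
    return legs, width / legs
```

A 5 µm transistor in a row that allows 2 µm legs is split into three parallel legs whose widths add up to the original 5 µm, so the drive strength is preserved while the row height stays uniform.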

2.3. Row Internal Routing

Another module handles the internal routing. During this step, the contacts are positioned according to the circuit routing. The internal routing is divided into two parts: first, the polysilicon nets are connected, and then the source/drain transistor nets are implemented.

Polysilicon Routing
It consists of connecting transistors and the input/output contacts placed by the router that share the same signal.

Source/drain connections
These connections are implemented with the first and second metal layers. The first metal layer is always used for wires between the P and N diffusion strips. The second metal layer is used when the logic function demands connections over the transistors (i.e. in complex gates).

This tool is presented in detail in [7, 8, 9]. In comparison with another academic tool with the same purpose [7], results show a gain of around 30% in occupied area and around 20% in propagation time when the ISCAS85 benchmarks are used for evaluation. An important factor is the execution time needed to generate the layouts. The largest circuit generated with Parrot Punch is c7552, with around 15 thousand transistors, whose layout was generated in 17 hours. Layouts with 100 or fewer transistors can be generated in a few seconds.

3. Perturbation Tolerance with Time Redundancy

In [6], a technique is presented that takes advantage of the temporal nature of transient faults and achieves transient fault tolerance by using time redundancy. This technique should lead to a significant reduction of hardware cost compared to the classic TMR solution, because the main idea is to combine self-checking design with time redundancy.

The fact that soft errors affect the outputs of the circuit only for a short time can be exploited by using asynchronous sequential elements. For each correct input, such an element produces a determined state on its outputs, which corresponds to the fault-free operation of the circuit. For each erroneous input, the element preserves its previous output state. Consequently, if a transient pulse changes a fault-free input into an erroneous one, the output state produced by the correct input is preserved.

One way to build a Code Word State Preserving (CWSP) element is to replace each transistor of the gate by a pair of transistors connected in series and driven by duplicated inputs. In this gate, when the inputs of a pair of transistors are equal, the two transistors behave as a single transistor driven by one of the duplicated inputs.

When the inputs of one or more transistor pairs are not equal due to a transient error, the two transistors of each such pair behave as a single transistor in the off state. The output of the CWSP element then keeps the state it had before the transient error drove some inputs to non-equal values.
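The state-preserving behaviour described above can be captured in a few lines. The following is a purely behavioural sketch of a CWSP inverter, assuming ideal switches and an output node that holds its charge when neither stack conducts; the class name `CwspInverter` is hypothetical and the model ignores all electrical effects.

```python
class CwspInverter:
    """Behavioural model of a CWSP inverter with duplicated inputs (a, a_dup)."""

    def __init__(self, initial_output=0):
        self.output = initial_output

    def evaluate(self, a, a_dup):
        if a == a_dup:
            # Both transistors of each series pair agree, so the gate
            # behaves like an ordinary inverter.
            self.output = 1 - a
        # Otherwise the pull-up and pull-down stacks are both off and the
        # output node keeps its previous value.
        return self.output
```

Calling `evaluate(0, 0)` drives the output to 1, and a following `evaluate(1, 0)`, which models a transient on only one of the duplicated inputs, leaves it at 1.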

[Figure 3: transistor-level schematics of three gates with output S, driven by the duplicated inputs A, A* and B, B*.]

Figure 3. Inverter, NOR and NAND gates using the duplication input code

Figure 3 shows examples of the inverter, NOR and NAND gates using this principle. In the time redundancy approach, instead of duplicating the circuit we duplicate the output signal of the circuit in the time domain by observing this signal at two different instants. One of the inputs of the CWSP element comes directly from the combinational circuit output, while the other input is delayed.

[Figure 4: block diagram containing the Circuit, an Inverter, the Delay Block, the CWSP element and a Flip-Flop.]

Figure 4. Perturbation tolerant circuit based on time redundancy

Figure 4 shows the implementation of a perturbation tolerant combinational circuit based on time redundancy. The delay block must delay the signal at the input of the CWSP cell according to the duration of the transient fault to be tolerated. The time penalty in this case is Dcw + 2 × Dtr, where Dcw is the logic transition time and Dtr is the duration of the transient pulse.
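The filtering effect of the delay block can be checked with a discrete-time sketch. It assumes the signal is sampled at fixed time steps, that the CWSP element is an inverter with ideal state holding, and that the delay is a whole number of steps; the function name `cwsp_filter` and this discretisation are hypothetical simplifications, not the electrical simulation used in this work.

```python
def cwsp_filter(signal, delay, initial_output=None):
    """Simulate a CWSP inverter fed by `signal` and a copy delayed by `delay` steps.

    signal: list of 0/1 samples, one per time step (hypothetical discretisation).
    Returns the list of output samples.
    """
    out = []
    state = 1 - signal[0] if initial_output is None else initial_output
    for t, direct in enumerate(signal):
        delayed = signal[t - delay] if t >= delay else signal[0]
        if direct == delayed:
            state = 1 - direct       # inputs agree: normal inversion
        out.append(state)            # inputs differ: previous state is held
    return out
```

With a delay of 3 steps, a 2-step pulse on an otherwise constant input never reaches the output, whereas a 4-step pulse does. A genuine transition is passed on only once the delayed copy has caught up, which is the source of the timing penalty.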

3.1. Perturbation Tolerance to Combinational and Sequential Blocks

The perturbation tolerance technique presented in [6] and summarized in this section can clearly deal with transient faults in combinational blocks. However, a bit-flip may still occur if an α-particle hits an internal node of a Flip-Flop.

One way to provide perturbation tolerance to both combinational and sequential blocks is to use the TMR (Triple Modular Redundancy) technique in such a way that the fault is guaranteed to propagate to only one Flip-Flop.

[Figure 5: two arrangements of three Flip-Flops feeding a Voter: (a) clocked by φ1, φ2 and φ3; (b) clocked by a common φ, with the delays δ1 and δ2 on the input.]

Figure 5. Perturbation tolerance with TMR

[Figure 6: D-latch schematics with the signals D, CLK and Q; delay elements δ appear in the schematics.]

Figure 6. D-latch: (a) Classic, (b) Using CWSP logic

Figure 5 shows two implementations of the TMR technique targeting perturbation tolerance in combinational and sequential blocks. In the first implementation (a), the clock signal of each Flip-Flop (φ1, φ2 and φ3) is delayed in such a way that the input signal is captured at three different moments. Thus, a transient fault at the input is captured by only one Flip-Flop. The same idea is used in Figure 5(b), where the same clock signal is used in the three Flip-Flops, but the input signal is delayed by Delay Blocks (δ1 and δ2). Assuming δ2 = 2 × δ1, the time penalty of these TMR techniques over the time of a D-FlipFlop is 2 × δ + T_DelayVoter. The resulting area penalty is shown in Table 3.
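The voting idea of Figure 5(b) can be sketched in the same discrete-time style. The example assumes a sampled input signal, delays that are whole numbers of time steps with δ2 = 2 × δ1, and an ideal 2-of-3 voter; the function names `majority` and `tmr_sample` are hypothetical.

```python
def majority(a, b, c):
    """2-of-3 majority vote on single bits."""
    return (a & b) | (a & c) | (b & c)


def tmr_sample(signal, t, delta):
    """Vote on `signal` observed at t, t - delta and t - 2 * delta.

    signal: list of 0/1 samples per time step (hypothetical discretisation);
    delta plays the role of δ1, with δ2 = 2 * δ1.
    """
    samples = [signal[max(t - k * delta, 0)] for k in range(3)]
    return majority(*samples)
```

As long as the transient is no longer than `delta` steps, at most one of the three samples falls inside it, so the voted value stays correct.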

These techniques provide perturbation tolerance, but their consequences for the design process are an increased area, due to the triplication of every Flip-Flop of the circuit, and an increased number of interconnections, which may affect the clock signal.

A latch design is proposed in [10] that tolerates faults affecting the internal nodes of conventional latch structures. This latch can deal with faults hitting internal nodes, but it cannot handle faults coming from the combinational parts or faults directly affecting the output.

A technique targeting perturbation tolerance in both combinational and sequential logic is proposed in this work. It uses the timing redundancy technique presented in [6] to provide fault tolerance in circuits. To tolerate transient faults in combinational circuits as well as in the sequential elements (e.g. latches), our approach uses a modified latch in which the last inverter stage of the combinational circuit is replaced by a CWSP inverter, while a delay block is introduced in the feedback path of the latch.

Figure 6 shows in (a) a classic d-latch and in (b) the implementation of a d-latch using the CWSP logic. The technique uses a CWSP inverter and Delay Blocks to achieve fault tolerance through timing redundancy. The results of the comparison between the Classic and CWSP d-latches are presented in the following.

Transient fault simulation showed that these CWSP d-latches have 100% transient fault coverage. In the fault injection experiments, all internal nodes as well as the D and CLK inputs were considered, and transistor sizing was used to ensure complete robustness against transient faults ranging from 250 ps to 500 ps.

4. Layout Generation of CWSP Cells

4.1. Transient Fault Model

The device-level fault model presented in [11] is used in this work to estimate the effects of single events in the developed cells. Transient faults caused by α-particles in digital circuits have been modeled by a double exponential current pulse injected at the affected node.

$$I(t) = \frac{Q}{\alpha - \beta} \left( e^{-t/\alpha} - e^{-t/\beta} \right) \qquad (1)$$

In equation 1, Q is the injected charge, which may be positive or negative, α is the collection time constant of the junction and β is the time constant for initially establishing the ion track. α and β are constants that depend on several process-related factors.
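Equation 1 can be evaluated numerically to confirm that the pulse delivers the charge Q: integrating from 0 to infinity gives Q/(α − β) × (α − β) = Q. The sketch below does this with the trapezoidal rule. The values used in the usage note (100 fC, 200 ps, 50 ps) are illustrative assumptions only and are not taken from this work or from [11].

```python
import math


def transient_current(t, q, alpha, beta):
    """Double exponential current pulse of equation (1); times in seconds."""
    return q / (alpha - beta) * (math.exp(-t / alpha) - math.exp(-t / beta))


def collected_charge(q, alpha, beta, t_end, steps=20_000):
    """Integrate the pulse from 0 to t_end with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    for i in range(steps):
        t0, t1 = i * dt, (i + 1) * dt
        total += 0.5 * (transient_current(t0, q, alpha, beta)
                        + transient_current(t1, q, alpha, beta)) * dt
    return total
```

For example, `collected_charge(1e-13, 2e-10, 5e-11, 5e-9)` integrates an assumed 100 fC pulse over 5 ns and should return a value very close to the injected charge, while `transient_current(0.0, ...)` is zero because both exponentials start at 1.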

4.2. Layout Generation and Simulation

The layout generation of the CWSP elements was based on a list of transistors and connections. The Parrot Punch tool automatically generated the cells. Electrical simulations were used to extract the propagation delay of the typical and CWSP cells. The technology used was 0.18 µm, with an output capacitance of 3 fF. This capacitance represents the input capacitance of a D-FlipFlop as given by the 0.18 µm technology data sheet.

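Table 1. Area and Delay of Typical Cells and CWSP Cells

| Cell | Area Typ. (µm²) | Area CWSP (µm²) | Area Over. | Delay Typ. (ps) | Delay CWSP (ps) | Delay Over. |
|------|-----------------|-----------------|------------|-----------------|-----------------|-------------|
| INV  | 8.19            | 11.64           | 42%        | 83              | 160             | 92%         |
| NAND | 12.28           | 19.07           | 55%        | 98              | 172             | 75%         |
| NOR  | 12.28           | 19.07           | 55%        | 102             | 260             | 154%        |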

Table 1 presents the occupied area and propagation time of the CWSP cells of Figure 3 in comparison with the typical cells of a 0.18 µm standard cell library. The area overhead is between 42% and 55% and the propagation time overhead is between 75% and 154% when the same output capacitance is applied to both gates.
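The overhead columns of Table 1 follow from the typical and CWSP values as a relative increase. The short sketch below recomputes them; the helper name `overhead_percent` is hypothetical, and the data are copied from Table 1.

```python
def overhead_percent(typical, hardened):
    """Relative increase of `hardened` over `typical`, in percent."""
    return (hardened - typical) / typical * 100.0


# (typical, CWSP) pairs copied from Table 1.
area = {"INV": (8.19, 11.64), "NAND": (12.28, 19.07), "NOR": (12.28, 19.07)}
delay = {"INV": (83, 160), "NAND": (98, 172), "NOR": (102, 260)}

for cell in area:
    print(cell,
          f"area +{overhead_percent(*area[cell]):.1f}%",
          f"delay +{overhead_percent(*delay[cell]):.1f}%")
```

The percentages printed here match those in Table 1 once the fractional part is dropped, which suggests the table values were truncated rather than rounded.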

Table 2. Total Area and Delay of CWSP cells

| Cell | Trans. (ps) | Area (µm²) | Delay (ps) |
|------|-------------|------------|------------|
| INV  | 250         | 28.8       | 323        |
| INV  | 500         | 46.0       | 538        |
| NAND | 250         | 59.2       | 370        |
| NAND | 500         | 91.2       | 559        |
| NOR  | 250         | 59.2       | 352        |
| NOR  | 500         | 91.2       | 572        |

In this work, CWSP cells are developed to tolerate transient faults ranging from 250 ps to 500 ps. Thus, delay blocks were developed and inserted in the CWSP gates in order to generate hardened cells. Table 2 presents the total area and propagation time of the new CWSP cells, developed as shown in Figure 4.

Table 3. TMR and CWSP D-FlipFlops area overhead

| D-FlipFlop    | Area (µm²) | Overhead |
|---------------|------------|----------|
| Classic       | 57.6       | -        |
| CWSP (250ps)  | 181.7      | 215%     |
| CWSP (500ps)  | 249.6      | 333%     |
| TMR (250ps)   | 206.1      | 258%     |
| TMR (500ps)   | 206.1      | 258%     |

Table 3 compares a classic D-FlipFlop found in a 0.18 µm standard cell library with the transient robust latch proposed in Section 3 (Figure 6) and with the TMR FlipFlop. We assume that the delay blocks in the TMR D-FlipFlops can be shared between all FlipFlops in the same clock domain. Despite that, the results show that the robust CWSP FlipFlops have a smaller area overhead than the TMR technique for faults of 250 ps.

5. Implementation of CWSP cells on microprocessors

A case study was carried out in order to assess the penalties of inserting CWSP cells in a MIPS-like processor and an 8051 controller. Logic synthesis and technology mapping were done with Synopsys Design Compiler, and the whole circuit layout was generated with Cadence Silicon Ensemble. The CWSP robust latch layout was generated using the Parrot Punch tool and inserted in the system layout.

Table 4 shows the occupied area and frequency of the MIPS and 8051 architectures mapped to a 0.18 µm technology. The area overhead of TMR is the same for faults of 250 ps and 500 ps because we assume that the delay blocks are shared by every D-FlipFlop. We also assume that three clock lines are not a problem in the design of these microprocessors.

Table 4. CWSP and TMR Techniques on microprocessors

|                       | MIPS   | 8051  |
|-----------------------|--------|-------|
| No. of Comb. Elements | 11,968 | 5,408 |
| No. of Flip-flops     | 1,793  | 1,359 |

|              | MIPS Area (µm²) | MIPS Area Overhead | MIPS Frequency (MHz) | MIPS Frequency Penalty | 8051 Area (µm²) | 8051 Area Overhead | 8051 Frequency (MHz) | 8051 Frequency Penalty |
|--------------|-----------------|--------------------|----------------------|------------------------|-----------------|--------------------|----------------------|------------------------|
| Classic      | 480,317         | -                  | 77.7                 | -                      | 234,720         | -                  | 58.2                 | -                      |
| CWSP (250ps) | 746,480         | 55%                | 75.8                 | 2.4%                   | 436,240         | 85%                | 57.2                 | 1.8%                   |
| CWSP (500ps) | 890,172         | 85%                | 72.7                 | 6.7%                   | 550,560         | 134%               | 55.4                 | 5.0%                   |
| TMR (250ps)  | 808,200         | 68%                | 73.9                 | 5.1%                   | 491,400         | 109%               | 56.0                 | 3.8%                   |
| TMR (500ps)  | 808,200         | 68%                | 71.2                 | 9.0%                   | 491,400         | 109%               | 54.5                 | 6.8%                   |

The results in Table 4 show that transient faults in combinational and sequential parts can be handled with the CWSP robust latch, with smaller time penalties than TMR at latch level and, for faults of 250 ps, a smaller area as well. For a fault duration of 250 ps, the CWSP technique is always the best solution in terms of both area overhead and delay penalty. For designs hardened against faults of 500 ps, the TMR technique gives the best area overhead, but the CWSP technique always provides better timing results.

In addition, the TMR technique with three clock signals can be a problem in bigger designs. Additional steps such as buffer insertion or transistor sizing may then be necessary in the clock tree in order to guarantee that the technique works, due to inherent CTS (Clock Tree Synthesis) and routing problems.
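The comparison drawn from Table 4 can be reproduced mechanically. The sketch below copies the area and frequency figures from the table and, for each processor and fault duration, picks the technique with the smallest area and the one with the highest frequency; the function name `best_choices` and the data layout are hypothetical.

```python
# Area (µm²) and frequency (MHz) copied from Table 4.
table4 = {
    "MIPS": {250: {"CWSP": (746_480, 75.8), "TMR": (808_200, 73.9)},
             500: {"CWSP": (890_172, 72.7), "TMR": (808_200, 71.2)}},
    "8051": {250: {"CWSP": (436_240, 57.2), "TMR": (491_400, 56.0)},
             500: {"CWSP": (550_560, 55.4), "TMR": (491_400, 54.5)}},
}


def best_choices(results):
    """Return {(cpu, fault_ps): (smallest-area technique, fastest technique)}."""
    best = {}
    for cpu, by_fault in results.items():
        for fault_ps, techs in by_fault.items():
            smallest = min(techs, key=lambda name: techs[name][0])
            fastest = max(techs, key=lambda name: techs[name][1])
            best[(cpu, fault_ps)] = (smallest, fastest)
    return best
```

Running `best_choices(table4)` selects CWSP on both criteria at 250 ps, and TMR for area but CWSP for frequency at 500 ps, for both processors.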

6. Conclusion

This work presented the layout generation of time redundant cells to be used in the synthesis of integrated circuits. The importance of an automated process for generating this kind of cell is related to the need for hardened circuits in several applications. Besides, typical cell libraries do not offer any kind of fault tolerant cell.

Furthermore, a case study of sequential elements targeting perturbation tolerance in combinational and sequential blocks was presented. MIPS-like and 8051 microprocessors were implemented with robust CWSP latches in order to compute the area overhead and time penalties.

References

[1] D. Bessot, R. Velazco. “Design of SEU-Hardened CMOS Memory Cells: The HIT Cell”. Proceedings 1994 RADECS Conference. pp. 563-570.

[2] T. Calin, M. Nicolaidis, R. Velazco. “Upset Hardened Memory Design for Submicron CMOS Technology”. 33rd Int. Nuclear and Space Radiation Effects Conference. July 1996. Indian Wells, CA.

[3] M. Nicolaidis, T. Calin. “A Theory of Perturbation Tolerant Asynchronous FSM and its Application on the Design of Perturbation Tolerant Memories”. 1997 European Test Workshop, Cagliari, 28-30 May, 1997.

[4] L. Rockett. “An SEU Hardened CMOS Data Latch Design”. IEEE Transactions on Nuclear Science. vol. NS-35, n. 6, Dec. 1988. pp. 1682-1687.

[5] S. Whitaker, J. Canaris, K. Liu. “SEU Hardened Memory Cells for a CCSDS Reed Solomon Encoder”. IEEE Transactions on Nuclear Science. vol. NS-36, n. 6, Dec. 1991. pp. 1471-1477.

[6] L. Anghel, D. Alexandrescu, M. Nicolaidis. “Evaluation of Soft Error Tolerance Technique Based on Time and Space Redundancy”. SBCCI 2000.

[7] C. Lazzari, C. Domingues, J. Güntzel, R. Reis. “A New Macro-cell Generation Strategy for Three Metal Layer CMOS Technologies”. VLSI-SoC, 2003. pp. 193-197.

[8] C. Santos, C. Lazzari, G. Wilke, J. Güntzel, R. Reis. “A Transistor Sizing Method Applied to an Automatic Layout Generation Tool”. SBCCI, 2003. Sao Paulo. pp. 303-307.

[9] F. Bastian, C. Lazzari, J. Güntzel, R. Reis. “A New Transistor Folding Algorithm Applied to an Automatic Full-Custom Layout Generation Tool”. PATMOS 2004, 14th International Workshop on Power and Timing Modeling, Optimization and Simulation, Santorini, September 15-17, 2004. LNCS 3254, Springer. pp. 732-741.

[10] M. Omana, D. Rossi, C. Metra. “Novel Transient Fault Hardened Static Latch”. ITC03, International Test Conference, 2003. pp. 886-892.

[11] G. Messenger. “Collection of Charge on junction nodes from ion tracks”. IEEE Transactions on Nuclear Science, 1982. pp. 2024-2031.
