
A Leakage Current Replica Keeper for Dynamic Circuits

Yolin Lih, Nestoras Tzartzanis, and William W. Walker

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

By Eric Liskay

Tuesday, March 13, 12

Introduction

• Dynamic Logic

• Clock signal is used to evaluate combinational logic

• Two Phases

• Clock is low: Pre-charge / Setup Phase

• Clock is high: Evaluation Phase

• Output can decay if clock speed is too slow

• Good for high-speed OR and AND-OR gates and multi-port memories

[Figure: static and dynamic gate schematics driving load capacitance CL]


Introduction

• Keepers

• Needed with dynamic gates to maintain a high state during evaluation

• Without a keeper, a minimum clock frequency must be maintained at all times, or the clock must be stopped only in the pre-charge state

• This would make two-phase dynamic design impossible and complicate design-for-test methodologies such as scan and IDDQ

48 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

A Leakage Current Replica Keeper for Dynamic Circuits

Yolin Lih, Member, IEEE, Nestoras Tzartzanis, Member, IEEE, and William W. Walker, Member, IEEE

Abstract—We present a leakage current replica (LCR) keeper for dynamic domino gates that uses an analog current mirror to replicate the leakage current of a dynamic gate pull-down stack and thus tracks process, voltage, and temperature. The proposed keeper has an overhead of one field-effect transistor per gate plus a portion of a shared current mirror. Techniques for properly sizing LCR keepers are presented. Using these sizings, LCR keepers allow design of AND–OR circuits with 30% more legs than conventional keepers at the same noise margin in a 90-nm, 1.2-V CMOS logic process. Furthermore, 16–24-leg dynamic AO circuits are 25%–40% faster when using the replica keeper. We demonstrated the circuit operation on a 1024-word × 72-bit, 3W/4R embedded SRAM macro using a four-stage LCR-keeper domino structure for a read-out circuit.

Index Terms—Dynamic logic, keeper, leakage, precharged logic, process variation, register file, SRAM.

I. INTRODUCTION

DYNAMIC gates [1] are indispensable for constructing wide high-speed OR and AND–OR gates in CMOS. They are especially useful in multiport memories [2], [3] where single-ended read bit-lines are needed for compactness and (even) low-power consumption. Dynamic gates (Fig. 1) need keepers to maintain a high state during evaluation. Without a keeper, a minimum clock frequency must be maintained at all times, or the clock must be stopped only in the precharge state, making two-phase dynamic design impossible and complicating design-for-test methodologies such as scan and IDDQ. A conventional keeper is a small pFET pull-up transistor that must satisfy two corner design constraints, given here.

1) In the slow-pFET/fast-nFET (sPfN) process corner, the keeper must source enough current to overpower the nFET logic stack leakage current.

2) In the fast-pFET/slow-nFET (fPsN) process corner, the keeper must be weak enough so that a single nFET leg can pull the dynamic node quickly enough through the switching threshold of the succeeding static gate in order to meet the delay specifications.

Manuscript received April 14, 2006; revised August 3, 2006. Y. Lih is a technical consultant (e-mail: [email protected]). N. Tzartzanis and W. W. Walker are with the Fujitsu Laboratories of America, Sunnyvale, CA 94085-5401 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/JSSC.2006.885051

Fig. 1. Generalized conventional dynamic gate topology.

Fig. 2. Conditional keeper.

These constraints are contradictory in that increasing the margin for one reduces the margin for the other. As the logic width of a dynamic gate increases, eventually a point of zero margin for both constraints is reached, which defines the maximum possible dynamic gate logic width for a given technology node. As CMOS processes are scaled, threshold voltages ($V_t$) are scaled down. The exponential dependence of off-current on $V_t$ leads to an exponential increase in the keeper transistor width needed to satisfy constraint 1. However, constraint 2 can then only be met with a smaller maximum dynamic gate logic width. Unfortunately, memories only get larger as technology scales. Consequently, partitioning of read paths into more and smaller stages of local, semi-global, and global bit-lines using narrower AND–OR and OR structures is needed, and it becomes impossible to scale performance of memories at the same rate as the basic (e.g., inverter with fanout of 4) technology delay. The ultimate result is the demise of the dynamic gate [4].
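The scaling argument can be made concrete with a short calculation. The sketch below keeps only the exponential term of the subthreshold model, $I_{off} \propto \exp(-V_t/(nV_T))$, so lowering $V_t$ by $\Delta V_t$ multiplies the off-current by $\exp(\Delta V_t/(nV_T))$. It assumes an ideality factor n = 1.5 and $V_T$ = 26 mV (illustrative values, not tied to a specific process), and it assumes, to first order, that keeper current scales linearly with keeper width, so the width needed for constraint 1 grows by the same factor. The function name `off_current_growth` is hypothetical.

```python
import math


def off_current_growth(delta_vt_mv: float, n: float = 1.5, thermal_mv: float = 26.0) -> float:
    """Factor by which subthreshold off-current rises when V_t is lowered by delta_vt_mv.

    Uses only the exponential term I_off ~ exp(-V_t / (n * V_T)).
    n and thermal_mv are illustrative assumptions.
    """
    return math.exp(delta_vt_mv / (n * thermal_mv))


# Assumption: keeper current is proportional to keeper width, so the keeper
# must widen by the same factor to keep satisfying constraint 1.
for dvt in (20, 40, 60, 80):
    print(f"V_t lowered by {dvt} mV -> keeper width x{off_current_growth(dvt):.1f}")
```

Under these assumptions, an 80-mV reduction in $V_t$ already calls for a keeper roughly 7.8 times wider, which is what erodes the margin for constraint 2.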

Several solutions to the dynamic gate scaling problem have been proposed. A conditional keeper [5] circuit consists of two pFETs, one connected identically to a conventional keeper and the other controlled by a delayed clock signal and the dynamic node (Fig. 2). During evaluation, the extra pFET is conditionally enabled after a time set by the delayed clock signal if the dynamic node has not already been discharged. The conditional keeper requires a minimum of five extra FETs in each gate plus a portion of the shared delay circuit. In the adaptive keeper [6], [7] circuit (Fig. 3) and its variations [8], [9], additional keeper transistors are conditionally enabled based on a circuit that estimates the process corner [6]–[8] or temperature [9]. The overhead of the adaptive keeper circuit is four pFETs per gate plus a portion of the shared circuitry that generates the control signals. In a similar approach [10], all of the keeper paths are conditionally controlled through a process-dependent 3-bit code resulting

0018-9200/$20.00 © 2007 IEEE








Introduction

• Conventional Keeper

• A small pFET pull-up transistor that must satisfy two constraints:

1. In the slow-pFET/fast-nFET (sPfN) process corner, the keeper must source enough current to overpower the nFET logic stack leakage current.

2. In the fast-pFET/slow-nFET (fPsN) process corner, the keeper must be weak enough so that a single nFET leg can pull the dynamic node quickly enough through the switching threshold of the succeeding static gate in order to meet the delay specifications.

• The two constraints are contradictory, which limits the maximum dynamic gate logic width








Introduction

• The Problem

• Threshold voltages (Vt) are scaled down along with smaller CMOS processes

• To satisfy constraint 1, there needs to be an exponential increase in keeper transistor width because of the exponential dependence of off-current on Vt

• To then satisfy constraint 2, the maximum dynamic gate logic width must decrease

• Memories are getting larger as the technology scales

• Partitioning of read paths into more, smaller stages requires narrower AND-OR and OR structures

• Performance scaling of memories at the same rate as the basic (FO4 inverter) delay becomes impossible

• The result is “the demise of the dynamic gate”


Introduction

• Possible Solution - Conditional keeper circuit

• Requires at least five extra FETs in each gate plus a portion of the shared delay circuit

• Extra pFET controlled by a delayed clock signal and the dynamic node

• This pFET is conditionally enabled during evaluation after a time set by the delayed clock signal if the dynamic node has not already been discharged








Introduction

• Possible Solution - Adaptive keeper circuit

• Keeper transistors are conditionally enabled based on a circuit that estimates the process corner or temperature

• Adds four pFETs per gate plus a portion of the shared circuitry that generates the control signals

• Neither of these options tracks the two critical process corners (fast-P/slow-N and slow-P/fast-N)

LIH et al.: LEAKAGE CURRENT REPLICA KEEPER FOR DYNAMIC CIRCUITS 49

Fig. 3. Adaptive keeper.

Fig. 4. LCR keeper dynamic gate topology.

in 11 FETs overhead per gate. None of these solutions tracks the two critical process corners: fast-P/slow-N and slow-P/fast-N.

In this paper, we present the leakage current replica (LCR) keeper [11] (Fig. 4), which is a circuit that addresses the shortcomings of the conventional keeper and previously proposed enhancements. The LCR keeper uses a conventional analog current mirror that tracks any process corner as well as voltage and temperature. The only variation that the LCR keeper cannot track is random on-die variation, which still must be addressed using conventional margining. A single current mirror structure can be shared among more than one dynamic gate. The LCR keeper overhead is one pFET per dynamic gate plus a portion of the shared current mirror circuit. In Section II, we will show that the LCR keeper can either allow wider dynamic gates in a given technology or improve the speed of a fixed-width dynamic gate. To demonstrate its efficacy, we implemented the read path of a 1.2-V 90-nm CMOS 1024-word × 72-bit, 3-write, 4-read port SRAM using a four-stage LCR-keeper domino gate. We will describe the SRAM design and measurements in Section III. In the conclusion, we will summarize the design guidelines for LCR keepers.

II. LCR KEEPER

A. Circuit Design Principles

Fig. 1 shows a generalized AND–OR dynamic gate with n+1 pull-down legs. DN denotes the dynamic node, the gate is controlled by a clock/precharge signal, and A0,…,An and B0,…,Bn are the dynamic gate inputs. An example application is a single-ended read bit-line in a memory, where DN is the bit-line, A0,…,An are the read word lines, and B0,…,Bn represent the bits stored in the cells. The worst-case leakage current in the pull-down network occurs with A0,…,An = 0 and B0,…,Bn = 1, in which case it is essentially equivalent to an OR with the sources of the A-gated nFETs connected to ground.

A dynamic gate with an LCR keeper (Fig. 4) includes one extra series pFET and a replica current mirror that can be shared among all gates having the same topology. The current mirror tracks the leakage current and copies it into the dynamic gate through p1. The overhead per gate is p1 plus a portion of the shared current mirror. Assuming that $I_{leak}$ is the dynamic gate leakage current, we construct a current mirror to draw $\mathit{sf} \cdot I_{leak}$, where $\mathit{sf}$ is a safety factor that is discussed in detail later. NFET nrpl is used in the current mirror as a replica of the worst-case leakage current. Its gate is connected to ground, and its size is derived from the sizes of nFETs n0,…,nn. NFETs nrpl and n0,…,nn have the same (generally minimum) channel length. Assuming p1 and p3 have the same dimensions, the width of nrpl is set equal to the sum of the widths of n0,…,nn times the safety factor $\mathit{sf}$. To reproduce the narrow-width effect of n0,…,nn, nrpl is implemented using multiple fingers. Transistor p3 then mirrors the replica leakage current to p1. Devices p1 and p3 are designed using large channel length $L$ to eliminate channel-length modulation and to reduce $V_t$ variation. The size of the pFET connecting p1 to DN is not critical, but it should use minimum $L$ to reduce output loading, and it must be large enough so that its drain-to-source voltage is negligible when on. For the analysis that follows, it is assumed that, when on, this pFET is a virtual short and the potential at the drain of p1 is equal to the potential of the dynamic node DN.

The safety factor is set by ratioing the transistor geometries. Assuming that p1 and p3 have the same channel length and that all nFETs have the same channel length, $\mathit{sf}$ is given by

$$ \mathit{sf} = \frac{W_{rpl}}{\sum_{i=0}^{n} W_{n_i}} \cdot \frac{W_{p1}}{W_{p3}} \tag{1} $$

where $W_{rpl}$, $W_{p1}$, and $W_{p3}$ denote the widths of nrpl, p1, and p3, respectively, and $W_{n_i}$ denotes the width of $n_i$.

An LCR current mirror deviates from a conventional analog current mirror, in which p1 and p3 operate in saturation. In an LCR current mirror, DN must be pulled close to $V_{DD}$ to avoid compromising the noise immunity of the dynamic gate, so p1 operates in the triode region. Let $V_{KPR}$ denote the potential of node KPR and $V_{tp}$ denote the threshold voltage of p1. Since the mirror current exceeds the leakage ($\mathit{sf} > 1$), DN is pulled up first to $V_{KPR} + |V_{tp}|$, where p1 exits saturation, and continues to rise until p1's triode-region current matches the actual leakage (Fig. 5). Node voltage $V_{KPR}$ is critical. If $V_{KPR}$ is too low, the replica fails to track the dynamic gate leakage current, since the drain-to-source voltage of nrpl would deviate significantly from the drain-to-source voltage of n0,…,nn. If $V_{KPR}$ is too high, the sensitivity to pFET threshold voltage variation increases, since p1 operates in weak overdrive.
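The settling point described above can be estimated with a square-law sketch. Assuming matched p1/p3 with no $V_t$ mismatch, p1 could source $\mathit{sf}\cdot I_{leak}$ in saturation, but only $I_{leak}$ is drawn, so p1 settles in triode where $2V_{ov}V_{SD} - V_{SD}^2 = V_{ov}^2/\mathit{sf}$ with $V_{ov} = V_{DD} - V_{KPR} - |V_{tp}|$. Taking the smaller root gives $V_{SD} = V_{ov}(1 - \sqrt{1 - 1/\mathit{sf}})$. The supply, $V_{KPR}$, and $|V_{tp}|$ values below echo the 90-nm example used in this section and are illustrative only; the model ignores channel-length modulation, and the function name is hypothetical.

```python
import math


def dn_steady_state(vdd: float, v_kpr: float, vtp_abs: float, sf: float) -> float:
    """Settled dynamic-node voltage with an LCR keeper, square-law estimate.

    Assumes matched p1/p3 with no V_t mismatch: in saturation p1 would source
    sf * I_leak, but the pull-down stack only leaks I_leak, so p1 settles in
    triode where its current is 1/sf of its saturation current.
    """
    if sf < 1:
        raise ValueError("sf must be at least 1")
    v_ov = vdd - v_kpr - vtp_abs  # p1 gate overdrive |V_GS| - |V_tp|
    if v_ov <= 0:
        raise ValueError("p1 has no gate overdrive (V_KPR too high)")
    # (2*V_ov*V_SD - V_SD**2) / V_ov**2 = 1/sf  ->  smaller root of the quadratic
    v_sd = v_ov * (1.0 - math.sqrt(1.0 - 1.0 / sf))
    return vdd - v_sd


# Illustrative values: 1.2-V supply, V_KPR = 0.9 V, |V_tp| = 150 mV
for sf in (1, 2, 5, 15.9):
    print(f"sf = {sf}: V_DN ~ {dn_steady_state(1.2, 0.9, 0.15, sf):.3f} V")
```

With sf = 1 the node stops exactly at $V_{KPR} + |V_{tp}|$, the saturation edge, and larger safety factors push DN closer to $V_{DD}$, which is why the triode operation of p1 preserves noise immunity.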

Now we derive a required safety factor. In addition to forcing p1 into the triode region, $\mathit{sf}$ is used to margin against on-die random $V_t$ variation between the transistors in the dynamic gate and the current mirror. Both factors must be considered in choosing $\mathit{sf}$ for a specific CMOS process, leading to $\mathit{sf}$ formulated as the product of two safety factors

$$ \mathit{sf} = \mathit{sf}_{\Delta\mathrm{ileak}} \cdot \mathit{sf}_{\mathrm{mirror}} \tag{2} $$








The LCR Keeper

• Presented in this paper: Leakage Current Replica Keeper

• Overhead is only one pFET per dynamic gate plus a portion of the shared current mirror circuit

• Uses a conventional analog current mirror that can track any process corner, voltage, or temperature

• The current mirror can be shared between multiple dynamic gates

• Cannot track random on-die variations








Circuit Design Principles

• AND-OR Dynamic Gate

• N+1 pulldown legs

• DN: Dynamic Node

• Inputs: A0,...,An and B0,...,Bn








Circuit Design Principles

• Example Application

• Single-ended read bit-line in a memory

• DN is the bit-line

• A0,...,An are the read word lines

• B0,...,Bn represent the bits stored in the cells

• Worst case leakage current in the pull-down network occurs when A0-An = 0 and B0-Bn = 1.








The LCR Keeper

• The current mirror tracks the leakage current and copies it into the dynamic gate through p1

• Mirror is constructed to draw a current of sf • Ileak

• nrpl is used in the current mirror as the replica of the worst-case leakage current

• Its size is derived from the sizes of nFETs n0 to nn

• Assuming p1 and p3 have the same dimensions, the width of nrpl is set to be equal to the sum of the widths of n0,..,nn times the safety factor sf








The LCR Keeper

• DN must be pulled close to VDD in order to retain the noise immunity of the dynamic gate

• Node voltage VKPR is critical

• If VKPR is too low, the replica fails to track the dynamic gate leakage current since VDS of nrpl would deviate significantly from the VDS of n0,...,nn

• If VKPR is too high, the sensitivity to pFET threshold voltage variation increases since p1 operates in weak overdrive








Fig. 5. LCR keeper current ratio.

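where $\mathit{sf}_{\Delta\mathrm{ileak}}$ protects against nFET $V_t$ variation between nrpl and n0,…,nn, and $\mathit{sf}_{\mathrm{mirror}}$ pulls p1 into the triode region and protects against pFET $V_t$ variation between p1 and p3.

The required factor $\mathit{sf}_{\Delta\mathrm{ileak}}$ is equal to the ratio between the maximum and minimum nFET leakage current over the worst-case on-die variation

$$ \mathit{sf}_{\Delta\mathrm{ileak}} = \frac{I_{leak,max}}{I_{leak,min}} \tag{3} $$

The subthreshold drain-to-source current is given by [12]

$$ I_{DS} = I_0 \frac{W}{L} \exp\!\left(\frac{V_{GS} - V_{tn}}{n V_T}\right)\left[1 - \exp\!\left(-\frac{V_{DS}}{V_T}\right)\right] \tag{4} $$

where $W$ and $L$ are the transistor dimensions, $I_0$ is a process-dependent constant, $V_{GS}$ is the gate-to-source voltage, $V_{tn}$ is the nFET threshold voltage, $V_{DS}$ is the drain-to-source voltage, $n$ is a dimensionless ideality factor between 1 and 2, and $V_T$ is the thermal voltage, with $V_T = kT/q$, approximately 26 mV at 27 °C.

For equal device dimensions and the same drain-to-source voltage $V_{DS}$, combining (3) and (4) gives

$$ \mathit{sf}_{\Delta\mathrm{ileak}} = \exp\!\left(\frac{V_{tn,max} - V_{tn,min}}{n V_T}\right) \tag{5} $$

where $V_{tn,max}$ is the maximum and $V_{tn,min}$ the minimum nFET threshold voltage, considering random on-die variations.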
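A minimal numerical sketch of (4) and (5) follows. It evaluates the leakage of two otherwise identical off-state nFETs ($V_{GS}$ = 0) at the threshold extremes of the worked example (120 mV ± 40 mV) and checks that their ratio matches the closed form. The constant $I_0$ and the W/L values are arbitrary placeholders because they cancel in the ratio; the ideality factor n = 1.5 is an assumption, chosen because with $V_T$ = 26 mV it reproduces the 7.78 figure used in this example. The function names are hypothetical.

```python
import math

THERMAL_V = 0.026  # V_T = kT/q, about 26 mV near 27 C


def subthreshold_ids(i0, w, l, vgs, vtn, vds, n=1.5, vt=THERMAL_V):
    """Subthreshold drain current per (4); i0 is a placeholder process constant."""
    return i0 * (w / l) * math.exp((vgs - vtn) / (n * vt)) * (1.0 - math.exp(-vds / vt))


def sf_delta_ileak(vtn_max, vtn_min, n=1.5, vt=THERMAL_V):
    """Eq. (5): max/min leakage ratio for equal W, L, and V_DS."""
    return math.exp((vtn_max - vtn_min) / (n * vt))


# Off-state leakage (V_GS = 0) at the two threshold extremes, 120 mV +/- 40 mV.
# i0, W, and L are arbitrary placeholders; they cancel in the ratio.
i_max = subthreshold_ids(1e-6, 1.0, 0.1, 0.0, 0.080, 1.2)
i_min = subthreshold_ids(1e-6, 1.0, 0.1, 0.0, 0.160, 1.2)
ratio_direct = i_max / i_min
print(f"{ratio_direct:.2f} {sf_delta_ileak(0.160, 0.080):.2f}")
```

Because the $V_{DS}$ term is identical for both devices, it cancels as well, which is exactly the simplification that turns (3) and (4) into (5).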

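The required safety factor $\mathit{sf}_{\mathrm{mirror}}$ is equal to the ratio between the maximum p3 current and the minimum p1 current, taking into account random on-die variations and their region of operation

$$ \mathit{sf}_{\mathrm{mirror}} = \frac{I_{p3,max}}{I_{p1,min}} \tag{6} $$

Neglecting channel-length modulation, the maximum p3 drain-to-source current is given by

$$ I_{p3,max} = \frac{1}{2}\mu_p C_{ox}\frac{W}{L}\left(|V_{GS,p3}| - |V_{tp,min}|\right)^2 \tag{7} $$

where $W$ and $L$ are the transistor dimensions, $\mu_p$ is the pFET mobility, $C_{ox}$ is the gate-oxide capacitance per unit area, $V_{GS,p3}$ is the gate-to-source voltage of p3, and $V_{tp,min}$ is the minimum pFET threshold voltage.

With p1 operating in the triode region, its minimum drain-to-source current is given by

$$ I_{p1,min} = \mu_p C_{ox}\frac{W}{L}\left[\left(|V_{GS,p1}| - |V_{tp,max}|\right)|V_{DS,p1}| - \frac{|V_{DS,p1}|^2}{2}\right] \tag{8} $$

where $V_{GS,p1}$ is the gate-to-source voltage of p1, $V_{DS,p1}$ is the drain-to-source voltage of p1, and $V_{tp,max}$ is the maximum pFET threshold voltage.

For equal device dimensions, substituting (7) and (8) into (6) gives

$$ \mathit{sf}_{\mathrm{mirror}} = \frac{\left(|V_{GS,p3}| - |V_{tp,min}|\right)^2}{2\left(|V_{GS,p1}| - |V_{tp,max}|\right)|V_{DS,p1}| - |V_{DS,p1}|^2} \tag{9} $$

Both p1 and p3 have the same gate-to-source voltage

$$ |V_{GS,p1}| = |V_{GS,p3}| = V_{DD} - V_{KPR} \tag{10} $$

Also, the drain-to-source voltage of p1 is

$$ |V_{DS,p1}| = V_{DD} - V_{DN} \tag{11} $$

Plugging (10) and (11) into (9), we get a final expression for the required safety factor

$$ \mathit{sf}_{\mathrm{mirror}} = \frac{\left(V_{DD} - V_{KPR} - |V_{tp,min}|\right)^2}{2\left(V_{DD} - V_{KPR} - |V_{tp,max}|\right)\left(V_{DD} - V_{DN}\right) - \left(V_{DD} - V_{DN}\right)^2} \tag{12} $$

If, in addition to threshold voltage, we also consider supply voltage variation, (12) becomes

$$ \mathit{sf}_{\mathrm{mirror}} = \frac{\left(V_{DD,max} - V_{KPR} - |V_{tp,min}|\right)^2}{2\left(V_{DD,min} - V_{KPR} - |V_{tp,max}|\right)\left(V_{DD,min} - V_{DN}\right) - \left(V_{DD,min} - V_{DN}\right)^2} \tag{13} $$

where $V_{DD,max}$ and $V_{DD,min}$ are the maximum and minimum supply voltages.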
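The mirror safety factor can be evaluated the same way. This sketch implements (13) directly, taking $V_{DD,max}$ for the p3 (saturation) term and $V_{DD,min}$ for the p1 (triode) term; passing the same supply for both reduces it to (12). Thresholds are pFET magnitudes in volts, and the usage lines plug in the 90-nm example values used in this section ($V_{KPR}$ = 0.9 V, minimum $V_{DN}$ = 1.1 V, $|V_{tp}|$ = 150 mV ± 25 mV, supply 1.2 V or 1.17 to 1.23 V). The function name is hypothetical.

```python
def sf_mirror(vdd_max, vdd_min, v_kpr, v_dn, vtp_min, vtp_max):
    """Eq. (13); pass vdd_max == vdd_min to recover (12).

    Voltages in volts; vtp_min and vtp_max are pFET threshold magnitudes.
    The common factor mu_p * C_ox * W / L cancels and is omitted.
    """
    num = (vdd_max - v_kpr - vtp_min) ** 2                   # max p3 current (saturation)
    v_ds = vdd_min - v_dn                                     # |V_DS| of p1 at the minimum DN level
    den = 2 * (vdd_min - v_kpr - vtp_max) * v_ds - v_ds ** 2  # min p1 current (triode)
    if den <= 0:
        raise ValueError("p1 cannot source current at this operating point")
    return num / den


print(round(sf_mirror(1.2, 1.2, 0.9, 1.1, 0.125, 0.175), 2))    # (12), nominal 1.2-V supply
print(round(sf_mirror(1.23, 1.17, 0.9, 1.1, 0.125, 0.175), 2))  # (13), supply 1.17 to 1.23 V
```

The two calls give about 2.04 and 5.0; multiplied by $\mathit{sf}_{\Delta\mathrm{ileak}} \approx 7.78$, these yield total safety factors of roughly 15.9 and 38.9.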

Next, we provide an example of a safety factor calculation using some typical parameters of a 1.2-V 90-nm CMOS process. Assume that the nominal short-channel nFET threshold voltage is 120 mV with 40-mV random on-die variation and that the nominal long-channel pFET threshold voltage is 150 mV with 25-mV random on-die variation. We ratio p3 and nrpl for nominal $V_{KPR}$ = 0.9 V. Minimum voltage on node DN is set to 1.1 V. Plugging these numbers into (5) and (12), we obtain

$$ \mathit{sf}_{\Delta\mathrm{ileak}} = \exp\!\left(\frac{160\,\mathrm{mV} - 80\,\mathrm{mV}}{1.5 \times 26\,\mathrm{mV}}\right) \approx 7.78 \tag{14} $$




The Safety Factor

• Set by ratioing the transistor geometries

• Assumes p1 and p3 have the same channel length and that all nFETs have the same channel length

• Forces p1 into the triode region








The Safety Factor

• Margins against on-die random Vt variation between the transistors in the dynamic gate and current mirror

• sf∆ileak protects against nFET Vt variation between nrpl and n0,...,nn

• sfmirror pulls p1 into the triode region and protects against pFET Vt variation between p1 and p3








Deriving The Safety Factor

• Maximum over minimum nFET leakage current: (3)

• Subthreshold drain-to-source current (IDS): (4)

• For devices with equal dimensions and VDS: (5)

• Maximum p3 current over minimum p1 current: (6)

• For equal device dimensions: (9)

• Given (10) and (11): (12)

• If we consider supply voltage variation: (13)


The Safety Factor

• Example

• 1.2V 90nm CMOS process

• Assume Vtn = 120mV ± 40mV, Vtp = 150mV ± 25mV

• Ratio p3 and nrpl for nominal VKPR = 0.9V

• Minimum voltage on node DN is set to 1.1V

• sf = 7.78 × 2.04 = 15.87

• With a 5% supply voltage variation (VDD=1.17-1.23V)

• sfmirror = 5, thus sf = 38.9



Circuit Simulation

• LCR keeper structure evaluated against the conventional keeper

• Same 1.2V 90nm CMOS process at 110°C

• Target application is a multiport memory

• Wide AND-OR structure with 1µm pull-down nFETs

• Both the conventional and the proposed keeper structures were sized to sustain a maximum voltage drop of 10% of VDD on the dynamic node DN under the pull-down current conditions for the worst-case with slow-P and fast-N transistors


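$$ \mathit{sf}_{\mathrm{mirror}} = \frac{(1.2\,\mathrm{V} - 0.9\,\mathrm{V} - 0.125\,\mathrm{V})^2}{2(1.2\,\mathrm{V} - 0.9\,\mathrm{V} - 0.175\,\mathrm{V})(1.2\,\mathrm{V} - 1.1\,\mathrm{V}) - (1.2\,\mathrm{V} - 1.1\,\mathrm{V})^2} \approx 2.04 \tag{15} $$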

Therefore, from (2), (14), and (15), we calculate that a safety factor of 15.9 should be appropriate for this example. If, in addition to threshold voltage variation, we also consider a 5% supply voltage variation (i.e., $V_{DD,min}$ = 1.17 V and $V_{DD,max}$ = 1.23 V), (13) gives $\mathit{sf}_{\mathrm{mirror}} \approx 5$, and so $\mathit{sf} \approx 38.9$. Our derivation of the required safety factor was based on ideal FET models and is meant to show semi-quantitatively the need for a large $\mathit{sf}$ value. It is also possible to derive the sensitivity of the safety factor to the design parameters appearing in (9) and (12) by analyzing those expressions. The actual required safety factor should be confirmed by simulating over all process corners considering all sources of on-die process variation or, even more effectively, by performing Monte Carlo analysis. After the required $\mathit{sf}$ has been determined, it is implemented by sizing the LCR keeper and current mirror transistors according to (1). Since the current mirror is shared between many dynamic gates, the safety factor in each gate is kept constant by adjusting the width of its p1 transistor. Using (1), the width of p1 for each dynamic gate is given by

$$ W_{p1} = \mathit{sf} \cdot W_{p3} \cdot \frac{\sum_{i=0}^{n} W_{n_i}}{W_{rpl}} \tag{16} $$
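Because the safety factor of (1) and the per-gate p1 width of (16) are simple width ratios, per-gate sizing against a shared mirror is easy to script. In this sketch the mirror widths ($W_{rpl}$ = 64 µm of nrpl fingers, $W_{p3}$ = 2 µm), the gate list, and the target sf = 8 (chosen inside the 6 to 10 range reported for the SRAM) are all hypothetical values for illustration, as are the function names. Each computed width is plugged back into (1) as a round-trip check.

```python
def safety_factor(w_rpl, w_p1, w_p3, nfet_widths):
    """Eq. (1): sf = (W_rpl / sum of pull-down nFET widths) * (W_p1 / W_p3)."""
    return (w_rpl / sum(nfet_widths)) * (w_p1 / w_p3)


def p1_width(sf, w_rpl, w_p3, nfet_widths):
    """Eq. (16): p1 width giving the target sf for one gate on a shared mirror."""
    return sf * w_p3 * sum(nfet_widths) / w_rpl


# Hypothetical shared mirror (widths in um) and hypothetical dynamic gates
W_RPL, W_P3, TARGET_SF = 64.0, 2.0, 8.0
gates = {
    "16-leg, 1-um nFETs": [1.0] * 16,
    "8-leg, 1-um nFETs": [1.0] * 8,
    "5-leg, 2-um nFETs": [2.0] * 5,
}
for name, widths in gates.items():
    w_p1 = p1_width(TARGET_SF, W_RPL, W_P3, widths)
    # Round-trip check: plugging the computed width back into (1) recovers sf
    print(f"{name}: W_p1 = {w_p1:.2f} um, sf = {safety_factor(W_RPL, w_p1, W_P3, widths):.2f}")
```

Only p1 changes from gate to gate; nrpl and p3 belong to the shared mirror, which is what keeps the per-gate overhead at a single transistor.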

Other effects to be considered, especially for sub-90-nm processes, are drain-induced barrier lowering (DIBL) [13] and gate tunneling leakage. Both gate leakage current and excess drain–source leakage current due to DIBL of the pull-down stack are also tracked by the LCR keeper, but tracking is not ideal because the drain-to-source voltage of n0,…,nn is approximately equal to $V_{DD}$, whereas the drain-to-source voltage of nrpl is equal to $V_{KPR}$. We also need to consider pFET gate leakage current. Referring to Fig. 4, the gate leakage of p1 is negligible compared to the subthreshold leakage of nrpl due to p1's weak gate bias and small size. However, if a large number of dynamic gates are driven by the same current replica, then the sum of the gate currents through each gate's p1 is sunk through nrpl, driving up the node voltage of KPR. This effect imposes a practical limit on the number of gates that can be driven by a replica current mirror. The pFET gate leakage problem is avoided simply by scaling up the replica current mirror as gates are added. Scaling the replica has the added benefit of providing better averaging of the threshold voltages.

B. Circuit Simulation Results

To evaluate the LCR keeper structure, we compared it against a conventional keeper, using dynamic gate delay at equal noise margin as the metric, for the same 1.2-V 90-nm CMOS process at 110 °C. The target application is a multiport memory, and we assumed a wide AND–OR structure with 1-μm pull-down nFETs biased under worst-case conditions for leakage current (Fig. 6). In our four-read-port memory design, we assumed that 25% of the word lines could be coupled, with their voltage being raised to 15% of $V_{DD}$. The worst-case noise assumptions used for this experiment exceed the worst-case scenario for the target memory, since at most two read word lines can be victims of coupling. Both the conventional and the proposed keeper structures were sized to sustain a maximum voltage drop of 10% of $V_{DD}$ on the dynamic node DN under the pull-down current conditions for the worst case with slow-P and fast-N transistors. The resulting minimum dynamic-node voltage remains well above the acceptable input high voltage of the inverter: the magnitude of the inverter voltage-transfer-characteristic gain reaches 1 only when its input voltage drops by 39% (typical-P/typical-N), 30% (fast-P/slow-N), or 47% (slow-P/fast-N) of $V_{DD}$. A safety factor of 48 was required in the LCR keeper to maintain this noise margin. The LCR keeper device was 1.5 times larger than the conventional keeper. No other load was added at the output of the inverter. We swept the number of legs from 4 to 32 in increments of four. In order to satisfy the noise margin, the strength of the keeper increased linearly with the number of legs for both the LCR and the conventional keeper. The dynamic node also included wiring capacitance based on a typical memory cell size. We simulated delay with only one pull-down stack discharging the dynamic node. For comparison, we also simulated delay with the keeper removed. For the LCR keeper, the reference voltage $V_{KPR}$ varied from 0.544 V (slow-P/fast-N) to 0.829 V (typical-P/typical-N), to 0.998 V (fast-P/slow-N).

Fig. 6. Experimental setup for worst case noise in pull-down nFET network.

Fig. 7 shows the simulation results. For the conventional keeper and the no-keeper case, the fast-P/slow-N corner results in the maximum delay. As legs are added, the conventional keeper gate delay increases rapidly until it fails to switch with more than 24 legs. For the LCR keeper, we include the gate delays for both fast-P/slow-N and slow-P/fast-N cases. For 16 or fewer legs, the fast-P/slow-N case is slowest. For more than 16 legs, the slow-P/fast-N case emerges as the worst case because the large keeper current required to compensate for leakage in the slow-P/fast-N corner cancels out its speed advantage beyond 16 legs. For all cases, the LCR keeper results in smaller delay than the conventional keeper. Moreover, the conventional keeper is practically unusable for more than 16 legs since the delay increases super-linearly with the number of legs, whereas the LCR keeper can be used for up to 28 legs. The delay for the fast-P/slow-N corner of the LCR keeper increases linearly as a function of the number of legs, following a behavior similar to the no-keeper case.

Fig. 8 shows the current ratio between the current mirror and the LCR keeper for all process corners, normalized to the typical corner. The current ratio remains almost constant for the process corners of interest (e.g., tPtN, fPsN, and sPfN) even


Circuit Simulation

• Simulation results show that the conventional keeper has more delay than the LCR keeper and fails to switch with more than 24 legs

• For the LCR keeper with 16 or fewer legs, the fast-P/slow-N corner is slowest

• For more than 16 legs, the slow-P/fast-N corner becomes slowest

• The larger keeper current required to compensate for leakage in that corner cancels out its speed advantage

• The current ratio between the current mirror and the LCR keeper remains almost constant across process corners


Fig. 7. Delay versus number of legs.

Fig. 8. Normalized current ratio for all process corners.

though p1 operates in the linear region whereas p3 operates in saturation.

III. SRAM MACRO USING LCR KEEPERS

A 1024-word × 72-bit 3W/4R SRAM (Fig. 9) was designed and fabricated in the same 1.2-V 90-nm CMOS process using LCR keepers in its single-ended dynamic read path. The SRAM macro was integrated along with a built-in self-test (BIST) circuit and a phase-locked loop (PLL) to facilitate at-speed tests. Due to dummy metal above the SRAM, only the IO cells and bond pads can be seen in the micrograph, so the photo is overlaid with a block diagram of the SRAM, PLL, and BIST circuits from the layout database. The SRAM size is 1.34 mm × 1.31 mm. The SRAM array is split into 16 blocks of 32 words × 144 bits, and the read and write word decoders are placed in the middle. Address and data inputs are at the top of the macro, and data outputs are at the bottom. Fig. 10 shows the SRAM storage cell, which includes single-ended reads and differential static writes. Read ports are evenly placed on both sides of the cell to balance the loads on its two internal storage nodes. Differential write was chosen to simplify bit masking and column decoding. A bit is masked by clamping both write bit lines to ground, in which case the cell data remains unchanged when the write word line is enabled. Both read and write decoders are implemented using static CMOS.
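As a quick sanity check on the organization above, the logical capacity (1024 words × 72 bits) should equal the physical array capacity (16 blocks of 32 words × 144 bits). The sketch below confirms this and computes the implied ratio of two 72-bit words per 144-bit physical row; that ratio is an inference from the quoted numbers, not something the text states.

```python
# Sanity check of the SRAM organization quoted in the text.
WORDS, WORD_BITS = 1024, 72                      # logical: 1024 words x 72 bits
BLOCKS, ROWS_PER_BLOCK, ROW_BITS = 16, 32, 144   # physical: 16 blocks of 32 x 144

logical_bits = WORDS * WORD_BITS
physical_bits = BLOCKS * ROWS_PER_BLOCK * ROW_BITS
assert logical_bits == physical_bits, "organization does not add up"

# Inferred (not stated in the text): each 144-bit physical row holds two 72-bit words
words_per_row = ROW_BITS // WORD_BITS
print(logical_bits, words_per_row)  # prints: 73728 2
```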

Previous SRAMs in this technology were designed using high-$V_t$ nFETs in the dynamic gates, but application of LCR keepers allowed us to switch to low-$V_t$ nFETs to improve the speed.

Fig. 9. Micrograph of the 1024 × 72 3W/4R SRAM macro with block diagram overlay.

Fig. 10. SRAM storage cell.

Fig. 11 shows the four-stage domino read path from read word line to latched output, which is entirely conventional except for the LCR keepers. Each of the 32-word × 144-bit blocks is divided into two subblocks with 16 words each. Local bit lines connect 16 cells through an AND–OR dynamic gate. The second stage is an eight-leg AND–OR dynamic gate used for column select, with local bit line inputs coming from opposing subblocks. The third stage is an OR dynamic gate to select one of four top blocks. The final stage is a five-input dynamic OR gate with one of the inputs fed from the top global bit line and the other four fed from the bottom blocks. Since all dynamic gates were scaled from a common topology, only one current mirror was used for the entire SRAM, and it was placed under the wiring channels in the center (Fig. 9), giving a net overhead of one FET per dynamic gate. Due to the differing number of legs and different nFET sizes used in the



































SRAM Macro Using LCR

• 1024-word 72-bit 3W/4R SRAM

• Designed and fabricated in the same process using LCR keepers in its single-ended dynamic read path

• 32-word x 144-bit blocks

• Only one current mirror needed for the entire SRAM













SRAM Macro Using LCR

• SRAM storage cell with single-ended reads and differential static writes

• Read and Write decoders implemented using static CMOS

• Previous SRAMs in this technology designed using high-Vt nFETs in the dynamic gates

• LCR allows use of low-Vt nFETs to increase speed













SRAM Macro Using LCR

Fig. 11. SRAM single-ended four-stage domino read path.

TABLE I
SIMULATED SRAM ACCESS TIME, CONVENTIONAL VERSUS LCR (TYPICAL PROCESS, 1.2 V, 125 °C)

pull-down stages, ratioing (between p1 and p3 in Fig. 4, by subtracting fingers from nrpl) was used to divide the replica current, resulting in safety factors between 6 and 10, which is very aggressive based on the analytical model presented in the previous section. We also used 2× minimum channel length for p1 and p3 to eliminate channel-length modulation and reduce $V_t$ variation.

For simulation, a design with a conventional dynamic read path was created. The conventional design was not fabricated. In the conventional design, the KPR-gated transistors in Fig. 11 were eliminated, all dynamic gate FETs were converted to high-$V_t$, and the keeper FETs were sized appropriately. Table I compares simulated delays. The delay through the static logic is identical for both circuits because they use identical static logic, but the delay through the dynamic read path improves by 19%, and the overall access time improves by 7.6% for the LCR SRAM.

Parts were fabricated in five process corners, and high- and low-frequency Shmoo plots were measured at varying temperatures. The target specification for this design is a 330-MHz clock frequency at V , C. We varied the supply voltage from 0.9 to 1.3 V in 0.2-V increments and the frequency from 160 to 800 MHz in 10-MHz increments. The SRAM passed for a frequency range from 190 MHz at 0.9 V to 760 MHz at 1.3 V (typical part). Fig. 12 shows superimposed Shmoo plots for typical and skewed process corner parts. The measured maximum clock frequency difference between fast-P/slow-N and slow-P/fast-N process corners is 50 MHz with 32-MHz standard deviation.

Fig. 12. Shmoo plot for high-frequency and different process corners (25 °C).
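A Shmoo plot is a pass/fail grid over supply voltage and clock frequency. The sketch below shows how such a grid is assembled from a sweep. The pass/fail function is a hypothetical stand-in for a real tester run: a straight line through the two typical-part points quoted above (190 MHz at 0.9 V, 760 MHz at 1.3 V). Real silicon is not linear, and the names `toy_passes` and `shmoo` are illustrative only.

```python
def toy_passes(vdd_mv: int, f_mhz: int) -> bool:
    """Hypothetical pass/fail: f <= fmax(V) on a straight line through
    (900 mV, 190 MHz) and (1300 mV, 760 MHz). Integer math avoids rounding."""
    return (f_mhz - 190) * (1300 - 900) <= (760 - 190) * (vdd_mv - 900)


def shmoo(voltages_mv, freqs_mhz, passes):
    """One text row per supply voltage: '*' = pass, '.' = fail."""
    return [
        "".join("*" if passes(v, f) else "." for f in freqs_mhz)
        for v in voltages_mv
    ]


freqs = list(range(160, 801, 10))   # 160-800 MHz in 10-MHz steps, as in the text
volts = [900, 1100, 1300]           # three supply points in the 0.9-1.3 V range
for v, row in zip(volts, shmoo(volts, freqs, toy_passes)):
    print(f"{v / 1000:.1f} V  {row}")
```

On a bench, `passes` would wrap the actual BIST or tester call for each (V, f) point. Superimposing grids from several parts or temperatures yields plots like Figs. 12 and 13.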

Fig. 13 shows the effect of temperature on the SRAM. Two Shmoo plots are superimposed for the same typical part at room and high temperature (125 °C). For supply voltages higher than 1.05 V, the SRAM operating frequency is reduced at high temperature compared to room temperature, as would be expected. However, for supply voltages less than 1.05 V, the SRAM operates better at high temperature than at room temperature. We confirmed this behavior with our circuit simulation model. As the supply voltage decreases, global bit lines are not completely charged to VDD because of their large RC delay. Precharge pFETs were sized for operation at a 1.08-V minimum supply voltage. Due to the exponential dependence of subthreshold current on temperature, at 125 °C the replica keeper sources more current, which lowers the potential at node VKPR (Fig. 4) and assists in charging the global bit lines.



SRAM Macro Using LCR

• First stage: local bit lines connect 16 cells through an AND-OR dynamic gate

• Second stage: an eight-leg AND-OR dynamic gate used for column select, with the local bit lines as inputs

• Third stage: an OR dynamic gate to select from one of four blocks

• Fourth stage: a five-input dynamic OR gate with one input fed from the top global bit line and the remaining inputs from the bottom blocks
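The four-stage read path can be sketched as a behavioral model of chained domino AND-OR stages. This is a logic-level illustration under stated assumptions, not the macro's netlist. The following choices are hypothetical:

* One block has 8 columns of 16 cells.
* A stored `True` means the cell's read stack conducts.
* The unused block and final-OR inputs are tied idle.

Leakage and keepers are not modeled.

```python
from typing import List, Sequence


def domino_and_or(legs: Sequence[Sequence[bool]], clk_high: bool) -> bool:
    """Behavioral model of one domino AND-OR stage (dynamic node + output inverter).

    Precharge (clock low): the dynamic node is precharged high, so the stage
    output is low. Evaluate (clock high): the node discharges, and the output
    rises, if all series inputs of any leg are high.
    """
    if not clk_high:
        return False
    return any(all(leg) for leg in legs)


def read_path(cells: List[List[bool]], row: int, col: int, clk_high: bool = True) -> bool:
    """Hypothetical four-stage read of one bit; cells[c][r] is row r of column c."""
    rwl = [r == row for r in range(16)]   # one-hot read word line
    csel = [c == col for c in range(8)]   # one-hot column select
    # Stage 1: a 16-leg local bit line per column (leg = word line AND cell).
    lbl = [domino_and_or([(rwl[r], cells[c][r]) for r in range(16)], clk_high)
           for c in range(8)]
    # Stage 2: eight-leg AND-OR column select.
    col_out = domino_and_or([(csel[c], lbl[c]) for c in range(8)], clk_high)
    # Stage 3: OR over four blocks; the other three blocks are idle in this toy.
    blk_out = domino_and_or([(col_out,), (False,), (False,), (False,)], clk_high)
    # Stage 4: five-input OR; the other four inputs are idle in this toy.
    return domino_and_or([(blk_out,)] + [(False,)] * 4, clk_high)


cells = [[False] * 16 for _ in range(8)]
cells[3][5] = True
print(read_path(cells, row=5, col=3))  # True: the selected cell holds 1
```

Each stage only pulls its output high during evaluation. This monotonic behavior is what allows the stages to be cascaded in a single clock phase.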


SRAM Macro Using LCR

• Due to the differing number of legs and different nFET sizes used in the pull-down stages, ratioing was used to divide the replica current, resulting in safety factors between 6 and 10 (very aggressive)

• Used 2x minimum channel length for p1 and p3 to eliminate channel-length modulation and reduce Vt variation
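The ratioing idea can be illustrated with a simple ideal-mirror model. One shared replica current is scaled per gate by removing output fingers until the keeper current reaches a chosen multiple of that gate's leakage. This rests on several assumptions, not the paper's exact sizing procedure:

* Safety factor is taken as keeper current over gate leakage.
* Leakage per unit width matches between replica and gate.
* The widths and finger counts are hypothetical.

```python
def safety_factor(w_replica: float, w_gate: float, fingers_in: int, fingers_out: int) -> float:
    """Keeper current over gate leakage for an ideal current mirror.

    Hypothetical model: leakage per unit width is the same for the replica
    pull-down and the gate pull-down, so
        S = (W_replica / W_gate) * (fingers_out / fingers_in).
    Channel-length modulation is ignored (the text uses 2x-length p1 and p3 to suppress it).
    """
    return (w_replica / w_gate) * (fingers_out / fingers_in)


def fingers_for_target(w_replica, w_gate, fingers_in, max_fingers, target):
    """Fewest output fingers (subtracting from max_fingers) that still give S >= target."""
    for n in range(1, max_fingers + 1):
        if safety_factor(w_replica, w_gate, fingers_in, n) >= target:
            return n
    return None  # target unreachable even with every finger kept


# Hypothetical widths in um: one shared 64-um replica, an 8-finger mirror input,
# and two gates with different total pull-down widths.
for name, w_gate in (("wide gate", 8.0), ("narrow gate", 4.0)):
    n = fingers_for_target(64.0, w_gate, fingers_in=8, max_fingers=8, target=6)
    print(name, n, safety_factor(64.0, w_gate, 8, n))
```

The gate with half the pull-down width needs half the output fingers for the same safety factor. This is why a single mirror can serve stages with different leg counts and nFET sizes.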


SRAM Macro Results

• LCR keeper design compared against a simulated conventional design without KPR-gated transistors and with high-Vt dynamic FETs

• Delay from CLK to RWL is identical since both circuits use identical static logic

• Delay from RWL to RDO improved by 19%

• Overall access time improved by 7.6%
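The two percentages together imply how much of the access time the dynamic read path takes. The calculation assumes:

* Access time is the CLK-to-RWL delay plus the RWL-to-RDO delay.
* Both gains are reductions relative to the conventional design.

If only the second segment speeds up, then total gain = share × stage gain. This is a back-of-the-envelope sketch, not a value read from Table I.

```python
def improved_share(stage_gain: float, total_gain: float) -> float:
    """Share of the conventional access time taken by the segment that sped up.

    Assumes access time = unchanged segment + improved segment, and both gains
    are fractional reductions relative to the conventional design, so
        total_gain = share * stage_gain.
    """
    return total_gain / stage_gain


share = improved_share(stage_gain=0.19, total_gain=0.076)
print(f"RWL-to-RDO share of conventional access time: {share:.0%}")  # 40%
```

Under these assumptions, roughly 40% of the conventional access time is in the RWL-to-RDO dynamic path. The remaining 60% is static decode logic, which the LCR keeper cannot speed up.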














SRAM Macro Results

• The measured maximum clock frequency difference between fast-P/slow-N and slow-P/fast-N process corners is 50 MHz with 32-MHz standard deviation

• The plot of the effect of temperature on the SRAM shows data for the same part at room temperature (25°C) and at 125°C

• For VDD > 1.05 V, the SRAM operating frequency is reduced at high temperature compared to room temperature, as expected

• For VDD < 1.05 V, the SRAM operates better at high temperature than at room temperature














Fig. 13. Shmoo plot for varying temperature (typical process).

Fig. 14. Shmoo plot for low-frequency and varying temperature (typical process).

Therefore, high temperature increases the voltage margin of the LCR keeper compared to room temperature.

We also tested parts from all corners at low frequencies (1–5 MHz) and various temperature settings (25 °C, 62 °C, and 110 °C). Parts from all process corners except slow exhibited failures at room temperature. However, all parts pass the tests at 62 °C and 110 °C (Fig. 14). Using the diagnosability features of our BIST circuit, failures were traced to dc leakage in local or global bit lines for different parts. These failures are attributed to threshold-voltage variations between the replica and the dynamic gate FETs and the excessively small safety factor we used. Since subthreshold current increases exponentially with temperature, in the high-temperature tests VKPR is lowered, which reduces the effect of threshold-voltage variation, allowing the parts to pass the Shmoo test.
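A complementary way to see why Vt mismatch hurts less when hot uses the textbook subthreshold model I ∝ exp((Vgs − Vt)/(n·kT/q)). This is an assumption for illustration, not the paper's analysis.

Under that model, a gate FET whose Vt sits ΔVt below the replica leaks exp(ΔVt/(n·kT/q)) times more. The thermal voltage kT/q grows with temperature, so the same offset produces a smaller leakage ratio. The 30-mV offset and slope factor n = 1.5 are hypothetical.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, V/K


def mismatch_leakage_ratio(delta_vt: float, temp_c: float, n: float = 1.5) -> float:
    """Gate-to-replica subthreshold leakage ratio caused by a Vt offset.

    Textbook model I ~ exp((Vgs - Vt) / (n * kT/q)): a gate FET whose Vt is
    delta_vt lower than the replica leaks exp(delta_vt / (n * kT/q)) times more.
    n (subthreshold slope factor) is an assumed value, not from the paper.
    """
    thermal_voltage = K_OVER_Q * (temp_c + 273.15)
    return math.exp(delta_vt / (n * thermal_voltage))


for t in (25, 62, 110):  # the three test temperatures in the text
    print(f"{t:>3} C: {mismatch_leakage_ratio(0.030, t):.2f}x")
```

The ratio shrinks as temperature rises. This is consistent in direction with the measured result that parts failing at 25 °C pass at 62 °C and 110 °C.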

IV. CONCLUSION

In this paper, we proposed an LCR keeper to improve the scaling of dynamic gates. The LCR keeper requires an overhead of one FET per dynamic gate plus a portion of a shared replica circuit. We showed that, for equal noise margins, LCR keepers result in either more legs or faster gates with the same number of legs compared to conventional feedback keepers. A fairly large safety factor is needed to account for random on-die variation, especially threshold-voltage variation. As hardware measurements indicate, the aggressively small safety factor (6–10) that was used in the SRAM macro was not sufficient. However, according to circuit simulations, benefits are still substantial with a safety factor as high as 48. As CMOS processes scale, threshold-voltage variation is expected to increase. Assuming a 50% increase in random on-die threshold-voltage variation in the process example discussed before, and disregarding supply variation, the analytical model indicates that a safety factor of 61 is required. Increasing process variation affects both the LCR and the conventional keeper approaches; however, with the LCR keeper it could be possible to use dynamic gates with more legs than with a conventional keeper. Therefore, the use of dynamic logic can be extended until increases in process variation limit the maximum number of legs per gate even for LCR keepers.

Although it was not explored in this paper, it is also possible to statically or dynamically vary the safety factor by adjusting the width of the replica pull-down network in the current-mirror circuit. For instance, the safety factor can be increased when the clock stops or the frequency is reduced for scan testing. It can also be set statically using fuses. Finally, the random on-die variation effect can be mitigated by using several parallel-connected current-mirror circuits dispersed throughout the covered dynamic gate area instead of a centralized topology.
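The benefit of dispersed, parallel-connected mirrors can be sketched as averaging. If the replica sites' currents are summed into one bias, the replica's own random variation shrinks roughly as 1/√N. The model is hypothetical:

* Each site's leakage is exp(−ΔVt/(n·kT/q)) with an independent Gaussian ΔVt.
* σ = 30 mV and n·kT/q = 38.6 mV are assumed values.

Averaging does not remove each dynamic gate's own mismatch, only the replica's.

```python
import math
import random
import statistics


def replica_spread(n_sites: int, sigma_vt: float = 0.030, n_vt: float = 0.0386,
                   trials: int = 5000, seed: int = 7) -> float:
    """Relative std of the averaged replica current from n_sites parallel mirrors.

    Hypothetical model: each site's leakage is exp(-dVt / n_vt) with
    independent Gaussian dVt; parallel connection averages the site currents.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        currents = [math.exp(-rng.gauss(0.0, sigma_vt) / n_vt) for _ in range(n_sites)]
        samples.append(sum(currents) / n_sites)
    return statistics.pstdev(samples) / statistics.fmean(samples)


for n in (1, 4, 16):
    print(f"{n:>2} sites: relative spread {replica_spread(n):.2f}")
```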

REFERENCES

[1] R. H. Krambeck, C. M. Lee, and H.-F. S. Law, “High-speed compactcircuits with CMOS,” IEEE J. Solid-State Circuits, vol. SC-17, no. 3,pp. 614–619, Jun. 1982.

[2] W. H. Henkels, W. Hwang, and R. V. Joshi, “A 500 MHz 32-word64-bit 8-port self-resetting register file and associated dynamic-to-

static latch,” in Proc. IEEE Symp. VLSI Circuits, Kyoto, Japan, Jun.1997, pp. 41–42.

[3] N. Tzartzanis et al., “A 34 word 64 b 10R/6W write-through self-timed dual-supply-voltage register file,” in IEEE ISSCC Dig. Tech. Pa-pers, San Francisco, CA, Feb. 2002, pp. 416–417.

[4] M. Anders et al., “Robustness of sub-70 nm dynamic circuits: analyt-ical techniques and scaling trends,” in Proc. IEEE Symp. VLSI Circuits,Kyoto, Japan, Jun. 2001, pp. 23–24.

[5] A. Alvandpour et al., “A conditional keeper technique for sub-0.13 mwide dynamic gates,” in Proc. IEEE Symp. VLSI Circuits, Kyoto, Japan,Jun. 2001, pp. 29–30.

[6] C. R. Gauthier and S. A. Desai, “ Adaptive keeper sizing for dynamiccircuits based on fused process corner data,” U.S. Patent 6,914,452, Jul.5, 2005.

[7] ——, “ Process monitor based keeper scheme for dynamic circuits,”U.S. Patent 6,894,528, May 17, 2005.

[8] D. Stasiak et al., “A 2nd generation 440 ps SOI 64b adder,” in IEEEISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2000, pp. 288–289.

[9] A. Alvandpour and R. K. Krishnamurthy, “Conditional burn-in keeperfor dynamic circuits,” U.S. Patent 6,791,364, Sep. 14, 2004.

[10] C. H. Kim et al., “A process variation compensating technique forsub-90 nm dynamic circuits,” in Proc. IEEE Symp. VLSI Circuits,Kyoto, Japan, Jun. 2003, pp. 205–206.

[11] Y. Lih et al., “A leakage current replica keeper for dynamic circuits,”in IEEE ISSCC Dig. Tech. Papers, 2006, pp. 442–443.

[12] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, Analysis andDesign of Analog Integrated Circuits, 4th ed. New York: Wiley, 2001.

[13] R. R. Troutman, “VLSI limitations due to drain induced barrier low-ering,” IEEE Trans. Electron Devices, vol. ED-26, no. 4, pp. 463–469,Apr. 1979.






















SRAM Macro Results

• Tested parts from all corners at low frequencies (1–5 MHz) at three temperatures: 25°C, 62°C, and 110°C

• All parts passed the test at 62°C and 110°C

• Because subthreshold current increases exponentially with temperature, VKPR is lowered at high temperature, which reduces the effect of threshold-voltage variation

• Parts from all process corners except slow showed failures at room temperature

• Failures traced to DC leakage in local or global bit-lines for different parts

• Attributed to variations in Vt between the replica and dynamic gate FETs and the “excessively small” safety factor that was used





















Fig. 14. Shmoo plot for low-frequency and varying temperature (typical process)


Conclusions

• LCR keeper requires an overhead of only one FET per dynamic gate plus a portion of a shared replica circuit

• For equal noise margins, LCR keepers result in either more legs or faster gates than conventional keepers

• A fairly large safety factor is needed to account for random on-die variation, especially threshold-voltage variation

• The aggressively small safety factor (6–10) used in the SRAM macro was not sufficient

• According to simulations, benefits are still substantial with a safety factor as high as 48
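Why a larger safety factor buys robustness can be sketched with a simple mismatch model. This stand-in is not the paper's analytical model, and its assumptions are:

* Gate leakage and keeper current each scale as exp(−ΔVt/(n·kT/q)).
* The gate and the replica have independent Gaussian Vt offsets.

Under these assumptions, a gate fails to hold when ΔVt_rep − ΔVt_gate > n·kT/q·ln S. The 30-mV sigma and n·kT/q value are hypothetical, and a Monte Carlo run cross-checks the closed form.

```python
import math
import random


def p_fail(safety: float, sigma_vt: float, n_vt: float) -> float:
    """Closed-form probability that one gate's leakage exceeds its keeper current.

    Assumed model: leakage ~ exp(-dVt / n_vt), with independent Gaussian Vt
    offsets (std sigma_vt) for the gate and the replica. The gate fails when
    dVt_rep - dVt_gate > n_vt * ln(safety); that difference has std
    sigma_vt * sqrt(2), so P = 0.5 * erfc(margin / (2 * sigma_vt)).
    """
    margin = n_vt * math.log(safety)
    return 0.5 * math.erfc(margin / (2.0 * sigma_vt))


def p_fail_monte_carlo(safety: float, sigma_vt: float, n_vt: float,
                       trials: int = 100_000, seed: int = 1) -> float:
    """Sample the same model directly to cross-check the closed form."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        leak = math.exp(-rng.gauss(0.0, sigma_vt) / n_vt)             # gate leakage / nominal
        keeper = safety * math.exp(-rng.gauss(0.0, sigma_vt) / n_vt)  # keeper current / nominal
        fails += leak > keeper
    return fails / trials


N_VT = 1.5 * 0.0257   # assumed slope factor 1.5 times kT/q at room temperature (V)
SIGMA_VT = 0.030      # assumed 30-mV random Vt sigma
for s in (6, 10, 48):
    print(f"S = {s:>2}: P(fail per gate) = {p_fail(s, SIGMA_VT, N_VT):.2e}")
```

Per-gate failure probability falls steeply with S. Across a macro with thousands of dynamic gates, even a small per-gate probability adds up. This is the qualitative reason a 6–10 safety factor proved too aggressive.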


Conclusions

• Process variation is expected to increase as CMOS processes scale

• With the LCR keeper it could be possible to use dynamic gates with more legs compared to a conventional keeper

• With the LCR keeper, the use of dynamic logic can be extended until process variation increases limit the maximum number of legs per gate


Further Study

• Possible to statically or dynamically vary the safety factor by adjusting the width of the replica pull-down network in the current-mirror circuit

• Safety factor can be increased when the clock stops or the frequency is reduced for scan testing

• Random on-die variation effect can be mitigated by using several parallel-connected current-mirror circuits dispersed throughout the covered dynamic gate area instead of a centralized topology
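A programmable safety factor could be sketched as a replica pull-down network split into equal, individually enabled segments. Each segment is set by fuses or by a scan-mode control, and the per-mode target picks how many segments to turn on. The following are illustrative assumptions, not values from the paper:

* Segment count.
* Safety factor per segment.
* Mode targets; 48 is the largest value the conclusions cite as still beneficial.
* An ideal, linear mirror.

```python
import math

# Hypothetical targets per operating mode.
MODE_TARGETS = {"functional": 10, "scan": 48}


def segments_needed(target_sf: float, sf_per_segment: float, total_segments: int) -> int:
    """Fewest enabled replica segments whose combined safety factor meets target_sf.

    Ideal-mirror assumption: keeper current, and hence safety factor, grows
    linearly with the enabled width of the replica pull-down network.
    """
    needed = math.ceil(target_sf / sf_per_segment)
    if needed > total_segments:
        raise ValueError("target safety factor needs more replica segments than exist")
    return needed


for mode, target in MODE_TARGETS.items():
    n = segments_needed(target, sf_per_segment=4, total_segments=16)
    print(f"{mode}: enable {n} of 16 segments (S = {n * 4})")
```

A larger safety factor slows evaluation, which is acceptable during scan or a stopped clock where speed does not matter. The smaller functional-mode setting keeps the speed benefit.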


Questions?


References

Lih, Y.; Tzartzanis, N.; Walker, W. W., "A Leakage Current Replica Keeper for Dynamic Circuits," IEEE Journal of Solid-State Circuits, vol. 42, no. 1, pp. 48–55, Jan. 2007

doi: 10.1109/JSSC.2006.885051

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4039592&isnumber=4039574





