on mixed ptl/static logic low-power and high-speed...

VLSI DESIGN2001, Vol. 12, No. 3, pp. 399-406Reprints available directly from the publisherPhotocopying permitted by license only

(C) 2001 OPA (Overseas Publishers Association) N.V.Published by license under

the Gordon and Breach Science Publishers imprint,member of the Taylor & Francis Group.

On Mixed PTL/Static Logic for Low-powerand High-speed Circuits

GEUN RAE CHO* and TOM CHEN

Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523

(Received 20 June 2000," In finalform 3 August 2000)

We present more evidence in a 0.25 tm CMOS technology that the pass-transistor logic(PTL) structure that mixes conventional PTL structure with static logic gates canachieve better performance and lower power consumption compared to conventionalPTL structure. The goal is to use the static gates to perform both logic functions as wellas buffering. Our experimental results demonstrate that the proposed mixed PTLstructure beats pure static structure and conventional PTL in 9 out of 15 test cases foreither delay or power consumption or both in a 0.25 lm CMOS process. The averagedelay, power consumption, and power-delay product of the proposed structure for 15test cases are 10% to 20% better of than the pure static implementations and up to 50%better than the conventional PTL implementations.

Keywords: VLSI; Mixed PTL/Static logic; PTL logic; Low-power; High-speed; Pass-transistorlogic

1. INTRODUCTION

With power becoming more and more a limitingfactor for system performance, demand for lowpower circuits without sacrificing circuit perfor-mance is evident and will continue to be the focusof low-power high-performance circuit designcommunity.

Conventional static CMOS logic gates havebeen widely used in ASIC design today mainly dueto the advantages of easy synthesis and a welldeveloped synthesis and analysis environment.

Pass-transistor logic (PTL), which was first pro-posed in [1, 2], has attracted a lot of attention earlyon. PTL’s advantage as an alternative low powercircuit design approach in 0.5tm and 0.3tmCMOS technology has been well documented[3, 4]. However, PTL’s advantage in deep-submi-cron CMOS technology is not well understood yet.There are speculations [3] that with aggressivescaling in transistor critical dimension and supplyvoltage, the advantages of PTL over conventionalstatic CMOS gates will gradually diminish. Thispaper presents results in a 0.25 tm CMOS process

*Tel." 970-491-7999, Fax: 970-491-2249, e-mail: [email protected] author. Tel." 970-491-6574, Fax: 970-491-2249, e-mail: [email protected]

399

400 G. R. CHO AND T. CHEN

that show PTL circuits, when mixed with staticlogic gates, offer up to 50% power-delay benefitscompared to conventional PTL. The adoption ofPTL is further exacerbated by the lack of synthesisenvironment to support the use of PTL gates. Thewell known method of synthesize PTL circuit froma set of logic equations was proposed in [5]. Theuse of binary decision diagrams (BDDs) [6-8]made the synthesis process easy and straightfor-ward. To minimize delay, a buffer is inserted everyfixed number of transistors in series in the PTLchain. The process of buffer insertion can also beviewed as the process of BDD decomposition assuggested in [9], where buffers are inserted at theroot of each decomposed BDD. For complex logicfunctions, the number of buffers to be inserted in aPTL circuit can be significant. One way to alleviatethis problem is to use static logic gates acting asbuffer as well as performing logic functions. MixedPTL with static logic gates has been proposed in[10], where only one circuit example is used toshow the benefit of mixed PTL/Static logicstructure in a 0.35 tim CMOS process.

This paper presents more evidence in a deep-submicron CMOS process to further validate thebenefit of mixed PTL circuits. By eliminatingexplicit buffering between PTL stages, mixedPTL circuits can further improve the low powernature of PTL circuits without sacrificing perfor-mance. Our experimental results show that ourproposed structure beats pure static structure andconventional PTL in 9 out of 15 test cases foreither delay or power consumption or both in a0.25 lxm CMOS process. The average delay, powerconsumption, and power-delay product of theproposed structure for 15 test cases are 10% to20% better of than the pure static implementationsand up to 50% better than the conventional PTLimplementations.The remaining part of this paper consists of

three sections. We will review some basic issuesrelated to PTL and propose our mix PTL structurein Section 2. In Section 3, a set of experimentalresults is presented in detail. The concludingremarks are given in Section 4.

2. MIXED PTL/STATIC CIRCUITS

2.1. Properties of Pass-transistor Chains

The average power, Pay, consumed by a CMOSgate is given by

Pay Vd f CL Ai (1)

where Vdd is the supply voltage, f is the operatingfrequency, CL is the total load capacitance drivenby the gate, and Ai is the switching activity of thegate. Vdd, f, and Ai are fixed. Decreasing CL willhave a positive impact on power consumption.Since the input capacitance of a PTL gate issmaller than the input capacitance of a staticCMOS gate, replacing CMOS gates with PTL gateshould result in a reduction in the powerconsumed.We cannot arbitrarily connect PTL gates

together, however, since the delay through pass-transistors is a quadratic function of the number oftransistors in series. The delay in PTL can beestimated as td 7"n’N2 where rn and N are thetime constant and the number of transistors inseries, respectively [11]. This quadratic dependencecan be seen in Figure where the results from anHSPICE simulation of a 0.25tm process areshown. The delay is plotted versus the number ofnMOS transistors in series in Figure 1. Figure 2plots the percentage of increase in delay over onepass-transistor against the number of transistors inseries. These graphs show that the delay isunacceptably high when the number of pass-transistors is three or more. Therefore, we limitedourselves to a maximum of two pass-transistors inseries without an intervening static gate for thisprocess.Another issue with pass-transistors is that they

do not have full output voltage swing. The highoutput voltage is limited to Vdd-- Vtn for an nMOStransistor chain where Vtn is the nMOS thresholdvoltage. We had hoped to use this limited voltageswing to reduce the dynamtc power consumption.However, this output voltage was not high enoughto completely turn off the pMOS transistors in the

MIXED PTL/STATIC LOGIC 401

/

1: Delay 6I Passtransisr Chain (.InpUt Rising)2: Delay er Inverter (Input Rising)

3t3 :Delaydf, PasstransisrChain(Int Falling)4 Delay ater Inverter (’nput Falling)ii

ZP -J-- 2,4 );.’! j/" "’1

17.5

i+iiiLO 2 4 6 8 I0

Number of Transistors

FIGURE Relationship between the # of pass-transistors ina chain and delay (0.25 I.tm CMOS technology).

3 layoPas=mnstor.Chain4 Delay after, lnder (laput.Falli)..

0 )}}+)))):::+++"’,+

Number of Transistom

FIGURE 2 Relationship between the # of pass-transistors ina chain and percentage increase in delay (0.251xm CMOStechnology).

following static gates, and, therefore, the staticcurrent became unacceptably high. To fix thisproblem, an inverter and a pull-up pMOStransistor had to be added at the output of eachPTL cell to restore the swing all the way to Vaa.The size of inverter and pull-up pMOS was chosen

in such a way that the loading effect at the celloutput is minimized. This arrangement is shown inthe inset of Figure 5.

2.2. Conventional Static and PTL Mapping

One of our goals is to compare our proposedmixed PTL structure with the conventional PTLstructure. To map a given set of Boolean functionsto a conventional PTL circuit, we followed amethod similar to the one described in [5]. Thelogic equations are first converted to a reduced andordered BDD. This BDD is then converted to aPTL circuit simply by replacing each node withtwo nMOS pass transistors in a Y-shaped patternas shown in the inset in Figure 3. This is a straightforward method, and it assures that only one pathis on at a time. The final step is to add buffers afterevery two pass-transistors in series for the reasonsoutlined in Section 2.1. The buffers use pull-uptransistors to restore full swing at the output of thepass-transistors.As an example of a conventional PTL circuit,

the Boolean equation

yos2stlX2 "t- S1SoX2Xl + S2X2XlXO "1- yos2slsox2()

FIGURE 3 Conventional PTL logic implementation.


2’

Xo@ - f$2 XYo’ s

YoX2

FIGURE 4 Pure Static logic implementation.

was mapped using the above process. The resultsare shown in Figure 3.For the pure static mapping, we used the SIS

program as described in [12, 13]. Since technologymapping is an NP-hard problem [14, 15], SIS usesa set of heuristic algorithms. The first algorithmdecomposes the given Boolean function into agraph containing only simple gates, such as2-input NAND’s and inverters. In the second step,this large graph is then partitioned into smallerpieces called subject graphs [16, 17] to make thethird step tractable. The third step is calledcoveting [16-18] and consists of trying to matchportions of the subject graph to graphs createdfrom the cell library. The cell library used by SIScontains only static CMOS gates. In the final step,the algorithm chooses which library cells to use,usually based upon some sort of timing constraint.

SIS was used to implement the function f givenin (2). The result is shown in Figure 4.

2.3. PTL/Static Hybrid Mapping

Our approach to the mixed PTL/Static technologymapping is similar to the method as described inSIS [12, 13]. The mapping begin by decomposing

the Boolean functions into a graph containing onlysimple gates called base functions: 2-input AND’s,2-input OR’s, and inverters in our case. Theprimary purpose of this step is to express all localfunctions as simple functions and is required toensure that every node is covered by at least onelibrary cell. In order to ensure the existence of asolution the library have to include the basefunctions.During partitioning, the large graph is broken

into subject graphs. Since the technology mappingproblem is NP-complete in nature, we can reducethe problem size and the problem becomestractable by breaking the graph into cones. Wehave chosen multiple fanout points as the bound-ary of the subject graphs.We used a similar covering method as that in

SIS, but differs from the SIS method in that threePTL gates are added to the library of static gatesand the static gate is used to perform both logicfunctions and buffers between PTL gates. ThesePTL gates have a maximum of 2 pass-transistorsin series. Figure 5 shows the schematic of thesethree cells. With the addition of the PTL gates,some new rules must be added to the conventionalcovering scheme to generate mapped circuit afterfinding possible matches at every node in thesubject graph. First, all inputs to a PTL cell mustbe from static cells. Similarly, all outputs from a

c out

]a’Cell Type_l Cell Type_2

Cell Type_3

out

FIGURE 5 Pass-transistor logic cells.


S

S

FIGURE 6 Mixed PTL/Static logic implementation.

PTL cell must be to static cells. These rules assurethat two PTL cells are never connected in series,and thus no more than two pass-transistors are inseries at any point in the circuit. This has to bedone to keep the delay at acceptable levels asdiscussed in Section 2.1.The details of the mapping process is beyond the

scope of this paper. The mapping algorithm will bepresented in a separate paper at a later time. Asample of our hybrid approach is shown inFigure 6 where the function f given in (2) isimplemented.

3. EXPERIMENTAL RESULTS

We implemented three circuits from a micropro-cessor design in each of the three logic styles,conventional static, Conventional PTL, and mixed

PTL/static. The characteristics of three circuits aresummarized in Table I. These three circuits are

TABLE Benchmark circuits

No. of PIs No. of POs No. of Gates

Circuit 7 13 52Circuit 2 32 16 107Circuit 3 93 53 172

implemented using a commercial 0.25 lxm CMOSprocess with Vdd 2.5V. For each of the threecircuits, we randomly chose five paths to comparedelay, power consumption (under different inputsetup to activate the given path), and power-delayproduct. The transistor size in all the circuits weredetermined such that rise and fall time of all thesignal nodes in the circuits are in the rangebetween 300 ps and 400 ps.The HSPICE simulation results are shown in

Table II, where the paths numbered through 5are from circuit 1, 6 through 10 are from circuit 2,and 11 through 15 are from circuit 3. This was theoriginal process we designed our cells for, and theresults are encouraging. In 9 out of 15 paths (path# 1, 6, 7, 8, 11, 12, 13, 14, 15), our proposedscheme outperformed the conventional static logicand the conventional PTL in either delay or powerconsumption. Out of these 9 paths, 6 pathsoutperformed both delay and power consumption.In 10 out of 15 paths (path # 1, 6, 7, 8, 9, 11, 12,13, 14, 15), our scheme outperformed the conven-tional static logic and the conventional PTL inpower-delay product. On average, our proposedscheme has a 30 percent improvement in delaywhile consuming approximately half of the powerwhen comparing to the conventional PTL. Moremodest, but still substantial, gains are seen whencomparing our approach to the static CMOSimplementation, where there is approximately aten percent savings in both delay and power. Forpower-delay product (PDP), our approach yields a17 percent improvement over the static CMOSmethod.

In addition, our experimental results show that,at 0.25 lxm, the advantages of conventional PTLover conventional static are disappearing. In fact,the conventional PTL scheme is significantlyoutperformed by the conventional static logic inall three categories of delay, power consumption,and power-delay product. It is our belief thatexplicit use of buffering between PTL stages in theconventional PTL addel significant delay andadditional power consumption to the conventionalPTL circuits.


4. CONCLUSIONS

We presented a new pass-transistor scheme inwhich conventional PTL stages are mixed withstatic logic gates. The static logic gates not only actas buffers between PTL stages, but also performpart of the logic functions of the given block.Comparing to the conventional PTL, the elimina-tion of explicit buffering between PTL stagessignificantly improved both circuit performanceand power consumption. The experimental resultsconfirms that, by using proposed scheme, over10% improvement in delay and power consump-tion over the conventional static logic can beachieved, and up to 50% improvement in powerconsumption and up to 30% in delay over theconventional PTL can be achieved. The proposedscheme requires a new way the technology map-ping is done. The problem is NP-complete innature. We are continuing our effort to formalizethe process of mapping a given logic functionusing the mixed PTL and static logic gates.

References

[9] Buch, P., Narayan, A., Newton, A. R. and Sangiovanni-Vincentelli, A. (1997). "Logic synthesis for large passtransistor circuits", Proceedings of IEEE/ACM Interna-tional Conference on Computer-Aided Design.

[10] Yamashita, S., Yano, K., Sasaki, Y., Akita, Y., Chikata,H., Rikino, K. and Seki, K. (1997). "Pass-transistor/cmos collaborated logic: the best of both worlds", In:Symposium on VLS1 Circuits Digest of Technical Papers,pp. 31-32.

[11] Weste, N. H. E. (1993). Principles ofCMOS VLSIDesign:A System Perspective, Addison Wesley, 2nd edn.

[12] Sentovich, E., Singh, K., Moon, C., Savoj, H., Brayton, R.and Sangiovanni-Vincentilli, A. (1992). "Sequential circuitdesign using synthesis and optimization", In: IEEE Int.Conf. Comput. Design.

[13] Sentovich, E. M., Singh, K. J., Lavagno, L., Moon, C.,Murgai, R., Saldanha, A., Savoj, H., Stephan, P. R.,Brayton, R. K. and Sangiovanni-Vincentelli, A. (1992).SIS: A Systemfor Sequential Circuit Synthesis, Memoran-dom No. UCB/ERL M92/41, University of California,Berkeley, CA 94720.

[14] Hachetl, G. D. and Somenzi, F. (1996). Logic Syntesis andVerification Algorithms, Kluwer Academic Publisher.

[15] Garey, M. R. and Johnson, D. S. (1979). COMPUTERSAND INTRACTABILITY: A Guide to the Theory ofNP-Completeness, W. H. FREEMAN AND COMPANY.

[16] Rudell, R. L. (1989). Logic Synthesis in VLS1 Design,Ph.D. Dissertation, University of California, Berkeley, CA94720.

[17] Micheli, G. D. (1994). Synthesis and Optimization ofDigital Circuits, McGraw-Hill, Inc.

[18] Mailhot, F. and Micheli, G. D., "Algorithms forTechnology Mapping Based on Binary Decision Diagramsand Boolean Operations", IEEE Trans. on Computer-Aided Design of Integrated Circuit and Systems, 12,599-620, May, 1993.

[1] Mead, C. A. and Conway, L. (1980). Introduction to VLSISystems, Reading, Mass.:Addison-Wesley.

[2] Radhakrishnan, D. and Whitaker, S. R., "Formal DesignProcedures for Pass Transistor Switching Circuits", IEEEJournal of Solid-State Circuits, SC-20, 531-536, April,1985.

[3] Munteanu, M., Ivey, P. A., Seed, L., Psilogeorgopoulos,M., Powell, N. and Bogdan, I. (1999). "Single Ended Pass-Transistor Logic", In: VLSI Systems on a Chip (KluwerAcademic Publisher), pp. 206-217.

[4] Yano, K., Yamanaka, T., Nishida, T., Saito, M.,Shimohigashi, K. and Shimuzu, A., "A 3.8ns CMOS16 x 16-b Multiplier Using Complementary Pass-Transis-tor Logic", IEEE Journal of Solid-State Circuits, 25,388- 394, April, 1990.

[5] Yano, K., Sasaki, Y., Rikino, K. and Seki, K., "Top-Down Pass-Transistor Logic Design", IEEE Journal ofSolid-State Circuits, 31, 792-803, June, 1996.

[6] Akers, S. B., "Binary Decision Diagrams", IEEE Trans.on Computers, C-27, 509-516, June, 1978.

[7] Bryant, R. E., "Graph-Based Algorithms for BooleanFunction Manipulation", IEEE Trans. on Computers,C-35, 677-691, Aug., 1986.

[8] Drechsler, R. and Becket, B. (1998). Binary DecisionDiagrams: Theory and Implementation, Kluwer AcedemicPublishers.

Authors’ Biographies

Geun Rae Cho worked for Semiconductor Re-search and Development Center at SamsungElectronics, Kiheung, KyungKi-Do, Korea. He iscurrently a Ph.D. student in the Department ofElectrical and Computer Engineering at ColoradoState University. His research interests includeVLSI design and CAD methodology, Logicsynthesis for low-power applications, and Com-puter Architecture.Tom Chen received the B.Sc. degree in ElectronicEngineering from Shanghai Jiao-Tong University,China, and the Ph.D. degree in Electrical En-gineering from the University of Edinburgh, Scot-land, UK, in 1982 and 1987, respectively. From1987 to 1989 Dr. Chen was with Philips Semi-conductors in Hamburg, Germany. From 1989 to


1990 he was with the department of Electrical andComputer Engineering, New Jersey Institute ofTechnology. He is currently an associate professorin the Department of Electrical and Computer

Engineering at Colorado State University. Dr.Chen’s research interests are in the areas of VLSIdesign, VLSI CAD methodology, and Low-powercircuit design.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2010

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of



RotatingMachinery


Hindawi Publishing Corporation http://www.hindawi.com

Journal ofEngineeringVolume 2014

Submit your manuscripts athttp://www.hindawi.com

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation http://www.hindawi.com

Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

SensorsJournal of


Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks


on mixed ptl/static logic low-power and high-speed...

Documents