7/23/2019 Arithmetic and Logic Circuits Using Sub-Threshold Pass-Transistor Logic For Ultra-Low Energy Applications

University of Southampton

Faculty of Physics and Applied Sciences

School of Electronics and Computer Science

Arithmetic and Logic Circuits Using Sub-Threshold Pass-Transistor Logic For Ultra-Low Energy Applications

By

Choudhury Md Salim Ul Haque Salmee

21st September, 2012

A dissertation submitted in partial fulfillment of the degree of

MSc Microelectronics Systems Design

By examination and dissertation

Project Supervisor: Dr Tom J. Kazmierski

Second Examiner: Dr Koushik Maharatna


C.M.S Ul Haque Salmee MSc in Microelectronics Systems Design September 2012

ABSTRACT

This dissertation summarises the research and design work carried out during an MSc project that aimed to develop practical arithmetic and logic circuits and integrate them into an Arithmetic Logic Unit (ALU) for energy-constrained applications. The method adopted for ultra-low power design was sub-threshold pass-transistor logic (PTL). The project started with a wide-ranging literature review, including research publications, focused on the performance of various pass-transistor logic styles in terms of speed, power dissipation and area. The circuits of this project were developed in both CMOS and PTL styles in order to provide a power comparison between the two. Some of the PTL circuits designed in this project were modified in terms of transistor size and design style to ensure smooth, power-efficient operation in the sub-threshold region. Comprehensive simulations were carried out to characterise the circuits in terms of propagation delay and power consumption. Simulations were conducted for supply voltages below and around the threshold, with different ambient temperatures and fan-outs. The results show that implementing sub-threshold PTL circuits in a complex hierarchical structure such as an ALU is feasible. Furthermore, comparative analysis of the results suggests that, for sub-threshold design, PTL is more power efficient than its CMOS counterpart for large-scale circuits such as an ALU. Measurements on the 8-bit ALU structure show that under worst-case simulation conditions, such as a high sub-threshold supply and extreme temperature, the PTL version consumed 153.15 pW of dynamic power whereas the CMOS version consumed 314.21 pW, roughly twice as much. The maximum power consumption of such a design is restricted to a few hundred picowatts, which ensures ultra-low power operation of a system. However, the power efficiency of PTL is gained at the cost of circuit performance. Despite that, such a system is beneficial for the numerous applications in which power is a scarce resource and performance is not the primary concern.



ACKNOWLEDGEMENTS

First of all, I would like to take the opportunity to express my sincerest gratitude to my project supervisor, Dr Tom J. Kazmierski, for allowing me to undertake this project, which relates both to one of the leading research topics in the field of digital design and to my professional interest, low power design. His astute supervision and guidance made this project possible. Furthermore, his knowledge of the topic and continuous support during the project encouraged me to conduct in-depth research.

I am grateful to Dr Koushik Maharatna, the project second examiner, who pointed out some facts about the project that guided me to revise my work with more accuracy. I am also grateful to all the lecturers of the modules that I took during my MSc study. In particular, I would like to mention Dr Koushik Maharatna and Iain McNally, whose lectures and laboratory sessions were very helpful in conducting my research properly. I would also like to thank Mr Robert Rudolf, a full-time research graduate from the Electronics and Electrical group, for his assistance with Cadence simulation.

I would like to acknowledge the library facilities provided by the University of Southampton. I am also thankful to the ECS School for providing computer access with state-of-the-art EDA tools and scientific publications.



LIST OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
1.1. Motivation
1.2. The Project
1.3. Results and Benefits
CHAPTER 2 BACKGROUND AND PREVIOUS WORK
2.1. Energy Constraint Applications
2.1.1. Micro-sensor Network and Nodes
2.1.2. Radio Frequency Identification
2.1.3. Low Power Digital Signal Processor and Microcontroller Unit
2.1.4. Energy Harvester
2.2. Sub-Threshold Operations of MOSFET and CMOS Logic Gates
2.2.1. Strong Inversion
2.2.2. Weak Inversion
2.2.3. Static CMOS Inverter in Sub-Threshold Operation
2.2.4. Application, Advantages and Demerits of Sub-Threshold Logic
2.3. Pass Transistor Logic (PTL)
2.3.1. Basic Operations Principle
2.3.2. Complementary Pass-Transistor Logic (CPL)
2.3.3. Dual Pass-Transistor Logic (DPL)
2.3.4. LEAP and Other PTL Styles
2.3.5. Merits and Demerits of PTL
2.4. Sub-Threshold Pass-Transistor Logic
2.5. Basic Circuits
2.5.1. PTL Logic Circuits
2.5.2. CMOS Logic Circuits
2.6. Arithmetic Logic Unit (ALU)
2.6.1. ALU Design
2.6.1.1. Tree Structure
2.6.1.2. Chain Structure
CHAPTER 3 BASIC CIRCUITS DESIGN AND CHARACTERISATION
3.1. Design
3.1.1. PTL Circuit
3.1.2. CMOS Circuits
3.2. Characterisation
3.2.1. Propagation Delay Measurement
3.2.2. Power Consumption Measurement
3.3. Presentation of Results
3.3.1. Propagation Delay
3.3.2. Power Consumption
3.4. Result Analysis
CHAPTER 4 ARITHMETIC LOGIC UNIT DESIGN, POWER MEASUREMENTS AND RESULTS ANALYSIS
4.1. ALU Design
4.1.1. 1-Bit PTL Design
4.1.2. 1-Bit CMOS Design
4.1.3. 8-Bit PTL Design
4.1.4. 8-Bit CMOS Design
4.2. Power Consumption Measurements and Results
4.2.1. Simulation Setup
4.2.2. Results
4.3. Result Analysis
CHAPTER 5 CONCLUSION AND FUTURE WORK
APPENDICES
Appendix A - Project Gantt Chart
Appendix B - Design Files
Appendix C - Detailed Simulation Data
REFERENCES



CHAPTER 1 INTRODUCTION

Power consumption is a major concern for integrated electronic circuits and devices. It influences the design and fabrication of such circuits and systems in two respects. Firstly, power is dissipated as heat, which affects the performance of a chip and requires special cooling and packaging, which is expensive. Secondly, the increasing number of mobile systems and energy-constrained applications, such as energy harvesters, micro-sensor nodes and self-powered radio frequency identification (RFID), require low power consumption to maximise their battery life. Therefore, there is on-going research at multiple levels of system design: the behavioural, architecture, logic and technology levels.

1.1. Motivation

The previous projects [1] and [2] on sub-threshold pass-transistor logic provided a solid assessment, based on basic logic circuits and adder circuits, that sub-threshold PTL circuits are more energy efficient than their CMOS counterparts, with circuit propagation delay traded off against low power consumption. This research project concentrates on developing more complex and practical arithmetic and logic circuits based on sub-threshold PTL, with a view to minimizing the energy consumption of a digital system (processor) for ultra-low energy applications.

For energy-constrained applications, standard practice is to use a conventional microcontroller. Such microcontrollers offer contemporary, multipurpose functionality and can operate at clock frequencies of tens to hundreds of megahertz. Along with multiple general-purpose input-output terminals, they also have very precise, high-speed ADCs. These features and their flexibility of use explain the popularity of such microcontrollers in a wide range of applications. The energy consumption of these microcontrollers is not a serious issue for typical household, industrial or automotive applications. However, for energy-constrained applications where power is a scarce resource, this power consumption is a significant factor.

The project seeks more energy-efficient, cohesive circuits for designing the building blocks of a customary processor at the transistor level. There were three aspects to the research. Firstly, the research focused on PTL circuits rather than CMOS logic circuits, since many publications [3], [4], [5], [6] and [7] concluded that PTL has lower leakage and requires fewer transistors than CMOS logic. The second aspect is the use of transistors in the sub-threshold region as a method for low power consumption [8], [9], [10] and [11]. A transistor operating in the sub-threshold region consumes a very small amount of energy, but at the cost of circuit performance in terms of speed [11]. However, for the aforementioned energy-constrained applications, performance is of minor importance and the primary concern is power consumption. Chapter 2 includes the details of the sub-threshold operation of CMOS logic and of prominent energy-constrained applications. Lastly, the study includes energy-efficient structural methods for complex circuits [29], [30] and [31].

The research is motivated by the previous project [2], which shows that PTL circuits are more energy efficient than CMOS logic and that PTL can operate at sub-threshold voltages. Moreover, other studies [4], [6] and [7] proposed that PTL can be operated at sub-threshold voltages. However, the project [2] validates sub-threshold PTL only for a limited number of basic logic circuits and relatively small hierarchical structures. The positive outcome of [2] could lead towards building larger circuit blocks and hierarchical structures and ultimately to the development of an ultra-low power digital system (processor). If successful, this can be advantageous for energy-constrained applications in two ways. Firstly, it will make the design simpler, with a smaller number of circuits and devices. Secondly, energy will be used more efficiently, which can ensure ultra-low power consumption of the system. To the knowledge of the author, apart from the previous projects [1] and [2], there have been very few studies and publications on sub-threshold PTL.

1.2. The Project

The whole design project was conducted with the Cadence AMS 0.35 µm process design kit (PDK). The MOSFET transistors used in this project are taken from the PDK's built-in library, in which the transistors are fully characterised for all three regions of operation, including sub-threshold. Therefore, the simulation results are taken to be valid and accurate.

The project started with developing a comprehensive collection of PTL and CMOS basic circuits for large-scale design structures. To that end, a total of 9 basic logic circuits were added to the existing set of PTL and CMOS circuits from [2]. Circuits were chosen and designed carefully in order to develop efficient, hierarchical structures for 1-bit and 8-bit ALUs. All the PTL circuits were thoroughly characterised in terms of propagation delay and power consumption for different fan-outs, ambient temperatures and sub-threshold supply voltages. The characterisation was carried out for all the PTL circuits but for only two CMOS circuits, because the project goal was to develop more advanced and larger PTL circuits and to avoid repeating the previous project's work on CMOS circuits. Based on the basic circuits, 4 versions of a 1-bit PTL ALU with different styles and functionality were developed and characterised for power consumption. Development of the latest version was encouraged by the successful implementation of the earlier ones. The design of the 8-bit PTL ALU was based on the latest version of the 1-bit ALU, as explained in chapter 4. The 8-bit ALU was designed in both PTL and CMOS logic, and the two structures were compared for power consumption at different temperatures and supply voltages. A total of 7 PTL hierarchical circuits and 4 CMOS hierarchical logic circuits were created during the ALU design process. An additional 53 PTL test circuits and 25 CMOS test circuits were designed for simulation purposes. Overall, more than 2500 simulations were executed for design and characterization during the course of the project.

This dissertation describes all the research and project work carried out during the course of the project. The project started with a wide-ranging literature review and a study of the previous project [2], both of which are covered in chapter 2. The literature review comprises the sub-threshold operation of the MOSFET and the CMOS inverter, and the applications, benefits and disadvantages of sub-threshold operation. It also summarises contemporary and major research findings on sub-threshold design. The review continues with different PTL design styles and their advantages and disadvantages. A brief section in the chapter covers sub-threshold PTL operation. The chapter also contains a review of the basic PTL and CMOS circuits from [2]. Different design methods for the ALU were also part of the literature review.

Chapter 3 covers the design of the extended set of PTL and CMOS basic circuits, with brief descriptions of functionality and features. All the PTL circuits and two CMOS circuits were characterised under different simulation conditions, which include different supply voltages and temperatures for circuits with different fan-outs. The results of the characterisation, namely the propagation delay and power consumption (static and dynamic) of the PTL circuits, are presented with explanations.

The dissertation continues with the practical design work for the 1-bit and 8-bit ALUs in chapter 4. It provides design details of the 1-bit ALU and a power comparison between its different versions, with an explanation of why the best version was selected for the 8-bit hierarchical ALU design. Along with the detailed design architecture, the chapter presents a power comparison of the 8-bit ALU in PTL and CMOS and concludes with an analysis of the results.

The dissertation finishes with the project outcome and suggestions for future work in chapter 5. A Gantt chart with detailed timing of project progress and development is included in appendix A. Appendix B lists all the design files and includes the Cadence design files. Appendix C provides detailed simulation data. Both appendices B and C are available in the submitted zip file.

1.3. Results and Benefits

The results show the successful implementation of sub-threshold PTL in a complex, hierarchical design such as an ALU. As mentioned earlier, the previous projects [1] and [2] validated this method on basic logic circuits only, and no other research has provided a solid assessment of the practical feasibility of using PTL in sub-threshold. Moreover, the achievements of this project, along with [2], directly contradict the suggestion of other research [12] that sub-threshold PTL is unfeasible in principle.

The ALU developed in this project is one of the major building blocks of a processor. Developing a complete processor requires a great deal of research and in-depth analysis, which was beyond the scope of this project owing to its specific goal and the time constraints of an MSc degree. The successful implementation of this method will be an essential development in terms of power consumption for ultra-low energy applications. The challenging part is the effective implementation of sub-threshold PTL for the other major building blocks in order to develop an ultra-low power digital system (processor), which demands a significant amount of research and design work.



CHAPTER 2 BACKGROUND AND PREVIOUS WORK

2.1. Energy Constraint Applications

The following section gives a brief description of prominent contemporary applications that can benefit from ultra-low power design.

2.1.1. Micro-sensor Network and Nodes

A micro-sensor node is a node in a micro-sensor network with sensing, computing and communication functionality. Typically, tens of thousands of spatially distributed micro-sensor nodes constitute a wireless micro-sensor network for sensing, processing and relaying data to the end user [11]. There is much on-going research on the practical implementation of such networks, and the main proposed applications are health monitoring, automotive sensing, and habitat and structural monitoring [11]. The performance requirements for these applications are very low; for example, in health monitoring the measured data change on a timescale of a few seconds to a minute [11]. The battery lifetime required for a micro-sensor network is very long, since it is not feasible to change the batteries of such nodes frequently. Therefore, low performance and long battery life requirements make the micro-sensor network a perfect candidate for ultra-low energy technology.

2.1.2. Radio Frequency Identification

A radio frequency identification (RFID) system is used to track and identify an object by means of an RFID tag attached to the object [11]. RFID tags use radio frequency to communicate with the end user. These tags have been in use for many years, and their flexibility has spawned many applications such as medical implants, tracking of automobiles, pharmaceutical goods, livestock and pets, smart credit cards and smart keys for automobiles. An RFID tag usually has an antenna and other communication circuits [11]. The functionality of an RFID tag requires only very simple logic processing [11]. An active RFID tag transmits signals to the reader using energy from a battery. A low-power design leaves extra battery energy that could support extended processing. Moreover, a low-power design leaves more of the energy budget for communication, so the communication distance could be longer. A passive tag, on the other hand, operates on, and is most often energised by, the electromagnetic signal it receives from the reader. As a result, a passive tag is smaller and does not depend on stored energy. Minimizing the digital processing power would reduce the transmission power required from the reader and make the communication distance longer.

2.1.3. Low Power Digital Signal Processor and Microcontroller Unit

Portable applications have successfully used the Texas Instruments (TI) C5xx family of digital signal processors and the TI MSP430 microcontroller unit for metering, measurement and instrumentation purposes [11]. Modern portable devices, such as mobile phones and PDAs, require a wide dynamic range of power consumption and performance. Such applications require a high-performance digital signal processor or microcontroller unit during active mode. In standby mode, they need only limited processing and low power consumption in order to extend battery life. In a variety of applications, devices therefore have to be optimized for power consumption in both active and standby modes.

2.1.4. Energy Harvester

Energy harvesting is a source of energy for small, autonomous wireless electronic devices such as the nodes of wireless sensor networks [13]. In this process, energy from external sources such as thermal, solar, wind and kinetic energy is converted into electrical energy for circuits. A wide range of low-power applications can benefit from energy harvesting, provided that there is an abundant energy source and that sufficient energy can be derived from it for the required operations [13]. Figure 1 shows a block diagram of a typical self-powered wireless sensor node using a piezoelectric vibration energy harvester [13]. The system includes a microcontroller unit (MCU) with an integrated antenna for transmission and sensors for collecting information from the environment. The supply voltage required for the MCU is 3.3 V.

Figure 1 a) Block Diagram of a Self-Powered Smart Sensor Node with Energy Harvesting Method b)

Different Node Voltages with Time (Adapted from [13] and reprinted (b) from [2])

The energy derived from the harvester is rectified and fed to a supercapacitor with a nominal capacitance of millifarads to tens of farads [13]. It takes hours to charge the capacitor to 1-1.2 V (figure 1b), which allows the voltage regulator to start. Reaching a fully functional energy level for the system takes more than 26 hours of energy harvesting. Moreover, the voltage regulator requires a cold-start circuit [13] for successful operation of the system. All these factors are disadvantageous, since the system consumes time and energy and requires additional components, which implies higher cost.

Therefore, the energy harvesting process is a prime candidate for ultra-low power design, which can ensure low power consumption with relatively faster operation and also make the design simpler and hence more cost effective.

2.2. Sub-Threshold Operations of MOSFET and CMOS Logic Gates

2.2.1. Strong Inversion

The requirement for the normal operation of a MOSFET is that the gate voltage be greater than the device threshold voltage [14]. This region of operation is referred to as strong inversion [14].

$V_{GS} > V_T$, strong inversion requirement (1)

There are two regions of operation in strong inversion: the triode region and the saturation region. Which region applies is determined by the bias voltages of the device. For an nMOS transistor, expressions (2) and (3) give the conditions for triode and saturation operation respectively.

$V_{DS} < V_{GS} - V_T$ (2)

$V_{DS} \geq V_{GS} - V_T$ (3)
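The bias conditions in expressions (1) to (3) can be turned into a small classifier. The sketch below is illustrative only: the function name `nmos_region` and the label strings are hypothetical, and the 0.57 V threshold in the usage lines is the approximate value quoted for the nMOS device in section 2.2.2.

```python
def nmos_region(v_gs: float, v_ds: float, v_t: float) -> str:
    """Classify the bias point of an ideal nMOS transistor.

    Hypothetical helper: checks VGS > VT for strong inversion, then
    VDS < VGS - VT for triode, otherwise saturation.
    """
    if v_gs <= v_t:
        # Expression (1) is not met, so the device is not in strong inversion.
        return "below threshold"
    if v_ds < v_gs - v_t:
        return "triode"        # expression (2)
    return "saturation"        # expression (3)


if __name__ == "__main__":
    V_T = 0.57  # approximate nMOS threshold voltage quoted in the text, in volts
    for v_gs, v_ds in [(1.8, 0.3), (1.8, 1.5), (0.3, 0.3)]:
        print(f"VGS={v_gs} V, VDS={v_ds} V -> {nmos_region(v_gs, v_ds, V_T)}")
```

The "below threshold" branch corresponds to the weak inversion regime discussed in section 2.2.2, where the device still conducts a small current.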


Figure 2 Current Voltage Characteristics of an Ideal NMOS transistor, IDS versus VDS, showing the triode and saturation regions [14]

In triode (linear) region, the device behaves like a linear resistor whose value is controlled by

VGS [14]. In saturation, the device current reaches a maximum value and the device is said to be

pinched off [14].

2.2.2. Weak Inversion

A MOSFET is said to be in the cut-off region for gate voltages below the device threshold voltage. In theory, no current flows. In practice, however, a weak inversion layer exists, which allows carriers to diffuse through the channel [11]. The resulting device current IDS exhibits an exponential dependence on VGS [15]. This region of operation is called the sub-threshold regime.

$V_{GS} < V_T$, weak inversion requirement (4)

The sub-threshold current is mainly contributed by diffusion current [11]. Expression (5) represents the basic equation for sub-threshold current.

$I_{DS} = I_o \exp\left(\frac{V_{GS} - V_T}{n V_{th}}\right)$ (5) [11]

$I_o = \mu_o C_{ox} \frac{W}{L} (n - 1) V_{th}^2$, drain current at $V_{GS} = V_T$ [11]

$n = 1 + \frac{C_d}{C_{ox}}$ (6) [11]

Expression (5) shows that the sub-threshold current depends strongly on the thermal voltage $V_{th} = kT/q$. It also depends exponentially on VGS. Expression (6) gives the sub-threshold slope factor n, which depends on the device capacitances.

Figure 3 shows the drain current IDS of an nMOS transistor for gate voltages VGS below the threshold voltage (approximately 0.57 V). It shows that an nMOS transistor can operate in the sub-threshold region [2].
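The exponential dependence of the sub-threshold current on VGS can be illustrated numerically. The following is a minimal sketch, assuming the common form IDS = Io exp((VGS - VT) / (n Vth)) with thermal voltage Vth = kT/q. Only the 0.57 V threshold comes from the text; the slope factor `n = 1.5`, the prefactor `i_o` and the function names are assumed, illustrative values, not parameters of the AMS 0.35 µm process.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """Thermal voltage kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(v_gs: float, v_t: float = 0.57, n: float = 1.5,
                         i_o: float = 1e-7, temp_k: float = 300.0) -> float:
    """Drain current in amperes for VGS below VT (assumed exponential model)."""
    return i_o * math.exp((v_gs - v_t) / (n * thermal_voltage(temp_k)))


def subthreshold_swing(n: float = 1.5, temp_k: float = 300.0) -> float:
    """Change in VGS (volts) that changes the current by one decade."""
    return n * thermal_voltage(temp_k) * math.log(10.0)


if __name__ == "__main__":
    print(f"swing: {subthreshold_swing() * 1e3:.1f} mV per decade")
    for v_gs in (0.2, 0.3, 0.4, 0.5):
        print(f"VGS = {v_gs:.1f} V -> IDS = {subthreshold_current(v_gs):.3e} A")
```

Under these assumptions a reduction of VGS by one swing lowers the current by a factor of ten, which is the kind of decade-per-small-voltage-step behaviour that a plot of IDS against VGS on a logarithmic axis would show as a straight line.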


Figure 3: VGS versus IDS for nMOS Transistor at VDS = 0.5 V in 0.35 µm AMS Technology (Adapted from [2])

2.2.3. Static CMOS Inverter in Sub-Threshold Operation

An inverter in sub-threshold mode requires the supply voltage VDD to be less than the threshold voltage VT, which ensures weak inversion operation for both the NMOS and PMOS transistors of the inverter and keeps the input logic 1 level below VT [9] and [11]. This allows a CMOS inverter to be implemented successfully in sub-threshold.

Figure 4: CMOS Inverter in Sub-Threshold Operation, with VDD < VT [11]

Although the sub-threshold inverter implementation is feasible, many researchers have expressed concern about the delay of such logic gates [16], [17], [18] and [19]. The propagation delay of a symmetric inverter for VDD < VT is given in expression (7), from which it can be seen that the delay depends strongly on VDD and increases as VDD is reduced [11]. By contrast, the dependence of the delay (tpd) of a normal inverter (8) on VDD is much weaker. Figure 5 shows the normalised speed of an inverter for different supply voltages. In the sub-threshold region, the speed decreases by a factor of 6 per 100 mV [11].



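$t_{pd,sub} = \frac{K C_g V_{DD}}{I_o \exp\left(\frac{V_{DD} - V_T}{n V_{th}}\right)}$ (7)

$t_{pd} = \frac{K C_g V_{DD}}{(V_{DD} - V_T)^{\alpha}}$ (8)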
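The figure of 6 times per 100 mV can be related to an exponential on-current model. The sketch below is a rough illustration under stated assumptions: speed is taken to be proportional to exp(VDD / (n Vth)) with Vth = kT/q, the weaker linear dependence of delay on VDD is ignored, and the function names are hypothetical. It computes the speed ratio for a given supply step and the effective slope factor n that a quoted ratio would imply at 300 K.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """Thermal voltage kT/q in volts."""
    return K_B * temp_k / Q_E


def speed_ratio(delta_vdd: float, n: float, temp_k: float = 300.0) -> float:
    """Factor by which speed changes when VDD changes by delta_vdd volts.

    Assumes speed is proportional to the sub-threshold on-current only.
    """
    return math.exp(delta_vdd / (n * thermal_voltage(temp_k)))


def implied_slope_factor(ratio: float, delta_vdd: float,
                         temp_k: float = 300.0) -> float:
    """Effective slope factor n that reproduces a given speed ratio."""
    return delta_vdd / (thermal_voltage(temp_k) * math.log(ratio))


if __name__ == "__main__":
    n_eff = implied_slope_factor(6.0, 0.1)
    print(f"effective n for 6x per 100 mV: {n_eff:.2f}")
    print(f"speed change for -200 mV: x{speed_ratio(-0.2, n_eff):.4f}")
```

Because the model is exponential, two successive 100 mV reductions compound, so a 200 mV drop costs a factor of 36 in speed under the same assumptions.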

Figure 5 Relative Normalized Speed versus Voltage of a CMOS Inverter [11]

The voltage transfer characteristic (VTC, shown in figure 6) of a static CMOS inverter is similar for both normal and sub-threshold operation [11]. This is a key fact that makes the sub-threshold implementation of logic cells possible without any large-scale adjustment in design.

Figure 6 Voltage Transfer Characteristics of an AMS 0.35 µm CMOS inverter for VDD = 1.8 V and 0.3 V [2]

2.2.4. Application, Advantages and Demerits of Sub-Threshold Logic

The most important feature of sub-threshold design is that it can offer minimal energy consumption in electronic circuits. Figure 3 shows that for a small drop in supply voltage, the current drawn falls by a decade [2]. However, such energy efficiency comes at the expense of performance, in the form of large propagation delay in circuits. Figure 7 gives a rough idea of how speed is affected by low-power operation. Conventionally, a design is optimized at the Minimum-Delay Point (MDP). When the emphasis is on power consumption instead, the best that can be achieved is the Minimum-Energy Point (MEP). In [8], Markovic states that for 10 times lower energy consumption, the propagation delay would increase by 1000 times.

Figure 7: Energy-Delay Trade-off for Minimum-Delay Point (MDP) and Minimum-Energy Point (MEP) [8] (axes: normalised energy versus normalised delay; regions: traditional operation, suboptimal, ultralow-energy and infeasible; points: MDP at Dmin and MEP at Emin)

The dependence of threshold voltage on temperature and process is another major concern for sub-threshold design [18]. Even for a small change in temperature, the exponentially dependent current (5) changes significantly. Therefore, sub-threshold design has to accept restrictions on a primary design parameter such as speed.

On the other hand, sub-threshold design does not require an immense amount of design effort and is hence easier to implement. Calhoun and Wang showed in their research that, with a slight modification, a standard cell library using 0.18 µm technology can operate smoothly at sub-threshold voltage [9] and [11]. They analysed different process corners (TT, SF and FS) in order to discover the lowest working voltage for each corner. The results show that all the corners can operate at sub-threshold voltage. However, certain cells in the FT process show unstable operation in sub-threshold. This is because these cells are designed with a longer series of logic gates and a large number of parallel transistors, as the authors conclude [8] and [25]. In [25], Calhoun and Wang suggested resizing the transistors of the unstable cells to achieve stable sub-threshold operation.

The positive outcome of the studies [9], [10] and [11] regarding stable operation of a standard CMOS library at sub-threshold voltage is very beneficial for the design process, since the modern digital design process depends on cell library synthesis and HDL entry. Therefore, it could be possible to design a VLSI integrated circuit with minor modifications using the standard design process.

In spite of all the concerns regarding speed, temperature and process dependency, a number of applications implement the sub-threshold technique since it offers low power consumption and an easier design process. As mentioned earlier in this chapter, portable applications such as mobile phones and PDAs require a dynamic range of power and processing performance. Ultra-Dynamic Voltage Scaling (UDVS) is used to ensure low power consumption in such devices and so extend battery life [25]. For performance-critical operations, it allows devices to run at high voltage or high frequency, while in sleep mode the devices run at sub-threshold voltage to minimize power consumption.

Another major area of application for the sub-threshold technique is energy-constrained applications. These applications typically do not require high performance and strive for low power consumption. Section 2.1 of this chapter exemplifies how these applications can benefit from low power consumption, which is the primary goal of this project.



2.3. Pass Transistor Logic (PTL)

In standard CMOS logic circuits, all input signals are applied to the gates of both nMOS and pMOS transistors. In static mode, the complementary transistors are either in cut-off mode (high impedance) or in saturation mode (conducting), depending on the state of the input signals. However, in pass transistor logic (PTL) the input signals are connected to both the drain and source of a transistor [20].

2.3.1. Basic Operations Principle

A popular alternative to conventional CMOS logic is PTL. PTL requires comparatively fewer transistors than CMOS and is easier to implement. Figure 8a shows an nMOS transistor used as a pass transistor in a PTL AND gate. The source voltage of the transistor is VDD - VT [27] and [20]. In practice, the supply voltage is much larger than the voltage drop caused by VT, and the output voltage is considered as logic 1. However, the arrangement of figure 8a is inadequate for carrying out the AND operation, because the circuit goes to a high impedance state when the gate is at logic 0. Therefore, another nMOS is added to the design (figure 8b) [27] and [20]. The addition of nMOS2 is essential for the static design, since it ensures a low impedance path to the supply rail (the input rail for PTL) under all circumstances [27].

Figure 8 a) Pass Transistor Operation Using Single nMOS (VA = VDD, VB = VDD, VY = VDD - VT) b) AND Operation Using Two nMOS (nMOS1 gated by B passes A, nMOS2 gated by nB passes B, Y = A.B)

PTL makes the design much easier, with fewer transistors and a variety of logic operations. Compared to 6 transistors in the CMOS implementation, it uses only 2 transistors for the AND operation. Other logic operations are also achievable with the appropriate change of wiring. Expression (9) shows the logic function of a PTL AND gate.

$V_Y = V_{G1} V_{D1} + V_{G2} V_{D2}$ (9)
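The switch-level behaviour described by expression (9) can be checked with a small model. The sketch below treats each nMOS as an ideal switch and ignores the VT drop; the helper names `pass_nmos` and `ptl_and` are illustrative assumptions, and the wiring follows figure 8b as described in the text.

```python
from typing import Optional


def pass_nmos(gate: int, drain: int) -> Optional[int]:
    """Ideal nMOS pass switch: conducts the drain value when the gate is 1.

    Returns None when the gate is 0 to model the high impedance state of
    the single-transistor arrangement in figure 8a.
    """
    return drain if gate == 1 else None


def ptl_and(a: int, b: int) -> int:
    """Two-transistor PTL AND, following expression (9).

    nMOS1 (gate = B) passes A; nMOS2 (gate = nB) passes B.
    """
    nb = 1 - b
    out1 = pass_nmos(b, a)    # V_G1 = B,  V_D1 = A
    out2 = pass_nmos(nb, b)   # V_G2 = nB, V_D2 = B
    # B and nB are complementary, so exactly one transistor conducts.
    return out1 if out1 is not None else out2


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"A={a} B={b} -> Y={ptl_and(a, b)}")
```

The model also shows why the second transistor is needed: with only nMOS1, the output would be undefined whenever B is 0.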

A major concern for PTL design is the lower output voltage due to the VT drop, as mentioned earlier. The output of a PTL NAND gate should not be connected to the input of another gate because of the VT drop at its output [27] (figure 9a). The degraded output ultimately becomes insufficient to drive the next gate. When pass transistors are connected in series, the input signal is degraded by a VT drop throughout the chain (figure 9b). Therefore, a very long chain connection is not possible.

Figure 9 a) Pass Transistor Output Driving Another Gate b) Degradation of Voltage in Pass-Transistor Chain [27] (node voltages along the chain nMOS1, nMOS2, nMOS3: VIN - VT1, VIN - VT1 - VT2, VIN - VT1 - VT2 - VT3)



However, this signal degradation can be recovered by using a level restorer buffer (figure

10). Conventionally, a CMOS inverter is used at the end of the chain to restore the signal to logic

values 1 = VDD and 0 = 0V. This added inverter however leads to static dissipation.

Figure 10 Level Restoration Using CMOS Inverter [27] (the PTL AND output A.B drives a CMOS inverter, giving Y = n(A.B))

An important feature of PTL that needs to be addressed is that it uses complementary input signals. In accordance with that fact, a number of design methods have been introduced, such as CPL, LEAP and Dual PTL.

2.3.2. Complementary Pass-Transistor Logic (CPL)

Complementary Pass Transistor Logic (CPL) is based on true and complementary signals at both the input and output ends. The operation is based on the PTL AND gate discussed above (figure 8b). The logic is also known as differential pass transistor logic because of the complementary outputs. Figure 11 shows the AND/NAND and OR/NOR gates. They follow the same topology, with the input signal combinations defining the type of logic operation [20]. Furthermore, an XOR/XNOR gate could also be derived from the same topology.

Figure 11 Pass Transistor Logic Circuits a) AND/NAND b) OR/NOR (each output is formed by two nMOS pass transistors gated by B and nB, followed by an inverter)
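The idea that one CPL topology yields different gates purely through its input wiring can be sketched in Python. The model below is an idealised switch-level view that omits the level restoring inverters and the VT drop; the function names and the wiring table are assumptions inferred from the signal labels of the figures, not a transcription of the schematics.

```python
def cpl_cell(b: int, pass_when_b: int, pass_when_nb: int) -> int:
    """Shared CPL pass network: one nMOS gated by B, one gated by nB."""
    return pass_when_b if b else pass_when_nb


def cpl_gate(kind: str, a: int, b: int) -> tuple:
    """Return (true_output, complementary_output) of the pass networks."""
    na, nb = 1 - a, 1 - b
    # For each gate: (inputs of the true branch), (inputs of the
    # complementary branch); each pair is (passed when B=1, passed when B=0).
    wiring = {
        "AND": ((a, b), (na, nb)),
        "OR": ((b, a), (nb, na)),
        "XOR": ((na, a), (a, na)),
    }
    (t_b, t_nb), (c_b, c_nb) = wiring[kind]
    return cpl_cell(b, t_b, t_nb), cpl_cell(b, c_b, c_nb)


if __name__ == "__main__":
    for kind in ("AND", "OR", "XOR"):
        for a in (0, 1):
            for b in (0, 1):
                y, ny = cpl_gate(kind, a, b)
                print(f"{kind}: A={a} B={b} -> Y={y} nY={ny}")
```

Only the entries of the wiring table change between gates, which mirrors the statement that the input signal combinations define the logic operation.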

The main feature of CPL is that it offers a simple Full-Adder implementation. The simple design of the XOR/XNOR gate makes it very easy to design a 2-input Full-Adder. This Full-Adder is used in this project, and a detailed discussion is included in section 2.5.1.

The first major publication on CPL implementation appeared in 1989 [7]. The researchers from Hitachi Research Laboratory proposed a 3.8 ns CPL multiplier (16x16) in 0.5 µm technology. It was reportedly the fastest multiplier at the time of publication. The research concluded that, because of low static power dissipation and smaller circuit capacitances, CPL is more efficient in terms of power consumption and speed. When compared to transmission-gate logic (TG), research [21] shows a similar result in terms of speed efficiency for CPL. However, the study is based on 2-input basic logic cells only.

2.3.3. Dual Pass-Transistor Logic (DPL)

Dual Pass-Transistor Logic (DPL) overcomes the CPL threshold voltage drop when passing

logic 1. Unlike CPL logic which uses a CMOS inverter to overcome the voltage drop, DPL uses pMOS

logic in parallel with nMOS. Figure 12 shows DPL AND/NAND gate. In this approach, the pMOS

transistor passes logic 1 without any threshold loss while logic 0 is passed by nMOS transistor [20].

Figure 12 AND/NAND Logic gate in DPL

Similarly to CPL, DPL offers a very efficient Full-Adder design. Other logic gates such as OR/NOR and XOR/XNOR could also be designed effectively. Furthermore, the circuit capacitance in DPL is equally distributed for each output as well as for the inputs [6] and [20]. The researchers in [6] successfully designed a 32-bit ALU based on 0.25 µm technology and reported that the ALU is 30% faster than the CMOS version. The research also proposed a carry propagation circuit to resolve the signal propagation issue, which is a major concern for PTL design.

2.3.4. LEAP and Other PTL Styles

Lean Integration with Pass Transistor (LEAP) was introduced in 1996 in [3]. The researchers successfully developed a smart and small PTL-based cell library (7 cells) with a synthesis tool referred to as cell inventor. The main objective of the research was to optimize area, speed and power in digital design. The outcome of [3] indicates that LEAP meets all the primary objectives. Furthermore, LEAP was more cost effective compared to CMOS. Along with 4 different inverters used to meet the drive requirement, the cell library consists of 3 logic cells, Y1, Y2 and Y3 (figure 13). These 3 cells are capable of executing basic logic functions with different numbers of input signals as necessary. The Y3 cell is used in this project for the 4-input MUX, which is further discussed in chapter 3. A further study [22] on the LEAP cell library focused on the synthesis algorithm.

Figure 13 Basic Cells for Logic Operation in LEAP [3] (cells Y1, Y2 and Y3)

Further research on PTL technology similarly emphasized the synthesis algorithm of basic cells [5], [23], [24] and [25]. In these projects, a complete cell library was designed using MUX gates only. The MUX cells adopted the same circuit topology as the Y3 cell of LEAP technology (figure 13). All the MUX gates were associated with different drive inverters.
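A library built from MUX gates only is possible because a 2-input multiplexer can realise any basic logic function when its data inputs are tied to constants or to the other operand. The following Python sketch illustrates this at the logic level; the function names are hypothetical and the mapping is the standard textbook construction, not necessarily the one used in [5], [23], [24] or [25].

```python
def mux2(sel: int, d0: int, d1: int) -> int:
    """2-input multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def mux_not(a: int) -> int:
    return mux2(a, 1, 0)           # select 0 when A=1, 1 when A=0


def mux_and(a: int, b: int) -> int:
    return mux2(a, 0, b)           # A=0 -> 0, A=1 -> B


def mux_or(a: int, b: int) -> int:
    return mux2(a, b, 1)           # A=0 -> B, A=1 -> 1


def mux_xor(a: int, b: int) -> int:
    return mux2(a, b, mux_not(b))  # A=0 -> B, A=1 -> nB


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, mux_and(a, b), mux_or(a, b), mux_xor(a, b))
```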

2.3.5. Merits and Demerits of PTL

As mentioned earlier, the key benefit of the PTL design style is that it requires fewer transistors than CMOS design [3], [6] and [7] and is hence easier to design. Furthermore, PTL is comparatively power efficient in terms of both static and dynamic consumption. Ideally, PTL designs do not have a direct path from the power rail to the ground rail, provided that no inverters are used. Therefore, no gate current is induced, which is the main contributor to static power dissipation. This leads to better speed of operation for PTL [3] and [7]. Furthermore, the lower number of transistors leads to reduced dynamic power dissipation. Expression (10) shows the equation of dynamic power dissipation [14]. PTL designs have a lower number of switching nodes and subsequently lower node capacitance, which is why PTL has low dynamic power consumption. As the PTL devices do not define the drive of the gates, transistor sizes are kept to a minimum, which also leads to lower circuit capacitance and hence lower dynamic dissipation. Moreover, due to the reduced voltage swing, PTL requires low switching energy [27].

$P_{dynamic} = \alpha C V_{DD}^{2} f$, where $\alpha$ is the switching activity factor (10) [14]
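The quadratic dependence of dynamic power on supply voltage is the reason that lowering VDD is so effective. The Python sketch below assumes the usual form of the dynamic power equation, P = alpha * C * VDD^2 * f; the capacitance, activity factor and frequency values are hypothetical and chosen only for illustration, while the two supply voltages are the ones used for figure 6.

```python
def dynamic_power(alpha: float, c_farads: float,
                  vdd_volts: float, freq_hz: float) -> float:
    """Dynamic power in watts, assuming P = alpha * C * VDD^2 * f."""
    return alpha * c_farads * vdd_volts ** 2 * freq_hz


if __name__ == "__main__":
    alpha = 0.1          # hypothetical switching activity factor
    c_node = 10e-15      # hypothetical switched capacitance (10 fF)
    freq = 1e6           # hypothetical clock frequency (1 MHz)

    p_nominal = dynamic_power(alpha, c_node, 1.8, freq)
    p_sub = dynamic_power(alpha, c_node, 0.3, freq)
    print(f"P at 1.8 V: {p_nominal:.3e} W")
    print(f"P at 0.3 V: {p_sub:.3e} W")
    print(f"Reduction factor: {p_nominal / p_sub:.1f}")
```

The comparison holds the frequency fixed; in practice the achievable frequency also falls in sub-threshold, so the sketch only isolates the voltage term.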

However, PTL design styles require major modification of the process technology, and hence the cost of fabrication increases, since most of the aforementioned studies use specific low threshold voltage MOS devices [21]. Zimmermann identified in his research [26] that the previous works on PTL focused on developing Full-Adders only, which are relatively easier to design in CPL or DPL than with the least efficient CMOS approach. Furthermore, the design topology of PTL requires immense design effort, and the layout of such a design is complicated as well. In fact, the outcome of research [26] is based on a variety of digital applications in CMOS, which does not thoroughly cancel out the merits of PTL design.

2.4. Sub-Threshold Pass-Transistor Logic

A number of studies have been conducted on sub-threshold voltage implementation and pass-transistor logic separately, for the optimization of different parameters such as speed, power consumption and area. However, there is only a limited amount of research on combining both techniques. Most of this research concentrates on circuit performance in terms of speed for different design techniques. In [16], Moalemi and Afzali-Kusha examined the dependency of propagation delay on temperature for different sub-threshold PTL designs. Speed is a major concern for such sub-threshold designs. However, the result of [16] is not comprehensive since it investigated only XOR gates. Moreover, the research ignored the resistive component of input capacitance for a series chain of pass-transistors and carried out the test with ideal load capacitors only.

Other studies focused on sub-threshold PTL from the perspective of reducing power consumption. In [19], the researchers analysed a Dynamic Threshold MOS (DTMOS). The gate terminal of such a device is shorted to the body (figure 14). This connection allows the threshold voltage to change depending on the gate voltage. In this method, however, the threshold voltage changes along with the supply voltage, and hence this approach cannot be categorised as sub-threshold design. Furthermore, each DTMOS requires its body to be isolated, which gives rise to design complexity.



Figure 14 DTnMOS and DTpMOS Circuit in DTMOS Mode

As mentioned earlier, many researchers declared different types of PTL design to be more energy efficient than CMOS design. Moreover, sub-threshold implementation is capable of optimizing the design for minimal power consumption. The combination of these two techniques indicates a substantially power efficient design at the cost of speed. Therefore, sub-threshold PTL design could be greatly beneficial for self-powered, energy-constrained applications where power is a scarce resource and performance is not the main concern.

2.5. Basic Circuits

The previous project [2] developed a hierarchical Accumulator-Adder and compared the power consumption of PTL and CMOS designs. For this purpose, it created a total of 6 PTL basic circuits and another 5 CMOS circuits. The following section includes the design details and features of each of the basic circuits from [2].

2.5.1. PTL Logic Circuits

AND/NAND, OR/NOR and XOR/XNOR

The design of these basic circuits is based on the CPL method discussed earlier in section 2.3.2. All the circuits use the same circuit topology (figure 15). It is the input combinations which determine the function of the circuits. Because of the differential design, the circuits have complementary inputs and outputs. This eliminates the need for additional inverters, which is often a requirement for static CMOS design. Moreover, the XOR and XNOR gates have 4 transistors only, which makes the design very simple compared to their CMOS counterparts. Each design has a level restoring inverter for recovering the voltage level of logic 1 to Vdd. The transistor sizes of the inverter are selected such that they provide balanced minimum delay while at the same time providing sufficient drive [14]. The sizes of the pass transistors are kept to a minimum since they do not define the drive of the gate. This also minimizes the circuit capacitance, which in turn reduces dynamic power consumption [14].

Figure 15 PTL Basic circuits a) AND, NAND b) OR and NOR and c) XOR and XNOR (transistor labels: pass transistors W = 0.4u, L = 0.35u; level restoring inverter transistors W = 3.3u and W = 0.85u, L = 0.35u)

D-Type Flip Flop

The design of D-type flip flop [2] is based on the proposed version by Hsiao in [4] with a

slight modification on the circuit component. Figure 16 shows the flip flop designed in [2].



Figure 16 Resettable D-Type Flip Flop Based on PTL (signals: D, Clock, nClock, nReset, Q, nQ; transistor labels: W = 0.4u, 1u, 0.85u and 3.3u, all with L = 0.35u)

The designer in [2] modified the original design in order to use the flip flop at sub-threshold voltage. The feedback pMOS transistor, used in [4] to improve the performance of the inverter and increase speed, was removed. This is because the author of [2] claimed that the pMOS caused the inverter to be in permanent pull-up mode in sub-threshold, and hence the circuit was not operational. Moreover, the circuit in [4] has pMOS and nMOS clock transistors. The author of [2] observed that the pMOS caused significant delay in the circuit, leading to incorrect non-synchronous operation. Therefore, the pMOS clock transistor was replaced by an nMOS transistor, which enables the edge triggering of the flip-flop. Moreover, the whole project was inspired by the nMOS pass-transistor, as the author claimed [2]. The transistor sizes in the flip flop are the same as in the other basic logic circuits.

2-Input Multiplexer

The 2-input multiplexer is a very simple circuit consisting of 2 nMOS transistors. This is the most commonly used multiplexer in the PTL method, especially in CPL and LEAP. The inputs of the circuit are controlled by the complementary control signals Load and nLoad. This multiplexer is used with the D-type flip flop to design a load register (figure 17). According to the author, no level restoring inverter is used with the multiplexer because it is loaded with only a small capacitance from the D-type flip flop.

Figure 17 PTL Load Register using 2-Input MUX and D-Type Flip Flop (MUX2 pass transistors: W = 0.4u, L = 0.35u)

Load Register

As mentioned earlier, the register is designed by connecting the 2-input multiplexer to the D-type flip flop as shown in figure 17. When the Load signal is enabled (logic 1), the register is updated with the value of the input signal D; otherwise it retains the value from the previous stage.
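The behaviour of the load register can be summarised by a small behavioural model. The Python sketch below is an illustration only: it assumes rising-edge triggering and an active-low asynchronous reset (suggested by the name nReset but not stated explicitly in the text), and the class and method names are hypothetical.

```python
class LoadRegister:
    """Behavioural model of a 2-input MUX feeding a D-type flip flop."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clock = 0

    @property
    def nq(self) -> int:
        """Complementary output nQ."""
        return 1 - self.q

    def step(self, clock: int, d: int, load: int, n_reset: int = 1) -> int:
        """Apply one set of input values and return the stored value Q."""
        if n_reset == 0:
            # Assumption: nReset is active low and overrides the clock.
            self.q = 0
        elif clock == 1 and self._prev_clock == 0:
            # Rising edge: the MUX selects D when Load is 1, else feeds back Q.
            self.q = d if load else self.q
        self._prev_clock = clock
        return self.q


if __name__ == "__main__":
    reg = LoadRegister()
    # (clock, d, load) sequence: load a 1, then hold it while D changes.
    for clock, d, load in [(0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 0, 0)]:
        print(f"clock={clock} D={d} Load={load} -> Q={reg.step(clock, d, load)}")
```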

Full-Adder

The PTL Full-Adder is one of the major benefits of PTL-based design because it is easy to design and offers effective circuit functionality. Figure 18 shows a classic Full-Adder circuit based on the PTL AND/NAND, OR/NOR and XOR/XNOR circuits. It appeared in a number of publications [26], [20] and [7] and was analysed successfully. Moreover, the publications also concluded that this PTL version is faster and more energy efficient than any CMOS version. With all input signals being differential, the Full-Adder can provide complementary outputs for the sum signal S and the carry-out signal Cout.

Figure 18 PTL Based Full-Adder (inputs A, nA, B, nB, Cin, nCin; outputs S, nS, Cout, nCout; pass transistors W = 0.4u, L = 0.35u; output inverters Wn = 3.3u, Wp = 1.85u)
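The logic function realised by a Full-Adder with complementary outputs can be written down compactly. The Python sketch below uses the standard Boolean equations for sum and carry; it is a logic-level illustration and does not reproduce the transistor-level structure of figure 18, and the function name is hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """Return (S, nS, Cout, nCout) for one-bit inputs A, B and Cin."""
    p = a ^ b                      # propagate term from the XOR stage
    s = p ^ cin                    # sum
    cout = (a & b) | (cin & p)     # carry-out
    return s, 1 - s, cout, 1 - cout


if __name__ == "__main__":
    print(" A B Cin | S nS Cout nCout")
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, ns, cout, ncout = full_adder(a, b, cin)
                print(f" {a} {b}  {cin}  | {s}  {ns}   {cout}     {ncout}")
```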

2.5.2. CMOS Logic Circuits

AND Gate

Figure 19 shows the classic CMOS logic circuit for a 2-input AND gate. The AND gate includes a classic CMOS inverter. The transistor sizes of the inverter are kept the same as the ones used in the PTL design. This allows the CMOS structures to be compared with their PTL counterparts under realistic conditions. In fact, all the CMOS circuits except the D-type flip flop have the same nMOS and pMOS transistor sizes as the inverter. This is because a pMOS to nMOS ratio from 1.4 to 2 is proven to provide minimum delay and sufficient drive [14].

Figure 19 Classic CMOS 2-Input Logic Circuit for AND Gate (transistor labels: W = 3.3u and W = 1.85u in the NAND stage, W = 3.3u and W = 0.85u in the inverter, all with L = 0.35u)



D-Type Flip Flop

The CMOS version of the D-type flip flop is shown in figure 20. The circuit has a reset input signal (nReset) and is triggered at the rising edge of the clock, which is similar to its PTL counterpart. The circuit consists of six NAND gates: three 2-input gates, two 3-input gates and one 4-input gate. The design of the flip-flop is an optimized style of a typical Master-Slave circuit [28]. Although the input signal nD is inverting, the output signal is differential. In the typical design approach, the pMOS transistor is bigger than the nMOS transistor. This sizing, however, was not operational in sub-threshold, since the circuit did not respond to the positive edge of the Clock signal, as reported in [2]. The researchers in [9] also reported a similar incident for sub-threshold voltage and suggested resizing the flip flop with nMOS transistors bigger than the pMOS ones. Therefore, the transistors were resized as shown in figure 20 (Wp = 1.85 um and Wn = 3.3 um), and the flip flop was observed to be operational at the positive edge of the Clock signal [2].

Figure 20 CMOS D-type Flip Flop with Master-Slave Configuration [28] (signals: nD, Clock, nReset, Q, nQ; each gate sized Wp = 1.85u, Wn = 3.3u)

2-Input Inverting Multiplexer

The two-input multiplexer circuit in CMOS design is shown in figure 21a. The input signal nLoad and the output signal nD are inverting, which compensates for the inverting input of the D-type flip flop. This inverting output also removes the need for an additional inverter at the output when one would otherwise be required for circuit operation.

Figure 21 a) 2-Input CMOS Multiplexer Circuit with Inverting Output b) Circuit Symbol [28] (inputs D, Q, Load, nLoad; output nD; transistor labels: W = 1.85u and W = 3.3u in the multiplexer, W = 3.3u and W = 0.85u in the inverter, all with L = 0.35u)



Load register

The design of CMOS load register is similar to the PTL version with slight modification. Figure

22 shows that the Load Register uses inverting multiplexer in order to compensate for the inverting

input of the modified version of D-type flip flop, as mentioned earlier. The operation of the register

is similar, with the input signal D being stored at the positive edge of Clock signal.

Figure 22 CMOS Load Register (inverting MUX2 driving the DTYPE flip flop; signals: D, Load, Clock, nReset, Q, nQ)

Full-Adder

The Full-Adder circuit shown in figure 23 is a classic version of the CMOS design. Although it requires a total of 28 transistors, it is the most optimized version in terms of performance and the number of transistors required [14], [20] and [26]. Transistor size ratios are kept similar to those of the basic logic circuits, which are 3.3um/0.35um for pMOS and 1.8um/0.35um for nMOS.

Figure 23 CMOS Full-Adder Circuit with Transistor Sizes [26] (inputs A, B, Cin; outputs S, Cout; transistor labels: W = 1.85u, 3.3u and 0.85u, all with L = 0.35u)

2.6. Arithmetic Logic Unit (ALU)

The Arithmetic Logic Unit (ALU) is one of the fundamental building blocks of a typical microprocessor. The ALU performs both arithmetic and logic functions. Therefore, it consists of basic functional components such as an Adder and AND, OR and XOR gates, among others. Each functional component offers one type of operation. For example, the adder in an ALU performs the add operation. However, a combination of multiple units is also required for a few specific operations, such as subtraction, which requires both XOR gates and the Adder to carry out the calculation.
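The way XOR gates and an adder combine to perform subtraction can be sketched as follows. The Python model below assumes two's complement arithmetic, in which each bit of B is XORed with a subtract control signal and the same signal is used as the carry-in; the function name, the 8-bit default width and the control signal name are assumptions made for illustration.

```python
def add_sub(a: int, b: int, subtract: int, width: int = 8) -> int:
    """Ripple-carry add or subtract of two unsigned width-bit values.

    subtract = 0 computes A + B; subtract = 1 computes A - B, both
    modulo 2**width.
    """
    carry = subtract               # carry-in of 1 completes the two's complement
    result = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ subtract   # XOR gate conditionally inverts B
        s = a_i ^ b_i ^ carry             # full-adder sum
        carry = (a_i & b_i) | (carry & (a_i ^ b_i))   # full-adder carry-out
        result |= s << i
    return result


if __name__ == "__main__":
    print(add_sub(9, 5, subtract=0))   # addition
    print(add_sub(9, 5, subtract=1))   # subtraction
```

A negative difference wraps around modulo 2**width, so its interpretation as a signed number is left to the surrounding logic.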

2.6.1. ALU Design

The goal of this project is to develop an ultra-low power ALU. Therefore, the design of the ALU is influenced by low power implementation. However, there are many approaches to reducing the power consumption of an ALU or, in general, of digital circuits. At the low design level, the transistor sizing method is used to minimize circuit capacitance. Technology mapping is another process, applied at the logic gate level. Different algorithms have been developed for different ALU architectures targeted at power reduction. At the system level and register transfer level (RTL), power gating and clock gating are two popular techniques. Among the other popular techniques, Dynamic Voltage Scaling (DVS) is widely used in portable devices, as discussed in chapter 2.

Another possible approach is structural level customization. A number of customizations have been proposed and implemented for the performance enhancement of digital designs. However, most of the projects, such as the University of Illinois Illiac 2 project, the IBM Stretch project and [29], emphasized performance. On the other hand, a few studies [30] and [31] have proposed structural level power minimization techniques.

There are two basic methods for the structural design of an ALU, namely the chain method and the tree method. The following sections briefly describe the two techniques.

2.6.1.1. Tree Structure

In tree structures, functional components are connected in parallel through a multiplexer. Figure 24 shows an ALU with Adder, AND, OR and XOR gates connected in parallel through a 4-input multiplexer (MUX). Depending on the value of the MUX control signals, the ALU output is selected from the results of all the functional components.

Figure 24 Tree Structure Design [30] (ADDER, AND, OR and XOR blocks with inputs A and B feed inputs D1 to D4 of a MUX4 controlled by S0 and S1; output Q)

This structure requires more area. Furthermore routing of signals is complicated which

makes the layout difficult. However, the circuit operation is faster.

2.6.1.2. Chain Structure

In the chain structure, the larger multiplexer is replaced by a chain of smaller multiplexers, typically 2-input MUXes (figure 25). The first stage of the chain starts with two arbitrary functional components whose outputs are connected to the first MUX. The MUX output is then connected to one of the two inputs of the next-stage MUX. The other input is occupied by the output of another functional component (figure 25). Due to the concatenation, some of the component outputs have to travel along a longer transmission path.



Figure 25 Chain Structure Design [30] (ADDER and AND feed the first MUX2, controlled by S0; OR joins at the second MUX2, controlled by S1; XOR joins at the third MUX2, controlled by S2; output Q)

The chain structure requires a smaller design area. Moreover, chain structures offer a variety of ways to place components. This in turn can be utilized to reduce power by placing frequently used functional components closer to the output. However, circuit operation is relatively slower compared to the tree connection because of the chain structure.
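The difference between the two structures can be illustrated with a behavioural model. In the Python sketch below, the select encodings, the function names and the ordering of the components (Adder, AND, OR, XOR) are assumptions for illustration; `chain_depth` counts how many 2-input MUX stages a component output passes through in the chain arrangement.

```python
def mux2(sel: int, d0: int, d1: int) -> int:
    """2-input multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def alu_tree(results: list, s1: int, s0: int) -> int:
    """Tree structure: a single 4-input MUX selects one of four results."""
    return results[2 * s1 + s0]


def alu_chain(results: list, s0: int, s1: int, s2: int) -> int:
    """Chain structure: three cascaded 2-input MUXes."""
    stage1 = mux2(s0, results[0], results[1])
    stage2 = mux2(s1, stage1, results[2])
    return mux2(s2, stage2, results[3])


def chain_depth(index: int, n_components: int = 4) -> int:
    """Number of MUX2 stages between component `index` and the output."""
    return n_components - 1 if index == 0 else n_components - index


if __name__ == "__main__":
    a, b = 6, 3
    results = [a + b, a & b, a | b, a ^ b]   # Adder, AND, OR, XOR
    print("tree, select XOR :", alu_tree(results, 1, 1))
    print("chain, select XOR:", alu_chain(results, 0, 0, 1))
    print("chain depths     :", [chain_depth(i) for i in range(4)])
```

The depth list shows why placing a frequently used component last in the chain shortens its path to the output, while the first two components traverse every stage.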



CHAPTER 3 BASIC CIRCUITS DESIGN AND CHARACTERISATION

In order to achieve the project goal, it was essential to develop a sub-threshold ALU in both PTL and CMOS logic and to compare the two designs in terms of power consumption. However, the basic circuits available from the previous project [2] were inadequate for designing a large hierarchical circuit block like an ALU. Therefore, an additional 8 basic CMOS logic circuits and 1 PTL circuit were designed. This chapter includes the design details, functionality and characterisation of the additional circuits.

3.1. Design

All the design work in this project was carried out in the Cadence AMS 0.35 µm process. This technology was chosen for two specific reasons. Firstly, the Spectre simulator included in this process can provide very detailed and precise simulation of analogue circuits with a user friendly interface. Most importantly, it can characterise the MOS devices from its own library for sub-threshold operation. Secondly, this technology is well known and has been widely used for years in custom processor design, while providing a cost effective solution for such complex designs.

3.1.1. PTL Circuit

4-Input Multiplexor

A 4-input multiplexer is an essential part of digital circuit blocks. Figure 26 shows a PTL 4-input multiplexer (MUX4). The sizes of the pass transistors, which are also shown in figure 26, are kept the same as in the other PTL circuits. This circuit is adapted from the Y3 circuit of LEAP (Lean Integration with Pass-Transistor) technology [3], which was discussed previously in section 2.3.4. The Y3 circuit is a generic PTL logic circuit which can be utilised for multiple logic operations with different input signal combinations. The proper combination of complementary input control signals (Load1, nLoad1, Load2 and nLoad2) enables the circuit to operate as a MUX4 for the data input signals (D1, D2, D3 and D4). Since the output of the Y3 circuit is inverted, an additional inverter is added to the output to generate the non-inverted output signal. Moreover, analysis showed that, without the additional inverter, the output of the Y3 circuit is degraded for a sub-threshold supply. The transistor sizes of the inverters are explained in the following section of this chapter.

Figure 26 A 4-Input Multiplexor with Transistor Sizes in LEAP Technology [3] (data inputs D1 to D4; control signals Load1, nLoad1, Load2, nLoad2; output Y; pass transistors W = 0.4u, L = 0.35u; inverters Wp = 3.3u, Wn = 1.85u)

3.1.2. CMOS Circuits

Inverter

In the previous project, the inverter was used as an integrated part of logic circuits such as the AND and Full-Adder circuits. However, no separate circuit was designed and characterised for inverter operation. Moreover, the inverter is used extensively in larger design blocks. Therefore, an inverter circuit was designed and characterised (figure 27). The transistor sizes of this inverter are the same as the ones used in the other basic circuits and, as explained earlier, this set of transistor sizes can provide balanced minimum delay and yet sufficient drive [14]. In addition, the project [2] used an inverter with the same transistor ratio in sub-threshold without any flaws being reported. Both the PTL and CMOS circuits use this same inverter.

Figure 27 CMOS Inverter with Transistor Size (input A, output Y; transistor labels W = 3.3u and W = 1.85u, L = 0.35u)

AND, OR and NOR

The circuits of figure 28 show classic 2-input CMOS designs for the AND, OR and NOR gates with the transistor sizes used in this project and the previous one [2]. The AND gate is derived from the previously designed NAND gate [2] by adding an inverter to it. The sizes of the transistors in these circuits are kept the same as in the inverter used in the PTL circuits, in order to compare the CMOS designs with their PTL counterparts.

Figure 28 Classic CMOS Logic Circuits with Transistor Sizes a) AND Gate b) OR Gate and c) NOR Gate (transistor labels: W = 3.3u and W = 1.85u, L = 0.35u)

XOR and XNOR

The design of these two logic circuits is based on NAND gates, which is a classic method in the CMOS process. Figure 29 shows the symbol diagrams of the two logic circuits. As in the other CMOS circuits, the same transistor sizes are maintained.

[Figure 29 symbol diagram labels: inputs A and B, output Q; panels a and b. Every gate: Wp=3.3u, Wn=1.85u.]

Figure 29 Symbol Diagram of Classic CMOS Logic Circuits a) XOR Gate b) XNOR Gate
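The NAND-based construction can be illustrated with a short behavioural sketch. It assumes the common four-NAND XOR arrangement and obtains XNOR by inverting the XOR output with a fifth NAND whose inputs are tied together; the exact gate arrangement of figure 29 is not recoverable from the text, so the topology and the function names here are assumptions.

```python
def nand(a: int, b: int) -> int:
    """2-input NAND on logic values 0/1."""
    return 0 if (a and b) else 1


def xor_from_nand(a: int, b: int) -> int:
    """XOR built from four NAND gates (assumed topology)."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    return nand(n2, n3)


def xnor_from_nand(a: int, b: int) -> int:
    """XNOR as the XOR above followed by a NAND wired as an inverter."""
    x = xor_from_nand(a, b)
    return nand(x, x)


# Print the truth table for both gates.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_from_nand(a, b), xnor_from_nand(a, b))
```

This is a logic-level model only; it says nothing about delay or power in sub-threshold operation.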



4-Input Multiplexor

The 4-input multiplexor is designed using three 2-input multiplexors cascaded as shown in figure 30. The 2-input MUXs are adopted from the project [2]. As mentioned earlier, the 2-input MUX has an inverting output. However, in the connection shown in figure 30, the signal travels from input to output through two inverting MUXs, and hence the signal obtained at the output is non-inverting. This eliminates any requirement for an additional inverter to obtain a non-inverting output signal.

[Figure 30 block diagram labels: data inputs D1, D2, D3, D4; select signals S0 and S1; output Q; three MUX2 blocks, each with Wp=3.3u, Wn=1.85u.]

Figure 30 A 4-Input CMOS Multiplexor Designed from 2-Input CMOS Multiplexors with

Transistor Sizes
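The double-inversion argument can be checked with a small behavioural sketch. It assumes that S0 drives both first-stage multiplexors and S1 drives the output stage, and that a select value of 0 picks the first listed input; the function names and this select polarity are illustrative assumptions, not taken from the schematic.

```python
from itertools import product


def mux2_inv(d0: int, d1: int, s: int) -> int:
    """Inverting 2-input multiplexor: returns NOT of the selected input."""
    selected = d1 if s else d0
    return 1 - selected


def mux4_cmos(d1: int, d2: int, d3: int, d4: int, s0: int, s1: int) -> int:
    """Three cascaded inverting MUX2 cells; two inversions cancel out."""
    first = mux2_inv(d1, d2, s0)    # inverted D1 or D2
    second = mux2_inv(d3, d4, s0)   # inverted D3 or D4
    return mux2_inv(first, second, s1)  # inverted again, so non-inverting


def is_non_inverting() -> bool:
    """Exhaustively confirm the output equals the selected data input."""
    for d1, d2, d3, d4, s0, s1 in product((0, 1), repeat=6):
        expected = (d1, d2, d3, d4)[2 * s1 + s0]
        if mux4_cmos(d1, d2, d3, d4, s0, s1) != expected:
            return False
    return True


print(is_non_inverting())
```

Because every path from a data input to Q crosses exactly two inverting stages, no extra output inverter is needed in this model.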

Tri-State Buffer

The circuit of a tri-state buffer with transistor sizes is shown in figure 31 [35]. The transistor sizes are the same as in the other CMOS logic circuits. When the enable signal EN is at logic 1, the output follows the input signal (logic 0 or logic 1). When EN is set to logic 0, the output is in the high-impedance state.

[Figure 31 schematic labels: input IN, enable EN, output OUT. Transistor sizes: W=1.85u, L=0.35u and W=3.3u, L=0.35u. Inverter: Wp=3.3u, Wn=1.85u.]

Figure 31 Tri-State Buffer with Transistor Sizes [35]
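The enable behaviour described above can be written as a minimal behavioural sketch. Representing the high-impedance state with Python's None is a modelling assumption made here for illustration, as is the function name.

```python
from typing import Optional


def tri_state_buffer(inp: int, en: int) -> Optional[int]:
    """Return the input value when enabled, None (high impedance) otherwise."""
    if en:
        return inp
    return None  # assumed stand-in for the high-impedance (Z) state


# Print the behaviour for every combination of EN and IN.
for en in (0, 1):
    for inp in (0, 1):
        out = tri_state_buffer(inp, en)
        print(en, inp, "Z" if out is None else out)
```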

3.2. Characterisation

The basic logic circuits designed in this project and the other basic circuits from the previous project [2] were characterised in terms of propagation delay and power consumption (static and dynamic) under different simulation conditions: different ambient temperatures, supply voltages and fan-outs. However, the simulations were mainly carried out on the PTL basic circuits and only a few CMOS circuits were characterised. This is because the project was intended to design complex and practical circuits based on the PTL method only. Moreover, the CMOS basic circuits were already characterised in the project [2], and this project was planned to avoid repeating that work and to progress further towards the ultimate goal of an ultra-low power custom processor design. Nevertheless, ALU modules (both 1-bit and 8-bit versions) were designed and simulated in PTL and CMOS versions to show the difference in power consumption, which is discussed in chapter 4. All the supply voltages used for simulation were in the sub-threshold region except for 0.6V. Figure 32 shows the simulation tree for the characterisation of the PTL basic circuits under different simulation conditions.

[Figure 32 simulation tree: all simulations at supply voltages of 0.3V, 0.4V, 0.5V and 0.6V. Delay test (PTL cells: AND/NAND, OR/NOR, XOR/XNOR, Load Register; CMOS cells: Inverter) at -20°C, 27°C and 85°C, each with FO = 0, 2 and 4. Power consumption test (PTL cells: AND/NAND, OR/NOR, XOR/XNOR, MUX4, Full Adder, Load Register; CMOS cells: Inverter, Tri-State Buffer): static at -20°C, 27°C and 85°C, dynamic at 27°C and 85°C, each with FO = 1 and 4.]

Figure 32 Simulation Tree Diagram for Characterisation of Basic PTL Circuits (Adapted from [2])

3.2.1. Propagation Delay Measurement

Propagation delay is an important design characteristic of a logic circuit. For design and validation purposes, this parameter must be available to the design engineer. In this project, propagation delay was measured for the PTL logic circuits AND/NAND, OR/NOR, XOR/XNOR and load register; the only CMOS logic circuit simulated was the inverter. These 5 circuits were used frequently in the design of other circuits.

The propagation delay of a circuit is defined as the average (11) of the delays at the rising edge and the falling edge of the output signal [14], as illustrated in figure 33b. Figure 33a shows a generalised test circuit for delay measurement with different fan-outs. Measurement of the fan-out 0 circuit provides the parasitic delay of a logic circuit. For comprehensive characterisation, tests were also carried out for fan-out 2 and 4. Fan-out 4 circuits were specifically used because digital logic circuits show realistic characteristics for a minimum of 4 fan-out connections. The supply voltages used for the simulations were 0.3V, 0.4V, 0.5V and 0.6V. Except for 0.6V, all of these are sub-threshold voltages; supply voltage is the major simulation variable for this project. Another important variable is temperature, which influences the performance of circuits. It has already been discussed in chapter 2 that sub-threshold circuits have a strong dependence on temperature [18]. All the test circuits were run at three different temperatures: -20°C, 27°C and 85°C. This temperature range (-20°C to 85°C) does not necessarily cover all operational temperatures but is wide enough to examine the temperature effect in sub-threshold. Figure 32 shows the simulation tree diagram used for the basic circuits.

$$t_{p} = \frac{t_{p}(\text{rise}) + t_{p}(\text{fall})}{2} \tag{11}$$

[Figure 33 labels: a) pulse stimulus driving Vin, output Vout loaded with FO = 0, 2 and 4; b) Vin and Vout waveforms against time, with the delay at rise and the delay at fall measured between the 50% Vdd points.]

Figure 33 a) Generalized Simulation Setup for Propagation Delay Measurement b) Definition of

Propagation Delay (Adapted from [2])
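The 50% Vdd definition of figure 33b can be turned into a small post-processing sketch for sampled waveforms. This is not the measurement script used in the project; the function names, the linear interpolation between samples and the example waveform values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def crossings(t: Sequence[float], v: Sequence[float],
              level: float) -> List[Tuple[float, str]]:
    """Times at which v crosses level, found by linear interpolation."""
    found = []
    for k in range(1, len(t)):
        v0, v1 = v[k - 1], v[k]
        if (v0 < level <= v1) or (v0 > level >= v1):
            tc = t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
            found.append((tc, "rise" if v1 > v0 else "fall"))
    return found


def propagation_delay(t: Sequence[float], vin: Sequence[float],
                      vout: Sequence[float],
                      vdd: float) -> Tuple[float, Dict[str, float]]:
    """Average of the output rise and fall delays, measured at 50% Vdd."""
    half = 0.5 * vdd
    input_edges = [tc for tc, _ in crossings(t, vin, half)]
    delays: Dict[str, float] = {}
    for tc, kind in crossings(t, vout, half):
        earlier = [ti for ti in input_edges if ti <= tc]
        if earlier and kind not in delays:
            # Delay from the most recent input edge to this output edge.
            delays[kind] = tc - earlier[-1]
    # Raises KeyError if the waveform lacks a rising or a falling output edge.
    return 0.5 * (delays["rise"] + delays["fall"]), delays


# Illustrative (made-up) waveforms for a non-inverting cell, time in microseconds.
time_us = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
v_in = [0.0, 0.0, 0.4, 0.4, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
v_out = [0.0, 0.0, 0.0, 0.4, 0.4, 0.4, 0.4, 0.4, 0.0, 0.0, 0.0]
t_p, edge_delays = propagation_delay(time_us, v_in, v_out, vdd=0.4)
print(t_p, edge_delays)
```

The same routine works for an inverting cell, since each output edge is paired with the most recent input edge regardless of direction.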

3.2.2. Power Consumption Measurement

Power consumption measurement is the most important simulation procedure in this project, since the project objective is the design of an ultra-low power system. Static and dynamic consumption were measured for both PTL and CMOS logic circuits. Similarly to the delay measurement, this procedure was focused on PTL circuits such as AND/NAND, OR/NOR, XOR/XNOR, Full-Adder and Load Register. Only two CMOS logic circuits, the inverter and the tri-state buffer, were characterised. The simulations were carried out for different fan-outs and temperatures with different supply voltages, as shown in figure 32. A generalised test circuit is shown in figure 34. The circuit under test was powered by an external independent source and the current drawn from this source was measured.

Static Power Consumption

A logic circuit is said to be in static mode when the input signal does not change its state. The main sources of static power dissipation are gate leakage (tunnelling of electrons through the gate oxide), reverse-biased junction leakage (diode leakage between diffusion regions) and sub-threshold conduction (due to carrier diffusion for supply voltage smaller than threshold voltage) [14]. Since the project is designed for sub-threshold supply, static power dissipation is expected to be a significant source of power consumption. Moreover, it becomes more prominent because the PTL circuits are slow and require longer processing time. It has already been mentioned in chapter 2 that the level-restoring inverter used in PTL causes static power dissipation due to short-circuit conduction. Most of the PTL circuits in this project use this inverter, and therefore the short-circuit dissipation is also a major source of static dissipation in this project.

Static power dissipation is modelled with (12) for both PTL and CMOS circuits. Here,

measured current IDD is multiplied by the supply voltage VDD to derive static power consumption. The

current is measured directly from simulation result. Figure 32 shows the different simulation

conditions for a circuit under the test.

$$P_{static} = I_{DD} \times V_{DD} \tag{12}$$

Dynamic Power Consumption

Dynamic power dissipation is produced by the energy drawn from the supply for charging

and discharging of a logic circuit output node capacitance. Therefore, energy consumption depends

on the rate of change of state for output signals.

$$P_{dynamic} = V_{DD} \cdot \frac{1}{t_{2} - t_{1}} \int_{t_{1}}^{t_{2}} i_{DD}(t)\,dt \tag{13}$$

Measurement of dynamic consumption can be modelled with expression (13). Similar to the static measurement, the average value of the current can be measured directly from the simulation result. The Spectre simulator provides the average current over a particular simulation time, which is then multiplied by the circuit Vdd to obtain the dynamic dissipation. However, simulations for the dynamic dissipation were carried out for only two temperature values (27°C and 85°C), as shown in figure 32.

[Figure 34 labels: Pulse Stimulus, Vin, Cell Under Test, Vout, Vdd, Vdd Test, Average Current Measurement.]

Figure 34 General form of Simulation Circuit used for Current Measurements to Extract Static and

Dynamic Power Dissipation (Adapted from [2])
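Both power figures reduce to a supply current multiplied by the supply voltage, which the following sketch makes concrete. The simulator reports the average current directly, so the trapezoidal averaging shown here only illustrates what that average means; the function names and the sample current values are assumptions, not project data.

```python
from typing import Sequence


def static_power(i_dd: float, v_dd: float) -> float:
    """Static power: measured supply current times supply voltage."""
    return i_dd * v_dd


def average_current(t: Sequence[float], i: Sequence[float]) -> float:
    """Time-average of a sampled current using the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(t)):
        area += 0.5 * (i[k] + i[k - 1]) * (t[k] - t[k - 1])
    return area / (t[-1] - t[0])


def dynamic_power(t: Sequence[float], i: Sequence[float], v_dd: float) -> float:
    """Dynamic power: average supply current over the window times Vdd."""
    return v_dd * average_current(t, i)


# Illustrative (made-up) values: current in amperes, time in seconds.
time_s = [0.0, 1.0, 2.0, 3.0, 4.0]
i_dd_samples = [0.0, 2e-9, 0.0, 2e-9, 0.0]
print(static_power(5e-10, 0.4))
print(dynamic_power(time_s, i_dd_samples, 0.4))
```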

3.3. Presentation of Results

3.3.1. Propagation Delay

The results of the propagation delay measurements for the PTL logic circuits AND/NAND, OR/NOR, XOR/XNOR and Load Register and for the CMOS Inverter are presented in table 1. As shown in the simulation tree diagram of figure 32, the simulations were carried out for three different fan-out circuits (0, 2 and 4). Each fan-out circuit was tested at three different temperatures: -20°C, 27°C and 85°C. The supply voltages used for the simulations were 0.3V, 0.4V, 0.5V and 0.6V.

After a thorough observation of the data in table 1, the propagation delay characteristics can be summarised in three aspects. Firstly, the delay is strongly dependent on temperature. At the very low temperature of -20°C, all the circuits have thousands of microseconds of delay. However, an increase in temperature improves the delay by a factor of 10 to more than 1000. For example, for 0.4V Vdd and FO = 4, the AND/NAND gate has a delay of 524.91µs at -20°C, whereas at 85°C the delay reduces significantly to 2.53µs. The second aspect is the influence of supply voltage. In the deep sub-threshold region, all the logic circuits have very low speed. As Vdd increases gradually towards the circuit threshold, the speed of the circuits increases immensely. The third aspect is the fan-out of the circuits. For larger fan-out, the
