[ieee 2011 annual ieee india conference (indicon) - hyderabad, india (2011.12.16-2011.12.18)] 2011...



Page 1: [IEEE 2011 Annual IEEE India Conference (INDICON) - Hyderabad, India (2011.12.16-2011.12.18)] 2011 Annual IEEE India Conference - Clock gating — A power optimizing technique for

Clock Gating - A Power Optimizing Technique for VLSI Circuits

Jitesh Shinde (1), Dr. S.S. Salankar (2)

(1, 2) Department of Electronics & Telecommunication Engineering, J.L. Chaturvedi College of Engineering, Nagpur

(1) [email protected]

Abstract: Clock gating is one of the power-saving techniques used on the Pentium 4 processor and in next generation processors. Clock gating saves power by activating the clocks in a logic block only when there is work to be done. From the earliest days of the Pentium 4 processor design, power consumption was a concern. The clock gating concept is not a new one; however, the Pentium 4 processor used this technology to a large extent. Every unit on the chip has a power reduction plan, and almost every Functional Unit Block (FUB) contains clock gating logic. The work in this paper investigates the various clock gating techniques that can be used to optimise power in VLSI circuits at RTL level and the various issues involved in applying these power optimization techniques at RTL level.

Keywords: Clock Gating (CG), latch free clock gating, latch based clock gating, core dynamic power dissipation.

I. INTRODUCTION

With the advent of the consumer era and the popularity of mobile applications, power optimization is the mantra of the day. Designers go through several iterations to optimize power in order to achieve their power budgets. Though power should be optimized at all stages of the design flow, optimizations in early design stages have the greatest impact in reducing power [9, 10].

Clock power accounts for 50-70 percent of total chip power and is expected to increase significantly in the next generation of designs at 45nm and below. This is because power is directly proportional to the square of the voltage and to the frequency of the clock, as shown in the following equation:

$\text{Power} = \text{Capacitance} \times (\text{Voltage})^{2} \times (\text{Frequency})$

Hence, reducing clock power is very important. Clock gating is a key power reduction technique used by many designers and is typically implemented by gate-level power synthesis tools.
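The power equation above can be made concrete with a small calculation. The sketch below is illustrative only: the capacitance, voltage, and frequency values are hypothetical, and the `activity` factor (the fraction of cycles in which the clock is actually delivered) is an assumption added here to show the effect of gating; it does not appear in the equation as stated in the paper.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Switching power in watts: activity * C * V^2 * f.

    `activity` is an illustrative extension (fraction of cycles the
    clock toggles); with activity=1.0 this is the paper's equation.
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical clock-network figures, chosen only for illustration.
C_CLK = 100e-12   # 100 pF of switched clock capacitance
VDD = 1.2         # supply voltage in volts
F_CLK = 100e6     # 100 MHz clock

always_on = dynamic_power(C_CLK, VDD, F_CLK)
gated = dynamic_power(C_CLK, VDD, F_CLK, activity=0.4)  # clock gated off 60% of the time

print(f"ungated clock power: {always_on * 1e3:.2f} mW")
print(f"gated clock power:   {gated * 1e3:.2f} mW")
```

Because the voltage term is squared, halving the supply voltage would cut this figure by a factor of four, whereas gating reduces it only in proportion to the idle time.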

RTL Clock Gating is the most commonly used optimization technique for reducing dynamic power. The challenge of optimizing power by adding clock gating is knowing where and when to insert clock gating. The traditional method of looking at the percentage of registers that are clock gated is not indicative of the power savings because it does not take into account switching activity. The average Clock-Gating Efficiency for a design is a much better indicator of dynamic power consumption because it is a measure of both how many and how long registers are gated.

II. CLOCK GATING

Clock gating, probably one of the best-known low-power techniques, is very effective in reducing the power consumption of digital VLSI circuits. The goal of this technique is to disable or suppress transitions from propagating to parts of the clock path (i.e., flip-flops, clock network, and logic) under a certain condition computed by clock-gating circuits. The savings are mainly due to the reduction in switched capacitance in the clock network and in the switching activity of the logic fed by the storage elements, because unnecessary transitions are not loaded when the clock is not active. CG is illustrated in figure 1. A block CG, which inhibits the clock signal when the idle condition is true, is associated with each sequential functional unit. The gated clock signal is computed by the function Fcg. CLK is the system clock and CLKG the gated clock of the functional unit.

Fig 1. Clock gating principle

It is good design practice to turn off the clock when it is not needed. Automatic clock gating is supported by modern EDA tools, which identify the circuits where clock gating can be inserted.

The RTL stage is the best point in the design process to optimize dynamic power. At this point, the system architecture is defined, the design is clock cycle accurate, and there is accurate power information available from lower design stages. The only thing left is for hardware designers to have an RTL metric to evaluate and identify candidate logic within a design for clock gating optimization.


RTL clock gating works by identifying groups of flip-flops which share a common enable control signal. Traditional methodologies use this enable term to control the select on a multiplexer connected to the D port of the flip-flop or to control the clock enable pin on a flip-flop with clock enable capabilities. RTL clock gating uses this enable signal to control a clock gating circuit which is connected to the clock ports of all of the flip-flops with the common enable term. Therefore, if a bank of flip-flops which share a common enable term have RTL clock gating implemented, the flip-flops will consume zero dynamic power as long as this enable signal is false.
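The difference between the traditional enable style and RTL clock gating can be sketched with a cycle-level model. This is a behavioural illustration in Python rather than RTL; the function name `simulate_register_bank` and the traces are hypothetical. It assumes an ideal gating circuit and simply counts how many clock edges reach the register bank in each style.

```python
def simulate_register_bank(enable_trace, data_trace, clock_gated):
    """Cycle-level model of a register bank with a shared enable.

    Returns (register value after each cycle, clock edges delivered).
    clock_gated=False models the mux/clock-enable style: the clock
    reaches the flip-flops every cycle and the enable selects between
    new data and the held value.
    clock_gated=True models an ideal gated clock: the flip-flops see
    a clock edge only in cycles where the enable is true.
    """
    q = 0
    edges_delivered = 0
    history = []
    for enable, data in zip(enable_trace, data_trace):
        if clock_gated:
            if enable:
                edges_delivered += 1
                q = data
        else:
            edges_delivered += 1
            q = data if enable else q
        history.append(q)
    return history, edges_delivered


# Hypothetical traces: the bank is enabled in 2 of 8 cycles.
enable = [1, 0, 0, 0, 1, 0, 0, 0]
data = [5, 9, 9, 3, 7, 1, 1, 2]

mux_values, mux_edges = simulate_register_bank(enable, data, clock_gated=False)
cg_values, cg_edges = simulate_register_bank(enable, data, clock_gated=True)

print("same register contents:", mux_values == cg_values)
print("clock edges at flip-flops:", mux_edges, "ungated vs", cg_edges, "gated")
```

Both styles produce the same register contents; the gated version delivers clock edges only in the enabled cycles, which is where the dynamic power saving comes from.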

III. HOW TO IMPLEMENT CLOCK GATING

There are many clock gating styles available to optimize power in VLSI circuits. They can be:

1) Latch-free based design.
2) Latch-based design.
3) Flip-flop based design.
4) Intelligent clock gating optimization options available in synthesis tools such as those from Xilinx and Altera, Cadence SOC Encounter, etc.

LATCH-FREE BASED CLOCK GATING DESIGN

The latch-free clock gating style uses a simple AND or OR gate (depending on the edge on which the flip-flops are triggered). Here, if the enable signal goes inactive in the middle of a clock pulse, or if it toggles multiple times, the gated clock output can either terminate prematurely or generate multiple clock pulses. This restriction makes the latch-free clock gating style inappropriate for our single-clock flip-flop based design (figure 2).

Fig 2. Latch-free clock gating
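The hazard of the latch-free style can be reproduced with sampled waveforms. The sketch below is a simplified model, assuming an AND gate, an active-high enable, and an arbitrary resolution of eight samples per clock period; the waveforms and the helper `pulse_widths` are hypothetical and chosen to provoke both failure modes.

```python
def pulse_widths(wave):
    """Return the length of every run of consecutive 1s in a sampled waveform."""
    widths = []
    run = 0
    for sample in wave:
        if sample:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Three clock periods, 8 samples each: low for 4 samples, then high for 4.
clk = ([0] * 4 + [1] * 4) * 3

# Hypothetical enable: clean in period 1, toggles while the clock is high
# in period 2, and goes inactive in the middle of the pulse in period 3.
enable = (
    [1] * 8
    + [1, 1, 1, 1, 1, 0, 1, 1]
    + [1, 1, 1, 1, 1, 1, 0, 0]
)

# Latch-free gating: the enable is ANDed directly with the clock.
gated_clk = [c & e for c, e in zip(clk, enable)]

print("clock pulse widths:      ", pulse_widths(clk))
print("gated clock pulse widths:", pulse_widths(gated_clk))
```

In this model the source clock has three full-width pulses, while the gated clock shows an extra pulse in the second period and a truncated pulse in the third.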

LATCH-BASED CLOCK GATING DESIGN

The latch-based clock gating style adds a level-sensitive latch to the design to hold the enable signal from the active edge of the clock until the inactive edge of the clock. Since the latch captures the state of the enable signal and holds it until the complete clock pulse has been generated, the enable signal need only be stable around the rising edge of the clock, just as in the traditional ungated design style (figure 3).

Fig 3. Latch-based clock gating
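The effect of the latch can be checked with the same kind of sampled-waveform model. This is a sketch under stated assumptions: the latch is transparent while the clock is low and holds its value while the clock is high, the gate is an AND, and the waveforms and function names are hypothetical.

```python
def pulse_widths(wave):
    """Return the length of every run of consecutive 1s in a sampled waveform."""
    widths = []
    run = 0
    for sample in wave:
        if sample:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


def latch_based_gate(clk, enable):
    """AND the clock with an enable held by a latch that is transparent
    only while the clock is low (assumed polarity for rising-edge flops)."""
    latched = 0
    gated = []
    for c, e in zip(clk, enable):
        if c == 0:
            latched = e          # latch follows the enable while the clock is low
        gated.append(c & latched)  # latch holds its value while the clock is high
    return gated


# Four clock periods, 8 samples each: low for 4 samples, then high for 4.
clk = ([0] * 4 + [1] * 4) * 4

# Hypothetical enable: it glitches during the high phase of period 2,
# drops mid-pulse in period 3, and stays inactive throughout period 4.
enable = (
    [1] * 8
    + [1, 1, 1, 1, 1, 0, 1, 1]
    + [1, 1, 1, 1, 1, 1, 0, 0]
    + [0] * 8
)

gated_clk = latch_based_gate(clk, enable)
print("gated clock pulse widths:", pulse_widths(gated_clk))
```

Every pulse that gets through is a complete clock pulse, and the fourth period is suppressed entirely because the enable was already inactive when the latch was last transparent.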

In some applications, latch-based designs are preferred to D flip-flop (DFF) based designs. The basic concept is that a DFF can be split into two latches, and each one is clocked with an independent clock signal. The two clocks are non-overlapping clocks as presented in figure 4. A combinational network is usually inserted between the two latches to build a pipelined datapath. The main advantage is that this kind of design supports greater clock skew before failing than a similar DFF-based design. The second advantage is that time borrowing is achieved naturally in the pipelined datapath.

Fig 4. Master-slave latch and non-overlapping clock concepts

In this style, clock gating is easy to implement. A simple AND gate is used to generate the gated clock. This configuration (figure 5) is glitch-free because the control signal, generated when Phi1 is high, is stable and remains stable when Phi2 goes high.

Fig 5. Clock gating of latch-based design


FLIP-FLOP BASED CLOCK GATING DESIGN

This technique is similar to the latch-based design, the only difference being that D flip-flops are used instead of latches. However, because of the advantages the latch-based design offers, the flip-flop based design is generally not preferred.

Fig 4. Enabled (a) to gated clock transformation (b).

It is well known that this kind of flip-flop based design is area- and power-consuming, but its advantage compared with a gated-clock-based design is that testability can be easily implemented and clock skew is more manageable.

INTELLIGENT CLOCK GATING OPTIMIZING OPTION AVAILABLE IN SYNTHESIS TOOL

Recently, an intelligent clock gating option has been made available in many industry sign-off tools, such as Cadence SOC Encounter and the tools from Altera and Xilinx, to optimize the power consumption of the design [6].

It is important to note that in such cases the designer may not always get the desired level of power reduction. In such conditions the designer may have to incorporate the clock gating methods discussed above at RTL level to further reduce the dynamic power consumption of the circuit.

IV. ISSUES IN IMPLEMENTATION OF CLOCK GATING DESIGN TECHNIQUES

i.] The clock gate (i.e., AND or OR) must not alter the waveform of the clock other than turning the clock on or off.

ii.] Clock gating hold time violations and set-up time violations can be fixed like other violations during the physical design phase (Timing Closure phase of Backend design).

iii.] Techniques that can be used to fix hold violations include clock skewing and buffering in the data path near the endpoint (Timing Closure phase of Backend design).

iv.] If clock gating is used to divide the clock, the designer should take care of the phase of the clock gating signal.

v.] Glitches may occur in the gated clock if clock gating is not done properly.

vi.] Improper control of the gating signal could result in big functional problems.

vii.] Overhead in design, verification and silicon area.

viii.] Clock-Gating Efficiency is defined as the percentage of time a register is gated for a given switching activity. When looking at an entire design, the average Clock-Gating Efficiency can be computed as the average of Clock-Gating Efficiencies for all registers in the design for a given simulation test bench.

Improving the Clock-Gating Efficiency in turn means reduced switching, which can save dynamic power. A designer’s goal is to improve the average Clock-Gating Efficiency as much as possible. Achieving 100% is not practical, since that would mean the design is idle and non-functional all the time.

Low Clock-Gating Efficiency is a good metric to identify candidate areas of the design to add clock gating. It may not always be possible to add clock gating to low efficiency areas and adding clock gating may not necessarily be accompanied by reduced power because dynamic power is also a function of clock frequency, voltage, and capacitance.

While Clock-Gating Efficiency is not an absolute indicator of power, it is a very good metric for hardware designers to gain visibility into power at the RTL without requiring time consuming power analysis or synthesis.
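The definition of Clock-Gating Efficiency given above can be turned into a short calculation. The sketch below assumes a hypothetical per-register trace in which 1 means the clock was delivered in that cycle and 0 means it was gated; the register names and traces are invented for illustration, and a real flow would take this information from simulation activity data.

```python
def clock_gating_efficiency(clock_active_trace):
    """Percentage of cycles in which a register's clock is gated off.

    clock_active_trace holds one entry per cycle: 1 if the clock was
    delivered to the register, 0 if it was gated.
    """
    if not clock_active_trace:
        raise ValueError("trace must contain at least one cycle")
    gated_cycles = sum(1 for active in clock_active_trace if not active)
    return 100.0 * gated_cycles / len(clock_active_trace)


def average_cge(traces):
    """Average of the per-register efficiencies over all registers in the design."""
    per_register = {name: clock_gating_efficiency(t) for name, t in traces.items()}
    return sum(per_register.values()) / len(per_register), per_register


# Hypothetical registers and activity from one test bench.
traces = {
    "acc":   [1, 0, 0, 0],   # gated 3 of 4 cycles
    "flags": [1, 1, 0, 0],   # gated 2 of 4 cycles
    "pc":    [1, 1, 1, 1],   # never gated
}

average, per_register = average_cge(traces)
for name, cge in per_register.items():
    print(f"{name}: {cge:.1f}%")
print(f"average Clock-Gating Efficiency: {average:.1f}%")
```

Registers with a low figure, such as `pc` here, are the candidates the text suggests examining first, although whether gating them is possible or worthwhile still depends on the design.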

V. CASE STUDY : IMPLEMENTATION OF CLOCK GATING IN 8-BIT ARITHMETIC LOGIC UNIT

At RT and gate-level for dynamic power management, a gated clock provides a way to selectively stop the clock, and thus, force the original circuit to make no transition, whenever the computation that is to be carried out at the next clock cycle is redundant. In other words, the clock signal is disabled according to the idle conditions of the logic network. For reactive circuits, the number of clock cycles in which the design is idle in some wait states is usually large. Therefore, avoiding the power waste corresponding to such states may be significant.

In this case study, an 8-bit ALU (Arithmetic Logic Unit) is first designed and implemented with the Xilinx ISE Project Navigator 12.4 tool. This 8-bit ALU is planned to be used later in the design of an 8-bit microprocessor, in which it may be necessary to inhibit the activity of the ALU during a certain number of cycles of an instruction in order to reduce the dynamic power consumption of the microprocessor.

So, during the first phase of the study, an 8-bit ALU is implemented. During this phase, the design was tested with the various intelligent clock gating options and design strategies available in Xilinx Project Navigator version 12.4 to study their effect on the net dynamic power dissipated and on area in terms of logic blocks used. The results of this synthesis and implementation (FPGA Family: Spartan-6) are as follows:


Table 1: Results for 8-bit ALU [FPGA family: Spartan 6]

| Design Strategy | Power Reduction | Total Power (Watt) | Dynamic Power (Watt) | No. of Logic Slices Used |
|---|---|---|---|---|
| Balanced | OFF | 0.210 | 0.197 | 57 / 2400 |
| Balanced | ON | 0.209 | 0.196 | 57 / 2400 |
| Power Minimization | --- | 0.209 | 0.196 | 60 / 2400 |
| Area Minimization, Strategy-I | OFF | 0.209 | 0.196 | 60 / 2400 |
| Area Minimization, Strategy-II | OFF | 0.209 | 0.196 | 59 / 2400 |
| Area Minimization, Strategy-I | ON | 0.208 | 0.195 | 60 / 2400 |
| Area Minimization, Strategy-II | ON | 0.208 | 0.195 | 59 / 2400 |

From the above results, it was observed that relying on the inherent tool capability to optimize dynamic power or area may not achieve the desired optimization.

So in the next phase of the case study, a clock gating concept (latch-based) was incorporated in the design without affecting the functionality of the design. The results of this synthesis and implementation (FPGA Family: Spartan-6) are as follows:

Table 2: Results for 8-bit ALU with CG [FPGA family: Spartan 6]

| Design Strategy | Power Reduction | Total Power (Watt) | Dynamic Power (Watt) | No. of Logic Slices Used |
|---|---|---|---|---|
| Balanced | OFF | 0.034 | 0.025 | 22 / 2400 |
| Balanced | ON | 0.032 | 0.024 | 22 / 2400 |
| Power Minimization | --- | 0.031 | 0.022 | 21 / 2400 |
| Area Minimization, Strategy-I | OFF | 0.034 | 0.025 | 22 / 2400 |
| Area Minimization, Strategy-II | OFF | 0.034 | 0.025 | 20 / 2400 |
| Area Minimization, Strategy-I | ON | 0.032 | 0.024 | 22 / 2400 |
| Area Minimization, Strategy-II | ON | 0.032 | 0.023 | 20 / 2400 |

On comparing the results of Table 1 and Table 2, the following points are concluded:

i.] It is good practice to use the tool's inherent intelligent clock gating option, viz. Design Strategy: Power Minimization, or Balanced with power reduction ON, to improve the chances of minimizing the dynamic power consumption of the circuit.

ii.] Where possible, it is wiser to check whether clock gating concepts can be incorporated in the circuit at RTL level. This is evident from the results in Table 1 and Table 2. In both cases, the test vectors applied to the circuit were the same.

iii.] Estimating power depends on representative switching activity. A simulator can generate a switching activity file based on a given test-bench. This is only as representative as the test-bench itself, so selection of a representative test-bench is critical to good power estimation.
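The comparison between the two tables can be quantified with a simple percentage calculation. The sketch below uses the total and dynamic power figures reported in Table 1 and Table 2 for two of the design strategies; the helper name `percent_reduction` and the dictionary layout are illustrative choices, not part of the original study.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# (total power W, dynamic power W) taken from Table 1 (no CG) and Table 2 (with CG).
without_cg = {
    "Balanced, power reduction OFF": (0.210, 0.197),
    "Power Minimization": (0.209, 0.196),
}
with_cg = {
    "Balanced, power reduction OFF": (0.034, 0.025),
    "Power Minimization": (0.031, 0.022),
}

reductions = {}
for strategy, (total_before, dynamic_before) in without_cg.items():
    total_after, dynamic_after = with_cg[strategy]
    reductions[strategy] = (
        percent_reduction(total_before, total_after),
        percent_reduction(dynamic_before, dynamic_after),
    )
    total_pct, dynamic_pct = reductions[strategy]
    print(f"{strategy}: total -{total_pct:.1f}%, dynamic -{dynamic_pct:.1f}%")
```

For comparison, switching the tool's power reduction option from OFF to ON in Table 1 changes dynamic power only from 0.197 W to 0.196 W, which is the basis for the second conclusion above.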

VI. CONCLUSIONS

Power optimization, traditionally relegated to the synthesis, and placement and routing stages, has moved up to the System level and RTL stages. Hardware designers use clock gating to turn off inactive sections of the design and reduce overall dynamic power consumption.

The RTL approach is important because designers usually verify power only at the gate level and any change to the RTL needs many design iterations to reduce power. The RTL solution thus saves weeks of effort by fixing potential power issues up-front.

The RTL coding step is not too early in the design flow to address power consumption optimization. For each source of consumption and each type of digital block, appropriate solutions can be implemented. Although the theory behind some of these techniques can be complex, they are often easy to implement. RTL designers should be aware of these techniques and use their knowledge of the system not only to optimize the speed performance, but also to reduce the unnecessary switching activity.

REFERENCES

[1] Massoud Pedram and Afshin Abdollahi, “Low Power RT-Level Synthesis Techniques: A Tutorial”, Dept. of Electrical Engineering, University of Southern California.

[2] L. Benini, M. Favalli, and G. De Micheli, “Design for testability of gated-clock FSM’s,” in Proc. European Design and Test Conf., Paris, France, Mar. 1996, pp.

[3] Veena S Chakravarthi and K S Gurumurthy, “Low Power Design Methodology for Core based ASSP”, Centillium Communications India Pvt. Ltd and U V College of Engineering, Bangalore, India. Pieter J. Schenmakers and J.Frans M. Theeuen, Eindhoven University of Technology, Netherlands, “Clock Gating on RT-VHDL”.

[4] Frank Emnett and Mark Biegel, Automotive Integrated Electronics Corporation, “Power Reduction Through RTL Clock Gating”.

[5] Safeen Huda, Muntasir Mallick, Jason H. Anderson, Dept. of ECE, Univ. of Toronto, Toronto, ON, Canada, “Clock Gating Architectures for FPGA Power Reduction”.

[6] Frederic Rivoallon, “Reducing Switching Power with Intelligent Clock Gating”, WP370, Xilinx, March 1, 2011.

[7] Mitch Dale, “Power Optimization in a High Performance Microprocessor Design”, Calypto Design Systems.

[8] Mitch Dale, “Utilizing Clock-Gating Efficiency to Reduce Power in RTL Designs”, Calypto Design Systems.

[9] Mitch Dale, “Power Optimization in a High Performance Microprocessor Design”, Calypto Design Systems.