accurate power estimation for sequential cmos circuits...

VLSI DESIGN2001, Vol. 12, No. 2, pp. 187-203Reprints available directly from the publisherPhotocopying permitted by license only

(C) 2001 OPA (Overseas Publishers Association)N.V.Published by license under

the Gordon and Breach SciencePublishers imprint.

Accurate Power Estimation for Sequential CMOSCircuits Using Graph-based Methods

MIRIAM LEESERa’* and VALERIE OHMb’t

aDept, of Electrical and Computer Eng., Northeastern University, Boston, MA 02115,"bSequence Design Inc., 469 El Camino Real, Santa Clara, CA 95050

(Received 20 June 2000; In finalform 3 August 2000)

We present a novel method for estimating the power of sequential CMOS circuits.Symbolic probabilistic power estimation with an enumerated state space is used toestimate the average power switched by the circuit. This approach is more accurate thansimulation based methods. Automatic circuit partitioning and state space explorationprovide improvements in run-time and storage requirements over existing approaches.Circuits are automatically partitioned to improve the execution time and to allow largercircuits to be processed. Spatial correlation is dealt with by minimizing the cutsetbetween partitions which tends to keep areas of reconvergent fanout in the samepartition. Circuit partitions can be recombined using our combinational estimationmethods which allow the exploitation of knowledge of probabilities of the circuit inputs.We enumerate the state transition graph (STG) incrementally using state spaceexploration methods developed for formal verification. Portions of the STG aregenerated on an as-needed basis, and thrown away after they are processed. BDDs areused to compactly represent similar states. This saves significant space in the storage ofthe STG. Our results show that modeling the state space is imperative for accuratepower estimation of sequential circuits, partitioning saves time, and incremental statespace exploration saves storage space. This allows us to process larger circuits thanwould otherwise be possible.

Keywords: Power estimation; Low power design; Binary decision diagrams; Design automation

1. INTRODUCTION

We present an accurate way to estimate averagepower dissipation in sequential CMOS circuits.We use symbolic power estimation with an

enumerated state space to estimate the averagepower dissipated by the circuit. This approach ismore accurate than simulation based methodswhich are biased depending on their starting stateor states.

*Corresponding author, e-mail: [email protected]: [email protected]

187

188 M. LEESER AND V. OHM

Our approach differs from similar approaches intwo important ways. We enumerate the statetransition graph (STG) incrementally using thestate space exploration method [2] developed forformal verification. This saves significant space inthe storage of the STG. In addition, we useautomatic circuit partitioning which speeds up therun-time for power estimation. Combining incre-mental state space exploration with partitioningallows us to accurately estimate larger sequentialcircuits than was previously possible.Our approach is similar to cell-based estimation,

which has previously been applied to combina-tional circuits [6]. It differs in that we partitioncircuits automatically. This results in a moreaccurate estimation because partitions are gener-ated to provide the best size for power estimationand do not depend on the size of a gate ormacrocell.Our work is similar to other sequential prob-

abilistic power estimation, but is more accurate,while addressing issues of efficiency. For example,Monteiro et al. [7] construct the completeSTG and compute state probabilities using theChapman-Kolmogorov equation. They restrictthemselves to circuits with a smaller number offlip-flops than we can handle. Chou and Roy [3]build an extended STG structure that includesprimary input vectors in the states. This extendedSTG is considerably larger than a normal STG,which greatly increases the required storage spaceand the computation for finding state probabil-ities. Other researchers [9, 10] accommodate largerSTGs by unrolling the circuit a few cycles toapproximate the correlation. Tsui et al. [10] useChapman-Kolmogorov to compute probabilities,while Schneider et al. [9] use Boolean equations.Najm et al. [8] run a Monte Carlo logic simulationon the sequential circuit to collect the signaland transition probabilities of the next-state lines,which are then fed into their combinationalestimator. Our method is more accurate than thesebecause it builds the whole STG; it requires lessstorage because the STG is built incrementally.

In Section 2 we describe combinational powerestimation, which is used for sequential estimation.

In Section 3 we describe the techniques used forsequential estimation. Our sequential methodbuilds a STG for a circuit using state spaceexploration, calculates transition probabilitiesand switched capacitance for each STG edge,determines state probabilities using Chapman-Kolmogorov, then estimates power averaged overthe STG. In Section 4 we discuss ways to improveon the basic algorithms to make the approachmore efficient in both time and space. Results arepresented in Section 5.

2. COMBINATIONAL POWERESTIMATION

We estimate average power dissipated in combina-tional circuits by deriving a behavior graph andthen computing the average dissipated power ofthat graph. We use the behavior graph to findaverage power dissipated over any or all possibleinput vector combinations. Large circuits are

partitioned and behavior graphs are formed foreach subcircuit, and the power estimate is com-puted hierarchically. Spatial correlation is dealtwith by minimizing the cutset between partitionswhich tends to keep areas of reconvergent fanoutin the same partition.

2.1. Behavior Graph

In the behavior graph, each vertex represents thestate of the circuit or partition when it is subjectedto a particular input vector(s). This is representedas a circuit vector of Booleans which are the valueson the gate outputs. Figure shows a circuit, itstruth table grouped by circuit vectors, and thecorresponding behavior graph. Multiple inputvectors that result in the same circuit vector are acompatible set of input vectors and are stored withthe vertex. State 10 has a compatible input set of001, 011 and 101, as all three of these input vectorsproduce the same logic values on circuit nodes dand e.

The edges in the behavior graph representpossible transitions from state to state. In a purely

GRAPH BASED METHODS 189

abc110

101000010100

switchedcapacitance l/ ",

Co001/ _. 10 compatible

d e input set

O0

1 0

I I

C.. Co..

FIGURE Behavior graph example.

combinational circuit, the behavior graph iscompletely connected. Between every state and j(including the case i--j) there exists an edge, ei,j.Associated with each ei,j is a capacitance Ci,j thatrepresents the capacitance switched in the traversalfrom state to state j. Ci,j is

Cij Z (xi(k) D xj(k))Cgfanoutk. (1)kGN

where N is the set of all gate outputs in the circuit,xi(k) is the logical value on gate output k in state i,Cg is the default gate capacitance, and fanoutk isthe fanout of gate k.

2.2. Partitioning

Power estimation based on behavior graphs hasmany advantages. The behavior graph contains allessential information for determining switchingactivity, and once a behavior graph has beengenerated, power estimation is fast and accurate.Behavior graph formation is accomplished byevaluating the state of all gates in the circuit forevery possible input, and as a result, is exponentialin the number of circuit inputs, which leads to long

run times. We solve this problem by partitioningthe circuit, then building behavior graphs for thesubcircuits. This results in fast run times with aminimal loss of accuracy.Our goals in partitioning are to minimize the

cutset between partitions and to keep the parti-tions relatively balanced. Minimizing the cutsetcorresponds to minimizing the number of netsbetween partitions, which in turn corresponds tominimizing the amount of spatial correlation datalost. We hierarchically partition the circuit until allpartitions are at or below a size that results inefficient power estimation.

Reconvergent fanout results when two or moreinputs to a gate are functions of the same signal.Figure 2 shows a circuit with two incidences ofreconvergent fanout. Minimizing the cutset tendsto keep reconvergent portions of the circuit in thesame partition, as they frequently manifest them-selves as well-connected components. This helps toimprove accuracy as highly correlated signals arekept in the same partition and weakly correlatedsignals are separated. Keeping the partitionsbalanced prevents the execution time from beingdominated by one partition that is significantlylarger than the others.


Here V is the set of states in the behavior graph,Ci,j is as described in Eq. (1), and Pi is the state

probability of state i, as described in the nextsection. The Cavg values for all the behavior graphsare summed to get Cavg,totat which is the averagecapacitance switched over all possible input vectorpairs.

FIGURE 2 Circuit with reconvergent fanout.

2.4. State Probabilities

To achieve these goals we partition the circuitusing the ratio-cut algorithm [11]. Ratio-cut is amodification of the Kernighan-Lin algorithm [4]for variable-sized partitions. Rather than minimiz-ing the cutset while perfectly bisecting the graph, itseeks to minimize

CABR(AB) (2)IAI x IBIwhere CA is the size of the cutset of the parti-tions and IAI and IBI are the numbers of nodes ingraph partitions A and B, respectively. WhereKernighan-Lin requires the graph to be exactlybisected, ratio-cut allows the sizes of the parti-tions to be changed. This allows the algorithm tofind a smaller cutset without producing severelyunbalanced partitions.

2.3. Power Estimation

The average dissipated power in a circuit orsubcircuit is computed from the correspondingbehavior graph. To compute the average powerdissipation of the circuit, we use:

2Pave, - Vlclfctk Cavg,total (3)

For each behavior graph we compute Cag, theweighted sum of the capacitance switched overeach edge in the behavior graph:

Cavg ZpiPjCij (4)iE VjE V

The state probability of state i, Pi is simply theprobability that the circuit will be in state at anygiven time. The sum of all the state probabilities isequal to 1. The general equation for the stateprobability of state is:

nnp

Pi- ZH a, where ajlik=l

{p(k) if xj(k)-p(k) otherwise

Ii is the compatible set of input vectors for state i,

ninp is the number of primary inputs to the circuitor subcircuit, p(k) is the signal probability ofprimary input k, and xj(k) is the logic value on

primary input k in input vector j. The stateprobability of state is the probability that thecircuit (or subcircuit) will be in state at any giventime. The signal probability of gate k is theprobability that the logic value on gate k is equalto 1. The signal probability of a primary input isthe probability that the input is 1.

In the case where all values of p(i) are 0.5 (allinput vectors are equally probable), Eq. (5)reduces to

Pi- II otl (6)

where IIotl is the number of possible input vectors.This will be the case for unpartitioned combina-tional circuits when we assume that all inputvalues are equally probable. Our approach canalso be used where the probability of different


inputs is known. This allows us to process circuitshierarchically.

2.5. Power Estimation of Partitioned Circuits

We evaluate power hierarchically, based on knowl-edge of the signal probabilities of the inputs to thebehavior graph. We assume that the probabilitiesof the primary inputs of the circuit are provided.We use the default value of 0.5 for primary inputprobabilities if no knowledge of the environmentof the circuit is given. Ca,,g is evaluated from thebehavior graph of a circuit or subcircuit whenthe signal probabilities are available for all theprimary inputs of the subcircuit. Some of theprimary inputs of a subcircuit may be outputs ofgates from other subcircuits. For example, theoutput of gate D in Figure 3(a) is the output ofsubcircuit p2 and an input to subcircuit p l. Notethat D is duplicated in p as a primary input tothat subcircuit.The signal probabilities of subcircuit inputs

are computed from the behavior graph of the

subcircuit that contains the corresponding gate.The input probabilities of that behavior graphmust have been already computed as well. Theequation for the signal probability for gateoutput j is:

Pi" Xi(j). (7)P(J) -[ V

where xe(j) is the logic value of gate output j instate and Pi is the state probability for statedescribed above. The signal probabilities arestored in a probability vector as they are com-puted.The behavior graphs need to be processed in

order of dependencies. To visualize this, we canform a directed graph of the subcircuits with edgesrepresenting dependencies. Figure 3(c) is such arepresentation of the circuit partitioning repre-sented in Figures 3(a)-(b). As such a graph isacyclic, it is straightforward to process eachbehavior graph as the probability informationrequired becomes available. In the case of a circuitlike the one shown in Figures 3(d)-(0 that has

(a) (b) (c)

(d) (e) (f)

FIGURE 3 Partitioning examples (dotted arrows indicate dependencies).


cyclic dependencies, we need to do some additionalwork.The following pseudocode shows how we

can evaluate Cavg for all behavior graphs notinvolved in cyclic dependencies. We start byforming a list of all the behavior graphs calledbgraph_list.

while BGRAPH_LIST decreasing in size

for each BGRAPH in BGRAPH LIST

if all signal probabilities on

inputs are available

compute input probabilitiesfor BGRAPH

compute Cavg of BGRAPH and add to

Cavg,totalcompute signal probabilities

for other signals and storein probability vector

remove BGRAPH from BGRAPH LIST

For a circuit with no cyclic dependencies, theabove procedure is sufficient to compute Cavg,total

for the entire circuit. Otherwise, bgraph_list willcontain all the behavior graphs of subcircuitswithin cycles. To find the remaining Cavg values,we guess the unavailable signal probabilities(starting with a value of 0.5) and looping over allthe behavior graphs until the signal probabilitiesconverge. Convergence is achieved when all theelements of the probability vector vary by less than0.01% in successive iterations.

guess 0.5 for all unavailable

signal probabilitieswhile probability vector has not

convergedfor each BGRAPH in BGRAPH LIST

compute input probabilities for

BGRAPH

compute signal probabilitiesfor other signals

for each BGRAPH in BGRAPH LIST

compute Cavgof BGRAPH and add to

Cavg,total

3. SEQUENTIAL POWER ESTIMATION

Sequential power estimation builds on combina-tional power estimation and adds models forlatches and state transitions. Figure 4 shows oursequential circuit model. When we analyze thecombinational portion of a sequential circuit, wetreat the present state lines as if they werecombinational primary inputs, and the next statelines as combinational outputs, and use combina-tional power estimation.We use a STG to represent a circuit. Each vertex

in a STG is a state and is associated with a binarycode that corresponds to the values of the presentstate lines. Edges correspond to possible transi-tions from one state to another and are associatedwith one or more input vectors. The circuit vectorof a sequential circuit depends on the input vectorand the previous state. Figure 5 shows an exampleof a sequential circuit and its state transitiondiagram.

Sequential power estimation consists of thefollowing steps: the states of the STG areenumerated, transition probabilities and switchedcapacitances are calculated for each STG edge,state probabilities are found, and power isaveraged over the STG. In the rest of this section,we describe these steps in more detail.

3.1. Building the State Transition Graph

We build the STG using state space exploration [2],a form of breadth-first search. Instead of havingthe graph structure already available, we derive itincrementally using the transition relation, which is

Present State:/ N Next State,Com_btnational

.\ LogicPrimaryInputs_

Clock

Primary Outputs

FIGURE 4 Sequential circuit model.


a+

FIGURE 5 State transition diagram example.

derived from the circuit description. The transitionrelation is a list of Binary Decision Diagrams(BDDs), one for each next state variable, whosevariables are the present state variables and theprimary inputs. To find all of the possible nextstates, we evaluate each BDD in the transitionrelation with the current values of the present statevariables. These partially evaluated BDDs are thenin terms of the primary inputs to the circuit. Fromthese, we derive the next states and the inputvectors associated with them.As edges in the STG are discovered, their

transition probabilities can be computed fromthe signal probabilities of the primary inputs,which are either provided by the user or assumedto be 0.5. We store the transition probabilities in amatrix indexed by source and destination states.From the states of the edge and the input vectors,we construct a partial circuit vector with thepresent and next state values. The logical values onthe other circuit nodes are unknown, so we recordthem as don’t cares. The partial circuit vector andthe compatible set of input vectors are stored withthe edge for use later when we compute theswitched capacitance.

3.2. Markov Processes and Steady-stateProbabilities

We need to know the steady-state and transitionprobabilities to estimate the power. We compute

the transition probabilities as we construct theSTG. To find the steady-state probabilities, we usea technique from Markov process analysis calledlimiting state probabilities. Limiting state prob-abilities are derived from n-step transition prob-abilities. The 1-step transition probability, p0(1)(or p#) is defined as the probability that the STGwill transition from state to state j in one timestep, and is computed directly from the signalprobabilities of the primary inputs. The n-steptransition probability, po(n) is the probability thatthe STG will transition from state to state j in ntime steps. We compute it from a modified versionof the Chapman-Kolmogorov equation: p/(n)=k pik(n- 1)pj.

In the STGs of practical circuits, we observethat as n grows, pij(n) becomes independent ofboth n and and converges to the steady-stateprobability Pj, which is the probability that theSTG will be in state j at any given time.At this point, each edge in the STG is associated

with one or more partial circuit vectors. Multiplecircuit vectors for a single edge (i,j) occur whenmore than one primary input vector applied to thecircuit in state result in a transition to state j. Thenext step is to fill in the don’t cares so we cancompute the switched capacitance. We use ourcombinational circuit estimation method whichrepresents each partition of the circuit as aseparate behavior graph. The brute-force methodfor this is to intersect each partial sequential circuit


vector with every circuit vector in every behaviorgraph. We use a more efficient method calledcircuit vector recombination that is described inSection 4.1. Vectors whose intersections containclashes between ones and zeros represent aninvalid state in the circuit and are discarded. Aspartial sequential circuit vectors fully specify theprimary circuit inputs and the present state lines,each partial sequential circuit vector will produceone fully defined circuit vector and correspondinginput vector after the intersection operation.

3.3. Computing Switched Capacitance

Switched capacitance in an STG depends on edge-to-edge information. We associate a single capac-itance with each edge (i,j) in the STG, which is aweighted average of all edge-to-edge transitionsthat originate at (i,j). Some edges may havemultiple circuit vectors corresponding to multiplesets of gate outputs that represent the transitionfrom state to state j.

Figure 6 shows a portion of an STG where someedges have multiple circuit vectors. We define theswitched capacitance between two circuit vectors:

C/jm0gn Cg(fanout. [$rijm ( Stjkn]) (8)

where Vijm is the ruth circuit vector on edge (i,j), vjk.is the nth circuit vector on edge (j, k), and Cqm,ign isthe capacitance switched when the state machinetransitions from state to j to k via those circuitvectors.

FIGURE 6 Capacitance computation example.

For the example in Figure 6 we compute thetotal average switched capacitance for edge (i,j)with:

P/jlCij (Pjk Cijl,jkl "J- Pjk Cijl ,jk

+ njlCij’,jl + PjmCijl,jm)

+ (Pig’ Cii2,jkl + Pik Cij2,jk

+ + (9)

PO Pii’ + Pij (10)

We compute the total average switched capaci-tance for edge (i,j) with:

(11)

where

Pii E Pijn (12)

is the total transition probability for the edge (i,j).Along with the steady-state probabilities, the

switched capacitance values are used to computeCavg the weighted sum of the switched capacitanceswitched over all edges in the STG:

Cavg E Pi ZpiiCi (13)is jS

This accounts for the power dissipated due tocombinational logic. We also must account forpower dissipated by flip-flops. To do this wecompute the power for each of the four possibletransitions (0, 0) (0, 1) (1,0) (1, 1) for a standardsize flip-flop, and add them into the computationof the switched capacitance where appropriate.

4. IMPROVING SEQUENTIALESTIMATION

In this section we describe techniques to im-prove on sequential estimation as described in


Section 3. We use a more efficient technique torecombine circuit vectors. To improve on theefficiency in space and time of the power computa-tion for sequential circuits, we use incrementalstate-space exploration to reduce the storagerequirement, and circuit partitioning to reduceruntime.

4.1. Circuit Vector Recombination

Previously, we mentioned that filling in the don’tcares of the partial sequential circuit vectors can bedone by brute-force circuit recombination. Thisrequires intersecting the partial sequential circuitvector associated with a STG edge with everycircuit vector in every behavior graph, and is tooexpensive for large circuits. Some recombination isrequired for sequential power estimation toassociate the sequential circuit vectors with thecombinational ones. We use a tertiary search treefor this.When we execute brute-force recombination of

a partial sequential circuit vector with a behaviorgraph, we treat the behavior graph as if it weresimply a list of circuit vectors with no regard to thegraph structure itself. We start with the initialpartial sequential circuit vector and conduct atrial intersection with each circuit vector in thebehavior graph until we find a valid result. Wereplace the initial sequential circuit vector with thenew result and intersect it until we find a anothervalid result. This process continues until all ofthe circuit vectors in the behavior graph areexhausted.For real circuits, we find that the vast majority

of circuit vector intersections do not yield a validresult. Our goal in formulating an algorithm tospeed up this process involves substantially redu-cing the number of useless intersection operationswithout missing any of the necessary ones. Weaccomplish this by restructuring the list ofbehavior graph circuit vectors into a tertiarysearch tree such that we can quickly locate thecircuit vectors that are useful and ignore most ofthe ones that are not.

4.1.1. Building the Tertiary Search Tree

Each circuit node represented in a circuit vectorcan take on one of three logical values: ’0’, ’1’ or’X’ (don’t care). In a tertiary search tree, each levelof the tree is associated with the index of the circuitvector. The root (the first level) of the treerepresents the first circuit node, the second levelthe second circuit node and so on. Each node in thetree has three outgoing edges, each correspondingto one of the three logical values the circuit nodefor that level can take on.

Building the tertiary search tree for a set ofcircuit vectors is a recursive procedure. Theprocedure is passed the current list of circuitvectors and a vector index. The circuit vector list isdivided into three lists-which list a circuit vectoris added to depends on the logical value at thevector index. A new node is created and the threecircuit vector lists are passed on to three more callsof the procedure with the index incremented. Thenew node is connected to those three results andthe node is returned. A tertiary search tree built tothe full depth of the length of the circuit vector willusually have far more terminal nodes than thereare circuit vectors. We choose to terminatebuilding a branch when the circuit vector list hasonly one vector left. We also avoid building emptyterminal nodes by not recursing when the vectorlist is empty. Figure 7 shows the pseudo-code forthis procedure and Figure 8 shows an exampletertiary tree for a set of partial circuit vectors.

4.1.2. Recombination with Tertiary SearchTrees

In the previous section, we described the construc-tion of a tertiary search tree for a set of partialcircuit vectors. Note that tertiary search trees arebuilt very similarly to binary search trees: the don’tcares are treated the same way as ’0’ and ’1’ values.When we search one of these trees, however, weconsider the don’t cares as a special case.

If we were to treat the don’t care as a literallogical value, when we searched a tree with a new


node MAKE_Tlgg(bgraph BGraph)1 return MAKE-TREENODE(VECTORLIST_FROM_BGRAPH(BGraph, 1)node MAKE_TREENODE(list vectorlist, integer index)123456789101112131415161718

list H, L, Xnode parent NEW_NODE(index)if SIzE(vectorlis0 1

parent.vectors vectorlistreturn parent

elsefor each vec E vectorlist

switch (vec[index])case ’1’:

APPEND(H, vec)case 0’:

APPEND(L, vec)case ’X’:

APPESD(X, vec)if (H) parent.high MAKE_TREENODE(H, em index+l)if (L) parent.low MAKE_TREENODE(L, index+l)if (X) parent.x MAKE_TREENODE(X, index+l)return parent

FIGURE 7 Pseudo-code for tertiary tree formation.

12345670 X X 0 1X 10 X X 1 1X 01 0 X 0 lX 1lOX llXO1 lX X 0 X 1

FIGURE 8 Example tertiary tree.

circuit vector, we would simply follow the paththat corresponds to the values of that vector thatreturns one partial circuit vector when the terminalnode is reached. Because the don’t care can take on

a ’0’ or ’1’ value, we need to follow more than onepath to get all of the possible andidate vectors. Inaddition, a ’0’ or ’1’ in the new vector can alsointersect with vectors in the ’X’ branch of the tree.


list TRAVERSE_TREE(node current_node, vector vec, vectorlist result, int index)if (TERMINAL(current_node))

return current_node, vectorsswitch (vec[index])

case: ’1’:APPEND(result, TRAVERSE_TREE(current_node.high, vec, result, index+l))APPEND(result, TRAVERSE_TREE(current_node.x, vec, result, index+l))

case: ’0’:APPEND(result, TRAVERSE_TREE(current_node.low, vec, result, index+l))APPEND(result, TRAVERSE_TREE(current_node.x, vec, result, index+l))

case: ’X’APPEND(result, TRAvERSE_TREE(current_node.high, vec, result, index+l))APPEND(result, TRAVERSE_TREE(current_node.low, vec, result, index+l))APPEND(result, TRAVERSE_TREE(current_node.x, vec, result, index-t-I))

return result

FIGURE 9 Pseudocode for searching a tertiary tree.

The search procedure will return a list of all ofthe vectors that could successfully intersect withthe circuit vector in question. Figure 9 shows thepseudocode for the search procedure.Though there is quite likely to be many more

vectors in the result list than will successfullyintersect, the size of this list is typically much lessthan the number that would be intersected by thebrute-force method. Figure 10 shows the result ofusing tertiary search trees for several sequentialcircuits. The x-axis is the normalized partition sizeand the y-axis is the number of actual intersectionsperformed divided by the number of intersectionsthat would have been required using the bruteforce method. We find that for unpartitionedcircuits, the number of intersections is reduced byover 80%. For reasonable numbers of partitions(up to four, in these examples) that there is stillreasonable improvement.

4.2. Partitioning in Power Estimation

In combinational estimation of partitioned cir-cuits, we avoided recombination due to thecomplexity. As a result, we also expect to lose acertain degree of accuracy by neglecting somecorrelation in the circuit. We experimented withestimating the power in sequential circuits withoutrecombination to determine if there is a significant

enough speed improvement that the cost in lostaccuracy is acceptable. In unpartitioned estima-tion, with each edge we stored a list of pairs. Eachpair consists of a circuit vector/input vectorcombination and the transition probability foreach. When we include circuit partitions, we storea list of these pairs for each circuit partition. Somerecombination is still required to combine thesequential circuit vector with each behavior graph.When we fill in the probability matrix we willalready have an edge’s total transition probabilitywhen we intersect it with the first behavior graph-all of the circuit inputs are already accounted forin the partial sequential circuit vector.The switched capacitance computation is similar

to the recombined case. We compute the switchedcapacitance for each partition and add each valueinto the capacitance matrix as they are computed.At this point, limiting state probability and totalpower dissipation computation are computed thesame way as for unpartitioned and recombinedcircuits.

4.3. State-space Exploration

A third problem, that must be addressed is thestorage requirements of enumerating and savingthe entire STG. While our method clearly requiresthe full enumeration of the STG, there is no

198

6

5

M. LEESER AND V. OHM

nonincremental, nonpartitioned

00 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

1/no. of partitions

FIGURE 10 Circuit vector intersection improvement with tertiary search trees.

0.9 1

critical reason to have the entire STG available inmemory for power estimation. Transition prob-abilities are computed directly from the STGedges-having other edges available in memoryat the same time is not required. Computing theswitched capacitance for an edge (i,j), however,requires both the edge (i,j) and all ofis outgoingedges. Other than that, nothing else is needed. Wetherefore use a modification to state-space ex-ploration called incremental state space exploration(ISSE). This method involves generating newedges and nodes in the STG as we need them,then discarding them when no longer required.During ISSE we maintain three lists of states

called tiers. The first tier is a list of one or morestates in the STG. The second tier is a list of all of

the next states from all of the states in the first tier.Note that the second tier may contain states thatare also in the first tier, which would be the case ifone or more states in the first tier has an outgoingedge that points back to itself. Similarly the thirdtier contains all of the subsequent states to thestates in the second tier. Only three tiers of statesare required to be held in memory at any giventime, allowing us to discard unnecessary states andhence save storage.

In addition to the three tier lists, we alsomaintain a data structure in which we indicatewhich STG edges have already had their capaci-tances computed. This prevents the procedurefrom re-processing any states that have alreadybeen done. Each state in the STG also has a value


associated with it that indicates how many tier liststhe state is currently in, which is used to determinewhich states are still needed and which are not.

5. RESULTS

We ran a number of experiments on bothcombinational and sequential circuits to show theefficacy of our approach. These results of theseexamples are given in this section.

5.1. Combinational Results

Our power modeling and estimation algorithmwas implemented on a Sun Ultrasparc in C. Weran trials on several circuits to evaluate thecomplexity of behavior graph formation withand without partitioning. We also measuredruntimes for power estimation and the accuracyof power estimation in the presence of partition-ing. In Figures l(a) and (b) and Table I weshow the execution times of the behavior graph

8-to-1priority decoderparity gen/chk

io.. "’,. "........""o-’ Z

Number of partitions

(a) Behavior Graph Formation (smallcircuits)

8-bit RC adder

priority decoderparity gen/chk

...............................Numb

(b) Power Estimation (small circuits)

..o c2670c6288brotX3

Ntnnl:)m’ partilks

(c) Behavior Graph Formation (largercircuits)

(d) Power Estimation (larger circuits)

FIGURE 11 Program execution times (in ms).

200

Circuit

M. LEESER AND V. OHM

TABLE Program execution times (in ms)

part 2 parts 4 parts 8 parts

Description Gates bgr pwr bgr pwr bgr pwr bgr

16 parts

pwr bgr pwr

adder8fastadd4mux81pridecparity

8-bit RC adder 60 76E3 10E6 928 3318 115 123 55 424-bit lookahead adder 35 800 810 450 130 350 100 155 658-to-1 mux 23 950 40 940 200 455 112 70 338-bit priority decoder 33 170 500 78 38 45 20 38 188-bit parity generator 92 12E4 13E7 1657 10475 332 258 137 60

1251257538108

TABLE II Program execution times (bgraph formation/power estimation, in ms)

ISCAS85 benchmarks LGSynth89 benchmarks

Partitions c1355 c2670 c6288 b9 rot x3

32 6940/15640 70/7064 2040/3220 90/50108 940/1100 70/30 5860/22380164 950/400 14510/30400 17090/23980 230/30 2000/2270 6510/20730200 17620/14830 8660/18200 2560/1570 4700/9000240 8920/6300 2660/1220 4520/5100248 8670/5830 12870/10920 4130/3970280 16370/8750

formation (columns labeled bgr) and powerestimation (columns labeled pwr) phases of ourpower estimator for some small circuits. Figuresl(c) and (d) and Table II show similar data for

some larger circuits from the ISCAS85 [1] andLGSynth89 [5] benchmark sets.

Behavior graph formation is exponential in thenumber of inputs to the circuit or subcircuit. It ishere that partitioning provides reduction inexecution time. As shown in Figures l(a) and(c), in some cases the execution time drops offexponentially as the number of partitions in-creases. This does not happen in all cases becausethe execution time is more closely correlated to thenumber of inputs in the subcircuit than thenumber of gates.

Partitioning also has a dramatic effect on thecomplexity of power estimation. Figures 11 (b) and(d) show the substantial decreases in executiontime from even minimal partitioning. In Table IIIwe show average power estimates for the samecircuits. In all cases we assume the signalprobabilities of the primary inputs are all equal

to 0.5 (all possible input vectors are equally likely),transistor gate area is 1.5 Ixm2, gate capacitance persquare micron is 1.7 fF and the clock period is10 ns. Note that as a result of partitioning, inputsto partitions internal to the circuit might havedifferent input probabilities, which are accountedfor in the power estimation. The first column ofresults in the table shows the average power overall possible input vector combinations for anunpartitioned circuit, as calculated by our pro-gram. The following columns show the samecalculation over partitioned circuits. Table IVshows some typical power estimates on thepartitioned benchmark circuits from Figuresl(c) and (d). Figure 12 shows the partitioned

estimates never deviate more than 3% from theestimates on unpartitioned circuits.

5.2. Sequential Results

Table V compares averagb power dissipationestimates for sequential circuits. The line labeledcomb corresponds to circuits run ignoring the

GRAPH BASED METHODS

TABLE III Comparison of average power estimates (in IxW) for small static CMOS circuits,Tctk= I0ns, W 1.5 gm, L= gm, Cg= 1.7 fF/gm2. (see also Fig. 12(a))

Partitions

Circuit 2 4 8 16

7seg 24.9 26.0 26.0 26.0 26.0adder8 33.8 33.8 33.8 33.9 34.4fastadd4 26.9 26.9 26.9 26.9 26.9mux81 10.3 10.2 10.2 10.2 10.2pridec 15.1 15.1 15.1 15.1 15.1parity 57.3 57.3 57.3 57.2 57.1

201

TABLE IV Comparison of average power estimates (in, pW) for larger static CMOS circuits, Tck= 10ns,W 1.5 pm, L lxm, Cg 1.7 fF/lxm-. (see also Fig. 12(b))

ISCAS85 benchmarks LGSynth89 benchmarks

Partitions c1355 c2670 c6288 b9 rot x3

32 419.59 104.9964 420.11 105.08108 419.17 105.16 569.40164 422.99 1057.02 2520.03 105.13 568.54 622.58200 1054.41 2512.15 569.69 621.69240 1037.56 570.65 621.69248 1059.69 2515.29 621.70280 2515.29

seg decoder8- RC adder4-bit LA adder8-to-1

of parUUom

(a) Small Circuits

.I * oaa

iC1355 (587 gates).) C2670 (1350 ga1:es)Jc6288 (2448 gates)l

x3 (898 g:es)

(b) Larger Circuits

FIGURE 12 Estimation accuracy for small and larger CMOS circuits.

effects of the sequential logic; the line labeled seqshows power estimates using state space explora-tion in order to take into account state spacecorrelation. Clearly, incorporating sequential cor-relation is critical to accurate power estimates.

TABLE V Comparison of power estimates

circuit gray5 ent7 cnt8 s386

comb 39.25 14.16 16.31 140.34seq 6.52 2.31 2.32 8.05


4500

3OOO

.nonincreme.ntal ,exp!qratio.n.,exploration seq. partitioning

10 20 35partRIons

FIGURE 13 Runtime reduction with partitioning.

circuit

TABLE VI Memory savings from exploration

% states in mem

total states avg peak

cokemach 15 63 100s386 13 65 85gray5 32 13 16cnt7 128 3.3 3.9cnt8 256 1.6 1.9s1488 48 20 44

Figure 13 shows the difference in run times withand without sequential partitioning for a counter.Sequential partitioning results in significant speed-up. Incremental state space exploration results in asmall percentage of the total state space stored inmemory for sequential machines, as shown inTable VI.

6. CONCLUSIONS

We have presented a symbolic, probabilistictechnique for combinational and sequential circuitpower estimation. Our approach is more accuratethan simulation based approaches, and moreefficient than many symbolic approaches. Weincorporate automatic circuit partitioning, beha-vior graph recombination, and incremental statespace exploration to efficiently and accuratelyestimate power in digital circuits.

Our combinational results show the circuitpartitioning results in a large gain in performancewith a very small loss in accuracy. Our sequentialresults show that modeling the state space isimperative for accurate power estimation ofsequential circuits, partitioning saves time, andincremental state space exploration saves storagespace. This allows us to process larger circuits thanwould otherwise be possible.

References

[1] Franc Brglez and Fujiwara, H. (1985). A neutral netlistof 10 combinational benchmark circuits and a targettranslator in fortran. In: IEEE International Symposium onCircuits and Systems, pp. 695-698.

[2] Burch, Jerry R., Clarke, Edmund M., Long, David E.,McMillan, Kenneth L. and Dill, David L., Symbolicmodel checking for sequential circuit verification. IEEETransactions on Computer-Aided Design of IntegratedCircuits and Systems, 13(4), 401-424, April, 1994.

[3] Tan-Li Chou and Kaushik Roy (1995). Accurate estima-tion of power dissipation in CMOS sequential circuits.In: IEEE International ASIC Conference and Exhibit,pp. 285- 288.

[4] Kernighan, B. W. and Lin, S., An efficient heuristicprocedure for partitioning graphs. Bell System TechnicalJournal 49(2), 291 308, February, 1970.

[5] LGSynth89 Benchmark set, http://cbl.ncsu.edu/CBL_Docs/lgs89.html 1989.

[6] Jiing-Yuan Lin, Wen-Zen Shen and Jing-Yang Jou(1997). A power modeling and characterization methodfor macrocells using structure information. In: IEEE/ACMInternational Conference on Computer-Aided Design,pp. 502- 506.

[7] Jos6 Monteiro, Srivinas Devadas and Bill Lin (1994). Amethodology for efficient estimation of switching activityin sequential logic circuits. In: ACM/IEEE DesignAutomation Conference, pp. 12-17.

[8] Najm, Farid N., Shashank Goel and Hajj, Ibrahim N.(1995). Power estimation in sequential circuits. In: ACM/IEEE Design Automation Conference, pp. 635-640.

[9] Schneider, Peter H., Senn, Matthias A. and Bernd Wurth(1996). Power analysis for sequential circuits at logic level.In: European Design Automation Conference, pp. 22-27.

[10] Chi-Ying Tsui, Jos6 Monteiro, Massoud Pedram,Srinivas Devadas, Despain, Alvin M. and Bill Lin(1995). Power estimation methods for sequential logiccircuits. In: IEEE Transactions on VLSI, pp. 404-415.

[11] Yen-Chuen Wei and Chung-Kuan Cheng, Ratio cutpartitioning for hierarchical designs. IEEE Transactionson Computer-Aided Design, 10(7), 911-921, July, 1991.

Authors’ Biographies

Miriam Leeser is an Associate Professor at North-eastern University, Department of Electrical and


Computer Engineering. She received her BS degreein Electrical Engineering from Cornell University,and Diploma and Ph.D. Degrees in ComputerScience from Cambridge University in England.After completion of her Ph.D., she joined CornellUniversity, Department of Electrical Engineering.In January, 1996 she joined the faculty of North-eastern University, where she is a member of theCenter for Communications and Digital SignalProcessing and the Computer Engineering group,and head of the Rapid Prototyping Laboratory.In 1992 she received an NSF Young Investi-gator Award. Her research interests includehardware description languages, high level

synthesis, and reconfigurable computing for signaland image processing applications. She is asenior member of the IEEE, and a member ofthe ACM.Valerie Ohm received her S.B. in ElectricalEngineering from the Massachusetts Institute ofTechnology in May, 1993. She received a Masterof Engineering degree in May, 1994 and a PhD inMay, 1999, both in Electrical Engineering atCornell University in Ithaca, NY. Since gradua-tion she has been working at Sequence Design,Inc. Her research interests include design tools forlow power, and computer-aided design tools forVLSI circuits.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2010

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of



RotatingMachinery


Hindawi Publishing Corporation http://www.hindawi.com

Journal ofEngineeringVolume 2014

Submit your manuscripts athttp://www.hindawi.com

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation http://www.hindawi.com

Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

SensorsJournal of


Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks


accurate power estimation for sequential cmos circuits...

Documents