cmu-ece-1996-018

8/10/2019 CMU-ECE-1996-018


A gate level simulator for power consumption analysis

David J. Pursley

Department of Electrical and Computer Engineering, Carnegie Mellon University

Pittsburgh, PA 15213

Power consumption of digital circuits has become a critical design parameter. As such, it is necessary that the system designer is able to estimate power consumption and correlate the results back to high level specifications. A gate level tool that estimates power consumption and correlates the results with functional modules and control states has been designed. This tool has produced estimations of the power consumption of twelve different implementations of the discrete cosine transform (DCT). These results are being used to judge the relative impact of high-level transformations, such as pipelining and varying the amount of resource sharing and parallelism, on power dissipation for the DCT algorithm.

A gate level simulator for power consumption analysis, May 1, 1996


Acknowledgments

I would like to first thank my advisor, Don Thomas, for his patience, guidance and example over the past two years.

I would also like to thank my research partners and officemates, Pinar Ceyhan and Sari Coumeri, for both their help and their willingness to always lend an ear.

Finally, I thank my professors at Bucknell University who first interested me in the field of computer engineering and then aided me in my decision to continue towards my Masters and (someday) my Ph.D. Those helpful professors include Daniel Hyde, Jerud Mead, Xiannong Meng (who is now at The University of Texas-Pan American), James Lu and Maurice Aburdene.


1.0 Introduction

Power consumption of digital circuits has become a critical design parameter. For example, portable applications require low power circuits to extend battery life, and all circuits have to deal with the problem of electromigration. Thus, it is important that the system designer is able to estimate power consumption and correlate the results with high level specifications.

We have designed a gate level tool that estimates power consumption and correlates the results to the original register-transfer level (RTL) specifications. A unique aspect of this tool is that power consumption is both estimated for individual modules and reported by control state. It can also be back-annotated with actual capacitance values from layout to produce more accurate estimations. This tool is also being used to help pinpoint areas where power-saving optimizations are most needed and to verify the accuracy of existing statistical power estimation techniques. We have estimated the power consumption of 12 different implementations of the discrete cosine transform, and we are currently laying out the designs in order to obtain capacitance values for back-annotation. In the future, we hope to use this tool to aid in the design of systems using QuadRail technology, a low-power CMOS-based technology currently being designed at CMU [Kri96].

1.1 Our approach

Our goal is to provide a power estimation tool that will be maximally useful to the system designer in considering various high level transformations, such as pipelining and varying the amount of resource sharing and parallelism, and their effect on power consumption. Therefore, this tool must be easy to integrate into existing high level design tool flows, and its results must aid the designer in clearly identifying power consumption trade-offs. Our tool is easy to integrate into existing tool flows as it accepts as input gate level Verilog code. Optionally, the tool also takes a list of capacitance estimates for each of the nets in the design. These estimates can be extracted from the layout


2.0 Power estimation

The purpose of this tool is to accurately estimate power consumption at a high level of abstraction. More specifically, this tool estimates dynamic power dissipation based on simulation results. Dynamic power dissipation of a CMOS circuit can be calculated with the following equation [Wes85]:

$$P = \frac{1}{2} C V_{dd}^2 f_s$$

where C is the load capacitance and f_s is the switching frequency of the circuit. Our tool calculates f_s and takes as inputs the values of V_dd and C, which can either be extracted from layout or estimated by other tools [Don79][Feu82][Lan94]. Static and short circuit power estimation is not taken into consideration here, although it is assumed that if the target cell library is known, these could be calculated and added to the results produced by this tool.
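As a rough illustration of how the dynamic power equation might be applied to a single net, the following Python sketch computes P from a transition count. It assumes that f_s is taken as the number of transitions divided by the simulated time; that definition, the function name and the sample values are illustrative assumptions and are not taken from the tool itself.

```python
def dynamic_power(capacitance_f, vdd_v, transitions, sim_time_s):
    """Return dynamic power in watts for one net: P = 1/2 * C * Vdd^2 * f_s."""
    if sim_time_s <= 0:
        raise ValueError("simulated time must be positive")
    # Assumption: f_s is the average number of transitions per second.
    switching_freq_hz = transitions / sim_time_s
    return 0.5 * capacitance_f * vdd_v ** 2 * switching_freq_hz


# Hypothetical net: 2 pF load, 5 V supply, 1000 transitions in 1 ms.
power_w = dynamic_power(capacitance_f=2e-12, vdd_v=5.0,
                        transitions=1000, sim_time_s=1e-3)
```

With these made-up numbers the net dissipates about 25 microwatts. Summing the same expression over all nets driven from within a module would give a per-module estimate.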

2.1 Related work

Most of the previous work done in power consumption estimation differs from our approach in the level of abstraction at which the estimations are made or the method of obtaining the estimations. Also, none of this work correlates the power estimates to both functional modules and control states.

At the circuit level, both SPICE [Nag75] and PowerMill [Epi96] can be used to measure power consumption by digital systems. Although PowerMill can run over 1000 times faster than SPICE, it is still impractical to simulate at the circuit level for large designs or if many input vectors are to be simulated. At the next level of abstraction, the switch level, simulators such as IRSIM [Sal89] are able to simulate circuits over 500 times faster than SPICE, with a root mean square error of less than 15% [Lan94]. Still, faster simulations could be done at the gate level.



Several faster gate level tools have been developed, but none are generally applicable to a wide variety of applications. Several require that the input vectors can be characterized probabilistically a priori [Naj91][Gho92][Cho94][Mar94]. Other work is designed for and applicable only to signal-processing algorithms [Pow90]. Devadas et al. have designed a generally applicable gate level algorithm, but it only predicts worst-case power dissipation [Dev90].

Landman and Rabaey have developed architecture level techniques [Lan93][Lan94][Lan95], but these also require a priori characterization of the input vectors. Although these tools provide accurate results for the types of applications they are geared toward, a more generally applicable tool is needed.

By working at the gate level of abstraction, our tool is able to simulate larger designs than the switch level simulators, and by making estimations based on simulation, our tool can be used to estimate power for designs regardless of whether their inputs can be readily characterized.

We chose to use simulation-based power estimation over probabilistic power estimation techniques for several reasons. First, not all systems have inputs that are easily or accurately characterized by probabilistic methods. Second, this tool fits into existing tool flows easily. Even if the system could be accurately characterized for probabilistic estimation, designing the statistical models may involve a significant amount of additional work for the designer. Simulation-based estimation requires very little extra work for the designer. Finally, we hope to use this tool to verify the results of probabilistic estimation methods for various types of algorithms.



3.0 Implementation

Our goal is to provide an easily integratable tool that accurately estimates power consumption at a high level of abstraction. Our power estimates are calculated by simulating the hierarchical gate level Verilog description and then post-processing the value change dump (VCD) file produced by the simulation. Thus, the designer does not need to alter the existing tool flow. This tool can be added on the side for additional help in evaluating power trade-offs.

3.1 Tool flow

One of the goals in creating this tool is that it must be easily integratable into existing tool flows. Since its input is hierarchical gate level Verilog, this tool can easily be inserted in existing tool flows. Figure 1 illustrates where the power estimator can be used in the tool flow currently used in the Center for Electronic Design Automation at Carnegie Mellon University. As shown in the diagram, this tool is used to estimate power after logic and datapath synthesis has been performed. Power estimation can be performed again after the circuit has been laid out to produce a more accurate estimation using capacitance values extracted from the layout information. Note also that the addition of the power estimation tool does not alter the original tool flow at all. It merely adds another tool that can be used when high level power estimates are desired.

3.2 Estimating power

The power estimation tool is actually a series of programs, as shown in Figure 2. The gate level Verilog code produced by logic and datapath synthesis is simulated with a standard Verilog simulator and a VCD file is created. The VCD file is then passed to headstripper and statestripper, which extract the information from the VCD file. Then listdrivers is invoked and, through use of the Verilog programming language interface (PLI), a list of the drivers of the nets in the design is produced. If the design has been laid out, parsespf is used to extract capacitances from the standard parasitics file (SPF). The results of all the programs are finally passed to power_parser, which produces power estimates by module and control state.

FIGURE 1. Tool flow at CEDA. Stages shown, in order: Behavioral Level Verilog; Behavioral Synthesis (SAW); Register Transfer Level Verilog; Logic and Datapath Synthesis (Synopsys Design Compiler / CASCADE Epoch); Gate Level Verilog; Place and Route; Standard Parasitics File (SPF).

Note that the implementation presented in Figure 2 assumes that all of the functionality of the power estimation tool is being used. If, for example, layout has not yet been performed and no standard parasitics file (SPF) is available, the parsespf program would never be invoked and no stripped SPF file would be passed to power_parser. If power estimates by control state were not desired, then statestripper would not be invoked and no state information file would be passed to power_parser. Similarly, if the power estimates are not to be correlated with functional modules, the listdrivers program would not be used.

Below we will discuss the functionality and implementation of each of the programs involved in the power estimation tool. A user's manual for the tool is located in the Appendix.

3.2.1 Simulation

The first step of the power estimation tool is simply a straightforward Verilog simulation that creates a value change dump (VCD) file. In general, a Verilog simulation would be done at this phase of the design process even if no power estimates were to be made, in order to verify the gate level design. Thus, no overhead is added to the design process by running the simulation. The only modification of the Verilog description that needs to be made is the addition of the value change dump Verilog commands, $dumpfile and $dumpvars, if these are not already included in the code.

By creating a VCD file, the designer can perform many different analyses on the same gate level design while only running the actual Verilog simulation once. Because of their size, we usually compress the VCD files and then pass them to the other programs through a pipe from zcat, a UNIX program that outputs the contents of a compressed file. As a result, the VCD file cannot be rewound during reading. This is largely why the estimation environment has been broken into several smaller programs.

3.2.2 headstripper

The purpose of headstripper is to create a copy of the portion of the VCD file that defines the tokens of all nets, registers and variables. This information is needed for the listdrivers and


headstripper runs fairly quickly, and uses less than [?]00 KB of memory. headstripper was implemented with approximately [?]0 lines of C code.
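The behaviour described for headstripper can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original C program: it assumes the declaration section of the VCD file ends with the standard `$enddefinitions ... $end` keyword pair, and the function name `strip_header` and the sample file are hypothetical.

```python
def strip_header(vcd_lines):
    """Yield VCD lines up to and including the end of the declaration section."""
    seen_enddefinitions = False
    for line in vcd_lines:
        yield line
        words = line.split()
        if "$enddefinitions" in words:
            seen_enddefinitions = True
        # The section is closed by the "$end" that follows "$enddefinitions".
        if seen_enddefinitions and "$end" in words:
            break


# Hypothetical miniature VCD file used only to exercise the sketch.
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$upscope $end
$enddefinitions $end
#0
0!
#5
1!
"""

header = list(strip_header(SAMPLE_VCD.splitlines()))
```

Because the generator stops reading as soon as the declarations end, it works on a non-rewindable stream such as a pipe from zcat, for example by passing `sys.stdin` as `vcd_lines`.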

3.2.3 statestripper

statestripper is much like headstripper except that it parses the entire VCD file and copies only those lines that have to do with the control state variable. This is necessary so that the VCD file needs to be parsed only once during the execution of power_parser.

statestripper requires that the designer knows the name of the net whose value is the control state. One limitation of this tool is that the control state must have one net name (such as CSTATE[3:0]) and cannot be a concatenation of several nets (such as {a[1], foo, bar, cout[3]}). Currently, the designer must manually alter the gate level Verilog code, if necessary. Since we are using the Synopsys Design Compiler for our designs, we have always found it easy to make this alteration since the names of the control state registers are nearly identical to the control state register names at the register transfer level.

statestripper executes fairly quickly, although it does take longer than headstripper as statestripper must parse the entire VCD file. It uses 100 KB of memory during execution, and is a simple program with approximately [?]0 lines of C code.
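A minimal Python sketch of the filtering that statestripper is described as performing is given below. It is an illustration only: it assumes one-line `$scope` and `$var` declarations, looks up the VCD token of the named net in the header, and then keeps only the timestamps and value changes for that token. The function name `strip_state` and the sample data are hypothetical.

```python
def strip_state(vcd_lines, state_net):
    """Yield only the timestamps and value changes of the net named state_net."""
    scope, token, in_header, pending_time = [], None, True, None
    for line in vcd_lines:
        words = line.split()
        if not words:
            continue
        if in_header:
            if words[0] == "$scope":          # e.g. "$scope module top $end"
                scope.append(words[2])
            elif words[0] == "$upscope":
                scope.pop()
            elif words[0] == "$var" and ".".join(scope + [words[4]]) == state_net:
                token = words[3]              # identifier code of the state net
            elif words[0] == "$enddefinitions":
                in_header = False
            continue
        if words[0].startswith("#"):
            pending_time = words[0]           # emit later, only if the state changes
            continue
        is_vector = len(words) == 2 and words[1] == token
        is_scalar = (len(words) == 1 and words[0][0] in "01xXzZ"
                     and words[0][1:] == token)
        if token is not None and (is_vector or is_scalar):
            if pending_time is not None:
                yield pending_time
                pending_time = None
            yield line.strip()


# Hypothetical miniature VCD file used only to exercise the sketch.
SAMPLE_VCD = """\
$scope module top $end
$var wire 1 ! clk $end
$var reg 4 " CSTATE [3:0] $end
$upscope $end
$enddefinitions $end
#0
0!
b0000 "
#5
1!
#10
0!
b0001 "
"""

state_lines = list(strip_state(SAMPLE_VCD.splitlines(), "top.CSTATE"))
```

In this sample the timestamp #5 is dropped because only the clock changes there, which is why the state file stays small compared with the full VCD file.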

3.2.4 parsespf

The purpose of parsespf is to extract the capacitances for the nets in the design. The output is a list of net names with their associated capacitances.

This is simply a parser written in C++ that steps through the SPF file. It executes quickly and uses less than 200 KB of memory. parsespf is implemented with approximately 80 lines of C code.



3.2.5 listdrivers

In hierarchical Verilog descriptions several named nets in the hierarchy often refer to the same physical net. Hereafter, I shall term such Verilog nets analogous nets. To accurately correlate power estimations with modules, the Verilog name of the driver of the physical net must be determined. Then all power consumed in the net is attributed to the module containing the driver. listdrivers outputs a list of the Verilog nets connected to the drivers.

The driving net is determined both by looking at the header of the VCD file and through use of the Verilog programming language interface (PLI). The VCD header file is used to rapidly determine analogous nets, since analogous nets are assigned to the same token in the VCD file. Once the analogous nets are found, the PLI is used to determine which of the nets is connected to the driver. Note that although the PLI is used, the simulation is not run a second time; listdrivers determines the drivers at the end of compilation and then exits.
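The first half of this step, finding analogous nets from the VCD header, can be illustrated with a short Python sketch that groups hierarchical net names by their shared token. It assumes one-line `$scope` and `$var` declarations; the function name and sample header are hypothetical, and the second half (asking the simulator through the PLI which name is attached to the driver) is not modelled here.

```python
from collections import defaultdict


def analogous_nets(header_lines):
    """Map each VCD token to the hierarchical names that share it."""
    groups, scope = defaultdict(list), []
    for line in header_lines:
        words = line.split()
        if not words:
            continue
        if words[0] == "$scope":
            scope.append(words[2])
        elif words[0] == "$upscope":
            scope.pop()
        elif words[0] == "$var":
            # words[3] is the token, words[4] the local net name.
            groups[words[3]].append(".".join(scope + [words[4]]))
    # Only tokens with more than one name describe analogous nets.
    return {tok: names for tok, names in groups.items() if len(names) > 1}


# Hypothetical header: top.sum and top.add_1.out are the same physical net.
SAMPLE_HEADER = """\
$scope module top $end
$var wire 1 # sum $end
$scope module add_1 $end
$var wire 1 # out $end
$var wire 1 $ carry $end
$upscope $end
$upscope $end
$enddefinitions $end
"""

shared = analogous_nets(SAMPLE_HEADER.splitlines())
```

Each group returned here is a candidate set from which exactly one name would be selected as the driver's net.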

listdrivers still executes fairly quickly, but the memory requirement is much larger, as much as 60 MB for a 21,000 net design. However, approximately 23 MB of this is the overhead involved in running the Verilog simulator. The amount of memory used by the PLI code is O(n), where n is the number of distinct nets.

listdrivers was implemented with approximately 760 lines of C code linked to the Verilog simulator through the PLI.

3.2.6 power_parser

Finally, the header file, control state value change file and driver list are parsed along with the VCD file and power estimations are produced. A general discussion of the algorithm used and its complexity follows.


First, the header file is read and a sparse table data structure is created for the nets so that the search time for any net is O(1) when the net's token (as specified in the VCD file) is known. Creating and initializing this structure is O(n), where n is the number of distinct nets.

The driver list is then parsed and the driving nets are tagged. The driver names are first placed in a binary tree. Building such a tree has time complexity O(n log n). Each net must search the tree for a driver, so the time complexity of all of the searches is n · O(log n) = O(n log n). Therefore, the total time complexity for parsing and tagging the drivers is O(n log n).
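The same O(n log n) bound can be illustrated in Python. Note that this sketch swaps the binary tree for a sorted list searched with the standard `bisect` module, which gives the same asymptotic cost (one O(n log n) sort followed by n lookups of O(log n) each); the function name and net names are hypothetical.

```python
import bisect


def tag_drivers(net_names, driver_names):
    """Return {net name: True if the net appears in the driver list}."""
    sorted_drivers = sorted(driver_names)            # O(n log n) build step
    tagged = {}
    for name in net_names:                           # n searches, O(log n) each
        i = bisect.bisect_left(sorted_drivers, name)
        tagged[name] = i < len(sorted_drivers) and sorted_drivers[i] == name
    return tagged


# Hypothetical nets: only the name inside add_1 is attached to a driver.
tags = tag_drivers(
    ["top.sum", "top.add_1.out", "top.add_1.carry"],
    ["top.add_1.out", "top.add_1.carry"],
)
```

A hash set would make each lookup O(1) on average; the sorted list is used here only to mirror the tree-based argument in the text.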

The stripped SPF file is then read in and stored in a binary tree and the nets are assigned their corresponding capacitances with an algorithm similar to that used above. By a similar argument, the total time complexity for assigning the capacitance values is O(n log n).

The VCD file is then parsed and transitions are counted for each net and also categorized by time. Adding a transition for a net is O(1), as discussed above. Adding transitions for a certain Verilog time step is also O(1), so the total time for the parsing of the VCD file and counting transitions is O(v), where v is the number of value changes.
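The single-pass counting described above might look like the following Python sketch. Several choices here are assumptions rather than details taken from power_parser: hash-based counters stand in for the sparse table, a value-change record counts as one transition only if the value actually differs from the previous one, and initial values are not counted. The function name and sample data are hypothetical.

```python
from collections import Counter


def count_transitions(vcd_body_lines):
    """Count value changes per token and per time step in one O(v) pass."""
    per_net, per_time = Counter(), Counter()
    last_value, now = {}, 0
    for line in vcd_body_lines:
        words = line.split()
        if not words or words[0].startswith("$"):    # skip $dumpvars / $end
            continue
        if words[0].startswith("#"):                  # timestamp, e.g. "#10"
            now = int(words[0][1:])
            continue
        if len(words) == 2:                           # vector change: "b0001 token"
            value, token = words
        else:                                         # scalar change: "1token"
            value, token = words[0][0], words[0][1:]
        # Assumption: the first assignment to a net is not a transition.
        if token in last_value and last_value[token] != value:
            per_net[token] += 1                       # O(1) average update
            per_time[now] += 1                        # O(1) average update
        last_value[token] = value
    return per_net, per_time


# Hypothetical value-change section of a VCD file.
SAMPLE_BODY = ["#0", "$dumpvars", "0!", 'b00 "', "$end",
               "#5", "1!", "#10", "0!", 'b01 "', "#15", "0!"]

per_net, per_time = count_transitions(SAMPLE_BODY)
```

Each input line is handled once with constant-time updates, which is where the O(v) total comes from.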

Next, power is calculated for each net. This involves one floating-point multiply for each net, so the time complexity of this is simply O(n). Next, the state value changes are parsed and transitions are characterized by time. This involves stepping through an array with one entry for each simulated time step, so the time complexity for this step is O(t), where t is the number of time steps in the simulation.

Next, statistics are gathered by module. This involves stepping through the table of nets and doing a strcmp() function call for each net's driver. Thus, O(n) strcmp()s will be called.



Finally, statistics are gathered by control state. This involves O(t) comparisons and additions. Note that O(t) operations must be performed for the entire system as well as for each module for which statistics are being gathered. Since the number of modules for which statistics are being collected can be assumed to be a small constant, the total number of operations that must be done to collect state statistics is O(t).
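One way to picture the O(t) accumulation by control state is the Python sketch below. It assumes the energy of each time step has already been computed and that the state changes are available as a list of (time step, state) pairs sorted by time; the attribution rule (a step's energy goes to the state active during that step), the function name and the sample values are illustrative assumptions.

```python
def energy_by_state(energy_per_step, state_changes):
    """Sum per-step energy into totals per control state in one O(t) pass."""
    totals = {}
    change_index, state = 0, None
    for step, energy in enumerate(energy_per_step):
        # Advance to the state that is active at this time step.
        while (change_index < len(state_changes)
               and state_changes[change_index][0] <= step):
            state = state_changes[change_index][1]
            change_index += 1
        if state is not None:
            totals[state] = totals.get(state, 0.0) + energy
    return totals


# Hypothetical data: five time steps, state 1 -> state 3 -> state 1.
totals = energy_by_state([1.0, 2.0, 3.0, 4.0, 5.0], [(0, 1), (2, 3), (4, 1)])
```

Repeating the same pass with a per-module energy array in place of the system-wide one gives the per-module, per-state breakdown, at a cost of O(t) per module.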

Thus, the total worst-case time complexity of the VCD parser would be O(n log n) for large designs with a shorter simulation time or O(v) for longer simulations. The memory usage for shorter and medium size simulations is dominated by O(n) because each net is represented by a class instantiation. The memory usage for a very long simulation would be O(t), as one double and one int are malloced for each Verilog time step.

power_parser is the portion of the power estimation tool that consumes the most CPU time, as will be shown in Section 4.0. It also is the largest piece of code, implemented in over 2200 lines of C++.

3.3 Estimating capacitance

Accurate capacitance estimations are essential for accurate power estimations. In the set of tools described above, capacitance values are extracted from the design once a layout has been completed. Although this does give very accurate estimates of capacitance, it is not the only way such estimates could be produced. Since power_parser simply reads in a file of net names and their associated capacitances, the tool is highly flexible, allowing designers either to use high level estimation techniques such as those presented in [Don79][Feu82][Lan94] or to back-annotate capacitances by extracting them from the layout once it has been done. If high level estimation techniques were used, the only change to the tools above would be the modification or omission of parsespf.

1. v will always be greater than or equal to t, since a time step is executed in a simulation if and only if one or more value changes occur during that time step. Therefore, O(v) dominates O(t).



4.0 Case study: DCT

We have run twelve different versions of the one-dimensional discrete cosine transform. These designs were created by Coumeri, and the behavioral level differences in the designs can be seen in Table 1 [Cou96]. Note that "# of partitions" refers to the number of pipeline stages (each with its own control logic) in the design. DCT1 through DCT8 are not pipelined; they have only one partition. "# of mult" is the number of multipliers in the design. "memory prefetch" is yes if values for iteration i+1 of the loop are being fetched while iteration i is being executed. This column does not apply to the pipelined designs (DCT9 through DCT12) since they are already executing multiple iterations of the loop at the same time. "grey code state encoding" and "grey code memory access" are yes if the states and memory addresses, respectively, are accessed in grey code order. Finally, "# of nets" is the number of physical nets in each design.

example  # of        # of  memory    grey code       grey code      # of
         partitions  mult  prefetch  state encoding  memory access  nets
DCT1     1           3     no        no              no             15810
DCT2     1           3     no        yes             no             15879
DCT3     1           2     no        no              no             12196
DCT4     1           2     no        yes             no             12226
DCT5     1           3     yes       no              no             16848
DCT6     1           3     yes       yes             yes            16833
DCT7     1           2     yes       no              no             13306
DCT8     1           2     yes       yes             yes            13280
DCT9     6           3     -         no              no             21327
DCT10    6           3     -         yes             yes            21346
DCT11    2           3     -         no              no             16725
DCT12    2           3     -         yes             no             16745

TABLE 1. DCT Descriptions

Table 2 shows the CPU time and memory usage for headstripper, statestripper, listdrivers and power_parser being executed on three of the DCT designs. These three designs were chosen because DCT3 is the smallest example (i.e. fewest number of nets), DCT9 is one of the largest examples, and DCT1 is somewhere in between. The CPU time is reported in minutes and seconds. Note that these results are dependent on which statistics are being collected.

       headstripper       statestripper      listdrivers        power_parser
       CPU time  Memory   CPU time  Memory   CPU time  Memory   CPU time  Memory
DCT1   0:21      92 KB    6:56      100 KB   1:32      52196 KB 24:44     40940 KB
DCT3   0:17      92 KB    7:42      100 KB   1:19      50368 KB 20:05     39408 KB
DCT9   0:26      92 KB    5:33      100 KB   2:20      60000 KB 32:59     45212 KB

TABLE 2. Execution times and memory usage for the power estimation for three of the DCT designs

As stated in Section 3.2.6, collecting statistics by module and state both involve some overhead in computation time. For each of the examples, the same statistics were being collected: total energy, energy consumed by each multiplier, energy consumed by all of the adders and subtracters, energy consumed by registers, energy consumed by random glue logic (everything except the above modules) and total energy by control state. The statistics were gathered on an IBM RS/6000 workstation with 384 MB of memory.

For our DCT examples, power_parser never took more than 35 minutes of CPU time on the RS/6000, and most of the time was spent during the actual parsing of the VCD file. Also, the power estimation environment never required more than 60 MB of memory.

The disk space overhead for the output files is presented in Table 3. For the 12 versions of the DCT we tested, ranging from 12,000 to 21,000 nets, headstripper and statestripper involved an overhead of 1.0 to 1.7 megabytes of hard disk storage for the header information and 56 to 140 kilobytes for the state information. Note that the input for the DCTs was 25 8x8 blocks of image data, and that the disk space overhead for state information scales linearly with the number of blocks in the simulation. The header information is unaffected by the length of the simulation. The hard disk space overhead involved in storing the list of drivers is 297 to 490 kilobytes for the DCT examples. This is independent of the length of the simulations.

        headstripper  statestripper  listdrivers  power_parser
DCT1    1216 KB       116 KB         395 KB       < 1 KB
DCT2    1223 KB       116 KB         396 KB       < 1 KB
DCT3    1000 KB       137 KB         290 KB       < 1 KB
DCT4    1004 KB       137 KB         291 KB       < 1 KB
DCT5    1335 KB       81 KB          412 KB       < 1 KB
DCT6    1335 KB       81 KB          412 KB       < 1 KB
DCT7    1129 KB       101 KB         309 KB       < 1 KB
DCT8    1126 KB       101 KB         308 KB       < 1 KB
DCT9    1723 KB       55 KB          479 KB       < 1 KB
DCT10   1726 KB       60 KB          479 KB       < 1 KB
DCT11   1318 KB       73 KB          411 KB       < 1 KB
DCT12   1320 KB       73 KB          411 KB       < 1 KB

TABLE 3. Disk space usage for power estimation of the 12 DCT designs

4.1 Results

Figure 3 gives a comparison of the results for the 12 designs. Since not all 12 designs were laid out, the results shown assume that all nets have the same capacitance, i.e. only transition counts are reported. One interesting thing we immediately noticed from these results is that using grey code for the state encoding produced a 10% or better reduction in transition count for all of the designs. Also, notice that pipelining the design reduces power consumption. However, for these designs it is arguable whether pipelining into six stages, as in DCT9 and DCT10, instead of two stages, as in DCT11 and DCT12, is worthwhile, as the power savings are about 10% while the complexity (as measured by number of nets) has increased over 30%. Still, if power savings are more important than minimizing complexity of the circuit, the six stage pipeline designs would be desired.

FIGURE 3. Comparison of 12 DCT designs. Chart title: Transitions by Module for Twelve DCT Designs. Horizontal axis: module; vertical axis: transition count, 0 to 8,000,000; one series per design, DCT1 through DCT12.

Figure 4 shows the transition count by control state for the DCT1 and DCT2 designs. Again, the value of using grey code for state encoding can be seen here. By comparing the states in DCT1 and DCT2 that perform the same functions, we see that the approximate 10% reduction is apparent in all similar states. (Remember that because the states of DCT2 are grey coded, DCT1 state 2 is the same as DCT2 state 3, DCT1 state 3 is the same as DCT2 state 2, DCT1 state 4 is the same as DCT2 state 6, etc.)

FIGURE 4. Transition count by control state for two DCT designs. Chart title: Transitions by Control State for Two DCT Designs. Horizontal axis: control state, 1 to 7; vertical axis: transition count.


Finally, one DCT design was laid out and capacitance values were extracted to give us the results in Figure 5. Notice that although the transition counts appear to be a good predictor of energy for the multipliers, transition counting alone underestimates the amount of power consumed by random logic and overestimates the power consumed by the adders and subtracters. This makes sense, as one can imagine that the random logic would often be driving fairly long nets with a large capacitance, while the nets inside the adders and subtracters would be very short (i.e. low capacitance) most of the time. The nets inside the array multipliers would not be as short as those inside the adders nor as long as those in the random logic. Correspondingly, the transition counts for the multipliers are a better predictor of energy consumed than for either the random logic or the adders and subtracters.

The transition counts in Figure 3 and Figure 5 differ because different target cell libraries were used. Also, because the library used for Figure 5 had more accurate timing models, more glitching occurred within the circuit. As a result, only 16 8x8 blocks were able to be simulated because of the larger size of the VCD file. (The 16 block example produced a VCD file four times larger than the 25 block examples that assume equal delay for all cells.) Thus, the results in Figure 3 and Figure 5 should not be compared against each other.

Our gate level power estimation tool has produced estimations for 12 different DCT implementations. These estimations have allowed us to determine the relative impact of several high level transformations, such as grey coding state assignments, resource sharing and pipelining, on dynamic power dissipation for these designs.



Appendix A - User's manual

In the following sections, I shall outline how to use each subprogram of the power estimation tool. Please refer to Figure 2 to see how each of these tools is tied together.

A.1 headstripper

headstripper reads the VCD file from stdin and outputs the header file to stdout. To invoke it from the command line, you would enter:

headstripper < vcdfile > vcd.head

or, if the VCD file is compressed:

zcat vcdfile.gz | headstripper > vcd.head

A.2 statestripper

statestripper also reads the VCD file from stdin and outputs to stdout. It also takes as an argument the hierarchical Verilog name of the net with the value of the control state. As mentioned in Section 3.2.3, the net must be a single multi-bit net and not a concatenation of nets. To invoke the program from the command line, you would enter:

statestripper top.foo.bar.CSTATE < vcdfile > vcd.state

or, if the VCD file is compressed:

zcat vcdfile.gz | statestripper top.foo.bar.CSTATE > vcd.state

A.3 parsespf

parsespf also reads from stdin and outputs to stdout. It takes as an argument the hierarchical prefix to be added to the net names present in the SPF file. For example, in our DCT examples, the DCT was laid out, but in the Verilog source that passes input vectors to the DCT the DCT module was named top.dct_1. To invoke parsespf in this case we used:

parsespf top.dct_1. < spffile > strippedspf



Note that the trailing period on top.dct_1. is necessary. If multiple modules were laid out, then parsespf would be executed for each and all of the SPF files could then be concatenated together as follows:

cat first_strippedspf second_strippedspf > final_strippedspf

A.4 listdrivers

listdrivers is a PLI routine and should be invoked in the same way as the original simulation. One additional argument must be added: +dump_ followed by the name of the VCD header file. For one of our DCT simulations, we invoked listdrivers as follows:

listdrivers +dump_dct.head sim.v -v dctlf.map2.v \
-v ../ms0803vcells_mosis -v ../ms080_3vprims \
-v ../mod.v +turbo+3

The output of listdrivers is a list of the driver nets in the file DRV.list.

A.5 power_parser

power_parser also accepts input from stdin and outputs to stdout. There are a number of arguments

that can be passed to power_parser, outlined below.

-fdrv driverfile name of the driver file created by listdrivers

-fhead vcd.head name of the VCD header file created by headstripper

-fstate vcd.state name of the VCD state file created by statestripper

-fsspf strippedspf name of the stripped SPF file created by parsespf

-state top.foo.bar.CSTATE name of the control state net. The net name should be the same as the one used in statestripper.



-net top.foo.bar.\* names of nets to determine energy consumption for. In this case, statistics would be gathered for the module top.foo.bar. Note that both leading and trailing *s are legal, but internal *s are not. Note also that the * must be escaped with the \ so that the shell does not try to expand it. Any number of -net arguments is legal.
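The wildcard rule for -net (a leading and/or trailing *, but no internal *) can be illustrated with a small Python sketch. This is not the matching code of power_parser, which the text describes as using strcmp(); the function name `net_matches` and the net names are hypothetical.

```python
def net_matches(pattern, net_name):
    """Match a -net style pattern with optional leading/trailing '*' only."""
    core = pattern.strip("*")
    if "*" in core:
        raise ValueError("internal * is not supported")
    leading, trailing = pattern.startswith("*"), pattern.endswith("*")
    if leading and trailing:
        return core in net_name            # "*_reg*": substring match
    if leading:
        return net_name.endswith(core)     # "*foo": suffix match
    if trailing:
        return net_name.startswith(core)   # "top.foo.bar.*": prefix match
    return net_name == core                # no wildcard: exact match


# Hypothetical net names checked against patterns of the kind shown above.
in_module = net_matches("top.foo.bar.*", "top.foo.bar.n42")
is_register = net_matches("*_reg*", "top.dct_1.acc_reg[3]")
other_module = net_matches("top.foo.bar.*", "top.foo.baz.n42")
```

Restricting wildcards to the ends of the pattern keeps every test a simple prefix, suffix or substring comparison.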

As an example of the use of power_parser, the following command was used to get the statistics displayed in Figure 5 (power estimates including capacitance information for one DCT design):

zcat ../tmp/dct11.gz | ../parser -fdrv dct11.drv \
-fhead dct11.head -fstate dct11.state -state top.dct_1.CSTATE \
-net top.dct_1.multi* -net top.dct_1.mult_1\* \
-net top.dct_1.mult_2\* -net top.dct_1.mult_3\* \
-net top.dct_1.U\* -net top.dct_1.r\* -net \*_reg\* \
-net top.\* -fsspf dct11.spf > dct11.results


References

[Cho94] T. Chou, K. Roy and S. Prasad, Estimation of circuit activity considering signal correlations and simultaneous switching, Proceedings of ICCAD '94, pp. 300-303, Nov. 1994.

[Cor90] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to algorithms, New York: McGraw-Hill Book Company, pp. 244-259, 1990.

[Cou96] S. L. Coumeri, private communication, April 1996.

[Epi96] Epic Design Technologies, Inc., http://www.epic.com/powermill.html, 1996.

[Dev90] S. Devadas, K. Keutzer and J. White, Estimation of power dissipation in CMOS combinational circuits, Proceedings of Custom IC Conference '90, pp. 19.7.1-19.7.6.

[Don79] W. Donath, Placement and average interconnection lengths of computer logic, IEEE Transactions on Circuits and Systems, pp. 272-277, April 1979.

[Feu82] M. Feuer, Connectivity of random logic, IEEE Transactions on Computers, pp. 29-33, Jan. 1982.

[Gho92] A. Ghosh, S. Devadas, K. Keutzer and J. White, Estimation of average switching activity in combinational and sequential circuits, Proceedings of DAC '92, pp. 253-259, 1992.

[Kri96] R. K. Krishnamurthy, I. Lys and L. R. Carley, Static power driven voltage scaling and delay driven buffer sizing in mixed swing quadrail for sub-1V I/O swings, submitted to IEEE/ACM International Symposium on Low Power Electronics and Design '96, August 1996.

[Lan93] P. E. Landman and J. M. Rabaey, Power estimation for high level synthesis, Proceedings of EuroDAC '93, pp. 361-366, Feb. 1993.

[Lan94] P. E. Landman, Low-power architectural design methodologies, Electronics Research Laboratory, College of Engineering, University of California, Berkeley (UCB/ERL M94/62), 1994.

[Lan95] P. E. Landman and J. M. Rabaey, Architectural power analysis: the dual bit type method, IEEE Transactions on VLSI Systems, pp. 173-187, June 1995.

[Mar94] R. Marculescu, D. Marculescu and M. Pedram, Switching activity analysis considering spatiotemporal correlations, Proceedings of ICCAD '94, pp. 294-299, Nov. 1994.

[Nag75] L. W. Nagel, SPICE2: a computer program to simulate semiconductor circuits, Technical report, University of California, Berkeley (ERL-M520), 1975.

[Naj91] F. Najm, Transition density, a stochastic measure of activity in digital circuits, Proceedings of DAC '91, pp. 644-649, June 1991.

