cmu-ece-1996-018
8/10/2019 CMU-ECE-1996-018
A gate level simulator for power consumption analysis
David J. Pursley ([email protected])
Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
Power consumption of digital circuits has become a critical design parameter. As such, it is
necessary that the system designer is able to estimate power consumption and correlate the results
back to high level specifications. A gate level tool that estimates power consumption and correlates
the results with functional modules and control states has been designed. This tool has produced
estimations of the power consumption of twelve different implementations of the discrete cosine
transform (DCT). These results are being used to judge the relative impact of high-level
transformations, such as pipelining and varying the amount of resource sharing and parallelism,
on power dissipation for the DCT algorithm.
A gate level simulator for power consumption analysis, May 1, 1996
Acknowledgments
I would like to first thank my advisor, Don Thomas, for his patience, guidance and example
over the past two years.
I would also like to thank my research partners and officemates, Pinar Ceyhan and Sari
Coumeri, for both their help and their willingness to always lend an ear.
Finally, I thank my professors at Bucknell University who first interested me in the field of
computer engineering and then aided me in my decision to continue towards my Masters and
(someday) my Ph.D. Those helpful professors include Daniel Hyde, Jerud Mead, Xiannong Meng
(who is now at The University of Texas-Pan American), James Lu and Maurice Aburdene.
1.0 Introduction
Power consumption of digital circuits has become a critical design parameter. For example,
portable applications require low power circuits to extend battery life, and all circuits have to
deal with the problem of electromigration. Thus, it is important that the system designer is able
to estimate power consumption and correlate the results with high level specifications.
We have designed a gate level tool that estimates power consumption and correlates the results to
the original register-transfer level (RTL) specifications. A unique aspect of this tool is that
power consumption is both estimated for individual modules and reported by control state. It can
also be back-annotated with actual capacitance values from layout to produce more accurate
estimations.
This tool is also being used to help pinpoint areas where power-saving optimizations are most
needed and to verify the accuracy of existing statistical power estimation techniques. We have
estimated the power consumption of 12 different implementations of the discrete cosine transform,
and we are currently laying out the designs in order to obtain capacitance values for
back-annotation. In the future, we hope to use this tool to aid in the design of systems using
QuadRail technology, a low-power CMOS-based technology currently being designed at CMU [Kri96].
1.1 Our approach
Our goal is to provide a power estimation tool that will be maximally useful to the system
designer in considering various high level transformations, such as pipelining and varying the
amount of resource sharing and parallelism, and their effect on power consumption. Therefore, this
tool must be easy to integrate into existing high level design tool flows, and its results must aid
the designer in clearly identifying power consumption trade-offs. Our tool is easy to integrate
into existing tool flows as it accepts gate level Verilog code as input. Optionally, the tool also
takes a list of capacitance estimates for each of the nets in the design. These estimates can be
extracted from the layout
2.0 Power estimation
The purpose of this tool is to accurately estimate power consumption at a high level of
abstraction. More specifically, this tool estimates dynamic power dissipation based on simulation
results. Dynamic power dissipation of a CMOS circuit can be calculated with the following equation
[Wes85]:
$$P = \frac{1}{2} C V_{dd}^{2} f_s$$
where C is the load capacitance, Vdd is the supply voltage and fs is the switching frequency of
the circuit. Our tool calculates fs and takes as inputs the values of Vdd and C, which can either
be extracted from layout or estimated by other tools [Don79][Feu82][Lan94]. Static and short
circuit power estimation is not taken into consideration here, although it is assumed that if the
target cell library is known, these could be calculated and added to the results produced by this
tool.
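
As a small worked illustration, the Python sketch below evaluates the dynamic power of a single net from a simulated transition count, assuming the standard form P = (1/2) C Vdd^2 fs with fs taken as transitions per second. The function name and the numeric values (50 fF load, 5 V supply, 2000 transitions in one microsecond) are hypothetical and are not taken from the tool or its results.

```python
def dynamic_power(transitions, sim_time_s, c_load_f, vdd_v):
    """Dynamic power of one net, assuming P = 1/2 * C * Vdd^2 * fs.

    fs is approximated as the number of transitions per second of
    simulated time.
    """
    if sim_time_s <= 0:
        raise ValueError("simulated time must be positive")
    fs = transitions / sim_time_s          # switching frequency in Hz
    return 0.5 * c_load_f * vdd_v ** 2 * fs


# Hypothetical net: 2000 transitions in 1 microsecond, 50 fF load, 5 V supply.
p = dynamic_power(transitions=2000, sim_time_s=1e-6, c_load_f=50e-15, vdd_v=5.0)
print(f"{p * 1e3:.3f} mW")
```

Summing this quantity over all nets driven from inside a module gives a per-module estimate, which is the kind of breakdown the tool reports.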
2.1 Related work
Most of the previous work done in power consumption estimation differs from our approach in the
level of abstraction at which the estimations are made or the method of obtaining the estimations.
Also, none of this work correlates the power estimates to both functional modules and control
states.
At the circuit level, both SPICE [Nag75] and PowerMill [Epi96] can be used to measure power
consumption by digital systems. Although PowerMill can run over 1000 times faster than SPICE,
it is still impractical to simulate at the circuit level for large designs or if many input
vectors are to be simulated. At the next level of abstraction, the switch level, simulators such as
IRSIM [Sal89] are able to simulate circuits over 500 times faster than SPICE, with a root mean
square error of less than 15% [Lan94]. Still, faster simulations could be done at the gate level.
Several faster gate level tools have been developed, but none are generally applicable to a wide
variety of applications. Several require that the input vectors can be characterized
probabilistically a priori [Naj91][Gho92][Cho94][Mar94]. Other work is designed for and applicable
only to signal-processing algorithms [Pow90]. Devadas et al. have designed a generally applicable
gate level algorithm, but it only predicts worst-case power dissipation [Dev90].
Landman and Rabaey have developed architecture level techniques [Lan93][Lan94][Lan95], but
these also require a priori characterization of the input vectors. Although these tools provide
accurate results for the types of applications they are geared toward, a more generally applicable
tool is needed.
By working at the gate level of abstraction our tool is able to simulate larger designs than the
switch level simulators, and by making estimations based on simulation, our tool can be used to
estimate the power of designs regardless of whether their inputs can be readily characterized.
We chose to use simulation-based power estimation over probabilistic power estimation techniques
for several reasons. First, not all systems have inputs that are easily or accurately
characterized by probabilistic methods. Second, this tool fits into existing tool flows easily.
Even if the system could be accurately characterized for probabilistic estimation, designing the
statistical models may involve a significant amount of additional work for the designer.
Simulation-based estimation requires very little extra work for the designer. Finally, we hope to
use this tool to verify the results of probabilistic estimation methods for various types of
algorithms.
3.0 Implementation
Our goal is to provide an easily integratable tool that accurately estimates power consumption at
a high level of abstraction. Our power estimates are calculated by simulating the hierarchical gate
level Verilog description and then post-processing the value change dump (VCD) file produced by
the simulation. Thus, the designer does not need to alter the existing tool flow. This tool can be
added on the side for additional help in evaluating power trade-offs.
3.1 Tool flow
One of the goals in creating this tool is that it must be easily integratable into existing tool
flows. Since its input is hierarchical gate level Verilog, this tool can easily be inserted in
existing tool flows. Figure 1 illustrates where the power estimator can be used in the tool flow
currently used in the Center for Electronic Design Automation at Carnegie Mellon University. As
shown in the diagram, this tool is used to estimate power after logic and datapath synthesis has
been performed. Power estimation can be performed again after the circuit has been laid out to
produce a more accurate estimation using capacitance values extracted from the layout information.
Note also that the addition of the power estimation tool does not alter the original tool flow at
all. It merely adds another tool that can be used when high level power estimates are desired.
3.2 Estimating power
The power estimation tool is actually a series of programs, as shown in Figure 2. The gate level
Verilog code produced by logic and datapath synthesis is simulated with a standard Verilog
simulator and a VCD file is created. The VCD file is then passed to headstripper and
statestripper, which extract the information from the VCD file. Then listdrivers is invoked
and, through use of the Verilog programming language interface (PLI), a list of the drivers of the
nets in the design is produced. If the design has been laid out, parsespf is used to extract
capacitances from the standard parasitics file (SPF). The results of all the programs are finally
passed to power_parser, which produces power estimates by module and control state.
[FIGURE 1. Tool flow at CEDA. Flow, top to bottom: Behavioral Level Verilog; Behavioral Synthesis
(SAW); Register Transfer Level Verilog; Logic and Datapath Synthesis (Synopsys Design Compiler /
CASCADE Epoch); Gate Level Verilog; Place and Route; Standard Parasitics File (SPF).]
Note that the implementation presented in Figure 2 assumes that all of the functionality of the
power estimation tool is being used. If, for example, layout has not yet been performed and no
standard parasitics file (SPF) is available, the parsespf program would never be invoked and no
stripped SPF file would be passed to power_parser. If power estimates by control state were
not desired, then statestripper would not be invoked and no state information file would be
passed to power_parser. Similarly, if the power estimates are not to be correlated with
functional modules, the listdrivers program would not be used.
Below we will discuss the functionality and implementation of each of the programs involved in
the power estimation tool. A user's manual for the tool is located in the Appendix.
3.2.1 Simulation
The first step of the power estimation tool is simply a straightforward Verilog simulation that
creates a value change dump (VCD) file. In general, a Verilog simulation would be done at this
phase of the design process to verify the gate level design even if no power estimates were to be
made. Thus, no overhead is added to the design process by running the simulation. The only
modification of the Verilog description that needs to be made is the addition of the value change
dump Verilog commands, $dumpfile and $dumpvars, if these are not already included in the
code.
By creating a VCD file, the designer can perform many different analyses on the same gate level
design while only running the actual Verilog simulation once. Because of their size, we usually
compress the VCD files and then pass them to the other programs through a pipe from zcat, a
UNIX program that outputs the contents of a compressed file. As a result, the VCD file cannot be
rewound during reading. This is largely why the estimation environment has been broken into
several smaller programs.
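
The single-pass constraint can be illustrated with a short Python sketch that reads a gzip-compressed VCD as a forward-only stream of lines, much as a program fed by zcat would see it. This is only an illustration: the original programs were written in C and C++, and the function name and the tiny in-memory sample are hypothetical.

```python
import gzip
import io


def stream_vcd_lines(source):
    """Yield the lines of a gzip-compressed VCD in a single forward pass.

    `source` may be a file name or a binary file object. The generator never
    seeks backwards, so each consumer gets exactly one look at every line.
    """
    with gzip.open(source, "rt") as text:
        for line in text:
            yield line.rstrip("\n")


# Tiny in-memory stand-in for a compressed VCD file.
sample = gzip.compress(b"$enddefinitions $end\n#0\n1!\n#10\n0!\n")
lines = list(stream_vcd_lines(io.BytesIO(sample)))
print(lines)
```

Because a stream like this cannot be rewound, anything a later stage needs from the start of the file has to be saved separately, which is the role the small helper programs play.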
3.2.2 headstripper
The purpose of headstripper is to create a copy of the portion of the VCD file that defines the
tokens of all nets, registers and variables. This information is needed for the listdrivers and
headstripper runs fairly quickly, and uses less than 100 KB of memory. headstripper
was implemented with approximately [?]0 lines of C code.
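
A rough Python equivalent of the header-copying step is sketched below. It relies on the standard VCD convention that the declaration section ends with the `$enddefinitions` keyword; the function name and the sample lines are hypothetical, and the original C implementation may differ in detail.

```python
def strip_header(lines):
    """Return the VCD declaration section, up to and including the line
    that carries the $enddefinitions keyword.

    Assumes the keyword and its closing $end appear on the same line, as
    most simulators write them.
    """
    header = []
    for line in lines:
        header.append(line)
        if "$enddefinitions" in line:
            break                      # everything after this is value changes
    return header


sample = [
    "$timescale 1ns $end",
    "$scope module top $end",
    "$var wire 1 ! clk $end",
    "$upscope $end",
    "$enddefinitions $end",
    "#0",
    "1!",
]
print(strip_header(sample))
```

Stopping at the end of the declarations is what keeps this step cheap: the bulk of a VCD file is the value-change section, which is never read here.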
3.2.3 statestripper
statestripper is much like headstripper except that it parses the entire VCD file and
copies only those lines that have to do with the control state variable. This is necessary so that
the VCD file needs to be parsed only once during the execution of power_parser.
statestripper requires that the designer knows the name of the net whose value is the control
state. One limitation of this tool is that the control state must have one net name (such as
CSTATE[3:0]) and cannot be a concatenation of several nets (such as
{a[1], foo, bar, cout[3]}). Currently, the designer must manually alter the gate level
Verilog code, if necessary. Since we are using the Synopsys Design Compiler for our designs, we
have always found it easy to make this alteration since the names of the control state registers
are nearly identical to the control state register names at the register transfer level.
statestripper executes fairly quickly, although it does take longer than headstripper as
statestripper must parse the entire VCD file. It uses 100 KB of memory during execution,
and is a simple program with approximately [?]0 lines of C code.
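
The filtering described in this section can be sketched in Python as follows. The sketch assumes standard VCD syntax (one declaration per line, `$scope`/`$upscope` nesting, vector changes written as `b<value> <code>`), and it keeps a timestamp only when the state actually changes at that time. The function name and sample data are hypothetical and do not reproduce the original C code.

```python
def strip_state(lines, state_net):
    """Keep only the timestamps and value changes of one control-state net.

    `state_net` is a hierarchical name such as "top.CSTATE".
    """
    scope, code, out = [], None, []
    pending_time, in_header = None, True

    def emit(line):
        nonlocal pending_time
        if pending_time is not None:       # write the time once per change
            out.append(pending_time)
            pending_time = None
        out.append(line)

    for line in lines:
        tok = line.split()
        if not tok:
            continue
        if in_header:
            if tok[0] == "$scope":
                scope.append(tok[2])
            elif tok[0] == "$upscope":
                scope.pop()
            elif tok[0] == "$var" and ".".join(scope + [tok[4]]) == state_net:
                code = tok[3]              # identifier code of the state net
            elif tok[0] == "$enddefinitions":
                in_header = False
        elif tok[0].startswith("#"):
            pending_time = tok[0]
        elif tok[0][0] in "bB":
            if len(tok) == 2 and tok[1] == code:
                emit(line)                 # vector change: "b01 <code>"
        elif tok[0][0] in "01xXzZ" and tok[0][1:] == code:
            emit(line)                     # scalar change: "1<code>"
    return out


sample = [
    "$scope module top $end",
    "$var reg 2 ! CSTATE [1:0] $end",
    "$var wire 1 % clk $end",
    "$upscope $end",
    "$enddefinitions $end",
    "#0", "b00 !", "0%",
    "#5", "1%",
    "#10", "b01 !", "0%",
]
print(strip_state(sample, "top.CSTATE"))
```

Matching on a single identifier code is also why a concatenation of several nets cannot serve as the state: each net in the concatenation has its own code in the VCD file.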
3.2.4 parsespf
The purpose of parsespf is to extract the capacitances for the nets in the design. The output is a
list of net names with their associated capacitances.
This is simply a parser written in C++ that steps through the SPF file. It executes quickly and
uses less than 200 KB of memory. parsespf is implemented with approximately 80 lines of C code.
3.2.5 listdrivers
In hierarchical Verilog descriptions several named nets in the hierarchy often refer to the same
physical net. Hereafter, I shall term such Verilog nets analogous nets. To accurately correlate
power estimations with modules, the Verilog name of the driver of the physical net must be
determined. Then all power consumed in the net is attributed to the module containing the driver.
listdrivers outputs a list of the Verilog nets connected to the drivers.
The driving net is determined both by looking at the header of the VCD file and through use of the
Verilog programming language interface (PLI). The VCD header file is used to rapidly determine
analogous nets, since analogous nets are assigned to the same token in the VCD file. Once the
analogous nets are found, the PLI is used to determine which of the nets is connected to the
driver. Note that although the PLI is used, the simulation is not run a second time; listdrivers
determines the drivers at the end of compilation and then exits.
listdrivers still executes fairly quickly, but the memory requirement is much larger, as much
as 60 MB for a 21,000 net design. However, approximately 23 MB of this is the overhead involved
in running the Verilog simulator. The amount of memory used by the PLI code is O(n), where n
is the number of distinct nets.
listdrivers was implemented with approximately 760 lines of C code linked to the Verilog
simulator through the PLI.
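
The first half of this procedure, finding analogous nets from the VCD header, can be sketched in Python by grouping hierarchical names that share an identifier code. Deciding which name in each group is the driver requires the simulator's PLI and is not shown. The function name and sample header are hypothetical.

```python
from collections import defaultdict


def analogous_nets(header_lines):
    """Group hierarchical net names by VCD identifier code.

    Names that share a code refer to the same physical net. Only groups with
    more than one name are returned.
    """
    scope, groups = [], defaultdict(list)
    for line in header_lines:
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "$scope":
            scope.append(tok[2])           # "$scope module <name> $end"
        elif tok[0] == "$upscope":
            scope.pop()
        elif tok[0] == "$var":             # "$var <type> <size> <code> <name> ..."
            groups[tok[3]].append(".".join(scope + [tok[4]]))
    return {code: names for code, names in groups.items() if len(names) > 1}


header = [
    "$scope module top $end",
    "$var wire 1 ! sum $end",
    "$scope module add_1 $end",
    "$var wire 1 ! s $end",
    "$var wire 1 % c $end",
    "$upscope $end",
    "$upscope $end",
    "$enddefinitions $end",
]
print(analogous_nets(header))
```

In this sample, top.sum and top.add_1.s share a code, so the power of that one physical net would be attributed to whichever module contains its driver.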
3.2.6 power_parser
Finally, the header file, control state value change file and driver list are parsed along with
the VCD file and power estimations are produced. A general discussion of the algorithm used and
its complexity follows.
First, the header file is read and a sparse table data structure is created for the nets so that
the search time for any net is O(1) when the net's token (as specified in the VCD file) is known.
Creating and initializing this structure is O(n), where n is the number of distinct nets.
The driver list is then parsed and the driving nets are tagged. The driver names are first placed
in a binary tree. Building such a tree has time complexity O(n log n). Each net must search the
tree for a driver, so the time complexity of all of the searches is n * O(log n) = O(n log n).
Therefore, the total time complexity for parsing and tagging the drivers is O(n log n).
The stripped SPF file is then read in and stored in a binary tree and the nets are assigned their
corresponding capacitances with an algorithm similar to that used above. By a similar argument,
the total time complexity for assigning the capacitance values is O(n log n).
The VCD file is then parsed and transitions are counted for each net and also categorized by time.
Adding a transition for a net is O(1), as discussed above. Adding transitions for a certain
Verilog time step is also O(1), so the total time for the parsing of the VCD file and counting
transitions is O(v), where v is the number of value changes.
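
The O(v) counting pass can be sketched in Python as below. Hash tables stand in for the sparse table indexed by token, so each update is O(1) on average. The sketch counts every value-change line as one transition, including initial values and whole-vector changes, which is a simplification; the function name and sample data are hypothetical.

```python
from collections import Counter


def count_transitions(change_lines):
    """Count value changes per identifier code and per time step in one pass."""
    per_net, per_time, now = Counter(), Counter(), 0
    for line in change_lines:
        tok = line.split()
        if not tok or tok[0].startswith("$"):
            continue                       # skip keywords such as $dumpvars, $end
        if tok[0].startswith("#"):
            now = int(tok[0][1:])          # new simulation time
            continue
        # Vector changes are "b01 <code>"; scalar changes are "1<code>".
        code = tok[1] if len(tok) == 2 else tok[0][1:]
        per_net[code] += 1
        per_time[now] += 1
    return per_net, per_time


changes = ["#0", "0!", "b00 %", "#5", "1!", "#10", "0!", "b01 %"]
per_net, per_time = count_transitions(changes)
print(dict(per_net), dict(per_time))
```

Each line is touched once and triggers a constant amount of work, so the running time grows linearly with the number of value changes in the file.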
Next, power is calculated for each net. This involves one floating-point multiply for each net, so
the time complexity of this is simply O(n). Next, the state value changes are parsed and
transitions are characterized by time. This involves stepping through an array with one entry for
each simulated time step, so the time complexity for this step is O(t), where t is the number of
time steps in the simulation.
Next, statistics are gathered by module. This involves stepping through the table of nets and doing
a strcmp() function call for each net's driver. Thus, O(n) strcmp() calls will be made.
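
The per-module accumulation can be sketched in Python as follows. The sketch assumes that a net belongs to a module when its driver's hierarchical name lies under the module's name, which is one plausible reading of the string comparison described above rather than a confirmed detail of power_parser. All names and energy values are hypothetical.

```python
def energy_by_module(net_energy, modules):
    """Sum per-net energy into modules.

    `net_energy` maps the hierarchical name of each net's driver to the
    energy attributed to that net. A net counts toward every listed module
    whose name is a hierarchical prefix of the driver name.
    """
    totals = {m: 0.0 for m in modules}
    for driver, energy in net_energy.items():      # one pass over the nets
        for m in modules:                          # small, fixed module list
            if driver == m or driver.startswith(m + "."):
                totals[m] += energy
    return totals


net_energy = {
    "top.dct.mult_1.p0": 3.0,
    "top.dct.mult_1.p1": 1.5,
    "top.dct.add.s": 0.5,
    "top.dct.glue": 2.0,
}
print(energy_by_module(net_energy, ["top.dct.mult_1", "top.dct"]))
```

With a small constant number of modules the work stays proportional to the number of nets, matching the O(n) comparison count given above.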
Finally, statistics are gathered by control state. This involves O(t) comparisons and additions.
Note that O(t) operations must be performed for the entire system as well as for each module for
which statistics are being gathered. Since the number of modules for which statistics are being
collected can be assumed to be a small constant, the total number of operations that must be done
to collect state statistics is O(t).
Thus, the total worst-case time complexity of the VCD parser would be O(n log n) for large
designs with a shorter simulation time or O(v) for longer simulations. The memory usage for
shorter and medium size simulations is dominated by O(n) because each net is represented by a
class instantiation. The memory usage for a very long simulation would be O(t), as one double
and one int are malloced for each Verilog time step.
power_parser is the portion of the power estimation tool that consumes the most CPU time, as
will be shown in Section 4.0. It also is the largest piece of code, implemented in over 2200 lines
of C++.
3.3 Estimating capacitance
Accurate capacitance estimations are essential for accurate power estimations. In the set of tools
described above, capacitance values are extracted from the design once a layout has been
completed. Although this does give very accurate estimates of capacitance, it is not the only way
such estimates could be produced. Since power_parser simply reads in a file of net names and their
associated capacitances, the tool is highly flexible, allowing designers either to use high level
estimation techniques such as those presented in [Don79][Feu82][Lan94] or to back-annotate
capacitances by extracting them from the layout once it has been done. If high level estimation
techniques were used, the only change to the tools above would be the modification or omission of
parsespf.
1. (Footnote to the complexity discussion in Section 3.2.6.) v will always be greater than or equal
to t, since a time step is executed in a simulation if and only if one or more value changes occur
during that time step. Therefore, O(v) dominates O(t).
4.0 Case study: DCT
We have run twelve different versions of the one-dimensional discrete cosine transform. These
designs were created by Coumeri and the behavioral level differences in the designs can be seen in
Table 1 [Cou96]. Note that "# of partitions" refers to the number of pipeline stages (each with its
own control logic) in the design. DCT1 through DCT8 are not pipelined; they have only one
partition. "# of mult" is the number of multipliers in the design. "memory prefetch" is yes if
values for iteration i+1 of the loop are being fetched while iteration i is being executed. This
column does not apply to the pipelined designs (DCT9 through DCT12) since they are already
executing multiple iterations of the loop at the same time. "grey code state encoding" and "grey
code memory access" are yes if the states and memory addresses, respectively, are accessed in grey
code order. Finally, "# of nets" is the number of physical nets in each design.
example   # of         # of   memory     grey code        grey code       # of
          partitions   mult   prefetch   state encoding   memory access   nets
DCT1      1            3      no         no               no              15810
DCT2      1            3      no         yes              no              15879
DCT3      1            2      no         no               no              12196
DCT4      1            2      no         yes              no              12226
DCT5      1            3      yes        no               no              16848
DCT6      1            3      yes        yes              yes             16833
DCT7      1            2      yes        no               no              13306
DCT8      1            2      yes        yes              yes             13280
DCT9      6            3      n/a        no               no              21327
DCT10     6            3      n/a        yes              yes             21346
DCT11     2            3      n/a        no               no              16725
DCT12     2            3      n/a        yes              no              16745
TABLE 1. DCT Descriptions
Table 2 shows the CPU time and memory usage for headstripper, statestripper,
listdrivers and power_parser being executed on three of the DCT designs. These three
designs were chosen because DCT3 is the smallest example (i.e. fewest number of nets), DCT9 is
one of the largest examples, and DCT1 is somewhere in between. The CPU time is reported in
minutes and seconds. Note that these results are dependent on which statistics are being collected.
          headstripper         statestripper        listdrivers          power_parser
          CPU time  Memory     CPU time  Memory     CPU time  Memory     CPU time  Memory
DCT1      0:21      92 KB      6:56      100 KB     1:32      52196 KB   24:44     40940 KB
DCT3      0:17      92 KB      7:42      100 KB     1:19      50368 KB   20:05     39408 KB
DCT9      0:26      92 KB      5:33      100 KB     2:20      60000 KB   32:59     45212 KB
TABLE 2. Execution times and memory usage for the power estimation for three of the DCT designs
As stated in Section 3.2.6, collecting statistics by module and by state both involve some
overhead in computation time. For each of the examples, the same statistics were being collected:
total energy, energy consumed by each multiplier, energy consumed by all of the adders and
subtracters, energy consumed by registers, energy consumed by random glue logic (everything except
the above modules) and total energy by control state. The statistics were gathered on an IBM
RS/6000 workstation with 384 MB of memory.
For our DCT examples, power_parser never took more than 35 minutes of CPU time on the
RS/6000, and most of the time was spent during the actual parsing of the VCD file. Also, the
power estimation environment never required more than 60 MB of memory.
The disk space overhead for the output files is presented in Table 3. For the 12 versions of the
DCT we tested, ranging from 12,000 to 21,000 nets, headstripper and statestripper
involved an overhead of 1.0 to 1.7 megabytes of hard disk storage for the header information and
56 to 140 kilobytes for the state information. Note that the input for the DCTs was 25 8x8 blocks
of image data, and that the disk space overhead for state information scales linearly with the
number of blocks in the simulation. The header information is unaffected by the length of the
simulation. The hard disk space overhead involved in storing the list of drivers is 297 to 490
kilobytes for the DCT examples. This is independent of the length of the simulations.
          headstripper   statestripper   listdrivers   power_parser
DCT1      1216 KB        116 KB          395 KB        < 1 KB
DCT2      1223 KB        116 KB          396 KB        < 1 KB
DCT3      1000 KB        137 KB          290 KB        < 1 KB
DCT4      1004 KB        137 KB          291 KB        < 1 KB
DCT5      1335 KB        81 KB           412 KB        < 1 KB
DCT6      1335 KB        81 KB           412 KB        < 1 KB
DCT7      1129 KB        101 KB          309 KB        < 1 KB
DCT8      1126 KB        101 KB          308 KB        < 1 KB
DCT9      1723 KB        55 KB           479 KB        < 1 KB
DCT10     1726 KB        60 KB           479 KB        < 1 KB
DCT11     1318 KB        73 KB           411 KB        < 1 KB
DCT12     1320 KB        73 KB           411 KB        < 1 KB
TABLE 3. Disk space usage for power estimation of the 12 DCT designs
4.1 Results
Figure 3 gives a comparison of the results for the 12 designs. Since not all 12 designs were laid
out, the results shown assume that all nets have the same capacitance, i.e. only transition counts
are reported. One interesting thing we immediately noticed from these results is that using grey
code for the state encoding produced a 10% or better reduction in transition count for all of the
designs. Also, notice that pipelining the design reduces power consumption. However, for these
designs it is arguable whether pipelining into six stages (as in DCT9 and DCT10) instead of
two stages (as in DCT11 and DCT12) is worthwhile, as the power savings is about 10% while the
complexity (as measured by number of nets) has increased by over 30%. Still, if power savings is
more important than minimizing complexity of the circuit, the six stage pipeline designs would be
desired.
[FIGURE 3. Comparison of 12 DCT designs. Bar chart titled "Transitions by Module for Twelve DCT
Designs"; x-axis: module; y-axis: transitions, with ticks up to 8,000,000; one bar series per
design, DCT1 through DCT12.]
Figure 4 shows the transition count by control state for the DCT1 and DCT2 designs. Again, the
value of using grey code for state encoding can be seen here. By comparing the states in DCT1 and
DCT2 that perform the same functions, we see that the approximate 10% reduction is apparent in
all similar states. (Remember that because the states of DCT2 are grey coded, DCT1 state 2 is the
same as DCT2 state 3, DCT1 state 3 is the same as DCT2 state 2, DCT1 state 4 is the same as DCT2
state 6, etc.)
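
The state correspondences quoted above (2 with 3, 3 with 2, 4 with 6) are consistent with the standard binary-reflected Gray code, assuming the states of the non-grey-coded design are numbered in plain binary order. The source does not say which Gray code was used, so the Python sketch below is an illustration under that assumption; the function name is hypothetical.

```python
def binary_to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


# State number in binary order next to its Gray-coded counterpart.
for state in range(8):
    print(state, binary_to_gray(state))
```

Consecutive Gray codes differ in exactly one bit, which is why stepping through states in this order tends to reduce the number of transitions on the state register and the logic it drives.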
[FIGURE 4. Transition count by control state for two DCT designs. Chart titled "Transitions by
Control State for Two DCT Designs"; x-axis: control state (1 to 7); y-axis: transitions, with
ticks up to 1,400,000.]
Finally, the DCT11 design was laid out and capacitance values were extracted to give us the results
in Figure 5. Notice that although the transition counts appear to be a good predictor of energy for
the multipliers, transition counting alone underestimates the amount of power consumed by random
logic and overestimates the power consumed by the adders and subtracters. This makes sense
as one can imagine that the random logic would often be driving fairly long nets with a large
capacitance, while the nets inside the adders and subtracters would be very short (i.e. low
capacitance) most of the time. The nets inside the array multipliers would not be as short as those
inside the adders nor as long as those in the random logic. Correspondingly, the transition counts
for the multipliers are a better predictor of energy consumed than for either the random logic or
the adders and subtracters.
The transition counts in Figure 3 and Figure 5 differ because different target cell libraries were
used. Also, because the library used for Figure 5 had more accurate timing models, more glitching
occurred within the circuit. As a result, only 16 8x8 blocks were able to be simulated because of
the larger size of the VCD file. (The 16 block example produced a VCD file four times larger than
the 25 block examples that assume equal delay for all cells.) Thus, the results in Figure 3 and
Figure 5 should not be compared against each other.
Our gate level power estimation tool has produced estimations for 12 different DCT
implementations. These estimations have allowed us to determine the relative impact of several high
level transformations, such as grey coding state assignments, resource sharing and pipelining, on
dynamic power dissipation for these designs.
Appendix A - User's manual
In the following sections, I shall outline how to use each subprogram of the power estimation tool.
Please refer to Figure 2 on page 9 to see how each of these tools are tied together.
A.1 headstripper
headstripper reads the VCD file from stdin and outputs the header file to stdout. To
invoke it from the command line, you would enter:
    headstripper < vcdfile > vcd.head
or, if the VCD file is compressed:
    zcat vcdfile.gz | headstripper > vcd.head
A.2 statestripper
statestripper also reads the VCD file from stdin and outputs to stdout. It also takes as
an argument the hierarchical Verilog name of the net with the value of the control state. As
mentioned in Section 3.2.3 on page 10, the net must be a single multi-bit net and not a
concatenation of nets. To invoke the program from the command line, you would enter:
    statestripper top.foo.bar.CSTATE < vcdfile > vcd.state
or, if the VCD file is compressed:
    zcat vcdfile.gz | statestripper top.foo.bar.CSTATE > vcd.state
A.3 parsespf
parsespf also reads from stdin and outputs to stdout. It takes as an argument the hierarchical
prefix to be added to the net names present in the SPF file. For example, in our DCT examples,
the DCT was laid out, but in the Verilog source that passes input vectors to the DCT the DCT
module was named top.dct_1. To invoke parsespf in this case we used:
    parsespf top.dct_1. < spffile > strippedspf
Note that the trailing period on top.dct_1. is necessary. If multiple modules were laid out,
then parsespf would be executed for each and then all of the stripped SPF files could be
concatenated together as follows:
    cat first_strippedspf second_strippedspf > final_strippedspf
A.4 listdrivers
listdrivers is a PLI routine and should be invoked in the same way as the original simulation.
One additional argument must be added: +dump_ followed by the name of the VCD header
file. For one of our DCT simulations, we invoked listdrivers as follows:
    listdrivers +dump_dct.head sim.v -v dctlf.map2.v \
        -v ../ms0803vcells_mosis -v ../ms080_3vprims \
        -v ../mod.v +turbo+3
The output of listdrivers is a list of the driver nets in the file DRV.list.
A.5 power_parser
power_parser also accepts input from stdin and outputs to stdout. There are a number of arguments
that can be passed to power_parser, outlined below.
-fdrv driverfile            name of the driver file created by listdrivers
-fhead vcd.head             name of the VCD header file created by headstripper
-fstate vcd.state           name of the VCD state file created by statestripper
-fsspf strippedspf          name of the stripped SPF file created by parsespf
-state top.foo.bar.CSTATE   name of the control state net. The net name should be
                            the same as the one used in statestripper.
-net top.foo.bar.\*         names of nets to determine energy consumption for. In this case,
                            statistics would be gathered for the module top.foo.bar. Note that
                            both leading and trailing *s are legal, but internal *s are not. Note
                            also that the * must be escaped with the \ so that the shell does not
                            try to expand it. Any number of -net arguments is legal.
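
The wildcard rule for -net (a * is legal only as the first and/or last character) can be sketched in Python as below. How power_parser actually interprets the pattern is not specified here, so the prefix, suffix and substring semantics in this sketch are assumptions, and the function name is hypothetical. Note that the backslash is consumed by the shell, so the program itself sees a plain *.

```python
def net_matches(pattern, name):
    """Match a -net style pattern against a hierarchical net name.

    A leading '*' matches any prefix, a trailing '*' matches any suffix,
    and a '*' anywhere else is rejected.
    """
    core = pattern.strip("*")
    if "*" in core:
        raise ValueError("internal '*' is not supported")
    lead, trail = pattern.startswith("*"), pattern.endswith("*")
    if lead and trail:
        return core in name
    if lead:
        return name.endswith(core)
    if trail:
        return name.startswith(core)
    return name == core


print(net_matches("top.foo.bar.*", "top.foo.bar.sum[3]"))
print(net_matches("*_reg*", "top.dct_1.acc_reg[7]"))
print(net_matches("top.foo.bar.*", "top.other.x"))
```

Restricting the wildcard to the ends keeps matching down to simple prefix, suffix or substring tests, with no need for a general pattern matcher.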
As an example of the use of power_parser, the following command was used to get the statistics
displayed in Figure 5, Power estimates including capacitance information for one DCT
design (DCT11), on page 20:
    zcat ../tmp/dct11.gz | ../parser -fdrv dct11.drv \
        -fhead dct11.head -fstate dct11.state -state top.dct_1.CSTATE \
        -net top.dct_1.mult\* -net top.dct_1.mult_1\* \
        -net top.dct_1.mult_2\* -net top.dct_1.mult_3\* \
        -net top.dct_1.U\* -net top.dct_1.r\* -net \*_reg\* \
        -net top.\* -fsspf dct11.spf > dct11.results
References
[Cho94] T. Chou, K. Roy and S. Prasad, Estimation of circuit activity considering signal
correlations and simultaneous switching, Proceedings of ICCAD '94, pp. 300-303, Nov. 1994.
[Cor90] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to algorithms, New
York: McGraw-Hill Book Company, pp. 244-259, 1990.
[Cou96] S. L. Coumeri, private communication, April 1996.
[Epi96] Epic Design Technologies, Inc., http://www.epic.com/powermill.html, 1996.
[Dev90] S. Devadas, K. Keutzer and J. White, Estimation of power dissipation in CMOS
combinational circuits, Proceedings of Custom IC Conference '90, pp. 19.7.1-19.7.6.
[Don79] W. Donath, Placement and average interconnection lengths of computer logic,
IEEE Transactions on Circuits and Systems, pp. 272-277, April 1979.
[Feu82] M. Feuer, Connectivity of random logic, IEEE Transactions on Computers, pp.
29-33, Jan. 1982.
[Gho92] A. Ghosh, S. Devadas, K. Keutzer, and J. White, Estimation of average switching
activity in combinational and sequential circuits, Proceedings of DAC '92, pp. 253-259, 1992.
[Kri96] R. K. Krishnamurthy, I. Lys and L. R. Carley, Static power driven voltage scaling
and delay driven buffer sizing in mixed swing quadrail for sub-1V I/O swings,
submitted to IEEE/ACM International Symposium on Low Power Electronics and
Design '96, August 1996.
[Lan93] P. E. Landman and J. M. Rabaey, Power estimation for high level synthesis,
Proceedings of EuroDAC '93, pp. 361-366, Feb. 1993.
[Lan94] P. E. Landman, Low-power architectural design methodologies, Electronics
Research Laboratory, College of Engineering, University of California, Berkeley
(UCB/ERL M94/62), 1994.
[Lan95] P. E. Landman and J. M. Rabaey, Architectural power analysis: the dual bit type
method, IEEE Transactions on VLSI Systems, pp. 173-187, June 1995.
[Mar94] R. Marculescu, D. Marculescu and M. Pedram, Switching activity analysis
considering spatiotemporal correlations, Proceedings of ICCAD '94, pp. 294-299, Nov. 1994.
[Nag75] L. W. Nagel, SPICE2: a computer program to simulate semiconductor circuits,
Technical report, University of California, Berkeley (ERL-M520), 1975.
[Naj91] F. Najm, Transition density, a stochastic measure of activity in digital circuits,
Proceedings of DAC '91, pp. 644-649, June 1991.