cmu-ece-1996-018


    A gate level simulator for power consumption analysis

    David J. Pursley ([email protected])

    Department of Electrical and Computer Engineering
    Carnegie Mellon University

    Pittsburgh, PA 15213

    Power consumption of digital circuits has become a critical design parameter. As such, it is
    necessary that the system designer is able to estimate power consumption and correlate the
    results back to high level specifications. A gate level tool that estimates power consumption
    and correlates the results with functional modules and control states has been designed. This
    tool has produced estimations of the power consumption of twelve different implementations of
    the discrete cosine transform (DCT). These results are being used to judge the relative impact
    of high-level transformations, such as pipelining and varying the amount of resource sharing
    and parallelism, on power dissipation for the DCT algorithm.

    A gate level simulator for power consumption analysis, May 1, 1996


    Acknowledgments

    I would like to first thank my advisor, Don Thomas, for his patience, guidance and example
    over the past two years.

    I would also like to thank my research partners and officemates, Pinar Ceyhan and Sari
    Coumeri, for both their help and their willingness to always lend an ear.

    Finally, I thank my professors at Bucknell University who first interested me in the field of
    computer engineering and then aided me in my decision to continue towards my Masters and
    (someday) my Ph.D. Those helpful professors include Daniel Hyde, Jerud Mead, Xiannong Meng
    (who is now at The University of Texas-Pan American), James Lu and Maurice Aburdene.


    1.0 Introduction

    Power consumption of digital circuits has become a critical design parameter. For example,
    portable applications require low power circuits to extend battery life, and all circuits have
    to deal with the problem of electromigration. Thus, it is important that the system designer is
    able to estimate power consumption and correlate the results with high level specifications.

    We have designed a gate level tool that estimates power consumption and correlates the results
    to the original register-transfer level (RTL) specifications. A unique aspect of this tool is
    that power consumption is both estimated for individual modules and reported by control state.
    It can also be back-annotated with actual capacitance values from layout to produce more
    accurate estimations. This tool is also being used to help pinpoint areas where power-saving
    optimizations are most needed and to verify the accuracy of existing statistical power
    estimation techniques. We have estimated the power consumption of 12 different implementations
    of the discrete cosine transform, and we are currently laying out the designs in order to
    obtain capacitance values for back-annotation. In the future, we hope to use this tool to aid
    in the design of systems using QuadRail technology, a low-power CMOS-based technology currently
    being designed at CMU [Kri96].

    1.1 Our approach

    Our goal is to provide a power estimation tool that will be maximally useful to the system
    designer in considering various high level transformations, such as pipelining and varying the
    amount of resource sharing and parallelism, and their effect on power consumption. Therefore,
    this tool must be easy to integrate into existing high level design tool flows, and its results
    must aid the designer in clearly identifying power consumption trade-offs. Our tool is easy to
    integrate into existing tool flows as it accepts as input gate level Verilog code. Optionally,
    the tool also takes a list of capacitance estimates for each of the nets in the design. These
    estimates can be extracted from the layout


    2.0 Power estimation

    The purpose of this tool is to accurately estimate power consumption at a high level of
    abstraction. More specifically, this tool estimates dynamic power dissipation based on
    simulation results. Dynamic power dissipation of a CMOS circuit can be calculated with the
    following equation [Wes85]:

    $$P = \frac{1}{2} C V_{dd}^{2} f_s$$

    where C is the load capacitance, Vdd is the supply voltage, and fs is the switching frequency
    of the circuit. Our tool calculates fs and takes as inputs the values of Vdd and C, which can
    either be extracted from layout or estimated by other tools [Don79][Feu82][Lan94]. Static and
    short circuit power are not taken into consideration here, although it is assumed that if the
    target cell library is known, these could be calculated and added to the results produced by
    this tool.
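    As a rough illustration of how the dynamic power equation is applied to a single net, the
    Python sketch below computes energy and average power from a transition count. It is not taken
    from the tool itself: the function names and the example values (a 50 fF load, a 3.3 V supply,
    1000 transitions in one microsecond) are assumptions chosen only for illustration, and fs is
    taken to be the number of transitions per second.

```python
def net_energy(capacitance_f, vdd_v, transitions):
    """Energy (joules) dissipated on one net, at 1/2 * C * Vdd^2 per transition."""
    return 0.5 * capacitance_f * vdd_v ** 2 * transitions


def average_power(capacitance_f, vdd_v, transitions, sim_time_s):
    """Average dynamic power (watts): P = 1/2 * C * Vdd^2 * fs."""
    switching_freq_hz = transitions / sim_time_s  # fs, in transitions per second
    return 0.5 * capacitance_f * vdd_v ** 2 * switching_freq_hz


# Hypothetical net: 50 fF load, 3.3 V supply, 1000 transitions in 1 microsecond.
energy_j = net_energy(50e-15, 3.3, 1000)
power_w = average_power(50e-15, 3.3, 1000, 1e-6)
print(f"energy = {energy_j:.4e} J, average power = {power_w:.4e} W")
```

    When every net is assumed to have the same capacitance, the energy is simply proportional to
    the transition count, which is why transition counts alone can be compared across designs.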

    2.1 Related work

    Most of the previous work done in power consumption estimation differs from our approach in the
    level of abstraction at which the estimations are made or the method of obtaining the
    estimations. Also, none of this work correlates the power estimates to both functional modules
    and control states.

    At the circuit level, both SPICE [Nag75] and PowerMill [Epi96] can be used to measure power
    consumption by digital systems. Although PowerMill can run over 1000 times faster than SPICE,
    it is still impractical to simulate at the circuit level for large designs or if many input
    vectors are to be simulated. At the next level of abstraction, the switch level, simulators
    such as IRSIM [Sal89] are able to simulate circuits over 500 times faster than SPICE, with a
    root mean square error of less than 15% [Lan94]. Still, faster simulations could be done at the
    gate level.


    Several faster gate level tools have been developed, but none are generally applicable to a
    wide variety of applications. Several require that the input vectors can be characterized
    probabilistically a priori [Naj91] [Gho92] [Cho94] [Mar94]. Other work is designed for and
    applicable only to signal-processing algorithms [Pow90]. Devadas et al. have designed a
    generally applicable gate level algorithm, but it only predicts worst-case power dissipation
    [Dev90].

    Landman and Rabaey have developed architecture level techniques [Lan93] [Lan94] [Lan95], but
    these also require a priori characterization of the input vectors. Although these tools provide
    accurate results for the types of applications they are geared toward, a more generally
    applicable tool is needed.

    By working at the gate level of abstraction our tool is able to simulate larger designs than
    the switch level simulators, and by making estimations based on simulation, our tool can be
    used to estimate power for designs regardless of whether their inputs can be readily
    characterized.

    We chose to use simulation-based power estimation over probabilistic power estimation
    techniques for several reasons. First, not all systems have inputs that are easily or
    accurately characterized by probabilistic methods. Second, this tool fits into existing tool
    flows easily. Even if the system could be accurately characterized for probabilistic
    estimation, designing the statistical models may involve a significant amount of additional
    work for the designer. Simulation-based estimation requires very little extra work for the
    designer. Finally, we hope to use this tool to verify the results of probabilistic estimation
    methods for various types of algorithms.


    3.0 Implementation

    Our goal is to provide an easily integratable tool that accurately estimates power consumption
    at a high level of abstraction. Our power estimates are calculated by simulating the
    hierarchical gate level Verilog description and then post-processing the value change dump
    (VCD) file produced by the simulation. Thus, the designer does not need to alter the existing
    tool flow. This tool can be added on the side for additional help in evaluating power
    trade-offs.

    3.1 Tool flow

    One of the goals in creating this tool is that it must be easily integratable into existing
    tool flows. Since its input is hierarchical gate level Verilog, this tool can easily be
    inserted in existing tool flows. Figure 1 illustrates where the power estimator can be used in
    the tool flow currently used in the Center for Electronic Design Automation at Carnegie Mellon
    University. As shown in the diagram, this tool is used to estimate power after logic and
    datapath synthesis has been performed. Power estimation can be performed again after the
    circuit has been laid out to produce a more accurate estimation using capacitance values
    extracted from the layout information. Note also that the addition of the power estimation tool
    does not alter the original tool flow at all. It merely adds another tool that can be used when
    high level power estimates are desired.

    3.2 Estimating power

    The power estimation tool is actually a series of programs, as shown in Figure 2. The gate
    level Verilog code produced by logic and datapath synthesis is simulated with a standard
    Verilog simulator and a VCD file is created. The VCD file is then passed to headstripper and
    statestripper, which extract the information from the VCD file. Then listdrivers is invoked
    and, through use of the Verilog programming language interface (PLI), a list of the drivers of
    the nets in the design is produced. If the design has been laid out, parsespf is used to
    extract capacitances from the standard parasitics file (SPF). The results of all the programs
    are finally passed to power_parser, which produces power estimates by module and control state.

    [Figure 1 shows the flow: Behavioral Level Verilog -> Behavioral Synthesis (SAW) ->
    Register Transfer Level Verilog -> Logic and Datapath Synthesis (Synopsys Design Compiler,
    CASCADE Epoch) -> Gate Level Verilog -> Place and Route -> Standard Parasitics File (SPF)]

    FIGURE 1. Tool flow at CEDA

    Note that the implementation presented in Figure 2 assumes that all of the functionality of the
    power estimation tool is being used. If, for example, layout has not yet been performed and no
    standard parasitics file (SPF) is available, the parsespf program would never be invoked and no
    stripped SPF file would be passed to power_parser. If power estimates by control state were not
    desired, then statestripper would not be invoked and no state information file would be passed
    to power_parser. Similarly, if the power estimates are not to be correlated with functional
    modules, the listdrivers program would not be used.

    Below we will discuss the functionality and implementation of each of the programs involved in
    the power estimation tool. A user's manual for the tool is located in the Appendix.

    3.2.1 Simulation

    The first step of the power estimation tool is simply a straightforward Verilog simulation that
    creates a value change dump (VCD) file. In general, a Verilog simulation would be done at this
    phase of the design process, in order to verify the gate level design, even if no power
    estimates were to be made. Thus, no overhead is added to the design process by running the
    simulation. The only modification of the Verilog description that needs to be made is the
    addition of the value change dump Verilog commands, $dumpfile and $dumpvars, if these are not
    already included in the code.

    By creating a VCD file, the designer can perform many different analyses on the same gate level
    design while only running the actual Verilog simulation once. Because of their size, we usually
    compress the VCD files and then pass them to the other programs through a pipe from zcat, a
    UNIX program that outputs the contents of a compressed file. As a result, the VCD file cannot
    be rewound during reading. This is largely why the estimation environment has been broken into
    several smaller programs.

    3.2.2 headstripper

    The purpose of headstripper is to create a copy of the portion of the VCD file that defines the
    tokens of all nets, registers and variables. This information is needed for the listdrivers and


    headstripper runs fairly quickly, and uses less than [?]00 KB of memory. headstripper was
    implemented with approximately [?]0 lines of C code.
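    The header-copying step can be sketched in a few lines of Python. This is only an illustration
    of the idea, not the C implementation described above: the function name strip_header and the
    sample VCD text are hypothetical, and the sketch assumes that the declaration section ends at
    the `$end` that closes the `$enddefinitions` keyword, as in standard VCD files. The stream is
    read strictly forward, so it would also work on input piped from zcat.

```python
import io


def strip_header(vcd_stream, out_stream):
    """Copy the declaration section of a VCD stream, reading it only once.

    Lines are copied up to and including the `$end` that closes
    `$enddefinitions`. Returns the number of lines copied.
    """
    copied = 0
    closing = False  # True once $enddefinitions has been seen
    for line in vcd_stream:
        out_stream.write(line)
        copied += 1
        tokens = line.split()
        if "$enddefinitions" in tokens:
            closing = True
            tokens = tokens[tokens.index("$enddefinitions") + 1:]
        if closing and "$end" in tokens:
            break
    return copied


# Tiny hand-written VCD used only for illustration.
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var reg 4 # CSTATE [3:0] $end
$upscope $end
$enddefinitions $end
#0
0!
b0000 #
#5
1!
"""

header = io.StringIO()
lines_copied = strip_header(io.StringIO(SAMPLE_VCD), header)
print(header.getvalue(), end="")
```

    In a command-line setting the same function could be called with sys.stdin and sys.stdout in
    place of the in-memory streams.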

    3.2.3 statestripper

    statestripper is much like headstripper except that it parses the entire VCD file and copies
    only those lines that have to do with the control state variable. This is necessary so that the
    VCD file needs to be parsed only once during the execution of power_parser.

    statestripper requires that the designer knows the name of the net whose value is the control
    state. One limitation of this tool is that the control state must have one net name (such as
    CSTATE[3:0]) and cannot be a concatenation of several nets (such as {a[1], foo, bar, cout[3]}).
    Currently, the designer must manually alter the gate level Verilog code, if necessary. Since we
    are using the Synopsys Design Compiler for our designs, we have always found it easy to make
    this alteration since the names of the control state registers are nearly identical to the
    control state register names at the register transfer level.

    statestripper executes fairly quickly, although it does take longer than headstripper as
    statestripper must parse the entire VCD file. It uses 100 KB of memory during execution, and is
    a simple program with approximately [?]0 lines of C code.
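    A minimal Python sketch of this filtering step is given below. It is an assumption-laden
    illustration rather than the actual C program: the function name strip_state and the sample
    VCD are hypothetical, each declaration is assumed to sit on its own line, the vector range is
    assumed to be a separate word in the `$var` line, and the sketch chooses to keep the timestamp
    line that precedes each copied value change.

```python
import io


def strip_state(vcd_stream, out_stream, state_net):
    """Copy only the value changes of `state_net`, with their timestamps."""
    scope = []           # current module hierarchy while reading the header
    token = None         # VCD identifier code assigned to state_net
    in_header = True
    pending_time = None  # last timestamp not yet written to the output
    for line in vcd_stream:
        words = line.split()
        if not words:
            continue
        if in_header:
            if words[0] == "$scope":
                scope.append(words[2])
            elif words[0] == "$upscope":
                scope.pop()
            elif words[0] == "$var":
                # $var <type> <width> <token> <name> [<range>] $end
                if ".".join(scope + [words[4]]) == state_net:
                    token = words[3]
            elif words[0] == "$enddefinitions":
                if token is None:
                    raise ValueError(f"net {state_net!r} not found in header")
                in_header = False
            continue
        first = words[0]
        if first.startswith("#"):
            pending_time = first
            continue
        is_vector = first[0] in "bBrR" and len(words) == 2 and words[1] == token
        is_scalar = first[0] in "01xXzZ" and first[1:] == token
        if is_vector or is_scalar:
            if pending_time is not None:
                out_stream.write(pending_time + "\n")
                pending_time = None
            out_stream.write(line)


# Tiny hand-written VCD used only for illustration.
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$scope module ctl $end
$var reg 4 # CSTATE [3:0] $end
$upscope $end
$upscope $end
$enddefinitions $end
#0
0!
b0000 #
#5
1!
#10
0!
b0001 #
"""

state_out = io.StringIO()
strip_state(io.StringIO(SAMPLE_VCD), state_out, "top.ctl.CSTATE")
print(state_out.getvalue(), end="")
```

    Because the control state changes far less often than the nets in the datapath, the filtered
    output is much smaller than the VCD file it was taken from.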

    3.2.4 parsespf

    The purpose of parsespf is to extract the capacitances for the nets in the design. The output
    is a list of net names with their associated capacitances.

    This is simply a parser written in C++ that steps through the SPF file. It executes quickly and
    uses less than 200 KB of memory. parsespf is implemented with approximately 80 lines of C code.


    3.2.5 listdrivers

    In hierarchical Verilog descriptions several named nets in the hierarchy often refer to the
    same physical net. Hereafter, I shall term such Verilog nets analogous nets. To accurately
    correlate power estimations with modules, the Verilog name of the driver of the physical net
    must be determined. Then all power consumed on the net is attributed to the module containing
    the driver. listdrivers outputs a list of the Verilog nets connected to the drivers.

    The driving net is determined by both looking at the header of the VCD file and through use of
    the Verilog programming language interface (PLI). The VCD header file is used to rapidly
    determine analogous nets, since analogous nets are assigned to the same token in the VCD file.
    Once the analogous nets are found, the PLI is used to determine which of the nets is connected
    to the driver. Note that although the PLI is used, the simulation is not run a second time;
    listdrivers determines the drivers at the end of compilation and then exits.
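    The first half of this procedure, finding analogous nets from the VCD header, can be sketched
    as follows. The sketch is illustrative only: the function name analogous_nets and the sample
    header are hypothetical, each declaration is assumed to sit on its own line, and the PLI step
    that picks the driving net out of each group is not shown.

```python
from collections import defaultdict


def analogous_nets(header_lines):
    """Group hierarchical net names that share one VCD token."""
    scope = []                  # current module hierarchy
    groups = defaultdict(list)  # token -> hierarchical names using it
    for line in header_lines:
        words = line.split()
        if not words:
            continue
        if words[0] == "$scope":
            scope.append(words[2])
        elif words[0] == "$upscope":
            scope.pop()
        elif words[0] == "$var":
            # $var <type> <width> <token> <name> [<range>] $end
            groups[words[3]].append(".".join(scope + [words[4]]))
    # Only tokens shared by two or more names describe analogous nets.
    return {tok: names for tok, names in groups.items() if len(names) > 1}


# Hand-written header: top.a and top.u1.in are the same physical net.
SAMPLE_HEADER = """\
$scope module top $end
$var wire 1 ! a $end
$var wire 1 % b $end
$scope module u1 $end
$var wire 1 ! in $end
$upscope $end
$upscope $end
$enddefinitions $end
"""

shared = analogous_nets(SAMPLE_HEADER.splitlines())
print(shared)
```

    Each group returned here is a set of candidate names for one physical net; deciding which
    candidate is attached to the driver still requires information from the simulator.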

    listdrivers still executes fairly quickly, but the memory requirement is much larger, as much
    as 60 MB for a 21,000 net design. However, approximately 23 MB of this is the overhead involved
    in running the Verilog simulator. The amount of memory used by the PLI code is O(n), where n is
    the number of distinct nets.

    listdrivers was implemented with approximately 760 lines of C code linked to the Verilog
    simulator through the PLI.

    3.2.6 power_parser

    Finally, the header file, control state value change file and driver list are parsed along with
    the VCD file and power estimations are produced. A general discussion of the algorithm used and
    its complexity follows.


    First, the header file is read and a sparse table data structure is created for the nets so
    that the search time for any net is O(1) when the net's token (as specified in the VCD file) is
    known. Creating and initializing this structure is O(n), where n is the number of distinct
    nets.

    The driver list is then parsed and the driving nets are tagged. The driver names are first
    placed in a binary tree. Building such a tree has time complexity O(n log n). Each net must
    search the tree for a driver, so the time complexity of all of the searches is
    n * O(log n) = O(n log n). Therefore, the total time complexity for parsing and tagging the
    drivers is O(n log n).

    The stripped SPF file is then read in and stored in a binary tree and the nets are assigned
    their corresponding capacitances with an algorithm similar to that used above. By a similar
    argument, the total time complexity for assigning the capacitance values is O(n log n).

    The VCD file is then parsed and transitions are counted for each net and also categorized by
    time. Adding a transition for a net is O(1), as discussed above. Adding transitions for a
    certain Verilog time step is also O(1), so the total time for the parsing of the VCD file and
    counting transitions is O(v), where v is the number of value changes.

    Next, power is calculated for each net. This involves one floating-point multiply for each net,
    so the time complexity of this is simply O(n). Next, the state value changes are parsed and
    transitions are characterized by time. This involves stepping through an array with one entry
    for each simulated time step, so the time complexity for this step is O(t), where t is the
    number of time steps in the simulation.

    Next, statistics are gathered by module. This involves stepping through the table of nets and
    doing a strcmp() function call for each net's driver. Thus, O(n) strcmp()s will be called.


    Finally, statistics are gathered by control state. This involves O(t) comparisons and
    additions. Note that O(t) operations must be performed for the entire system as well as for
    each module for which statistics are being gathered. Since the number of modules for which
    statistics are being collected can be assumed to be a small constant, the total number of
    operations that must be done to collect state statistics is O(t).

    Thus, the total worst-case time complexity of the VCD parser would be O(n log n) for large
    designs with a shorter simulation time or O(v) for longer simulations. The memory usage for
    shorter and medium size simulations is dominated by O(n) because each net is represented by a
    class instantiation. The memory usage for a very long simulation would be O(t), as one double
    and one int are malloced for each Verilog time step.
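    The counting and per-state accumulation steps of this section can be sketched in Python as
    follows. This is a simplified illustration, not the C++ code of power_parser: the function
    name, the (time, token) and (time, state) tuples, and the rule that a state change at time T
    applies to the transitions recorded at T are all assumptions. Dictionaries stand in for the
    sparse table and the per-time-step array, and the sketch sorts the time steps, which the
    array-based approach described above would not need to do.

```python
from collections import Counter


def count_transitions(value_changes, state_changes):
    """Count transitions per net and per control state.

    value_changes: iterable of (time, token) pairs, one per value change.
    state_changes: list of (time, state) pairs sorted by time.
    """
    per_net = Counter()   # token -> number of transitions, O(1) per update
    per_time = Counter()  # time step -> number of transitions, O(1) per update
    for time, token in value_changes:
        per_net[token] += 1
        per_time[time] += 1

    per_state = Counter()
    current_state = None  # state in force before the first state change
    next_change = 0
    for time in sorted(per_time):
        # Advance to the last state change at or before this time step.
        while (next_change < len(state_changes)
               and state_changes[next_change][0] <= time):
            current_state = state_changes[next_change][1]
            next_change += 1
        per_state[current_state] += per_time[time]
    return per_net, per_state


# Hypothetical events: tokens "!" and "#" toggling over four time steps.
events = [(0, "!"), (0, "#"), (5, "!"), (10, "!"), (10, "#"), (15, "!")]
states = [(0, 0), (10, 1)]
per_net, per_state = count_transitions(events, states)
print(dict(per_net), dict(per_state))
```

    Multiplying each entry of per_net by that net's capacitance term would turn the transition
    counts into energy figures; restricting the first loop to the nets of one module would give
    the per-module statistics.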

    power_parser is the portion of the power estimation tool that consumes the most CPU time, as
    will be shown in Section 4.0. It also is the largest piece of code, implemented in over 2200
    lines of C++.

    3.3 Estimating capacitance

    Accurate capacitance estimations are essential for accurate power estimations. In the set of
    tools described above, capacitance values are extracted from the design once a layout has been
    completed. Although this does give very accurate estimates of capacitance, it is not the only
    way such estimates could be produced. Since power_parser simply reads in a file of net names
    and their associated capacitances, the tool is highly flexible, allowing designers to use
    either high level estimation techniques such as those presented in [Don79][Feu82][Lan94] or to
    back-annotate capacitances by extracting them from the layout once it has been done. If high
    level estimation techniques were used, the only change to the tools above would be the
    modification or omission of parsespf.

    1. v will always be greater than or equal to t, since a time step is executed in a simulation
    if and only if one or more value changes occur during that time step. Therefore, O(v)
    dominates O(t).


    4.0 Case study: DCT

    We have run twelve different versions of the one-dimensional discrete cosine transform. These
    designs were created by Coumeri and the behavioral level differences in the designs can be seen
    in Table 1 [Cou96]. Note that "# of partitions" refers to the number of pipeline stages (each
    with its own control logic) in the design. DCT1 through DCT8 are not pipelined; they have only
    one partition. "# of mult" is the number of multipliers in the design. "memory prefetch" is yes
    if values for iteration i+1 of the loop are being fetched while iteration i is being executed.
    This column does not apply to the pipelined designs (DCT9 through DCT12) since they are already
    executing multiple iterations of the loop at the same time. "grey code state encoding" and
    "grey code memory access" are yes if the states and memory addresses, respectively, are
    accessed in grey code order. Finally, "# of nets" is the number of physical nets in each
    design.

    example  # of        # of  memory    grey code       grey code      # of
             partitions  mult  prefetch  state encoding  memory access  nets
    DCT1     1           3     no        no              no             15810
    DCT2     1           3     no        yes             no             15879
    DCT3     1           2     no        no              no             12196
    DCT4     1           2     no        yes             no             12226
    DCT5     1           3     yes       no              no             16848
    DCT6     1           3     yes       yes             yes            16833
    DCT7     1           2     yes       no              no             13306
    DCT8     1           2     yes       yes             yes            13280
    DCT9     6           3     n/a       no              no             21327
    DCT10    6           3     n/a       yes             yes            21346
    DCT11    2           3     n/a       no              no             16725
    DCT12    2           3     n/a       yes             no             16745

    TABLE 1. DCT Descriptions

    Table 2 shows the CPU time and memory usage for headstripper, statestripper, listdrivers and
    power_parser being executed on three of the DCT designs. These three designs were chosen
    because DCT3 is the smallest example (i.e. fewest number of nets), DCT9 is one of the largest
    examples, and DCT1 is somewhere in between. The CPU time is reported in minutes and seconds.
    Note that these results are dependent on which statistics are being collected.

             headstripper      statestripper     listdrivers        power_parser
             CPU time  Memory  CPU time  Memory  CPU time  Memory   CPU time  Memory
    DCT1     0:21      92 KB   6:56      100 KB  1:32      52196 KB 24:44     40940 KB
    DCT3     0:17      92 KB   7:42      100 KB  1:19      50368 KB 20:05     39408 KB
    DCT9     0:26      92 KB   5:33      100 KB  2:20      60000 KB 32:59     45212 KB

    TABLE 2. Execution times and memory usage for the power estimation for three of the DCT designs

    As stated in Section 3.2.6, collecting statistics by module and state both involve some
    overhead in computation time. For each of the examples, the same statistics were being
    collected: total energy, energy consumed by each multiplier, energy consumed by all of the
    adders and subtracters, energy consumed by registers, energy consumed by random glue logic
    (everything except the above modules) and total energy by control state. The statistics were
    gathered on an IBM RS/6000 workstation with 384 MB of memory.

    For our DCT examples, power_parser never took more than 35 minutes of CPU time on the RS/6000,
    and most of the time was spent during the actual parsing of the VCD file. Also, the power
    estimation environment never required more than 60 MB of memory.

    The disk space overhead for the output files is presented in Table 3. For the 12 versions of
    the DCT we tested, ranging from 12,000 to 21,000 nets, headstripper and statestripper involved
    an overhead of 1.0 to 1.7 megabytes of hard disk storage for the header information and 56 to
    140 kilobytes for the state information. Note that the input for the DCTs was 25 8x8 blocks of
    image data, and that the disk space overhead for state information scales linearly with the
    number of blocks in the simulation. The header information is unaffected by the length of the
    simulation. The hard disk space overhead involved in storing the list of drivers is 297 to 490
    kilobytes for the DCT examples. This is independent of the length of the simulations.

             headstripper  statestripper  listdrivers  power_parser
    DCT1     1216 KB       116 KB         395 KB       < 1 KB
    DCT2     1223 KB       116 KB         396 KB       < 1 KB
    DCT3     1000 KB       137 KB         290 KB       < 1 KB
    DCT4     1004 KB       137 KB         291 KB       < 1 KB
    DCT5     1335 KB       81 KB          412 KB       < 1 KB
    DCT6     1335 KB       81 KB          412 KB       < 1 KB
    DCT7     1129 KB       101 KB         309 KB       < 1 KB
    DCT8     1126 KB       101 KB         308 KB       < 1 KB
    DCT9     1723 KB       55 KB          479 KB       < 1 KB
    DCT10    1726 KB       60 KB          479 KB       < 1 KB
    DCT11    1318 KB       73 KB          411 KB       < 1 KB
    DCT12    1320 KB       73 KB          411 KB       < 1 KB

    TABLE 3. Disk space usage for power estimation of the 12 DCT designs

    4.1 Results

    Figure 3 gives a comparison of the results for the 12 designs. Since not all of the 12 designs
    were laid out, the results shown assume that all nets have the same capacitance, i.e. only
    transition counts are reported. One interesting thing we immediately noticed from these results
    is that using grey code for the state encoding produced a 10% or better reduction in transition
    count for all of the designs. Also, notice that pipelining the design reduces power
    consumption. However, for these designs it is arguable whether pipelining into six stages, as
    in DCT9 and DCT10, instead of two stages, as in DCT11 and DCT12, is worthwhile, as the power
    savings is about 10% while the complexity (as measured by number of nets) has increased over
    30%. Still, if power savings is more important than minimizing complexity of the circuit, the
    six stage pipeline designs would be desired.

    [Figure 3: bar chart titled "Transitions by Module for Twelve DCT Designs"; x-axis: Module;
    one series per design, DCT1 through DCT12]

    FIGURE 3. Comparison of 12 DCT designs

    Figure 4 shows the transition count by control state for the DCT1 and DCT2 designs. Again, the
    value of using grey code for state encoding can be seen here. By comparing the states in DCT1
    and DCT2 that perform the same functions, we see that the approximate 10% reduction is apparent
    in all similar states. (Remember that because the states of DCT2 are grey coded, DCT1 state 2
    is the same as DCT2 state 3, DCT1 state 3 is the same as DCT2 state 2, DCT1 state 4 is the same
    as DCT2 state 6, etc.)

    [Figure 4: chart titled "Transitions by Control State for Two DCT Designs"; x-axis: Control
    State, 1 through 7]

    FIGURE 4. Transition count by control state for two DCT designs
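    The correspondence between the state numbers of the two designs is what the standard reflected
    binary Gray code would give, as the short Python sketch below shows. That DCT2 uses exactly
    this encoding is an assumption; it is merely consistent with the three pairs quoted in the
    text. The helper names are hypothetical, and the bit-flip totals cover only the state register
    when the states are visited in numeric order, not the transition counts of the whole circuit.

```python
def to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_flips(sequence):
    """Total number of bits that change between consecutive codes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sequence, sequence[1:]))


binary_states = list(range(8))
gray_states = [to_gray(s) for s in binary_states]
for s in (2, 3, 4):
    print(f"binary state {s} -> Gray-coded state {to_gray(s)}")
print("bit flips, binary order:", bit_flips(binary_states))
print("bit flips, Gray order:  ", bit_flips(gray_states))
```

    With Gray coding exactly one state bit changes per step, which is one plausible contributor to
    the lower transition counts observed for the grey coded designs.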


    Finally, the DCT design was laid out and capacitance values were extracted to give us the
    results in Figure 5. Notice that although the transition counts appear to be a good predictor
    of energy for the multipliers, transition counting alone underestimates the amount of power
    consumed by random logic and overestimates the power consumed by the adders and subtracters.
    This makes sense as one can imagine that the random logic would often be driving fairly long
    nets with a large capacitance, while the nets inside the adders and subtracters would be very
    short (i.e. low capacitance) most of the time. The nets inside the array multipliers would not
    be as short as those inside the adders nor as long as those in the random logic.
    Correspondingly, the transition counts for the multipliers are a better predictor of energy
    consumed than for either the random logic or the adders and subtracters.

    The transition counts in Figure 3 and Figure 5 differ because different target cell libraries
    were used. Also, because the library used for Figure 5 had more accurate timing models, more
    glitching occurred within the circuit. As a result, only 16 8x8 blocks were able to be
    simulated because of the larger size of the VCD file. (The 16 block example produced a VCD file
    four times larger than the 25 block examples that assume equal delay for all cells.) Thus, the
    results in Figure 3 and Figure 5 should not be compared against each other.

    Our gate level power estimation tool has produced estimations for 12 different DCT
    implementations. These estimations have allowed us to determine the relative impact of several
    high level transformations, such as grey coding state assignments, resource sharing and
    pipelining, on dynamic power dissipation for these designs.


    Appendix A - User's manual

    In the following sections, I shall outline how to use each subprogram of the power estimation
    tool. Please refer to Figure 2 to see how these tools are tied together.

    A.1 headstripper

    headstripper reads the VCD file from stdin and outputs the header file to stdout. To invoke it
    from the command line, you would enter:

        headstripper < vcdfile > vcd.head

    or, if the VCD file is compressed:

        zcat vcdfile.gz | headstripper > vcd.head

    A.2 statestripper

    statestripper also reads the VCD file from stdin and outputs to stdout. It also takes as an
    argument the hierarchical Verilog name of the net with the value of the control state. As
    mentioned in Section 3.2.3, the net must be a single multi-bit net and not a concatenation of
    nets. To invoke the program from the command line, you would enter:

        statestripper top.foo.bar.CSTATE < vcdfile > vcd.state

    or, if the VCD file is compressed:

        zcat vcdfile.gz | statestripper top.foo.bar.CSTATE > vcd.state

    A.3 parsespf

    parsespf also reads from stdin and outputs to stdout. It takes as an argument the hierarchical
    prefix to be added to the net names present in the SPF file. For example, in our DCT examples,
    the DCT was laid out, but in the Verilog source that passes input vectors to the DCT the DCT
    module was named top.dct_l. To invoke parsespf in this case we used:

        parsespf top.dct_l. < spffile > strippedspf


    Note that the trailing period on top.dct_l. is necessary. If multiple modules were laid out,
    then parsespf would be executed for each, and all of the stripped SPF files could then be
    concatenated together as follows:

    cat first_strippedspf second_strippedspf > final_strippedspf

    A.4 listdrivers

    listdrivers is a PLI routine and should be invoked in the same way as the original simulation.
    One additional argument must be added: +dump_ followed by the name of the VCD header
    file. For one of our DCT simulations, we invoked listdrivers as follows:

    listdrivers +dump_dct.head sim.v -v dctlf.map2.v \
    -v ../ms0803vcells_mosis -v ../ms080_3vprims \
    -v ../mod.v +turbo+3

    The output of listdrivers is a list of the driver nets in the file DRV.list.

    A.5 power_parser

    power_parser also accepts input from stdin and outputs to stdout. There are a number of arguments

    that can be passed to power_parser, outlined below.

    -fdrv driverfile            name of the driver file created by listdrivers

    -fhead vcd.head             name of the VCD header file created by headstripper

    -fstate vcd.state           name of the VCD state file created by statestripper

    -fsspf strippedspf          name of the stripped SPF file created by parsespf

    -state top.foo.bar.CSTATE   name of the control state net. The net name should be
                                the same as the one used in statestripper.


    -net top.foo.bar.\*         names of nets to determine energy consumption for. In this case,
                                statistics would be gathered for the module top.foo.bar. Note that
                                both leading and trailing *s are legal, but internal *s are not.
                                Note also that the * must be escaped with the \ so that the shell
                                does not try to expand it. Any number of -net arguments is legal.
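The wildcard rule for -net can be illustrated with a short Python sketch. This is not the power_parser source: the function names are hypothetical, and the sketch assumes that a leading or trailing * matches any run of characters, which the text implies but does not state. The patterns are shown as the program would receive them, after the shell has removed the escaping backslash.

```python
def split_net_pattern(pattern):
    """Split a -net pattern into (leading_star, core, trailing_star)."""
    leading = pattern.startswith("*")
    trailing = len(pattern) > 1 and pattern.endswith("*")
    start = 1 if leading else 0
    end = len(pattern) - 1 if trailing else len(pattern)
    core = pattern[start:end]
    if "*" in core:
        # Only leading and trailing wildcards are legal.
        raise ValueError("internal * is not allowed: " + pattern)
    return leading, core, trailing


def net_matches(pattern, net):
    """Return True if the hierarchical net name matches the pattern."""
    leading, core, trailing = split_net_pattern(pattern)
    if leading and trailing:
        return core in net            # *core* : core appears anywhere
    if leading:
        return net.endswith(core)     # *core  : name ends with core
    if trailing:
        return net.startswith(core)   # core*  : name starts with core
    return net == core                # no wildcard: exact match


# Hypothetical net names used only to exercise the sketch.
in_module = net_matches("top.foo.bar.*", "top.foo.bar.sum[3]")
is_register = net_matches("*_reg*", "top.dct_l.acc_reg[0]")
other_module = net_matches("top.foo.bar.*", "top.foo.baz.x")
```

Because the wildcard can appear only at the ends, each pattern reduces to a prefix, suffix, substring, or exact comparison, so no general pattern-matching engine is needed.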

    As an example of the use of power_parser, the following command was used to get the statistics
    displayed in Figure 5, Power estimates including capacitance information for one DCT
    design (DCT ), on page 20:

    zcat ../tmp/dctll.gz | ../parser -fdrv dctll.drv \

    -fhead dctll.head -fstate dctll.state -state top.dct_l.CSTATE \

    -net top.dct_l.mult\* -net top.dct_l.mult_l\* \

    -net top.dct_l.mult_2\* -net top.dct_l.mult_3\* \

    -net top.dct_l.U\* -net top.dct_l.r\* -net \*_reg\* \

    -net top.\* -fsspf dctll.spf > dctll.results


    References

    [Cho94] T. Chou, K. Roy and S. Prasad, Estimation of circuit activity considering signal correlations and simultaneous switching, Proceedings of ICCAD '94, pp. 300-303, Nov. 1994.

    [Cor90] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to algorithms, New York: McGraw-Hill Book Company, pp. 244-259, 1990.

    [Cou96] S. L. Coumeri, private communication, April 1996.

    [Epi96] Epic Design Technologies, Inc., http://www.epic.com/powermill.html, 1996.

    [Dev90] S. Devadas, K. Keutzer and J. White, Estimation of power dissipation in CMOS combinational circuits, Proceedings of Custom IC Conference '90, pp. 19.7.1-19.7.6.

    [Don79] W. Donath, Placement and average interconnection lengths of computer logic, IEEE Transactions on Circuits and Systems, pp. 272-277, April 1979.

    [Feu82] M. Feuer, Connectivity of random logic, IEEE Transactions on Computers, pp. 29-33, Jan. 1982.

    [Gho92] A. Ghosh, S. Devadas, K. Keutzer, and J. White, Estimation of average switching activity in combinational and sequential circuits, Proceedings of DAC '92, pp. 253-259, 1992.

    [Kri96] R. K. Krishnamurthy, I. Lys and L. R. Carley, Static power driven voltage scaling and delay driven buffer sizing in mixed swing quadrail for sub-1V I/O swings, submitted to IEEE/ACM International Symposium on Low Power Electronics and Design '96, August 1996.

    [Lan93] P. E. Landman and J. M. Rabaey, Power estimation for high level synthesis, Proceedings of EuroDAC '93, pp. 361-366, Feb. 1993.

    [Lan94] P. E. Landman, Low-power architectural design methodologies, Electronics Research Laboratory, College of Engineering, University of California, Berkeley (UCB/ERL M94/62), 1994.

    [Lan95] P. E. Landman and J. M. Rabaey, Architectural power analysis: the dual bit type method, IEEE Transactions on VLSI Systems, pp. 173-187, June 1995.

    [Mar94] R. Marculescu, D. Marculescu and M. Pedram, Switching activity analysis considering spatiotemporal correlations, Proceedings of ICCAD '94, pp. 294-299, Nov. 1994.

    [Nag75] L. W. Nagel, SPICE2: a computer program to simulate semiconductor circuits, Technical report, University of California, Berkeley (ERL-M520), 1975.

    [Naj91] F. Najm, Transition density, a stochastic measure of activity in digital circuits, Proceedings of DAC '91, pp. 644-649, June 1991.
