architect cheatsheet
© 2003 Karthik Ethirajan, all rights reserved
Designer’s Cheat Sheet (tsmc13lp)
Logic:

                      Best Area                                             Best Timing
Component             Area    Delay  Dynamic   Leakage  Instance            Area    Delay  Dynamic   Leakage  Instance
                      (um^2)  (ns)   (uA/MHz)  (nA)                         (um^2)  (ns)   (uA/MHz)  (nA)
Register    1-bit     61      0.76   0.01      0.06     sdf1fc + invfc      77      0.37   0.02      0.10     sdf1fe
            en 1-bit  77      0.79   0.01      0.07     sdfe1i1fc + invfc   94      0.38   0.02      0.14     sdfe1i1fe
            cgc       61      0.09   0.02      0.06     cgcl1te             -       -      -         -        -
Mux         2-to-1    18      0.41   0.02      0.02     mxd2i1td            -       -      -         -        -
            4-to-1    49      0.79   0.05      0.03     mxd4i1te            -       -      -         -        -
            8-to-1    94      1.12   0.08      0.04     mxd8i1te            -       -      -         -        -
Adder       8-bit     395     3.52   1.14      0.77     fa_Array8_d         1,373   0.59   1.54      1.52     bk
            16-bit    785     6.91   2.39      1.24     fa_Array16_d        3,122   0.80   3.62      3.02     bk
            32-bit    1,566   13.69  4.93      2.16     fa_Array32_d        6,968   1.03   8.13      6.27     bk
Multiplier  8-bit     3,508   7.85   5.61      4.03     csa                 4,745   5.80   8.67      5.57     csa
            16-bit    14,630  16.61  39.43     16.62    csa                 17,483  11.64  50.33     19.78    csa
            32-bit    59,608  32.08  258.00    68.49    csa                 65,386  23.32  316.09    74.63    csa

Memory:

                                               Dynamic (uA/MHz)
Memory  Size      Target       Area     Delay  Read   Write  Idle   Leakage  Instance
                               (um^2)   (ns)                        (nA)
RAM     128x10    Best Area    14,471   4.41   7.18   8.29   0.97   20.32    qcsram1fkg00_128x10_4_fs
                  Best Timing  21,659   2.74   9.43   10.78  2.04   188      qcsram1fkg00_128x10_8_ff
        1024x10   Best Area    52,587   6.05   20.70  22.23  2.32   149.25   qcsram1fkg00_1024x10_8_fs
                  Best Timing  80,343   3.12   29.47  31.47  4.73   552      qcsram2fkg00_1024x10_32_ff
        2048x16   Best Area    134,251  7.81   41.33  44.68  3.77   469.23   qcsram1fki20_2048x16_8_fs
                  Best Timing  176,127  3.54   54.18  57.26  7.32   1200     qcsram2fki20_2048x16_32_ff
ROM     4096x16   Best Area    68,495   5.65   43.11  -      23.63  39.78    qcrom1fti_i10_4096x16_32_qf
                  Best Timing  -        -      -      -      -      -        -
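The Best Area and Best Timing columns can be read as a trade-off: how much area is paid for how much speed. A minimal Python sketch using the tsmc13lp adder rows from the table above; the dictionary layout and the function name are illustrative only and not part of the original sheet.

```python
# tsmc13lp adder rows copied from the cheat sheet: (area, delay in ns).
# "array" is the Best Area instance (fa_ArrayN_d), "bk" the Best Timing one.
ADDERS = {
    8:  {"array": (395, 3.52),   "bk": (1373, 0.59)},
    16: {"array": (785, 6.91),   "bk": (3122, 0.80)},
    32: {"array": (1566, 13.69), "bk": (6968, 1.03)},
}

def adder_tradeoff(width):
    """Return (area ratio, speedup) of the Best Timing adder over Best Area."""
    a_area, a_delay = ADDERS[width]["array"]
    b_area, b_delay = ADDERS[width]["bk"]
    return b_area / a_area, a_delay / b_delay

for width in sorted(ADDERS):
    area_ratio, speedup = adder_tradeoff(width)
    print(f"{width:2d}-bit: {area_ratio:.1f}x area for {speedup:.1f}x speed")
```

For the 32-bit adder this works out to roughly 4.4x the area for roughly 13x the speed.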
Conditions:
Logic:
- used q3gl, 2002.05 library
- hvt devices at 1.3 V
- output load of 0 fF
- 50% of all inputs toggle for power numbers
- used Power Compiler to get power numbers
- enabled DW Foundation
- no wire-load models used for interconnect
- used a toggle/clock rate of 50 MHz
- process corner is “nominal” for power and “worst” for timing
Memory:
- memory numbers are from Saber database
- used sram_analysis script & rom_all.lef for area numbers
- used nominal 1.3 V for power numbers
- used worst-case 1.17 V & a slew of 0.2 (10% to 50%) for timing numbers
- output load is 49 fF, 88 fF, 139 fF for the RAMs, in increasing order of size
- output load is 164 fF for the ROM
- idle power is measured with 50% of all inputs toggling while clk = cs_n = ‘1’
- fs RAMs use hvt devices only; fm/ff/qf memories mix hvt & lvt devices
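The sheet lists dynamic power as a current per MHz and leakage as a current, so turning either into watts needs a supply voltage and, for dynamic power, a clock rate. A minimal Python sketch, assuming the nominal 1.3 V supply and the 50 MHz rate from the conditions above; the function names are illustrative and not from the original sheet.

```python
VDD_V = 1.3       # nominal supply from the conditions list
FREQ_MHZ = 50.0   # toggle/clock rate from the conditions list

def dynamic_power_uw(ua_per_mhz, freq_mhz=FREQ_MHZ, vdd_v=VDD_V):
    """Dynamic power in microwatts: (uA/MHz) * MHz * V = uW."""
    return ua_per_mhz * freq_mhz * vdd_v

def leakage_power_nw(leak_na, vdd_v=VDD_V):
    """Leakage power in nanowatts: nA * V = nW."""
    return leak_na * vdd_v

# Example: tsmc13lp 32-bit Best Timing adder (8.13 uA/MHz, 6.27 nA).
print(f"dynamic: {dynamic_power_uw(8.13):.2f} uW")
print(f"leakage: {leakage_power_nw(6.27):.3f} nW")
```

Pass a different `freq_mhz` to scale the dynamic figure to another clock rate; the leakage term does not depend on frequency.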
Designer’s Cheat Sheet (ibm8sf)
Logic:

                      Best Area                                             Best Timing
Component             Area    Delay  Dynamic   Leakage  Instance            Area    Delay  Dynamic   Leakage  Instance
                      (um^2)  (ns)   (uA/MHz)  (nA)                         (um^2)  (ns)   (uA/MHz)  (nA)
Register    1-bit     63      0.52   0.01      0.09     sdf1fc + invfc      74      0.32   0.01      0.15     sdf1fe
            en 1-bit  76      0.56   0.01      0.12     sdfe1i1fc + invfc   99      0.33   0.02      0.17     sdfe1i1fe
            cgc       51      0.10   0.01      0.13     cgcl1te             -       -      -         -        -
Mux         2-to-1    28      0.19   0.03      0.08     mxd2i1td            -       -      -         -        -
            4-to-1    62      0.32   0.06      0.13     mxd4i1td            -       -      -         -        -
            8-to-1    84      0.49   0.06      0.10     mxd8i1tc            -       -      -         -        -
Adder       8-bit     532     2.05   1.52      1.68     fa_Array8_d         1,482   0.41   1.28      3.34     bk
            16-bit    1,056   4.25   3.13      3.35     fa_Array16_d        3,525   0.56   3.33      8.17     bk
            32-bit    2,105   8.63   6.39      6.71     fa_Array32_d        6,767   0.70   6.16      14.63    bk
Multiplier  8-bit     4,400   4.43   7.15      12.45    csa                 6,072   3.38   10.61     17.12    csa
            16-bit    19,613  9.16   50.67     56.81    csa                 24,900  6.43   76.56     73.36    csa
            32-bit    82,695  18.09  332.18    241.86   csa                 91,622  12.65  451.77    271.49   csa

Memory:

                                               Dynamic (uA/MHz)
Memory  Size      Target       Area     Delay  Read   Write  Idle   Leakage  Instance
                               (um^2)   (ns)                        (nA)
RAM     128x10    Best Area    14,603   3.37   6.00   7.20   0.85   35.76    qcsram1fkg00_128x10_4_fs
                  Best Timing  -        -      -      -      -      -        qcsram1fkg00_128x10_8_ff
        1024x10   Best Area    52,742   4.52   17.27  18.99  2.19   221.38   qcsram1fkg00_1024x10_8_fs
                  Best Timing  -        -      -      -      -      -        qcsram2fkg00_1024x10_32_ff
        2048x16   Best Area    141,129  4.51   37.42  40.24  4.81   685.05   qcsram1fti20_2048x16_16_fm
                  Best Timing  -        -      -      -      -      -        qcsram2fki20_2048x16_32_ff
ROM     4096x16   Best Area    68,947   4.46   33.85  -      3.66   125.75   qcrom1fti_i10_4096x16_32_qf
                  Best Timing  -        -      -      -      -      -        -
Conditions:
Logic:
- used q3gl, 2002.12 library
- hvt devices at 1.3 V
- output load of 0 fF
- 50% of all inputs toggle for power numbers
- used Power Compiler to get power numbers
- enabled DW Foundation
- no wire-load models used for interconnect
- used a toggle/clock rate of 50 MHz
- process corner is “nominal” for power and “worst” for timing
Memory:
- memory numbers are from Cougar3 database
- used cougar3_ram/rom.lef for area numbers
- used nominal 1.3 V for power numbers
- used worst-case 1.17 V & a slew of 0.15 (35% to 65%) for timing numbers
- output load is 38 fF, 66 fF, 181 fF for the RAMs, in increasing order of size
- output load is 180 fF for the ROM
- idle power is measured with 50% of all inputs toggling while clk = cs_n = ‘1’
- fs RAMs use hvt devices only; fm/ff/qf memories mix hvt & lvt devices
An Example: Shift Register Power vs. Area Trade-off
Design A is the traditional shift register. Design B is an alternative that spends more area to reduce dynamic power.
[Figure: Design A. A chain of d/q registers labeled [1023], [1022], ..., [0], with signals din, clk and dout.]
[Figure: Design B. Enable (en) d/q registers labeled [1023], [1022], ..., [0], with signals din, clk and dout, a COUNTER and a DECODER, and 10-bit buses.]
[Timing diagram: clk; din = d0, d1, ..., d1023, d1024, d1025; mux_sel = 0, 1, ..., 1023, 0, 1; enables en0, en1, en1023; q0 holds d0 then d1024; q1 holds d1 then d1025; q1023 holds d1023; dout.]
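The two designs can be compared with a small behavioral model. This is a minimal Python sketch, not the original RTL. It assumes that in Design B the register addressed by the counter is read out through the mux in the same cycle it is overwritten with din, and that all registers start at 0. The function names are hypothetical.

```python
from collections import deque

DEPTH = 1024  # number of 1-bit registers in both designs

def design_a(stream, depth=DEPTH):
    """Traditional shift register: every stage shifts on every clock."""
    regs = deque([0] * depth, maxlen=depth)
    out = []
    for bit in stream:
        out.append(regs[-1])   # dout is the oldest stage
        regs.appendleft(bit)   # all stages move; maxlen drops the oldest
    return out

def design_b(stream, depth=DEPTH):
    """Counter-addressed enable registers: one register is written per clock."""
    regs = [0] * depth
    counter = 0
    out = []
    for bit in stream:
        out.append(regs[counter])        # mux selects the addressed register
        regs[counter] = bit              # decoder enables only that register
        counter = (counter + 1) % depth  # counter wraps after the last register
    return out

bits = [(i * 7 + 3) % 5 % 2 for i in range(3 * DEPTH)]
print(design_a(bits) == design_b(bits))
```

Under these assumptions both models delay din by the same number of clocks, but Design A updates every register each cycle while Design B updates one.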
Design A = 1024 x 1-bit registers
Design B = 1024 x 1-bit enable registers
  + 10-bit counter: 10 x 1-bit registers + 10-bit adder (interpolated from the 8-bit adder)
  + 1024-to-1 mux: (128 + 16 + 2) x 8-to-1 mux + 1 x 2-to-1 mux
  + 10-to-1024 decoder: approximated as one 1024-to-1 mux
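The (128 + 16 + 2) x 8-to-1 plus 1 x 2-to-1 breakdown of the 1024-to-1 mux comes from repeatedly dividing the input count by 8. A minimal Python sketch of that count, assuming a power-of-two number of inputs; the function name is illustrative, and the area figures are the tsmc13lp 8-to-1 and 2-to-1 mux entries.

```python
def mux_tree(inputs, radix=8):
    """Return (muxes per stage, signals left over) for a tree of radix-to-1 muxes."""
    stages = []
    while inputs >= radix:
        inputs = -(-inputs // radix)  # ceiling division: muxes in this stage
        stages.append(inputs)
    return stages, inputs

stages, leftover = mux_tree(1024)
n_mux8 = sum(stages)
n_mux2 = 1 if leftover == 2 else 0   # two remaining signals need one 2-to-1 mux
area = n_mux8 * 94 + n_mux2 * 18     # tsmc13lp areas: 8-to-1 = 94, 2-to-1 = 18
print(stages, leftover, n_mux8, area)
```

For 1024 inputs the stages come out as 128, 16 and 2, for 146 8-to-1 muxes in total, with two signals left for the final 2-to-1 mux.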
For the tsmc13lp process:

                          Design A   Design B   % change, A to B
Area (um^2)               62,464     107,578    +72%
Dynamic power (uA/MHz)    10         8          -20%
Leakage power (nA)        61         85         +39%
Assumptions for Design B:
1. When the output of a register is not toggling, the register consumes 50% less dynamic power
2. Area of the 10-to-1024 decoder is the same as that of the 1024-to-1 mux
3. The mux consumes negligible dynamic power, since only one input toggles every cycle
4. Dynamic power of the decoder equals 1/8th that of the 146 x 8-to-1 muxes from the cheat sheet
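The comparison can be approximately reproduced from the tsmc13lp best-area numbers and the four assumptions above. The following Python sketch is one possible reading, not the author's actual spreadsheet. It assumes the 10-bit adder is the 8-bit adder scaled by 10/8, that the 50% factor applies to all 1024 enable registers, and that the decoder's leakage equals the mux's leakage. The source states none of these exactly, so the area lands close to, but not exactly on, the quoted 107,578.

```python
# tsmc13lp best-area entries: (area, dynamic uA/MHz, leakage nA)
REG    = (61, 0.01, 0.06)    # 1-bit register
EN_REG = (77, 0.01, 0.07)    # 1-bit enable register
MUX2   = (18, 0.02, 0.02)    # 2-to-1 mux
MUX8   = (94, 0.08, 0.04)    # 8-to-1 mux
ADD8   = (395, 1.14, 0.77)   # 8-bit adder

N = 1024                 # shift register depth
N_MUX8 = 128 + 16 + 2    # 8-to-1 muxes in the 1024-to-1 mux
ADDER_SCALE = 10 / 8     # assumption: 10-bit adder = 8-bit adder * 10/8

def design_a():
    """(area, dynamic, leakage) for 1024 plain registers."""
    return tuple(N * x for x in REG)

def design_b():
    """(area, dynamic, leakage) for the counter/decoder/mux design."""
    counter = [10 * r + ADDER_SCALE * a for r, a in zip(REG, ADD8)]
    mux = [N_MUX8 * m8 + m2 for m8, m2 in zip(MUX8, MUX2)]
    decoder = list(mux)                       # assumption 2 (area); leakage assumed equal too
    area = N * EN_REG[0] + counter[0] + mux[0] + decoder[0]
    dyn = (0.5 * N * EN_REG[1]                # assumption 1: non-toggling registers
           + counter[1]
           + 0.0                              # assumption 3: mux power neglected
           + N_MUX8 * MUX8[1] / 8)            # assumption 4: decoder power
    leak = N * EN_REG[2] + counter[2] + mux[2] + decoder[2]
    return area, dyn, leak

def pct_change(a, b):
    """Percent change from a to b."""
    return 100.0 * (b - a) / a

for name, a, b in zip(("area", "dynamic", "leakage"), design_a(), design_b()):
    a, b = round(a), round(b)
    print(f"{name:8s} A={a:>7} B={b:>7} change={pct_change(a, b):+.0f}%")
```

Under these assumptions the rounded dynamic and leakage figures match the comparison table, and the area agrees to well within 1%.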