architect cheatsheet


Upload: karthik-ethirajan

Post on 27-Jan-2017


© 2003 Karthik Ethirajan, all rights reserved

Designer’s Cheat Sheet (tsmc13lp)

Logic components, Best Area

Component            Area (um)  Delay (nS)  Dynamic (uA/MHz)  Leakage (nA)  Instance
Register, 1-bit             61        0.76              0.01          0.06  sdf1fc + invfc
Register, en 1-bit          77        0.79              0.01          0.07  sdfe1i1fc + invfc
Register, cgc               61        0.09              0.02          0.06  cgcl1te
Mux, 2-to-1                 18        0.41              0.02          0.02  mxd2i1td
Mux, 4-to-1                 49        0.79              0.05          0.03  mxd4i1te
Mux, 8-to-1                 94        1.12              0.08          0.04  mxd8i1te
Adder, 8-bit               395        3.52              1.14          0.77  fa_Array8_d
Adder, 16-bit              785        6.91              2.39          1.24  fa_Array16_d
Adder, 32-bit            1,566       13.69              4.93          2.16  fa_Array32_d
Multiplier, 8-bit        3,508        7.85              5.61          4.03  csa
Multiplier, 16-bit      14,630       16.61             39.43         16.62  csa
Multiplier, 32-bit      59,608       32.08            258.00         68.49  csa

Logic components, Best Timing

Component            Area (um)  Delay (nS)  Dynamic (uA/MHz)  Leakage (nA)  Instance
Register, 1-bit             77        0.37              0.02          0.10  sdf1fe
Register, en 1-bit          94        0.38              0.02          0.14  sdfe1i1fe
Register, cgc                -           -                 -             -  -
Mux, 2-to-1                  -           -                 -             -  -
Mux, 4-to-1                  -           -                 -             -  -
Mux, 8-to-1                  -           -                 -             -  -
Adder, 8-bit             1,373        0.59              1.54          1.52  bk
Adder, 16-bit            3,122        0.80              3.62          3.02  bk
Adder, 32-bit            6,968        1.03              8.13          6.27  bk
Multiplier, 8-bit        4,745        5.80              8.67          5.57  csa
Multiplier, 16-bit      17,483       11.64             50.33         19.78  csa
Multiplier, 32-bit      65,386       23.32            316.09         74.63  csa

Memories, Best Area (Dyn. columns are Dynamic in uA/MHz for Read, Write and Idle)

Component    Area (um)  Delay (nS)  Dyn. Read  Dyn. Write  Dyn. Idle  Leakage (nA)  Instance
RAM 128x10      14,471        4.41       7.18        8.29       0.97         20.32  qcsram1fkg00_128x10_4_fs
RAM 1024x10     52,587        6.05      20.70       22.23       2.32        149.25  qcsram1fkg00_1024x10_8_fs
RAM 2048x16    134,251        7.81      41.33       44.68       3.77        469.23  qcsram1fki20_2048x16_8_fs
ROM 4096x16     68,495        5.65      43.11           -      23.63         39.78  qcrom1fti_i10_4096x16_32_qf

Memories, Best Timing (Dyn. columns are Dynamic in uA/MHz for Read, Write and Idle)

Component    Area (um)  Delay (nS)  Dyn. Read  Dyn. Write  Dyn. Idle  Leakage (nA)  Instance
RAM 128x10      21,659        2.74       9.43       10.78       2.04           188  qcsram1fkg00_128x10_8_ff
RAM 1024x10     80,343        3.12      29.47       31.47       4.73           552  qcsram2fkg00_1024x10_32_ff
RAM 2048x16    176,127        3.54      54.18       57.26       7.32          1200  qcsram2fki20_2048x16_32_ff
ROM 4096x16          -           -          -           -          -             -  -

Conditions:

Logic:

- used q3gl, 2002.05 library

- hvt devices at 1.3v

- output load of 0fF

- 50% of all inputs toggle for power numbers

- used Power Compiler to get power numbers

- enabled DW Foundation

- no wire-load models used for inter-connect

- used a toggle/clock rate of 50MHz

- process corner for power is “nominal” & for timing is “worse”

Memory:

- memory numbers are from Saber database

- used sram_analysis script & rom_all.lef for area numbers

- used nominal 1.3v for power numbers

- used worst 1.17v & a slew of 0.2 (10% to 50%) for timing numbers

- output load is 49 fF, 88 fF, 139 fF for rams in the increasing order of their size

- output load is 164 fF for rom

- Idle power is when 50% of all inputs toggle when clk = cs_n = ‘1’

- fs rams are all hvt’s; fm/ff/qf memories are hvt & lvt mixed
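
The Dynamic column is a current per MHz, so turning it into power needs a clock rate and a supply voltage. A minimal sketch in Python, assuming the 50 MHz toggle/clock rate and the nominal 1.3 V listed in the conditions above, and assuming the current scales linearly with frequency. The function names are illustrative and not part of the cheat sheet.

```python
def dynamic_power_mw(ua_per_mhz, freq_mhz=50.0, vdd=1.3):
    """Average dynamic power in mW from a uA/MHz cheat-sheet figure.

    Assumes current scales linearly with clock rate (an assumption of
    this sketch); 50 MHz and 1.3 V are the cheat sheet's stated conditions.
    """
    current_ua = ua_per_mhz * freq_mhz   # uA drawn at this clock rate
    return current_ua * vdd / 1000.0     # uA * V = uW, then uW -> mW


def leakage_power_nw(leakage_na, vdd=1.3):
    """Static power in nW from a leakage current in nA."""
    return leakage_na * vdd              # nA * V = nW


# Best-area 32-bit multiplier (tsmc13lp): 258.00 uA/MHz, 68.49 nA
mult32_dynamic_mw = dynamic_power_mw(258.00)
mult32_leakage_nw = leakage_power_nw(68.49)
print(mult32_dynamic_mw, mult32_leakage_nw)
```

For the best-area 32-bit multiplier this works out to roughly 16.8 mW of dynamic power at 50 MHz and about 89 nW of leakage.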


Designer’s Cheat Sheet (ibm8sf)

Logic components, Best Area

Component            Area (um)  Delay (nS)  Dynamic (uA/MHz)  Leakage (nA)  Instance
Register, 1-bit             63        0.52              0.01          0.09  sdf1fc + invfc
Register, en 1-bit          76        0.56              0.01          0.12  sdfe1i1fc + invfc
Register, cgc               51        0.10              0.01          0.13  cgcl1te
Mux, 2-to-1                 28        0.19              0.03          0.08  mxd2i1td
Mux, 4-to-1                 62        0.32              0.06          0.13  mxd4i1td
Mux, 8-to-1                 84        0.49              0.06          0.10  mxd8i1tc
Adder, 8-bit               532        2.05              1.52          1.68  fa_Array8_d
Adder, 16-bit            1,056        4.25              3.13          3.35  fa_Array16_d
Adder, 32-bit            2,105        8.63              6.39          6.71  fa_Array32_d
Multiplier, 8-bit        4,400        4.43              7.15         12.45  csa
Multiplier, 16-bit      19,613        9.16             50.67         56.81  csa
Multiplier, 32-bit      82,695       18.09            332.18        241.86  csa

Logic components, Best Timing

Component            Area (um)  Delay (nS)  Dynamic (uA/MHz)  Leakage (nA)  Instance
Register, 1-bit             74        0.32              0.01          0.15  sdf1fe
Register, en 1-bit          99        0.33              0.02          0.17  sdfe1i1fe
Register, cgc                -           -                 -             -  -
Mux, 2-to-1                  -           -                 -             -  -
Mux, 4-to-1                  -           -                 -             -  -
Mux, 8-to-1                  -           -                 -             -  -
Adder, 8-bit             1,482        0.41              1.28          3.34  bk
Adder, 16-bit            3,525        0.56              3.33          8.17  bk
Adder, 32-bit            6,767        0.70              6.16         14.63  bk
Multiplier, 8-bit        6,072        3.38             10.61         17.12  csa
Multiplier, 16-bit      24,900        6.43             76.56         73.36  csa
Multiplier, 32-bit      91,622       12.65            451.77        271.49  csa

Memories, Best Area (Dyn. columns are Dynamic in uA/MHz for Read, Write and Idle)

Component    Area (um)  Delay (nS)  Dyn. Read  Dyn. Write  Dyn. Idle  Leakage (nA)  Instance
RAM 128x10      14,603        3.37       6.00        7.20       0.85         35.76  qcsram1fkg00_128x10_4_fs
RAM 1024x10     52,742        4.52      17.27       18.99       2.19        221.38  qcsram1fkg00_1024x10_8_fs
RAM 2048x16    141,129        4.51      37.42       40.24       4.81        685.05  qcsram1fti20_2048x16_16_fm
ROM 4096x16     68,947        4.46      33.85           -       3.66        125.75  qcrom1fti_i10_4096x16_32_qf

Memories, Best Timing (Dyn. columns are Dynamic in uA/MHz for Read, Write and Idle)

Component    Area (um)  Delay (nS)  Dyn. Read  Dyn. Write  Dyn. Idle  Leakage (nA)  Instance
RAM 128x10           -           -          -           -          -             -  qcsram1fkg00_128x10_8_ff
RAM 1024x10          -           -          -           -          -             -  qcsram2fkg00_1024x10_32_ff
RAM 2048x16          -           -          -           -          -             -  qcsram2fki20_2048x16_32_ff
ROM 4096x16          -           -          -           -          -             -  -

Conditions:

Logic:

- used q3gl, 2002.12 library

- hvt devices at 1.3v

- output load of 0fF

- 50% of all inputs toggle for power numbers

- used Power Compiler to get power numbers

- enabled DW Foundation

- no wire-load models used for inter-connect

- used a toggle/clock rate of 50MHz

- process corner for power is “nominal” & for timing is “worse”

Memory:

- memory numbers are from Cougar3 database

- used cougar3_ram/rom.lef for area numbers

- used nominal 1.3v for power numbers

- used worst 1.17v & a slew of 0.15 (35% to 65%) for timing numbers

- output load is 38 fF, 66 fF, 181 fF for rams in the increasing order of their size

- output load is 180 fF for rom

- idle power is when 50% of all inputs toggle when clk = cs_n = ‘1’

- fs rams are all hvt’s; fm/ff/qf memories are hvt & lvt mixed


An Example: Shift Register Power Vs. Area trade-off

Design A is the traditional shift register design; Design B is an alternative that spends more area to reduce dynamic power.

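[Figure: Design A -- a chain of 1024 d/q registers, [1023] down to [0], clocked by clk, with din at the input and dout at the output.]

[Figure: Design B -- 1024 d/q registers with enables (en), [1023] down to [0], fed by din and clk, together with a COUNTER, a DECODER and an output mux producing dout; buses are 10 bits wide.]

[Timing diagram for Design B: clk; din carrying d0, d1, ..., d1023, d1024, d1025; dout; en0, en1, ..., en1023; mux_sel stepping 0, 1, ..., 1023, 0, 1; q0 holding d0 then d1024; q1 holding d1 then d1025; q1023 holding d1023.]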

Design A = 1024 x 1-bit registers

Design B = 1024 x 1-bit enable registers

+ 10-bit counter -- 10 x 1-bit registers + 10-bit adder (interpolated from the 8-bit adder)

+ 1024-to-1 mux -- (128 + 16 + 2) x 8-to-1 muxes + 1 x 2-to-1 mux

+ 10-to-1024 decoder -- approximated as a 1024-to-1 mux
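
The (128 + 16 + 2) count and the 10-bit adder estimate can be reproduced with a short sketch. It assumes the 1024-to-1 mux is built as a tree of 8-to-1 muxes, and that "interpolate" means linear interpolation between the 8-bit and 16-bit best-area adder entries; the slide does not state the exact method, so that part is an assumption. Function names are illustrative.

```python
def mux_tree_levels(n_inputs, fan_in=8):
    """Return (muxes per level, signals left over) for a tree of fan_in-to-1 muxes."""
    levels = []
    remaining = n_inputs
    while remaining >= fan_in:
        if remaining % fan_in:
            raise ValueError("sketch only handles evenly divisible levels")
        remaining //= fan_in      # each mux turns fan_in signals into one
        levels.append(remaining)  # so this level needs `remaining` muxes
    return levels, remaining


def interpolate(width, w_lo, v_lo, w_hi, v_hi):
    """Linear interpolation between two cheat-sheet entries (an assumption)."""
    return v_lo + (v_hi - v_lo) * (width - w_lo) / (w_hi - w_lo)


levels, leftover = mux_tree_levels(1024)
# tsmc13lp best-area adder areas: 8-bit = 395, 16-bit = 785
adder10_area = interpolate(10, 8, 395, 16, 785)
print(levels, sum(levels), leftover)
print(adder10_area)
```

The tree needs 128, 16 and 2 eight-input muxes (146 in total) and leaves two signals, which the single 2-to-1 mux resolves. Under the interpolation assumption the 10-bit adder area comes out at 492.5.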

For the tsmc13lp process:

                          Design A   Design B   % Change from Design A to B
Area (microns)              62,464    107,578   +72%
Dynamic Power (uA/MHz)          10          8   -20%
Leakage Power (nA)              61         85   +39%

Assumptions for Design B:

1. When the output of a register is not toggling, it consumes 50% less dynamic power

2. The area of the 10-to-1024 decoder is the same as that of the 1024-to-1 mux

3. The mux consumes negligible dynamic power since only one input toggles every cycle

4. Dynamic power for the decoder equals 1/8th that of the 146 x 8-to-1 muxes from the cheat sheet
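
The comparison can be approximately reproduced from the tsmc13lp best-area entries. This is a sketch under stated assumptions, not the author's exact recipe: the 10-bit adder is linearly interpolated between the 8-bit and 16-bit entries, and the decoder's leakage is taken to equal the mux's (the slide states this only for area). All names are illustrative.

```python
# tsmc13lp best-area entries: (area um, dynamic uA/MHz, leakage nA)
REG = (61, 0.01, 0.06)       # 1-bit register
REG_EN = (77, 0.01, 0.07)    # 1-bit enable register
MUX2 = (18, 0.02, 0.02)      # 2-to-1 mux
MUX8 = (94, 0.08, 0.04)      # 8-to-1 mux
ADD8 = (395, 1.14, 0.77)     # 8-bit adder
ADD16 = (785, 2.39, 1.24)    # 16-bit adder

N = 1024                     # shift register depth
N_MUX8 = 128 + 16 + 2        # 8-to-1 muxes in the 1024-to-1 tree

# Assumption: linear interpolation between the 8- and 16-bit adders.
FRAC = (10 - 8) / (16 - 8)
ADD10 = tuple(lo + (hi - lo) * FRAC for lo, hi in zip(ADD8, ADD16))


def design_a():
    """1024 plain registers: (area, dynamic, leakage)."""
    return tuple(N * value for value in REG)


def design_b():
    """Enable registers + counter + mux + decoder: (area, dynamic, leakage)."""
    mux_area = N_MUX8 * MUX8[0] + MUX2[0]
    mux_leak = N_MUX8 * MUX8[2] + MUX2[2]
    # Assumption 2: decoder area equals mux area, hence the factor of 2.
    area = N * REG_EN[0] + 10 * REG[0] + ADD10[0] + 2 * mux_area
    # Assumption 1: idle registers draw 50% less; assumption 3: mux ~ 0;
    # assumption 4: decoder draws 1/8th of the 146 8-to-1 muxes.
    dynamic = 0.5 * N * REG_EN[1] + 10 * REG[1] + ADD10[1] + N_MUX8 * MUX8[1] / 8
    # Sketch assumption: decoder leakage equals mux leakage.
    leakage = N * REG_EN[2] + 10 * REG[2] + ADD10[2] + 2 * mux_leak
    return area, dynamic, leakage


a, b = design_a(), design_b()
for label, va, vb in zip(("area", "dynamic", "leakage"), a, b):
    print(f"{label}: A={va:.2f} B={vb:.2f} change={100 * (vb - va) / va:+.1f}%")
```

Under these assumptions Design A matches the table exactly (62,464), and Design B's dynamic and leakage figures round to the quoted 8 uA/MHz and 85 nA. The Design B area comes out near 107,435, within about 0.2% of the quoted 107,578; the small gap presumably reflects a slightly different counter or adder estimate on the original slide.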