ECE 260B W05: Design Styles (multi-Vdd design)
-
ECE 260B / CSE 241A: Design Styles (http://vlsicad.ucsd.edu)
ECE 260B / CSE 241A, Winter 2005
Design Styles: Multi-Vdd/Vth Designs
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
-
The Design Problem
Source: SEMATECH '97
A growing gap between design complexity and design productivity
-
Design Methodology
The design process iterates among three abstractions: behavior, structure, and geometry
More and more automation for each of these steps
-
Behavioral Description of Accumulator
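entity accumulator is
  port (
    DI  : in    integer;
    DO  : inout integer := 0;
    CLK : in    bit);
end accumulator;
architecture behavior of accumulator is
begin
  process(CLK)
    variable X : integer := 0;  -- intermediate variable
  begin
    if CLK = '1' then
      X  := DO + DI;  -- add the input to the running sum
      DO <= X;        -- drive the new sum onto the output
    end if;
  end process;
end behavior;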
-
Structural Description of Accumulator
entity accumulator is
  port (  -- definition of input and output terminals
    DI  : in    bit_vector(15 downto 0);  -- a 16-bit-wide vector
    DO  : inout bit_vector(15 downto 0);
    CLK : in    bit);
end accumulator;
architecture structure of accumulator is
  component reg  -- definition of register ports
    port (DI  : in  bit_vector(15 downto 0);
          DO  : out bit_vector(15 downto 0);
          CLK : in  bit);
  end component;
  component add  -- definition of adder ports
    port (IN0  : in  bit_vector(15 downto 0);
          IN1  : in  bit_vector(15 downto 0);
          OUT0 : out bit_vector(15 downto 0));
  end component;
  -- definition of accumulator structure
  signal X : bit_vector(15 downto 0);
begin
  add1 : add port map (DI, DO, X);  -- defines port connectivity
  reg1 : reg port map (X, DO, CLK);
end structure;
Design defined as a composition of register and full-adder cells (netlist)
Data represented as {0,1,Z}
Time discretized and progresses with unit steps
Description language: VHDL
Other options: schematics, Verilog
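The two descriptions are not quite equivalent. The behavioral architecture sums VHDL `integer`s, while the structural netlist works on 16-bit vectors, so its sum wraps around. The Python sketch below models both at the clock-cycle level to make that difference visible. It is a hypothetical illustration with three assumptions: the register captures X = DI + DO on each active clock edge, DI is treated as an unsigned 16-bit value, and the function names are made up.

```python
MASK16 = (1 << 16) - 1  # bit_vector(15 downto 0) holds 16 bits


def behavioral_accumulate(inputs):
    """Behavioral model: DO accumulates integers (unbounded Python ints here)."""
    do, trace = 0, []
    for di in inputs:        # one active clock edge per input sample
        x = do + di          # X := DO + DI
        do = x               # DO <= X
        trace.append(do)
    return trace


def structural_accumulate(inputs):
    """Structural model: a 16-bit adder (add1) feeding a 16-bit register (reg1)."""
    do, trace = 0, []
    for di in inputs:
        x = (di + do) & MASK16   # add1: OUT0 = IN0 + IN1; no carry-out port, so it is lost
        do = x                   # reg1: DO takes X on the clock edge
        trace.append(do)
    return trace


samples = [40000, 30000, 5]
print(behavioral_accumulate(samples))
print(structural_accumulate(samples))
```

With these inputs, the behavioral trace is [40000, 70000, 70005]. The structural trace wraps to [40000, 4464, 4469] once the sum passes 65535.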
-
Implementation Methodologies
Digital Circuit Implementation Approaches
- Custom
- Semi-custom
  - Cell-based: Standard Cells, Compiled Cells, Macro Cells
  - Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGA)
-
Full Custom
Hand-drawn geometry
All layers customized
Digital and analog
Simulation at transistor level
High density
High performance
Long design time
[Figure: Magic Layout Editor (UC Berkeley)]
-
Symbolic Layout
[Figure: stick diagram of an inverter, with In, Out, VDD, and GND]
Dimensionless layout entities
Only topology is important
Final layout generated by compaction program
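Compaction is often formulated as a constraint graph that is solved with longest paths. The Python sketch below is a minimal one-dimensional version. It assumes only minimum-spacing constraints that form an acyclic graph; real compactors also handle the second axis, maximum-distance constraints, and cycles. The element names and spacings in the usage are hypothetical and use the dimensionless units of a symbolic layout.

```python
from collections import defaultdict, deque


def compact_1d(elements, constraints):
    """One-dimensional constraint-graph compaction by longest paths.

    elements:    layout element names, e.g. wires or contacts of a stick diagram
    constraints: (left, right, min_sep) tuples meaning x[right] - x[left] >= min_sep
    Returns the smallest non-negative x for every element.
    Assumes the constraint graph is acyclic (no maximum-distance constraints).
    """
    succ = defaultdict(list)
    indeg = {e: 0 for e in elements}
    for left, right, sep in constraints:
        succ[left].append((right, sep))
        indeg[right] += 1

    x = {e: 0 for e in elements}
    ready = deque(e for e in elements if indeg[e] == 0)   # topological order (Kahn)
    placed = 0
    while ready:
        u = ready.popleft()
        placed += 1
        for v, sep in succ[u]:
            x[v] = max(x[v], x[u] + sep)   # push v right only as far as needed
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if placed != len(elements):
        raise ValueError("constraint graph has a cycle")
    return x


x = compact_1d(
    ["in_pin", "poly", "diff_contact", "out_pin"],
    [("in_pin", "poly", 2), ("poly", "diff_contact", 3),
     ("diff_contact", "out_pin", 2), ("in_pin", "diff_contact", 4)],
)
print(x)
```

Here `diff_contact` lands at 5, not 4, because the chain through `poly` (2 + 3) is the binding constraint. `out_pin` then follows at 7.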
-
Standard Cells
[Figure: standard-cell layout with rows of cells, logic cells, routing channels, feedthrough cells, and a functional module (RAM, multiplier, ...)]
Routing channel requirements are reduced by the presence of more interconnect layers
Organized in rows
Cells made as full custom by vendor (not user)
All layers customized
Digital, with possible special analog cells
Simulation at gate level (digital)
Medium-high density
Medium-high performance
Reasonable design time
-
Standard Cell Example
[Brodersen92]
-
Standard Cell - Example
3-input NAND cell (from the Mississippi State library), characterized for a fanout of 4 and for three different technologies
-
Automatic Cell Generation
Random-logic layout generated by the CLEO cell compiler (Digital)
-
Module Generators: Compiled Datapath
[Figure: compiled datapath built from bit-slices through adder, buffer, reg0, reg1, and mux cells, with buses bus0-bus2, a routing area, and feed-throughs]
Advantages: One-dimensional placement/routing problem
-
Macrocell-Based Design
[Figure: macrocells connected by an interconnect bus and routing channels]
Predefined macro blocks (uP, RAM, etc.)
Macro blocks made as full custom by vendor (IP blocks)
All layers customized
Digital and some analog
Simulation at behavior or gate level
High density
High performance
Short design time
Use standard on-chip busses
System on a chip (SOC)
-
Macrocell Design Methodology
Video-encoder chip [Brodersen92]
[Figure: floorplan with two SRAM blocks, data paths, standard cells, and a routing channel]
Floorplan: defines the overall topology of the design, the relative placement of modules, and the global routes of busses, supplies, and clocks
-
Gate Array
[Figure: gate array with rows of uncommitted cells separated by routing channels]
Predefined transistors connected via metal
Two types: channel-based, sea of gates
Only metal layers customized
Fixed array sizes
Digital cells in library
Simulation at gate level (digital)
Medium density
Medium performance
Reasonable design time
-
Gate Array Primitive Cells
[Figure: uncommitted cell (VDD, GND, polysilicon, metal, possible contacts) and the same cell committed as a 4-input NOR with inputs In1-In4 and output Out]
-
Sea-of-gate Primitive Cells
[Figure: sea-of-gates primitive cells built from rows of PMOS and NMOS devices, in two variants: using oxide isolation and using gate isolation]
-
Sea-of-gates
[Figure: LSI Logic LEA300K (0.6 µm CMOS) die, with random logic and a memory subsystem]
-
Prewired Arrays
Programmable logic blocks
Programmable connections between logic blocks
No layers customized (standard devices)
Digital only
Low-medium performance
Low-medium density
Programmable: SRAM, EPROM, Flash, Anti-fuse, etc.
Easy and quick design changes
Cheap design tools
Low development cost
High device cost
NOT a real ASIC
Courtesy Altera Corp.
-
Programmable Logic Devices
[Figure: PLA, PROM, and PAL structures]
-
EPLD Block Diagram
[Figure: EPLD block diagram with primary inputs feeding macrocells]
Courtesy Altera Corp.
-
Field-Programmable Gate Arrays - Fuse-based
[Figure: fuse-based FPGA with rows of logic modules separated by routing channels, vertical routes, I/O buffers on all four sides, and program/test/diagnostics circuitry]
Standard-cell-like floorplan
-
Interconnect
[Figure: cells with input/output pins on horizontal and vertical tracks, with antifuses forming a programmed interconnection]
Programming interconnect using anti-fuses
-
Field-Programmable Gate Arrays - RAM-based
[Figure: array of CLBs connected through switching matrices, horizontal and vertical routing channels, and interconnect points]
-
RAM-based FPGA - Basic Cell (CLB)
[Figure: CLB with combinational logic (function generators F and G, each able to implement any function of up to 4 variables) and storage elements (two flip-flops, Q1 and Q2, with clock, clock-enable CE, and reset inputs)]
Courtesy of Xilinx
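Each function generator can implement any function of up to 4 variables because it is a lookup table. Sixteen configuration bits (held in SRAM cells in a RAM-based part) store the truth table, and the four inputs select one of those bits. The Python sketch below is minimal. Its bit ordering and helper names are hypothetical and do not follow Xilinx's actual configuration format.

```python
def make_lut4(func):
    """Return the 16 configuration bits that make a 4-input LUT implement func.

    Bit i of the result holds func(a, b, c, d), where i = a + 2b + 4c + 8d
    (an arbitrary input ordering chosen for this sketch).
    """
    mask = 0
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        if func(a, b, c, d):
            mask |= 1 << i
    return mask


def eval_lut4(mask, a, b, c, d):
    """Evaluate the LUT: the inputs only select which stored bit appears at the output."""
    return (mask >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


# Any Boolean function of 4 inputs fits, e.g. 4-input parity or a 2:1 mux
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
mux = make_lut4(lambda a, b, c, d: b if a else c)   # a selects b or c; d is unused
print(hex(xor4), hex(mux))
```

This shows why the LUT is universal: changing the function means rewriting the 16 stored bits, while the selection hardware stays the same.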
-
RAM-based FPGA
Xilinx XC4025
-
High Performance Devices
Mixture of full custom, standard cells, and macros
Full custom for special blocks: adder (data path), etc.
Macros for standard blocks: RAM, ROM, etc.
Standard cells for non-critical digital blocks
-
Global Signaling and Layout
Global signaling and layout optimization
Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
-
Global Signaling
Current global signaling paradigm: insert large static CMOS repeaters to reduce wire RC delay
Impending problems:
- Too many repeaters
  - 180nm processors: 22K repeaters (Itanium), 70K (Power4)
  - Projected 1-1.5M repeaters at 45-65nm technologies
- Too much power
  - Many large repeaters = significant static and dynamic power
- Too much noise
  - Repeater clustering complicates power distribution
  - Inductive coupling across wide bus structures
D. Sylvester, DAC-2001
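The repeater strategy rests on two facts. An unbuffered wire's distributed RC delay grows with the square of its length, while an optimally repeated wire's delay grows roughly linearly. The Python sketch below uses the classic first-order (Bakoglu) repeater model, which is not taken from these slides. All parameter values are hypothetical.

```python
import math


def repeated_wire_delay(k, h, R, C, R0, C0):
    """First-order (Bakoglu-style) delay of a wire with total resistance R and
    capacitance C, split into k equal segments, each driven by a repeater h times
    a minimum inverter (output resistance R0, input capacitance C0)."""
    per_stage = (0.7 * (R0 / h) * (C / k + h * C0)          # repeater drives segment + next input
                 + (R / k) * (0.4 * C / k + 0.7 * h * C0))  # distributed segment RC + next input
    return k * per_stage


def optimal_repeaters(R, C, R0, C0):
    """Closed-form minimizers of repeated_wire_delay (k treated as continuous)."""
    k = math.sqrt(0.4 * R * C / (0.7 * R0 * C0))
    h = math.sqrt(R0 * C / (R * C0))
    return k, h


# Hypothetical global-wire and inverter parameters (illustrative only)
r, c = 75.0, 0.2e-12    # wire resistance (ohm/mm) and capacitance (F/mm)
R0, C0 = 8e3, 1e-15     # minimum-inverter output resistance (ohm) and input cap (F)

for length_mm in (5, 10, 20):
    R, C = r * length_mm, c * length_mm
    k, h = optimal_repeaters(R, C, R0, C0)
    wire_only = 0.4 * R * C                              # bare distributed RC, grows as L^2
    buffered = repeated_wire_delay(k, h, R, C, R0, C0)   # grows linearly in L
    print(f"{length_mm:>2} mm: k={k:4.1f} repeaters of size h={h:4.0f}, "
          f"0.4RC={wire_only * 1e9:.2f} ns, repeated={buffered * 1e9:.2f} ns")
```

In this model, the optimal repeater count k grows in proportion to wire length. The optimal size h depends only on per-length wire parameters. Longer and more numerous global wires therefore translate directly into more large repeaters, which is the count, power, and noise problem listed above.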
-
Cell Layout Optimization
Advanced layout techniques must allow:
- Continuous individual device sizing
- Variable p/n ratios
- Tapered FET stacking sizes
- Arbitrary Vth assignments within gates
First cut: Cadabra, 15-22% power reduction using the first two approaches under a fixed-footprint constraint
[Figure: GDSII import and fixed-width compaction, optimizing specific instances of standard gates]
Ref: Hurat, Cadabra
D. Sylvester, DAC-2001
-
Multi-Vdd
Global signaling and layout optimization
Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
-
Multi-Vdd Status
Idea: incorporate two Vdds to reduce dynamic power
Limited to a few recent Japanese multimedia processors
- Example: 0.3 µm, 75 MHz, 3.3 V media processor (Toshiba)
  - Total power savings of 47% in logic, 69% in clock
- Dynamic voltage scaling of mobile processors
  - Transmeta Crusoe, Intel SpeedStep, etc.
  - Not considered in this talk
Very powerful technique, currently applied only in low-performance designs
- Mentality: today's high-performance parts aren't limited by power
D. Sylvester, DAC-2001
-
Lower Power Via Rich Replacement
Media processors and other low-speed designs have many non-critical paths
- 60-70% of paths have delay less than half the clock period
- After replacement, most paths become near-critical
What about high-speed microprocessors?
[Figure: histogram of % of total paths vs. path delay (normalized to clock period), before and after replacement]
D. Sylvester, DAC-2001
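The path-delay argument can be made concrete. Given each path's delay normalized to the clock period, count the paths with less than half a period of delay, and count those that would still fit in the period after a low-Vdd slowdown. The Python sketch below rests on stated assumptions: the delay data, the uniform penalty factor, and the near-critical threshold are all hypothetical. Paths are also treated as independent, even though real circuits share gates among paths.

```python
def slack_profile(path_delays, penalty, near=0.7):
    """Summarize which paths could absorb a uniform low-Vdd slowdown.

    path_delays: path delays normalized to the clock period (1.0 = critical)
    penalty:     hypothetical delay multiplier for a path moved entirely to Vdd,low
    near:        hypothetical threshold above which a path counts as near-critical
    Simplification: paths are treated independently (shared gates are ignored).
    """
    n = len(path_delays)
    moved = [d * penalty for d in path_delays if d * penalty <= 1.0]
    kept = [d for d in path_delays if d * penalty > 1.0]
    after = moved + kept
    return {
        "frac_under_half": sum(d < 0.5 for d in path_delays) / n,
        "frac_moved": len(moved) / n,
        "frac_near_critical_before": sum(d >= near for d in path_delays) / n,
        "frac_near_critical_after": sum(d >= near for d in after) / n,
    }


# Made-up path-delay sample, for illustration only
delays = [0.2, 0.3, 0.35, 0.4, 0.45, 0.48, 0.6, 0.7, 0.9, 1.0]
print(slack_profile(delays, penalty=1.8))
```

With this sample, 60% of paths are under half a period and all of them can move to the lower supply. The near-critical share then rises from 30% to 60%, the kind of shift the replacement histograms describe.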
-
Similar Story For High-Performance
IBM 480 MHz PowerPC shows over 50% of paths have delay less than half the clock period
- Implies that high-performance designs can benefit from multi-Vdd
Ref: Akrout, JSSC98
D. Sylvester, DAC-2001
-
Resizing Is Not The Right Answer
Post-synthesis optimizations resize gates to recover power on non-critical paths
- Looks similar to the pre- and post-replacement figures for the media processor
[Figure: path-delay distributions before and after post-synthesis resizing]
Ref: Sirichotiyakul, DAC99
This is the wrong approach for nanometer design!
D. Sylvester, DAC-2001
-
Multi-Vdd Instead of Sizing
Power $P \propto C V_{dd}^2 f$, where $f$ is fixed
Key: reducing gate width impacts power sub-linearly
- Interconnect capacitance is not affected
Reducing supply voltage cuts power quadratically
- All capacitive loads have lower voltage swing
How can we minimize delay penalty at low Vdd?
D. Sylvester, DAC-2001
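A worked example makes the contrast between sub-linear and quadratic savings concrete. The capacitance split, supply values, and frequency below are hypothetical. The only relation taken from the slide is that power is proportional to C · Vdd² · f, with f fixed.

```python
def dynamic_power(c_device, c_wire, vdd, f):
    """Switching power P = C * Vdd**2 * f, with the load split into a part that
    scales with gate width (c_device) and interconnect (c_wire), which does not."""
    return (c_device + c_wire) * vdd ** 2 * f


F = 1e9                                            # fixed clock frequency (Hz)
base = dynamic_power(10e-15, 10e-15, 1.2, F)       # hypothetical nominal gate
downsized = dynamic_power(5e-15, 10e-15, 1.2, F)   # halve widths: wire cap unchanged
low_vdd = dynamic_power(10e-15, 10e-15, 0.9, F)    # keep widths, move to Vdd,low

print(f"halving width saves {1 - downsized / base:.0%}")
print(f"Vdd 1.2 V -> 0.9 V saves {1 - low_vdd / base:.0%}")
```

Halving the device capacitance saves only 25%, because the wire half of the load is untouched. A 25% supply reduction saves about 44% (1 - 0.75²) without touching the sizes.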
-
Challenges For Multi-Vdd
Area overhead
- Toshiba reported a 7% rise in area due to placement restrictions, level converters, and additional power grid routing
EDA tool support for the above issues (placement, dual power routing)
Noise analysis
- Additional shielding required between Vdd,low and Vdd,high signals?
- Including clock network
D. Sylvester, DAC-2001
-
Static Power
Global signaling and layout optimization
Multi-Vdd
Static power
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
-
Static Power
Why do we care about static power in non-portable devices?
- Standby power is wasted: it leaves fewer watts for computation
- Worsens reliability by raising die temperatures
Leakage current is a function of Vth and subthreshold swing (Ss) (about 10x higher at operating than at room temperature!)
Ss expected to remain at 80-85 mV/dec (room temperature)
- Device technology may cut this by ~20%
Vth reductions are mandated by scaling Vdd
- Vth has been around Vdd/5
$I_{off} \approx 10 \cdot 10^{-V_{th}/S_s}$ µA/µm
D. Sylvester, DAC-2001
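The leakage expression above can be evaluated directly. The Python sketch below assumes the 10 µA/µm prefactor and the exponential form shown. The supply values are illustrative, and Vth = Vdd/5 follows the slide's rule of thumb. The sketch shows why scaling Vth with Vdd is costly: every Ss-sized cut in Vth (85 mV here) multiplies Ioff by 10.

```python
def i_off(vth, ss, i0=10.0):
    """Subthreshold leakage per unit width, I_off = i0 * 10**(-Vth / Ss).

    vth: threshold voltage (V)
    ss:  subthreshold swing (V/decade), e.g. 0.085 for 85 mV/dec
    i0:  prefactor in uA/um (10 uA/um, as in the expression above)
    Returns leakage in uA/um.
    """
    return i0 * 10 ** (-vth / ss)


# Rule of thumb from the slide: Vth ~ Vdd/5; Ss = 85 mV/dec (room temperature)
for vdd in (1.5, 1.2, 1.0, 0.6):
    vth = vdd / 5
    print(f"Vdd={vdd:.1f} V  Vth={vth:.2f} V  Ioff={i_off(vth, 0.085) * 1e3:7.1f} nA/um")
```

Following the Vdd/5 rule from 1.2 V down to 0.6 V raises leakage per micron by more than an order of magnitude. A 20% smaller Ss would soften that growth.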
-
Current Status
No sub-1V technologies demonstrate good on/off current performance yet (improvements expected in production)
Oxide scaling is running out of steam; overall ~3x Ioff increase per node
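Reference | ITRS node (nm) | Tox (Å, electrical) | Vdd (V) | Ion (µA/µm) | Ioff (nA/µm)
ITRS 2000 | 50 | 6-8 (physical) | 0.6 | 750 | 80
ITRS 2001 | 45 | 11 (uses high-k) | 0.6 | 1250 | 3000
ITRS 2000 | 70 | 8-12 (physical) | 0.9 | 750 | 40
ITRS 2000 | 100 | 12-15 (physical) | 1.2 | 750 | 13
NEC, 00 | 100 | 13 (physical) | 1.0 | 723 | 16
Intel, 99 | 70 | 32 | 1.2 | 650 | 3
TI, 99 | 100 | 27 | 1.2 | 800 | 10
NEC, 00 | 70 | 25 | 1.2 | 697 | 10
Samsung, 00 | 100 | 21 | 1.2 | 860 | 10
Intel, 00 | 50-70 | 18 | 0.85 | 514 | 100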
Working numbers
D. Sylvester, DAC-2001
-
Leakage Suppression Approaches
Dual-Vth (most common)
- Low-Vth on critical paths, high-Vth off critical paths
- Only cost is additional masks
MTCMOS
- Series-inserted high-Vth device cuts leakage current when off (sleep mode)
- Delay and area penalties; control device sizing is critical
Other techniques
- Substrate biasing to control Vth
- Dual-Vth domino
  - Use low-Vth devices only in evaluate paths
[Figure: MTCMOS gate with pull-up and pull-down networks, output Vout, supply Vdd, a parasitic node, and a series high-Vth device controlled by Vcontrol]
D. Sylvester, DAC-2001
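The dual-Vth rule (low-Vth on critical paths, high-Vth elsewhere) can be sketched as a greedy pass over a gate-level timing model. This Python sketch is a hypothetical illustration, not the method of any tool named in these slides. Path delay is modeled as the sum of gate delays, and the gate names, delays (ps), and leakage values (arbitrary units) are made up.

```python
def assign_dual_vth(gates, paths, clock):
    """Greedy dual-Vth assignment: use high Vth wherever timing still closes.

    gates: {name: (delay_low, delay_high, leak_low, leak_high)}
    paths: list of timing paths, each a list of gate names
    clock: maximum allowed path delay
    Path delay is modeled as the plain sum of gate delays.
    """
    vth = {g: "low" for g in gates}

    def gate_delay(g):
        d_low, d_high, _, _ = gates[g]
        return d_high if vth[g] == "high" else d_low

    def timing_ok():
        return all(sum(gate_delay(g) for g in p) <= clock for p in paths)

    if not timing_ok():
        raise ValueError("timing fails even with every gate at low Vth")

    # Try the gates with the largest leakage saving first
    for g in sorted(gates, key=lambda name: gates[name][2] - gates[name][3], reverse=True):
        vth[g] = "high"
        if not timing_ok():
            vth[g] = "low"   # too little slack on some path through g
    leakage = sum(gates[g][3] if vth[g] == "high" else gates[g][2] for g in gates)
    return vth, leakage


# Hypothetical gates: delays in ps, leakage in arbitrary units
gates = {
    "g1": (10, 13, 10, 1),
    "g2": (10, 13, 10, 1),
    "g3": (10, 13, 10, 1),
    "g4": (10, 13, 8, 1),
}
paths = [["g1", "g2", "g3"], ["g4"]]
print(assign_dual_vth(gates, paths, clock=33))
```

The three-gate path has only 3 ps of slack, so just one of its gates can take the slower high-Vth version. The off-critical gate g4 goes high-Vth, and total leakage falls from 38 to 22 units. A greedy order like this is not optimal in general.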
-
Can Gate-Length Biasing Help Leakage Reduction?
Reduce leakage?
[Figure: leakage and delay vs. gate-length (130-140 nm)]
Variation of leakage and delay (each normalized to 1) for an NMOS device in an industrial 130nm technology
Reduce leakage variability?
[Figure: leakage vs. gate-length without and with biasing, illustrating leakage variability]
-
Gate-length Biasing
First proposed by Sirisantana et al.
- Comparative study of the effect of doping, tox, and gate-length
- Large bias used, significant slowdown
Small bias
- Little reduction in leakage beyond 10% bias, while delay degrades linearly
- Preserves pin compatibility
Technique applicable as a post-RET step
Salient features
- Design cycle not interfered with
- Zero cost (no additional masks)
-
Granularity
Technology-level: all devices in all cells have one biased gate-length
Cell-level: all devices in a cell have one biased gate-length
Device-level: all devices have independent biased gate-lengths
- Simplification: in each cell, NMOS devices have one gate-length and PMOS devices have another
-
Device-Level Leakage Reduction
[Figure: bar chart of leakage saving for INVX4, NANDX4, BUFX4, and ANDX6 at Low Vt, Nom Vt, and High Vt (y-axis 0-40)]
Leakage saving with a delay penalty of up to 10% (simplified device-level biasing)
-
Circuit level
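Bias gate-length for non-critical cells
Library extended with each cell having a biased version
Benefits analyzed in conjunction with multi-VT assignment and in isolation
- SVT-SGL (single VT, single gate-length)
- DVT-SGL (dual VT, single gate-length)
- SVT-DGL (single VT, dual gate-length)
- DVT-DGL (dual VT, dual gate-length)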
-
Results: Leakage Reduction
[Figure: normalized leakage (0-1) for c5315, c6288, c7552, and alu128 under SVT-SGL, SVT-DGL, DVT-SGL, and DVT-DGL]
With less than 2.5% delay penalty
Design Compiler used for VT assignment and gate-length biasing
Better results expected with Duet (academic sizer from Michigan)
-
Results: Leakage Variability
Leakage distribution for the testcase alu128; traces shown: unbiased circuit, technology-level biasing, uniform biasing
[Figure: percentage reduction in leakage spread (0-60%) for c5315, c6288, c7552, and alu128]
-
Futures
Construction of effective biasing-based leakage optimization heuristics
Gate-length selection at true device-level granularity
Evaluation of gate-length biasing at future technology nodes
-
Multi-Vth + Vdd + Sizing
Global signaling and layout optimization
Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
-
Multi-Everything
Need an approach that selects between speed, static power, and dynamic power
Should be scalable to nanometer design
- Rules out dual-Vth domino or other dynamic logic families (low supplies kill their performance advantages)
Techniques mentioned so far
- Flexible, optimized cell layouts
- Multi-Vdd
- Dual-Vth
Put them all together
D. Sylvester, DAC-2001
-
Multi-Vdd Can Leverage Vths
Existing designs using multi-Vdd do not alter Vth in low-Vdd cells
- Highly sub-optimal: delay is fully penalized
- Limits cell replacement, which limits power savings
Much better solution: reduce Vth in low-Vdd cells to carefully balance delay, static power, and dynamic power
- Enforce technology scaling within a chip: whenever we reduce Vdd, we also reduce Vth to maintain speed
D. Sylvester, DAC-2001
-
Multi-Vdd + Vth Negates Delay Penalty
Delay $t_d \propto C V_{dd} / I_{on}$
Scenarios
- Constant Vth (current paradigm)
- Scale Vth to maintain constant static power
- Scale Vth to reduce static power linearly with Vdd
Delay penalty is substantially offset: Ion is very sensitive to Vth at Vdd < 1V
Pstatic falls with Vdd due to the linear Vdd term and a smaller Ioff (Ion and DIBL)
D. Sylvester, DAC-2001
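The slide's point that Ion is very sensitive to Vth at low Vdd can be explored with the alpha-power-law MOSFET model (Sakurai and Newton). That model is an assumption here, not necessarily the one used in the original study. The Python sketch first computes the delay penalty of moving a gate from 1.2 V to 0.9 V at constant Vth. It then finds the Vth that cancels the penalty, and the resulting Ioff increase under the Ioff ~ 10^(-Vth/Ss) relation from the static-power slide. All voltages and alpha = 1.3 are hypothetical.

```python
def rel_delay(vdd, vth, alpha=1.3):
    """Delay up to a constant: t ~ C*Vdd/Ion, with Ion ~ (Vdd - Vth)**alpha
    (alpha-power-law model; alpha = 1.3 is an assumed short-channel value)."""
    return vdd / (vdd - vth) ** alpha


def vth_for_delay(vdd, target, alpha=1.3):
    """Vth that makes rel_delay(vdd, vth) equal target (inverse of rel_delay)."""
    return vdd - (vdd / target) ** (1.0 / alpha)


# Hypothetical operating points
vdd_high, vdd_low, vth_nom, ss = 1.2, 0.9, 0.30, 0.085   # volts; ss in V/decade

ref = rel_delay(vdd_high, vth_nom)
penalty = rel_delay(vdd_low, vth_nom) / ref      # delay ratio if Vth is left alone
vth_new = vth_for_delay(vdd_low, ref)            # Vth that fully restores speed
leak_ratio = 10 ** ((vth_nom - vth_new) / ss)    # Ioff increase, from Ioff ~ 10**(-Vth/Ss)

print(f"constant Vth: {penalty:.2f}x delay at Vdd,low")
print(f"Vth {vth_nom:.2f} -> {vth_new:.3f} V restores delay, at {leak_ratio:.0f}x Ioff")
```

Fully restoring speed costs a large leakage increase. An intermediate Vth trades part of the delay penalty for less leakage, which is the balance between delay, static power, and dynamic power that the scenarios above describe.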
-
Now Add Sizing
Multi-Vdd + multi-Vth + sizing/cell layout optimization attacks power from many angles (multi-dimensional)
Depending on criticality and switching activities, non-critical gates can be:
- Assigned Vdd,low
- Assigned Vdd,low + lower Vth
- Assigned Vth,high
- Downsized (at the individual transistor level if advantageous)
- Assigned Vdd,low and upsized
  - For gates that cannot tolerate the Vdd,low delay, this can be power-efficient
- And others
D. Sylvester, DAC-2001
-
Summary
Power density must saturate to maintain affordable packaging options
- 50 W/cm² means 200-250 W for future large MPUs
- Dynamic thermal management saves 25% on the packaging power budget
Multi-Vdd will leverage multiple Vths to offset the delay penalty at low Vdd
- More widespread re-assignment to Vdd,low
- Use Vdd first, instead of re-sizing, to take advantage of large path slacks
- Anticipated power savings of 50-80%
Static power is also addressed through multi-Vth + Vdd + sizing
- Vth difficult to control in ultra-short channels
- Intra-cell Vth assignment + MTCMOS/variants + sleep modes
D. Sylvester, DAC-2001
-
Next Week: Project Meetings
D. Sylvester, DAC-2001