ece260b w05 design style


Upload: harish-kumar

Post on 06-Nov-2015


DESCRIPTION

multi vdd design

TRANSCRIPT

  • ECE 260B CSE 241A Design Styles 1 http://vlsicad.ucsd.edu

    ECE260B CSE241A, Winter 2005

    Design Styles
    Multi-Vdd/Vth Designs

    Website: http://vlsicad.ucsd.edu/courses/ece260b-w05

  • ECE 260B CSE 241A Design Styles 2 http://vlsicad.ucsd.edu

    The Design Problem

    Source: sematech97

    A growing gap between design complexity and design productivity

  • ECE 260B CSE 241A Design Styles 3 http://vlsicad.ucsd.edu

    Design Methodology

    The design process iterates between three abstractions: behavior, structure, and geometry
    Each of these steps is becoming more and more automated

  • ECE 260B CSE 241A Design Styles 4 http://vlsicad.ucsd.edu

    Behavioral Description of Accumulator

    entity accumulator is
      port (
        DI  : in integer;
        DO  : inout integer := 0;
        CLK : in bit
      );
    end accumulator;

    architecture behavior of accumulator is
    begin
      process(CLK)
        variable X : integer := 0; -- intermediate variable
      begin
        if CLK = '1' then
          X

  • ECE 260B CSE 241A Design Styles 5 http://vlsicad.ucsd.edu

    Structural Description of Accumulator

    entity accumulator is
      port ( -- definition of input and output terminals
        DI  : in bit_vector(15 downto 0); -- a 16-bit-wide vector
        DO  : inout bit_vector(15 downto 0);
        CLK : in bit
      );
    end accumulator;

    architecture structure of accumulator is
      component reg -- definition of register ports
        port (
          DI  : in bit_vector(15 downto 0);
          DO  : out bit_vector(15 downto 0);
          CLK : in bit
        );
      end component;
      component add -- definition of adder ports
        port (
          IN0  : in bit_vector(15 downto 0);
          IN1  : in bit_vector(15 downto 0);
          OUT0 : out bit_vector(15 downto 0)
        );
      end component;

      -- definition of accumulator structure
      signal X : bit_vector(15 downto 0);
    begin
      add1 : add
        port map (DI, DO, X); -- defines port connectivity
      reg1 : reg
        port map (X, DO, CLK);
    end structure;

    Design defined as composition of register and full-adder cells (netlist)
    Data represented as {0,1,Z}
    Time discretized and progresses with unit steps
    Description language: VHDL
    Other options: schematics, Verilog
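As a rough illustration of the structural netlist above, the following Python sketch mimics the two components (a combinational adder and a clocked register) and their wiring. The names `add`, `Reg`, and `run_accumulator` are hypothetical helpers, not part of the lecture material, and integers masked to 16 bits stand in for bit_vector(15 downto 0).

```python
MASK = 0xFFFF  # 16-bit datapath, standing in for bit_vector(15 downto 0)


def add(in0, in1):
    """Combinational adder: OUT0 = IN0 + IN1, truncated to 16 bits."""
    return (in0 + in1) & MASK


class Reg:
    """Register: DO takes the value of DI on each clock tick."""

    def __init__(self):
        self.do = 0

    def clock(self, di):
        self.do = di & MASK


def run_accumulator(inputs):
    """Apply one input per clock cycle and return the register output after each cycle."""
    reg1 = Reg()
    trace = []
    for di in inputs:
        x = add(di, reg1.do)  # add1 : port map (DI, DO, X)
        reg1.clock(x)         # reg1 : port map (X, DO, CLK)
        trace.append(reg1.do)
    return trace


print(run_accumulator([1, 2, 3, 0xFFFF]))  # prints [1, 3, 6, 5]
```

The last value wraps around because the sum is truncated to 16 bits, which is the behavior a fixed-width adder cell would show.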

  • ECE 260B CSE 241A Design Styles 6 http://vlsicad.ucsd.edu

    Implementation Methodologies

    Digital Circuit Implementation Approaches
      Custom
      Semi-custom
        Cell-Based
          Standard Cells, Compiled Cells
          Macro Cells
        Array-Based
          Pre-diffused (Gate Arrays)
          Pre-wired (FPGA)

  • ECE 260B CSE 241A Design Styles 7 http://vlsicad.ucsd.edu

    Full Custom

    Hand-drawn geometry
    All layers customized
    Digital and analog
    Simulation at transistor level
    High density
    High performance
    Long design time

    [Figure: Magic Layout Editor (UC Berkeley)]

  • ECE 260B CSE 241A Design Styles 8 http://vlsicad.ucsd.edu

    Symbolic Layout

    [Figure: Stick diagram of inverter; labels: In, Out, VDD, GND]

    Dimensionless layout entities
    Only topology is important
    Final layout generated by compaction program

  • ECE 260B CSE 241A Design Styles 9 http://vlsicad.ucsd.edu

    Standard Cells

    [Figure: standard-cell layout; labels: Rows of Cells, Logic Cell, Feedthrough Cell, Routing Channel, Functional Module (RAM, multiplier, ...)]

    Routing channel requirements are reduced by presence of more interconnect layers

    Organized in rows
    Cells made as full custom by vendor (not user)
    All layers customized
    Digital with possible special analog cells
    Simulation at gate level (digital)
    Medium-high density
    Medium-high performance
    Reasonable design time

  • ECE 260B CSE 241A Design Styles 10 http://vlsicad.ucsd.edu

    Standard Cell Example

    [Brodersen92]

  • ECE 260B CSE 241A Design Styles 11 http://vlsicad.ucsd.edu

    Standard Cell - Example

    3-input NAND cell (from Mississippi State Library), characterized for fanout of 4 and for three different technologies

  • ECE 260B CSE 241A Design Styles 12 http://vlsicad.ucsd.edu

    Automatic Cell Generation

    Random-logic layout generated by CLEO cell compiler (Digital)

  • ECE 260B CSE 241A Design Styles 13 http://vlsicad.ucsd.edu

    Module Generators - Compiled Datapath

    [Figure: compiled datapath; blocks: adder, buffer, reg0, reg1, mux; buses: bus0, bus1, bus2; labels: bit-slice, routing area, feed-through]

    Advantages: One-dimensional placement/routing problem

  • ECE 260B CSE 241A Design Styles 14 http://vlsicad.ucsd.edu

    Macrocell-Based Design

    [Figure: macrocell layout; labels: Macrocell, Interconnect Bus, Routing Channel]

    Predefined macro blocks (uP, RAM, etc.)
    Macro blocks made as full custom by vendor (IP blocks)
    All layers customized
    Digital and some analog
    Simulation at behavior or gate level
    High density
    High performance
    Short design time
    Use standard on-chip busses
    System on a chip (SOC)

  • ECE 260B CSE 241A Design Styles 15 http://vlsicad.ucsd.edu

    Macrocell Design Methodology

    [Figure: Video-encoder chip [Brodersen92]; labels: SRAM, SRAM, Routing Channel, Data paths, Standard cells]

    Floorplan: Defines overall topology of design, relative placement of modules, and global routes of busses, supplies, and clocks

  • ECE 260B CSE 241A Design Styles 16 http://vlsicad.ucsd.edu

    Gate Array

    [Figure: gate array; labels: rows of uncommitted cells, routing channel]

    Predefined transistors connected via metal
    Two types: channel based, sea of gates
    Only metal layers customized
    Fixed array sizes
    Digital cells in library
    Simulation at gate level (digital)
    Medium density
    Medium performance
    Reasonable design time

  • ECE 260B CSE 241A Design Styles 17 http://vlsicad.ucsd.edu

    Gate Array Primitive Cells

    [Figure: Uncommitted Cell and Committed Cell (4-input NOR); labels: VDD, GND, polysilicon, metal, possible contact, In1, In2, In3, In4, Out]

  • ECE 260B CSE 241A Design Styles 18 http://vlsicad.ucsd.edu

    Sea-of-gate Primitive Cells

    [Figure: two NMOS/PMOS primitive-cell arrangements, one using oxide-isolation and one using gate-isolation]

  • ECE 260B CSE 241A Design Styles 19 http://vlsicad.ucsd.edu

    Sea-of-gates

    [Figure: LSI Logic LEA300K (0.6 µm CMOS); labels: Random Logic, Memory Subsystem]

  • ECE 260B CSE 241A Design Styles 20 http://vlsicad.ucsd.edu

    Prewired Arrays

    Programmable logic blocks
    Programmable connections between logic blocks
    No layers customized (standard devices)
    Digital only
    Low-medium performance
    Low-medium density
    Programmable: SRAM, EPROM, Flash, Anti-fuse, etc.
    Easy and quick design changes
    Cheap design tools
    Low development cost
    High device cost
    NOT a real ASIC

    Courtesy Altera Corp.

  • ECE 260B CSE 241A Design Styles 21 http://vlsicad.ucsd.edu

    Programmable Logic Devices

    PLA, PROM, PAL

  • ECE 260B CSE 241A Design Styles 22 http://vlsicad.ucsd.edu

    EPLD Block Diagram

    Macrocell

    Courtesy Altera Corp.

    Primary inputs

  • ECE 260B CSE 241A Design Styles 23 http://vlsicad.ucsd.edu

    Field-Programmable Gate Arrays - Fuse-based

    [Figure: standard-cell-like floorplan; labels: I/O Buffers (on all four sides), Program/Test/Diagnostics, Vertical routes, Rows of logic modules, Routing channels]

  • ECE 260B CSE 241A Design Styles 24 http://vlsicad.ucsd.edu

    Interconnect

    [Figure: Programming interconnect using anti-fuses; labels: Cell, Horizontal tracks, Vertical tracks, Input/output pin, Antifuse, Programmed interconnection]

  • ECE 260B CSE 241A Design Styles 25 http://vlsicad.ucsd.edu

    Field-Programmable Gate Arrays - RAM-based

    [Figure: array of CLBs; labels: CLB, switching matrix, Horizontal routing channel, Vertical routing channel, Interconnect point]

  • ECE 260B CSE 241A Design Styles 26 http://vlsicad.ucsd.edu

    RAM-based FPGA - Basic Cell (CLB)

    [Figure: CLB block diagram with combinational logic (function generators F and G, each "any function of up to 4 variables") and storage elements; signal labels include A, B/Q1/Q2, C/Q1/Q2, D, E, Din, Clock, CE, Q1, and Q2]

    Courtesy of Xilinx
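A function generator that can realize "any function of up to 4 variables" is commonly built as a lookup table: a 16-entry memory addressed by the four inputs. The sketch below is a minimal illustration of that idea, not a model of any specific Xilinx device; `make_lut` and `lut_eval` are hypothetical helper names.

```python
def make_lut(func, n_inputs=4):
    """Precompute the truth table of func as a list of 2**n_inputs bits."""
    table = []
    for index in range(2 ** n_inputs):
        # Bit i of the table address is the value of input i.
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *inputs):
    """Evaluate the programmed function by addressing the table with the inputs."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


# "Program" two different 4-input functions into the same structure.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
and_or = make_lut(lambda a, b, c, d: (a & b) | (c & d))

print(len(xor4))                     # prints 16
print(lut_eval(xor4, 1, 0, 1, 1))    # prints 1
print(lut_eval(and_or, 1, 1, 0, 0))  # prints 1
```

Changing the function only changes the stored table contents, which is why the same cell can be reprogrammed without altering any mask layer.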

  • ECE 260B CSE 241A Design Styles 27 http://vlsicad.ucsd.edu

    RAM-based FPGA

    Xilinx XC4025

  • ECE 260B CSE 241A Design Styles 28 http://vlsicad.ucsd.edu

    High Performance Devices

    Mixture of full custom, standard cells and macros
    Full custom for special blocks: Adder (data path), etc.
    Macros for standard blocks: RAM, ROM, etc.
    Standard cells for non-critical digital blocks

  • ECE 260B CSE 241A Design Styles 29 http://vlsicad.ucsd.edu

    Global Signaling and Layout

    Global signaling and layout optimization
    Multi-Vdd
    Static power analysis
    Multi-Vth + Vdd + sizing

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 30 http://vlsicad.ucsd.edu

    Global Signaling

    Current global signaling paradigm: insert large static CMOS repeaters to reduce wire RC delay

    Impending problems:
    - Too many repeaters
      - 180nm processors: 22K repeaters (Itanium), 70K (Power4)
      - Project 1-1.5M repeaters at 45-65nm technologies
    - Too much power
      - Many large repeaters = significant static and dynamic power
    - Too much noise
      - Repeater clustering complicates power distribution
      - Inductive coupling across wide bus structures

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 31 http://vlsicad.ucsd.edu

    Cell Layout Optimization

    Advanced layout techniques must allow
    - Continuous individual device sizing
    - Variable p/n ratios
    - Tapered FET stacking sizes
    - Arbitrary Vth assignments within gates

    First cut: Cadabra, 15-22% power reduction using the first two approaches under fixed footprint constraint

    [Figure: GDSII Import, Compact fixed width; Ref: Hurat, Cadabra]

    Optimize specific instances of standard gates

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 32 http://vlsicad.ucsd.edu

    Multi-Vdd

    Global signaling and layout optimization
    Multi-Vdd
    Static power analysis
    Multi-Vth + Vdd + sizing

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 33 http://vlsicad.ucsd.edu

    Multi-Vdd Status

    Idea: Incorporate two Vdds to reduce dynamic power

    Limited to a few recent Japanese multimedia processors
    - Example: 0.3 µm, 75MHz, 3.3V media processor (Toshiba)
      - Total power savings of 47% in logic, 69% in clock
    - Dynamic voltage scaling of mobile processors
      - Transmeta Crusoe, Intel Speedstep, etc.
      - Not considered in this talk

    Very powerful technique currently applied only in low-performance designs
    - Mentality: today's high-performance parts aren't limited by power

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 34 http://vlsicad.ucsd.edu

    Lower Power Via Rich Replacement

    Media processors and other low speed designs have many non-critical paths
    - 60-70% of paths have delay less than half the clock period
    - After replacement, most paths become near critical

    What about high-speed microprocessors?

    [Figure: % of total paths vs. path delay (normalized to clock period)]

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 35 http://vlsicad.ucsd.edu

    Similar Story For High-Performance

    IBM 480 MHz PowerPC shows over 50% of paths have delay less than half the clock period
    - Implies that high-performance designs can benefit from multi-Vdd

    Ref: Akrout, JSSC98
    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 36 http://vlsicad.ucsd.edu

    Resizing Is Not The Right Answer

    Post-synthesis optimizations resize gates to recover power on non-critical paths
    - Looks similar to pre- and post-replacement figures in media processor

    [Figure: path-delay distributions before and after post-synthesis resizing]

    Ref: Sirichotiyakul, DAC99

    This is the wrong approach for nanometer design!

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 37 http://vlsicad.ucsd.edu

    Multi-Vdd Instead of Sizing

    $P \sim C\,V_{dd}^{2}\,f$, where f is fixed

    Key: Reducing gate width impacts power sub-linearly
    - Interconnect capacitance is not affected

    Reducing supply voltage cuts power quadratically
    - All capacitive loads have lower voltage swing

    How can we minimize delay penalty at low Vdd?

    D. Sylvester, DAC-2001
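The contrast between sizing and supply scaling can be checked with a few lines of arithmetic. This is a minimal sketch using the slide's proportional form of dynamic power; the capacitance split (half gate, half wire), the 2x downsizing, and the 0.7x supply are hypothetical numbers chosen only for illustration.

```python
def dynamic_power(c_gate, c_wire, vdd, f=1.0):
    """Dynamic power in the slide's proportional form: P ~ C * Vdd^2 * f."""
    return (c_gate + c_wire) * vdd ** 2 * f


# Hypothetical normalized load: half gate capacitance, half wire capacitance.
baseline = dynamic_power(c_gate=1.0, c_wire=1.0, vdd=1.0)

# Option 1: halve the gate width. Only the gate capacitance shrinks.
downsized = dynamic_power(c_gate=0.5, c_wire=1.0, vdd=1.0)

# Option 2: keep the sizing but drop the supply to 70% of nominal.
low_vdd = dynamic_power(c_gate=1.0, c_wire=1.0, vdd=0.7)

print(f"downsized / baseline = {downsized / baseline:.2f}")
print(f"low Vdd   / baseline = {low_vdd / baseline:.2f}")
```

Under these assumptions, halving the gate width leaves about 75% of the original power because the wire load is untouched, while a 30% supply reduction leaves about 49% because every capacitive load sees the lower swing.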

  • ECE 260B CSE 241A Design Styles 38 http://vlsicad.ucsd.edu

    Challenges For Multi-Vdd

    Area overhead
    - Toshiba reported 7% rise in area due to placement restrictions, level converters, additional power grid routing

    EDA tool support for the above issues (placement, dual power routing)

    Noise analysis
    - Additional shielding required between Vdd,low and Vdd,high signals?
    - Including clock network

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 39 http://vlsicad.ucsd.edu

    Static Power

    Global signaling and layout optimization
    Multi-Vdd
    Static power
    Multi-Vth + Vdd + sizing

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 40 http://vlsicad.ucsd.edu

    Static Power

    Why do we care about static power in non-portable devices?
    - Standby power is wasted -- leaves fewer Watts for computation
    - Worsens reliability by raising die temperatures

    Leakage current is a function of Vth and subthreshold swing (Ss) (x10 at operating vs. room temp!)

    $I_{off} \approx 10 \cdot 10^{-V_{th}/S_s}\ \mu\mathrm{A}/\mu\mathrm{m}$

    Ss expected to remain at 80-85 mV/dec (room temp)
    - Device technology may cut this by ~20%

    Vth reductions are mandated by scaling Vdd
    - Vth has been around Vdd/5

    D. Sylvester, DAC-2001
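To get a feel for how strongly leakage depends on Vth, the sketch below evaluates the off-current relation numerically. It assumes the slide's relation reads Ioff ≈ 10 · 10^(-Vth/Ss) µA/µm (the exponent sign and units are inferred from the garbled transcript), and the Vth values are hypothetical.

```python
def ioff_ua_per_um(vth, ss):
    """Off current in uA/um, assuming Ioff ~ 10 * 10^(-Vth/Ss); vth and ss in volts."""
    return 10.0 * 10.0 ** (-vth / ss)


SS = 0.085  # subthreshold swing in V/decade (85 mV/dec, upper end of the quoted range)

for vth in (0.40, 0.30, 0.20):  # hypothetical threshold voltages in volts
    ioff_na = ioff_ua_per_um(vth, SS) * 1e3  # convert uA/um to nA/um
    print(f"Vth = {vth:.2f} V -> Ioff ~ {ioff_na:.2f} nA/um")

# Lowering Vth by exactly one subthreshold swing raises Ioff by one decade.
ratio = ioff_ua_per_um(0.30 - SS, SS) / ioff_ua_per_um(0.30, SS)
print(f"Ioff ratio for a one-Ss reduction in Vth: {ratio:.1f}x")
```

The exponential form is why even modest Vth reductions, forced by Vdd scaling, translate into order-of-magnitude increases in standby current.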

  • ECE 260B CSE 241A Design Styles 41 http://vlsicad.ucsd.edu

    Current Status

    No sub-1V technologies demonstrate good on/off current performance (yet expect improvements in production)
    Oxide scaling is running out of steam; overall ~3x Ioff per node

    Reference    ITRS node (nm)  Tox (Å) (electrical)  Vdd (V)  Ion (µA/µm)  Ioff (nA/µm)
    ITRS 2000    50              6-8 (physical)        0.6      750          80
    ITRS 2001    45              11 (uses high-k)      0.6      1250         3000
    ITRS 2000    70              8-12 (physical)       0.9      750          40
    ITRS 2000    100             12-15 (physical)      1.2      750          13
    NEC,00       100             13 (physical)         1.0      723          16
    Intel,99     70              32                    1.2      650          3
    TI,99        100             27                    1.2      800          10
    NEC,00       70              25                    1.2      697          10
    Samsung,00   100             21                    1.2      860          10
    Intel,00     50-70           18                    0.85     514          100

    Working numbers

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 42 http://vlsicad.ucsd.edu

    Leakage Suppression Approaches

    Dual-Vth (most common)
    - Low-Vth on critical paths, high-Vth off
    - Only cost is additional masks

    MTCMOS
    - Series inserted high-Vth device cuts leakage current when off (sleep mode)
    - Delay and area penalties, control device sizing is critical

    Other techniques
    - Substrate biasing to control Vth
    - Dual-Vth domino
      - Use low-Vth devices only in evaluate paths

    [Figure: MTCMOS circuit; labels: Pull Up, Pull Down, Parasitic Node, Vcontrol, Vout, Vdd, High Vth Device]

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 43 http://vlsicad.ucsd.edu

    Can Gate-length biasing help leakage reduction?

    Reduce leakage?

    [Figure: Variation of leakage and delay (each normalized to 1) for an NMOS device in an industrial 130nm technology; x-axis: Gate-length (nm), 130 to 140; series: Leakage, Delay]

    Reduce leakage variability?

    [Figure: two Leakage Variability plots of leakage vs. gate-length, illustrating the effect of biasing]

  • ECE 260B CSE 241A Design Styles 44 http://vlsicad.ucsd.edu

    Gate-length Biasing

    First proposed by Sirisantana et al.
    - Comparative study of effect of doping, tox and gate-length
    - Large bias used, significant slow down

    Small bias
    - Little reduction in leakage beyond 10% bias while delay degrades linearly
    - Preserves pin compatibility; technique applicable as post-RET step

    Salient features
    - Design cycle is not interfered with
    - Zero cost (no additional masks)

  • ECE 260B CSE 241A Design Styles 45 http://vlsicad.ucsd.edu

    Granularity

    Technology-level
    - All devices in all cells have one biased gate-length

    Cell-level
    - All devices in a cell have one biased gate-length

    Device-level
    - All devices have independent biased gate-length
    - Simplification: In each cell, NMOS devices have one gate-length and PMOS devices have another

  • ECE 260B CSE 241A Design Styles 46 http://vlsicad.ucsd.edu

    Device-Level Leakage Reduction

    [Figure: Leakage saving with a delay penalty of up to 10% (simplified device-level biasing) for cells INVX4, NANDX4, BUFX4, ANDX6; series: Low Vt, Nom Vt, High Vt; y-axis ticks 0 to 40]

  • ECE 260B CSE 241A Design Styles 47 http://vlsicad.ucsd.edu

    Circuit level

    Bias gate-length for non-critical cells
    Library extended with each cell having a biased version
    Benefits analyzed in conjunction with Multi-VT assignment and in isolation
    - SVT-SGL
    - DVT-SGL
    - SVT-DGL
    - DVT-DGL

  • ECE 260B CSE 241A Design Styles 48 http://vlsicad.ucsd.edu

    Results: Leakage Reduction

    [Figure: Normalized Leakage (0 to 1) for testcases c5315, c6288, c7552, alu128; series: SVT-SGL, SVT-DGL, DVT-SGL, DVT-DGL]

    With less than 2.5% delay penalty

    Design Compiler used for VT assignment and gate-length biasing
    Better results expected with Duet (academic sizer from Michigan)

  • ECE 260B CSE 241A Design Styles 49 http://vlsicad.ucsd.edu

    Results: Leakage Variability

    Leakage distribution for the testcase alu128
    Traces shown: Unbiased circuit, Technology level biasing, Uniform biasing

    [Figure: Percentage Reduction in Leakage Spread (0% to 60%) for c5315, c6288, c7552, alu128]

  • ECE 260B CSE 241A Design Styles 50 http://vlsicad.ucsd.edu

    Futures

    Construction of effective biasing based leakage optimization heuristics
    Gate-length selection at true device-level granularity
    Evaluation of gate-length biasing at future technology nodes

  • ECE 260B CSE 241A Design Styles 51 http://vlsicad.ucsd.edu

    Multi-Vth + Vdd + Sizing

    Global signaling and layout optimization
    Multi-Vdd
    Static power analysis
    Multi-Vth + Vdd + sizing

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 52 http://vlsicad.ucsd.edu

    Multi-Everything

    Need an approach that selects between speed, static power, and dynamic power

    Should be scalable to nanometer design
    - Rules out dual-Vth domino or other dynamic logic families (low supplies kill performance advantages)

    Techniques mentioned so far
    - Flexible, optimized cell layouts
    - Multi-Vdd
    - Dual-Vth

    Put them all together

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 53 http://vlsicad.ucsd.edu

    Multi-Vdd Can Leverage Vths

    Existing designs using multi-Vdd do not alter Vth in low-Vdd cells
    - Highly sub-optimal, delay is fully penalized
    - Limits cell replacement, which limits power savings

    Much better solution: reduce Vth in low-Vdd cells to carefully balance delay, static power, and dynamic power
    - Enforce technology scaling within a chip: whenever we reduce Vdd, we also reduce Vth to maintain speed

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 54 http://vlsicad.ucsd.edu

    Multi-Vdd + Vth Negates Delay Penalty

    $\mathrm{Delay} \sim C\,V_{dd} / I_{on}$

    Scenarios
    - Constant Vth (current paradigm)
    - Scale Vth to maintain constant static power
    - Scale Vth to reduce static power linearly with Vdd

    Delay penalty is substantially offset
    Ion is very sensitive to Vth at Vdd < 1V

    Pstatic reduces with Vdd due to linear term and smaller Ioff (Ion and DIBL)

    D. Sylvester, DAC-2001
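The delay relation Delay ~ C Vdd / Ion can be explored numerically once a model for Ion is chosen. The sketch below assumes the alpha-power law, Ion ∝ (Vdd - Vth)^α with α = 1.3, which is a common approximation and not something the slides specify. The supply and threshold values are hypothetical, and scaling Vth in proportion to Vdd is a simplification that differs from the slide's constant-static-power scenarios.

```python
def relative_delay(vdd, vth, alpha=1.3):
    """Delay ~ C*Vdd/Ion with C held constant and Ion ~ (Vdd - Vth)^alpha (assumed model)."""
    ion = (vdd - vth) ** alpha
    return vdd / ion


VDD_HIGH, VDD_LOW = 1.2, 0.8  # hypothetical supplies in volts
VTH_NOM = 0.3                 # hypothetical nominal threshold in volts

base = relative_delay(VDD_HIGH, VTH_NOM)

# Scenario 1: move the gate to Vdd,low and keep Vth unchanged.
const_vth = relative_delay(VDD_LOW, VTH_NOM) / base

# Scenario 2: also lower Vth in proportion to Vdd (illustrative scaling rule).
vth_scaled = VTH_NOM * VDD_LOW / VDD_HIGH
scaled = relative_delay(VDD_LOW, vth_scaled) / base

print(f"delay at Vdd,low, constant Vth: {const_vth:.2f}x")
print(f"delay at Vdd,low, scaled Vth:   {scaled:.2f}x")
```

Under these assumptions the low-Vdd delay penalty shrinks noticeably when Vth is lowered along with Vdd, at the cost of higher leakage in those cells.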

  • ECE 260B CSE 241A Design Styles 55 http://vlsicad.ucsd.edu

    Now Add Sizing

    Multi-Vdd + multi-Vth + sizing/cell layout optimization attacks power from many angles (multi-dimensional)

    Depending on criticality and switching activities, non-critical gates can be:
    - Assigned Vdd,low
    - Assigned Vdd,low + lower Vth
    - Assigned Vth,high
    - Downsized (at the individual transistor level if advantageous)
    - Assigned Vdd,low and upsized
      - For gates that cannot tolerate Vdd,low delay, this can be power efficient
    - And others

    D. Sylvester, DAC-2001
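One way to picture the choice among these options is a per-gate selection: pick the lowest-power variant whose delay still fits within the gate's slack. The sketch below is a toy illustration only. Every delay and power multiplier in `OPTIONS` is made up, `pick_option` is a hypothetical helper, and a real optimizer would have to account for path-level timing, level converters, and switching activity.

```python
# (name, relative delay, relative power) -- all numbers are invented for illustration
OPTIONS = [
    ("baseline",            1.00, 1.00),
    ("Vth,high",            1.15, 0.85),
    ("downsized",           1.20, 0.80),
    ("Vdd,low + lower Vth", 1.10, 0.62),
    ("Vdd,low + upsized",   1.20, 0.58),
    ("Vdd,low",             1.40, 0.50),
]


def pick_option(gate_delay, slack):
    """Return the lowest-power option whose delay fits in gate_delay + slack."""
    budget = gate_delay + slack
    feasible = [opt for opt in OPTIONS if gate_delay * opt[1] <= budget]
    return min(feasible, key=lambda opt: opt[2])


for slack in (0, 12, 25, 50):  # slack in the same (arbitrary) units as gate_delay
    name, _, power = pick_option(gate_delay=100, slack=slack)
    print(f"slack {slack:>2}: {name} (relative power {power:.2f})")
```

With these invented numbers, a gate with no slack stays at baseline, a little slack admits Vdd,low with a lower Vth, and generous slack allows plain Vdd,low.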

  • ECE 260B CSE 241A Design Styles 56 http://vlsicad.ucsd.edu

    Summary

    Power density must saturate to maintain affordable packaging options
    - 50 W/cm2 means 200-250W for future large MPUs
    - Dynamic thermal management saves 25% on packaging power budget

    Multi-Vdd will leverage multiple Vths to offset delay penalty at low Vdd
    - More widespread re-assignment to Vdd,low
    - Use Vdd first instead of re-sizing to take advantage of large path slacks
    - Anticipated power savings of 50-80%

    Static power also addressed through multi-Vth + Vdd + sizing
    - Vth difficult to control in ultra-short channels
    - Intra-cell Vth assignment + MTCMOS/variants + sleep modes

    D. Sylvester, DAC-2001

  • ECE 260B CSE 241A Design Styles 57 http://vlsicad.ucsd.edu

    Next Week: Project Meetings

    D. Sylvester, DAC-2001