lecture 5: state-of-the-art design practice: case...

Post on 22-May-2020


© T. Kuroda (1/99)

Lecture 5: State-of-the-art Design Practice: Case Study

Tadahiro Kuroda
Visiting MacKay Professor
Department of EECS
University of California, Berkeley
tadahiro@eecs.berkeley.edu, kuroda@elec.keio.ac.jp
http://bwrc.eecs.berkeley.edu/Classes/ee290c_s07
http://www.kuroda.elec.keio.ac.jp/

EE290c Spring 2007, Tues & Thurs 9:30-11:00, 212 Cory UCB

With materials from T. Hattori, A. Inoue, M. Sumita, M. Hamada

© T. Kuroda (2/99)

1. Application Processor (SH-mobile) : Renesas Technology

2. Embedded Processor : Fujitsu

3. Digital Consumer : Panasonic

4. Wireless SoC (Bluetooth) : Toshiba

Case Studies

© T. Kuroda (3/99)

Case Study 1. Application Processor (SH-mobile) : Renesas Technology

Courtesy T. Hattori, Renesas Technology

© T. Kuroda (4/99)

Embedded Processor Power Analysis

• Dynamic power is dominant in LP-process
• FF power & Clock power

[Figure: Active power = leakage + dynamic, compared for Generic and Low-power processes. Benchmark: Matrix mul., Tj = 85 °C; Benchmark: Dhrystone 2.1, Tj = 45 °C]

[Figure: Dynamic power component ratio for 130-nm CPU core. Components: Logic, SRAM, FF, Clock. Shares shown: 40%, 25%, 20%, 15%]

© T. Kuroda (5/99)

Leakage Power Reduction by Body Bias

Standby Status of Processors
• Waiting for response
• Dominant in real application (wait for key-in, etc.)
• No relation with operation speed

Higher Vth only in standby mode -> Back biasing on substrates
• Stable substrate voltage in operation
• Efficient layout topology required

© T. Kuroda (6/99)

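Block Diagram of Back Biasing

[Figure: 1.8-V cells between switch cells, with VDD, VSS, Vbp, Vbn, Cbp and Cbn lines; dimension marked: 200 µm. Bias levels shown: 3.3/1.8 V and -1.5/0 V. Standby control, bias control and the real-time clock are drawn with the 3.3-V region and 1.8-V region. Supply labels: 3.3 V, 1.8 V, 0 V]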

© T. Kuroda (7/99)

Switch Cell Layout

[Figure: layout of a switch cell next to a std cell, each with pMOS and nMOS regions. M1 rails: vdd, vbp, vss, vbn. M2 lines: vbp, vbn, cbn, cbp, vss, vdd]

© T. Kuroda (8/99)

Back Biasing Layout Topology

[Figure: array of 1.8-V logic blocks built from std cells, with switch cells; dimension marked: ~200 µm. Lines shown: vdd, vss, vbp, vbn, cbp, cbn, routed on M1, M2, M4 and M5]

© T. Kuroda (9/99)

Power Reduction by Back Biasing

[Figure: leakage current (µA, axis 0 to 100) with substrate control off and on. Data labels: 46.5 and 1300]

*1 : sub threshold leak (1.8V area)
*2 : pn junction leak (1.8V area)
*3 : 3.3V area leak

© T. Kuroda (10/99)

VBC (0.13mm2)

Implemented SOC

0.25µm CMOS

5 layers

1.8V (internal)

3.3V (I/O)

3.3M Tr.

© T. Kuroda (11/99)

Path Delay Distribution

[Figure: number of paths (K, 0 to 20) versus delay (1.0 to 2.5 nsec) for Single Vth Cells (High) and Dual Vth Cells (High & Low). Annotations: 400MHz, 450MHz, 12.5%]

© T. Kuroda (12/99)

Dual Vth Cell Optimization

[Figure: path delay distribution (number of paths versus path delay [a. u.], 0 to 1) and the corresponding circuit of FFs with Low-Vth and High-Vth cells, for three cases: (a) All low-Vth cells, (b) Binary modulation, (c) Gradated modulation]

© T. Kuroda (13/99)

Implemented SOC

Process Technology: 0.13µm Dual Vth Low-Power CMOS, 5 Metal Layers (Cu)
Chip size: 58.5 mm2 (7.7mm x 7.6mm)

[Figure: chip micrograph. Blocks: Processor Core, URAM, 3D Graphics Engine, MPEG-4, VIF, LCDC]

© T. Kuroda (14/99)

Power Management

Leakage current is increasing in sub-micron processes
Final solution = Power off unused circuits

New technical issues for power off/on management
• Short recovery time is required
• Unstable output signals from a power-off region may cause short-circuit current
• SDRAM setting parameters etc. should be kept
• Leakage current in switch cells may be a problem
• Rush current at power-on may cause noise

• Power switch cell, µIO cell
• Data retention low-leak SRAM
• Trade-off between LP mode and recovery time
• Hierarchical multi-power domain management

© T. Kuroda (15/99)

Partial Power off Inside SOC

[Figure: Processor Core, Multimedia IPs, etc. between VDD and GND with power switches PSW1 and PSW2; URAM; µI/O with backup latch and register, and a "µI/O open" signal; PSC with INTC; PSWC1 and PSWC2. Legend: Isolation Cell, Power Controller, Power Switch Controller, Power Switch]

© T. Kuroda (16/99)

Low Leakage SRAM with Data Retention

SRAM module has self-controlled power switches

Active mode
• Vssm, Vssa, Vssc = Vss (always)
• Vddw = Vdd (access cycle only)

Data retention mode
• Vssa, Vssc: cut off
• Vddw: ~0.4 V down
• Vssm: ~0.4 V up

[Figure: Power SW Ctrl. with Memory Cell, WD Drv., ctrl. and Sense Amp. blocks. Supply nodes: Vdd, Vddw, Vss, Vssc, Vssa, Vssm. Caption: leakage reduction with data retention]

© T. Kuroda (17/99)

Power Reduction in SRAM

Leakage current, 256-kB, Room Temp., VDD = 1.2 V

conventional            920 µA
proposed (active)       700 µA   -25%
proposed (retention)     50 µA   -95%

[Figure: bar chart (0 to 1000 µA) with contributions from Memory cell, Word driver and Amp]
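The percentages on this slide can be re-derived from the three current values. The sketch below is only a consistency check; the dictionary and function names are illustrative and not part of the original material.

```python
# Leakage currents read from the slide (256-kB SRAM, room temp., VDD = 1.2 V).
LEAKAGE_UA = {
    "conventional": 920.0,
    "proposed (active)": 700.0,
    "proposed (retention)": 50.0,
}


def reduction_percent(baseline_ua: float, value_ua: float) -> float:
    """Relative reduction of value_ua with respect to baseline_ua, in percent."""
    return 100.0 * (baseline_ua - value_ua) / baseline_ua


baseline = LEAKAGE_UA["conventional"]
for name, value in LEAKAGE_UA.items():
    pct = reduction_percent(baseline, value)
    print(f"{name:22s} {value:6.0f} uA  reduction {pct:5.1f}%")
```

The active-mode figure comes out slightly under 24%, which the slide rounds to -25%; the retention-mode figure rounds to -95%.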

© T. Kuroda (18/99)

R-standby & U-standby

Two standby modes are provided using power switches

R-standby
• Low leakage (~100µA)
• Short recovery time (<3ms)

U-standby
• Ultra low leakage (~10µA)

[Figure: for each mode, Core (1.2 V) with CPU core, IP & Peri., URAM and Bkup. Reg.; I/O (2.85 V) with I/O and I/O ctrl.; PSC; power switches PSW1 and PSW2]

© T. Kuroda (19/99)

R-standby Recovery Operation

Hardware operations
• Power switch control (by PSC)
• Clock generation (PLL, DLL lock)
• Data backup using backup latch in µI/O circuit

Software operations
• BAR (Boot Address Register) holds restart address
• Clock and interrupt setting needed just after wake-up
• URAM : data backup mem. (control registers, OS task table, etc.)

[Figure: backup latch (BAR, etc.) in µI/O, URAM, and power switches PSW1 and PSW2 between VDD and GND]

© T. Kuroda (20/99)

Recovery Time from R-standby

Recovery is triggered by Interrupt

• Interrupt
• Hardware Recovery: PSW on, PLL & DLL lock (Backup latch)
• Software Recovery: Start from BAR address, Restore from URAM
• Interrupt Handler

[Figure: timeline over 0 to 3 ms with phases PSW On, PLL & DLL Lock-in, State Transition, Restore Reg., Restart Tasks. Recovery Time: 2.8 ms]

© T. Kuroda (21/99)

Implementation in SOC

Process Technology: 0.13µm Dual Vth Low-Power CMOS, 5 Metal Layers (Cu)
Chip size: 58.5 mm2 (7.7mm x 7.6mm)

• PSW1 area = 0.10 mm2
• PSW2 area = 0.03 mm2

Only 0.2% of chip size

[Figure: chip micrograph. Blocks: Processor Core, URAM, PSC, PSW1, PSW2, 3D Graphics Engine, MPEG-4, VIF, LCDC]
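The "0.2% of chip size" figure follows directly from the two switch areas and the chip size. A minimal check, with illustrative variable names:

```python
# Areas taken from the slide, in mm^2.
PSW1_AREA_MM2 = 0.10
PSW2_AREA_MM2 = 0.03
CHIP_AREA_MM2 = 58.5


def area_overhead_percent(switch_areas_mm2, chip_area_mm2: float) -> float:
    """Total power-switch area as a percentage of the chip area."""
    return 100.0 * sum(switch_areas_mm2) / chip_area_mm2


overhead = area_overhead_percent([PSW1_AREA_MM2, PSW2_AREA_MM2], CHIP_AREA_MM2)
print(f"power switch overhead: {overhead:.2f}% of chip area")
```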

© T. Kuroda (22/99)

Hierarchical Multi Power Domains

One chip integration of existing multi-chips
• Divide into power domains

Partial power off -> reduce leakage current
• Isolation cell insertion, complex control -> Hierarchical power domain management

Chip level parts for special purpose
• Repeater, clock distribution, back-up registers -> common power domain

Layout topology for power switch and power line
Usage scene in mobile phone

© T. Kuroda (23/99)

Chip Floorplan / Power Domains

Power Domains
• 20 power domains for partial power-off
• CPD (Common power domain) for repeater etc.
• B* domains for BB, A* domains for AP

[Figure: chip floorplan (GSM, W-CDMA, BB-CPU, APL-RT CPU, AP-SYS CPU, BB-Misc, AP-Misc, Media RAM, 3D G, MPEG, Camera, Sound, JPEG) beside the power-domain map (labels include CPD, A1R, A4U1, A1A, A2, C5, BW2, BG1, BA2)]

© T. Kuroda (24/99)

Common Power-Domain Implementation

CPD: Common Power Domain

[Figure: three uses of the CPD: repeaters, backup latches beside the original FF, and the global clock buffer. Labels: Sig1, CTL, µIO, Repeater, DCK1, QCTL, Backup latch, Original FF, PLL, Global Clock, Local Clock, Global Clock Buffer]

© T. Kuroda (25/99)

Power Switch and Power Line Implementation

[Figure: Power switch (PSW) implementation. PSW (thick Tox MOS) and logic (thin Tox MOS); domains PDe, PDw, PDx and CPD between VDD and VSS, with VSSM_PDe and VSSM_PDw; PSW for PDx, PSW for CPD. Annotation: 1/4000 leakage @ 1MG]

[Figure: Power line implementation. Global and local lines: VDD, VSS, VSSM_PDe, VSSM_PDw, VSSM_CPD; domains PDe and PDw; power switch controllers]

© T. Kuroda (26/99)

Leakage Current in Usage Scenes (1)

Video telephony
Measured Leakage Current (@ Room Temp, 1.2V): 849 µA

Baseband part: Control, W-CDMA, GSM
Application part: System-domain, Realtime-domain
States listed: ON, ON, ON / OFF, ON, ON

[Figure: power-domain map with each domain marked Power on or Power off]

© T. Kuroda (27/99)

Leakage Current in Usage Scenes (2)

Telephony (W-CDMA)
Measured Leakage Current (@ Room Temp, 1.2V): 407 µA

Baseband part: Control, W-CDMA, GSM
Application part: System-domain, Realtime-domain
States listed: ON, ON, ON / OFF, ON, OFF

[Figure: power-domain map with each domain marked Power on or Power off]

© T. Kuroda (28/99)

Leakage Current in Usage Scenes (3)

Waiting for Calling
Measured Leakage Current (@ Room Temp, 1.2V): 299 µA

Baseband part: Control, W-CDMA, GSM
Application part: System-domain, Realtime-domain
States listed: ON, OFF, OFF, OFF, OFF

[Figure: power-domain map with each domain marked Power on or Power off]

© T. Kuroda (29/99)

Leakage Current in Usage Scenes (4)

Power off (I/O fixed)
Measured Leakage Current (@ Room Temp, 1.2V): 7 µA

Baseband part: Control, W-CDMA, GSM
Application part: System-domain, Realtime-domain
States listed: OFF, OFF, OFF, OFF, OFF

[Figure: power-domain map with each domain marked Power on or Power off]
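The four usage scenes can be put side by side to see how much leakage partial power-off removes. This is a small sketch using only the measured values quoted on the four slides; the names are illustrative.

```python
# Measured leakage per usage scene (room temp., 1.2 V), in microamperes.
SCENE_LEAKAGE_UA = {
    "Video telephony": 849.0,
    "Telephony (W-CDMA)": 407.0,
    "Waiting for calling": 299.0,
    "Power off (I/O fixed)": 7.0,
}


def relative_leakage(scene: str, reference: str = "Video telephony") -> float:
    """Leakage of a scene as a fraction of the reference scene."""
    return SCENE_LEAKAGE_UA[scene] / SCENE_LEAKAGE_UA[reference]


for scene, leak in SCENE_LEAKAGE_UA.items():
    print(f"{scene:24s} {leak:6.0f} uA  ({100 * relative_leakage(scene):5.1f}% of video telephony)")
```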

© T. Kuroda (30/99)

EDA Supports

• Power-off gate-level simulation
  - Set X to all FFs in power-off domain
• Transistor level leakage path checker
  - Check leakage path through well connect etc.
• µIO (isolation cell) insertion tool
• Inter-power-domain isolation checker
• DFT insertion considering power domains
• IR drop calculation considering power switch

© T. Kuroda (31/99)

Implemented SOC

Die size                 11.15mm x 11.15mm
# of TRs, gate, memory   181M TRs, 13.5M Gate, 20.2 Mbit mem
Process                  90nm LP 8M (7Cu+1Al) CMOS dual-Vth
Supply voltage           1.2V (internal), 1.8/2.5/3.3V (I/O)

[Figure: chip micrograph. Blocks: GSM, W-CDMA, BB-CPU, APL-RT CPU, AP-SYS CPU, BB-Misc, AP-Misc, Media RAM, 3D G, MPEG, Camera, Sound, CPG, LCDC, JPEG, DDR, SRAM]

© T. Kuroda (32/99)

Summary (1)

Three Key Low Power Techniques
• Application specific low power
• Software for low power
• Leakage reduction by power switch

Why not Dynamic Voltage Frequency Scaling?
• Real time response, power management software difficulty, SRAM Vdd-min, level-shifter overhead, etc.

Request for EDA
• System level power estimation
• Power switch verification

© T. Kuroda (33/99)

Summary (2)

[Figure: evolution of low-power techniques across ISSCC98 (0.25um), ISSCC02 (0.18um), ISSCC04 (0.13um) and ISSCC06 (0.09um), grouped by state: @ Clock stop, @ Power-off, @ data retention, @ active. Techniques named: Clock Gear, Standby, Back-bias, Dual Vth, µIO, On chip SRAM, U-standby (Logic off), R-standby (Data retention), Hierarchical Power domain, Pointer pipeline, Activation Control, Physical-synthesis + Dual Vth, Fine clock gating, Specific bus, Multi-media IP. The chip micrograph of the implemented SOC is shown alongside]

© T. Kuroda (34/99)

Case Study 2. Embedded Processor : Fujitsu

Courtesy A. Inoue, Fujitsu Laboratories Ltd.

© T. Kuroda (35/99)

VDD/VTH Control Scheme

Initial Proposal
• 1994 Supply Voltage Control, B. Brodersen et al.
• 1994 Body Bias Control, T. Sakurai et al.

Apply to General Processors
• 2000 Intel SpeedStep™, Mobile PIII
• 2000 AMD Power Now!™, Mobile AMD-K6-2
• 2000 Transmeta Longrun(2)™, Crusoe

Expand to Embedded Processors
• ARM IEM™
• Texas Instruments Smartreflex™
• Freescale Smart Speed™
• NEC UltimateLowPower™
• Fujitsu CoolAdjust™

© T. Kuroda (36/99)

Power Reduction by VDD Control

Static Control (ASV)
• Adjust supply voltage appropriate for each die
• Process compensation
• Temperature compensation

Dynamic Control (DVFS)
• Change supply voltage and/or body voltage during operation
• Power and performance trade off
• Tuning performance

[Figure: power efficiency (MIPS/W) versus performance (MIPS), showing the technology trend and a performance tunable LSI under Static Control and Dynamic Control]

© T. Kuroda (37/99)

Static (ASV) vs. Dynamic (DVFS)

                              | Static (ASV)                          | Dynamic (DVFS)
Power reduction               | Depend on process variation           | Depend on applications
Target                        | General SoCs including ASIC and FPGA  | Processors
Implementation                | Possible only with hardware           | Require hardware and software collaboration
Power estimation              | Easy                                  | Complex (Need software to evaluate)
Design verification testing   | Easy                                  | Not so easy

Note: ASV (Adaptive Supply Voltage) and DVFS (Dynamic Voltage Frequency Scaling) can co-exist on the same die.

© T. Kuroda (38/99)

FSV and ASV

[Figure: power (Leakage and Active) and speed versus process variation (Slow to Fast), for two schemes. Fixed Supply Voltage (FSV): supply voltage fixed, path delay variable. Adaptive Supply Voltage (ASV): path delay fixed, supply voltage variable. The worst case is marked in each panel]

© T. Kuroda (39/99)

Required Components for ASV System

[Figure: User logic with a Monitor, Control & I/F, and DC-DC Converter. Steps labelled: Monitor process/temp. condition; Inform necessary voltage; Convert to necessary supply voltage; Supply appropriate voltage]

© T. Kuroda (40/99)

Conventional Monitoring Method

[Figure: replica circuit of critical path between two FFs (In, Out), clocked by Ref. clock]

[Figure: path delay (0 to target) versus supply voltage for Slow and Fast process, with Fixed and ASV operating points]

© T. Kuroda (41/99)

Limit of Replica Circuit Monitor

Multi-Vth CMOS
• Path-A (Low-Vth 100%)
• Path-B (Med.-Vth 50%)

[Figure: path delay (0 to target) versus supply voltage for Path-A and Path-B at Slow and Fast process, with operating points Fixed, ASV1 and ASV2; one case is marked OK and the other NG]

Critical path changes!
Many replica circuits are required.

© T. Kuroda (42/99)

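Impact on Critical Path Delay by Random Variation

Simple critical path model: M critical paths, each of N stages, each stage with delay variation σ.

\[ \sigma_N = \sqrt{N}\,\sigma \]

\[ f(d, M) = {}_{M}C_{1}\left[\int_{-\infty}^{d}\frac{1}{\sqrt{2\pi}\,\sigma_N}\exp\left\{-\frac{(t-\mu_N)^2}{2\sigma_N^2}\right\}dt\right]^{M-1}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_N}\exp\left\{-\frac{(d-\mu_N)^2}{2\sigma_N^2}\right\} \]

M: Number of Critical Path

[Figure: probability density (0 to 1.6) of critical path delay from µ-4σ to µ+4σ, for M = 1, 10, 100, 1000, 10000]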
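The distribution on this slide is that of the slowest of M independent paths whose delays are each Gaussian. The sketch below evaluates that density numerically and shows how the mean of the worst path moves away from µ as M grows. It assumes independent, identically distributed paths, normalizes to µ = 0 and σ = 1, and uses illustrative function names; the integration range and step count are arbitrary choices.

```python
import math


def normal_pdf(d: float, mu: float, sigma: float) -> float:
    z = (d - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(d: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((d - mu) / (sigma * math.sqrt(2.0))))


def max_delay_pdf(d: float, m: int, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the slowest of m independent Gaussian path delays."""
    return m * normal_cdf(d, mu, sigma) ** (m - 1) * normal_pdf(d, mu, sigma)


def mean_of_max(m: int, mu: float = 0.0, sigma: float = 1.0, steps: int = 20000) -> float:
    """Midpoint-rule integral of d * f(d, m) over mu +/- 10 sigma."""
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        d = lo + (i + 0.5) * h
        total += d * max_delay_pdf(d, m, mu, sigma)
    return total * h


for m in (1, 10, 100, 1000, 10000):
    print(f"M = {m:5d}: mean worst-path delay = mu + {mean_of_max(m):.2f} sigma")
```

With more critical paths the expected worst-case delay shifts several σ above µ, which is why a single replica path becomes a poor predictor.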

© T. Kuroda (43/99)

Monitoring Replica Delay with Fixed Delay Margin

[Figure: delay distribution for a single critical path and for M critical paths, with the measured delay d_meas and the fixed delay margin aσ_N giving d_meas + aσ_N]

[Figure: probability of monitoring failure (10^-8 to 10^0) versus number of critical paths (1 to 10000), for a = 0, 1, 3, 4, 5, 6, 7, 8]

© T. Kuroda (44/99)

Proposed Method for ASV

Conventional
• Many replica circuits (FF to FF, In/Out, clocked by Ref. clock)

Proposed
• Process sensor (Low-Vth, Medium-Vth) -> P-CODE -> P-V table -> V-CODE
• Process sensor: dependent on technology
• P-V table: dependent on design (STA results)

© T. Kuroda (45/99)

Implementation of Conversion Table

• Easy to implement into SOCs
• No connection with user logic
• Only P-V table depends on user logic

[Figure: Process sensor block (Ring Osc. for low-Vth cells, Ring Osc. for medium-Vth cells) -> F-CODE -> F-P table -> P-CODE -> P-V table -> V-CODE -> DC-DC converter -> Power-supply of User logic. Static voltage control (SVC) block with Control module; inputs: Clock, Reset, Clock mode, Temperature mode]
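The conversion chain on this slide (ring-oscillator frequency code, then process code, then voltage code) is a pair of table lookups. The sketch below illustrates the idea only: every threshold and table entry is a made-up placeholder, the five-level P-CODE is an assumption, and the function names are hypothetical. Only the 1.10 V to 1.30 V range is taken from the slides.

```python
from bisect import bisect_right

# Hypothetical F-P table: ring-oscillator count thresholds that split the
# F-CODE range into five process codes (0 = slowest ... 4 = fastest).
F_P_THRESHOLDS = [900, 950, 1050, 1100]

# Hypothetical P-V table indexed [med-Vth P-CODE][low-Vth P-CODE], in volts.
# In the method on the slide these entries would come from STA results.
P_V_TABLE = [
    [1.30, 1.30, 1.25, 1.25, 1.25],
    [1.30, 1.25, 1.25, 1.20, 1.20],
    [1.25, 1.25, 1.20, 1.20, 1.15],
    [1.25, 1.20, 1.20, 1.15, 1.15],
    [1.25, 1.20, 1.15, 1.15, 1.10],
]


def f_to_p(f_code: int) -> int:
    """Map a ring-oscillator frequency code to a process code."""
    return bisect_right(F_P_THRESHOLDS, f_code)


def select_voltage(f_code_low_vth: int, f_code_med_vth: int) -> float:
    """Look up the supply voltage for a pair of ring-oscillator readings."""
    return P_V_TABLE[f_to_p(f_code_med_vth)][f_to_p(f_code_low_vth)]


print(select_voltage(800, 800))    # slow/slow die -> highest supply
print(select_voltage(1200, 1200))  # fast/fast die -> lowest supply
```

Because only `P_V_TABLE` depends on the user logic, the sensor and lookup logic could be reused across designs, which matches the portability point made on the slide.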

© T. Kuroda (46/99)

Comparison of Two Methods

Replica circuit monitor
• Disadvantage: Need to extract critical path for each design
• Disadvantage: Need to consider process variation carefully for each design

Conversion table
• Advantage: Easy to adopt multi-Vth circuits
• Advantage: Process variation can be considered by STA
• Advantage: It is easily implemented in general design flow
• Disadvantage: Necessary to run multiple STA to create table

© T. Kuroda (47/99)

Design Flow for Creating P-V Table

Final layout -> STA execution -> Timing report -> P-V table calculation -> P-V table

27 conditions
• 9 process corners
• 3 supply voltages

Temporary table -> Replace -> Final table

[Figure: P-V table as a V-CODE array indexed by P-CODE (Low-Vth) and P-CODE (Med.-Vth)]
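The 27 STA conditions are the product of 9 process corners and 3 supply voltages. The sketch below enumerates them under two assumptions that the slide does not state: that the 9 corners are the 3 x 3 combinations of slow/typ/fast for the low-Vth and medium-Vth devices, and that the three supplies are 1.10 V, 1.20 V and 1.30 V.

```python
from itertools import product

# Assumed corner levels for each transistor type (3 x 3 = 9 process corners).
CORNER_LEVELS = ("slow", "typ", "fast")

# Assumed supply voltages for the three STA runs per corner.
SUPPLY_VOLTAGES = (1.10, 1.20, 1.30)

conditions = [
    {"low_vth": low, "med_vth": med, "vdd": vdd}
    for low, med, vdd in product(CORNER_LEVELS, CORNER_LEVELS, SUPPLY_VOLTAGES)
]

print(len(conditions), "STA conditions")
for cond in conditions[:3]:
    print(cond)
```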

© T. Kuroda (48/99)

Multi Mode/Multi Corner Analysis Runtime

Synthesis, optimization and analysis tools that support multi corner/multi mode (MCMM) are required to reduce design time.

Design #   # of Modes   # of Corners
1          1            1
2          1            1
3          2            1
4          3            1
5          5            1
6          5            3

[Figure: Analysis runtime example with MCMM STA. Runtime (min, 0 to 300) for designs 1 to 6, MCMM STA versus Conventional STA]

© T. Kuroda (49/99)

Embedded Processor Application

User logic            Dual-core embedded microprocessor
Operation frequency   440MHz
Transistors           Logic: 11.5 million (Low-Vth: 55%, Medium-Vth: 45%, High-Vth: 0.04%)
                      RAM: 28.3 million
                      Total: 39.8 million
Process technology    90nm triple-Vth CMOS
Supply voltage        1.1V-1.3V (ASV)
Chip size             7.2mm x 7.2mm (ASV module: 0.009%)

[Figure: chip micrograph. Blocks: Core0, Core1, SVC block, Process sensor0]

© T. Kuroda (50/99)

STA Result (P-V table)

Relation between supply voltage and process variation, Tj = 125 °C, 440 MHz

[Figure: map of supply voltage (1.10 V, 1.15 V, 1.20 V, 1.25 V, 1.30 V) over the process case of low-Vth transistors (-3σ Slow, 0 Typ., 3σ Fast) and the process case of med.-Vth transistors (-3σ Slow, 0 Typ., 3σ Fast)]

© T. Kuroda (51/99)

Process Variation of Measured Samples

Measurement results of 26 samples by process sensors

[Figure: samples plotted by process case of low-Vth transistors versus process case of med.-Vth transistors (each -3σ Slow, 0 Typ., 3σ Fast), with the line "Process variation Med.-Vth = Low-Vth"]

© T. Kuroda (52/99)

Power Reduction Result with Proposed ASV

Three MPEG2 MP@ML stream decoding at 440MHz

[Figure: power consumption (W, 0.80 to 1.60) versus process case of low-Vth transistors (-1.5σ Slow, 0 Typ., 3σ Fast), for Fixed 1.3V and This work. Power labels: 1.39W, 1.34W, 1.30W, 1.10W, 0.93W. Supply labels: 1.30V, 1.25V, 1.20V, 1.15V, 1.10V. Reductions marked: -18% and -33%]
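The two reduction figures can be checked against the power labels. The pairing used below (1.34 W against 1.10 W, and 1.39 W against 0.93 W) is an assumption inferred from the percentages, because the extracted text does not preserve which label belongs to which bar.

```python
def reduction_percent(fixed_w: float, asv_w: float) -> float:
    """Power saving of ASV relative to the fixed-supply case, in percent."""
    return 100.0 * (fixed_w - asv_w) / fixed_w


# Assumed (fixed 1.3 V, ASV) pairs; see the note above about the pairing.
ASSUMED_PAIRS_W = [(1.34, 1.10), (1.39, 0.93)]

for fixed_w, asv_w in ASSUMED_PAIRS_W:
    print(f"{fixed_w:.2f} W -> {asv_w:.2f} W: -{reduction_percent(fixed_w, asv_w):.0f}%")
```

Under that pairing the results round to 18% and 33%, matching the two annotations on the slide.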

© T. Kuroda (53/99)

Summary

• Technology trend & supply voltage adjustment technique
• Combination of ring osc. and P-V table
  - Practical method for ASV implementation
  - Good design portability applicable to ASICs
  - Fast MCMM Synthesis/Optimization/Timing Analysis are required.
• Application example of dual core embedded processor

© T. Kuroda (54/99)

Case Study 3. Digital Consumer : Panasonic

Courtesy M. Sumita, Matsushita Electric Industrial Co. Ltd (Panasonic)

© T. Kuroda (55/99)

Mixed Body Bias Techniques

Supply optimal body bias voltage according to the requirements of each circuit type.

[Figure: Body Bias Gen. supplying Fixed Vt or Fixed Ids bias to the circuit blocks: SRAM, Control Logic (CMOS), Data Path, Register File (Domino circuit). Device types shown: High-Vt, Mid.-Vt, Dual-Vt]

© T. Kuroda (56/99)

Proposed Fixed Vt/Ids Body Bias Gen.

• Vb Max. Current : ±1mA
• Self Power : 210uW
• Area : 0.2 x 0.2 mm2

Vgsn: Fixed Vt : Vt; Fixed Ids : VDD1

[Figure: NMOS Type generator. Monitor, CG, OTA and Buffer producing Vbn; supplies VDD3, NVDD, VSS]

© T. Kuroda (57/99)

Register File with Mixed Body Bias

Noise Margin
• Low : Fixed Vt -- Memory, Domino etc.
• High : Fixed Ids -- CMOS logic

[Figure: register file with Memory Cell Array, Decoder and I/O. Memory Cell with Write Word, Read Word, Write bit line and Read bit line, and Read-Out Circuit (X8). Parts are marked "Connected to Fixed Vt" or "Connected to Fixed Ids"]

© T. Kuroda (58/99)

Dispersion

5 wafers with > 10% Vt variation

[Figure: normalized Vt of NMOS versus normalized Vt of PMOS (both 0.7 to 1.3) for MVt devices on wafer1 to wafer5]

© T. Kuroda (59/99)

Effect of Fixed Vt Body Bias Technique

Improvement in Noise Margin

[Figure: Read Bit Line Noise Margin. Noise margin (V, 0.0 to 0.7) for Read Local Bit Line and Read Global Bit Line]

© T. Kuroda (60/99)

Register File : Mixed Body Bias Results

[Figure: normalized delay (1.0 to 1.2) and normalized current (0.6 to 1.4) versus temperature (-50 to 110 ºC), Mixed Body Bias versus No Body Bias. VDD1 = 0.8V, Freq. = 100MHz]

© T. Kuroda (61/99)

Summary

Proposed Mixed Body Bias techniques that fix either Vt or Ids for each circuit.

Fixed Ids and Fixed Vt Body Bias Generator
- Error of Correlation: ±30mV

SRAM using Fixed Vt Body Bias
- Operation Voltage improved by 0.1V at low temperature.
- Power reduced by 75% at high temperature.

Register File using Mixed Body Bias
- Delay dependence on temperature is improved
- Delay variation is reduced by 85%.
- Active current decreased by 15% at high temperature.

© T. Kuroda (62/99)

Case Study 4. Wireless SoC (Bluetooth) : Toshiba

Courtesy M. Hamada, Toshiba Corp.

© T. Kuroda (63/99)

Background

The advantages of single chip CMOS implementation of wireless systems are:
• Low cost
• Small footprint
• Scalable

But simple replacement of bipolar transistors leads to an increase in power dissipation.

Power optimization on several levels is required.

© T. Kuroda (64/99)

Performance of CMOS Transistors

Tech.    VDD(V)  Lg(um)  fT(GHz)
0.4um    3.3     0.36    25
0.25um   2.5     0.25    40
0.18um   1.8     0.18    60
0.13um   1.5     0.1     90
0.09um   1.2     0.07    150

Rules of thumb:
• VDD = 10 x Lg (V)
• Lg x fT = 10 (GHz・um)

Supposing fT of 10 x fcarrier is required for circuit design,
fcarrier(GHz) = fT/10 = 1/Lg(um) ⇒ Lg(um) = 1/fcarrier(GHz)
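The rules of thumb can be compared with the table rows and then used to estimate the gate length needed for a given carrier. The sketch below uses the table values as given; the function names are illustrative, and the 2.4 GHz example is simply the Bluetooth carrier band mentioned later in the lecture.

```python
# (technology, VDD in V, Lg in um, fT in GHz) from the table.
CMOS_TABLE = [
    ("0.4um", 3.3, 0.36, 25.0),
    ("0.25um", 2.5, 0.25, 40.0),
    ("0.18um", 1.8, 0.18, 60.0),
    ("0.13um", 1.5, 0.10, 90.0),
    ("0.09um", 1.2, 0.07, 150.0),
]


def gate_length_for_carrier_um(f_carrier_ghz: float) -> float:
    """Rule of thumb: Lg(um) = 1 / fcarrier(GHz), assuming fT = 10 x fcarrier."""
    return 1.0 / f_carrier_ghz


for tech, vdd, lg, ft in CMOS_TABLE:
    print(f"{tech:7s} Lg x fT = {lg * ft:5.1f} GHz*um, 10 x Lg = {10 * lg:.1f} V (table VDD {vdd} V)")

print(f"2.4 GHz carrier -> Lg of about {gate_length_for_carrier_um(2.4):.2f} um")
```

The Lg x fT product stays between about 9 and 11 GHz·um for every row, while the VDD rule is looser for the two most advanced nodes.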

© T. Kuroda (65/99)

Carrier Freq. of Various Systems

[Figure: carrier frequency axis (100MHz to 100GHz) with the corresponding Lg axis (1um, 0.1um, 10nm). Rows: Broadcasting, Mobile phone, LAN/PAN, Radar. Labels: FM/T-TV, BS/CS-TV, Zigbee, BT, WLAN, UWB, 802.15.3c?, MAN: TBD, 4G: TBD]

© T. Kuroda (66/99)

Present Status of RFCMOS Applications

802.11 WLAN
802.11a      5GHz       54Mbps
802.11b/g    2.4GHz     11Mbps, 54Mbps
802.11n      2.4/5GHz   >100Mbps

802.15 WPAN
802.15.1     Bluetooth  2.4GHz           1Mbps, 3Mbps
802.15.3a    UWB        3.1-10.6GHz      480Mbps
802.15.4     Zigbee     900MHz, 2.4GHz   20k, 250kbps

© T. Kuroda (67/99)

Power Crisis in Mobile Handsets

• Typical battery capacity of a mobile phone is 600~800mAh.
• 100mA consumption gives only about 8 hours of operation.
• Many applications have to share the battery power.
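The 8-hour figure is capacity divided by current. A minimal sketch of that arithmetic for both ends of the quoted capacity range, ignoring battery derating and conversion losses (an idealization):

```python
def operating_hours(capacity_mah: float, current_ma: float) -> float:
    """Ideal operating time: capacity (mAh) divided by average current (mA)."""
    return capacity_mah / current_ma


for capacity in (600.0, 800.0):
    hours = operating_hours(capacity, 100.0)
    print(f"{capacity:.0f} mAh at 100 mA -> {hours:.0f} hours")
```

So the quoted 8 hours corresponds to the upper end of the range; a 600 mAh battery would last about 6 hours at the same current.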

© T. Kuroda (68/99)

Applications of Bluetooth

Headset

Mobile phone

PDA

PC

Printer

Ref. www.bluetooth.com

© T. Kuroda (69/99)

Bluetooth Spec.

• Carrier freq. : 2.4-2.48GHz (ISM-band)
• Data rate: 1Mbps
• Spectrum Spreading: Frequency hopping
• Modulation: GFSK (Gaussian Freq. Shift Keying)
• Range: 10m-100m

© T. Kuroda (70/99)

Chip Micrograph

• Technology: 0.18µm CMOS, Triple well, MIM capacitor, 4 layer Al
• Power supply: Digital 1.5V, Analog 2.5V
• Die size: 29.6mm2

[Figure: chip micrograph. Blocks: Radio, RAM, ROM, MPU]

© T. Kuroda (71/99)

System Level Power Optimization

Various operation modes for power reduction
• Deep Sleep Mode

Spectral efficiency improvement
• AFH (Adaptive Frequency Hopping)

© T. Kuroda (72/99)

Operation Modes in Bluetooth

• Connection State
• Connection-Setup State
  - Inquiry / Inquiry Scan
  - Page / Page Scan (closer to connection than inquiry)
• Low Power State
  - Sniff, Hold, Park (used for different purposes)
• Non-active State
  - Stand-by

© T. Kuroda (73/99)

Deep Sleep Mode

• stops main system clock X'tal OSC.
• is operated by 32kHz LPO for most of the time
  e.g. operates for 1.8ms out of 2.56s
• trades power for the accuracy of LPO

[Figure: power versus time, with a 1.8ms active period in each 2.56s interval]
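The benefit of deep sleep comes from the very small duty cycle: 1.8 ms of activity in every 2.56 s. The sketch below computes that duty cycle and a resulting average current. The 30 mA active current is a placeholder chosen for illustration, and the 10 uA sleep current is the deep-sleep figure listed in the current consumption table near the end of this lecture.

```python
def duty_cycle(active_s: float, period_s: float) -> float:
    """Fraction of time spent in the active state."""
    return active_s / period_s


def average_current_ma(active_ma: float, sleep_ma: float, duty: float) -> float:
    """Time-weighted average of active and sleep currents."""
    return duty * active_ma + (1.0 - duty) * sleep_ma


duty = duty_cycle(1.8e-3, 2.56)          # values from the slide
avg = average_current_ma(30.0, 0.010, duty)  # 30 mA is an assumed active current

print(f"duty cycle: {100 * duty:.3f}%")
print(f"average current: {1000 * avg:.1f} uA")
```

With a duty cycle of about 0.07%, the average current is dominated by the sleep current plus a small active contribution, which is why the leakage in deep sleep matters so much.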

© T. Kuroda (74/99)

Adaptive Frequency Hopping

• detects interfered channels and modifies the hopping sequence.
• can avoid re-transmission and achieve power saving.

power reduction in terms of transmitted data per energy

© T. Kuroda (75/99)

Operation of AFH - Detection of Interfered Channel -

[Figure: PER versus Ch. # (0 to 80; channel frequency 2402+n MHz), with an 802.11b (Ch10) interferer at 2457+/-10MHz]

© T. Kuroda (76/99)

Operation of AFH- Collision Avoidance -

Hopping sequence table before AFH

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Hopping sequence table after AFH

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78

1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

sets channels 46 thru 63 unused
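The channel table above is a 79-entry bitmap in which interfered channels are cleared. The sketch below builds that bitmap for channels 46 to 63 and shows one way a hop that lands on a cleared channel could be redirected. The modulo remapping is a deliberate simplification for illustration and is not the remapping procedure defined by the Bluetooth specification; all names are hypothetical.

```python
NUM_CHANNELS = 79  # channels 0..78, channel n at 2402 + n MHz


def channel_freq_mhz(n: int) -> int:
    return 2402 + n


def build_channel_map(blocked: set[int]) -> list[int]:
    """1 = channel used, 0 = channel removed from the hopping sequence."""
    return [0 if ch in blocked else 1 for ch in range(NUM_CHANNELS)]


def remap(channel: int, channel_map: list[int]) -> int:
    """Simplified remapping: redirect an unused channel onto the used set."""
    if channel_map[channel]:
        return channel
    used = [ch for ch, flag in enumerate(channel_map) if flag]
    return used[channel % len(used)]


blocked = set(range(46, 64))  # channels 46 thru 63, as on the slide
channel_map = build_channel_map(blocked)

print(sum(channel_map), "of", NUM_CHANNELS, "channels remain in use")
print("blocked band:", channel_freq_mhz(46), "to", channel_freq_mhz(63), "MHz")
print("channel 50 is remapped to", remap(50, channel_map))
```

The blocked band of 2448 to 2465 MHz sits inside the 2457 +/- 10 MHz span of the 802.11b interferer shown on the detection slide.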

© T. Kuroda (77/99)

System Block Diagram

• required ext. components # < 12

[Figure: single-chip transceiver (Radio with RX and TX, Baseband, RAM, ROM, PLL) connected to the Antenna through BPF, SW and matching networks; loop filter; 13MHz and 32kHz clock sources; Host I/F (UART) and Audio I/F (PCM)]

© T. Kuroda (78/99)

Block Diagram of Digital Baseband

• Demod/sync/packet assembly by dedicated HW
• Higher protocols by firmware

[Figure: Baseband between the Radio (RX, TX, Control) and the MPU. Blocks: Demod., Sync., Packet Encode/Decode, UART, PCM, Memory controller, ROM 256kB, RAM 64kB, Power Manager, Clock Gen.]

© T. Kuroda (79/99)

Power Optimization in Digital Baseband

• Extensive Clock-gating for AC power reduction (30 clock domains)
• ROM integration for interface power reduction in I/Os (<10 @ nominal operation)
  - < 5mA @ 13MHz operation
• High-Vt MOS for leakage reduction
  - < 10mA @ deep sleep mode
• Power reduction in wireless SoC leads to switching noise reduction, enabling high performance operation in terms of the sensitivity.

© T. Kuroda (80/99)

Architectural Level Optimization

Frequency Planning
• Tradeoff in the selection of the intermediate frequency

Gain/Noise Budgeting
• Sensitivity level and maximum input level
• Requires fast gain control

© T. Kuroda (81/99)

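Block Diagram of RF/Analog

[Figure: 2.4GHz front end with LNA, PA, poly-phase BPF (6bit tuning), Det1 (Feedback) and Det2 (Feedforward), output to demod; TX data through Gaussian filter; Integer PLL with 500kHz reference from the 13MHz system clock divided by 1/26; tuning and control data]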

© T. Kuroda (82/99)

Receiver Architecture

[Figure: LNA followed by a mixer; input fRF, local oscillator fLO, output fIF = fRF - fLO]

• fIF = 0 : Zero-IF (= Direct Conversion)
• fIF = ~10MHz : Low-IF

© T. Kuroda (83/99)

Image Frequency Problem

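[Figure: LNA followed by a mixer; inputs fRF and fIM, local oscillator fLO, output fIF = fRF - fLO = fLO - fIM. On the frequency axis fIM and fRF lie on either side of fLO; Image Rejection Filter]

fRF and fIM fall on the same frequency when fRF - fLO = fLO - fIM, i.e. fIM = 2fLO - fRF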
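The image relation fIM = 2fLO - fRF is easy to check numerically. The sketch below assumes low-side injection (fLO below fRF, consistent with fIF = fRF - fLO) and uses an example channel at 2441 MHz together with the 1.5 MHz IF quoted later for this design; the channel choice and function names are illustrative.

```python
def local_oscillator_mhz(f_rf_mhz: float, f_if_mhz: float) -> float:
    """Low-side injection: fLO = fRF - fIF."""
    return f_rf_mhz - f_if_mhz


def image_frequency_mhz(f_rf_mhz: float, f_lo_mhz: float) -> float:
    """Image frequency that mixes down to the same IF: fIM = 2*fLO - fRF."""
    return 2.0 * f_lo_mhz - f_rf_mhz


f_rf = 2441.0   # example wanted channel, MHz
f_if = 1.5      # low-IF value quoted for this design, MHz
f_lo = local_oscillator_mhz(f_rf, f_if)
f_im = image_frequency_mhz(f_rf, f_lo)

print(f"fLO = {f_lo} MHz, fIM = {f_im} MHz")
print(f"fRF - fLO = {f_rf - f_lo} MHz, fLO - fIM = {f_lo - f_im} MHz")
```

With a 1.5 MHz IF the image lies only 3 MHz below the wanted channel, inside the Bluetooth band, so it cannot be removed by an RF filter and must be handled by the receiver architecture.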

© T. Kuroda (84/99)

Tradeoffs in fIF Selection

                  Zero-IF   Low-IF   Non Low-IF
Area              ○         △        ×
Power             ○         △        ×
Sensitivity       △         △        ○
External Comp.    ○         ○        △
Carrier Leak.     ×         ○        ○
Image Rejection   ○         ×        △

© T. Kuroda (85/99)

Tradeoffs in Low-IF Receivers

IF                Lower    Higher
Channel Select    Easier   Harder
Image Rejection   Harder   Easier

In our design, a 1.5MHz IF is chosen so that the following are achieved:
• 1 order lower poly-phase filter for IR than with 1MHz IF
• 20% lower power channel select filter than with 2MHz IF

© T. Kuroda (86/99)

Bluetooth PHY Specifications (Receive Mode)

Level diagram
• Max. Input Level: -20dBm (22mVrms)
• Min. Input Level: -70dBm (71uVrms)
• Thermal Noise: -114dBm (50Ω, BW: 1MHz)
• S/N = 44dB at the receiver input; with Receiver NF = 24dB, S/N = 20dB at the demod

Interferer
• Unwanted at 3MHz offset: -27dBm; Desired: -67dBm (40dB difference)
• "Unwanted" is 100 times stronger than "Desired" in voltage scale.
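The numbers in this level diagram follow from standard dBm conversions. The sketch below reproduces them, assuming a 50 Ω system and a noise temperature of 290 K (the temperature is an assumption, since the slide only states the bandwidth and impedance). Function names are illustrative.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def dbm_to_vrms(p_dbm: float, r_ohm: float = 50.0) -> float:
    """RMS voltage across r_ohm for a power given in dBm."""
    p_watt = 10.0 ** (p_dbm / 10.0) * 1e-3
    return math.sqrt(p_watt * r_ohm)


def thermal_noise_dbm(bandwidth_hz: float, temp_k: float = 290.0) -> float:
    """Thermal noise power kTB expressed in dBm."""
    return 10.0 * math.log10(BOLTZMANN_J_PER_K * temp_k * bandwidth_hz / 1e-3)


noise = thermal_noise_dbm(1e6)
snr_in = -70.0 - noise          # S/N at minimum input level
snr_demod = snr_in - 24.0       # after a receiver noise figure of 24 dB

print(f"-20 dBm = {1e3 * dbm_to_vrms(-20.0):.1f} mVrms")
print(f"-70 dBm = {1e6 * dbm_to_vrms(-70.0):.1f} uVrms")
print(f"thermal noise in 1 MHz = {noise:.1f} dBm")
print(f"S/N at input = {snr_in:.0f} dB, at demod = {snr_demod:.0f} dB")
print(f"40 dB in voltage = x{10 ** (40 / 20):.0f}")
```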

© T. Kuroda (87/99)

Bluetooth PHY Specifications (T/R Switch)

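Fast Freq. Hopping & Short Preamble
↓
Quick Response Req.
・PLL Lock-up
・RX Gain Control
・Bit Synchronization

[Figure: frequency (2402 to 2480 MHz, 79 ch.) versus time (us), alternating TX and RX slots; time labels 625 and 366]

[Figure: packet format with preamble, syncword, trailer, header and payload; time labels 2µs, 4µs, 64µs, 4µs; preamble pattern 1 0 1 0 …, △f=315kHz]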

© T. Kuroda (88/99)

RX Gain Control System

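[Figure: RF/analog block diagram with the RX gain control path: Det1 (Feedback) and Det2 (Feedforward), LNA, poly-phase BPF (6bit tuning), output to demod; PA, Gaussian filter and TX data; Integer PLL at 2.4GHz with 500kHz reference from the 13MHz system clock divided by 1/26; tuning and control data]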

© T. Kuroda (89/99)

Gain Control System in LNA

Loop BW
• Wide → Fast yet unstable
• Narrow → Slow yet stable

Solution
• Packet Header : Sample → Fast Gain Control
• Payload : Hold → Stable Gain

[Figure: LNA and MIX with output to BPF; DET1, Comp / Charge pump, Vref1 and S/H in the gain loop; timing control by baseband]

© T. Kuroda (90/99)

Gain-controlled Received Signal

LNA Gain Convergence < 4us

2us/div.

RF power

© T. Kuroda (91/99)

Circuit Level Optimization

Process Variations Compensation
• Automatic Tuning of Filter Transfer Functions

Digital Circuit-Aided Analog Circuits
• Modulation Index of Direct-modulated VCO
• Low Noise PLL while having fast lock-up time

© T. Kuroda (92/99)

BPF Auto-tuning System

• No dummy filter, dummy VCO required
• Area overhead < 4%
• Feed-through free (cf. FLL)
• Stability analysis free (cf. FLL)

[Figure: tuning loop. Reference from crystal divided by 1/13, SW, BPF, Mixer, GCA, PD, Bias, and Control logic (digital block) with input from baseband]

© T. Kuroda (93/99)

Tuning BPF - before and after -

Δfcutoff < 5%

[Figure: simulated gain (dB, -70 to 30) versus frequency (100k to 10M Hz) of the 4th order Butterworth BPF, before tuning and after tuning]

© T. Kuroda (94/99)

VCO Direct Modulation

[Figure: VCO (Vdd, Vbias, outputs LO+ and LO-) inside an Integer-N PLL (Ref., PD, CP, Loop filter, Div., S/H, PLL loop control). TX data passes through a Gaussian LPF to the VCO, with Mod. index control and Vref]

© T. Kuroda (95/99)

TX Spectrum and Modulation Index

[Figure: TX spectrum. Power (dBm, -80 to 10) versus frequency offset (-2.0 to 2.0 MHz); baseband data PRBS9, RBW 30kHz, VBW 30kHz; -20dB BW < 1MHz]

[Figure: frequency deviation (kHz, -500 to 500) versus time (0 to 4 usec)]

• No pulling @ TX Power of 4dBm
• ΔModulation Index < 3%

© T. Kuroda (96/99)

PLL Performance

• Fast Lock-up
• Low Phase Noise
• Low Spurious

[Figure: frequency (MHz, 2380 to 2520) versus time (0 to 120 usec) for a hop between 2402MHz and 2480MHz; 85 usec]

[Figure: phase noise (dBc/Hz, -150 to -40) versus frequency offset (1k to 10M Hz); -130dBc@3MHz]

© T. Kuroda (97/99)

Achieved System Performance

Sensitivity              -80 dBm
Max. Input Level         -5 dBm
Interferer Resistance    -3dB @-1MHz, 11dB @co-channel, 0dB @+1MHz, -33dB @+2MHz, -43dB @+3MHz
IIP3                     -19.5 dBm
Max. Output Level        +4 dBm
Image Rejection Ratio    > 28 dB

© T. Kuroda (98/99)

Current Consumption of the SoC

1 slot packet (173kbps) 28 mA

3 slot packet (586kbps) 43 mA

49 mA

22 mA

28 mA30 mA

10 uA

RX mode

5 slot packet (723kbps)

1 slot packet (173kbps)

3 slot packet (586kbps)5 slot packet (723kbps)

-

TX mode*

Deep sleep

* TX power : 4dBm

© T. Kuroda (99/99)

Summary

Power optimizations on different levels were visited.

System level optimizations:
• Deep Sleep Mode
• Adaptive Frequency Hopping

Architecture level optimizations:
• Receiver Architecture
• Selection of Intermediate Frequency
• Gain Control Scheme for Wide Dynamic Range

Circuit level optimizations:
• Extensive Clock-gating
• Leakage Reduction by using High-Vt Transistors
• Interface Power Reduction by ROM Integration
• Auto-tuning of BPF Transfer Function
• Direct Modulation of VCO and Modulation Index Compensation
• Variable CP Current in PLL to Achieve Fast Lock-up and Low Phase Noise
