Post on 22-May-2020
© T. Kuroda (1/99)
Lecture 5: State-of-the-art Design Practice: Case Study
Tadahiro Kuroda
Visiting MacKay Professor
Department of EECS
University of California, Berkeley
tadahiro@eecs.berkeley.edu, kuroda@elec.keio.ac.jp
http://bwrc.eecs.berkeley.edu/Classes/ee290c_s07
http://www.kuroda.elec.keio.ac.jp/
EE290c Spring 2007, Tues & Thurs 9:30-11:00, 212 Cory UCB
With materials from T. Hattori, A. Inoue, M. Sumita, M. Hamada
© T. Kuroda (2/99)
1. Application Processor (SH-mobile) : Renesas Technology
2. Embedded Processor : Fujitsu
3. Digital Consumer : Panasonic
4. Wireless SoC (Bluetooth) : Toshiba
Case Studies
© T. Kuroda (3/99)
1. Application Processor (SH-mobile) : Renesas Technology
2. Embedded Processor : Fujitsu
3. Digital Consumer : Panasonic
4. Wireless SoC (Bluetooth) : Toshiba
Case Studies
Courtesy T. Hattori, Renesas Technology
© T. Kuroda (4/99)
Embedded Processor Power Analysis
• Dynamic power is dominant in LP-process
• FF power & Clock power
[Figure: active power = leakage + dynamic, compared for a generic and a low-power process. Benchmarks: Matrix mul. at Tj = 85 °C; Dhrystone 2.1 at Tj = 45 °C.]
[Figure: dynamic power component ratio for 130-nm CPU core. Components: Logic, SRAM, FF, Clock; labeled shares: 40%, 25%, 20%, 15%.]
© T. Kuroda (5/99)
Leakage Power Reduction by Body Bias
Standby status of processors
• Waiting for response
• Dominant in real applications (waiting for key-in, etc.)
• No relation to operation speed
Higher Vth only in standby mode -> back biasing on substrates
• Stable substrate voltage in operation
• Efficient layout topology required
© T. Kuroda (6/99)
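Block Diagram of Back Biasing
[Figure: 1.8-V cells between VDD and VSS with switch cells spaced 200 µm apart. A bias control block, with standby control and a real-time clock, spans the 3.3-V and 1.8-V regions. Labeled signals: Vbp (3.3/1.8 V), Vbn (-1.5/0 V), Cbp, Cbn. Supply levels: 3.3 V, 1.8 V, 0 V.]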
© T. Kuroda (7/99)
Switch Cell Layout
[Figure: layout of a switch cell next to a standard cell, each with pMOS and nMOS regions. M1 rails: vdd, vbp, vss, vbn. M2 lines: vbp, vbn, cbn, cbp, vss, vdd.]
© T. Kuroda (8/99)
Back Biasing Layout Topology
[Figure: rows of 1.8-V logic standard cells with switch cells placed about 200 µm apart. Labeled lines: vbn, vbp, cbn, cbp, vdd, vss, routed on metal layers M1, M2, M4 and M5.]
© T. Kuroda (9/99)
Power Reduction by Back Biasing
[Figure: bar chart of leakage current (µA) with substrate control off and on; labeled values: 1300 and 46.5. Components: *1 subthreshold leak (1.8-V area), *2 pn junction leak (1.8-V area), *3 3.3-V area leak.]
© T. Kuroda (10/99)
Implemented SOC
• 0.25 µm CMOS, 5 layers
• 1.8 V (internal), 3.3 V (I/O)
• 3.3M Tr.
• VBC (0.13 mm2)
© T. Kuroda (11/99)
Path Delay Distribution
[Figure: histogram of number of paths (K, 0 to 20) versus delay (1.0 to 2.5 ns) for single-Vth cells (high) and dual-Vth cells (high & low). Labels: 400 MHz, 450 MHz, 12.5%.]
© T. Kuroda (12/99)
Dual Vth Cell Optimization
[Figure: three panels, (a) all low-Vth cells, (b) binary modulation, (c) gradated modulation. Each panel shows the path delay distribution (number of paths versus path delay, a.u., 0 to 1) and a circuit of FF-to-FF paths built from low-Vth and high-Vth cells.]
© T. Kuroda (13/99)
Implemented SOC
Process Technology: 0.13 µm Dual Vth Low-Power CMOS, 5 Metal Layers (Cu)
Chip size: 58.5 mm2 (7.7 mm x 7.6 mm)
[Figure: chip micrograph. Blocks: Processor Core, URAM, 3D Graphics Engine, MPEG-4, VIF, LCDC.]
© T. Kuroda (14/99)
Power Management
Leakage current is increasing in sub-micron processes
Final solution = power off unused circuits
New technical issues for power off/on management
• Short recovery time is required
• Unstable output signals from a power-off region may cause short-circuit current
• SDRAM setting parameters etc. should be kept
• Leakage current in switch cells may be a problem
• Rush current at power-on may cause noise
Power switch cell, µIO cell
Data retention low-leak SRAM
Trade-off between LP mode and recovery time
Hierarchical multi-power domain management
© T. Kuroda (15/99)
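Partial Power off Inside SOC
[Figure: between VDD and GND, the processor core, multimedia IPs, etc. are switched by PSW1 and the URAM by PSW2. µI/O cells (µI/O open) and a backup latch/register sit at the domain boundary. PSC, INTC, PSWC1 and PSWC2 form the control path.]
Legend: µI/O = isolation cell, PSC = power controller, PSWC = power switch controller, PSW = power switch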
© T. Kuroda (16/99)
Low Leakage SRAM with Data Retention
SRAM module has self-controlled power switches
Active mode
• Vssm, Vssa, Vssc = Vss (always)
• Vddw = Vdd (access cycle only)
Data retention mode
• Vssa, Vssc: cut off
• Vddw: ~0.4 V down
• Vssm: ~0.4 V up
[Figure: memory cell, word driver (WD Drv.), sense amp. and control blocks with a power switch controller; rails Vdd, Vddw, Vss, Vssc, Vssa, Vssm. Annotation: leakage reduction with data retention.]
© T. Kuroda (17/99)
Power Reduction in SRAM
[Figure: bar chart of leakage current (µA, 0 to 1000) split into memory cell, word driver and amp, for a 256-kB SRAM at room temperature, VDD = 1.2 V.]
• Conventional: 920 µA
• Proposed (active): 700 µA (-25%)
• Proposed (retention): 50 µA (-95%)
© T. Kuroda (18/99)
R-standby & U-standby
Two standby modes are provided using power switches
R-standby
• Low leakage (~100 µA)
• Short recovery time (< 3 ms)
U-standby
• Ultra low leakage (~10 µA)
[Figure: one diagram per mode. Core (1.2 V) domain: CPU core, IP & peripherals, URAM, backup registers, PSC. I/O (2.85 V) domain: I/O, I/O control. PSW1 and PSW2 mark which blocks are switched off.]
© T. Kuroda (19/99)
R-standby Recovery Operation
Hardware operations
• Power switch control (by PSC)
• Clock generation (PLL, DLL lock)
• Data backup using backup latch in µI/O circuit
Software operations
• BAR (Boot Address Register) holds restart address
• Clock and interrupt setting needed just after wake-up
URAM: data backup memory
• Control registers
• OS task table, etc.
[Figure: VDD/GND domain with PSW1 and PSW2, µI/O with backup latch (BAR, etc.), and URAM.]
© T. Kuroda (20/99)
Recovery Time from R-standby
Recovery is triggered by interrupt
• Hardware recovery: PSW on, PLL & DLL lock (backup latch)
• Software recovery: interrupt handler, start from BAR address, restore from URAM
[Figure: timeline (0 to 3 ms) with the steps PSW on, PLL & DLL lock-in, state transition, restore registers, restart tasks.]
Recovery time: 2.8 ms
© T. Kuroda (21/99)
Implementation in SOC
Process Technology: 0.13 µm Dual Vth Low-Power CMOS, 5 Metal Layers (Cu)
Chip size: 58.5 mm2 (7.7 mm x 7.6 mm)
• PSW1 area = 0.10 mm2
• PSW2 area = 0.03 mm2
Only 0.2% of chip size
[Figure: chip micrograph with PSW1, PSW2 and PSC marked. Blocks: Processor Core, URAM, 3D Graphics Engine, MPEG-4, VIF, LCDC.]
© T. Kuroda (22/99)
Hierarchical Multi Power Domains
One-chip integration of existing multi-chip systems
Divide into power domains
• Partial power off -> reduce leakage current
• Isolation cell insertion, complex control -> hierarchical power domain management
Chip-level parts for special purposes
• Repeaters, clock distribution, back-up registers -> common power domain
Layout topology for power switch and power line
Usage scenes in mobile phone
© T. Kuroda (23/99)
Power Domains
• 20 power domains for partial power-off
• CPD (common power domain) for repeaters etc.
• B* domains for BB, A* domains for AP
[Figure: chip floorplan and power domains. Blocks: GSM, W-CDMA, BB-CPU, APL-RTCPU, AP-SYSCPU, BB-Misc, AP-Misc, Media RAM, 3D G, MPEG, Camera, Sound, JPEG. Domain labels shown: CPD, A1R, A4U1, A1A, A2, C5, BW2, BG1, BA2.]
© T. Kuroda (24/99)
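Common Power-Domain Implementation
CPD: Common Power Domain
[Figure: three uses of the CPD. Repeaters: signal Sig1 passes through repeaters in the CPD and a µIO with CTL. Backup latches: a backup latch in the CPD next to the original FF (D, CK, Q, CTL). Global clock buffer: PLL, global clock, µIO with CTL, local clock.]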
© T. Kuroda (25/99)
Power Switch and Power Line Implementation
Power switch (PSW) implementation
• PSW: thick-Tox MOS; switched logic: thin-Tox MOS
• 1/4000 leakage @ 1MG
• Separate PSWs for the CPD and for each domain PDx
Power line implementation
• Global lines: VDD, VSS
• Local lines: VSSM_PDw, VSSM_PDe, VSSM_CPD
[Figure: domains PDw and PDe with their power switches and power switch controllers.]
© T. Kuroda (26/99)
Leakage Current in Usage Scenes (1)
Video telephony
Measured leakage current (@ room temp., 1.2 V): 849 µA
• Baseband part: Control ON, W-CDMA ON, GSM ON / OFF
• Application part: System-domain ON, Realtime-domain ON
[Figure: floorplan of the power domains A1R, A4U2, A3, A4U1, A1A, A2, A4, C5, AC, BW2, BA3, BW1, BG1, BG2, BG3, BC, BA4, BW3, C4, BA2, marked power on or power off.]
© T. Kuroda (27/99)
Leakage Current in Usage Scenes (2)
Telephony (W-CDMA)
Measured leakage current (@ room temp., 1.2 V): 407 µA
• Baseband part: Control ON, W-CDMA ON, GSM ON / OFF
• Application part: System-domain ON, Realtime-domain OFF
[Figure: floorplan of the power domains A1R, A4U2, A3, A4U1, A1A, A2, A4, C5, AC, BW2, BA3, BW1, BG1, BG2, BG3, BC, BA4, BW3, C4, BA2, marked power on or power off.]
© T. Kuroda (28/99)
Leakage Current in Usage Scenes (3)
Waiting for calling
Measured leakage current (@ room temp., 1.2 V): 299 µA
• Baseband part: Control ON, W-CDMA OFF, GSM OFF
• Application part: System-domain OFF, Realtime-domain OFF
[Figure: floorplan of the power domains A1R, A4U2, A3, A4U1, A1A, A2, A4, C5, AC, BW2, BA3, BW1, BG1, BG2, BG3, BC, BA4, BW3, C4, BA2, marked power on or power off.]
© T. Kuroda (29/99)
Leakage Current in Usage Scenes (4)
Power off (I/O fixed)
Measured leakage current (@ room temp., 1.2 V): 7 µA
• Baseband part: Control OFF, W-CDMA OFF, GSM OFF
• Application part: System-domain OFF, Realtime-domain OFF
[Figure: floorplan of the power domains A1R, A4U2, A3, A4U1, A1A, A2, A4, C5, AC, BW2, BA3, BW1, BG1, BG2, BG3, BC, BA4, BW3, C4, BA2, marked power on or power off.]
© T. Kuroda (30/99)
EDA Supports
• Power-off gate-level simulation: set X to all FFs in the power-off domain
• Transistor-level leakage path checker: check leakage paths through well connections etc.
• µIO (isolation cell) insertion tool
• Inter-power-domain isolation checker
• DFT insertion considering power domains
• IR drop calculation considering power switches
© T. Kuroda (31/99)
Implemented SOC
• Die size: 11.15 mm x 11.15 mm
• # of TRs, gates, memory: 181M TRs, 13.5M gates, 20.2 Mbit memory
• Process: 90 nm LP, 8M (7Cu+1Al), CMOS dual-Vth
• Supply voltage: 1.2 V (internal), 1.8/2.5/3.3 V (I/O)
[Figure: chip floorplan. Blocks: GSM, W-CDMA, BB-CPU, APL-RTCPU, AP-SYSCPU, BB-Misc, AP-Misc, Media RAM, 3D G, MPEG, Camera, Sound, CPG, LCDC, JPEG, DDR, SRAM.]
© T. Kuroda (32/99)
Summary (1)
Three key low-power techniques
• Application-specific low power
• Software for low power
• Leakage reduction by power switch
Why not Dynamic Voltage Frequency Scaling?
• Real-time response, power management software difficulty, SRAM Vdd-min, level-shifter overhead, etc.
Requests for EDA
• System-level power estimation
• Power switch verification
© T. Kuroda (33/99)
Summary (2)
[Figure: low-power techniques by generation: ISSCC98 (0.25 µm), ISSCC02 (0.18 µm), ISSCC04 (0.13 µm), ISSCC06 (0.09 µm). Techniques shown: clock gear, standby, dual Vth, µIO, on-chip SRAM, pointer pipeline, activation control, physical synthesis, + dual Vth, fine clock gating, specific bus, multi-media IP, back-bias, U-standby (logic off), R-standby (data retention), hierarchical power domain. Categories: @ clock stop, @ power-off, @ data retention, @ active. The SOC floorplan (GSM, W-CDMA, BB-CPU, APL-RTCPU, AP-SYSCPU, BB-Misc, AP-Misc, Media RAM, 3D G, MPEG, Camera, Sound, CPG, LCDC, JPEG, DDR, SRAM) is shown twice.]
© T. Kuroda (34/99)
1. Application Processor (SH-mobile) : Renesas Technology
2. Embedded Processor : Fujitsu
3. Digital Consumer : Panasonic
4. Wireless SoC (Bluetooth) : Toshiba
Case Studies
Courtesy A. Inoue, Fujitsu Laboratories Ltd.
© T. Kuroda (35/99)
VDD/VTH Control Scheme
Initial proposal
• 1994 Supply Voltage Control, B. Brodersen et al.
• 1994 Body Bias Control, T. Sakurai et al.
Applied to general processors
• 2000 Intel SpeedStep™, Mobile PIII
• 2000 AMD PowerNow!™, Mobile AMD-K6-2
• 2000 Transmeta LongRun(2)™, Crusoe
Expanded to embedded processors
• ARM IEM™
• Texas Instruments SmartReflex™
• Freescale Smart Speed™
• NEC UltimateLowPower™
• Fujitsu CoolAdjust™
© T. Kuroda (36/99)
Power Reduction by VDD Control
Static Control (ASV)
• Adjust supply voltage appropriately for each die
• Process compensation
• Temperature compensation
Dynamic Control (DVFS)
• Change supply voltage and/or body voltage during operation
• Power and performance trade-off
• Tuning performance
[Figure: power efficiency (MIPS/W) versus performance (MIPS) for a performance-tunable LSI, showing the technology trend, static control and dynamic control.]
© T. Kuroda (37/99)
Static (ASV) vs. Dynamic (DVFS)
Item                          Static (ASV)                           Dynamic (DVFS)
Power reduction               Depends on process variation           Depends on applications
Target                        General SoCs including ASIC and FPGA   Processors
Implementation                Possible with hardware only            Requires hardware and software collaboration
Power estimation              Easy                                   Complex (needs software to evaluate)
Design verification testing   Easy                                   Not so easy
Note: ASV (Adaptive Supply Voltage) and DVFS (Dynamic Voltage Frequency Scaling) can co-exist on the same die.
© T. Kuroda (38/99)
FSV and ASV
• Fixed Supply Voltage (FSV): path delay variable, supply voltage fixed
• Adaptive Supply Voltage (ASV): path delay fixed, supply voltage variable
[Figure: for FSV and for ASV, power (leakage and active, low to high) and speed (slow to fast) plotted against process variation (slow to fast), with the worst case marked.]
© T. Kuroda (39/99)
Required Components for ASV System
[Figure: monitor (monitors process/temp. condition) -> control & I/F (informs necessary voltage) -> DC-DC converter (converts to necessary supply voltage) -> user logic (supplied with the appropriate voltage).]
© T. Kuroda (40/99)
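Conventional Monitoring Method
[Figure: replica circuit of the critical path between two FFs (In, Out), clocked by a reference clock.]
[Figure: path delay versus supply voltage for slow and fast process, comparing the fixed supply voltage with the ASV points that meet the target delay.]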
© T. Kuroda (41/99)
Limit of Replica Circuit Monitor
Multi-Vth CMOS
• Path-A (low-Vth 100%)
• Path-B (med.-Vth 50%)
[Figure: path delay versus supply voltage for Path-A and Path-B under slow/fast low-Vth and med.-Vth process conditions; the supply voltages ASV1 and ASV2 are marked against the target delay, one case OK and one NG.]
Critical path changes!
Many replica circuits are required.
© T. Kuroda (42/99)
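Impact on Critical Path Delay by Random Variation
Simple critical path model: M critical paths, each of N stages with per-stage delay deviation σ, so that the deviation of one path is
\[ \sigma_N = \sqrt{N}\,\sigma \]
Probability density of the critical path delay d (the slowest of the M paths):
\[ f(d, M) = {}_{M}C_{1}\,\frac{1}{\sqrt{2\pi}\,\sigma_N}\exp\left\{-\frac{(d-\mu)^2}{2\sigma_N^2}\right\}\left[\int_{-\infty}^{d}\frac{1}{\sqrt{2\pi}\,\sigma_N}\exp\left\{-\frac{(t-\mu)^2}{2\sigma_N^2}\right\}dt\right]^{M-1} \]
[Figure: probability density (0 to 1.6) versus critical path delay (µ-4σ to µ+4σ) for M = 1, 10, 100, 1000, 10000.]
M: number of critical paths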
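The model on this slide, the slowest of M Gaussian path delays, can be explored numerically. The following is a minimal sketch, assuming independent and identically distributed paths with mean mu and deviation sigma_n; the function names and the grid search are illustrative and not taken from the lecture.

```python
import math

def normal_pdf(d, mu, sigma):
    """Gaussian density of a single path delay."""
    return math.exp(-((d - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(d, mu, sigma):
    """Probability that a single path delay is at most d."""
    return 0.5 * (1.0 + math.erf((d - mu) / (sigma * math.sqrt(2))))

def critical_delay_pdf(d, m, mu, sigma_n):
    """Density of the slowest of m independent paths: m * pdf * cdf^(m-1)."""
    return m * normal_pdf(d, mu, sigma_n) * normal_cdf(d, mu, sigma_n) ** (m - 1)

def mode_shift_in_sigma(m, steps=8000):
    """Peak of the density in units of sigma_n (mu = 0, sigma_n = 1), by grid search."""
    grid = [-4 + 8 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda d: critical_delay_pdf(d, m, 0.0, 1.0))

if __name__ == "__main__":
    for m in (1, 10, 100, 1000, 10000):
        print(f"M={m:>5}: peak at mu + {mode_shift_in_sigma(m):.2f} sigma_n")
```

Running it shows the peak moving to the right and the distribution narrowing as M grows, which is the trend the plot illustrates.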
© T. Kuroda (43/99)
Monitoring Replica Delay with Fixed Delay Margin
[Figure: delay distribution for a single critical path (mean µ, deviation σ_N) and for M critical paths. The measured replica delay d_meas plus a fixed delay margin aσ_N gives the guard point d_meas + aσ_N.]
[Figure: probability of monitoring failure (10^-8 to 10^0) versus number of critical paths (1 to 10000) for a = 0, 1, 3, 4, 5, 6, 7, 8.]
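A rough way to reproduce the shape of the failure-probability curves is sketched below. It assumes, beyond what the slide states, that the replica measures exactly the mean delay µ and that the M critical paths are independent Gaussians with deviation σ_N, so monitoring fails when at least one path exceeds µ + aσ_N.

```python
import math

def tail_probability(a):
    """P(single path delay > mu + a * sigma_n) for a Gaussian path delay."""
    return 0.5 * math.erfc(a / math.sqrt(2))

def monitoring_failure_probability(a, m):
    """P(at least one of m independent paths exceeds the margin).

    Computed as 1 - (1 - q)^m with log1p/expm1 so that very small
    tail probabilities q are not lost to rounding.
    """
    q = tail_probability(a)
    return -math.expm1(m * math.log1p(-q))

if __name__ == "__main__":
    for a in (0, 1, 3, 4, 5, 6):
        row = ", ".join(f"{monitoring_failure_probability(a, m):.1e}" for m in (1, 10, 100, 1000, 10000))
        print(f"a={a}: {row}")
```

Under these assumptions the failure probability grows almost linearly with M while it is small, so a margin that is safe for one replica path can be inadequate for thousands of near-critical paths.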
© T. Kuroda (44/99)
Proposed Method for ASV
Conventional
• Many replica circuits (FF to FF, In/Out, ref. clock)
Proposed
• Process sensor (low-Vth, medium-Vth) with ref. clock -> P-CODE -> P-V table -> V-CODE
• Process sensor: dependent on technology
• P-V table: dependent on design, created from (S)STA results
© T. Kuroda (45/99)
Implementation of Conversion Table
• Easy to implement into SOCs
• No connection with user logic
• Only the P-V table depends on user logic
[Figure: process sensor block (ring osc. for low-Vth cells, ring osc. for medium-Vth cells, F-P table converting F-CODE to P-CODE) feeding the static voltage control (SVC) block (control module, P-V table converting P-CODE to V-CODE). The V-CODE goes to the DC-DC converter, which provides the power supply of the user logic. Inputs: clock, reset, clock mode, temperature mode.]
© T. Kuroda (46/99)
Comparison of Two Methods
Replica circuit monitor
• Disadvantage: need to extract the critical path for each design
• Disadvantage: need to consider process variation carefully for each design
Conversion table
• Advantage: easy to adopt for multi-Vth circuits
• Advantage: process variation can be considered by STA
• Advantage: easily implemented in a general design flow
• Disadvantage: necessary to run multiple STA to create the table
© T. Kuroda (47/99)
Design Flow for Creating P-V Table
Final layout -> STA execution -> timing report -> P-V table calculation -> P-V table
• STA runs over 27 conditions: 9 process corners x 3 supply voltages
• The temporary table is replaced by the final table
[Figure: P-V table as a V-CODE array indexed by P-CODE (low-Vth) and P-CODE (med.-Vth).]
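One way to picture the P-V table calculation step is as a search, per process corner, for the lowest supply voltage at which STA still reports non-negative slack. The sketch below follows that reading; the data layout, corner names and slack numbers are hypothetical and only illustrate the idea, they are not taken from the lecture.

```python
def build_pv_table(sta_slack_ns):
    """Pick the lowest passing supply voltage for each process corner.

    sta_slack_ns maps (low_vth_corner, med_vth_corner, vdd) to the worst
    timing slack in ns reported by STA for that condition.
    Corners with no passing voltage are left out of the result.
    """
    table = {}
    for (low_corner, med_corner, vdd), slack in sta_slack_ns.items():
        if slack >= 0.0:
            key = (low_corner, med_corner)
            table[key] = min(vdd, table.get(key, vdd))
    return table

# Hypothetical STA results for two of the nine corners at three voltages.
EXAMPLE_SLACK_NS = {
    ("slow", "slow", 1.30): 0.02,
    ("slow", "slow", 1.20): -0.05,
    ("slow", "slow", 1.10): -0.12,
    ("fast", "fast", 1.30): 0.30,
    ("fast", "fast", 1.20): 0.20,
    ("fast", "fast", 1.10): 0.05,
}

if __name__ == "__main__":
    for corner, vdd in sorted(build_pv_table(EXAMPLE_SLACK_NS).items()):
        print(corner, f"{vdd:.2f} V")
```

At run time the table would simply be indexed with the two P-CODEs from the process sensor, which is why no connection to the user logic is needed.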
© T. Kuroda (48/99)
Multi Mode/Multi Corner Analysis Runtime
Synthesis, optimization and analysis tools that support multi-corner/multi-mode (MCMM) are required to reduce design time.
Analysis runtime example with MCMM STA
Design #   # of Modes   # of Corners
1          1            1
2          1            1
3          2            1
4          3            1
5          5            1
6          5            3
[Figure: runtime (min, 0 to 300) for designs 1 to 6, MCMM STA versus conventional STA.]
© T. Kuroda (49/99)
Embedded Processor Application
User logic: dual-core embedded microprocessor
Operation frequency: 440 MHz
Transistors: logic 11.5 million (low-Vth 55%, medium-Vth 45%, high-Vth 0.04%); RAM 28.3 million; total 39.8 million
Process technology: 90 nm triple-Vth CMOS
Supply voltage: 1.1 V to 1.3 V (ASV)
Chip size: 7.2 mm x 7.2 mm (ASV module: 0.009%)
[Figure: chip micrograph with Core0, Core1, SVC block and process sensor marked.]
© T. Kuroda (50/99)
STA Result (P-V table)
Relation between supply voltage and process variation, Tj = 125 °C, 440 MHz
[Figure: map of required supply voltage (1.10, 1.15, 1.20, 1.25, 1.30 V) over the process case of low-Vth transistors and the process case of med.-Vth transistors, each from -3σ (slow) through 0 (typ.) to 3σ (fast).]
© T. Kuroda (51/99)
Process Variation of Measured Samples
Measurement results of 26 samples by process sensors
[Figure: process case of med.-Vth transistors versus process case of low-Vth transistors, each from -3σ (slow) through 0 (typ.) to 3σ (fast), with the reference line where the med.-Vth process variation equals the low-Vth one.]
© T. Kuroda (52/99)
Power Reduction Result with Proposed ASV
Three MPEG2 MP@ML stream decoding at 440 MHz
[Figure: power consumption (W, 0.80 to 1.60) versus process case of low-Vth transistors (-1.5σ slow, 0 typ., 3σ fast) for fixed 1.3 V and for this work. Labeled powers: 1.30 W, 1.34 W, 1.39 W, 1.10 W, 0.93 W. Labeled supply voltages: 1.30 V, 1.25 V, 1.20 V, 1.15 V, 1.10 V. Labeled reductions: -18%, -33%.]
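As a sanity check on these numbers, the usual first-order model of dynamic power, proportional to C·V²·f, can be applied to the labeled supply voltages. The model is a textbook approximation and not something stated on the slide; leakage is ignored here.

```python
def dynamic_power_ratio(v_new, v_ref):
    """Dynamic power at v_new relative to v_ref, at fixed frequency and capacitance."""
    return (v_new / v_ref) ** 2

def dynamic_power_saving(v_new, v_ref=1.30):
    """Fractional dynamic power saving relative to the fixed 1.30 V supply."""
    return 1.0 - dynamic_power_ratio(v_new, v_ref)

if __name__ == "__main__":
    for vdd in (1.25, 1.20, 1.15, 1.10):
        print(f"{vdd:.2f} V: {dynamic_power_saving(vdd):.1%} lower dynamic power than 1.30 V")
```

Lowering the supply from 1.30 V to 1.10 V gives about 28% in this model. If the -33% label compares 0.93 W with 1.39 W, which is an inference from the extracted labels (0.93 / 1.39 ≈ 0.67), the remaining difference would plausibly come from leakage, which also falls with supply voltage.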
© T. Kuroda (53/99)
Summary
• Technology trend & supply voltage adjustment technique
• Combination of ring osc. and P-V table
- Practical method for ASV implementation
- Good design portability, applicable to ASICs
- Fast MCMM synthesis/optimization/timing analysis are required
• Application example: dual-core embedded processor
© T. Kuroda (54/99)
1. Application Processor (SH-mobile) : Renesas Technology
2. Embedded Processor : Fujitsu
3. Digital Consumer : Panasonic
4. Wireless SoC (Bluetooth) : Toshiba
Case Studies
Courtesy M. Sumita, Matsushita Electric Industrial Co. Ltd (Panasonic)
© T. Kuroda (55/99)
Mixed Body Bias Techniques
Supply the optimal body bias voltage according to the requirements of each circuit type.
[Figure: circuit types SRAM, control logic (CMOS), data path and register file (domino circuit), with device types dual-Vt, high-Vt and mid.-Vt. The body bias generator provides fixed-Vt and fixed-Ids body bias to the different circuit types.]
© T. Kuroda (56/99)
Proposed Fixed Vt/Ids Body Bias Gen. (NMOS Type)
• Vb max. current: ±1 mA
• Self power: 210 µW
• Area: 0.2 x 0.2 mm2
• Vgsn: Vt for fixed Vt, VDD1 for fixed Ids
[Figure: monitor transistor, OTA, CG and buffer generating Vbn; labels VDD3, NVDD, Vgsn, Vbn, VSS.]
© T. Kuroda (57/99)
Register File with Mixed Body Bias
• Low: fixed Vt (memory, domino etc.)
• High: fixed Ids (CMOS logic)
[Figure: register file with memory cell array, decoder and I/O. The memory cell (write word, read word, write bit line, read bit line) and the read-out circuit (x8) are marked by noise margin; some parts are connected to fixed Vt and others to fixed Ids.]
© T. Kuroda (58/99)
Dispersion
5 wafers with > 10% Vt variation
[Figure: normalized Vt of NMOS versus normalized Vt of PMOS (both 0.7 to 1.3) for the MVt device on wafer1 to wafer5.]
© T. Kuroda (59/99)
Effect of Fixed Vt Body Bias Technique
Improvement in noise margin
[Figure: read bit line noise margin (V, 0.0 to 0.7) for the read local bit line and the read global bit line.]
© T. Kuroda (60/99)
Register File: Mixed Body Bias Results
[Figure: normalized delay (1.0 to 1.2) and normalized current (0.6 to 1.4) versus temperature (-50 to 110 °C), mixed body bias versus no body bias, at VDD1 = 0.8 V and 100 MHz.]
© T. Kuroda (61/99)
Summary
Proposed mixed body bias techniques that fix either Vt or Ids for each circuit type.
Fixed Ids and fixed Vt body bias generator
- Error of correlation: ±30 mV
SRAM using fixed Vt body bias
- Operation voltage improved by 0.1 V at low temperature.
- Power reduced by 75% at high temperature.
Register file using mixed body bias
- Delay dependence on temperature is improved.
- Delay variation is reduced by 85%.
- Active current decreased by 15% at high temperature.
© T. Kuroda (62/99)
1. Application Processor (SH-mobile) : Renesas Technology
2. Embedded Processor : Fujitsu
3. Digital Consumer : Panasonic
4. Wireless SoC (Bluetooth) : Toshiba
Case Studies
Courtesy M. Hamada, Toshiba Corp.
© T. Kuroda (63/99)
Background
Advantages of single-chip CMOS implementation of wireless systems:
• Low cost
• Small footprint
• Scalable
But simple replacement of bipolar transistors leads to an increase in power dissipation.
Power optimization on several levels is required.
© T. Kuroda (64/99)
Performance of CMOS Transistors
Tech.      VDD (V)   Lg (µm)   fT (GHz)
0.4 µm     3.3       0.36      25
0.25 µm    2.5       0.25      40
0.18 µm    1.8       0.18      60
0.13 µm    1.5       0.1       90
0.09 µm    1.2       0.07      150
Rules of thumb: VDD = 10 x Lg (V); Lg x fT = 10 (GHz·µm)
Supposing an fT of 10 x fcarrier is required for circuit design,
fcarrier (GHz) = fT/10 = 1/Lg (µm) ⇒ Lg (µm) = 1/fcarrier (GHz)
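The rule of thumb can be turned into a small calculator. This is only a sketch of the slide's arithmetic; the function name and the default parameters (fT margin of 10, Lg x fT product of 10 GHz·µm) restate the slide's assumptions.

```python
def required_gate_length_um(f_carrier_ghz, ft_margin=10.0, lg_ft_product=10.0):
    """Longest gate length (µm) whose fT is still ft_margin times the carrier.

    Uses Lg x fT = lg_ft_product (GHz·µm) and fT = ft_margin x f_carrier.
    """
    required_ft_ghz = ft_margin * f_carrier_ghz
    return lg_ft_product / required_ft_ghz

if __name__ == "__main__":
    for name, f_ghz in (("Bluetooth, 2.4 GHz", 2.4), ("802.11a, 5 GHz", 5.0), ("10 GHz carrier", 10.0)):
        print(f"{name}: Lg <= {required_gate_length_um(f_ghz):.2f} µm")
```

For a 2.4 GHz carrier this gives a gate length of about 0.42 µm or shorter, so by this rule a 0.18 µm process has comfortable margin for Bluetooth.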
© T. Kuroda (65/99)
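Carrier Freq. of Various Systems
[Figure: carrier frequency axis from 100 MHz to 100 GHz with the corresponding Lg (1 µm, 0.1 µm, 10 nm). Categories: broadcasting, mobile phone, LAN/PAN, radar. Systems shown: FM/T-TV, BS/CS-TV, Zigbee, BT, WLAN, UWB, 802.15.3c?, MAN: TBD, 4G: TBD.]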
© T. Kuroda (66/99)
Present Status of RFCMOS Applications
802.11 WLAN
• 802.11a: 5 GHz, 54 Mbps
• 802.11b/g: 2.4 GHz, 11 Mbps / 54 Mbps
• 802.11n: 2.4/5 GHz, > 100 Mbps
802.15 WPAN
• 802.15.1 Bluetooth: 2.4 GHz, 1 Mbps / 3 Mbps
• 802.15.3a UWB: 3.1-10.6 GHz, 480 Mbps
• 802.15.4 Zigbee: 900 MHz / 2.4 GHz, 20 kbps / 250 kbps
© T. Kuroda (67/99)
Power Crisis in Mobile Handsets
• Typical battery capacity of a mobile phone is 600 to 800 mAh.
• A consumption of 100 mA allows at most 8 hours of operation (800 mAh / 100 mA).
• Many applications have to share the battery power.
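The battery arithmetic on this slide is simply capacity divided by average current. A minimal sketch, ignoring battery derating and converter losses, which the slide does not discuss:

```python
def operating_hours(capacity_mah, average_current_ma):
    """Ideal operating time in hours for a given battery capacity and average drain."""
    return capacity_mah / average_current_ma

if __name__ == "__main__":
    for capacity in (600, 800):
        print(f"{capacity} mAh at 100 mA: {operating_hours(capacity, 100):.0f} h")
```

With the quoted capacities, a constant 100 mA drain lasts between 6 and 8 hours, which is why every function sharing the battery needs its own power budget.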
© T. Kuroda (68/99)
Applications of Bluetooth
Headset
Mobile phone
PDA
PC
Printer
Ref. www.bluetooth.com
© T. Kuroda (69/99)
Bluetooth Spec.
• Carrier freq.: 2.4-2.48 GHz (ISM band)
• Data rate: 1 Mbps
• Spectrum spreading: frequency hopping
• Modulation: GFSK (Gaussian Frequency Shift Keying)
• Range: 10 m-100 m
© T. Kuroda (70/99)
Chip Micrograph
• Technology: 0.18 µm CMOS, triple well, MIM capacitor, 4-layer Al
• Power supply: digital 1.5 V, analog 2.5 V
• Die size: 29.6 mm2
[Figure: chip micrograph with Radio, RAM, ROM and MPU marked.]
© T. Kuroda (71/99)
System Level Power Optimization
Various operation modes for power reduction
• Deep Sleep Mode
Spectral efficiency improvement
• AFH (Adaptive Frequency Hopping)
© T. Kuroda (72/99)
Operation Modes in Bluetooth
Connection State
Connection-Setup State
• Inquiry / Inquiry Scan
• Page / Page Scan (closer to connection than inquiry)
Low Power State
• Sniff, Hold, Park (used for different purposes)
Non-active State
• Stand-by
© T. Kuroda (73/99)
Deep Sleep Mode
• stops the main system clock X'tal OSC.
• is operated by a 32 kHz LPO for most of the time, e.g. the chip is active for 1.8 ms out of every 2.56 s
• trades power for the accuracy of the LPO
[Figure: power versus time, with 1.8 ms active bursts repeating every 2.56 s.]
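The benefit of such a duty cycle can be estimated as a time-weighted average of the active and sleep currents. The sketch below uses the 1.8 ms / 2.56 s timing from the slide; the active current of 28 mA and the sleep current of 10 µA are assumptions chosen for illustration.

```python
def average_current_ma(active_ms, period_ms, active_ma, sleep_ma):
    """Time-weighted average current for a periodic wake-up pattern."""
    duty = active_ms / period_ms
    return duty * active_ma + (1.0 - duty) * sleep_ma

if __name__ == "__main__":
    # Assumed currents: 28 mA while active, 0.010 mA (10 µA) while asleep.
    avg = average_current_ma(active_ms=1.8, period_ms=2560.0, active_ma=28.0, sleep_ma=0.010)
    print(f"duty cycle: {1.8 / 2560.0:.4%}")
    print(f"average current: {avg * 1000:.1f} µA")
```

With these assumed currents the average is roughly 30 µA, of which about a third is sleep current, so both the wake-up time and the deep-sleep leakage matter.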
© T. Kuroda (74/99)
Adaptive Frequency Hopping
• detects interfered channels and modifies the hopping sequence.
• can avoid re-transmission and achieve power saving.
• power reduction in terms of transmitted data per energy
© T. Kuroda (75/99)
Operation of AFH - Detection of Interfered Channel -
[Figure: PER versus channel number (0 to 80; channel n is at 2402+n MHz), showing interference from 802.11b (Ch10) at 2457 +/- 10 MHz.]
© T. Kuroda (76/99)
Operation of AFH - Collision Avoidance -
Hopping sequence table before AFH (1 = used, 0 = unused)
• Channels 0 to 78: 1
Hopping sequence table after AFH
• Channels 0 to 45: 1
• Channels 46 to 63: 0
• Channels 64 to 78: 1
AFH sets channels 46 through 63 as unused.
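The channel map on this slide can be sketched in a few lines. The remapping rule below (index the list of used channels by the original channel number modulo its length) is a simplification chosen for illustration; it is not the hop selection kernel of the Bluetooth specification.

```python
NUM_CHANNELS = 79  # channels 0..78 at 2402 + n MHz

def build_channel_map(blocked):
    """Return a list with 1 for used channels and 0 for unused ones."""
    return [0 if ch in blocked else 1 for ch in range(NUM_CHANNELS)]

def remap_hop(channel, channel_map):
    """Keep a hop on a used channel, otherwise move it to a used one (simplified rule)."""
    if channel_map[channel]:
        return channel
    used = [ch for ch, ok in enumerate(channel_map) if ok]
    return used[channel % len(used)]

if __name__ == "__main__":
    cmap = build_channel_map(blocked=range(46, 64))
    print("used channels:", sum(cmap))
    print("hop to channel 10 stays on", remap_hop(10, cmap))
    print("hop to channel 50 moves to", remap_hop(50, cmap))
```

With channels 46 through 63 blocked, 61 channels remain in use and every hop that would have landed in the interfered band is moved elsewhere, which avoids the retransmissions that would otherwise cost energy.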
© T. Kuroda (77/99)
System Block Diagram
• Number of required external components < 12
[Figure: single-chip transceiver with radio (RX, TX), baseband, RAM, ROM and PLL. Other labels: antenna, BPF, SW, matching networks, loop filter, 13 MHz, 32 kHz, host I/F (UART), audio I/F (PCM).]
© T. Kuroda (78/99)
Block Diagram of Digital Baseband
• Demod/sync/packet assembly by dedicated HW
• Higher protocols by firmware
[Figure: baseband connected to the radio through RX, TX and control. Blocks: demod./sync., packet encode/decode, MPU, memory controller, ROM 256 kB, RAM 64 kB, UART, PCM, power manager, clock gen.]
© T. Kuroda (79/99)
Power Optimization in Digital Baseband
• Extensive clock-gating for AC power reduction (30 clock domains)
• ROM integration for interface power reduction in I/Os (<10 @ nominal operation)
- < 5 mA @ 13 MHz operation
• High-Vt MOS for leakage reduction
- < 10 µA @ deep sleep mode
Power reduction in a wireless SoC also reduces switching noise, which enables high-performance operation in terms of sensitivity.
© T. Kuroda (80/99)
Architectural Level Optimization
Frequency planning
• Tradeoff in the selection of the intermediate frequency
Gain/noise budgeting
• Sensitivity level and maximum input level
• Requires fast gain control
© T. Kuroda (81/99)
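Block Diagram of RF/Analog
[Figure: receive path with LNA, poly-phase filter, BPF (6-bit tuning) and gain control using a feedback detector (Det1) and a feedforward detector (Det2), output to demod. Transmit path from TX data through a Gaussian filter to the PA. 2.4 GHz integer PLL with a 500 kHz reference derived from the 13 MHz system clock by a 1/26 divider. Control data from the baseband.]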
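The 1/26 divider turns the 13 MHz system clock into a 500 kHz reference, so an integer-N PLL can reach any frequency that is a multiple of 500 kHz. The sketch below computes the divide ratio for a Bluetooth channel; the optional LO offset of 1.5 MHz below the channel (low-side injection for a low-IF receiver) is an assumption made for illustration.

```python
SYSTEM_CLOCK_KHZ = 13_000
REF_KHZ = SYSTEM_CLOCK_KHZ // 26  # 500 kHz PLL reference

def divider_ratio(channel, if_offset_khz=0):
    """Integer-N divide ratio for Bluetooth channel n (2402 + n MHz).

    if_offset_khz shifts the LO below the channel frequency (assumed low-side LO).
    """
    f_lo_khz = (2402 + channel) * 1000 - if_offset_khz
    if f_lo_khz % REF_KHZ:
        raise ValueError("LO frequency is not a multiple of the reference")
    return f_lo_khz // REF_KHZ

if __name__ == "__main__":
    print("TX, channel 0:", divider_ratio(0))
    print("RX, channel 0, 1.5 MHz IF:", divider_ratio(0, if_offset_khz=1500))
    print("RX, channel 78, 1.5 MHz IF:", divider_ratio(78, if_offset_khz=1500))
```

Because 1 MHz channel spacing and a 1.5 MHz offset are both multiples of 500 kHz, every required LO frequency stays on the integer grid.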
© T. Kuroda (82/99)
Receiver Architecture
[Figure: LNA followed by a mixer; input fRF, local oscillator fLO, output fIF = fRF - fLO.]
• fIF = 0: Zero-IF (= direct conversion)
• fIF = ~10 MHz: Low-IF
© T. Kuroda (83/99)
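Image Frequency Problem
[Figure: LNA, image rejection filter and mixer with inputs fRF and fIM and local oscillator fLO; spectrum from 0 showing fIM, fLO and fRF.]
fIF = fRF - fLO = fLO - fIM
fRF and fIM fall on the same frequency when fRF - fLO = fLO - fIM, i.e. fIM = 2fLO - fRF.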
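A short numeric example makes the image relation concrete. It assumes a low-side LO (fLO below fRF) and uses an illustrative channel at 2441 MHz with a 1.5 MHz IF; these values are chosen for the example and are not taken from this slide.

```python
def image_frequency_mhz(f_rf_mhz, f_lo_mhz):
    """Image frequency that converts to the same IF as f_rf: f_im = 2*f_lo - f_rf."""
    return 2 * f_lo_mhz - f_rf_mhz

def intermediate_frequency_mhz(f_in_mhz, f_lo_mhz):
    """Magnitude of the IF produced by mixing f_in with f_lo."""
    return abs(f_in_mhz - f_lo_mhz)

if __name__ == "__main__":
    f_rf, f_if = 2441.0, 1.5
    f_lo = f_rf - f_if  # assumed low-side LO
    f_im = image_frequency_mhz(f_rf, f_lo)
    print(f"f_LO = {f_lo} MHz, f_IM = {f_im} MHz")
    print("IF of wanted signal:", intermediate_frequency_mhz(f_rf, f_lo), "MHz")
    print("IF of image signal: ", intermediate_frequency_mhz(f_im, f_lo), "MHz")
```

With a low IF the image sits only 2 x fIF away from the wanted channel, here 3 MHz, so it cannot be removed by an RF filter and has to be rejected after down-conversion.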
© T. Kuroda (84/99)
Tradeoffs in fIF Selection
                  Zero-IF   Low-IF   Non Low-IF
Area              ○         △        ×
Power             ○         △        ×
Sensitivity       △         △        ○
External Comp.    ○         ○        △
Carrier Leak.     ×         ○        ○
Image Rejection   ○         ×        △
© T. Kuroda (85/99)
Tradeoffs in Low-IF Receivers
                  Lower IF   Higher IF
Channel select    Easier     Harder
Image rejection   Harder     Easier
In our design, a 1.5 MHz IF is chosen so that
• the poly-phase filter for IR is one order lower than with a 1 MHz IF, and
• the channel select filter consumes 20% less power than with a 2 MHz IF.
© T. Kuroda (86/99)
Bluetooth PHY Specifications (Receive Mode)
• Max. input level: -20 dBm (22 mVrms)
• Min. input level: -70 dBm (71 µVrms)
• Thermal noise: -114 dBm (50 Ω, BW: 1 MHz)
• S/N = 44 dB at the receiver input; with receiver NF = 24 dB, S/N = 20 dB at the demodulator
• "Unwanted" is 100 times stronger than "Desired" in voltage scale (40 dB): unwanted at -27 dBm, 3 MHz away from the desired signal at -67 dBm
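The numbers on this slide follow from standard conversions, sketched below: power in dBm to rms voltage across 50 Ω, and the thermal noise floor from the usual -174 dBm/Hz figure at room temperature. The -174 dBm/Hz constant is a textbook value rather than something stated on the slide.

```python
import math

def dbm_to_vrms(p_dbm, r_ohm=50.0):
    """RMS voltage across r_ohm for a power level given in dBm."""
    p_watt = 10 ** (p_dbm / 10) / 1000
    return math.sqrt(p_watt * r_ohm)

def thermal_noise_dbm(bandwidth_hz):
    """Thermal noise power in dBm for the given bandwidth (-174 dBm/Hz at about 290 K)."""
    return -174.0 + 10 * math.log10(bandwidth_hz)

def snr_at_demod_db(signal_dbm, bandwidth_hz, noise_figure_db):
    """S/N after the receiver: input S/N minus the receiver noise figure."""
    return signal_dbm - thermal_noise_dbm(bandwidth_hz) - noise_figure_db

if __name__ == "__main__":
    print(f"-20 dBm = {dbm_to_vrms(-20) * 1e3:.1f} mVrms")
    print(f"-70 dBm = {dbm_to_vrms(-70) * 1e6:.1f} µVrms")
    print(f"noise floor in 1 MHz: {thermal_noise_dbm(1e6):.0f} dBm")
    print(f"S/N at demod: {snr_at_demod_db(-70, 1e6, 24):.0f} dB")
```

The same functions show why a 40 dB stronger interferer corresponds to a factor of 100 in voltage: the ratio of dbm_to_vrms(-27) to dbm_to_vrms(-67) is 100.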
© T. Kuroda (87/99)
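Bluetooth PHY Specifications (T/R Switch)
Fast freq. hopping & short preamble -> quick response required
• PLL lock-up
• RX gain control
• Bit synchronization
[Figure: TX and RX slots alternating every 625 µs while hopping over 79 channels between 2402 and 2480 MHz, with 366 µs marked inside a slot. Packet fields: preamble, sync word, trailer, header, payload. Labeled durations: 2 µs, 4 µs, 64 µs, 4 µs. Preamble pattern 1 0 1 0 ..., Δf = 315 kHz.]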
© T. Kuroda (88/99)
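RX Gain Control System
[Figure: the RF/analog block diagram with the gain control loop in focus: LNA, poly-phase filter, BPF (6-bit tuning), feedback detector (Det1) and feedforward detector (Det2), output to demod. Also shown: PA, Gaussian filter, TX data, 2.4 GHz integer PLL with 500 kHz reference (13 MHz system clock, 1/26 divider), control data.]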
© T. Kuroda (89/99)
Gain Control System in LNA
Loop BW
• Wide -> fast yet unstable
• Narrow -> slow yet stable
Solution
• Packet header: sample -> fast gain control
• Payload: hold -> stable gain
[Figure: LNA and MIX feeding the BPF; gain loop with DET1, comparator/charge pump with Vref1, and S/H. Timing control by baseband.]
© T. Kuroda (90/99)
Gain-controlled Received Signal
LNA gain convergence < 4 µs
[Figure: measured RF power waveform, 2 µs/div.]
© T. Kuroda (91/99)
Circuit Level Optimization
Process variation compensation
• Automatic tuning of filter transfer functions
Digital circuit-aided analog circuits
• Modulation index of direct-modulated VCO
• Low-noise PLL with fast lock-up time
© T. Kuroda (92/99)
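BPF Auto-tuning System
• No dummy filter or dummy VCO required
• Area overhead < 4%
• Feed-through free (cf. FLL)
• Stability analysis free (cf. FLL)
[Figure: reference from the crystal (1/13 divider) switched (SW) into the signal path of mixer, BPF and GCA; a digital block with PD and control logic, commanded from the baseband, adjusts the BPF bias.]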
© T. Kuroda (93/99)
Tuning BPF - before and after -
Δfcutoff < 5%
[Figure: simulated gain (dB, -70 to 30) versus frequency (100 kHz to 10 MHz) of the 4th-order Butterworth BPF, before and after tuning.]
© T. Kuroda (94/99)
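VCO Direct Modulation
[Figure: VCO (Vdd, Vbias, outputs LO+ and LO-) inside an integer-N PLL with Ref., PD, CP, loop filter, Div. and S/H under PLL loop control. TX data passes through a Gaussian LPF with mod. index control (Vref) and modulates the VCO directly.]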
© T. Kuroda (95/99)
TX Spectrum and Modulation Index
• No pulling @ TX power of 4 dBm
• ΔModulation index < 3%
[Figure: TX spectrum, power (dBm, -80 to 10) versus frequency offset (-2.0 to 2.0 MHz); baseband data PRBS9, RBW 30 kHz, VBW 30 kHz; -20 dB BW < 1 MHz.]
[Figure: frequency deviation (kHz, -500 to 500) versus time (0 to 4 µs).]
© T. Kuroda (96/99)
PLL Performance
• Fast lock-up
• Low phase noise
• Low spurious
[Figure: frequency (MHz, 2380 to 2520) versus time (0 to 120 µs) for a hop between 2402 MHz and 2480 MHz; lock-up within 85 µs.]
[Figure: phase noise (dBc/Hz, -150 to -40) versus frequency offset (1 kHz to 10 MHz); -130 dBc @ 3 MHz.]
© T. Kuroda (97/99)
Achieved System Performance
• Sensitivity: -80 dBm
• Max. input level: -5 dBm
• Interferer resistance: -3 dB @ -1 MHz, 11 dB @ co-channel, 0 dB @ +1 MHz, -33 dB @ +2 MHz, -43 dB @ +3 MHz
• IIP3: -19.5 dBm
• Max. output level: +4 dBm
• Image rejection ratio: > 28 dB
© T. Kuroda (98/99)
Current Consumption of the SoC
Mode         Packet                      Current
RX mode      1 slot packet (173 kbps)    28 mA
RX mode      3 slot packet (586 kbps)    43 mA
RX mode      5 slot packet (723 kbps)    49 mA
TX mode*     1 slot packet (173 kbps)    22 mA
TX mode*     3 slot packet (586 kbps)    28 mA
TX mode*     5 slot packet (723 kbps)    30 mA
Deep sleep   -                           10 µA
* TX power: 4 dBm
© T. Kuroda (99/99)
Summary
Power optimizations on different levels were reviewed.
System level optimizations:
• Deep Sleep Mode
• Adaptive Frequency Hopping
Architecture level optimizations:
• Receiver architecture
• Selection of intermediate frequency
• Gain control scheme for wide dynamic range
Circuit level optimizations:
• Extensive clock-gating
• Leakage reduction by using high-Vt transistors
• Interface power reduction by ROM integration
• Auto-tuning of BPF transfer function
• Direct modulation of VCO and modulation index compensation
• Variable CP current in PLL to achieve fast lock-up and low phase noise