![Page 1: Lecture 27+28 Final Project Presentationsbwrcs.eecs.berkeley.edu › Classes › icdesign › ee241_s06 › ... · 2006-05-16 · 1 EE241 - Spring 2006 Advanced Digital Integrated](https://reader036.vdocument.in/reader036/viewer/2022070804/5f036fa47e708231d4093154/html5/thumbnails/1.jpg)
EE241 - Spring 2006, Advanced Digital Integrated Circuits
Lecture 27+28: Final Project Presentations
Study of Subthreshold SRAM Operation using FinFETs
EE241 Final Project
Anupama Bowonder, Pratik Patel
Subthreshold SRAM Operation
Motivation behind Subthreshold SRAM Operation:
• Large area of the chip devoted to SRAM cache
• SRAM supply voltage needs to scale with logic supply voltage
• Reduce leakage by operating at the data retention voltage (DRV) during hold
• Reduce active power consumption
Impediments to Subthreshold SRAM Operation:
• Supply voltage scaling degrades cell stability
• Scaling increases sensitivity to process variations
– Variation-induced local asymmetry degrades SNM
– Spread in SNM over the whole SRAM array
• Impact of soft errors more significant at lower supply voltages
Previous Work
Leakage power reduction schemes:
• Offset the non-scalability of the supply voltage
• Cell supply reduced to DRV during standby (H. Qin et al., 2004)
• Body biasing to control Vt of bulk transistors during standby (K. Osada et al., ISSCC 2003)
(K. Itoh, Hitachi)
Previous Work (cont'd): Increasing Cell Stability
(L. Chang, 2005) (Z. Guo, 2005)
• Cell stability vs. area tradeoff
• Cell not disturbed during a read; cell area = 0.41 µm²
• Fin rotation to increase pull-down strength and improve SNM
• 14% SNM improvement with 13% area penalty
[Figure: butterfly curves plotted against Vsn2 (V)]
FinFETs for Subthreshold Operation
• Better subthreshold swing: improved Ion/Ioff in subthreshold
• Lower Ioff and hence lower leakage power
• Undoped channel: no random dopant fluctuation
• Back-gate Vt tuning
[Figure: double-gate FinFET symbol (GATE1, GATE2, S, D) and FDSOI cross-section (Gate1, Gate2, Source, Drain, Tsi, Tox)]
FinFET parameters:
• Gate length: 35 nm
• Gate oxide thickness: 20 Å
• Body thickness (Tsi): 20 nm
• Gate work function (φm): 4.5–5 eV
• Channel doping: 10^16 cm^-3
• Fin height: 30 nm
[Figure: Ion/Ioff at Vdd = 0.3 V (log scale, 10^1 to 10^5) vs. gate work function (4.4 to 5.2 eV) for NFINFET and PFINFET; arrows mark the NFIN and PFIN decreasing-Vt directions]
Standby Power of FinFET SRAM
Standby power = Vdd · I_leak

| Vdd (V) | Standby power |
|---|---|
| 1.0 | 7.493e-9 |
| 0.8 | 3.662e-9 |
| 0.5 | 1.002e-9 |
| 0.4 | 5.91e-10 |
| 0.3 | 3.189e-10 |

Leakage power reduction:
• From 1 V to 0.3 V: 95.7%
• From 1 V to 0.4 V: 92.1%
• From 1 V to 0.5 V: 86.6%
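The reduction percentages follow directly from the tabulated values. This short sketch copies the five table entries (units as on the slide, which does not state them), recomputes the reductions, and backs out the implied leakage current from P = Vdd · I_leak. The helper names are illustrative only.

```python
# Standby power vs. Vdd, copied from the FinFET SRAM table (units as on the slide).
STANDBY_POWER = {1.0: 7.493e-9, 0.8: 3.662e-9, 0.5: 1.002e-9, 0.4: 5.91e-10, 0.3: 3.189e-10}


def reduction(v_from: float, v_to: float) -> float:
    """Fractional standby-power reduction when Vdd is lowered from v_from to v_to."""
    return 1.0 - STANDBY_POWER[v_to] / STANDBY_POWER[v_from]


def implied_leakage(vdd: float) -> float:
    """Leakage current implied by P_standby = Vdd * I_leak."""
    return STANDBY_POWER[vdd] / vdd


for v in (0.5, 0.4, 0.3):
    print(f"1.0 V -> {v} V: {100 * reduction(1.0, v):.1f}% less standby power, "
          f"I_leak = {implied_leakage(v):.3e}")
```

The implied I_leak also falls with Vdd (from about 7.5e-9 at 1 V to about 1.1e-9 at 0.3 V), which is why standby power drops faster than Vdd alone.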
Data Retention Voltage of 6T FinFET SRAM
[Figure: hold butterfly curves, Vsn2 vs. Vsn1 (V), at Vdd = 0.5, 0.4, 0.3, 0.2, 0.1 V; DRV = 60 mV. Inset: cell (PL, PR, NL, NR) storing 1/0, with off-currents Ioffp and Ioffn]
Read Stability of 6T FinFET SRAM
[Figure: 6T SRAM SNM (mV) vs. Vdd (V), 0 to 0.6 V, for read stability (dynamic read, non-dynamic read) and hold stability]
[Figure: SNM (mV) vs. beta ratio (0 to 6); marked values 29, 33, 37 mV]
[Figure: read butterfly curve, Vsn1 (V) vs. Vsn2 (V), with WL and both bitlines at VDD; cell devices ACL, ACR, PL, PR, NL, NR]
Current has a linear dependence on size but an exponential dependence on Vt change
>2x
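The linear-vs-exponential remark can be illustrated with the standard subthreshold model I ∝ (W/L)·exp((V_GS − V_T)/(n·v_T)). The sketch assumes n = 1.5 and v_T = 26 mV (illustrative values, not from the slides) and compares how much W or Vt must change to double the current.

```python
import math

N_SLOPE = 1.5      # subthreshold slope factor n (assumed)
V_THERMAL = 0.026  # thermal voltage kT/q near room temperature (V)


def relative_current(width_ratio: float, delta_vt: float) -> float:
    """Subthreshold drain current relative to a reference device.

    width_ratio: W_new / W_ref (current is linear in W)
    delta_vt: V_T,new - V_T,ref in volts (current is exponential in -delta_vt)
    """
    return width_ratio * math.exp(-delta_vt / (N_SLOPE * V_THERMAL))


# Doubling the current by sizing needs 2x the width ...
print(relative_current(2.0, 0.0))
# ... but only a Vt reduction of n * vT * ln 2 (about 27 mV with these assumptions).
dvt_for_2x = N_SLOPE * V_THERMAL * math.log(2)
print(f"{1000 * dvt_for_2x:.1f} mV", relative_current(1.0, -dvt_for_2x))
```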
Cell Write Ability
• N-curve: one of the few metrics used to determine whether the cell is writeable (C. Wann et al., 2005)
• Writeable cell: positive current at the node being written
[Figure: cell write ability as a function of work function: I(vq) (A) vs. work function (4.55 to 4.8 eV), writeable region marked]
[Figure: N-curve, I(vq) (A) vs. V(q) (V), with WL at VDD, one bitline at VDD and the other at VSS; cell devices ACL, ACR, PL, PR, NL, NR]
Impact of Process Variation on Read Stability
3σ Lg = 3σ Tsi = 12% of Lg (ITRS 2005)
[Figure: curves for nominal, best Lg, worst Lg, smallest Tsi, and largest Tsi; log-scale y-axis 1e-10 to 1e-2, x-axis 0 to 1.2; annotations 96.6% and 90.1%]
Statistical Variations in SNM
[Figure: SNM distribution density vs. SNM (50 to 100 mV)]
Read SNM Monte Carlo: mean = 74 mV; sigma = 10 mV
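If the read SNM is treated as Gaussian (an assumption; the slide only reports the Monte Carlo mean and sigma), the k-sigma worst-case SNM and the fraction of cells below a threshold follow directly. The normal CDF is written with `math.erfc`.

```python
import math

SNM_MEAN_MV = 74.0   # Monte Carlo mean from the slide
SNM_SIGMA_MV = 10.0  # Monte Carlo standard deviation from the slide


def worst_case_snm(k_sigma: float) -> float:
    """SNM at mean - k*sigma, assuming a Gaussian distribution."""
    return SNM_MEAN_MV - k_sigma * SNM_SIGMA_MV


def fraction_below(threshold_mv: float) -> float:
    """P(SNM < threshold) for a Gaussian SNM: Phi(z) = 0.5 * erfc(-z / sqrt(2))."""
    z = (threshold_mv - SNM_MEAN_MV) / SNM_SIGMA_MV
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


print(worst_case_snm(3))     # 3-sigma corner in mV
print(fraction_below(44.0))  # the same point expressed as a cell fraction
```

Real SNM tails can deviate from Gaussian, so for large arrays the Monte Carlo histogram itself is the better guide.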
The project formerly known as
Design Considerations of Logic Operating in the IC=1 Regime
An EE241 Class Project by C. Marcu, M. Mark, J. A. Richmond, UC Berkeley, Spring 2006
Introduction / Motivation / Outline
• Strong and weak inversion have been extensively studied
• Moderate inversion might be a sweet spot for energy/delay trade-offs
• Modeling, sensitivities to parameter variations, optimization of a test circuit
• Is there actually any advantage to IC=1?
Modeling the Inverter
Based on EKV
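$$T_d = \frac{C \cdot V_{DD} \cdot k_{fit}}{IC \cdot I_S}$$

$$E_{OP} = \alpha \cdot C \cdot V_{DD}^2 + V_{DD} \cdot I_{leak} \cdot L_P \cdot T_d$$

$$V_{DD} = \frac{V_T + 2 n U_T \ln\left(e^{\sqrt{IC}} - 1\right)}{1 + \sigma}$$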
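A minimal sketch of the inverter model, assuming it takes the form T_d = C·V_DD·k_fit/(IC·I_S), E_OP = α·C·V_DD² + V_DD·I_leak·L_P·T_d, and V_DD = (V_T + 2nU_T·ln(e^√IC − 1))/(1 + σ), with σ a DIBL coefficient. All numeric values are hypothetical, and L_P is used only as a multiplier on T_d (read here, as an assumption, as the number of gate delays per operation).

```python
import math

U_T = 0.0259  # thermal voltage kT/q near 300 K (V)


def vdd_for_ic(ic: float, vt: float, n: float, sigma: float) -> float:
    """Supply voltage that places the gate at inversion coefficient `ic`.

    IC << 1 is weak inversion, IC ~ 1 moderate, IC >> 1 strong inversion.
    math.expm1(x) computes exp(x) - 1 accurately.
    """
    return (vt + 2.0 * n * U_T * math.log(math.expm1(math.sqrt(ic)))) / (1.0 + sigma)


def gate_delay(c: float, vdd: float, ic: float, i_s: float, k_fit: float) -> float:
    """T_d = C * V_DD * k_fit / (IC * I_S)."""
    return c * vdd * k_fit / (ic * i_s)


def energy_per_op(alpha: float, c: float, vdd: float, i_leak: float,
                  l_p: float, t_d: float) -> float:
    """E_OP = alpha * C * V_DD^2 (switching) + V_DD * I_leak * L_P * T_d (leakage)."""
    return alpha * c * vdd ** 2 + vdd * i_leak * l_p * t_d


# Hypothetical device and circuit values, for illustration only.
VT, N, SIGMA = 0.35, 1.4, 0.1
C, I_S, K_FIT = 1e-15, 1e-7, 1.0
ALPHA, I_LEAK, L_P = 0.1, 1e-9, 20

for ic in (0.1, 1.0, 10.0):
    vdd = vdd_for_ic(ic, VT, N, SIGMA)
    t_d = gate_delay(C, vdd, ic, I_S, K_FIT)
    e_op = energy_per_op(ALPHA, C, vdd, I_LEAK, L_P, t_d)
    print(f"IC={ic:5.1f}: VDD={vdd:.3f} V, Td={t_d:.2e} s, E_OP={e_op:.2e} J")
```

For large IC the logarithm tends to √IC, recovering the strong-inversion relation V_DD − V_T ≈ 2nU_T·√IC.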
Sensitivity to Parameter Variations
Optimization: Adder
Simple full-adder using NAND & INV only
Modeling of Complex Gates: NAND
Logical effort
Leakage of transistor stack
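For the NAND bullet, a common starting point is standard logical-effort theory (not taken from the slides): with a PMOS/NMOS mobility ratio of 2, an n-input NAND has logical effort g = (n + 2)/3, and a stage delay in units of τ is d = g·h + p. The slides do not give the project's fitted values, so the ratio and parasitic delay below are assumptions.

```python
def nand_logical_effort(n_inputs: int, mobility_ratio: float = 2.0) -> float:
    """Logical effort g of an n-input static CMOS NAND.

    The series NMOS stack is sized n x (unit width) to match an inverter's pull-down,
    and each parallel PMOS keeps the inverter's pull-up width (mobility_ratio x unit).
    Input capacitance is normalized to an inverter's (1 + mobility_ratio) x unit.
    """
    return (n_inputs + mobility_ratio) / (1.0 + mobility_ratio)


def stage_delay(g: float, electrical_effort: float, parasitic: float) -> float:
    """Normalized stage delay d = g * h + p (in units of tau)."""
    return g * electrical_effort + parasitic


# Example: 2-input NAND driving 4x its input capacitance, parasitic delay ~ 2 p_inv (assumed).
g2 = nand_logical_effort(2)
print(g2, stage_delay(g2, 4.0, 2.0))
```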
Optimization: Software
• Based on work done by Dejan Markovic
• Extended to moderate and weak inversion by use of our model
• Optimizes VDD, VT, and gate sizing to minimize energy for a given delay and activity factor
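The optimizer itself is not shown in the slides. As a rough illustration of the idea (minimize energy subject to a delay constraint), this sketch runs a brute-force grid search over VDD and VT using an EKV-style current that is smooth across weak, moderate, and strong inversion. Everything here is assumed: the constants, the energy model (switching plus leakage over a critical path of LOGIC_DEPTH gates), and the omission of gate sizing, which the actual tool also optimizes. It is not Markovic's formulation.

```python
import math

# Toy technology constants (all assumed, for illustration only).
U_T = 0.0259      # thermal voltage (V)
N = 1.4           # slope factor
I_S = 1e-6        # specific current of the driving device (A)
C = 1e-15         # switched capacitance per gate (F)
LOGIC_DEPTH = 20  # gates on the critical path
ALPHA = 0.1       # activity factor


def ekv_current(vgs: float, vt: float) -> float:
    """EKV-style drain current, smooth from weak to strong inversion."""
    return I_S * math.log1p(math.exp((vgs - vt) / (2 * N * U_T))) ** 2


def evaluate(vdd: float, vt: float) -> tuple[float, float]:
    """Return (critical-path delay, energy per operation) for one (VDD, VT) point."""
    t_d = C * vdd / ekv_current(vdd, vt)   # per-gate delay
    delay = LOGIC_DEPTH * t_d
    i_leak = ekv_current(0.0, vt)          # off-current per gate
    energy = LOGIC_DEPTH * (ALPHA * C * vdd ** 2 + vdd * i_leak * delay)
    return delay, energy


def optimize(delay_target: float):
    """Minimum energy subject to delay <= delay_target, or None if infeasible."""
    best = None
    for i in range(10, 121):        # VDD from 0.10 V to 1.20 V
        vdd = i / 100
        for j in range(0, 61):      # VT from 0.00 V to 0.60 V
            vt = j / 100
            delay, energy = evaluate(vdd, vt)
            if delay <= delay_target and (best is None or energy < best[0]):
                best = (energy, vdd, vt, delay)
    return best


for target in (1e-10, 1e-9, 1e-8):
    print(target, optimize(target))
```

A tighter delay target can only shrink the feasible set, so the minimum energy never decreases as the target tightens.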
Results: Minimum EOP vs. Delay
• Delay and energy normalized to the minimum delay and its corresponding maximum energy
• Significant energy savings within strong inversion
• Very little energy savings in moderate and weak inversion
• Higher potential for energy savings at lower activity factors
Results: Optimizing VDD, Vt, W
Results: What Knob to Turn?
Optimizing different parameters at a time
Conclusion
• Ultimate design goal tends to be minimizing energy for a given delay
• Our work provides the tools to optimize designs across all regions of operation
• Operating at IC=1 is therefore a result of the optimization, not a design target
• In fact, IC=1 is neither better nor worse than any other region for a digital designer, so don't be afraid of it
A Self-Timed, Tunable State Machine Using Low Energy Pass Transistor Library
Matthew Pierson, Salman Suharwardy, Electrical Engineering and Computer Science, UC Berkeley
[Figure: state machine: next state/output logic and state elements, with inputs and outputs]
Problem and Motivations (Prior Work)
• Increased focus on low energy logic and regular structures
• Recovering performance at low supply voltages requires threshold scaling
• Low leakage architecture needed so that leakage does not dominate
• Regular structures needed to cope with manufacturing variability
[Figure: configurable logic block (CLB): inputs, stack]
“Design Considerations in CLBs for Deep Sub-Micron Technologies,” Louis Alarcon and Octavian Florescu, EE241 Spring 2005.
Problem and Motivations (Our Work)
Until now, the CLB library has only been applied to a clocked data path.
• Two different clocks required
• Control logic not yet explored
Our goal: adapt this library to create a Self-Timed, Tunable State Machine
Why?
1. Explore applicability to control logic: a state machine is the most general form of control logic
2. Asynchronous gains: no need for two clock networks, which contribute heavily to energy dissipation; robustness to manufacturing variations
Solution: Start with a Self-Timed Pipeline
[Figure: self-timed pipeline stages i and i+1: C-elements, Start/Done handshake, Stack, sense amplifier SA, enables En and !En]
Evolved Into an Optimized Pipeline
[Figure: optimized pipeline: C-elements with Start_i, Done_i, Done_i-1, Stack_i, SA_i, En_i, !En_i+1, !En_i+2]
Adapting the Library
1. Cross-coupled weak NMOSes added to stack outputs
2. Must generate the Done signal in the stack. The stack is a low-swing network because of the NMOS pass transistors, so a delay line is really the only option.
[Figure: Done generation: Sense Amp Complete, Vdd + DeltaV, Done, Ground]
PMOS transistors give us full swing on the Done signal, which is necessary for correct C-element operation at low supply/threshold voltages.
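The C-elements that combine the Start and Done handshake signals have simple behavior: the output takes the common input value when both inputs agree and holds otherwise. The sketch below is a behavioral model only; it does not capture the transistor-level Done generation, the full-swing requirement, or timing.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element.

    The output switches to the common input value when both inputs agree and
    holds its previous value when they differ.
    """

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


# A four-phase handshake: the output rises only after both inputs rise,
# and falls only after both inputs fall.
c = CElement()
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)
```

The hold state is why a degraded (non-full-swing) Done input is dangerous: an intermediate level may be read as a disagreement, so the output never completes the handshake.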
Self-Timed State Machine
[Figure: pipeline stages (C-element, Stack, SA, En) feeding a state element with StartS, DoneS, StackS, SAS, En, and OutS; Inputs and InputsDone]
Results from Self-Timed Machine
But what if we don't want to go so slow at a 300 mV supply voltage…
Threshold Voltage Scaling
The original clocked library implemented body-bias threshold scaling and artificial threshold scaling:
• Body bias: only works for about 50 mV of threshold scaling before the body-source junction current becomes noticeable.
• Artificial threshold scaling: shifting up the output of the sense amplifier by DeltaV. Dynamic power remains the same.
[Figure: SA output swinging from 0 to Vdd + DeltaV]
Making it work for the asynchronous version:
[Figure: C-element, Stack, SA_i, En_i; En/Done swing from 0 to Vdd + DV, taken from an internal node; delay line on RootVdd; output swing from 0 to Vdd]
Tuned Energy Range
Tuned Delay Range
Questions?
UC Berkeley EE241 Term Project, Spring 2006
Platform-Based SRAM Standby Power Minimization
Xuening Sun, Tsung-Te Liu, David Chen
EE241 – Project
Outline
Motivation
Prior Work
Global Optimization
Conclusion
Scaling, Scaling, Scaling … (Motivation)
[Figure: SRAM scaling issues: noise, leakage, variability, velocity saturation]
“Where is the limit?”
Standby Power Optimization (Motivation)
[Figure: tuning Wp, Lp, Wn, Ln, Vsbp, Vsbn under access time, circuit area, and noise margin constraints ⇒ minimum power at standby]
DRV Measurement and Parametric Analysis (Prior Work)
[Figure: measured DRV mean (mV) vs. Vpb (V), chip 1, for Vnb = −0.4, −0.2, 0, 0.2, 0.4 V]
[Figure: DRV mean (mV) vs. Vnb (V) for Vpb = −0.4, −0.2, 0, 0.2, 0.4 V]
[Figure: DRV mean + 5·std (mV) vs. Lp (µm) for Ln = 0.1, 0.15, 0.2, 0.25, 0.3, and vs. Ln (µm) for Lp = 0.1, 0.15, 0.2, 0.25, 0.3]
[Figure: DRV mean + 5·std (mV) vs. Wp (µm) for Wn = 0.325, 0.27, 0.215, 0.16, 0.105, and vs. Wn (µm) for Wp = 0.2, 0.3, 0.4, 0.5]
Courtesy of Huifang Qin, 90nm SRAM Test Chip: DRV and Leakage Measurement, 03/02/2006
Design Platform (Global Optimization)
[Figure: circuit-level and system-level design platform]

$$DRV = a_1 \log\left( \frac{\left(1 + \frac{\beta_1}{l_n}\right) w_p + k_1}{\alpha_1\, e^{\gamma_1 (-2\phi_1 + v_{sbn})}\, e^{\gamma_2 (-2\phi_2 - v_{sbp})} + \alpha_1 \left( \left(1 + \frac{\beta_2}{l_p}\right) w_p + k_2 \right) e^{\gamma_3 (-2\phi_3 - v_{sbp})}\, e^{\gamma_4 (-2\phi_4 + v_{sbn})}} \right) \quad [3]$$

$$I_D = i_S \exp\left(\frac{v_{GS} - V_T}{n v_{th}}\right) \left(1 - \exp\left(\frac{-V_{DD}}{v_{th}}\right)\right) \quad [2]$$
tread = KSW + CBL0 + KBLSW
Kread ASW + B SW3
SN2 − C SW
2
SN− 1
2
⎛ ⎝ ⎜
⎞ ⎠ ⎟
[1]
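$$\text{Area} = \sum_i (W_i \times L_i), \qquad \text{Power} = DRV \times \sum_j I_{Dj}$$

$$\text{Performance} = f\left[\left(\tfrac{W}{L}\right)_N, \left(\tfrac{W}{L}\right)_A, v_{sbn}\right]$$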
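The power objective, Power = DRV × Σ I_Dj, can be sketched by evaluating the subthreshold current I_D = i_S·exp((v_GS − V_T)/(n·v_th))·(1 − exp(−V_DD/v_th)) for every device that is off in hold, with V_DD set to the DRV. The device parameters and the choice of which three transistors leak are hypothetical, and the project's fitted DRV model is not used here.

```python
import math

V_TH = 0.0259  # thermal voltage v_th near 300 K (V)


def subthreshold_current(i_s: float, v_gs: float, v_t: float, n: float, v_dd: float) -> float:
    """I_D = i_S * exp((v_GS - V_T) / (n v_th)) * (1 - exp(-V_DD / v_th))."""
    return i_s * math.exp((v_gs - v_t) / (n * V_TH)) * (1.0 - math.exp(-v_dd / V_TH))


def standby_power(drv: float, off_devices: list[tuple[float, float, float]]) -> float:
    """Power = DRV * sum_j I_Dj, with each off device at v_GS = 0 and V_DS = DRV.

    off_devices holds (i_S, V_T, n) for every transistor that leaks in hold.
    """
    return drv * sum(subthreshold_current(i_s, 0.0, v_t, n, drv) for i_s, v_t, n in off_devices)


# Hypothetical off devices of one cell in hold: pull-up, pull-down, access.
CELL = [(1e-7, 0.30, 1.5), (2e-7, 0.28, 1.4), (1e-7, 0.30, 1.4)]
for drv in (0.035, 0.1, 0.2):
    print(f"DRV = {drv:.3f} V -> standby power = {standby_power(drv, CELL):.3e} W")
```

The last factor in I_D matters at very low DRV: at V_DD = 3·v_th it is only about 0.95, so the leakage current itself starts to shrink on top of the linear DRV factor.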
Program Demo (Global Optimization)
[Figure: optimization program demo over Wp, Lp, Wn, Ln, Vsbp, Vsbn]
Global Optimization: DRV
Lower delay costly;
P/NMOS asymmetry⇒ higher DRV;
Smaller area possible;
Lower sensitivity at looser constraints;
Global minimum DRV:35mV (74% reduction)@ Wn/Ln=0.1u/186u,
Wp/Lp=354u/186u,Vsbn=0.4V, Vsbp=−0.4V.
DRV Sensitivity: Standard-Cell (Global Optimization)
Global Optimization: Power
Power and DRV require different sizing;
Performance and area equally influential;
Insensitive to area beyond standard-cell size (next page);
Global minimum power: 0.3nW (70% reduction)@ Wn/Ln=0.1u/0.1u,
Wp/Lp=0.2u/0.1u,Vsbn=Vsbp=0.
Power Sensitivity: Standard-Cell (Global Optimization)
Conclusion (and Beyond)
A platform-based SRAM standby power optimization method is presented;
Global optimization yields lower DRV and standby power than single-dimensional parametric search;
Minimum power and minimum DRV require different sizing;
Sizing for globally minimum power exists with 70% power reduction, 5% area reduction, but 80% performance loss;
Presented method can be scaled to incorporate more design variables such as access transistor size and bit-line pre-charge;
Presented method can be scaled to incorporate process variation, where statistics-based modeling is a key component (preliminary analysis in the final report).
Acknowledgement
Special thanks to Huifang Qin for generously providing test data and documents, and for technical discussion and evaluation;
Thanks to Rakesh Vattikonda and Yu Cao for providing statistical models;
Thanks to Dejan Markovic for early discussion on the scope of work and help with MATLAB;
Thanks to Yanmei Li for discussion on platform-based design.
Key references:
[1] H. Qin, Y. Cao, D. Markovic, A. Vladimirescu, and J. Rabaey, “SRAM Leakage Suppression by Minimizing Standby Supply Voltage,” ISQED, 2004
[2] B. H. Calhoun and A. Chandrakasan, “Analyzing Static Noise Margin for Sub-threshold SRAM in 65nm CMOS,” ESSCIRC, 2005
[3] R. Vattikonda, “Modeling DRV of a SRAM Cell Under Variations,” Research Notes, April 2006
Appendix
More on Statistic-Based Optimization
[Flowchart: start; DRV_MAX conditions and DRV_MEAN model; optimal sizing for minimum DRV_MEAN; update DRV model; optimal sizing for minimum DRV_MAX; match? Yes: end; No: iterate]
“How about power…?”
1st-Order Verification
By fixing W and L to standard size, the presented optimization platform reduces to a single-dimensional parametric sweep over Vpb and Vnb.
dots: measured DRV from the 90nm test chip;
line: parametric sweep using the reduced platform;
color code: same color (dots and line) represents same bias condition.
15
Ultra Low Power Clock Generation using Sub-Threshold MCML
Asako Toda • Anurag Pandit • Khang An Tran
EE241 FALL Final Project Presentation
MCML in sub-Vth
Good
• Leakage
• Ultra Low Power
• Robust
• Easy
Bad
• Slow?
• Big?
• Model?
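Current Mode Logic (CML) in Sub-Vth
[Figure: sub-Vth CML gate with replica bias; tail current Iss ~ nA (10 nA shown); load ~10 MΩ (20 MΩ shown); swing ~100 mV; output levels L = VDD − 200 mV, H = VDD]
Input – Output
[Figure: out+, out− vs. ΔVin in sub-Vth; switching threshold ΔVin_th = 3 × nVt = 117 mV, compared with Vdsat and 3 × Vt]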
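The weak-inversion differential-pair relation ΔI = I_ss·tanh(ΔV_in/(2nU_T)) is a textbook result (not stated on the slides) that explains why about 3·n·U_T of input is needed to switch the gate fully. The sketch uses the slide's I_ss = 10 nA and 20 MΩ load, with n = 1.5 and U_T = 26 mV assumed as one choice consistent with 3·n·U_T = 117 mV.

```python
import math

U_T = 0.026    # thermal voltage (V), approx. room temperature (assumed)
N = 1.5        # slope factor, chosen so that 3 * N * U_T = 117 mV
I_SS = 10e-9   # tail current from the slide (A)
R_LOAD = 20e6  # load resistance from the slide (ohm)


def steered_fraction(delta_vin: float) -> float:
    """(I1 - I2) / I_SS for a differential pair in weak inversion."""
    return math.tanh(delta_vin / (2 * N * U_T))


def output_swing() -> float:
    """Output swing when all of I_SS is steered into one load: I_SS * R_LOAD."""
    return I_SS * R_LOAD


vin_th = 3 * N * U_T
print(f"swing = {output_swing():.3f} V, dVin_th = {1000 * vin_th:.0f} mV, "
      f"tail current steered at dVin_th = {100 * steered_fraction(vin_th):.1f}%")
```

With these numbers about 90% of the tail current is steered at the 117 mV threshold, and the 200 mV swing matches the VDD − 200 mV low level on the slide.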
Input – Output Model
[Figure: simulated out+, out− vs. ΔVin; high level VDD − 27 mV]
[Figure: model of the output current Iout vs. ΔV, asking whether Iss is saturated: Iout = 0 at ΔV = 0, and Iout = Iss = 1/a when ΔV is large]
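Input – Output Model Verification (Mathematica): Yes!
[Figure: model vs. simulation; output high level VDD − b × Iss]
Variation & Mismatch
• a ~ 1 / Iss
• a, b ~ Lp, 1/Wp
[Figure: sources of variation labeled 1, 2, 3: PMOS load, NMOS input, current source; Iss vs. source voltage from 0 to VDD]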
Variation
Sensitivity of the input-output model (example: NMOS and PMOS at minimum size; worst case):

| Parameter | Sensitivity | ΔVod (10% variation) |
|---|---|---|
| Iss | 0.4 | 40 mV |
| a | 0.3 | 30 mV |
| b | 0.13 | 13 mV |

Constraint: Vin > Vin_th = 3 nVt, with Vin = Vin_th + margin + ΔVod
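The table implies ΔVod = sensitivity × fractional variation (0.4 × 10% = 40 mV). The sketch turns the constraint Vin = Vin_th + margin + ΔVod into a required input level. Summing the three worst-case shifts linearly is an assumption (the slide does not say how they combine), and the 20 mV margin is hypothetical.

```python
# Sensitivities from the variation table: output-voltage shift (V) per unit fractional change.
SENSITIVITY = {"Iss": 0.4, "a": 0.3, "b": 0.13}
VIN_TH = 0.117  # switching threshold 3 * n * Vt from the slides (V)


def delta_vod(param: str, variation: float = 0.10) -> float:
    """Output shift (V) caused by a fractional `variation` in one model parameter."""
    return SENSITIVITY[param] * variation


def required_vin(margin: float, variation: float = 0.10) -> float:
    """Vin = Vin_th + margin + Delta Vod, with Delta Vod taken as the linear
    worst-case sum over all three parameters (assumption)."""
    return VIN_TH + margin + sum(delta_vod(p, variation) for p in SENSITIVITY)


for p in SENSITIVITY:
    print(p, f"{1000 * delta_vod(p):.0f} mV")
print(f"required Vin with 20 mV margin: {1000 * required_vin(0.020):.0f} mV")
```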
Matching
Constraint: Vos << Vos_limit
[Figure: offset Vos relative to the output high level VDD − Δu, with limit Vos_limit]
Matching, Worst Case
Pair ratio: r << 10%
[Figure: Vos vs. pair ratio r, keeping Vos << Vos_limit; marked levels 50 mV and 10 mV]
PDP: Sub-Vth MCML vs. Static CMOS
[Figure: delay, power, PDP, and EDP vs. N (number of stages) for sub-Vth MCML and static CMOS; x-axis annotations 1 and 40f; sub-Vth MCML td = 1.59n (tp0), 3.10n, 7.56n s; static CMOS td = 30p (tp0), 47p, 34p s]
[Figure: PDP_MCML vs. PDP_INV at VDD = 0.36 V]
Frequency Divider (VDD = 0.5 V, Freq = 20 MHz)
Fujishima et al., JSSC, 1993
MCML type: 7 fJ/cycle; static CMOS: 5 fJ/cycle
Summary
Good
• PDP in ultra low voltage
• Robust in matching, noise, and EM
• Easy
Bad
• Slow in middle low power
• Variation
• Cascading
• Big
Analysis of Razor
Timothy Loo, Vincent Ng, University of California, Berkeley, EE241 Spring 2006, 5/8/2006
Motivation: always-correct logic is not efficient
D. Ernst, N. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation,” MICRO-36, December 2003
Solution: Razor, an error correction mechanism
Challenges
• Optimal supply voltage strongly dependent on the program and input data
• Hold time increased by the delay of the delayed clock
• Meeting both the setup time constraint and the hold time constraint is a challenge
• A strong tradeoff between potential benefit and buffers needed
• Need to analyze every path in the logic
• Will the benefit increase or decrease with increased process variation?
Analyzing the Problem
• Add model corners using the EE240 and ST technology models.
• Implement and verify a pipeline to simulate data propagation through an adder, a regular flip-flop, a multiplier, and a Razor flip-flop.
• Simulate the circuit to determine the longest and shortest paths and see how they affect Razor operation.
Multiplier Path Variation: Simulated vs. Extracted Data
[Figure: % change from TT vs. technology (90, 65, 45 nm) for corners SS-ST, FF-ST, SS-240, FF-240]
Shortest-Longest Path Variation
[Figure: % variation vs. technology (90, 65, 45 nm)]
If the Razor shadow latch detects errors half a clock cycle later, the clock period can be 2/3 × Longest Path instead of 1 × Longest Path!
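The 2/3 figure follows from requiring the longest path to settle before the shadow latch samples, i.e. L ≤ T + T/2. This sketch computes that period and an assumed short-path bound: since hold time grows by the shadow-clock delay, the shortest path must exceed T/2. Setup and hold times are ignored, the path length is hypothetical, and the sketch does not attempt to reproduce the project's "Shortest Path > Longest Path / 6" boundary.

```python
def razor_min_period(longest_path: float, shadow_fraction: float = 0.5) -> float:
    """Smallest clock period T with longest_path <= T * (1 + shadow_fraction).

    shadow_fraction is the shadow-latch sampling delay as a fraction of T
    (0.5 = half a clock cycle, as on the slide).
    """
    return longest_path / (1.0 + shadow_fraction)


def min_shortest_path(period: float, shadow_fraction: float = 0.5) -> float:
    """Assumed short-path bound: new data must not reach the shadow latch before it
    has captured the previous value, so the shortest path must exceed its delay."""
    return shadow_fraction * period


longest = 3.0  # ns, hypothetical
t = razor_min_period(longest)
print(t, t / longest, min_shortest_path(t))
```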
Effects of Variation on Razor Efficiency
[Figure: clock period divided into thirds (1/3, 1/3, 1/3)]
Limitation
Shortest Path Constraint may necessitate the use of delay buffers to prevent data corruption
Additional Power Use
Two cases:
• Logic can be modified such that the shortest path is increased without increasing the longest path
• Add buffers in the logic to increase the shortest path
Energy Penalty
• 90 nm: 124 fJ
• 65 nm: 245 fJ
• 45 nm: 546 fJ
90nm Total Power Simulation
• Energy w/o Razor: 0.9 pJ
• Energy w/ Razor, w/o overhead: 0.7 pJ
• Energy w/ Razor, w/ overhead: 1.3 pJ
Note: overhead may be amortized with larger logic blocks
Additional Clock Period
Worst Case:
Logic cannot be modified internally, therefore adding buffers increases both shortest and longest path delay. Clock Period Savings are reduced.
Boundary Requirement:
Shortest Path > Longest Path / 6.
Otherwise, the adjusted clock period (to satisfy the shortest path constraint) will be longer than the Longest Path!
Razor Efficiency
None of the 90, 65, and 45 nm models passes the shortest-path requirement.

| Tech (nm) | % Increase | % Over Original Clock Period |
|---|---|---|
| 90 | 53% | 2.2% |
| 65 | 52% | 1.3% |
| 45 | 68% | 12.0% |

Setup Time Variations
[Figure: setup times vs. technology (90, 65, 45 nm), as % of total clock period, for the main FF and the shadow FF]
38
Conclusion
• The Razor topology is susceptible to short-path signals corrupting the shadow latch.
• Variations increase dramatically from 65 nm to 45 nm technology.
• The benefits of Razor decrease as variations increase.
Soft Error Tolerant Logic Circuit Design
Mohammad Amin Arbabian Debopriyo Chowdhury
39
What are soft errors?
Transient faults caused by high energy particles
Sources: Alpha particles from packaging material, thermal and fast neutrons from cosmic rays
[K.J. Hass et al., MWSCAS 1999]
Should we really care about SEE?
What really limits scaling? Wallmark [1962]: Power and SER
C↓, V↓ ⇒ Q↓ ⇒ SER↑
Source: P. Shivakumar, IEEE 2002; N. J. Wang, UIUC
Upshot: Soft error rates per SRAM or latch bit grow slowly with scaling
But: The number of bits grows with Moore’s Law!
Caution: custom, ASIC, and FPGA designs that push the density envelope
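A first-order way to read the C↓ V↓ ⇒ Q↓ ⇒ SER↑ chain: the sketch assumes the widely used empirical form SER ∝ flux × area × exp(−Q_crit/Q_s) and approximates Q_crit as C·V. All parameter values are hypothetical, and the sensitive area is held fixed here, although in practice it shrinks with scaling and partly offsets the per-bit increase.

```python
import math


def per_bit_ser(c_node, vdd, q_s, flux=1.0, area=1.0):
    """Relative per-bit SER, assuming SER ~ flux * area * exp(-Qcrit / Qs)
    and the first-order estimate Qcrit ~ C * V."""
    q_crit = c_node * vdd
    return flux * area * math.exp(-q_crit / q_s)


def chip_ser(n_bits, c_node, vdd, q_s):
    """Chip-level rate: independent bits add, so total SER scales with bit count."""
    return n_bits * per_bit_ser(c_node, vdd, q_s)


q_s = 0.5e-15                           # charge-collection slope in coulombs (hypothetical)
old = chip_ser(1e6, 2.0e-15, 1.2, q_s)  # hypothetical older node
new = chip_ser(4e6, 1.4e-15, 1.0, q_s)  # smaller C and V, 4x more bits
print(new / old)
```

Both effects from the slide show up: lower Q_crit raises the per-bit rate, and the bit count multiplies it at the chip level.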
40
Soft Errors in Logic Circuits
Single event transient (SET) vs upset (SEU)
[Figure: an SET in combinational circuits propagating toward a flip-flop (SET vs. SEU), with three masking mechanisms: electrical masking, logical masking, and latching-window masking]
Source: Krishnamohan et al., IEEE 2004
Masking is becoming ineffective in nanometer circuits
Available circuit techniques:
• Triple modular redundancy; partial error masking / cluster sharing to decrease the area overhead
• Selective node engineering: adding capacitance, adding cross-coupled inverters, sizing
• Using timing slacks to resample data
• Conventional hardened latch
Associated issues: huge area and power overhead; large performance overhead; complex clocking scheme; SET protection
41
Modeling of Soft Errors
Circuit response depends not only on the deposited charge but also on the amplitude, duration, and shape of the current pulse, so accurate models are essential for reliable results.
Simulation done with an inverter (NMOS: 0.51 µm/90 nm, PMOS: 0.88 µm/90 nm)
Total incident charge: 15 fC in both cases (1 mA for 15 ps; 0.1 mA for 150 ps)
Same charge, yet one pulse flips the bit and the other does not!
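Modeling Neutron Strike
Fast-rise, gradual-fall current waveforms: trapezoid, double exponential, or a mix of linear and exponential.
Our choice:
$$I(t) = \frac{2Q}{T\sqrt{\pi}}\sqrt{\frac{t}{T}}\,\exp\left(-\frac{t}{T}\right)$$
where Q is the injected charge and T is fitted for the 90 nm process; the pulse is simulated with a piecewise-linear source in Cadence.
[Plot: example pulse with charge 10 fC and peak current ~300 µA over ~70 ps]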
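The pulse shape above can be checked numerically. This sketch assumes the reconstructed form of I(t), which integrates to Q, and treats the slide's 10 fC and ~300 µA as the charge and peak of one such pulse; the helper names are hypothetical. Setting dI/dt = 0 puts the peak at t = T/2, so T can be solved from the desired peak current.

```python
import math

SQRT_2_OVER_PI_E = math.sqrt(2.0 / (math.pi * math.e))


def strike_current(t, q, tau):
    """I(t) = 2Q / (tau*sqrt(pi)) * sqrt(t/tau) * exp(-t/tau) for t > 0."""
    if t <= 0.0:
        return 0.0
    return 2.0 * q / (tau * math.sqrt(math.pi)) * math.sqrt(t / tau) * math.exp(-t / tau)


def peak_current(q, tau):
    """Maximum of I(t), reached at t = tau / 2."""
    return SQRT_2_OVER_PI_E * q / tau


def tau_for_peak(q, i_peak):
    """Time constant that gives the requested peak current for charge q."""
    return SQRT_2_OVER_PI_E * q / i_peak


def collected_charge(q, tau, t_end, steps=20_000):
    """Midpoint-rule integral of I(t) from 0 to t_end (should approach q)."""
    dt = t_end / steps
    return dt * sum(strike_current((k + 0.5) * dt, q, tau) for k in range(steps))


q = 10e-15                     # 10 fC
tau = tau_for_peak(q, 300e-6)  # ~300 uA peak
print(f"tau = {tau * 1e12:.1f} ps, charge = {collected_charge(q, tau, 20 * tau) * 1e15:.3f} fC")
```

With these inputs T comes out at roughly 16 ps; a piecewise-linear source in the simulator can then approximate the resulting curve.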
42
Our solution:
• SETs hurt only if they are captured by the latch.
• SEU protection of the latch is critical.
• Aggressive pipelining means less logic and more flip-flops.
• If we can have a latch that filters the transients, why slow down all the nodes?
Concept of an error-filtering latch: design two new SET- and SEU-tolerant latches.
Proposed Latch 1
[Schematic: C-element-based latch with a DCDE; labels Y, IN(d), CLKBAR, MN1–MN3, MP1–MP3]
• Adaptive SET protection and inherent SEU protection
• Very small area/power overhead
• Area overhead: 3–5% including the datapath
• Power overhead: ×1.08
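The error-filtering idea can be illustrated behaviorally. The sketch assumes, for illustration only, that the C-element compares the data input with a copy delayed by the DCDE, so it changes state only when both agree. The trace, the 1 ps time step, and the delay settings are hypothetical, and this is not the transistor-level latch in the schematic.

```python
def c_element(a, b, previous):
    """Muller C-element: follow the inputs when they agree, otherwise hold."""
    return a if a == b else previous


def filter_transients(signal, delay_steps, initial=0):
    """Drive a C-element with the signal and a copy delayed by delay_steps."""
    output, state = [], initial
    for t, a in enumerate(signal):
        b = signal[t - delay_steps] if t >= delay_steps else initial
        state = c_element(a, b, state)
        output.append(state)
    return output


# Hypothetical 1 ps steps: a 20 ps SET glitch, then a genuine 0 -> 1 transition at t = 70 ps.
trace = [0] * 5 + [1] * 20 + [0] * 45 + [1] * 50
filtered = filter_transients(trace, delay_steps=30)    # delay longer than the glitch
unfiltered = filter_transients(trace, delay_steps=10)  # delay shorter than the glitch
print(max(filtered[:100]), filtered[100], max(unfiltered[:30]))  # 0 1 1
```

Pulses shorter than the delay never reach the output, while a genuine transition passes after that delay. That added latency is the cost behind the delay versus Q_CRIT tradeoff for Latch 1.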
43
Adaptive Delay Control Unit
The graph shows the tradeoff between delay and Q_CRIT for Latch 1.
A four-bit digitally controlled delay element (DCDE) provides a delay of 30–65 ps in steps of 2–5 ps.
Possibility of Adaptive Soft Error Protection at run time
[Figure: Q_CRIT vs. delay for Latch 1; Q_critical (fC) vs. delay (in FO4 inverter delays, 0–2.5)]
Proposed Latch 2
• Concept of feedback used
• Latch does not respond to transient spikes
• Less delay overhead, with some power penalty
• Power overhead: ×1.79
44
Simulation Results (1)
Statistical (Monte Carlo-type) simulation with random charge injection, based on stochastic modeling of neutron energy transfer. The latch is tested with a 4-bit ripple-carry mirror adder.
[Figure: probability of failure (%) vs. node number (1–12), nominal vs. protected (Scheme 1)]
[Figure: probability of failure (%) vs. node number (1–5) for varying charge (fC)]
Total of 50,000 random charges
Average SER protection:
• 212% improvement in critical charge
• 57 times lower probability of failure
Simulation Results (2)
[Figure: probability of failure (%) vs. node number (1–10), nominal vs. protected]
[Figure: probability of failure (%) vs. node number (1–5)]
Average SER protection:
• 239% improvement in critical charge
• 79 times lower probability of failure
Similar reliability improvements were obtained using some ISCAS benchmark circuits as the datapath.
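Here is a toy version of the Monte Carlo procedure: draw random deposited charges, count a failure whenever the charge exceeds the node's critical charge, and compare a nominal and a protected Q_crit. The exponential charge distribution, its 5 fC mean, and the Q_crit values are assumptions for illustration. The slides do not state them, so the resulting ratio need not match the reported 57× or 79×.

```python
import math
import random


def failure_probability(q_crit, n_strikes=50_000, mean_charge=5.0, seed=1):
    """Fraction of random strikes whose charge (fC) exceeds q_crit (fC).

    Charges follow an exponential distribution with the given mean, an
    assumed stand-in for the stochastic neutron energy-transfer model.
    """
    rng = random.Random(seed)
    failures = sum(rng.expovariate(1.0 / mean_charge) > q_crit for _ in range(n_strikes))
    return failures / n_strikes


nominal_qcrit = 10.0                          # fC, hypothetical
protected_qcrit = nominal_qcrit * (1 + 2.12)  # a 212% improvement, for illustration
p_nominal = failure_probability(nominal_qcrit)
p_protected = failure_probability(protected_qcrit)
print(p_nominal, math.exp(-nominal_qcrit / 5.0))  # estimate vs. exponential tail
print(p_nominal / p_protected)
```

Using the same seed for both runs means both estimates see identical charge draws, so the comparison isolates the effect of Q_crit.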
45
What comes next?
• Scaling into sub-50 nm technology might be limited by soft errors if proper protection is not taken for logic circuits.
• The proposed latch can be combined with node-engineering techniques to yield enhanced protection for critical applications.
• Soft-error sensing and adaptive protection schemes are attractive.
Impact of Logic Styles on a Viterbi Decoder in Respect of Performance and Robustness
Ji-Hoon Park, Seung-Bum Suh
46
Topics and Contents
Topics
I. Design of Add-Compare-Select Unit with LVS Logic
II. Performance and Robustness Comparison with Different Logic Styles
Contents
I. Viterbi Decoder
II. Simulation Results – Performance and Robustness
III. Analysis – Delay and Variability
IV. Conclusion
Viterbi Decoder
[Trellis diagram: states i and j at stage n (state metrics SM_i^n, SM_j^n) connect to states k1 and k2 at stage n+1 (SM_{k1}^{n+1}, SM_{k2}^{n+1}) through branch metrics BM_{i,k1}^n, BM_{j,k1}^n, BM_{i,k2}^n, BM_{j,k2}^n]
$$SM_k^{n+1} = \min\left\{ SM_i^n + BM_{i,k}^n,\; SM_j^n + BM_{j,k}^n \right\}$$
• The ACS unit is a critical block in the Viterbi decoder.
• The adder is a critical block in the ACS unit.
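The ACS recursion above maps directly onto an add, a compare, and a select, which is why the adder sits on the critical path. Below is a minimal sketch of one radix-2 butterfly. It assumes that smaller metrics are better and that ties go to predecessor i; both are illustrative choices.

```python
def acs(sm_i, sm_j, bm_ik, bm_jk):
    """One add-compare-select step for state k at stage n+1.

    Returns (new state metric, survivor), where survivor 0 means the path
    from state i won and 1 means the path from state j won.
    """
    cand_i = sm_i + bm_ik   # add
    cand_j = sm_j + bm_jk
    if cand_i <= cand_j:    # compare (ties go to i, an arbitrary choice)
        return cand_i, 0    # select
    return cand_j, 1


def acs_butterfly(sm_i, sm_j, bm_ik1, bm_jk1, bm_ik2, bm_jk2):
    """Radix-2 butterfly: states i and j both feed states k1 and k2."""
    return acs(sm_i, sm_j, bm_ik1, bm_jk1), acs(sm_i, sm_j, bm_ik2, bm_jk2)


print(acs_butterfly(3, 5, 1, 0, 2, 1))  # ((4, 0), (5, 0))
```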
47
Performance of ACSs and Adders
I. LVS (Low-Voltage Swing)
II. Dynamic Manchester
III. Static Manchester
Delay (adder) = 79.81 ps
Delay (ACS) = 148.41 ps
Carry Implementations
Robustness of Adders
LVS – Most Robust
• Clocking of the sense amplifier
• Sampling level of the pass-transistor output
48
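Analysis – Concepts
• Delay expression (linear approximation):
$$t_p = \int_{v_2}^{v_1} \frac{C_L(v)}{i(v)}\,dv \cong \frac{C_L\,(v_1 - v_2)}{4}\left(\frac{1}{i_1} + \frac{1}{i_2}\right)$$
• Variability expressions:
$$\therefore\ \frac{\partial t_p}{\partial V_T} \cong -\frac{C_L\,(v_1 - v_2)}{4}\left(\frac{1}{i_1^2}\frac{\partial i_1}{\partial V_T} + \frac{1}{i_2^2}\frac{\partial i_2}{\partial V_T}\right)$$
$$\therefore\ \frac{\partial t_p}{\partial V_{DD}} \cong -\frac{C_L\,(v_1 - v_2)}{4}\left(\frac{1}{i_1^2}\frac{\partial i_1}{\partial V_{DD}} + \frac{1}{i_2^2}\frac{\partial i_2}{\partial V_{DD}}\right)$$
[Figure: current vs. voltage, with the sample points (v_1, i_1) and (v_2, i_2) used in the linear approximation]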
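The variability expressions are the derivatives of the two-point delay approximation, so they can be checked numerically. The sketch makes several assumptions. It models a long-channel NMOS discharging the output, using square-law saturation and the linear-region current below V_DD − V_T. It samples at v_1 = V_DD and v_2 = V_DD/2, and it uses hypothetical values for C_L, k, V_DD, and V_T. It then compares the slide's ∂t_p/∂V_T formula against a finite difference of t_p itself.

```python
def drain_current(v, vt, vdd, k=1e-4):
    """Assumed long-channel NMOS discharging the output: V_GS = vdd, V_DS = v."""
    vov = vdd - vt
    if v >= vov:                           # saturation
        return 0.5 * k * vov ** 2
    return k * (vov * v - 0.5 * v ** 2)    # linear region


def tp_linear(c_load, v1, v2, i1, i2):
    """Slide's approximation: t_p ~ C_L (v1 - v2) / 4 * (1/i1 + 1/i2)."""
    return c_load * (v1 - v2) / 4.0 * (1.0 / i1 + 1.0 / i2)


def tp(c_load, v1, v2, vt, vdd):
    """Delay estimate with currents evaluated at the two sample points."""
    return tp_linear(c_load, v1, v2, drain_current(v1, vt, vdd), drain_current(v2, vt, vdd))


def dtp_dvt(c_load, v1, v2, vt, vdd, h=1e-6):
    """Slide's V_T sensitivity, with di/dV_T taken by central differences."""
    i1, i2 = drain_current(v1, vt, vdd), drain_current(v2, vt, vdd)
    di1 = (drain_current(v1, vt + h, vdd) - drain_current(v1, vt - h, vdd)) / (2 * h)
    di2 = (drain_current(v2, vt + h, vdd) - drain_current(v2, vt - h, vdd)) / (2 * h)
    return -c_load * (v1 - v2) / 4.0 * (di1 / i1 ** 2 + di2 / i2 ** 2)


c_load, vdd, vt = 10e-15, 1.0, 0.3
v1, v2 = vdd, vdd / 2                  # sample points: start and 50% of the swing
h = 1e-6
numeric = (tp(c_load, v1, v2, vt + h, vdd) - tp(c_load, v1, v2, vt - h, vdd)) / (2 * h)
print(dtp_dvt(c_load, v1, v2, vt, vdd), numeric)
```

The two numbers agree, and the sensitivity is positive: raising V_T lowers both currents and lengthens the delay.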
Analysis – Comparison with Simulations
[Figure: delay (ps) vs. Vth (V) from 0.05 to 0.3 V; delay model vs. simulation]
The analysis model and simulation results match.
49
Analysis – Delay and Variability
[Table: closed-form expressions for ∂t_p/∂V_T and ∂t_p/∂V_DD, as functions of V_DD and V_T, for the parallel, serial, and PTL structures]
• Robustness: serial stack > parallel >> PTL
Analysis – Vt and Vdd Sensitivity
[Figure: normalized Vt sensitivity ∂t_p/∂V_T vs. Vdd and Vt for the PTL, serial, and parallel structures]
[Figure: normalized Vdd sensitivity ∂t_p/∂V_DD vs. Vdd and Vt for the PTL, serial, and parallel structures]
50
Analysis – PTL Sampling Level vs. Robustness
PTL robustness is higher as the sampling level is lower.
[Figure: normalized Vt sensitivity ∂t_p/∂V_T of PTL vs. Vdd and Vt, for sampling levels of 0.2 V, 0.3 V, and 0.5 V]
[Figure: PTL Vth-variation simulation, probability vs. delay (ps) for different sampling levels]
Strong Robustness of LVS
Conclusion and Future Work
Conclusion
• LVS logic style
– Outperforms in performance and robustness
– Complex to design (clock timing is critical)
• Variation analysis
– Lowering the sampling level in PTL and stacking the devices improve the robustness
– Explains the robustness differences between logic styles
– Explains LVS robustness
Future Work
• Develop variability analysis models
– Apply to complex gate structures
– Apply to short-channel models
• Robustness vs. clocking strategy
51
Analysis and Design of High Performance and Low Power Current Mode Logic CMOS
Phillip Chin, Junjie Su, Xiaolan Zhong
Motivation
As technology scales, the following problems are prevalent:
• Increased leakage current
• Increased power consumption
• Reduced noise margins
Today’s circuits require high performance at low power while addressing the problems above.
52
Current Mode Logic (CML)
In the past, CML was not used because it consumed more power than voltage-mode implementations. Today, however, circuits operate at higher frequencies, where CML can save power and offer better performance.
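The frequency argument can be made concrete with first-order models. A CML gate draws a constant tail current, so P_CML ≈ I_tail·V_DD regardless of frequency. Static CMOS switching power, by contrast, grows as α·C·V_DD²·f. Equating the two gives a crossover frequency. The values below are hypothetical, and CML's small-swing switching power and CMOS leakage are ignored.

```python
def p_cml(i_tail, vdd):
    """Bias power of a CML gate: set by the tail current, independent of frequency."""
    return i_tail * vdd


def p_cmos(alpha, c_load, vdd, freq):
    """First-order switching power of a static CMOS gate."""
    return alpha * c_load * vdd ** 2 * freq


def crossover_frequency(i_tail, alpha, c_load, vdd):
    """Frequency at which both first-order models give the same power."""
    return i_tail / (alpha * c_load * vdd)


i_tail, alpha, c_load, vdd = 20e-6, 0.5, 10e-15, 1.0  # hypothetical values
f_x = crossover_frequency(i_tail, alpha, c_load, vdd)
print(f"crossover at {f_x / 1e9:.1f} GHz")  # 4.0 GHz for these values
```

Above the crossover, the constant CML bias power is the smaller of the two.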
Current Mirror Difficulties
Recall the MOS current equations:
Saturation:
$$I_D = \frac{1}{2}k'\frac{W}{L}\left(V_{GS}-V_T\right)^2\left(1+\lambda V_{DS}\right)$$
Subthreshold:
$$I_D = I_S\,e^{V_{GS}/(nkT/q)}\left(1-e^{-V_{DS}/(kT/q)}\right)\left(1+\lambda V_{DS}\right)$$
λ increases with scaling, so small changes in V_DS have a bigger impact on the drain current; the current mirror is therefore more sensitive to changes in V_DS.
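Consider a basic two-transistor mirror with matched devices in saturation and a shared V_GS. The saturation equation above then gives I_out/I_ref = (1 + λV_DS,out)/(1 + λV_DS,ref), because the (V_GS − V_T)² terms cancel. The sketch evaluates that ratio for two hypothetical λ values to show why a larger λ makes the mirror more sensitive to V_DS.

```python
def mirror_ratio(lam, vds_ref, vds_out):
    """I_out / I_ref for matched saturated devices sharing the same V_GS."""
    return (1 + lam * vds_out) / (1 + lam * vds_ref)


# Diode-connected reference at V_DS = V_GS = 0.5 V; output device at 0.8 V (assumed).
for lam in (0.1, 0.5):  # 1/V, hypothetical "older" vs. "scaled" device
    error = mirror_ratio(lam, 0.5, 0.8) - 1
    print(f"lambda = {lam}: mirror error = {error:.1%}")
```

With the same 0.3 V mismatch in V_DS, the copying error grows from about 2.9% to 12% as λ increases.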
53
CML Adder Failure
The CML adder fails in smaller technologies; its operation depends heavily on current-mirror accuracy.
New CML AND Gate
• Inspired by the previous adder
• Sizing is very important
• Can replace the CMOS AND gate
54
Robustness Comparison
At lower frequencies, static CMOS has a better noise margin than CML; at higher frequencies, the two are nearly identical.
[Figure: CML AND gate vs. static CMOS AND gate at 1.25 GHz]
Delay and Power Comparison
[Figure: power (fW) vs. switching frequency (GHz), static CMOS vs. CML]
[Figure: delay (ps) vs. switching frequency (GHz), static CMOS vs. CML]

| Freq (GHz) | CMOS delay (ps) | CML delay (ps) |
|---|---|---|
| 1.25 | 117 | 129 |
| 1 | 117 | 129 |
| 0.833 | 117 | 129 |
| 0.667 | 117 | 129 |

| Freq (GHz) | CMOS power (fW) | CML power (fW) |
|---|---|---|
| 1.25 | 32.4 | 3.62 |
| 1 | 26.5 | 4 |
| 0.833 | 22 | 3.53 |
| 0.667 | 17.5 | 3.66 |

55
Static CMOS vs. CML
CML outperforms static CMOS in delay × power.
[Figure: delay × power (ps·fW) vs. switching frequency (GHz), static CMOS vs. CML]

| Freq (GHz) | CMOS delay × power (ps·fW) | CML delay × power (ps·fW) |
|---|---|---|
| 1.25 | 3791 | 467 |
| 1 | 3106 | 516 |
| 0.833 | 2574 | 455.4 |
| 0.667 | 2047 | 471.8 |

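The delay × power entries are products of the delay and power values from the previous slide, so they can be recomputed as a consistency check. Values are transcribed from the slides, with units as labeled there.

```python
# Delay (ps) and power (fW, as labeled on the slide) transcribed from the comparison tables.
DELAY_PS = {"CMOS": 117, "CML": 129}
POWER = {
    1.25: {"CMOS": 32.4, "CML": 3.62},
    1.0: {"CMOS": 26.5, "CML": 4.0},
    0.833: {"CMOS": 22.0, "CML": 3.53},
    0.667: {"CMOS": 17.5, "CML": 3.66},
}


def delay_power(freq_ghz):
    """Delay x power product for each logic style at one switching frequency."""
    return {style: DELAY_PS[style] * p for style, p in POWER[freq_ghz].items()}


for freq in POWER:
    dp = delay_power(freq)
    print(f"{freq} GHz: CMOS {dp['CMOS']:.1f}, CML {dp['CML']:.1f}, "
          f"CMOS/CML = {dp['CMOS'] / dp['CML']:.1f}")
```

Most entries agree with the slide to within 0.5. The exception is the 1 GHz CMOS entry, which recomputes to 3100.5 rather than 3106.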
Conclusion
• CML offers a possible new solution to the issue of scaling.
• The CML AND gate offers better overall performance than its static CMOS counterpart.
• Future work: optimize the current mirror and build bigger blocks.
56
UC Berkeley Spring 2006
EE241 Final Presentation
Evaluation of Adiabatic Logic for New Process Technologies
Prof. Jan Rabaey
Karl Skucha, Babak Pahlavan, Arash Ghanadan
Motivation
Exploit property of Adiabatic Charging
$$E = \frac{RC}{T}\,C\,V_{DD}^2$$
Asymptotically Zero Power Dissipation
How close can we get?
What are the delay trade-offs?
Future Trends
Viable in new process technologies?
Benefits getting better or worse?
Possibility for mobile or ultra-low power applications.
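Here is a numeric reading of the adiabatic-charging formula. Ramping the supply over a time T dissipates about (RC/T)·C·V_DD², compared with ½·C·V_DD² for conventional charging, so the ratio is 2RC/T. The approximation only holds for T ≫ RC, and the R, C, and V_DD values below are hypothetical.

```python
def e_conventional(c, vdd):
    """Energy dissipated when C is charged abruptly from a fixed supply."""
    return 0.5 * c * vdd ** 2


def e_adiabatic(r, c, vdd, t_ramp):
    """Approximate dissipation with a supply ramped over t_ramp (valid for t_ramp >> RC)."""
    return (r * c / t_ramp) * c * vdd ** 2


r, c, vdd = 1e3, 100e-15, 1.0  # ohms, farads, volts (hypothetical)
for t_ramp in (1e-9, 10e-9, 100e-9):
    ratio = e_adiabatic(r, c, vdd, t_ramp) / e_conventional(c, vdd)
    print(f"T = {t_ramp:.0e} s: adiabatic / conventional = {ratio:.4f}")
```

Each tenfold increase in T cuts the dissipated energy tenfold. This is the delay tradeoff behind "asymptotically zero" power.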
57
Problem Statement
No formal study of adiabatic circuits below the 130 nm technology node
No formal study of trends for new technology nodes such as 90 nm, 65 nm, and 45 nm
Power is one of the biggest issues
Are adiabatic circuits promising as technology scales down?
Design Families
Non-Adiabatic design family for comparison
Dynamic DCVSL
Adiabatic families
Positive Feedback Adiabatic Logic (PFAL)
Energy Charge Recovery Logic (aka 2N2P)
58
Design Families: Dynamic DCVSL
Reasons Used
Same switching probability as Adiabatic circuits
Similar structure (differential, cross coupled PMOS)
Design Families: 2N2P
Reasons Used
Easy to implement
Small area
Rail-to-rail swing
Operation
Adiabatic charging and discharging with the clock
4 sinusoidal clocks, 90 degrees out of phase
59
Design Families: PFAL
Reasons Used
High Performance [1]
Rail-to-Rail Swing
Operation
Functional blocks assist adiabatic charging
4 sinusoidal power clocks
[1] A. Vetuli, S. Di Pascoli, and L. M. Reyneri, “Positive feedback in adiabatic logic,” Electronics Letters, vol. 32, no. 20, Sep. 1996, pp. 1867ff.
Test Setup
Test circuit: 4-input NAND decoder
Input vector: 0101 -> 1010 -> 0101
Frequencies:
• HF: 1 GHz and 500 MHz
• MF: 250 and 100 MHz
• LF: 50 and 10 MHz
• VLF: 3 and 1 MHz
Loads: 0, 50, 100, 150 fF
Sized NANDs and scaled voltage for the lowest power dissipation, while maintaining functionality and “full swing”
60
Energy vs. Delay Results (1/4)
DCVSL uses highest energy, 2N2P ~40%, PFAL ~22%
[Figure: energy per cycle vs. delay, 180 nm, 100 fF load; 2N2P, PFAL, DCVSL]
Energy vs. Delay Results (2/4)
DCVSL uses highest energy, 2N2P ~30%, PFAL ~25%
[Figure: energy per cycle vs. delay, 90 nm, 100 fF load; 2N2P, PFAL, DCVSL]
61
Energy vs. Delay Results (3/4)
DCVSL uses highest energy, 2N2P ~40%, PFAL ~30%
Leakage begins to dominate in low frequency for all three
[Figure: energy per cycle vs. delay, 65 nm, 100 fF load; 2N2P, PFAL, DCVSL]
Energy vs. Delay Results (4/4)
All three fail at high frequency, and leakage dominates at low frequency.
The results are promising but likely unreliable, because the 45 nm model has (much) higher power consumption and voltage requirements than the other models.
[Figure: energy per cycle vs. delay, 45 nm, 100 fF load; 2N2P, PFAL, DCVSL]
62
Trend Results (1/2)
Adiabatic gain = (energy for DCVSL) / (energy for adiabatic family)
[Figure: high-frequency adiabatic gain vs. model (180, 90, 65 nm) for 2N2P, PFAL, and their average]
[Figure: middle-frequency adiabatic gain vs. model (180, 90, 65, 45 nm) for 2N2P, PFAL, and their average]
• HF gain ~2; little change
• MF gain ~4; slight increase
Trend Results (2/2)
[Figure: low-frequency adiabatic gain vs. model (180, 90, 65, 45 nm) for 2N2P, PFAL, and their average]
• LF gain ~3.5; slight decrease
• VLF gain ~4 to 2; large decrease
[Figure: very-low-frequency adiabatic gain vs. model (180, 90, 65, 45 nm) for 2N2P, PFAL, and their average]
63
Results Summary
• Adiabatic circuits consume 70–80% less power at medium to very low frequencies.
• If leakage dominates, the benefits decrease.
• At high frequencies, the benefits also decrease.
• In smaller technologies, the benefits decrease at low and very low frequencies.
• However, the medium-frequency range remains promising.
Discussion & Conclusion
Discussion
• More clock networks and clock-generation circuits will increase clock-distribution power
• Always switching
• Applicable mostly to high-switching-activity circuitry
• Larger design overhead
• ~100 times lower static noise [2]
Conclusion
• Power savings of 70–80%, increasing for new technologies, at frequencies in the 100–250 MHz range
• Very low and high frequencies need different low-power solutions
[2] H. Mahmoodi-Meimand and A. Afzali-Kusha, “Low-power, low-noise adder design with pass-transistor adiabatic logic,” Proc. 12th International Conference on Microelectronics (ICM 2000), 31 Oct.–2 Nov. 2000, pp. 61–64.
64
Q & A
Thank you for your time
Any questions?
Backup Slides
65
Test Setup
Robustness for adiabatic circuits explained:
• The output signal goes within ~1% of the rail
• The output goes below V_T before the next clock reaches V_T
[Waveforms: Clk1, Clk2, Output]
Simulation Results – 150 fF load (1/4)
[Figure: energy per cycle vs. delay, 180 nm, 150 fF load; 2N2P, PFAL, DCVSL]
66
Simulation Results – 150 fF load (2/4)
[Figure: energy per cycle vs. delay, 90 nm, 150 fF load; 2N2P, PFAL, DCVSL]
Simulation Results – 150 fF load (3/4)
[Figure: energy per cycle vs. delay, 65 nm, 150 fF load; 2N2P, PFAL, DCVSL]
67
Simulation Results – 150 fF load (4/4)
[Figure: energy per cycle vs. delay, 45 nm, 150 fF load; 2N2P, PFAL, DCVSL]
Algebraic Coding for Reliable Computation using Unreliable Gates
Animesh Kumar
EE 241 Class Project
Instructor: Prof. Jan M. Rabaey
68
Introduction and motivation
• Combinatorial logic blocks in DSM circuits
• Feature size is very small
• Smaller devices => more susceptible to SER
Unreliable AND logic
• Simple AND gate – well-known problem
• Combinatorial model – correct a controlled number of errors
• a, b, x are binary vectors
[Diagram: AND gate with inputs a, b and output x, failing with probability p]
P(x ≠ a·b) = p, p > 0 (p = probability of failure)
69
Past approaches
[Diagram: three unreliable AND gates, each failing with probability p, compute x1, x2, x3 from a and b; a majority voter produces x]
• TMR (triple modular redundancy)
• Works for a single error per computation
• Wasteful for small p
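The TMR baseline can be simulated directly. The sketch assumes three independent AND gates that each flip their output with probability p, plus an ideal majority voter (in practice the voter can also fail). Per output bit, the voted result is wrong with probability 3p²(1 − p) + p³. TMR therefore fixes a single gate error per computation at the cost of tripling the hardware, which is wasteful when p is small.

```python
import random


def noisy_and(a, b, p, rng):
    """AND gate whose output is flipped with probability p."""
    out = a & b
    return out ^ 1 if rng.random() < p else out


def tmr_and(a, b, p, rng):
    """Three noisy AND gates followed by an ideal majority voter."""
    votes = noisy_and(a, b, p, rng) + noisy_and(a, b, p, rng) + noisy_and(a, b, p, rng)
    return 1 if votes >= 2 else 0


def tmr_error_rate(p, trials=100_000, seed=0):
    """Monte Carlo estimate of the voted output's bit-error rate."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        a, b = rng.getrandbits(1), rng.getrandbits(1)
        errors += tmr_and(a, b, p, rng) != (a & b)
    return errors / trials


p = 0.05
print(tmr_error_rate(p), 3 * p ** 2 * (1 - p) + p ** 3)
```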
[Diagram: unreliable gate with inputs a, b and failure probability p]
• Encode using a linear error-control code
• An [n, k, d] code corrects ⌊(d − 1)/2⌋ errors
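Interesting idea
[Diagram: unreliable gates computing x and y from inputs a, b, each failing with probability p]

| a | b | x | y | XOR(x, y) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 |

XOR(x, y) = XOR(a, b)
Approach: encode over the input and use gate diversity
• The XOR channel locates error(s)
• Correction?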
70
Correction – the easy part
[Diagram: unreliable gates computing x and y from inputs a, b, each failing with probability p]
If noisy (x, y) = (1, 1) or (0, 0), then corrected (x, y) = (0, 1).
Example (Hamming [7,4,3] code):
a = 0 0 0 0 0 0 0
b = 1 1 1 1 1 1 1
Noisy: x = 0 0 0 1 0 0 0, y = 1 1 1 1 1 1 1, s = 1 1 1 0 1 1 1
Corrected: x = 0 0 0 0 0 0 0, y = 1 1 1 1 1 1 1 ☺
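The easy-case decoding can be sketched for the Hamming [7,4,3] example. The sketch assumes the parity-check matrix whose j-th column is the binary representation of j; under that choice, both a = 0000000 and b = 1111111 are codewords, so XOR(a, b) is one too. For error-free outputs, XOR(x, y) = XOR(a, b), so the syndrome of s = XOR(x, y) points at a single faulty position, and the (0, 1) rule above repairs it. The sketch handles only one error per block and merely flags the hard case; the function names are hypothetical.

```python
def syndrome(word):
    """Hamming [7,4] syndrome: XOR of the (1-based) positions holding a 1.

    0 means the word is a codeword; otherwise it is the position of a single error.
    """
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position
    return s


def correct_easy(x, y):
    """Locate one error via s = XOR(x, y) and apply the easy-case rule."""
    s = [xi ^ yi for xi, yi in zip(x, y)]
    pos = syndrome(s)
    x, y = list(x), list(y)
    if pos == 0:
        return x, y, "no error"
    i = pos - 1
    if (x[i], y[i]) in {(0, 0), (1, 1)}:
        x[i], y[i] = 0, 1  # the only valid (AND, OR) pair with XOR = 1
        return x, y, "corrected"
    return x, y, "ambiguous: (0, 0) or (1, 1)"


x, y, status = correct_easy([0, 0, 0, 1, 0, 0, 0], [1] * 7)
print(x, y, status)
```

With the slide's example, the syndrome of s = 1110111 is 4, and the (1, 1) pair at that position becomes (0, 1).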
Correction – hard part
[Diagram: unreliable gates computing x and y from inputs a, b, each failing with probability p]
If noisy (x, y) = (1, 0) or (0, 1), then the correct (x, y) is in {(0, 0), (1, 1)}.
Question: Is it always possible to decode t logic errors using a t-error-correcting code?
Answer: No.
Proof:
a = 1 0 0 0 0 1 1
b = 1 0 0 1 1 0 0
x = 1 0 0 0 0 0 0
y = 1 0 0 0 0 0 0
x = 0 0 0 0 0 0 0
y = 1 0 0 0 0 0 0
71
Coverage results
For a pair (a, b), is there (c, d) such that
a·b = XOR(c·d, e) and a + b = XOR(c + d, e),
with e (the error) of small weight?
Results:
• Hamming [7,4,3]: 44% of (a, b) pairs have some interfering (c, d)
• Reed–Muller [16,5,8]: tolerates two logic compute errors in each block (obtained by simulation)
Results so far …
• Showed a novel coding and gate-diversity method which detects and corrects a controlled number of errors
• Coverage results are promising
72
Conclusions & future work
• Exploring more complicated gate networks
• Proving feasibility of general bounded-distance codes
• Efficient decoding methods
• Thinking about the overhead!
Power/Area Minimization of UWB Digital Baseband
Albert H. Chang, Rach Liu
University of California, Berkeley
EE 241 Spring 2006 Final Presentation
May 9th 2006
73
Motivation
• Study of power and area efficiency
• Design example: UWB communication
• Efficiency is needed due to the high throughput
Explore micro-architectural techniques:
• Parallelism
• Time-multiplexing
• Pipelining
The effect of voltage scaling on:
• Active power
• Leakage power
• Overall power
Scope of Project
Design driver: UWB digital baseband, pulsed radio approach [1]
Goal: optimize power and area
• Fixed throughput constraint
• Simulation for all 3 corners
Simulink-to-Silicon design environment
• Functional verification of the algorithm
• Circuit-level characterization
[1] Mike Chen, “Ultra Wide-Band Baseband Design and Implementation,”M.S. Thesis, UC Berkeley, 2002
74
Design Methodology
• Model and test the design in Simulink/XSG
• ELDO simulation of 90 nm technology: understand the power-delay tradeoff for all corners
• Run the BWRC/INSECTA digital design flow for an FIR filter @ 1 V to estimate power/area
• Extrapolate to other points (P = 1, 2, 4, 8, 16) using the power-delay tradeoff curve
• Find the optimal level of parallelism and Vdd for the best power/area tradeoff
Simulink Design Exploration
[Diagram: 1 GHz, 64 taps]
75
Power of Major building blocks
Power Consumption Studies in Terms of Number of Slices
Case | Same frequency for all blocks: slices | share | Actual frequency (depends on level of parallelism): slices | share
PMF of F1 | 3496 | 26% | 3496 | 84.86%
Others of F1 | 9980 | 74% | 9980/16 = 623 | 15%
PMF of F2 | 4874 | 32.8% | 4874 | 79.6%
Others of F2 | 9980 | 67.2% | 9980/8 = 1247 | 20%
PMF of F16 | 31798 | 76% | 31798 | 76%
Others of F16 | 9980 | 24% | 9980/1 = 9980 | 24%
At the actual operating frequencies, the PMF block accounts for about 80% of the overall system (76% to 85% across F1, F2, and F16). Therefore, optimizing the PMF is the main task!
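The percentages in the table follow directly from the slice counts. The short Python sketch below recomputes them; it assumes, as the table's actual-frequency column shows, that the non-PMF slices are divided by 16 for F1, 8 for F2, and 1 for F16. The helper name `pmf_share` is hypothetical.

```python
# Recompute the PMF share of total slices from the table's slice counts.
# Assumption (taken from the table): in the "actual frequency" column the
# non-PMF slices are divided by 16 (F1), 8 (F2), or 1 (F16).

PMF_SLICES = {"F1": 3496, "F2": 4874, "F16": 31798}
OTHER_SLICES = 9980
OTHER_DIVISOR = {"F1": 16, "F2": 8, "F16": 1}


def pmf_share(case: str, actual_frequency: bool) -> float:
    """Return the PMF share of the total slice count, in percent."""
    pmf = PMF_SLICES[case]
    if actual_frequency:
        others = OTHER_SLICES / OTHER_DIVISOR[case]
    else:
        others = OTHER_SLICES
    return 100.0 * pmf / (pmf + others)


if __name__ == "__main__":
    for case in PMF_SLICES:
        print(f"{case}: same freq {pmf_share(case, False):.1f}%, "
              f"actual freq {pmf_share(case, True):.2f}%")
```

With the exact quotient 9980/16 = 623.75, the F1 actual-frequency share comes out at 84.86%, matching the table.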
PMF Block, Extreme Case #1: F1 (fully time-multiplexed design, “1x”)
PMF Block, Extreme Case #2: F16 (fully parallel design)
Tradeoff between Throughput and Power Consumption
F1 synthesized in 90nm @ 1V with a 3 ns clock period (fclk ≈ 333 MHz). Estimates: active power 28.9 mW; leakage power 3.4 mW
Power extrapolated from F1 results
Tradeoff between Throughput and Power for F1
Since there is only one level of parallelism, throughput = operating frequency
As the frequency and voltage increase, the overall power consumption increases because
Active power $\propto f_{clk} V_{DD}^2$
Leakage power $\propto V_{DD}^{3.3}$ (experimental data)
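A minimal Python sketch of these first-order scaling rules, anchored at the synthesized F1 point (28.9 mW active and 3.4 mW leakage at 1 V with a 3 ns clock). It assumes active power scales exactly as $f_{clk} V_{DD}^2$ and leakage exactly as $V_{DD}^{3.3}$, with no other effects modeled; the function name `f1_power_mw` is hypothetical.

```python
# First-order power model for F1, anchored at the synthesized 1 V point.
# Assumptions: active ~ fclk * VDD^2, leakage ~ VDD^3.3 (experimental exponent);
# nothing else (e.g. short-circuit current, temperature) is modeled.

P_ACTIVE_REF_MW = 28.9      # active power at 1 V and ~333 MHz (synthesis estimate)
P_LEAK_REF_MW = 3.4         # leakage power at 1 V (synthesis estimate)
F_REF_MHZ = 1000.0 / 3.0    # 3 ns clock period -> ~333 MHz
V_REF = 1.0
LEAK_EXPONENT = 3.3


def f1_power_mw(vdd: float, fclk_mhz: float) -> tuple[float, float, float]:
    """Return (active, leakage, total) power in mW for F1 at the given VDD and clock."""
    active = P_ACTIVE_REF_MW * (fclk_mhz / F_REF_MHZ) * (vdd / V_REF) ** 2
    leakage = P_LEAK_REF_MW * (vdd / V_REF) ** LEAK_EXPONENT
    return active, leakage, active + leakage
```

For example, `f1_power_mw(1.0, 1000 / 3)` gives a total of about 32.3 mW, the simulated reference value; lowering VDD at a fixed clock shrinks the active term quadratically and the leakage term faster still.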
Technology Characterization: ELDO FO4 Model
Extrapolate data: throughput is normalized to 1 for the highest throughput at each corner, and delay is normalized to the delay at 1 V.
(Figure: delay, normalized to 1 V)
Simulated F1 in 90nm @ 1V: active power 28.9 mW, leakage power 3.4 mW, overall power 32.3 mW
Throughput = 333 MSamples/s
(Figure: Reference Case F1, power (mW, log axis 0.01 to 100) vs. normalized throughput (log axis 0.001 to 1) at the SS, TT, and FF corners)
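The extrapolation step can be sketched in Python as follows, assuming the clock frequency scales as the inverse of the FO4 delay, so the normalized throughput at a given VDD is delay(1 V)/delay(VDD), and reusing the first-order F1 power scaling. This sketch normalizes to the 1 V point; the delay table itself would come from the ELDO FO4 simulations, which are not reproduced here. `power_vs_throughput` is a hypothetical helper.

```python
# Turn a delay curve (normalized to the delay at 1 V) into a
# power-vs-normalized-throughput curve for F1.
# Assumptions: fclk ~ 1/delay, active ~ fclk * VDD^2, leakage ~ VDD^3.3.

P_ACTIVE_REF_MW = 28.9   # F1 active power at 1 V
P_LEAK_REF_MW = 3.4      # F1 leakage power at 1 V


def power_vs_throughput(norm_delay: dict[float, float]) -> list[tuple[float, float, float]]:
    """Map {vdd: delay normalized to 1 V} to [(vdd, normalized throughput, power in mW)]."""
    curve = []
    for vdd in sorted(norm_delay, reverse=True):
        throughput = 1.0 / norm_delay[vdd]           # equals 1.0 at the 1 V reference
        active = P_ACTIVE_REF_MW * throughput * vdd ** 2
        leakage = P_LEAK_REF_MW * vdd ** 3.3
        curve.append((vdd, throughput, active + leakage))
    return curve
```

Any delay values passed in, such as `{1.0: 1.0, 0.5: 4.0}`, are placeholders for illustration, not simulation data.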
Case Study: Parallelism P = 1, 2, 4, 8, 16
Active power $\propto f_{clk} V_{DD}^2$
Leakage power $\propto P \cdot V_{DD}^{3.3}$
(Figure: active and leakage power for P = 1, 2, 4, 8, 16, annotated "Power increases as P increases" and "Power decreases as P increases")
Need to look at VDD to better understand Pleakage…
A Closer Look: Leakage Power
• Parallelism doesn’t help much once the supply voltage saturates
• The rate of voltage scaling decreases
• As a result, F16 consumes more power here, because the leakage cost of block parallelism (~ P) kicks in
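A first-order Python sketch of this tradeoff, under assumptions that are not taken from the project's simulations: at a fixed throughput, P copies each clocked at f/P switch as much in total as one copy at f, so active power does not depend on P, while leakage grows with the number of copies. Once VDD stops dropping, the P factor in leakage dominates, which matches the behavior described for F16. `relative_power` is a hypothetical name, and whether a given VDD actually meets timing for a given P must come from the delay curve.

```python
# Active and leakage power of a P-way parallel design at a fixed throughput
# (the F1 reference rate). Assumptions: active ~ VDD^2 independent of P,
# leakage ~ P * VDD^3.3; VDD feasibility for each P is NOT checked here.


def relative_power(p: int, vdd: float, active_ref: float = 28.9,
                   leak_ref: float = 3.4) -> tuple[float, float]:
    """Return (active, leakage) power in mW for P parallel blocks at supply vdd."""
    active = active_ref * vdd ** 2          # same total switching for every P
    leakage = leak_ref * p * vdd ** 3.3     # all P copies leak
    return active, leakage
```

At the same saturated VDD, going from P = 8 to P = 16 leaves the active term unchanged and doubles leakage.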
Minimizing Overall Power
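(Figure: overall power for F1, F2, F4, F8, and F16, annotated with the supply ranges V > 0.5, 0.35 < V < 0.7, 0.3 < V < 0.4, 0.3 < V < 0.35, and 0.25 < V < 0.3)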
Choosing the Right Level of Parallelism
• Different levels of parallelism are hard to compare directly, so each design's power is normalized to F1 at the same throughput
• Find the appropriate level of parallelism at throughput = 0.6 (0.6 = 200/333, i.e. 200 MSamples/s out of F1's 333 MSamples/s)
• F2 gives about 60% power saving
• F4 to F16 give about 70% power saving, so F8 and F16 bring no additional benefit
• Will try F2 and F4
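One way to express this selection in Python, assuming a hypothetical rule that the project does not state: choose the smallest P whose power, normalized to F1 at the same throughput, is within a tolerance of the best design. The normalized powers are the approximate figures quoted above for throughput 0.6 (F2 about 60% saving, F4 to F16 about 70%); `pick_parallelism` is a hypothetical helper.

```python
# Hypothetical selection rule: smallest P within `tol` of the lowest
# normalized power. Values approximate the savings quoted at throughput 0.6.

NORMALIZED_POWER_AT_0P6 = {1: 1.0, 2: 0.4, 4: 0.3, 8: 0.3, 16: 0.3}


def pick_parallelism(norm_power: dict[int, float], tol: float = 0.02) -> int:
    """Return the smallest P whose normalized power is within tol of the best."""
    best = min(norm_power.values())
    return min(p for p, power in norm_power.items() if power <= best + tol)


if __name__ == "__main__":
    print(pick_parallelism(NORMALIZED_POWER_AT_0P6))
```

With the default tolerance the rule returns 4, matching the F4 design choice; a looser tolerance of 0.15 returns 2, since F2's normalized power of 0.4 is within 0.15 of 0.3. Preferring the smallest qualifying P reflects that extra parallel blocks cost area without further power benefit.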
Parallelism 1, 2, 4, 8, 16: TT Corner, Overall Power (Normalized to F1)
(Figure: active, leakage, and total power)
Design Choice: F4
Optimal Design Parameters
Final Voltage for F4 = 0.34V
(Figure: VDD and power vs. throughput)
Power reduction: 80%
Final Architecture
Block diagram:
• Input: 20 bits at 200 MHz → Serial to Parallel → 320 bits at 12.5 MHz
• PMF (could use 1 to 16 different parallelism levels; parallelism 4 chosen) → 16 × 15 bits
• Corr Block → 16 × 23 bits → MAX → 27 bits → MAX → 30 bits
• Other diagram labels: 12.5 MHz, 8~256 (three places); (8 blocks) (two places); pad limit
Final Voltage: 0.34V, Power: 6 mW
Power Saving: 80% compared to F1
Throughput: 200 MSamples/s
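The rates in the diagram are consistent with a 16-to-1 deserializer: 16 samples × 20 bits = 320 bits, and 200 MHz / 16 = 12.5 MHz. A minimal Python model of that packing follows; the lane ordering (first sample in the least-significant bits) and the function name `serial_to_parallel` are assumptions, not details from the design.

```python
# Model of the serial-to-parallel stage: 16 consecutive 20-bit samples
# (200 MHz stream) packed into one 320-bit word (12.5 MHz stream).
# Assumption: sample 0 of each group occupies the least-significant 20 bits.

SAMPLE_BITS = 20
LANES = 16
INPUT_RATE_MHZ = 200.0
OUTPUT_RATE_MHZ = INPUT_RATE_MHZ / LANES   # 12.5 MHz
WORD_BITS = SAMPLE_BITS * LANES            # 320 bits


def serial_to_parallel(samples: list[int]) -> list[int]:
    """Pack a stream of 20-bit samples into 320-bit words, 16 samples per word."""
    if len(samples) % LANES:
        raise ValueError("stream length must be a multiple of 16 samples")
    words = []
    for start in range(0, len(samples), LANES):
        word = 0
        for lane, sample in enumerate(samples[start:start + LANES]):
            if not 0 <= sample < (1 << SAMPLE_BITS):
                raise ValueError("sample does not fit in 20 bits")
            word |= sample << (lane * SAMPLE_BITS)
        words.append(word)
    return words
```

The output word rate times the word width equals the input sample rate times the sample width (12.5 MHz × 320 bits = 200 MHz × 20 bits), so no data is dropped.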
Future Work
• Tape out the design
• Experimental verification
Limited Switch Dynamic Logic: VDD Scaling
Josephine Chang
EE241, Spring 2006, final project
Prepared for Dr. Rabaey
Limited Switch Dynamic Logic
-Dynamic logic: faster than static, with a smaller Co stack → allows Vdd scaling
-LSDL: lower power and more robust than dynamic → better power and variability immunity
Full Adder Test Circuit: LSDL
-Co logic is dynamic; S logic is static
-A, B, and S loaded with inverters; Cout loaded with Ci
-Functionality verified with 45 input patterns
-Power measured with 1000 input patterns
-Delay measured from Ci to Co
-Compared to static CMOS and dual-rail domino
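A hypothetical golden model for checking simulated full-adder outputs, sketched in Python. It assumes each applied (A, B, Ci) pattern is compared against the reference S = A ⊕ B ⊕ Ci and Co = AB + Ci(A ⊕ B); the seeded random patterns here stand in for, and are not, the project's 45-pattern and 1000-pattern sets. All names are illustrative.

```python
# Reference model for verifying simulated full-adder outputs pattern by pattern.
# Assumption: `simulated` is a list of (S, Co) bits sampled once per applied pattern.
import random


def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """Return (sum, carry-out) for one-bit inputs."""
    s = a ^ b ^ ci
    co = (a & b) | (ci & (a ^ b))
    return s, co


def random_patterns(n: int, seed: int = 0) -> list[tuple[int, int, int]]:
    """Generate n reproducible random (A, B, Ci) patterns."""
    rng = random.Random(seed)
    return [(rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n)]


def check(patterns, simulated) -> list[int]:
    """Return the indices where the simulated (S, Co) disagrees with the reference."""
    return [i for i, (p, out) in enumerate(zip(patterns, simulated))
            if full_adder(*p) != out]
```

Because dynamic and LSDL gates depend on the previous evaluation, a pattern sequence (rather than only the 8 static combinations) is what exercises them; this sketch checks only the logical result of each pattern.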
(Figure: PDP (ns*uW), propagation delay tprop (ns), and average power (uW) vs. W (0 to 900 nm) for Static, Dynamic, and LSDL; optimized for PDP at 0.8V)
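The PDP panels plot tprop × average power in ns*uW; since 1 ns × 1 uW = 1e-15 J, those values read directly in femtojoules. The Python sketch below computes the product and picks the PDP-optimal width from a sweep; the helper names and any sweep rows passed in are hypothetical, not the plotted data.

```python
# Power-delay product in the plotted units and a PDP-optimal width pick.
# 1 ns * 1 uW = 1e-9 s * 1e-6 W = 1e-15 J, so ns*uW values are femtojoules.


def pdp_fj(tprop_ns: float, avg_power_uw: float) -> float:
    """Return the power-delay product in femtojoules."""
    return tprop_ns * avg_power_uw   # ns * uW == fJ


def best_width(sweep) -> float:
    """sweep: iterable of (width_nm, tprop_ns, avg_power_uw); return the width with minimum PDP."""
    return min(sweep, key=lambda row: pdp_fj(row[1], row[2]))[0]
```

For instance, `best_width([(300, 0.2, 8.0), (600, 0.15, 10.0), (900, 0.14, 12.0)])` returns 600, the row with the smallest product (1.5 fJ); these rows are made up for illustration.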
Vdd scaling
(Figure: PDP (ns*uW), propagation delay tprop (ns), and average power (uW) vs. Vdd (0.25 to 1 V) for Static, Dynamic, and LSDL)
VT variation at 0.65V (figure: L swept from 39 nm to 45 nm)
L = 45 nm, VT = 0.24 V (nominal); L = 44 nm, VT = 0.225 V
L = 40 nm, VT = 0.02 V; L = 39 nm, VT = -0.02 V
Dynamic failure:
LSDL failure:
L = 43 nm, VT = 0.21 V
Variations on LSDL: Basic, Clocked Feedback, Feed-Forward Pulse, Controlled-Load Keeper
Summary
• LSDL power dissipation approaches static CMOS at low VDD.
• LSDL propagation delay is worse than static CMOS, but the comparison was unfair (the worst case for LSDL is much worse than nominal)
• 1-bit FA is perhaps not the best showcase demo for LSDL
Full Adder Test Circuit: Static Logic and Dynamic Logic
That’s all!
• Excellent job …
Time for summer!